Van Go is a personal art curator that uses natural language processing to analyze museum artwork descriptions and help visitors find pieces that match their interests. It tokenizes and lemmatizes 100,000 curatorial descriptions from the British Museum, applies TF-IDF weighting and latent semantic analysis, and uses the resulting representation to find artworks with similar themes and styles. The approach is demonstrated with recommendations for a visitor interested in landscapes, and the model is validated with k-fold cross-validation and the Jaccard index.
Picasso's Les Demoiselles d'Avignon - Kahee, Julia & Perus [2013/06] - Perus Saranurak
Welcome to Modernity,
In 1907, Picasso published Les Demoiselles d'Avignon, the starting point of the most influential art movement of the 20th century.
This presentation surveys the art history of the Modernist period in the 20th century.
Presented by Kahee, Julia & Perus
What's product design like at a startup like DogVacay? ...What is product design, anyway? This talk was for our friends and coworkers working with design in a new way or for the first time.
Large-Scale Quantitative Analysis of Painting Arts (2012 Fall KPS) - danielykim
2012 Fall Korean Physical Society Meeting
Place: Bogwang Phoenix Park, Pyeongchang-gun, Gangwon-do, Korea, October 24-26, 2012
Note: Best oral presentation award
Adjusting OpenMP PageRank: SHORT REPORT / NOTES - Subhajit Sahu
For massive graphs that fit in RAM but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. As a step in this direction, experiments were conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads), while the hybrid approach runs certain primitives (such as sumAt and multiply) in sequential mode.
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open-source time-series database designed for speed. We will also review some of the changes we have made over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Techniques to optimize the PageRank algorithm usually fall into two categories: one tries to reduce the work per iteration, the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, i.e. vertices with the same in-links, helps avoid duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation, since the final ranks of chain nodes can be calculated easily; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
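As a rough illustration of one of these ideas, here is a minimal Python sketch of skipping per-vertex work once a vertex's rank has converged. This is not the STICD algorithm itself; the toy graph, damping factor, and tolerance are assumptions made only for the example.

```python
# Minimal sketch of PageRank with a "skip converged vertices" heuristic.
# Not STICD; the toy graph, damping factor d, and tolerance are assumptions.
import numpy as np

graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}   # vertex -> out-neighbours (no dangling nodes)
n, d, tol = len(graph), 0.85, 1e-10

ranks = np.full(n, 1.0 / n)
converged = np.zeros(n, dtype=bool)

while not converged.all():
    # Contributions come from all vertices, including frozen (converged) ones.
    incoming = np.zeros(n)
    for u, outs in graph.items():
        for v in outs:
            incoming[v] += ranks[u] / len(outs)
    for v in range(n):
        if converged[v]:
            continue                            # skip per-vertex work once converged
        new_rank = (1 - d) / n + d * incoming[v]
        if abs(new_rank - ranks[v]) < tol:
            converged[v] = True                 # freeze this vertex from now on
        ranks[v] = new_rank

# Note: freezing is a heuristic; frozen ranks may lag slightly behind the exact fixed point.
print(ranks / ranks.sum())
```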
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, require the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of many small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by AI market leaders such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is growing interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach to LLM context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built a robust Data Copilot on top of these three concepts, one that helps democratize access to company data assets and boosts the performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
3. Problem: When you're visiting a new place your time is limited. There is too much to see.
4. Problem: When you're visiting a new place your time is limited. There is too much to see. You need to make a choice.
5. If a visitor wants to see only landscapes and plants, there should be a way to choose a subset of museum objects that makes the most of their time.
6. “This painting was said by him to have been inspired by the work of Li Cheng, a tenth-century landscape artist. Its spare, rather dry brushwork again repeats the deliberately simple, austere quality that is the feature of many so-called literati paintings. Hanging scroll. Landscape. Bare trees in winter, with reference to Li Cheng (919-67) and a river. Painted in a very dry style. Inscriptions and seals. Ink on paper.”
100 000 curatorial descriptions of paintings and drawings from British Museum’s database
7. “this painting be say by him to have be inspire by the work of li cheng , a tenth - century landscape artist . its spare , rather dry brushwork again repeat the deliberately simple , austere quality that be the feature of many so - call literati painting . hang scroll . landscape . bare tree in winter , with reference to li cheng ( 919 - 67 ) and a river . paint in a very dry style . inscription and seal . ink on paper.”
tokenizing and lemmatizing
100 000 curatorial descriptions of paintings and drawings from British Museum’s database
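Slide 7 shows the same curatorial description after tokenizing and lemmatizing. A minimal sketch of such a preprocessing step, assuming spaCy (the presentation does not say which library was actually used):

```python
# Tokenize-and-lemmatize sketch; spaCy is an assumption, the presentation
# does not name the library it actually used.
import spacy

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def lemmatize(description: str) -> str:
    """Lowercase, tokenize, and join the lemmas back into one string."""
    doc = nlp(description.lower())
    return " ".join(token.lemma_ for token in doc)

print(lemmatize("This painting was said by him to have been inspired "
                "by the work of Li Cheng, a tenth-century landscape artist."))
# e.g. "this painting be say by him to have be inspire by the work of li cheng ..."
```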
8. tokenizing and lemmatizing
Count Vectorizer + TFIDF
100 000 curatorial descriptions of paintings and drawings from British Museum’s database
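The "Count Vectorizer + TFIDF" wording on slide 8 matches scikit-learn's API, so the term-weighting step might look like the sketch below; the tiny corpus stands in for the 100,000 lemmatized descriptions and the parameters are assumptions.

```python
# Sketch of the term-weighting step; scikit-learn's CountVectorizer and
# TfidfTransformer are assumed from the slide's wording, and the toy corpus
# below stands in for the 100,000 lemmatized descriptions.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

lemmatized_descriptions = [
    "hang scroll landscape bare tree in winter ink on paper",
    "landscape river and mountain paint in a dry style ink on paper",
    "portrait of a woman oil on canvas",
    "still life with flower oil on canvas",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(lemmatized_descriptions)   # sparse term counts
tfidf = TfidfTransformer().fit_transform(counts)             # TF-IDF weighted matrix
```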
9. tokenizing and lemmatizing
Count Vectorizer + TFIDF
Latent Semantic Analysis (PCA)
100 000 curatorial descriptions of paintings and drawings from British Museum’s database
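Slide 9 labels the dimensionality reduction "Latent Semantic Analysis (PCA)"; on a sparse TF-IDF matrix this is typically done with a truncated SVD. Continuing from the `tfidf` matrix in the previous sketch (the number of components here only fits the toy data; the real value is not stated in the presentation):

```python
# LSA via truncated SVD on the TF-IDF matrix from the previous sketch.
# n_components=2 is only for the toy corpus; the presentation does not say
# how many dimensions were actually retained.
from sklearn.decomposition import TruncatedSVD

lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsa.fit_transform(tfidf)   # documents projected into the concept space
```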
10. tokenizing and lemmatizing
Count Vectorizer + TFIDF
Latent Semantic Analysis (PCA)
Cosine similarity
100 000 curatorial descriptions of paintings and drawings from British Museum’s database
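Recommendations then come from cosine similarity in the reduced space. A sketch continuing from `doc_vectors` above, where index 0 (a landscape description) plays the role of the query:

```python
# Rank descriptions by cosine similarity to a chosen query document in the
# LSA space; the choice of query index is illustrative only.
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

similarities = cosine_similarity(doc_vectors)
query_index = 0
ranked = np.argsort(similarities[query_index])[::-1]
print(ranked)   # most similar descriptions first (the query itself ranks highest)
```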
20. K-fold cross-validation to verify the model using the Jaccard index: a model is trained on each of Dataset 1, Dataset 2, and Dataset 3, and the results for the query “landscape” are compared across models.
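Slide 20 sketches the validation idea: train the same pipeline on several folds of the corpus, query each model with "landscape", and measure how much the returned word lists agree using the Jaccard index. A rough sketch of that procedure, where `build_model` and `top_similar_words` are hypothetical stand-ins for the pipeline steps sketched above and `descriptions` stands for the full corpus:

```python
# Hedged sketch of k-fold validation with the Jaccard index. build_model() and
# top_similar_words() are hypothetical helpers wrapping the pipeline above,
# and `descriptions` stands for the full list of 100,000 texts.
from itertools import combinations
from sklearn.model_selection import KFold

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

kf = KFold(n_splits=3, shuffle=True, random_state=0)
word_lists = []
for train_idx, _ in kf.split(descriptions):
    model = build_model([descriptions[i] for i in train_idx])       # hypothetical
    word_lists.append(top_similar_words(model, "landscape", n=20))  # hypothetical

scores = [jaccard(a, b) for a, b in combinations(word_lists, 2)]
print(sum(scores) / len(scores))   # average pairwise agreement across folds
```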
21. About me
Undergraduate and Master’s in Cognitive Neuroscience in Poland
PhD in Cognitive Neuroscience (NYU)
Hiker & art lover
Editor's Notes
At a museum there is an overabundance of art. Unless you have unlimited time, you need to make choices. Usually museums provide a list of highlights, must-see objects which are famous. But what if you want to explore a selection of pieces related to something which fascinates you, across times and cultures? This poses an interesting challenge which I was excited to tackle. My thinking was the following: nowadays we are used to searching using words, as with popular search engines. Why not use a similar approach to plan your visit to a museum? Traditional searches are not useful here since they are based on models which are very broad; I wanted to use expert knowledge. Using unsupervised machine learning algorithms I identified clusters of paintings, and based on those relationships I could recommend related artworks.
It assumes that similar words will occur in similar pieces of text.
Text data lives in a high-dimensional space which is sparsely populated. In such a situation, words such as “feline” and “cat” are orthogonal, which is why it is useful to reduce the number of dimensions using PCA.
Intuitively, you can tell that cosine similarity in the PCA space represents semantic similarity in this specific domain. Here, I plotted the lists of words which my model returns when queried with the words at the top of each list. Just by visual inspection we can tell that these results are reasonable.
One question we have to ask ourselves, though, is whether these similarities are a result of overfitting or of more robust and consistent qualities of the data.
To verify that, I validated my model by running it on several subsets of my dataset and compared lists of similar words.