Overview of graph databases in general and Neo4J in particular. Includes examples of Java code to interact with embedded or REST based Neo4J instances.
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval... (Sease)
For more details:
https://sease.io/2020/04/the-importance-of-online-testing-in-learning-to-rank-part-1.html
https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html
Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, in the formulation of ranking models for information retrieval systems.
With LTR becoming more and more popular (Apache Solr has supported it since January 2017 and Elasticsearch has an open source plugin released in 2018), organizations struggle with the problem of how to evaluate the quality of the models they train.
This talk explores all the major points in both Offline and Online evaluation.
Setting up correct infrastructure and processes for a fair and effective evaluation of the trained models is vital for measuring the improvements or regressions of an LTR system.
The talk is intended for:
– Product Owners, Search Managers, Business Owners
– Software Engineers, Data Scientists, and Machine Learning Enthusiasts
Expect to learn:
– the importance of Offline testing from a business perspective
– how Offline testing can be done with Open Source libraries
– how to build a realistic test set from the original input data set, avoiding common mistakes in the process
– the importance of Online testing from a business perspective
– A/B testing and Interleaving approaches: details, pros and cons
– common mistakes and how they can invalidate the results obtained
Join us as we explore real-world scenarios and dos and don’ts from the e-commerce industry!
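As an illustration of the interleaving approach mentioned in the list above, here is a minimal Python sketch of team-draft interleaving, one common interleaving variant. It is not taken from the talk: the function names, the `"A"`/`"B"` team labels and the document ids are assumptions made for the example. Two rankings are merged into one result list, and each click is credited to the model that contributed the clicked document.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, k, rng=None):
    """Merge two rankings into one list of up to k documents.

    Returns the interleaved list and a dict mapping each document
    to the team ("A" or "B") that contributed it.
    """
    rng = rng or random.Random()
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved, team_of = [], {}
    while len(interleaved) < k:
        # The team with fewer picks goes first; ties are broken by a coin flip.
        if picks["A"] != picks["B"]:
            order = ["A", "B"] if picks["A"] < picks["B"] else ["B", "A"]
        else:
            order = rng.sample(["A", "B"], 2)
        for team in order:
            # Highest-ranked document of this team that is not yet in the list.
            doc = next((d for d in rankings[team] if d not in team_of), None)
            if doc is not None:
                interleaved.append(doc)
                team_of[doc] = team
                picks[team] += 1
                break
        else:
            break  # both rankings are exhausted
    return interleaved, team_of


def credit_clicks(team_of, clicked_docs):
    """Count clicks per team; the team with more clicks wins the comparison."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team_of:
            wins[team_of[doc]] += 1
    return wins
```

In practice the per-query outcomes would be aggregated over many queries before declaring one model better than the other.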
Search Quality Evaluation: a Developer Perspective (Andrea Gazzarini)
Search quality evaluation is an evergreen topic that every search engineer struggles with. Improving the correctness and effectiveness of a search system requires a set of tools that help measure the direction in which the system is going.
The slides focus on how a search quality evaluation tool can be seen from a practical developer perspective, how it can be used to produce a deliverable artifact, and how it can be integrated within a continuous integration infrastructure.
How to Build your Training Set for a Learning To Rank Project - Haystack (Sease)
Presented by Alessandro Benedetti of Sease. Learning to Rank (LTR) is the application of machine learning techniques (typically supervised) in the formulation of ranking models for information retrieval systems.
With LTR becoming more and more popular, organizations struggle with the problem of how to collect and structure relevance signals necessary to train their ranking models.
This talk is a technical guide to explore and master various techniques to generate your training set(s) correctly and efficiently.
Expect to learn how to:
- model and collect the necessary feedback from the users (implicit or explicit)
- calculate for each training sample a relevance label that is meaningful and not ambiguous (Click Through Rate, Sales Rate ...)
- transform the raw data collected into an effective training set (in the numerical vector format most of the LTR training libraries expect)
Join us as we explore real-world scenarios and dos and don'ts from the e-commerce industry.
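The last two steps in the list above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not the method presented in the talk: the shape of the interaction log, the 0 to 4 grade scale and the two feature values are hypothetical, and the output line follows the `label qid:<id> <index>:<value>` text layout that several LTR training libraries accept.

```python
from collections import defaultdict


def ctr_labels(interactions, max_grade=4):
    """Turn raw impressions into a graded relevance label per (query, document).

    interactions: iterable of (query_id, doc_id, was_clicked) tuples,
    one per impression (hypothetical log format).
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query_id, doc_id, was_clicked in interactions:
        shown[(query_id, doc_id)] += 1
        clicked[(query_id, doc_id)] += int(was_clicked)
    # Click Through Rate is in [0, 1]; scale it to an integer grade 0..max_grade.
    return {key: round(max_grade * clicked[key] / shown[key]) for key in shown}


def to_training_line(label, query_id, features):
    """Format one sample as 'label qid:<id> 1:<f1> 2:<f2> ...'."""
    pairs = " ".join(f"{i}:{value}" for i, value in enumerate(features, start=1))
    return f"{label} qid:{query_id} {pairs}"


log = [(1, "a", True), (1, "a", False), (1, "b", False)]
labels = ctr_labels(log)
line = to_training_line(labels[(1, "a")], 1, [0.5, 3.0])
```

Raw Click Through Rate is affected by the position at which a result was shown, so a real pipeline would normally correct for that before using it as a label.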
Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation (Alessandro Benedetti)
Every team working on Information Retrieval software struggles with the task of evaluating how well their system performs in terms of search quality (at a specific point in time and historically).
Evaluating search quality is important both to understand and size the improvement or regression of your search application across the development cycles, and to communicate such progress to relevant stakeholders.
To satisfy these requirements, a helpful tool must be:
– flexible and highly configurable for a technical user
– immediate, visual and concise for optimal business use
In the industry, and especially in the open source community, the landscape is quite fragmented: such requirements are often achieved using ad-hoc partial solutions that each time require a considerable amount of development and customization effort.
To provide a standard, unified and approachable technology, we developed the Rated Ranking Evaluator (RRE), an open source tool for evaluating and measuring the search quality of a given search infrastructure. RRE is modular, compatible with multiple search technologies and easy to extend. It is composed of a core library and a set of modules and plugins that give it the flexibility to be integrated into automated evaluation processes and continuous integration flows.
This talk will introduce RRE, describe its latest developments and demonstrate how it can be integrated into a project to measure and assess the search quality of your search application.
The focus of the presentation will be a live demo showing an example project with a set of initial relevancy issues that we will solve iteration after iteration, using RRE output feedback to gradually drive the improvement process until we reach an optimal balance between quality evaluation measures.
Interactive Questions and Answers - London Information Retrieval Meetup (Sease)
Answers to some questions about Natural Language Search, Language Modelling (Google BERT, OpenAI GPT-3), Neural Search and Learning to Rank asked during our London Information Retrieval Meetup (December).
What the Rated Ranking Evaluator is and how to use it (for both software engineers and IT managers). Talk given during the Chorus Workshops at Plainschwarz Salon.
Search Quality Evaluation to Help Reproducibility: an Open Source Approach (Alessandro Benedetti)
Every information retrieval practitioner struggles with the task of evaluating how well a search engine is performing and of reproducing the performance achieved at a specific point in time.
Improving the correctness and effectiveness of a search system requires a set of tools that help measure the direction in which the system is going.
Additionally, it is extremely important to track the evolution of the search system over time and to be able to reproduce and measure the same performance (through metrics of interest such as precision@k, recall, NDCG@k...).
The talk will describe the Rated Ranking Evaluator from a researcher and software engineer perspective.
RRE is an open source search quality evaluation tool that can be used to produce a set of reports about the quality of a system, iteration after iteration, and that can be integrated within a continuous integration infrastructure to monitor quality metrics after each release.
The focus of the talk will be to raise public awareness of search quality evaluation and reproducibility, describing how RRE could help the industry.
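For readers unfamiliar with the metrics named in this abstract, the following Python sketch shows one way precision@k, recall and NDCG@k can be computed. It is a simplified illustration, not RRE code: the function names and the judgement format (a dict from document id to relevance grade) are assumptions, and the DCG uses the plain linear-gain formulation rather than the exponential-gain variant.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


def recall(ranked_ids, relevant_ids):
    """Fraction of all relevant documents that were returned."""
    if not relevant_ids:
        return 0.0
    return sum(1 for d in ranked_ids if d in relevant_ids) / len(relevant_ids)


def dcg_at_k(gains, k):
    """Discounted cumulative gain with linear gain and log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ids, grades, k):
    """DCG of the returned ranking divided by the DCG of the ideal ranking."""
    gains = [grades.get(d, 0) for d in ranked_ids]
    ideal = sorted(grades.values(), reverse=True)
    best = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / best if best > 0 else 0.0
```

Storing the judgements and the metric definitions alongside each release is what makes a measured value reproducible later.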
How to Build your Training Set for a Learning To Rank Project (Sease)
Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, in the formulation of ranking models for information retrieval systems.
With LTR becoming more and more popular (Apache Solr has supported it since January 2017), organisations struggle with the problem of how to collect and structure the relevance signals necessary to train their ranking models.
This talk is a technical guide to explore and master various techniques to generate your training set(s) correctly and efficiently.
Expect to learn how to:
– model and collect the necessary feedback from the users (implicit or explicit)
– calculate for each training sample a relevance label which is meaningful and not ambiguous (Click Through Rate, Sales Rate …)
– transform the raw data collected into an effective training set (in the numerical vector format most of the LTR training libraries expect)
Join us as we explore real-world scenarios and dos and don’ts from the e-commerce industry.
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit... (Sease)
RRE is an open-source search quality evaluation tool that can be used to produce a set of reports about the quality of a system, iteration after iteration, and that can be integrated within a continuous integration infrastructure to monitor quality metrics after each release.
Many aspects remained problematic though:
– how to directly evaluate a middle layer search-API that communicates with Apache Solr or Elasticsearch?
– how to easily generate explicit and implicit ratings without spending hours on tedious JSON files?
– how to better explore the evaluation results, with nice widgets and interesting insights?
Rated Ranking Evaluator Enterprise solves these problems and much more.
Join us as we introduce the next generation of open-source search quality evaluation tools, exploring the internals and real-world scenarios!
This presentation will start by introducing how Apache Lucene can be used to classify documents using data structures that already exist in your index, instead of having to generate and supply external training sets. Building on the introduction, the focus will be on the extensions of the Lucene Classification module that come in Lucene 6.0 and on the module's incorporation into Solr 6.1. These extensions will allow you to classify at a document level with individual field weighting, numeric field support, lat/lon fields, etc. The Solr ClassificationUpdateProcessor will be explored: how it works and how to use it, including basic and advanced features like multi-class support and classification context filtering. The presentation will include practical examples and real-world use cases.
Two graph data models: RDF and Property Graphs (andyseaborne)
Talk given at ApacheConEU Big Data 2015.
This talk describes the two common graph data approaches, RDF and Property Graphs. It concludes with observations about the different emphasis of each and where each is focused.
Integrating an App with Amazon Web Services SimpleDB - A Matter of Choices (Mark Maslyn)
There are many ways to integrate an Android app with an Amazon Web Services database. This presentation explores some of those possibilities and the choices I made for my app using the AWS SimpleDB NoSQL cloud database.
Graphs are a perfect solution to organize information and to determine the relatedness of content. Neo4j Developer Evangelist Kenny Bastani will discuss using Neo4j to perform document classification and text classification using a graph database. Kenny will demonstrate how to build a scalable architecture for classifying natural language text using a graph-based algorithm called Hierarchical Pattern Recognition. This approach encompasses a set of techniques familiar to Deep Learning practitioners.
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio (Marina Santini)
attribute selection, constructing decision trees, decision trees, divide and conquer, entropy, gain ratio, information gain, machine learning, pruning, rules, surprisal
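The three attribute selection measures named in the lecture title can be illustrated with a short Python sketch. The lecture slides themselves are not reproduced here, so this follows the standard textbook definitions; the function names and the row format (one dict per training example) are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attribute):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy([row[attribute] for row in rows])
    if split_info == 0:
        return 0.0
    return information_gain(rows, labels, attribute) / split_info
```

Gain ratio penalises attributes with many distinct values, which plain information gain tends to favour.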
Neo4j is a powerful and expressive tool for storing, querying and manipulating data. However modeling data as graphs is quite different from modeling data under a relational database. In this talk, Michael Hunger will cover modeling business domains using graphs and show how they can be persisted and queried in Neo4j. We'll contrast this approach with the relational model, and discuss the impact on complexity, flexibility and performance.
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac... (Databricks)
Graph data and graph analytics are increasingly important in data science and engineering. Cypher is an open language used for querying and updating graph databases and analytics platforms, which is now available in the Apache Spark environment. Neo4j Morpheus leverages the open source graph language project to integrate data from Neo4j operational graph databases with Hive and JDBC SQL data sources, using new Cypher features like the Property Graph Catalog, named graphs, graph projection, parameterized graph view functions, and graph/table views. Input and output graphs can be loaded and stored as structured collections of DataFrames with strong graph schemas to ensure data consistency and graph query optimization. Property graphs can also be analyzed and transformed using graph algorithms such as those in the GraphFrames project. Besides describing and demonstrating these capabilities, this talk also discusses the Spark Project Improvement Proposal to bring Cypher into Spark 3.0, and outlines current work to unify Cypher with other graph query languages to form a new ISO standard Graph Query Language.
Speakers: Alastair Green, Martin Junghanns
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich (Martin Junghanns)
Extending Apache Spark Graph for the Enterprise with Morpheus and Neo4j
The talk covers:
* Neo4j, Property Graph Model and Cypher
* Cypher query execution in Apache Spark
* Neo4j graph algorithms
* Example Code
In this webinar Thomas Cook, Sales Director, AnzoGraph DB, provides a history lesson on the origins of SPARQL, including its roots in the Semantic Web, and how linked open data is used to create Knowledge Graphs. Then, he dives into "What is RDF?", "What is a URI?" and "What is SPARQL?", wrapping up with a real-world demonstration via a Zeppelin notebook.
An introduction to Graph databases and in particular Neo4j, including where Neo4j lives on the CAP Scale in relation to other databases, the Graph data model and a very quick introduction to the Cypher Query Language.
The trend nowadays is to represent the relationships between entities in a graph structure. Neo4j is a NoSQL graph database that allows fast and effective queries on connected data. It is also possible to implement your own algorithms, which can extend the functionality of the built-in API. We use the graph database to model and recommend movies and other media content.
Applying large scale text analytics with graph databases (Data Ninja API)
Data Ninja Services collaborated with Oracle to reach a major milestone in the integration of text analytics with Oracle Spatial and Graph. The Data Ninja Services client in Java can be used to analyze free texts, extract entities, generate RDF semantic graphs, and choose from a number of graph analytics to infer entity relationships. We demonstrated two case studies involving mining health news and detecting anomalies in product reviews.
3rd Athens Big Data Meetup - 2nd Talk - Neo4j: The World's Leading Graph DB (Athens Big Data)
Title: Neo4j: The World's Leading Graph DB
Speaker: George Eleftheriadis (https://gr.linkedin.com/in/george-eleftheriadis-4526ba51/)
Date: Monday, April 18, 2016
Event: https://meetup.com/Athens-Big-Data/events/229812890/
Graph Databases in the Microsoft Ecosystem (Marco Parenzan)
With SQL Server and Cosmos DB, graph databases are now broadly available, after being studied for decades in database theory or being a niche approach in open source with Neo4J. And then there are services like Microsoft Graph and Azure Digital Twins that give us vertical implementations of graphs. So let's take a walk through graphs in the Microsoft ecosystem.
Annotating search results from web databases - IEEE Transaction Paper 2013 (Yadhu Kiran)
Abstract: An increasing number of databases have become web accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted out and assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantic. Then, for each group we annotate it from different aspects and aggregate the different annotations to predict a final annotation label for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly effective.
A Talk on the Graph Database with tutorials
Introduction to the Graph databases and Cypher Query Language
Comparison of the SQL and the Cypher implementations
Social Network Analysis, Semantic Web and Learning Networks (Rory Sie)
Session 2 of the Learning Networks Social Networks Seminar. It presents a recap of SNA terms, and introduces the Semantic Web and how it could be applied to Learning Networks.
Enhanced Enterprise Intelligence with your personal AI Data Copilot (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is growing interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLM context and prompt augmentation when building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Learn SQL from basic queries to Advance queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Adjusting OpenMP PageRank : SHORT REPORT / NOTESSubhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
1. NEO4J OPEN SOURCE GRAPH DATABASE
Presented by Mark Maslyn – mmaslyn@msn.com to Denver Open Source User Group (4/7/15) and Graph Nerds of Boulder (5/14/15)
2. THREE PARTS TO THIS NEO4J PRESENTATION
• What Are Graph Databases and Why Are They Useful
• Neo4J Application Modes and Java API Syntax
• Demo Animal Guessing Game using Neo4J Decision Tree
3. WHAT ARE GRAPH DATABASES?
• Graph Databases are a type of NoSQL (non-relational) database
• In a Graph Database entities (nouns) are represented as nodes
• Relationships are represented by edges connecting nodes
• Both nodes and relationships can have properties
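The property graph model described in these bullets can be sketched in a few lines of plain Java. This is only an illustration of the data structure: the class names `PgNode`, `PgRelationship` and `PropertyGraphSketch`, the `FRIEND_OF` type string and the `since` property are assumptions made for this sketch, not Neo4J API types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; these classes are NOT Neo4J API types.
class PgNode {
    final String label;                                   // group the node belongs to, e.g. "Person"
    final Map<String, Object> properties = new LinkedHashMap<>();
    final List<PgRelationship> outgoing = new ArrayList<>();

    PgNode(String label) {
        this.label = label;
    }

    // Create a directed, typed relationship from this node to 'target'.
    PgRelationship relateTo(PgNode target, String type) {
        PgRelationship rel = new PgRelationship(this, target, type);
        outgoing.add(rel);
        return rel;
    }
}

class PgRelationship {
    final PgNode start;
    final PgNode end;
    final String type;                                    // e.g. "FRIEND_OF"
    final Map<String, Object> properties = new LinkedHashMap<>();

    PgRelationship(PgNode start, PgNode end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

class PropertyGraphSketch {
    // Convenience factory for a Person node with two properties.
    static PgNode person(String name, String language) {
        PgNode node = new PgNode("Person");
        node.properties.put("name", name);
        node.properties.put("language", language);
        return node;
    }

    public static void main(String[] args) {
        PgNode mark = person("Mark M", "Java");
        PgNode tom = person("Tom M", "Java, Scala");
        PgRelationship rel = mark.relateTo(tom, "FRIEND_OF");
        rel.properties.put("since", 2015);                // made-up value: relationships carry properties too
        System.out.println(rel.start.properties.get("name")
                + " -[" + rel.type + "]-> " + rel.end.properties.get("name"));
    }
}
```

Both the nodes and the relationship hold their own property maps, which is the point of the last bullet.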
5. SOME COMPANIES AND INDUSTRIES WITH GRAPH DATABASE APPLICATIONS
• Social Media (Facebook, LinkedIn, Twitter)
• Configuration Management (Assimilation Systems)
• Retail Recommendation (Walmart)
• Resource Authorization (Telenor)
• Fraud Detection (Banks and Credit Card Companies)
• Online Education (Pearson)
• Bioinformatics (Bio4j)
6. WHERE GRAPH DATABASES ARE ADVANTAGEOUS
• Can provide a more natural representation for the data
• Can give a faster query response
• Recommendation engines
13. EXECUTION TIME RDBMS VS GRAPH DATABASE FOR DIFFERENT DEPTHS OF FRIEND SEARCH (1000 FRIENDS)
Depth   MySQL RDBMS Execution Time (sec)   Neo4J Execution Time (sec)   Count Result
2       0.028                              0.04                         ~900
3       0.213                              0.06                         ~999
4       10.273                             0.07                         ~999
5       92.613                             0.07                         ~999
From Vukotic, et al (2014)
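A depth-limited friend search of the kind measured above can be sketched as a breadth-first traversal. This is a minimal plain-Java illustration of following relationships outward from a start node, not a description of how Neo4J or MySQL execute the query. The class name `FriendSearchSketch` and the adjacency-map representation are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FriendSearchSketch {
    // Return everyone reachable from 'start' within 'maxDepth' friend hops (start excluded).
    static Set<String> friendsWithinDepth(Map<String, List<String>> friends,
                                          String start, int maxDepth) {
        Set<String> visited = new HashSet<>();
        visited.add(start);
        List<String> frontier = Collections.singletonList(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, Collections.emptyList())) {
                    // add() returns false for people already seen, so each person is expanded once
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;    // move one hop further out
        }
        visited.remove(start);
        return visited;
    }
}
```

Each extra level of depth only expands the people newly reached at the previous level, so the work grows with the size of the neighbourhood visited rather than with the size of the whole data set.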
15. TYPICAL APPROACHES TO GENERATING RECOMMENDATIONS (AND PREDICTIONS)
• COLLABORATIVE FILTERING – Recommendations based on attributes shared between you and/or your connections – Online dating model
• CONTENT BASED FILTERING – Derive a second level of information from the data and use it to generate recommendations – Content classification model
• GRAPH TOPOLOGY – Infer links based on graph topology
16. COLLABORATIVE FILTERING RECOMMENDATION BASED ON FRIENDS' CHOICES – I WILL LIKE WHAT MY FRIENDS LIKE
[Diagram: three nodes joined by two FRIEND-OF relationships]
Properties:
Name: Mark M
Language: Java
Properties:
Name: Tom M
Language: Java, Scala
Movie: Empire Strikes Back
Properties:
Name: Tom F
Language: Scala, Java
Movie: Raiders of the Lost Ark
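The "I will like what my friends like" idea can be sketched as a small vote count over friends' movies. This is a plain-Java illustration only. The class name `FriendRecommendationSketch` and the two input maps are assumptions of this sketch, and the usage below further assumes that both Tom M and Tom F are friends of Mark M, which the slide text alone does not establish.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class FriendRecommendationSketch {
    // Recommend movies liked by the user's friends that the user has not liked yet,
    // ordered by how many friends like them (ties stay in alphabetical order).
    static List<String> recommend(String user,
                                  Map<String, Set<String>> friends,
                                  Map<String, Set<String>> likes) {
        Set<String> alreadyLiked = likes.getOrDefault(user, Collections.emptySet());
        Map<String, Integer> votes = new TreeMap<>();     // TreeMap keeps keys alphabetical
        for (String friend : friends.getOrDefault(user, Collections.emptySet())) {
            for (String movie : likes.getOrDefault(friend, Collections.emptySet())) {
                if (!alreadyLiked.contains(movie)) {
                    votes.merge(movie, 1, Integer::sum);  // one vote per friend who likes it
                }
            }
        }
        List<String> ranked = new ArrayList<>(votes.keySet());
        // List.sort is stable, so equal vote counts keep their alphabetical order
        ranked.sort((a, b) -> votes.get(b) - votes.get(a));
        return ranked;
    }
}
```

With Mark M's friends set to Tom M and Tom F and their movies taken from the slide, `recommend("Mark M", friends, likes)` would suggest "Empire Strikes Back" and "Raiders of the Lost Ark".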
17. CONTENT BASED RECOMMENDATIONS BASED ON CLASSIFICATION OF FRIENDS' MOVIE SELECTION
[Diagram: three nodes joined by two FRIEND-OF relationships]
Properties:
Name: Mark M
Language: Java
Properties:
Name: Tom M
Language: Java, Scala
Movie: Empire Strikes Back
Properties:
Name: Tom F
Language: Scala, Java
Movie: Raiders of the Lost Ark
18. CONTENT CLASSIFICATION BASED ON TERMS
Levels: CATEGORY (1 .. N SPORTS), SPORT (1 .. N TEAMS), TEAM (1 .. N PLAYERS), PLAYER
SPORTS CATEGORY
• FOOTBALL
  • Broncos: Ronnie Hillman, Peyton Manning
  • Seahawks: Russell Wilson
• BASEBALL
  • Rockies
19. FINDING RELATIONSHIPS BASED ON GRAPH TOPOLOGY
[Diagram: three nodes joined by two FRIEND-OF relationships]
Transitive Relationship
A FRIEND OF B, B FRIEND OF C, Therefore High Probability A FRIEND OF C
20. CONCLUSION: TRIADIC CLOSURE, OR MY FRIENDS ARE LIKELY TO BE FRIENDS WITH EACH OTHER
[Diagram: three nodes joined by two FRIEND-OF relationships]
Properties:
Name: Mark M
Language: Java
Properties:
Name: Tom M
Language: Java
Properties:
Name: Tom F
Language: Scala
TRIADIC CLOSURE
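Triadic closure can be turned into a simple link-prediction heuristic: for a given person, count the friends shared with each non-friend. The sketch below is plain Java written for illustration. The class name `TriadicClosureSketch`, the helper `addFriendship` and the treatment of friendship as undirected are assumptions of this sketch, not something the slides specify.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class TriadicClosureSketch {
    // Record an undirected friendship between a and b (assumption: FRIEND-OF is symmetric).
    static void addFriendship(Map<String, Set<String>> friends, String a, String b) {
        friends.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // For 'person', map each friend-of-a-friend who is not yet a friend
    // to the number of friends they have in common.
    static Map<String, Integer> suggestLinks(Map<String, Set<String>> friends, String person) {
        Set<String> direct = friends.getOrDefault(person, Collections.emptySet());
        Map<String, Integer> commonFriends = new TreeMap<>();
        for (String friend : direct) {
            for (String candidate : friends.getOrDefault(friend, Collections.emptySet())) {
                if (!candidate.equals(person) && !direct.contains(candidate)) {
                    commonFriends.merge(candidate, 1, Integer::sum);
                }
            }
        }
        return commonFriends;
    }
}
```

For the case on the slide (A friend of B, B friend of C), `suggestLinks(friends, "A")` yields C with one common friend. A higher count would suggest a more likely link.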
23. PART II – NEO4J GRAPH DATABASE MODES AND JAVA API SYNTAX
24. NEO4J GRAPH DATABASE MODES
• Embedded Mode Uses Local NEO4J Jar Files
• Server Mode Uses RESTful APIs
• Browser Client Mode Using CYPHER Query Language
25. EMBEDDED MODE WITH JAVA API – OPENING A GRAPH DATABASE
private static GraphDatabaseService graphDb;
private static String ANIMAL_DATABASE_LOC =
    "/home/ubuntu/animal_game/animal.db";
public AnimalGame() {
    // open the graph database
    graphDb = new GraphDatabaseFactory().newEmbeddedDatabase(
        ANIMAL_DATABASE_LOC);
    // enable clean db shutdown on Ctrl-C
    registerShutdownHook( graphDb );
}
26. JAVA API – CREATING AN INDEX
// create index using a combination of the node label and
// the name property
try (Transaction tx = graphDb.beginTx()) {
    graphDb.schema().indexFor(NodeLabels.PERSON).on(
        "name").create();
    tx.success();
}
27. JAVA API – CREATING A DATABASE NODE AND SETTING PROPERTY VALUES
// create node
Node node = graphDb.createNode();
// add label to identify what group the node belongs to
node.addLabel(nodeLabel);
// set property values
node.setProperty(AnimalConstants.NAME, name);
node.setProperty(AnimalConstants.TEXT, text);
node.setProperty(AnimalConstants.TYPE, nodeType);
// make a list of possible answers (answerList)
// these match with relationships
node.setProperty(AnimalConstants.ANSWERS, answerList);
28. JAVA API – CREATING RELATIONSHIPS BETWEEN NODES
public enum RelTypes implements RelationshipType {
    YES,
    NO
}
public void createRelationship(Node node1,
                               Node node2,
                               RelTypes rt) {
    // relationship between node1 and node2
    // direction is from node1 to node2
    node1.createRelationshipTo((Node) node2, rt);
}
29. JAVA API - QUERYING THE DATABASE FOR A NODE BY LABEL AND NAME
public Node getNodeByLabelAndName(String nodeName) {
    Node node = null;
    ResourceIterable<Node> nodeList =
        graphDb.findNodesByLabelAndProperty(
            NodeLabels.PERSON, "name", nodeName);
    if (nodeList != null) {
        try (Transaction tx = graphDb.beginTx()) {
            // if several nodes match, the last one iterated is kept
            for ( Node nodeL : nodeList )
                node = nodeL;
            tx.success();
        }
    }
    return node;
}
30. NEO4J SERVER RESTFUL API
• Neo4J Server Provides Fine-Grained REST Calls
• Returns JSON Content Type
• Higher Level Libraries Available for Java, .NET, Python, etc. to Wrap the Lower Level Calls
32. NEO4J SERVER RESTFUL API – DRILLING DOWN FOR META-DATA FROM ANIMAL DATABASE
REQUEST: http://localhost:7474/db/data/relationship/types
JSON RESPONSE:
[
"YES",
"NO"
]
33. RETRIEVING A SINGLE NODE AND ITS PROPERTIES
REQUEST: http://localhost:7474/db/data/node/0
JSON RESPONSE:
{
  …..
  "metadata": {
    "id": 0,
    "labels": [
      "ANIMALS"
    ]
  },
  "data": {
    "text": "Does the animal live on land ?",
    "name": "start",
    "answers": [
      "Y/YES",
      "N/NO",
      "Q/QUIT"
    ],
    "type": "question"
  }
}
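The request above can also be issued from plain Java without a wrapper library. The following is a minimal sketch using the JDK's `java.net.http` client (Java 11 or later). The class and method names (`RestNodeRequestSketch`, `nodeRequest`, `fetchNodeJson`) are made up for this example, and it assumes a server answering at the base URL shown on the slides.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RestNodeRequestSketch {
    // Build a GET request for a single node, e.g. <baseUrl>/node/0.
    static HttpRequest nodeRequest(String baseUrl, long nodeId) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/node/" + nodeId))
                .header("Accept", "application/json")   // the server returns JSON
                .GET()
                .build();
    }

    // Send the request and return the raw JSON body (requires a running server).
    static String fetchNodeJson(String baseUrl, long nodeId) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(nodeRequest(baseUrl, nodeId), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling `fetchNodeJson("http://localhost:7474/db/data", 0)` against the animal database would be expected to return a JSON document like the response shown on the slide. Parsing it is left to whichever JSON library the application already uses.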
34. RETRIEVING A NODE USING THE RESTFUL LIBRARY FROM JAVA CODE
private static GraphDatabaseService graphDb;
private static String ANIMAL_DATABASE_URL =
    "http://localhost:7474/db/data";
// public constructor
public AnimalGame() {
    // connect to the graph database server
    graphDb = new RestGraphDatabase(
        ANIMAL_DATABASE_URL);
    // wrap the request and retrieve the node;
    // the lookup returns an iterable, so take the first match
    Node startNode = null;
    for (Node n : graphDb.findNodesByLabelAndProperty(
            NODE_LABEL,
            "name",
            "start")) {
        startNode = n;
        break;
    }
}
35. NEO4J BROWSER BASED WEB ADMIN CLIENT
• Browser Connects to Neo4J Server.
• Graphically Displays Database Nodes and Relationships in a Dashboard. Can Drill Down on Each Node.
• Allows User to Execute Cypher Queries on Database From the Browser.
38. CYPHER QUERY LANGUAGE
• Cypher is Neo4J’s query language, its counterpart to SQL
• Queries can be entered programmatically in Java or through the browser interface
• Queries can retrieve nodes, relationships, and property values, or call functions such as the shortest path function
• Declarative syntax using “Ascii Art”
39. NODES - CYPHER SYNTAX
Cypher uses “Ascii Art” syntax that reflects graph elements. Nodes are represented by parentheses “( )”, relationships by arrows “-->”. The arrowhead indicates the direction of the relationship.
Match (a) --> (b)
Return a.name, b.name
Adding constraints: “find two Person group nodes a and b that are connected by a relationship”.
Match (a:Person) --> (b:Person)
Return a.name, b.name
40. RELATIONSHIPS - CYPHER SYNTAX
Anonymous relationships match all relationships and are indicated by the arrow alone
Match (a) --> (b)
This query syntax is used to find two nodes a and b that are connected by the specific “ACTED_IN” relationship to match actors with movies
Match (a) -[:ACTED_IN]-> (b)
42. SIX DEGREES OF KEVIN BACON GAME – HOW MANY LINKS ARE REQUIRED TO CONNECT AN ACTOR TO KEVIN BACON?
Kevin Bacon (left) and Tom Hanks in Apollo 13 (1995)
43. CYPHER QUERY FOR SIX DEGREES OF KEVIN BACON FOR MEG RYAN USING NEO4J SHORTEST PATH FUNCTION
MATCH p=shortestPath(
  (b:Person {name:"Kevin Bacon"})-[*]-(m:Person {name:"Meg Ryan"})
)
RETURN p
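What a shortest-path search over such a graph does can be sketched with a breadth-first search in plain Java. This is an illustration of the general technique, not Neo4J's implementation of `shortestPath`. The class name `BaconPathSketch`, its `link` helper and the tiny data set in the usage note are assumptions of this sketch. "Apollo 13" comes from the slides, while "Sleepless in Seattle" is added here only to connect Tom Hanks to Meg Ryan.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BaconPathSketch {
    // Connect two vertices (a person and a movie) in both directions,
    // mirroring the undirected -[*]- pattern in the query.
    static void link(Map<String, Set<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        graph.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Breadth-first search; returns the vertices on a shortest path, or an empty list.
    static List<String> shortestPath(Map<String, Set<String>> graph, String from, String to) {
        Map<String, String> parent = new HashMap<>();   // vertex -> vertex it was reached from
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(to)) {
                LinkedList<String> path = new LinkedList<>();
                for (String v = to; v != null; v = parent.get(v)) {
                    path.addFirst(v);                   // walk back to the start
                }
                return path;
            }
            for (String next : graph.getOrDefault(current, Collections.emptySet())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

After linking Kevin Bacon and Tom Hanks to "Apollo 13" and Tom Hanks and Meg Ryan to "Sleepless in Seattle", the search from "Kevin Bacon" to "Meg Ryan" passes through both movies and Tom Hanks, that is, four relationships.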
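45. NEO4J DECISION TREE – ANIMAL GUESSING GAME
[Diagram: decision tree whose nodes are the questions “Mammal ?”, “Has Stripes ?”, “Slithers ?”, “Does it Growl ?” and “Has Trunk ?” and the guesses “Zebra ?”, “Tiger ?”, “Elephant ?” and “Snake ?”, connected by YES and NO relationships]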
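Walking such a decision tree amounts to following a YES or NO relationship from the current question until a guess is reached. The plain-Java sketch below illustrates that loop without any Neo4J dependency. The class `AnimalTreeSketch` and its methods are hypothetical, and the way the questions are wired together in `buildTree` is an assumed reading of the diagram, not something the transcript confirms.

```java
class AnimalTreeSketch {
    // One node of the tree: a question or a final guess.
    static class Question {
        final String text;
        Question yes;   // node reached via the YES relationship, or null
        Question no;    // node reached via the NO relationship, or null

        Question(String text) {
            this.text = text;
        }
    }

    // ASSUMED wiring of the questions and guesses named in the diagram.
    static Question buildTree() {
        Question mammal = new Question("Mammal ?");
        Question stripes = new Question("Has Stripes ?");
        Question slithers = new Question("Slithers ?");
        Question growl = new Question("Does it Growl ?");
        Question trunk = new Question("Has Trunk ?");
        mammal.yes = stripes;
        mammal.no = slithers;
        stripes.yes = growl;
        stripes.no = trunk;
        growl.yes = new Question("Tiger ?");
        growl.no = new Question("Zebra ?");
        trunk.yes = new Question("Elephant ?");
        slithers.yes = new Question("Snake ?");
        return mammal;
    }

    // Follow the answers (true = YES, false = NO) and return the text of the node reached.
    static String walk(Question root, boolean... answers) {
        Question current = root;
        for (boolean answer : answers) {
            Question next = answer ? current.yes : current.no;
            if (next == null) {
                break;  // no relationship for this answer: stay on the current node
            }
            current = next;
        }
        return current.text;
    }
}
```

Under the assumed wiring, answering YES, YES, YES from the root ends at the guess "Tiger ?", and NO, YES ends at "Snake ?".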