Fast Parallel Similarity Calculations with FPGA Hardware
1. Fast Parallel Similarity Calculations
with FPGA Hardware
Dan McCreary, Distinguished Engineer, Optum
Kumar Deepak, Distinguished Engineer, Xilinx
Graph + AI World 2020
September 29, 2020
2.
Talk Description
The foundation of recommendation is finding similar customers and their
purchasing patterns. Yet, if you have 100 million customers it can take
hours to do similarity calculations on just 200 features. However, since
these calculations can be done in parallel, we show that using an FPGA
can allow these calculations to be done in under 30 msec. This session
will show how, using TigerGraph User Defined Functions (UDFs), similarity
calculations, and therefore product recommendations, can be done in
real time as customers visit your web site.
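To make the scale of the claim concrete, here is a rough back-of-the-envelope sketch. It assumes one multiply-add per feature per customer (a plain dot product with vector norms precomputed), which is an assumption for illustration and not a figure from the talk.

```python
# Workload size for one similarity query, using the numbers in the talk
# description. The "one multiply-add per feature" cost model is an assumption.
customers = 100_000_000      # 100 million customers
features = 200               # features compared per customer
budget_seconds = 0.030       # the 30 msec target

multiply_adds = customers * features            # work for one full scan
required_rate = multiply_adds / budget_seconds  # multiply-adds per second needed

print(f"{multiply_adds:.1e} multiply-adds per query")
print(f"{required_rate:.1e} multiply-adds per second to finish in 30 msec")
```

Under this cost model a single query touches 20 billion feature pairs, which is why the work has to be spread across many parallel units to fit an interactive time budget.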
3.
Overview
Dan:
• What is graph similarity?
• Why is it critical in recommendation systems?
• Serial vs. Parallel algorithms
• Cosine similarity and graph embeddings
Kumar:
• What is an FPGA?
• FPGA configuration used in our benchmarks
• Calling FPGA from TigerGraph using a User Defined Function
• Benchmark Results
Summary: Both
4.
About Dan
• Distinguished Engineer at Optum Healthcare
(330K employees and 32K technical staff)
• Focused on AI enterprise knowledge graphs
• Help create the world's largest healthcare
graph
• Coauthor of the book "Making Sense of NoSQL"
• Worked at Bell Labs as a VLSI circuit designer
• Worked for Steve Jobs at NeXT
5.
Why Are Similarity Calculations Critical?
Similarity is at the foundation of recommendation engines
Recommendation engines power sites like:
• Google – recommend a document
• Netflix™ – recommend a movie
• Amazon – recommend a product
• Pinterest™ – recommend an interest
• Healthcare – recommend a care path
Recommendations must take into account many factors
including recent searches
To be useful in interactive web sites we set a goal of
response times of under 200 milliseconds
Diagram labels: Similarity, Recommendation, Next Best Action
6.
Real-Time Patient Similarity
• Given a new patient who arrives at a clinical setting, how can we quickly find the most similar patients?
• Assumption: we have 10M clinical records of our population of 235 million members
• Can we find the 100 most similar patients in under 200 milliseconds?
Figure labels: "New Patient Arrives in ER"; "Sample Patient Populations (10s of millions)"; "Which ones are the most similar to this patient?"
7.
Similarity Score – A scaled measure of “alikeness” for a context
• A single scaled dimension of comparison for a given setting or context
• Comparing a patient to itself would have a similarity score of 1.0
• Patients that have few common characteristics would have a score of 0.1
.43
.92
.1
8.
Graph Representation of Patients – Includes Structure
Sample Patient Population (10M)
Target Patient
9.
Graph Representation of Patients – Includes Structure
Sample Patient Population (10M)
Target Patient
How can I quickly compare these graphs and find the most similar patients?
Similarity score = .5
Similarity score = .9
Similarity score = .8
10.
Serial vs. Parallel Graph Algorithms
Serial Graph Algorithms
• One task cannot begin before the prior task is complete
• Task order is important
• Serial algorithms work well on traditional CPUs
[Diagram: Start → Task → Task → Task → End]
Parallel Graph Algorithms
• Many tasks can be done independently
• Task order is not relevant
• Tasks can usually be done faster on GPUs or FPGAs
[Diagram: Start → three Tasks side by side → End]
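The independence property behind parallel algorithms can be sketched in a few lines of Python. This is an illustration only: the `score` function and the sample data are made up for the example, and Python threads show the structure of the idea rather than the speedup a GPU or FPGA would give.

```python
from concurrent.futures import ThreadPoolExecutor

def score(target, candidate):
    """Count matching positions. Each call depends only on its own inputs."""
    return sum(1 for t, c in zip(target, candidate) if t == c)

def score_serial(target, population):
    # One comparison after another, in order.
    return [score(target, p) for p in population]

def score_parallel(target, population, workers=4):
    # Comparisons are handed to a pool; no task waits on another task's result.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even if tasks finish out of order.
        return list(pool.map(lambda p: score(target, p), population))

# Illustrative data (not from the talk).
target = [1, 0, 1, 1]
population = [[1, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 0]]
serial_scores = score_serial(target, population)
parallel_scores = score_parallel(target, population)
```

Both functions return the same scores because no comparison needs the result of any other, which is what makes the work safe to spread across many hardware units.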
11.
The Human Brain Does Both Serial and Parallel Computation
Let’s use our brains as a demo!
The following slide has photos of two people
•One is a famous actor
•The other is a synthetically generated image of a person (generated by an AI
program)
How long will it take for you to recognize which one is the famous actor?
13.
What just happened?
1. Your visual cortex received the images as electrical signals from your eyes
2. Your brain identified key features of each face from the images - in parallel
3. Your brain sent these features as electrical signals to your memories of people’s faces
4. Your brain compared these features to every memory you have ever had of a person’s face – in
parallel
5. Your brain sent their recognition scores to a control center of your brain
6. Your brain’s speech center vocalized the word “right” – in series
Key Questions:
1. How does the brain know to pay attention to specific features of a face?
2. What portions of real-time clinical decision support systems can be done cost effectively in parallel?
Answer: The human brain, comprised of around 84B neurons, does both parallel and serial calculations
14.
Property-based Similarity Example
Used to find the most similar items in a graph by comparing properties and structure
Ideal when you can compare individual features of an item numerically
Algorithms return a ranking of similarity between a target and a population based on the counts and
weights of properties that are similar
Target
Patient
Population of 10M Patients
15.
Vector Similarity
Vectors are similar in two-dimensional space if they have the same length and direction
Compare the “x” and “y” components of each population vector to those of the target, and rank
by the sum of the differences
y
x
Population Vectors
Target
Vector 1st
last
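A minimal sketch of this ranking, assuming "sum of the differences" means the sum of absolute per-coordinate differences (a smaller total means more similar). The function names and the population vectors are illustrative, not taken from the talk.

```python
def difference_score(target, candidate):
    """Sum of absolute per-coordinate differences (smaller = more similar)."""
    return sum(abs(t - c) for t, c in zip(target, candidate))

def rank_by_difference(target, population):
    """Return the population ordered from most similar (1st) to least similar (last)."""
    return sorted(population, key=lambda v: difference_score(target, v))

# Illustrative two-dimensional vectors.
target = (8, 10)
population = [(-12, -8), (7, 9), (16, 16), (4, 3)]
ranked = rank_by_difference(target, population)
# ranked == [(7, 9), (4, 3), (16, 16), (-12, -8)]
```

This measure is sensitive to vector length as well as direction, unlike the cosine measure, which compares direction only.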
16.
Vector Representation
Each item can be represented by a series of “feature” vectors
The numbers are scalars
[Figure: a target vector (x=8, y=10) shown beside a list of population vectors, ordered from most similar to least similar]
17.
Cosine Used in Comparing Direction
The dot product of two vectors a and b (sometimes called the inner product, or, since its
result is a scalar, the scalar product) is denoted by a ∙ b and is defined as:
a ∙ b = |a| |b| cos(theta)
where |a| and |b| are the lengths of the vectors and theta is the angle between them
If the vectors point in exactly the same direction, then theta is 0 and cos(0) = 1
If the vectors are 90 or -90 degrees apart, they are orthogonal and cos(90) = 0
[Figure: vectors a and b separated by the angle theta]
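Rearranging the standard identity a ∙ b = |a| |b| cos(theta) gives cos(theta) = (a ∙ b) / (|a| |b|), which is the cosine similarity. A minimal sketch in plain Python follows; the function names are illustrative and not part of any TigerGraph API.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """cos(theta) between a and b: 1 = same direction, 0 = orthogonal, -1 = opposite."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)

same_direction = cosine_similarity([2, 0], [5, 0])   # theta = 0 degrees
orthogonal = cosine_similarity([1, 0], [0, 3])       # theta = 90 degrees
opposite = cosine_similarity([1, 0], [-4, 0])        # theta = 180 degrees
```

The result depends only on direction: `[2, 0]` and `[5, 0]` differ in length but still score 1.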
18.
Why Vector Conversion
Computers are very good at comparing numeric values
Comparison of vectors is a well-studied problem (weighted cosine similarity)
Comparing a target vector to many other vectors (50M) is an “embarrassingly parallel” problem that is well suited
to hardware acceleration using FPGAs
[Figure: an input target vector of 200 32-bit integers is sent to the Vector Comparison Hardware Service, which compares it against …50M… population vectors and returns the top 100 patient IDs (PIDs, TigerGraph 64-bit vertex IDs) in 100 msec]
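In software terms, the comparison service does a top-k search: score every population vector against the target, then keep the k best vertex IDs. The sketch below is a plain-Python stand-in, not the FPGA implementation. The population is randomly generated and far smaller than 50M, and all names are hypothetical.

```python
import heapq
import math
import random

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_similar(target, population, k):
    """population maps vertex ID -> feature vector. Returns the k most similar IDs."""
    # Every score is independent of the others, so this loop is the part
    # that parallel hardware can spread across many compute units.
    scored = ((cosine(target, vec), vid) for vid, vec in population.items())
    return [vid for _, vid in heapq.nlargest(k, scored)]

# Synthetic stand-in data: 1,000 vectors of 200 integers each.
rng = random.Random(0)
population = {vid: [rng.randint(0, 100) for _ in range(200)] for vid in range(1000)}
target = population[42]
top_ids = top_k_similar(target, population, k=100)
```

Using `heapq.nlargest` avoids sorting the whole population when only the top 100 are needed.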
19.
Can Machine Learning Tell us What Features are Important?
Old way: manually create a program that will extract 200 integers
for each customer that classifies their behavior
• Age
• Gender
• Location
• Responsiveness to e-mail survey
• How proactive are they about their health?
• Likely to recommend your company
• Slow process requiring manual coding of feature extraction
rules
New algorithms such as node2vec use random walk algorithms
to automatically create the 200 integers that will help us
differentiate patients
Embedding: 200 32-bit integers
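The random-walk step can be sketched as follows. This is a simplification: it uses uniform random walks, whereas node2vec biases each step with its return and in-out parameters (p and q) and then trains a skip-gram model on the walks to produce the embedding. The training step is omitted here, and the tiny graph and all names are made up for illustration.

```python
import random

def random_walk(graph, start, length, rng):
    """Uniform random walk of up to `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk

def generate_walks(graph, walks_per_node, length, seed=0):
    """Start several walks from every node; the walks act as 'sentences' of node IDs."""
    rng = random.Random(seed)
    return [random_walk(graph, node, length, rng)
            for _ in range(walks_per_node) for node in graph]

# Hypothetical adjacency list: patients linked to a condition and to clinics.
graph = {
    "patient_a": ["diabetes", "clinic_1"],
    "patient_b": ["diabetes", "clinic_2"],
    "diabetes": ["patient_a", "patient_b"],
    "clinic_1": ["patient_a"],
    "clinic_2": ["patient_b"],
}
walks = generate_walks(graph, walks_per_node=2, length=5)
```

Nodes that appear in similar walk contexts, such as the two patients who share a condition, end up with similar embeddings once a model is trained on the walks.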
20.
The Rise of Automatic Feature Engineering
Recent years have seen a surge in approaches that
automatically learn to encode graph structure into
low-dimensional embeddings.
The central problem in machine learning on graphs is finding a
way to incorporate information about the structure of the graph
into the machine learning model.
From Representation Learning on Graphs: Methods and Applications by Hamilton et al.
21.
Example of Graph Embedding – Encode and Decode
From Representation Learning on Graphs: Methods and Applications
40.
Three General REST Services to Support Similarity
Bulk Upload Data: input - millions of vectors; output – success/failure code
Update Vector: input - vertex ID, 198 integers; output – success/failure code
Find Similar: input - vertex ID, 198 integers; output – 100 vertex IDs (64 bits)
Similarity Server
REST GET Results (csv, JSON)
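The three services can be pictured as three operations on a single store of vectors. The sketch below is an in-memory stand-in with no HTTP layer. The class name, method names, and boolean return codes are assumptions for illustration and do not describe the actual similarity server.

```python
import heapq
import math

def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SimilarityStore:
    """Hypothetical in-memory stand-in for the similarity server."""

    def __init__(self):
        self._vectors = {}  # vertex ID -> feature vector

    def bulk_upload(self, vectors_by_id):
        """Bulk Upload Data: load many vectors; returns a success flag."""
        self._vectors.update({vid: list(v) for vid, v in vectors_by_id.items()})
        return True

    def update_vector(self, vertex_id, vector):
        """Update Vector: replace one vertex's vector; returns a success flag."""
        self._vectors[vertex_id] = list(vector)
        return True

    def find_similar(self, vertex_id, vector, k=100):
        """Find Similar: return up to k vertex IDs, excluding the query vertex."""
        scored = ((_cosine(vector, v), vid)
                  for vid, v in self._vectors.items() if vid != vertex_id)
        return [vid for _, vid in heapq.nlargest(k, scored)]

store = SimilarityStore()
store.bulk_upload({1: [1, 0, 2], 2: [2, 0, 4], 3: [0, 5, 0]})
store.update_vector(3, [0, 5, 1])
similar = store.find_similar(1, [1, 0, 2], k=2)
# similar == [2, 3]
```

Keeping the interface this narrow means the scoring behind `find_similar` could be swapped for a hardware-accelerated one without changing the callers.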
41.
Onward to the Hardware Graph!
Single Node
Graph
Enterprise
Knowledge
Graph
Hardware
Graph
Data Hubs
Data Lake
Algorithmic
Richness
Scale out
42.
Related Use Cases
Recommendation Engines for Healthcare
• For any person calling in for a recommended provider or senior living facility, can we find similar
recommendations in the past?
Incident Reporting
• When a trouble ticket is reported, what are the most similar problems and what were their solutions?
Errors in Log Files
• When there are error messages in log files, how can we find similar errors and their solutions?
Learning Content
• Can we recommend learning content for employees that have similar goals?
Schema Mapping
• Automate the process of creating data transformation maps for new data to existing schemas