Thomas Cook
Sales Director, AnzoGraph DB
e: thomas.cook@cambridgesemantics.com
w: www.anzograph.com
Knowledge Graphs for Machine Learning
and Data Science
#DCAF 2020
Feb 6, 2020
Data Continues to Grow
AI and ML Adoption Grows for Better & Faster Insights
Need for:
– Automated Data Preparation & Better Understanding
– Explainable AI & ML with Provenance
– Improved Algorithms & Analytics
– Cost Efficient Operations
Context
Knowledge Graphs & Graph Analytics
Knowledge Graphs to Automate
Data Preparation & Improve Common Understanding
©2019 Cambridge Semantics Inc. All rights reserved.
The Data Preparation Problem
Data Access
● Manual ETL coding
● Practicalities limit the # of
sources and types of data
Data Processing
● Laborious discovery, profiling
and selection
● Use of rules and coding for
harmonization & cleansing
Feature Engineering
● Manual coding to transform
data
● Manual feature engineering
& selection
70-80% of time spent in Data Preparation & Feature Engineering
Viewed as the “least enjoyable” part of work by 76% of data scientists [1]
[1] Cleaning Big Data, Forbes Magazine
Anzo - The Modern Data Discovery and Integration Layer for the Enterprise Data Fabric
Inputs: Enterprise Data Sources (Structured Data)
ON-BOARD: Ingest & Map
• Automated ETL
• Collaborative Mapping
• Metadata Capture
MODEL: Graph Data Model
• Lift Data into Data Fabric
• Design Ontologies
• Connect Data Models
BLEND: GraphMarts
• Combine and Align Related Data Sets
• In-memory MPP OLAP Query Engine
• Data Layers
ACCESS: Hi-Res Analytics
• Analyze All Data Together
• Fast, Iterative Queries: Ad Hoc, What-if
• Code Free or API
Outputs: Machine Learning and AI, Enterprise Search, “Last Mile” Analytics Tools
Supporting layers:
• Graphical Application Interface
• Metadata Catalog: Semantic-based Metadata Management, Governance and Lineage
• Automated Deployment and Operations
• Storage and Compute Integration
• Data Storage Layer: Cloud or On-Prem Data Storage Infrastructure
Automated Data Ingestion & Cataloging
Unstructured Data: Notes, Docs, Emails, Articles
Structured Data: Relational, CSV, HDFS, External Data Feeds
Ingest: NLP, Text Analytics, Sentiment Analysis
Catalog: Data Catalog, Semantic Layer
Data Harmonization – Structured or Unstructured
• Harmonize Many Data Sources
• Automated Unstructured Data Extraction &
Categorization
Data Wrangling Capability
• Profile, Rules… Manage & Clean incoming data
• Set Up Reusable Data Wrangling Jobs
• Provenance to Manage Data
Data Catalog
• Explore, Secure & Manage Dataset Assets
Result: Cleaner, quality data, faster & from many
more sources
A big web of data
understandable at
the data level
Allow for Easy Understanding & Handling of Data
Raw Data
Rules to Link & Conform Data
Create Data Layers
Build Graph Marts
Business Ready Datasets
Expedite & Optimize Feature Engineering
Use visual interface with no coding
for feature selection
• Query Knowledge Graph and generate
features
• Conduct data transformation using a library
of functions
• Compute new derived features
• De-normalize data
• Aggregate ranges
• Convert numeric values to alpha values
• Pivot values
and much more!
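The transformations listed above (pivoting values, aggregating ranges, converting numeric values to labels, computing derived features) can be sketched in a few lines of plain Python. This is only an illustration of the operations, not Anzo's visual interface or function library; the rows, column names and range thresholds are made-up assumptions.

```python
from collections import defaultdict

# Hypothetical flattened query results: one row per (customer, month).
rows = [
    {"customer": "c1", "month": "Jan", "amount": 120.0},
    {"customer": "c1", "month": "Feb", "amount": 80.0},
    {"customer": "c2", "month": "Jan", "amount": 15.0},
]


def bucket(amount):
    """Aggregate a numeric value into a labelled range (numeric -> alpha)."""
    if amount < 50:
        return "low"
    if amount < 100:
        return "medium"
    return "high"


def pivot(rows):
    """Pivot month values into columns, giving one feature row per customer."""
    features = defaultdict(dict)
    for r in rows:
        features[r["customer"]][f"amount_{r['month']}"] = r["amount"]
    return dict(features)


def derive(features):
    """Add derived features: total spend and the range label of that total."""
    out = {}
    for customer, cols in features.items():
        total = sum(cols.values())
        out[customer] = {**cols, "total": total, "total_range": bucket(total)}
    return out


feature_table = derive(pivot(rows))
```

Each entry of `feature_table` is one de-normalized feature row per customer, which is the shape most ML tools expect as input.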
Operationalizing Machine Learning Models
Explainable
Insights
Manage Data Sets with
Provenance & Data Lineage
• Anzo retains end-to-end
data lineage
• Track transformations
Easy to Export Data to
ML & Data Science Tools
• Use OData/REST APIs,
SQL ODBC/JDBC
• Export to R, Python,
downstream systems, …
Deploy & Continuously
Improve Model Performance
• Set up deployment pipelines with
learnings to help in feature selection
• Horizontally scale runtime environment
• Can be auto-deployed behind the
firewall or on the Cloud
Using Knowledge Graphs with
Graph Analytics Database as
Scalable Infrastructure for ML & Data Science
“Graph analytics will grow in the next few years
due to the need to ask complex questions across
complex data, which is not always practical or
even possible at scale using SQL queries”
…Gartner – Top 10 Data and Analytics Technology Trends for 2019
What it is:
● Fast, Scalable Graph Database
○ In-Memory Massively Parallel Processing
(MPP) ACID-Compliant Graph Database
○ Supports RDF & Labelled Property Graphs
What it does:
○ Fast Data Loading
○ Fast Query
○ Rich Analytics
■ Graph Algorithms
■ BI/DW Analytics
■ Inferencing
■ Data Science/Feature Engineering
Algorithms
■ Define-Your-Own Analytics
○ Linear Database Scaling
○ Persist data on cheap storage
Based on Open Standards
• Built on RDF & SPARQL 1.1 standards
• LPG with RDF*/SPARQL*
• LPG with Cypher (in 2020)
Deploy on-prem or cloud
• Kubernetes/Helm on-demand cloud
deployment
• AWS, Google and Azure
AnzoGraph™ DB
Awards
Select Customers
Benchmarks
217X: AnzoGraph DB compared to Neo4j on industry-standard TPC-H & Graph 500 benchmarks
113X: AnzoGraph’s LUBM benchmark performance over the previous fastest result
30X: AnzoGraph’s performance on graph algorithms over Spark SQL and Spark with GraphFrames
Graph OLAP Built for Analytics at Scale and Speed
SQL OLAP vs Graph OLAP
SQL OLAP
On-line Analytics at Massively Parallel Processing (MPP) Scale with SQL Database
Examples: Netezza, Amazon Redshift
Analytics
• Warehouse-Style BI Analytics
Graph OLAP
On-line Analytics at Massively Parallel Processing (MPP) Scale with Native Graph Database
Example: AnzoGraph DB
Analytics
• Warehouse-Style BI Analytics
• Graph Algorithms
• Inferencing
• Data Science Functions
Graph OLAP Built for Analytics at Scale and Speed
Graph OLTP vs Graph OLAP
Graph OLTP
Transactional databases
• Built for transactional applications & individual transactions
• Scales vertically
Examples: Neo4j, AWS Neptune
Graph OLAP
Analytical databases
• Built for analytics and to deal with scale & performance
• Deep Link analysis
• Analytics on the population
• Scales horizontally
• Can complement Graph OLTP systems
Example: AnzoGraph DB
Labelled Property Graphs Facilitate Analytics
Nodes
• Person: Jack (isA: <Man>, birthday: 09/17/1975)
• Person: Jill (isA: <Woman>, birthday: 4/23/1979)
• Place: The Hill (isA: <Place>, has: Water, has: Trees, partOf: <TheMountain>)
Edges with properties
• friendOf, between Jack and Jill (metAt=<TheHill>, metDate=07/04/2018)
• WentUp, from Jack to The Hill (Date=07/04/2018)
• WentUp, from Jill to The Hill (Date=07/04/2018)
Today with RDF* and SPARQL*
• Relationships can be described as clearly as in any LPG database
RDF*/SPARQL* extensions to the standard make W3C open standards databases even more capable
Algorithms and Analytical Capabilities
SPARQL 1.1 Standards
• Basic Graph Patterns
• Graph Patterns
• Negation
• Property Paths
• BIND
• Aggregates: Count/Avg, Min/Max, GroupConcat, Sample
• Basic Federated Query
• ORDER BY and offsets
• Functions on Strings
• Functions on Numerics
• Functions on Dates and Times
• Hash Functions
Graph Algorithms and Inferencing
• Page Rank
• Shortest Path
• All Path
• Label Propagation
• Weakly Connected Components
• K neighborhood
• Counting Triangles
• Inferences (RDFS+)
AnzoGraph® DB Extras
• Labeled Property Graphs (RDF*)
• Window Aggregates
• Advanced Grouping Sets
• Named Views
• Named Queries
• Conditional Expressions
• User-Defined Extensions
Data Science Extensions (UDX)
Distributions
● Bernoulli
● Binomial
● Chi-squared
● Exponential
● Hypergeometric
● Laplace
● Log Normal
● Logarithmic Series
● Negative Binomial
● Normal
Correlations
● Pearson
Entropy
● Cross Entropy
● Differential Entropy
User-defined Extensions (UDXs):
Allow users to extend AnzoGraph DB functionality for custom usage
User-Defined Functions (UDF): Create and register custom analytic functions, such as functions that concatenate values or convert amounts to alternate currencies.
User-Defined Aggregates (UDA): Create and register aggregate functions, such as functions that compute the arithmetic mean or calculate the average number from a list of maximum and minimum values.
User-Defined Services (UDS): Create and register services that create local SPARQL endpoints.
User-Defined Tables (UDT): Create and register a function that is repeatedly invoked within a query to generate the rows of a table on the fly.
Data Science Functions and User-defined Functions (UDX): functions you can build in Java or C++
Execute Supervised & Unsupervised ML with Graph Algorithms
Graph Algorithm
• PageRank
• Shortest Path
• K-neighbors
• All Paths
• Counting Triangles
• Weakly Connected
Components
• Label Propagation
• Triangle Enumeration
• Triangle Counting
• Clustering Coefficient
and more!
Who is the most influential person in your
customer list?
What’s the most important item relating to a
search of your knowledge graph?
What is the shortest path to your destination
across a route?
What’s the optimal path for packets to travel
across your network?
source: Wikipedia
Try functions via SPARQL in Zeppelin or python Jupyter Notebooks
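To make the "most influential person" question concrete, here is a minimal power-iteration PageRank in plain Python. It is a sketch of the general algorithm, not AnzoGraph's implementation; the edge list, damping factor and iteration count are illustrative assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node without outgoing edges spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical "follows" relationships between customers.
edges = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"), ("bob", "ann")]
ranks = pagerank(edges)
most_influential = max(ranks, key=ranks.get)
```

The resulting score per node can be attached to each customer as an extra feature column for a downstream model.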
Graph Algorithms produce additional Features to train ML Models
Graph Algorithms
Execute Inferencing using RDFS+ and OWL 2 RL
AnzoGraph allows you to insert inferred triples into
the specified target graph
Symmetric relationship: Person: Jack, Is Married, Person: Jill
• If Jack is married to Jill, then you can definitely infer that Jill is married to Jack
Non-symmetric relationship: Person: Jack, Knows, Person: Sam
• Jack knows Sam, but Sam may not know Jack. Here, the inference is less clear
Both cases are supported.
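The difference between the two cases can be sketched as a tiny rule engine in Python: only predicates declared symmetric produce the reverse triple (in OWL this declaration is `owl:SymmetricProperty`). The predicate names and the set-of-tuples graph representation are assumptions for illustration, not AnzoGraph's inferencing API.

```python
def infer_symmetric(triples, symmetric_predicates):
    """Return the reverse triples implied by predicates declared symmetric."""
    inferred = set()
    for subject, predicate, obj in triples:
        reverse = (obj, predicate, subject)
        if predicate in symmetric_predicates and reverse not in triples:
            inferred.add(reverse)
    return inferred


# Asserted facts from the slide, with hypothetical predicate names.
asserted = {
    ("Jack", "isMarriedTo", "Jill"),
    ("Jack", "knows", "Sam"),
}

# Only isMarriedTo is declared symmetric; knows is left as asserted.
inferred = infer_symmetric(asserted, symmetric_predicates={"isMarriedTo"})

# Materialize the inferred triples alongside the asserted ones.
target_graph = asserted | inferred
```

Whether `knows` should also be treated as symmetric is a modelling decision made in the ontology, not something the engine can decide.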
ELT for Data Engineering
•Wrangling, Blending, Munging, Transformations, Enrichment, Views
•Use statistical functions, transformations or enrichment to get the data
into the form needed for the downstream ML pipeline
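# Build a <fullname> property in a new graph from first and last names
INSERT {
  GRAPH <myNewGraph> {
    ?s a <Person> ;
       <fullname> ?fullname .
  }
}
USING <myOldGraph>
WHERE {
  ?s a <Person> ;
     <firstname> ?fname ;
     <lastname> ?lname .
  # Concatenate the two names with a single space between them
  BIND(CONCAT(?fname, " ", ?lname) AS ?fullname)
}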
ELT for Data Engineering
Materialized Views – good for heavy calculations - perform once - use many times
CREATE MATERIALIZED VIEW <ages> AS
CONSTRUCT { ?person <age> ?age . }
WHERE {
  GRAPH <tickit> {
    {
      SELECT ?person ((YEAR(?date)) - (YEAR(xsd:dateTime(?birthdate))) AS ?age)
      WHERE {
        ?person <birthday> ?birthdate .
        BIND(xsd:dateTime(NOW()) AS ?date)
      }
    }
  }
}
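The view above derives age as the difference between two calendar years. A short Python sketch shows what that calculation yields compared with age in completed years; the function names are hypothetical, and the dates reuse the birthday and talk date that appear elsewhere in this deck.

```python
from datetime import date


def age_by_year_difference(birthdate, today):
    """Same arithmetic as YEAR(now) - YEAR(birthdate) in the view."""
    return today.year - birthdate.year


def age_in_completed_years(birthdate, today):
    """Subtract one year if this year's birthday has not happened yet."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


birthdate = date(1975, 9, 17)
today = date(2020, 2, 6)

year_diff_age = age_by_year_difference(birthdate, today)
completed_age = age_in_completed_years(birthdate, today)
```

The two results differ by one whenever the birthday falls later in the year than the current date, so the year-difference form is best read as an approximate age feature.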
ELT for Data Engineering
•Enrichment
Add new features through a federated SERVICE call to the
Linked Open Data Cloud or other internal SPARQL endpoints
Example:
Look up address and geocodes for a company, census population data,
crime rate, demographics, etc. All of these can be new features to feed
into the ML pipeline
ML Step #1: Data Prep: Data Discovery and Feature Engineering
Distributions
2.1 Bernoulli Distribution: Probability of success or failure (yes or no) in a single trial.
2.2 Binomial Distribution: Probability of a given number of successes in a fixed number of independent trials.
2.3 Chi-squared Distribution: Used in tests of the relationship between two categorical variables.
2.4 Exponential Distribution: Models the time between events that occur at a constant average rate.
2.5 Hypergeometric Distribution: Probability of a given number of successes when sampling without replacement.
2.6 Laplace Distribution: A symmetric, heavy-tailed (double exponential) distribution.
2.7 Log Normal Distribution: Models quantities such as the change in price of a stock or commodity position.
2.8 Logarithmic Series Distribution: Models the frequency of events such as insurance claims.
2.9 Negative Binomial Distribution: Probability of a given number of failures before a specified number of successes.
2.10 Normal Distribution: Models many natural and social phenomena.
2.11 Poisson Distribution: Probability that a certain number of events will occur in a specific time period.
2.12 Skellam Distribution: Distribution of the difference between two independent Poisson-distributed variables.
2.13 Beta-binomial Distribution: Models the number of successes in n binomial trials when the probability of success p is a Beta random variable.
2.14 Continuous Uniform Distribution: Assigns equal probability density to all values between its minimum and maximum.
2.15 Discrete Uniform Distribution: Assigns equal probability to each of a finite number of outcomes.
2.16 Student’s t-Distribution: Used to estimate probabilities when the sample size is small.
2.17 Weibull Distribution: Used to assess product reliability, analyse life data and model failure times.
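As a small worked example of three entries in this list, the probability mass functions for the Bernoulli, Binomial and Poisson distributions can be written directly from their textbook formulas. This is a plain-Python sketch, not the AnzoGraph UDX functions, whose names and signatures are not shown on the slides.

```python
import math


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def bernoulli_pmf(outcome, p):
    """Bernoulli is the single-trial special case of the binomial."""
    return binomial_pmf(outcome, 1, p)


def poisson_pmf(k, rate):
    """P(exactly k events in a period with an average of `rate` events)."""
    return math.exp(-rate) * rate**k / math.factorial(k)


# Illustrative values: chance of 2 delayed flights out of 4 when each is
# delayed with probability 0.5, and of 0 cancellations when 3 are typical.
two_of_four = binomial_pmf(2, 4, 0.5)
no_events = poisson_pmf(0, 3.0)
```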
ML Step #1: Data Discovery and Feature Engineering
Correlations
3.1 Pearson Correlation Coefficient: Measures the strength and direction of the linear relationship between two variables.
3.2 Matthews Correlation Coefficient: Measures the association (positive, negative or none) between two binary variables (0 & 1).
3.3 Spearman’s Rank Correlation Coefficient: Measures the strength of a monotonic relationship between paired data.
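The difference between Pearson and Spearman is easiest to see on data that rises steadily but not in a straight line. The sketch below implements both from their definitions in plain Python; the sample values are made up, and the rank helper assumes there are no tied values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """Rank positions starting at 1 (assumes all values are distinct)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(xs, ys):
    """Spearman correlation is Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up paired data: y always increases with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

linear_strength = pearson(x, y)
monotonic_strength = spearman(x, y)
```

Here the Spearman value is 1 (up to floating-point rounding) because the ordering is perfectly preserved, while the Pearson value is lower because the relationship is curved.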
Feature Exploration
5.1 Principal Component Analysis: Reduces the dimensionality of large data sets, which helps when building predictive models.
Profiling Metrics
6.1 Geometric Mean: Determines average growth rates.
6.2 Skewness Metric: Calculates Pearson’s coefficient of skewness on numeric values.
6.3 T-Digest Metric: Estimates percentile and quantile values with high accuracy.
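Two of the profiling metrics can be illustrated with Python's standard `statistics` module. The skewness function below uses the median-based form of Pearson's coefficient, 3 × (mean − median) / standard deviation; which form AnzoGraph computes is not stated on the slide, so treat that choice as an assumption. The sample values are made up.

```python
import statistics


def pearson_median_skewness(values):
    """Pearson's median-based skewness: 3 * (mean - median) / stdev."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)


# Yearly growth factors (+10%, +20%, -10%): the geometric mean is the
# constant factor that would give the same overall growth.
growth_factors = [1.10, 1.20, 0.90]
average_growth = statistics.geometric_mean(growth_factors)

# Delay minutes with one large outlier pull the mean above the median,
# which shows up as positive skewness.
delays = [0, 1, 2, 3, 30]
delay_skewness = pearson_median_skewness(delays)
```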
RDF*/SPARQL*
Real World Example: Airline Delay Data Analysis
https://www.transtats.bts.gov/ot_delay/OT_DelayCause1.asp?pn=1
Public Flight Delay Data Analysis
Input CSV – 32 Columns – 5,819,080 records
Columns listed on the slide:
YEAR, MONTH, DAY, DAY_OF_WEEK, AIRLINE, FLIGHT_NUMBER, TAIL_NUMBER,
ORIGIN_AIRPORT, DESTINATION_AIRPORT, SCHEDULED_DEPARTURE, DEPARTURE_TIME,
DEPARTURE_DELAY, TAXI_OUT, WHEELS_OFF, SCHEDULED_TIME, ELAPSED_TIME,
AIR_TIME, DISTANCE, WHEELS_ON, TAXI_IN, SCHEDULED_ARRIVAL, ARRIVAL_TIME,
ARRIVAL_DELAY, DIVERTED, CANCELLED, CANCELLATION_REASON, AIR_SYSTEM_DELAY,
SECURITY_DELAY, AIRLINE_DELAY, LATE_AIRCRAFT_DELAY, WEATHER_DELAY
Conversion from CSV to Graph – Defining Triples
Flight --FlightDeparture--> Airport
Flight --FlightArrival--> Airport
Airport --DESTINATION--> Airport
Nodes have types and properties
Flight
YEAR
MONTH
DAY
DAY_OF_WEEK
AIRLINE
FLIGHT_NUMBER
TAIL_NUMBER
ORIGIN_AIRPORT
DESTINATION_AIRPORT
….
Node Type: Flight
Node Properties:
Airline,
Flight Number,
Tail Number,
etc
*Note: Types can also be called Labels, as in Labeled Property
Graphs or LPG
With RDF* edges can also have properties
Airport (AIRPORT_CODE = 'BOS') --DESTINATION--> Airport (AIRPORT_CODE = 'JFK')
Edge Property on DESTINATION:
DISTANCE = 187
...
TABLE <s3://csi-notebook-datasets/Flight_Dataset/flights10k.csv>
('ContentType'='text/CSV','Schema'=',H,YEAR:int,MONTH:int,DAY:int,DAY_OF_WEEK:int,AIRLINE:char,FLIGHT_NUMBER:char,TAIL_NUMBER:char,ORIGIN_AIRPORT:char,DESTINATION_AIRPORT:char,SCHEDULED_DEPARTURE:char,DEPARTURE_TIME:char,DEPARTURE_DELAY:int,TAXI_OUT:int,WHEELS_OFF:char,SCHEDULED_TIME:int,ELAPSED_TIME:int,AIR_TIME:int,DISTANCE:int,WHEELS_ON:char,TAXI_IN:int,SCHEDULED_ARRIVAL:char,ARRIVAL_TIME:char,ARRIVAL_DELAY:int,DIVERTED:int,CANCELLED:int,CANCELLATION_REASON:char,AIR_SYSTEM_DELAY:int,SECURITY_DELAY:int,AIRLINE_DELAY:int,LATE_AIRCRAFT_DELAY:int,WEATHER_DELAY:int')
Loading CSV with TABLE expression
Specify CSV file
name and location
CSV Column names
and Data Types
INSERT { GRAPH <airline_flight_network> {
    ?OriginIRI a <Airport> ;
        <AIRPORT_CODE> ?ORIGIN_AIRPORT .
    << ?OriginIRI <DESTINATION> ?DestinationIRI >> <DISTANCE> ?DISTANCE .
    ?DestinationIRI a <Airport> ;
        <AIRPORT_CODE> ?DESTINATION_AIRPORT .
    << ?DestinationIRI <DESTINATION> ?OriginIRI >> <DISTANCE> ?DISTANCE .
    ?FlightIRI a <Flight> ;
        <YEAR> ?YEAR ;
        <MONTH> ?MONTH ;
        <DAY> ?DAY ;
        <DAY_OF_WEEK> ?DAY_OF_WEEK ;
        <AIRLINE> ?AIRLINE ;
        <FLIGHT_NUMBER> ?FLIGHT_NUMBER ;
        <TAIL_NUMBER> ?TAIL_NUMBER ;
        <ORIGIN_AIRPORT> ?ORIGIN_AIRPORT ;
        <DESTINATION_AIRPORT> ?DESTINATION_AIRPORT ;
Conversion from CSV to RDF* Triples via SPARQL
Node Type: Airport
Node Type: Airport
Node Type: Flight
Node Properties:
Flight Number,
Tail Number,
etc
Edge Property:
Distance
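The shape of this conversion can be sketched outside the database in plain Python: each CSV row yields typed nodes with properties, plus DESTINATION edges in both directions that carry a DISTANCE property (the part RDF* expresses with `<< ... >>`). The `airport/` and `flight/` identifier scheme and the two-part return value are assumptions for illustration; the slide does not show how the IRIs are built.

```python
import csv
import io

# A one-row stand-in for the flight CSV, using a subset of its columns.
CSV_TEXT = """ORIGIN_AIRPORT,DESTINATION_AIRPORT,FLIGHT_NUMBER,DISTANCE
BOS,JFK,101,187
"""


def row_to_graph(row):
    """Turn one CSV row into node triples and edge-property annotations."""
    origin = f"airport/{row['ORIGIN_AIRPORT']}"      # hypothetical IRI scheme
    destination = f"airport/{row['DESTINATION_AIRPORT']}"
    flight = f"flight/{row['FLIGHT_NUMBER']}"

    triples = [
        (origin, "a", "Airport"),
        (origin, "AIRPORT_CODE", row["ORIGIN_AIRPORT"]),
        (destination, "a", "Airport"),
        (destination, "AIRPORT_CODE", row["DESTINATION_AIRPORT"]),
        (flight, "a", "Flight"),
        (flight, "FLIGHT_NUMBER", row["FLIGHT_NUMBER"]),
    ]

    # Edge properties are keyed by the whole edge (subject, predicate, object).
    distance = int(row["DISTANCE"])
    edge_properties = {
        (origin, "DESTINATION", destination): {"DISTANCE": distance},
        (destination, "DESTINATION", origin): {"DISTANCE": distance},
    }
    return triples, edge_properties


triples, edge_props = [], {}
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    row_triples, row_edges = row_to_graph(row)
    triples.extend(row_triples)
    edge_props.update(row_edges)
```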
Integration and ELT
Combining additional data sets
Data sets: Flight Delay Data, Airport Info, Census Data, FAA Aircraft Registrations
Node types in the combined graph: Flight, Airport, City, State, Country, Aircraft, Airline
Edges: FlightDeparture, FlightArrival, DESTINATION
Now we are ready to ask questions like:
BI-Style Analytics
#1 Longest flight segments by distance from Boston (BOS)
#2 Airports less than 400 mi from Boston (BOS) - Network Viewer output
#3 Longest distances between two airports
#4 Longest flights by elapsed time
#5 Airlines with the longest average delays
#6 Airlines with the most flights
#7 Longest 2 segments reachable from Boston and the distances of each segment
#8 Which segments have the longest average departure delays
Graph Algorithms
#9 Page Rank - Graph Algorithm - Show most well-connected airports based on page rank algorithm
#10 Shortest Path Graph Algorithm - show shortest paths and # of segments (hops) from AUS
SELECT * FROM <airline_flight_network> WHERE {
  SERVICE <csi:shortest_path> {
    [] <csi:binding-source-vertex> ?source_vertex_variable_name ;
       <csi:binding-vertex> ?node ;
       <csi:binding-predecessor> ?predecessor_variable_name ;
       <csi:binding-distance> ?distance ;
       <csi:graph> <airline_flight_network> ;
       <csi:source-vertex> <BOS> ;
       <csi:destination-vertex> <HNL> ;
       <csi:edge-label> <DESTINATION> ;
       <csi:weighted> true .
  }
}
Shortest Path graph algorithm leverages RDF*/SPARQL*
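The outputs bound in the query (vertex, predecessor, distance) correspond to what a weighted shortest-path search produces. A minimal Dijkstra sketch in Python shows how those three values arise; it is not AnzoGraph's implementation. Only the BOS to JFK distance of 187 comes from the slides; the other distances are illustrative.

```python
import heapq


def shortest_paths(edges, source):
    """Dijkstra: distance and predecessor for every vertex reachable from source."""
    graph = {}
    for (src, dst), weight in edges.items():
        graph.setdefault(src, []).append((dst, weight))
        graph.setdefault(dst, [])

    distance = {source: 0}
    predecessor = {}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distance.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, weight in graph.get(node, []):
            candidate = dist + weight
            if candidate < distance.get(neighbour, float("inf")):
                distance[neighbour] = candidate
                predecessor[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    return distance, predecessor


def path_to(predecessor, source, target):
    """Walk the predecessor links back from target to source."""
    path = [target]
    while path[-1] != source:
        path.append(predecessor[path[-1]])
    return path[::-1]


# DESTINATION edges weighted by DISTANCE (illustrative except BOS-JFK).
edges = {
    ("BOS", "JFK"): 187,
    ("JFK", "LAX"): 2475,
    ("BOS", "LAX"): 2611,
    ("LAX", "HNL"): 2556,
}
distance, predecessor = shortest_paths(edges, "BOS")
route = path_to(predecessor, "BOS", "HNL")
hops = len(route) - 1
```

With `weighted` set to false, the equivalent search would count segments instead of summing distances, which is the "# of segments (hops)" variant mentioned in question #10.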
AIRLINE DEMO
Scalability
Graph OLAP – Horizontally Scalable
Have more data. Need better performance. Add more servers
Deploy on VMs or bare metal with a TAR file that
is compatible with CentOS
Automated deployment in the Cloud.
Available in the AWS Marketplace & others soon
60 Day Full Feature Free Trial. Download or Cloud Deployment. Visit booth or AnzoGraph.com
Download AnzoGraph DB Free Edition Today! http://AnzoGraph.com
THANK YOU

Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data Analytics
 

Knowledge Graph for Machine Learning and Data Science

  • 1. Thomas Cook Sales Director, AnzoGraph DB e: thomas.cook@cambridgesemantics.com w: www.anzograph.com Knowledge Graphs for Machine Learning and Data Science #DCAF 2020 Feb 6, 2020
  • 2. Data Continues to Grow AI and ML Adoption Grows for Better & Faster Insights Need for: – Automated Data Preparation & Better Understanding – Explainable AI & ML with Provenance – Improved Algorithms & Analytics – Cost Efficient Operations Context Knowledge Graphs & Graph Analytics
  • 3. Knowledge Graphs to Automate Data Preparation & Improve Common Understanding
  • 4. The Data Preparation Problem Data Access ● Manual ETL coding ● Practicalities limit the # of sources and types of data Data Processing ● Laborious discovery, profiling and selection ● Use of rules and coding for harmonization & cleansing Feature Engineering ● Manual coding to transform data ● Manual feature engineering & selection 70-80% of time spent in Data Preparation & Feature Engineering. Viewed as the “least enjoyable” part of work by 76% of data scientists (source: Cleaning Big Data, Forbes Magazine).
  • 5. Automated Deployment and Operations Storage and Compute Integration MODEL Graph Data Model • Lift Data into Data Fabric • Design Ontologies • Connect Data Models ON-BOARD Ingest & Map • Automated ETL • Collaborative Mapping • Metadata Capture Enterprise Data Sources Machine Learning and AI Enterprise Search “Last Mile” Analytics Tools Metadata Catalog Semantic-based Metadata Management, Governance and Lineage Cloud or On-Prem Data Storage Infrastructure Data Storage Layer Ingest BLEND GraphMarts • Combine and Align Related Data Sets • In-memory MPP OLAP Query Engine • Data Layers ACCESS Hi-Res Analytics • Analyze All Data Together • Fast, Iterative Queries Ad Hoc, What if • Code Free or API Graphical Application Interface Anzo - The Modern Data Discovery and Integration Layer for the Enterprise Data Fabric
  • 6. Automated Data Ingestion & Cataloging Unstructured Data: Notes, Docs, Emails, Articles Structured Data: Relational, CSV, HDFS, External Data Feeds Catalog Ingest NLP, Text Analytics, Sentiment Analysis Data Catalog Semantic Layer Data Harmonization – Structured or Unstructured • Harmonize Many Data Sources • Automated Unstructured Data Extraction & Categorization Data Wrangling Capability • Profile, Rules… Manage & Clean incoming data • Set up Re-usable Data Wrangling Jobs • Provenance to Manage Data Data Catalog • Explore, Secure & Manage Dataset Assets Result: Cleaner, quality data, faster & from many more sources
  • 7. A big web of data understandable at the data level
  • 8. Allow for Easy Understanding & Handling of Data Rules to Link & Conform Data Raw Data Business Ready Datasets Create Data Layers Build Graph Marts
  • 9. Expedite & Optimize Feature Engineering Use visual interface with no coding for feature selection • Query Knowledge Graph and generate features • Conduct data transformation using a library of functions • Compute new derived features • De-normalize data • Aggregate ranges • Convert numeric values to alpha values • Pivot values and much more!
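The transformations named on this slide (pivoting, derived features, aggregating ranges, converting numeric values to alpha values) can be illustrated outside the visual interface. The following is a minimal Python sketch, not Anzo's implementation; the row layout, the `amount_band` thresholds and all field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical flattened query results: one row per (customer, product, amount).
rows = [
    {"customer": "c1", "product": "A", "amount": 120.0},
    {"customer": "c1", "product": "B", "amount": 30.0},
    {"customer": "c2", "product": "A", "amount": 75.0},
]


def amount_band(amount):
    """Aggregate a numeric value into a range and label it with an alpha value."""
    if amount < 50:
        return "low"
    if amount < 100:
        return "medium"
    return "high"


def pivot(rows, index, column, value):
    """Pivot long rows into one wide record per index value."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[index]][row[column]] = row[value]
    return dict(wide)


features = pivot(rows, "customer", "product", "amount")
for customer, cols in features.items():
    total = sum(cols.values())         # derived feature, computed before new keys are added
    cols["total"] = total
    cols["band"] = amount_band(total)  # numeric-to-alpha conversion
```

Each customer ends up as one wide record with a column per product plus the derived `total` and `band` columns, which is the shape most ML libraries expect.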
  • 10. Operationalizing Machine Learning Models Explainable Insights Manage Data Sets with Provenance & Data Lineage • Anzo retains end-to-end data lineage • Track transformations Easy to Export Data to ML & Data Science Tools • Use OData/REST APIs, SQL ODBC/JDBC • Export to R, Python, downstream systems, … Deploy & Continuously Improve Model Performance • Set up deployment pipelines with learnings to help in feature selection • Horizontally scale runtime environment • Can be auto-deployed behind the firewall or on the Cloud
  • 11. Using Knowledge Graphs with Graph Analytics Database as Scalable Infrastructure for ML & Data Science
  • 12. “Graph analytics will grow in the next few years due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries” …Gartner – Top 10 Data and Analytics Technology Trends for 2019
  • 13. What it is: ● Fast, Scalable Graph Database ○ In-Memory Massively Parallel Processing (MPP) ACID-Compliant Graph Database ○ Supports RDF & Labelled Property Graphs What it does: ○ Fast Data Loading ○ Fast Query ○ Rich Analytics ■ Graph Algorithms ■ BI/DW Analytics ■ Inferencing ■ Data Science/Feature Engineering Algorithms ■ Define-Your-Own Analytics ○ Linear Database Scaling ○ Persist data on cheap storage Based on Open Standards • Built on RDF & SPARQL 1.1 standards • LPG with the RDF* /SPARQL* • LPG with Cypher (in 2020) Deploy on-prem or cloud • Kubernetes/Helm on-demand cloud deployment • AWS, Google and Azure AnzoGraph™ DB Awards Select Customers
  • 14. Benchmarks 217X: AnzoGraph DB compared to Neo4j on the industry-standard TPC-H & Graph 500 benchmarks. 113X: AnzoGraph’s LUBM benchmark performance over the previous fastest result. 30X: AnzoGraph’s performance on graph algorithms over Spark SQL and Spark with GraphFrames.
  • 15. Graph OLAP Built for Analytics at Scale and Speed: SQL OLAP vs Graph OLAP. SQL OLAP: On-line Analytics at Massive Parallel Processing (MPP) Scale with SQL. Database Example: Netezza, Amazon Redshift. Analytics: • Warehouse-Style BI Analytics. Graph OLAP: On-line Analytics at Massive Parallel Processing (MPP) Scale with Native Graph. Database Example: AnzoGraph DB. Analytics: • Warehouse-Style BI Analytics • Graph Algorithms • Inferencing • Data Science Functions
  • 16. Graph OLAP Built for Analytics at Scale and Speed: Graph OLTP vs Graph OLAP. Graph OLTP: Transactional databases • Built for building transactional applications & individual transactions • Scales vertically. Example: Neo4j, AWS Neptune. Graph OLAP: Analytical databases • Built for analytics and to deal with scale & performance • Deep Link analysis • Analytics on the population • Scales horizontally • Can complement Graph OLTP systems. Example: AnzoGraph DB
  • 17. Labelled Property Graphs Facilitate Analytics isA: <Man> birthday: 09/17/1975 isA: <Woman> birthday: 4/23/1979 isA: <Place> has: Water has: Trees partOf: <TheMountain> Person: Jill Person: Jack Place: The Hill friendOf WentUp WentUp metAt=<TheHill> metDate=07/04/2018 Date=07/04/2018 Date=07/04/2018 Today with RDF* and SPARQL* • Relationships can be described as clearly as any LPG database. RDF*/SPARQL* extensions to the standard make W3C open standards databases even more capable
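The idea behind RDF* on this slide is that a relationship can itself carry properties, as in a labelled property graph. A minimal Python sketch of that data model follows; it only mimics the structure with tuples and is not an RDF* parser. The direction of the `friendOf` edge (Jack to Jill) is an assumption, since the flattened diagram does not show it.

```python
# An edge is an ordinary (subject, predicate, object) triple.
met = ("Jack", "friendOf", "Jill")  # edge direction assumed for illustration

statements = {
    met,
    ("Jack", "isA", "Man"),
    ("Jill", "isA", "Woman"),
    # RDF*-style statements about the edge itself: the subject is the whole triple.
    (met, "metAt", "TheHill"),
    (met, "metDate", "07/04/2018"),
}


def edge_properties(edge, statements):
    """Collect the properties attached to one edge (a triple used as a subject)."""
    return {p: o for s, p, o in statements if s == edge}


friendship = edge_properties(met, statements)
```

Here `friendship` holds the `metAt` and `metDate` values for the `friendOf` edge, while plain node statements such as `("Jack", "isA", "Man")` are unaffected.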
  • 18. Algorithms and Analytical Capabilities Graph Patterns Negation Property Paths BIND Aggregates Basic Federated Query ORDER BY and offsets Functions on Strings Functions on Numerics Functions on Dates and Times Hash Functions Basic Graph Patterns Count/Avg Min/Max GroupConcat Sample Page Rank Shortest Path All Path Label Propagation Weakly Connected Components K neighborhood Counting Triangles Inferences (RDFS+) Labeled Property Graphs (RDF*) Window Aggregates Advanced Grouping Sets Named Views Named Queries Conditional Expressions User-Defined Extensions SPARQL 1.1 Standards AnzoGraph® DB Extras Graph Algorithms and Inferencing Data Science Extensions (UDX) Distributions ● Bernoulli ● Binomial ● Chi-squared ● Exponential ● Hypergeometric ● Laplace ● Log Normal ● Logarithmic Series ● Negative Binomial ● Normal Correlations ● Pearson Entropy ● Cross Entropy ● Differential Entropy
  • 19. User-defined Extensions (UDXs): Allow users to extend AnzoGraph DB functionality for custom usage. User-Defined Functions (UDF): Create and register custom analytic functions, such as functions that concatenate values or convert integers to alternate currencies. User-Defined Aggregates (UDA): Create and register aggregate functions, such as functions that compute the arithmetic mean or calculate the average number from a list of maximum and minimum values. User-Defined Services (UDS): Create and register services that create local SPARQL endpoints. User-Defined Tables (UDT): Create and register a function that is repeatedly invoked within a query to generate the rows of a table on-the-fly. Data Science Functions, User-defined Functions (UDX): Functions you can build in Java or C++
  • 20. Execute Supervised & Unsupervised ML with Graph Algorithms Graph Algorithm • PageRank • Shortest Path • K-neighbors • All Paths • Counting Triangles • Weakly Connected Components • Label Propagation • Triangle Enumeration • Triangle Counting • Clustering Coefficient and more! Who is the most influential person in your customer list? What’s the most important item relating to a search of your knowledge graph? What is the shortest path to your destination across a route? What’s the optimal path for packets to travel across your network? (image source: Wikipedia) Try functions via SPARQL in Zeppelin or Python Jupyter Notebooks
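To make the "most influential person" question concrete, here is a small power-iteration PageRank in plain Python. It is a textbook sketch rather than AnzoGraph DB's implementation; the `follows` edge list, the damping factor and the iteration count are illustrative assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical "follows" relationships between customers.
follows = [("ann", "cat"), ("bob", "cat"), ("dan", "cat"), ("cat", "ann")]
scores = pagerank(follows)
most_influential = max(scores, key=scores.get)
```

The scores form a probability distribution over nodes, so they can be compared directly or exported as a numeric feature.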
  • 21. Graph Algorithms produce additional Features to train ML Models (image source: Wikipedia) Try functions via SPARQL in Zeppelin or Python Jupyter Notebooks
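As an illustration of graph algorithms producing features, the sketch below derives three per-node columns (degree, triangle count and local clustering coefficient) from an undirected edge list. The edge list and feature names are hypothetical, and the code is a plain-Python approximation of what a graph database would compute at scale.

```python
from itertools import combinations


def graph_features(edges):
    """Return degree, triangle count and clustering coefficient for every node."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    features = {}
    for node, adjacent in neighbours.items():
        # A triangle exists for each pair of neighbours that are themselves connected.
        triangles = sum(
            1 for u, v in combinations(sorted(adjacent), 2) if v in neighbours[u]
        )
        degree = len(adjacent)
        possible = degree * (degree - 1) / 2
        features[node] = {
            "degree": degree,
            "triangles": triangles,
            "clustering": triangles / possible if possible else 0.0,
        }
    return features


# Hypothetical undirected relationships.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
node_features = graph_features(edges)
```

Each entry of `node_features` can be joined to the node's other attributes as extra input columns for model training.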
  • 22. Execute Inferencing using RDFS+ and OWL 2 RL Person: Jack Person: Jill Is Married Inference Is Married Person: Jack Person: Sam Knows Inference Knows AnzoGraph allows you to insert inferred triples into the specified target graph. If Jack is married to Jill, then you can definitely infer that Jill is married to Jack. Jack knows Sam, but Sam may not know Jack. Here, the inference is less clear. Both cases are supported. Try functions via SPARQL in Zeppelin or Python Jupyter Notebooks
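The married/knows contrast corresponds to declaring a predicate symmetric (in OWL terms, an `owl:SymmetricProperty`) or leaving it directional. A minimal Python sketch of that one rule follows; the predicate names are hypothetical and this is not how AnzoGraph DB evaluates inferences internally.

```python
def infer_symmetric(triples, symmetric_predicates):
    """Return the reverse triples implied by symmetric predicates and not yet asserted."""
    return {
        (o, p, s)
        for s, p, o in triples
        if p in symmetric_predicates and (o, p, s) not in triples
    }


asserted = {
    ("Jack", "isMarriedTo", "Jill"),
    ("Jack", "knows", "Sam"),
}

# Assumption: only isMarriedTo is declared symmetric; knows is left directional.
inferred = infer_symmetric(asserted, {"isMarriedTo"})

# Inferred triples are added alongside the asserted ones, as in a target graph.
target_graph = asserted | inferred
```

Declaring `knows` symmetric as well would simply mean passing `{"isMarriedTo", "knows"}`, which is the modelling decision the slide describes as less clear.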
  • 23. ELT for Data Engineering • Wrangling, Blending, Munging, Transformations, Enrichment, Views • Use statistical functions, transformations or enrichment to get the data into the form needed for the downstream ML pipeline
      INSERT { GRAPH <myNewGraph> { ?s a <Person> ; <fullname> ?fullname } }
      USING <myOldGraph>
      WHERE {
        ?s a <Person> ; <firstname> ?fname ; <lastname> ?lname .
        BIND(CONCAT(?fname, " ", ?lname) AS ?fullname)
      }
  • 24. ELT for Data Engineering Materialized Views – good for heavy calculations: perform once, use many times
      CREATE MATERIALIZED VIEW <ages> AS
      CONSTRUCT { ?person <age> ?age . }
      WHERE {
        GRAPH <tickit> {
          { SELECT ?person ((YEAR(?date)) - (YEAR(xsd:dateTime(?birthdate))) AS ?age)
            WHERE {
              ?person <birthday> ?birthdate .
              BIND(xsd:dateTime(NOW()) AS ?date)
            }
          }
        }
      }
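The view on this slide derives age as a difference of calendar years, which is cheap but counts a birthday that has not yet occurred in the current year. The Python sketch below contrasts the two calculations; the function names are hypothetical and the dates are taken from elsewhere in the deck purely for illustration.

```python
from datetime import date


def year_difference(birth, today):
    """Age as computed by subtracting calendar years, as in the view above."""
    return today.year - birth.year


def age_in_years(birth, today):
    """Age in completed years, accounting for whether the birthday has passed."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


birth = date(1975, 9, 17)
today = date(2020, 2, 6)
approximate = year_difference(birth, today)
exact = age_in_years(birth, today)
```

Whether the one-year discrepancy matters depends on the downstream model; for coarse age bands the year difference is usually sufficient.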
  • 25. ELT for Data Engineering • Enrichment: Add new features from federated calls, using SERVICE calls to the Linked Open Data Cloud or other internal SPARQL endpoints. Example: Look up address and geocodes for a company, census population data, crime rate, demographics, etc. All of these can be new features to feed into the ML pipeline.
  • 26. ML Step #1: Data Prep: Data Discovery and Feature Engineering. 2.1 Bernoulli Distribution: Determines the probability of Success or Failure (or Yes or No) in a single trial. 2.2 Binomial Distribution: Determines the probability of a given number of successes in a fixed number of independent trials. 2.3 Chi-squared Distribution: Used to test the relationship between two categorical variables. 2.4 Exponential Distribution: Models the time between events that occur independently at a constant average rate. 2.5 Hypergeometric Distribution: Determines the probability of a given number of successes when sampling without replacement. 2.6 Laplace Distribution: A symmetric, double-exponential distribution with heavier tails than the normal distribution. 2.7 Log Normal Distribution: Models certain instances, such as the change in price distribution of a stock or commodity positions. 2.8 Logarithmic Series Distribution: Determines the probability of occurrence of events like claim frequencies in insurance companies. 2.9 Negative Binomial Distribution: Determines the probability of a given number of failures before a set number of successes. 2.10 Normal Distribution: Models and determines probabilities for many kinds of natural and social data. 2.11 Poisson Distribution: Determines the probability that a certain number of events will occur in a specific time period. 2.12 Skellam Distribution: Models the difference between two independent Poisson-distributed variables. 2.13 Beta-binomial Distribution: Models the number of successes in n binomial trials when the probability of success p is a Beta random variable. 2.14 Continuous Uniform Distribution: Assigns equal probability to all values between its minimum and maximum. 2.15 Discrete Uniform Distribution: Determines the probability of a finite number of outcomes equally likely to happen. 2.16 Student’s t-Distribution: Determines the probability when sample size is small. 2.17 Weibull Distribution: Used to assess product reliability, analyse life data and model failure times.
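Two of the listed distributions are easy to write out directly, which helps when checking what a database-side function should return. This is a standard-library Python sketch of the textbook probability mass functions; it does not call or reproduce AnzoGraph DB's own functions.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, rate):
    """Probability of exactly k events in a period with the given average event rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


# A Bernoulli trial is the binomial special case n = 1.
p_single_success = binomial_pmf(1, 1, 0.3)

# Probability of exactly 2 heads in 4 fair coin flips.
p_two_heads = binomial_pmf(2, 4, 0.5)

# Probability of no events in a period that averages 2 events.
p_no_events = poisson_pmf(0, 2.0)
```

Values such as `p_two_heads` can be used as reference points when validating the output of an equivalent in-database function on the same inputs.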
  • 27. ML Step #1: Data Discovery and Feature Engineering
      Correlations
      3.1 Pearson Correlation Coefficient: Measures the strength and direction (positive, negative or none) of the linear relationship between two variables.
      3.2 Matthews Correlation Coefficient: Measures the positive, negative or absent relationship between two binary variables (0 & 1).
      3.3 Spearman's Rank Correlation Coefficient: Measures the strength of a monotonic relationship between paired data, based on their ranks.
      Feature Exploration
      5.1 Principal Component Analysis: Reduces the dimensionality of large data sets, which helps in building predictive models.
      Profiling Metrics
      6.1 Geometric Mean: Determines average growth rates.
      6.2 Skewness Metric: Calculates Pearson's coefficient of skewness on numeric values.
      6.3 T-Digest Metric: Estimates percentile and quantile values with high accuracy.
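The difference between the Pearson and Spearman coefficients is easy to miss, so here is a small standard-library sketch. It is an illustration only, not the database's built-in functions; the helper names and the sample data are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up data: y grows monotonically with x, but not linearly.
xs = [1, 2, 3, 4, 5]
ys = [x**3 for x in xs]
linear_strength = pearson(xs, ys)
monotonic_strength = spearman(xs, ys)
```

On this data the Spearman coefficient reaches 1 because the ordering is perfectly preserved, while the Pearson coefficient stays below 1 because the relationship is not a straight line.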
  • 28. RDF*/SPARQL* Real World Example: Airline Delay Data Analysis
  • 29. Labelled Property Graphs facilitate analytics. Today, with RDF* and SPARQL*, relationships can be described as clearly as in any LPG database. The RDF*/SPARQL* extensions to the standard make W3C open-standards databases even more capable.
      [Diagram. Nodes: Person: Jill, Person: Jack, Place: The Hill. Node properties: isA: <Man>, birthday: 09/17/1975; isA: <Woman>, birthday: 4/23/1979; isA: <Place>, has: Water, has: Trees, partOf: <TheMountain>. Edges: friendOf, WentUp, WentUp. Edge properties: metAt=<TheHill>, metDate=07/04/2018; Date=07/04/2018; Date=07/04/2018.]
  • 31. Input CSV – 32 Columns – 5,819,080 records
      Columns: YEAR, MONTH, DAY, DAY_OF_WEEK, AIRLINE, FLIGHT_NUMBER, TAIL_NUMBER, ORIGIN_AIRPORT, DESTINATION_AIRPORT, SCHEDULED_DEPARTURE, DEPARTURE_TIME, DEPARTURE_DELAY, TAXI_OUT, WHEELS_OFF, SCHEDULED_TIME, ELAPSED_TIME, ELAPSED_TIME, AIR_TIME, DISTANCE, WHEELS_ON, TAXI_IN, SCHEDULED_ARRIVAL, ARRIVAL_TIME, ARRIVAL_DELAY, DIVERTED, CANCELLED, CANCELLATION_REASON, AIR_SYSTEM_DELAY, SECURITY_DELAY, AIRLINE_DELAY, LATE_AIRCRAFT_DELAY, WEATHER_DELAY
  • 32. Conversion from CSV to Graph – Defining Triples [Diagram labels: Flight, Airport, Airport, FlightDeparture, FlightArrival, DESTINATION]
  • 33. Conversion from CSV to Graph [Diagram labels: Flight, Airport, Airport, FlightDeparture, FlightArrival, DESTINATION]
  • 34. Nodes have types and properties Flight YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT …. Node Type: Flight Node Properties: Airline, Flight Number, Tail Number, etc *Note: Types can also be called Labels, as in Labeled Property Graphs or LPG
  • 35. With RDF*, edges can also have properties. [Diagram: two Airport nodes, AIRPORT_CODE = 'BOS' and AIRPORT_CODE = 'JFK', connected by a DESTINATION edge. Edge Property: DISTANCE = 187]
  • 37. Conversion from CSV to RDF* Triples via SPARQL
      INSERT {
        GRAPH <airline_flight_network> {
          # Node Type: Airport
          ?OriginIRI a <Airport> ;
                     <AIRPORT_CODE> ?ORIGIN_AIRPORT .
          # Edge Property: Distance
          << ?OriginIRI <DESTINATION> ?DestinationIRI >> <DISTANCE> ?DISTANCE .
          # Node Type: Airport
          ?DestinationIRI a <Airport> ;
                          <AIRPORT_CODE> ?DESTINATION_AIRPORT .
          << ?DestinationIRI <DESTINATION> ?OriginIRI >> <DISTANCE> ?DISTANCE .
          # Node Type: Flight
          # Node Properties: Flight Number, Tail Number, etc
          ?FlightIRI a <Flight> ;
                     <YEAR> ?YEAR ;
                     <MONTH> ?MONTH ;
                     <DAY> ?DAY ;
                     <DAY_OF_WEEK> ?DAY_OF_WEEK ;
                     <AIRLINE> ?AIRLINE ;
                     <FLIGHT_NUMBER> ?FLIGHT_NUMBER ;
                     <TAIL_NUMBER> ?TAIL_NUMBER ;
                     <ORIGIN_AIRPORT> ?ORIGIN_AIRPORT ;
                     <DESTINATION_AIRPORT> ?DESTINATION_AIRPORT ;
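The mapping in the INSERT above can be pictured outside the database with a short Python sketch. This is only an illustration of the row-to-triples idea, not how AnzoGraph loads data: the IRI scheme (`airport/...`, `flight/...`), the function name and the sample row are all assumptions. An RDF*-style edge property is represented here as a triple whose subject is itself a triple.

```python
# Columns copied onto the Flight node, as listed in the slide's INSERT.
FLIGHT_COLUMNS = [
    "YEAR", "MONTH", "DAY", "DAY_OF_WEEK", "AIRLINE", "FLIGHT_NUMBER",
    "TAIL_NUMBER", "ORIGIN_AIRPORT", "DESTINATION_AIRPORT",
]


def row_to_triples(row):
    """Turn one CSV row (a dict) into (subject, predicate, object) tuples."""
    # Hypothetical IRI scheme, chosen only for this sketch.
    origin = "airport/" + row["ORIGIN_AIRPORT"]
    destination = "airport/" + row["DESTINATION_AIRPORT"]
    flight = "flight/{AIRLINE}{FLIGHT_NUMBER}/{YEAR}-{MONTH}-{DAY}".format(**row)

    triples = [
        (origin, "a", "Airport"),
        (origin, "AIRPORT_CODE", row["ORIGIN_AIRPORT"]),
        # Edge property: the subject is the DESTINATION edge itself.
        ((origin, "DESTINATION", destination), "DISTANCE", row["DISTANCE"]),
        (destination, "a", "Airport"),
        (destination, "AIRPORT_CODE", row["DESTINATION_AIRPORT"]),
        ((destination, "DESTINATION", origin), "DISTANCE", row["DISTANCE"]),
        (flight, "a", "Flight"),
    ]
    triples += [(flight, column, row[column]) for column in FLIGHT_COLUMNS]
    return triples


# Made-up sample row; only BOS, JFK and the distance of 187 appear in the slides.
sample_row = {
    "YEAR": 2015, "MONTH": 1, "DAY": 1, "DAY_OF_WEEK": 4, "AIRLINE": "AA",
    "FLIGHT_NUMBER": 100, "TAIL_NUMBER": "N123AA", "ORIGIN_AIRPORT": "BOS",
    "DESTINATION_AIRPORT": "JFK", "DISTANCE": 187,
}
sample_triples = row_to_triples(sample_row)
```

Whether the plain DESTINATION edge is also asserted when only its annotated form is inserted depends on the store's RDF* semantics; the sketch mirrors the slide and emits only the annotated form.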
  • 38. Integration and ELT. Data sets: Flight Delay Data, Airport Info, Census Data, FAA Aircraft Registrations
  • 39. Combining additional data sets [Diagram labels: Flight, Airport, Airport, FlightDeparture, FlightArrival, DESTINATION, CityState, Aircraft, Airline, Country. Sources: FAA, Airline, Census Data, Flight Delay]
  • 40. Now we are ready to ask questions like:
      BI-Style Analytics
      #1 Longest flight segments by distance from Boston (BOS)
      #2 Airports less than 400 mi from Boston (BOS) - Network Viewer output
      #3 Longest distances between two airports
      #4 Longest flights by elapsed time
      #5 Airlines with the longest average delays
      #6 Airlines with the most flights
      #7 Longest 2 segments reachable from Boston and the distances of each segment
      #8 Which segments have the longest average departure delays
      Graph Algorithms
      #9 Page Rank - Graph Algorithm - Show the most well-connected airports based on the page rank algorithm
      #10 Shortest Path Graph Algorithm - Show shortest paths and # of segments (hops) from AUS
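As a rough picture of what a BI-style question such as #5 computes, here is a plain-Python sketch of a group-and-average over flight records. In the deck this would be a SPARQL aggregate query run in the database; the function name and the sample records below are made up for illustration.

```python
from collections import defaultdict


def average_delay_by_airline(flights):
    """Return (airline, mean departure delay) pairs, longest delay first."""
    totals = defaultdict(lambda: [0.0, 0])  # airline -> [sum of delays, count]
    for flight in flights:
        delay = flight.get("DEPARTURE_DELAY")
        if delay is None:  # skip rows with no recorded delay
            continue
        totals[flight["AIRLINE"]][0] += delay
        totals[flight["AIRLINE"]][1] += 1
    averages = {airline: total / count for airline, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up records using column names from the input CSV.
sample_flights = [
    {"AIRLINE": "AA", "DEPARTURE_DELAY": 10},
    {"AIRLINE": "AA", "DEPARTURE_DELAY": 30},
    {"AIRLINE": "DL", "DEPARTURE_DELAY": 5},
    {"AIRLINE": "DL", "DEPARTURE_DELAY": None},
    {"AIRLINE": "UA", "DEPARTURE_DELAY": 45},
]
ranking = average_delay_by_airline(sample_flights)
```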
  • 41. Shortest Path graph algorithm leverages RDF*/SPARQL*
      select *
      from <airline_flight_network>
      where {
        SERVICE <csi:shortest_path> {
          [] <csi:binding-source-vertex> ?source_vertex_variable_name ;
             <csi:binding-vertex> ?node ;
             <csi:binding-predecessor> ?predecessor_variable_name ;
             <csi:binding-distance> ?distance ;
             <csi:graph> <airline_flight_network> ;
             <csi:source-vertex> <BOS> ;
             <csi:destination-vertex> <HNL> ;
             <csi:edge-label> <DESTINATION> ;
             <csi:weighted> true .
        }
      }
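For readers unfamiliar with weighted shortest paths, the idea behind a query like the one above can be sketched with Dijkstra's algorithm in plain Python. This is not a description of how the `csi:shortest_path` service is implemented; the function name and the route table are assumptions, and every distance except BOS to JFK (187, from the earlier slide) is made up.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total distance, list of nodes on the path)."""
    best = {source: 0}
    predecessor = {}
    queue = [(0, source)]
    while queue:
        distance, node = heapq.heappop(queue)
        if node == target:
            break
        if distance > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = distance + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                predecessor[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if target not in best:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(predecessor[path[-1]])
    path.reverse()
    return best[target], path


# Illustrative directed DESTINATION edges weighted by DISTANCE (mostly made up).
routes = {
    "BOS": {"JFK": 187, "LAX": 2611, "SFO": 2704},
    "JFK": {"LAX": 2475},
    "LAX": {"HNL": 2556},
    "SFO": {"HNL": 2398},
}
total_distance, hops = shortest_path(routes, "BOS", "HNL")
```

The predecessor map plays the same role as the predecessor binding in the query: following it back from the destination reconstructs the path and the number of segments.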
  • 43. Scalability: Graph OLAP – Horizontally Scalable. Have more data? Need better performance? Add more servers. Deploy on VMs or bare metal with a TAR file that is compatible with CentOS. Automated deployment in the cloud: available in the AWS Marketplace, with others coming soon. 60-day full-featured free trial: download or cloud deployment. Visit the booth or AnzoGraph.com.
  • 44. Download AnzoGraph DB Free Edition Today! http://AnzoGraph.com