AnzoGraph.com
Driving AI and Machine Insights with Knowledge Graphs in a Connected World
Thomas Cook
Sales Director, AnzoGraph DB
Data Continues to Grow
AI and ML Demand Increases
Complexity of Data Ecosystem Grows
• Need to Build on Existing Analytics Capabilities with:
• Automated Data Preparation & Better Understanding
• Explainable AI & ML with Provenance
• Improved Algorithms & Analytics
• Cost Efficient Operations
Context: Knowledge Graphs & Graph Analytics
The Data Preparation Problem
Data Access
● Manual ETL coding
● Practicalities limit the number of sources and types of data
Data Processing
● Laborious discovery, profiling and selection
● Use of rules and coding for harmonization & cleansing
Feature Engineering
● Manual coding to transform data
● Manual feature engineering & selection
70-80% of time spent in Data Preparation & Feature Engineering
Viewed as the “least enjoyable” part of work by 76% of data scientists¹
¹ Cleaning Big Data, Forbes Magazine
Traditional approaches to connecting siloed data
Data Warehouse (Structured Data)
• Rigid data model
• Relationships more difficult
• Expensive
• Does not adapt well to change
• Concurrency & Performance
Data Lake
• Raw operational data dumps become unwieldy, difficult to consume and manage
• Referred to as the “Data Swamp”
• Data engineering efforts are costly, complex, lack lineage, and are often not repeatable
• Heavy-volume Spark clusters are difficult to manage and tune properly
“Graph analytics will grow in the next few years due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries”
…Gartner – Top 10 Data and Analytics Technology Trends for 2019
Knowledge Graphs to Automate
Data Preparation & Improve Common Understanding
Why Graph?
• Graph’s Flexible Data Model
• Rich insights on relationships, not just entities
• Leveraging Industry Data Models
Process & analyze growing amounts of diverse data
• Structured and Unstructured Data
• Natural Language
• Sparse Data
• Data you know you need to analyze
• Data you don’t know you need to analyze
AI and ML · Traditional Analytics · Unique and Insightful Analytics · Feature Engineering
• Can evolve as data changes
• Refactoring often not necessary as data/needs evolve
What is a Knowledge Graph?
Data Architect View
One method to integrate data from multiple data sets, structured or unstructured, and to leverage standard industry ontologies to enhance analytics.
Executive View
Common understanding of all disparate data.
Ontologist View
The best way to represent knowledge and meaning and provide linkage and relationship information in a data analytics platform. Ontologies are at the center, providing a way to standardize and enhance the conceptual model. Inferencing provides semantic reasoning for better understanding.
©2018 Cambridge Semantics Inc. All rights reserved.
• Canonical data model provides context for common understanding
• Easy to find and access the right data
• Automate complex data preparation tasks
• Perform deep link analysis of complex relationships for improved insights
Using Knowledge Graphs with a Graph Analytics Database as Scalable Infrastructure for ML & Data Science
Analytical Capabilities - AnzoGraph DB
Database via SPARQL
• Basic Graph Patterns
• Negation
• Property Paths
• BIND
• Aggregates: Count/Avg, Min/Max, GroupConcat, Sample
• Basic Federated Query
• ORDER BY and offsets
• Functions on Strings, Numerics, and Dates and Times
• Hash Functions
Graph Database and Inferencing
• Algorithms like: PageRank, Shortest Path, All Paths, Label Propagation, Weakly Connected Components, K neighborhood, Counting Triangles
• Inferencing (RDFS+, OWL 2 RL)
• OpenCypher (preview)
• Labeled Property Graphs (RDF*)
AnzoGraph® DB Extras
• OLAP Scalability: Window Aggregates, Advanced Grouping Sets, Named Views, Named Queries, Conditional Expressions
Data Science Extensions
• Distributions: Bernoulli, Binomial, Chi-squared, Exponential, Hypergeometric, Laplace, Log Normal, Logarithmic Series, Negative Binomial, Normal
• Correlations: Pearson
• Entropy: Cross Entropy, Differential Entropy
UDX
• Design your own
Over 150 Geospatial Functions
• Define Regions: points, polygons, circles
• Use Common Geospatial Shape Data Files: Shapefile (SHP), GeoJSON, KML, WKT, WKB
• Understand Relationships: Equals, Intersects, Overlaps, Disjoint, Within, Touches
• Convert Coordinate Systems: Cartesian, Spherical, Cylindrical, Elliptical
• Virtualization
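The “Within” relationship listed above can be illustrated with the classic even-odd ray-casting test. This is a minimal planar (Cartesian) sketch, not AnzoGraph DB's geospatial implementation: the function names and the single-ring WKT parser are hypothetical, and the sketch ignores holes, multipolygons, points lying exactly on an edge, and spherical coordinates.

```python
import re
from typing import List, Tuple

Point = Tuple[float, float]


def parse_wkt_polygon(wkt: str) -> List[Point]:
    """Parse the outer ring of a simple WKT POLYGON with no holes (hypothetical helper)."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.+?)\s*\)\)\s*", wkt, re.IGNORECASE)
    if match is None:
        raise ValueError("expected a single-ring WKT POLYGON")
    ring: List[Point] = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # 2D coordinates only
        ring.append((float(x), float(y)))
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring.pop()  # WKT repeats the first vertex to close the ring
    return ring


def point_within_polygon(pt: Point, ring: List[Point]) -> bool:
    """Even-odd rule: count how many polygon edges a ray cast to the right of pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Each crossing flips the inside/outside state, so an odd number of crossings means the point is inside; this also works for concave shapes such as an L-shaped region.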
AnzoGraph Benchmark Results
Analytical Benchmarks
• 217X: AnzoGraph DB compared to Neo4j on an industry-standard TPC-H benchmark
• 113X: AnzoGraph’s LUBM benchmark performance over the previous fastest result
• 10-300X: AnzoGraph’s performance on graph algorithms over Spark SQL and Spark with GraphFrames
Knowledge Graphs Use Cases
AnzoGraph DB
• Traditional & Graph Analytics
• Schema-less Data Model
• Standards
• Customizable Algorithms
• Open Platform
Use Cases
• Data Harmonization & Analytics
• Enterprise Knowledge Graphs
• Scientific Data Discovery
• Customer 360
• Supply Chain
• IoT
• Fraud Detection
• Financial Research
• Network Optimization
• Anti-Money Laundering
Parabole and AnzoGraph Cognitive Analytics - alphaESG
• Extract text and relationships from massive amounts of documents and news feeds
• Use AnzoGraph DB to
• Create cognitive models
• Contextualize news,
filings & reports
• Harmonize data from
SASB and various data
sources
• Provide customized
outputs and signals
• Execute analytics
Download AnzoGraph DB Free Edition Today http://AnzoGraph.com
Thank you
Extra Slides
©2019 Cambridge Semantics Inc. All rights reserved.
Execute Supervised & Unsupervised ML with Graph Algorithms
Graph Algorithms
• PageRank
• Shortest Path
• K-neighbors
• All Paths
• Counting Triangles
• Weakly Connected Components
• Label Propagation
• Triangle Enumeration
• Triangle Counting
• Clustering Coefficient
and more!
Who is the most influential person in your customer list?
What’s the most important item relating to a search of your knowledge graph?
What is the shortest path to your destination across a route?
What’s the optimal path for packets to travel across your network?
source: Wikipedia
Try functions via SPARQL in Zeppelin or Python Jupyter Notebooks
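The first question above (“Who is the most influential person in your customer list?”) is the classic use of PageRank. As a rough sketch of what the algorithm computes, and not of how AnzoGraph DB implements it, here is a plain-Python power iteration over a directed edge list. The function name, the toy referral edges and the conventional damping factor of 0.85 are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; the scores always sum to 1."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links: Dict[str, List[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                # Each node passes its rank, split equally, to the nodes it links to.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical "who referred whom" edges between customers.
ranks = pagerank([("ann", "bob"), ("carl", "bob"), ("dave", "bob"), ("bob", "ann")])
most_influential = max(ranks, key=ranks.get)  # "bob", who is referenced by everyone else
```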
Graph Algorithms produce additional Features to train ML Models
Graph Algorithms
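To make “additional features” concrete, the sketch below turns a few of the algorithms listed earlier (degree, triangle counting, clustering coefficient, weakly connected components) into one feature row per node, which could then be joined onto a training table. It is plain Python on an in-memory edge list under that assumption; the function names are hypothetical, and in practice the same values would come from the database's own algorithm functions.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def undirected_adjacency(edges: List[Edge]) -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def weakly_connected_components(edges: List[Edge]) -> Dict[str, int]:
    """Union-find over edges, ignoring direction; returns a component id per node."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    ids: Dict[str, int] = {}
    return {node: ids.setdefault(find(node), len(ids)) for node in sorted(parent)}


def node_features(edges: List[Edge]) -> Dict[str, Dict[str, float]]:
    adj = undirected_adjacency(edges)
    component = weakly_connected_components(edges)
    features: Dict[str, Dict[str, float]] = {}
    for node in sorted(adj):
        neighbours = adj[node]
        degree = len(neighbours)
        # Every pair of neighbours that are themselves linked closes a triangle.
        triangles = sum(1 for u, v in combinations(sorted(neighbours), 2) if v in adj[u])
        possible = degree * (degree - 1) / 2
        features[node] = {
            "degree": degree,
            "triangles": triangles,
            "clustering": triangles / possible if possible else 0.0,
            "component": component[node],
        }
    return features
```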
What it is:
● Fast, Scalable Graph Database
○ In-Memory Massively Parallel Processing (MPP) ACID-Compliant Graph Database
○ Supports RDF & Labelled Property Graphs
What it does:
○ Fast Data Loading
○ Fast Query
○ Rich Analytics
■ Graph Algorithms
■ BI/DW Analytics
■ Inferencing
■ Data Science/Feature Engineering Algorithms
■ Define-Your-Own Analytics
○ Linear Database Scaling
○ Persist data on cheap storage
Based on Open Standards
• Built on RDF & SPARQL 1.1 standards
• LPG with RDF*/SPARQL*
• LPG with Cypher (in 2020)
Deploy on-prem or cloud
• Kubernetes/Helm on-demand cloud deployment
• AWS, Google and Azure
AnzoGraph™ DB
Awards
Select Customers
Graph OLAP Built for Analytics at Scale and Speed
SQL OLAP vs Graph OLAP
SQL OLAP
On-line analytics at Massively Parallel Processing (MPP) scale with a SQL database
Examples: Netezza, Amazon Redshift
Analytics:
• Warehouse-Style BI Analytics
Graph OLAP
On-line analytics at Massively Parallel Processing (MPP) scale with a native graph database
Example: AnzoGraph DB
Analytics:
• Warehouse-Style BI Analytics
• Graph Algorithms
• Inferencing
• Data Science Functions
Graph OLAP Built for Analytics at Scale and Speed
Graph OLTP vs Graph OLAP
Graph OLTP: Transactional databases
• Built for transactional applications & individual transactions
• Scales vertically
Examples: Neo4j, AWS Neptune
Graph OLAP: Analytical databases
• Built for analytics and to deal with scale & performance
• Deep link analysis
• Analytics on the population
• Scales horizontally
• Can complement Graph OLTP systems
Example: AnzoGraph DB
Labelled Property Graphs Facilitate Analytics
Example graph:
• Person: Jack (isA: <Man>, birthday: 09/17/1975)
• Person: Jill (isA: <Woman>, birthday: 4/23/1979)
• Place: The Hill (isA: <Place>, has: Water, has: Trees, partOf: <TheMountain>)
• Jack friendOf Jill (edge properties: metAt=<TheHill>, metDate=07/04/2018)
• Jack WentUp The Hill (edge property: Date=07/04/2018)
• Jill WentUp The Hill (edge property: Date=07/04/2018)
Today with RDF* and SPARQL*:
• Relationships can be described as clearly as in any LPG database
• RDF*/SPARQL* extensions to the standard make W3C open-standards databases even more capable
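The key idea behind RDF* is that a triple can itself be the subject of further statements, so edge properties such as metAt and metDate hang off the friendOf relationship. In SPARQL* this is written with quoted-triple patterns, for example `<< ?a :friendOf ?b >> :metAt :TheHill`. The toy Python store below mimics that idea under simplifying assumptions: the class and method names are hypothetical, every annotated triple is also treated as asserted, and it is not a model of how AnzoGraph DB stores data.

```python
from collections import defaultdict
from typing import Dict, Iterator, Set, Tuple

Triple = Tuple[str, str, str]


class AnnotatedGraph:
    """Toy triple store in which each triple can carry its own key/value properties."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.annotations: Dict[Triple, Dict[str, str]] = defaultdict(dict)

    def add(self, s: str, p: str, o: str, **edge_props: str) -> Triple:
        triple = (s, p, o)
        self.triples.add(triple)
        # Properties are attached to the triple itself, like << s p o >> :key "value" .
        self.annotations[triple].update(edge_props)
        return triple

    def edges_with(self, predicate: str, **conditions: str) -> Iterator[Tuple[str, str]]:
        """Yield (subject, object) for `predicate` edges whose properties match."""
        for s, p, o in sorted(self.triples):
            if p != predicate:
                continue
            props = self.annotations.get((s, p, o), {})
            if all(props.get(key) == value for key, value in conditions.items()):
                yield s, o


g = AnnotatedGraph()
g.add("Jack", "friendOf", "Jill", metAt="TheHill", metDate="2018-07-04")
g.add("Jack", "WentUp", "TheHill", date="2018-07-04")
g.add("Jill", "WentUp", "TheHill", date="2018-07-04")
met_on_the_hill = list(g.edges_with("friendOf", metAt="TheHill"))  # [("Jack", "Jill")]
```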
User-defined Extensions (UDXs):
Allow users to extend AnzoGraph DB functionality for custom usage
• User-Defined Functions (UDF): Create and register custom analytic functions, such as functions that concatenate values or convert integers to alternate currencies.
• User-Defined Aggregates (UDA): Create and register aggregate functions, such as functions that compute the arithmetic mean or calculate the average number from a list of maximum and minimum values.
• User-Defined Services (UDS): Create and register services that create local SPARQL endpoints.
• User-Defined Tables (UDT): Create and register a function that is repeatedly invoked within a query to generate the rows of a table on the fly.
Data Science Functions · User-Defined Functions (UDX): functions you can build in Java or C++
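Why an aggregate needs more structure than an ordinary function becomes clear in a parallel database: each worker aggregates its own slice of rows, and the partial results must then be merged. The Python sketch below shows that initialize / accumulate / merge / finalize shape for the arithmetic-mean example above. It is only an illustration of the general pattern; AnzoGraph UDXs are written in Java or C++, and the class, method and helper names here are hypothetical, not the product's API.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Optional


@dataclass
class MeanState:
    total: float = 0.0
    count: int = 0


class MeanAggregate:
    """Hypothetical UDA shape: partial states are built per partition, then merged."""

    @staticmethod
    def initialize() -> MeanState:
        return MeanState()

    @staticmethod
    def accumulate(state: MeanState, value: float) -> MeanState:
        state.total += value
        state.count += 1
        return state

    @staticmethod
    def merge(a: MeanState, b: MeanState) -> MeanState:
        # Merging (sum, count) pairs is exact; averaging per-partition averages is not.
        return MeanState(a.total + b.total, a.count + b.count)

    @staticmethod
    def finalize(state: MeanState) -> Optional[float]:
        return state.total / state.count if state.count else None


def run_distributed(agg, partitions: Iterable[List[float]]) -> Optional[float]:
    """Simulate workers: fold each partition locally, then merge and finalize once."""
    partials = []
    for rows in partitions:
        state = agg.initialize()
        for value in rows:
            state = agg.accumulate(state, value)
        partials.append(state)
    return agg.finalize(reduce(agg.merge, partials, agg.initialize()))
```

For partitions `[[1.0, 2.0, 3.0], [4.0], []]` this returns 2.5, whereas naively averaging the per-partition means (2.0 and 4.0) would give 3.0.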
Execute Inferencing using RDFS+ and OWL 2 RL
• Person: Jack -Is Married-> Person: Jill. Inference: Person: Jill -Is Married-> Person: Jack
If Jack is married to Jill, then you can definitely infer that Jill is married to Jack.
• Person: Jack -Knows-> Person: Sam. Inference: Person: Sam -Knows-> Person: Jack?
Jack knows Sam, but Sam may not know Jack. Here, the inference is less clear.
Both cases are supported. AnzoGraph allows you to insert inferred triples into the specified target graph.
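The marriage case corresponds to declaring the property symmetric (owl:SymmetricProperty), which the OWL 2 RL rule prp-symp turns into “if x p y then y p x”. The knows case is left alone simply by not declaring it symmetric. The plain-Python sketch below applies that one rule to a set of triples and returns only the new triples, so they could be written to a separate target graph. It is an illustration of the rule, not AnzoGraph's reasoner; the function name and the prefixed-name strings are assumptions.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def infer_symmetric(graph: Set[Triple]) -> Set[Triple]:
    """OWL 2 RL prp-symp: p a owl:SymmetricProperty, x p y  =>  y p x.

    Returns only triples not already present, ready to load into a target graph."""
    symmetric = {s for s, p, o in graph if p == "rdf:type" and o == "owl:SymmetricProperty"}
    return {(o, p, s) for s, p, o in graph if p in symmetric and (o, p, s) not in graph}


data = {
    ("marriedTo", "rdf:type", "owl:SymmetricProperty"),  # schema: marriage goes both ways
    ("Jack", "marriedTo", "Jill"),
    ("Jack", "knows", "Sam"),  # "knows" is not declared symmetric, so no inference
}
inferred = infer_symmetric(data)  # {("Jill", "marriedTo", "Jack")}
```

One pass reaches a fixpoint for this rule: running it again over the data plus the inferred triples yields nothing new.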
ELT for Data Engineering
• Wrangling, Blending, Munging, Transformations, Enrichment, Views
• Use statistical functions, transformations or enrichment to get the data into the form needed for the downstream ML pipeline
INSERT {
  # Write the derived triples into a new graph
  GRAPH <myNewGraph> {
    ?s a <Person> ;
       <fullname> ?fullname .
  }
}
USING <myOldGraph>   # read the source triples from the old graph
WHERE {
  ?s a <Person> ;
     <firstname> ?fname ;
     <lastname> ?lname .
  # Join first and last name with a single space (straight quotes, not typographic ones)
  BIND(CONCAT(?fname, " ", ?lname) AS ?fullname)
}
ELT for Data Engineering
Materialized views are good for heavy calculations: compute once, use many times.
# Assumes the xsd: prefix resolves to http://www.w3.org/2001/XMLSchema#
CREATE MATERIALIZED VIEW <ages> AS
CONSTRUCT { ?person <age> ?age . }
WHERE {
  GRAPH <tickit> {
    # Age as a difference of calendar years (does not check whether the birthday has passed)
    { SELECT ?person ((YEAR(?date) - YEAR(xsd:dateTime(?birthdate))) AS ?age)
      WHERE {
        ?person <birthday> ?birthdate .
        BIND(xsd:dateTime(NOW()) AS ?date)
      }
    }
  }
}
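The view above computes age as YEAR(now) minus YEAR(birthdate), which overstates age by one for anyone whose birthday has not yet come round this year. The short Python sketch below contrasts the two calculations so the difference is easy to test before deciding which one a feature should use; the function names are illustrative only.

```python
from datetime import date


def year_difference(birth: date, today: date) -> int:
    """What YEAR(now) - YEAR(birthdate) computes."""
    return today.year - birth.year


def exact_age(birth: date, today: date) -> int:
    """Completed years: subtract one if this year's birthday is still ahead."""
    birthday_still_ahead = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - int(birthday_still_ahead)
```

For a birthday of 1975-09-17 evaluated on 2019-06-01, the year difference is 44 while the exact age is 43.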
ELT for Data Engineering
• Enrichment
Add new features from a federated query, using a SERVICE call to the Linked Open Data Cloud or to other internal SPARQL endpoints.
Example:
Look up the address and geocodes for a company, census population data, crime rate, demographics, etc. All of these can become new features fed into the ML pipeline.
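From a notebook, an enrichment query like this is just a SPARQL request sent over HTTP; the database then resolves any SERVICE clause against the remote endpoint. The Python sketch below builds such a request following the SPARQL 1.1 Protocol (query passed as the `query` parameter of a GET) and flattens SPARQL 1.1 JSON results into feature rows. The endpoint URL and the example.org predicates are placeholders, and the sketch deliberately stops short of sending the request.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Placeholder query: enrich companies with a population figure from a remote endpoint.
ENRICH_QUERY = """
SELECT ?company ?population WHERE {
  ?company <http://example.org/locatedIn> ?city .
  SERVICE <https://example.org/census/sparql> {
    ?city <http://example.org/population> ?population .
  }
}"""


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol query via GET, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json: str) -> List[Dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into one plain dict per solution."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables (e.g. from OPTIONAL) are simply absent from a binding.
        rows.append({var: binding[var]["value"] for var in variables if var in binding})
    return rows


request = build_query_request("https://example.org/sparql", ENRICH_QUERY)
# urllib.request.urlopen(request) would execute it; not called here.
```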
ML Step #1: Data Prep: Data Discovery and Feature Engineering
2.1 Bernoulli Distribution: Probability of success or failure (yes or no) in a single trial.
2.2 Binomial Distribution: Probability of a given number of successes in a fixed number of independent trials.
2.3 Chi-squared Distribution: Underlies the chi-squared test of the relationship between two categorical variables.
2.4 Exponential Distribution: Models the waiting time between events that occur independently at a constant average rate.
2.5 Hypergeometric Distribution: Probability of a given number of successes in draws without replacement from a finite population.
2.6 Laplace Distribution: A double-exponential distribution, sharply peaked at its center, with heavier tails than the normal.
2.7 Log Normal Distribution: Models positive quantities whose logarithm is normally distributed, such as stock or commodity prices.
2.8 Logarithmic Series Distribution: Models counts such as claim frequencies in insurance companies.
2.9 Negative Binomial Distribution: Probability of a given number of failures before a fixed number of successes.
2.10 Normal Distribution: Models many natural and social measurements that cluster symmetrically around a mean.
2.11 Poisson Distribution: Probability that a given number of events will occur in a fixed time period.
2.12 Skellam Distribution: Distribution of the difference between two independent Poisson-distributed variables.
2.13 Beta-binomial Distribution: Number of successes in n binomial trials when the success probability p is itself a Beta random variable.
2.14 Continuous Uniform Distribution: Assigns equal probability density to all values between its minimum and maximum.
2.15 Discrete Uniform Distribution: Assigns equal probability to each of a finite number of outcomes.
2.16 Student’s t-Distribution: Used in place of the normal when estimating a mean from a small sample with unknown variance.
2.17 Weibull Distribution: Used to assess product reliability, analyze life data and model failure times.
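Several of the distributions above have short closed forms. The Python sketch below (standard library only) implements a few of them so their meaning can be checked by hand; it illustrates the formulas and is not AnzoGraph's function library, whose names and signatures are not shown in this deck. The Bernoulli case is the binomial with n = 1.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k: int, population: int, successes: int, draws: int) -> float:
    """P(k successes in `draws` draws without replacement)."""
    return (math.comb(successes, k) * math.comb(population - successes, draws - k)
            / math.comb(population, draws))


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events in an interval whose expected count is lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def exponential_cdf(t: float, rate: float) -> float:
    """P(waiting time <= t) when events arrive at a constant average `rate`."""
    return 1.0 - math.exp(-rate * t) if t >= 0 else 0.0


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
```

For example, `binomial_pmf(2, 4, 0.5)` is 6 × (1/16) = 0.375, and drawing 3 items from 10 of which 4 are “successes” gives exactly one success with probability `hypergeometric_pmf(1, 10, 4, 3)` = 4 × 15 / 120 = 0.5.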
ML Step #1: Data Discovery and Feature Engineering
Correlations
3.1 Pearson Correlation Coefficient: Measures the strength and direction (positive, negative or none) of the linear relationship between two variables.
3.2 Matthews Correlation Coefficient: Measures the positive, negative or no relationship between two binary variables (0 & 1).
3.3 Spearman’s Rank Correlation Coefficient: Measures the strength of a monotonic relationship between paired data, based on ranks.
Feature Exploration
5.1 Principal Component Analysis: Reduces the dimensionality of large data sets, for example before building predictive models.
Profiling Metrics
6.1 Geometric Mean: Determines the average growth rate.
6.2 Skewness Metric: Calculates Pearson’s coefficient of skewness on numeric values.
6.3 T-Digest Metric: Estimates percentile and quantile values accurately from a compact summary.
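The Pearson/Spearman distinction matters for feature discovery: Pearson only rewards straight-line relationships, while Spearman is Pearson's formula applied to ranks, so it scores any consistently increasing relationship as 1. The standard-library Python sketch below shows both, plus the geometric mean used for average growth rates. These are illustrative implementations of the textbook definitions, not AnzoGraph's functions; ties get the average of their ranks, and Pearson is undefined (division by zero) for a constant input.

```python
import math
from statistics import mean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the ranks: measures monotonic association."""
    return pearson(average_ranks(x), average_ranks(y))


def geometric_mean(growth_factors: Sequence[float]) -> float:
    """Average growth factor, e.g. [1.10, 1.20] for +10% then +20%."""
    return math.exp(sum(math.log(v) for v in growth_factors) / len(growth_factors))
```

With x = [1, 2, 3, 4, 5] and y = x cubed, `spearman` returns 1.0 while `pearson` is only about 0.94, because the relationship is monotonic but not linear.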
Scalability
Graph OLAP – Horizontally Scalable
Have more data? Need better performance? Add more servers.
Deploy on VMs or bare metal with a TAR file that is compatible with CentOS
Automated deployment in the cloud.
Available in the AWS Marketplace, with others coming soon
60-day full-feature free trial. Download or deploy in the cloud. Visit the booth or AnzoGraph.com
Automated Deployment and Operations
Storage and Compute Integration
Anzo - The Modern Data Discovery and Integration Layer for the Enterprise Data Fabric
Enterprise Data Sources (Ingest)
ON-BOARD: Ingest & Map
• Automated ETL
• Collaborative Mapping
• Metadata Capture
MODEL: Graph Data Model
• Lift Data into Data Fabric
• Design Ontologies
• Connect Data Models
BLEND: GraphMarts
• Combine and Align Related Data Sets
• In-memory MPP OLAP Query Engine
• Data Layers
ACCESS: Hi-Res Analytics
• Analyze All Data Together
• Fast, Iterative Queries: Ad Hoc, What if
• Code Free or API
• Graphical Application Interface
Machine Learning and AI · Enterprise Search · “Last Mile” Analytics Tools
Metadata Catalog: Semantic-based Metadata Management, Governance and Lineage
Data Storage Layer: Cloud or On-Prem Data Storage Infrastructure

一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis Report
 

AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Connected World

  • 1. AnzoGraph.com: Driving AI and Machine Insights with Knowledge Graphs in a Connected World. Thomas Cook, Sales Director, AnzoGraph DB
  • 2. Context: Data Continues to Grow; AI and ML Demand Increases; Complexity of Data Ecosystem Grows. • Need to Build on Existing Analytics Capabilities with: • Automated Data Preparation & Better Understanding • Explainable AI & ML with Provenance • Improved Algorithms & Analytics • Cost-Efficient Operations. Knowledge Graphs & Graph Analytics
  • 3. The Data Preparation Problem: 70-80% of time is spent in data preparation & feature engineering, viewed as the “least enjoyable” part of work by 76% of data scientists.¹ Data Access: ● Manual ETL coding ● Practicalities limit the number of sources and types of data. Data Processing: ● Laborious discovery, profiling and selection ● Use of rules and coding for harmonization & cleansing. Feature Engineering: ● Manual coding to transform data ● Manual feature engineering & selection. ¹ Cleaning Big Data, Forbes Magazine
  • 4. Traditional approaches to connecting siloed data. Data Warehouse: rigid data model; relationships more difficult; expensive; does not adapt well to change; concurrency & performance. Data Lake: raw operational data dumps become unwieldy and difficult to consume and manage, referred to as the “Data Swamp”; data engineering efforts are costly and complex, lack lineage, and are often not repeatable; high-volume Spark clusters are difficult to manage and tune properly.
  • 5. “Graph analytics will grow in the next few years due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries.” (Gartner, Top 10 Data and Analytics Technology Trends for 2019)
  • 6. Knowledge Graphs to Automate Data Preparation & Improve Common Understanding
  • 7. Why Graph? Graph's Flexible Data Model: rich insights on relationships, not just entities; leveraging industry data models; process & analyze growing amounts of diverse data (structured and unstructured data, natural language, sparse data; data you know you need to analyze and data you don't know you need to analyze); AI and ML, traditional analytics, unique and insightful analytics, feature engineering; can evolve as data changes, and refactoring is often not necessary as data/needs evolve.
  • 8. What is a Knowledge Graph? Data Architect View: One method to integrate data from multiple data sets, structured or unstructured, and to leverage standard industry ontologies to enhance analytics. Executive View: Common understanding of all disparate data. Ontologist View: The best way to represent knowledge and meaning and provide linkage and relationship information in a data analytics platform. Ontologies are at the center, providing a way to standardize and enhance the conceptual model. Inferencing provides semantic reasoning for better understanding.
  • 9. ©2018 Cambridge Semantics Inc. All rights reserved. • Canonical data model provides context for common understanding • Easy to find and access the right data • Automate complex data preparation tasks • Perform deep link analysis of complex relationships for improved insights
  • 10. Using Knowledge Graphs with Graph Analytics Database as Scalable Infrastructure for ML & Data Science
  • 11. Analytical Capabilities - AnzoGraph DB. Database via SPARQL: Basic Graph Patterns; Negation; Property Paths; BIND; Aggregates (Count/Avg, Min/Max, GroupConcat, Sample); Basic Federated Query; ORDER BY and offsets; Functions on Strings, Numerics, Dates and Times; Hash Functions. Graph Database and Inferencing: Algorithms like • PageRank • Shortest Path • All Paths • Label Propagation • Weakly Connected Components • K-neighborhood • Counting Triangles; Inferencing (RDFS+ OWL 2 DL); OpenCypher (preview); Labeled Property Graphs (RDF*). OLAP Scalability: • Window Aggregates • Advanced Grouping Sets • Named Views • Named Queries • Conditional Expressions. AnzoGraph® DB Extras: Data Science Extensions: Distributions ● Bernoulli ● Binomial ● Chi-squared ● Exponential ● Hypergeometric ● Laplace ● Log Normal ● Logarithmic Series ● Negative Binomial ● Normal; Correlations ● Pearson; Entropy ● Cross Entropy ● Differential Entropy. UDX: Design your own.
  • 12. Over 150 Geospatial Functions. Define regions: points, polygons, circles. Use common geospatial shape data formats: SHP (shapefile), GeoJSON, KML, WKT, WKB. Understand relationships: Equals, Intersects, Overlaps, Disjoint, Within, Touches. Convert coordinate systems: Cartesian, Spherical, Cylindrical, Elliptical.
  • 14. Analytical Benchmarks: AnzoGraph Benchmark Results. 217x: AnzoGraph DB performance compared to Neo4j on an industry-standard TPC-H benchmark. 113x: AnzoGraph's LUBM benchmark performance over the previous fastest result. 10-300x: AnzoGraph's performance on graph algorithms over Spark SQL and Spark with GraphFrames.
  • 16. AnzoGraph DB: Traditional & Graph Analytics; Schema-less Data Model; Standards; Customizable Algorithms; Open Platform. Use Cases: • Data Harmonization & Analytics • Enterprise Knowledge Graphs • Scientific Data Discovery • Customer 360 • Supply Chain • IoT • Fraud Detection • Financial Research • Network Optimization • Anti-Money Laundering
  • 17. Parabole and AnzoGraph: Cognitive Analytics - alphaESG • Extract text and relationships from massive amounts of documents and news feeds • Use AnzoGraph DB to: • Create cognitive models • Contextualize news, filings & reports • Harmonize data from SASB and various data sources • Provide customized outputs and signals • Execute analytics
  • 18. Download AnzoGraph DB Free Edition Today http://AnzoGraph.com
  • 21. ©2019 Cambridge Semantics Inc. All rights reserved. Execute Supervised & Unsupervised ML with Graph Algorithms. Graph Algorithms: • PageRank • Shortest Path • K-neighbors • All Paths • Counting Triangles • Weakly Connected Components • Label Propagation • Triangle Enumeration • Triangle Counting • Clustering Coefficient, and more! Who is the most influential person in your customer list? What's the most important item relating to a search of your knowledge graph? What is the shortest path to your destination across a route? What's the optimal path for packets to travel across your network? (Image source: Wikipedia.) Try functions via SPARQL in Zeppelin or Python Jupyter Notebooks.
  • 22. Graph Algorithms Produce Additional Features to Train ML Models (image source: Wikipedia). Try functions via SPARQL in Zeppelin or Python Jupyter Notebooks.
  • 23. AnzoGraph™ DB. What it is: ● Fast, scalable graph database ○ In-memory, massively parallel processing (MPP), ACID-compliant graph database ○ Supports RDF & Labelled Property Graphs. What it does: ○ Fast data loading ○ Fast query ○ Rich analytics: ■ Graph Algorithms ■ BI/DW Analytics ■ Inferencing ■ Data Science/Feature Engineering Algorithms ■ Define-Your-Own Analytics ○ Linear database scaling ○ Persist data on cheap storage. Based on Open Standards: • Built on RDF & SPARQL 1.1 standards • LPG with RDF*/SPARQL* • LPG with Cypher (in 2020). Deploy on-prem or cloud: • Kubernetes/Helm on-demand cloud deployment • AWS, Google and Azure. Awards. Select Customers.
  • 24. Graph OLAP Built for Analytics at Scale and Speed: SQL OLAP vs Graph OLAP. SQL OLAP: on-line analytics at massively parallel processing (MPP) scale with a SQL database. Examples: Netezza, Amazon Redshift. Analytics: • Warehouse-style BI analytics. Graph OLAP: on-line analytics at MPP scale with a native graph database. Example: AnzoGraph DB. Analytics: • Warehouse-style BI analytics • Graph algorithms • Inferencing • Data science functions
  • 25. Graph OLAP Built for Analytics at Scale and Speed: Graph OLTP vs Graph OLAP. Graph OLTP (transactional databases): • Built for building transactional applications & individual transactions • Scales vertically. Examples: Neo4j, AWS Neptune. Graph OLAP (analytical databases): • Built for analytics and to deal with scale & performance • Deep link analysis • Analytics on the population • Scales horizontally • Can complement Graph OLTP systems. Example: AnzoGraph DB
  • 26. Labelled Property Graphs Facilitate Analytics. Example graph: Person: Jack (isA: <Man>, birthday: 09/17/1975) and Person: Jill (isA: <Woman>, birthday: 4/23/1979) are linked by a friendOf edge carrying metAt=<TheHill> and metDate=07/04/2018. Each of them WentUp Place: The Hill (isA: <Place>, has: Water, has: Trees, partOf: <TheMountain>), and each WentUp edge carries Date=07/04/2018. Today, with RDF* and SPARQL*: • Relationships can be described as clearly as in any LPG database • RDF*/SPARQL* extensions to the standard make W3C open-standards databases even more capable
  • 27. User-Defined Extensions (UDXs): allow users to extend AnzoGraph DB functionality for custom usage. User-Defined Functions (UDF): create and register custom analytic functions, such as functions that concatenate values or convert integers to alternate currencies. User-Defined Aggregates (UDA): create and register aggregate functions, such as functions that compute the arithmetic mean or calculate the average number from a list of maximum and minimum values. User-Defined Services (UDS): create and register services that create local SPARQL endpoints. User-Defined Tables (UDT): create and register a function that is repeatedly invoked within a query to generate the rows of a table on the fly. Data Science Functions and User-Defined Extensions (UDX): functions you can build in Java or C++.
  • 28. Execute Inferencing using RDFS+ and OWL 2 RL. Example 1: Person: Jack is married to Person: Jill, so the inference is that Jill is married to Jack. If Jack is married to Jill, then you can definitely infer that Jill is married to Jack. Example 2: Person: Jack knows Person: Sam. Jack knows Sam, but Sam may not know Jack, so here the inference is less clear. Both cases are supported, and AnzoGraph allows you to insert inferred triples into the specified target graph. Try functions via SPARQL in Zeppelin or Python Jupyter Notebooks.
  • 29. ELT for Data Engineering • Wrangling, blending, munging, transformations, enrichment, views • Use statistical functions, transformations or enrichment to get the data into the form needed for the downstream ML pipeline. Example:
      INSERT { GRAPH <myNewGraph> { ?s a <Person> ; <fullname> ?fullname } }
      USING <myOldGraph>
      WHERE {
        ?s a <Person> ; <firstname> ?fname ; <lastname> ?lname .
        BIND(CONCAT(?fname, " ", ?lname) AS ?fullname)
      }
  • 30. ELT for Data Engineering: Materialized Views are good for heavy calculations (perform once, use many times). Example:
      CREATE MATERIALIZED VIEW <ages> AS
      CONSTRUCT { ?person <age> ?age . }
      WHERE {
        GRAPH <tickit> {
          { SELECT ?person ((YEAR(?date))-(YEAR(xsd:dateTime(?birthdate))) AS ?age)
            WHERE { ?person <birthday> ?birthdate . BIND(xsd:dateTime(NOW()) AS ?date) } }
        }
      }
  • 31. ELT for Data Engineering • Enrichment: add new features from a federated query, using a SERVICE call to the Linked Open Data Cloud or other internal SPARQL endpoints. Example: look up the address and geocodes for a company, census population data, crime rates, demographics, etc. All of these can become new features to feed into the ML pipeline.
  • 32. ML Step #1: Data Prep: Data Discovery and Feature Engineering. Distributions: 2.1 Bernoulli Distribution: models a single trial with two outcomes, success or failure (yes or no). 2.2 Binomial Distribution: the number of successes in a fixed number of independent trials with the same success probability. 2.3 Chi-squared Distribution: the distribution of a sum of squared standard normal variables; it underlies the chi-squared test of the relationship between two categorical variables. 2.4 Exponential Distribution: models the waiting time between events that occur independently at a constant average rate; it is memoryless, so the time already waited does not change the outlook. 2.5 Hypergeometric Distribution: the number of successes in a fixed number of draws made without replacement from a finite population. 2.6 Laplace Distribution: a symmetric "double exponential" distribution, more sharply peaked and heavier-tailed than the normal. 2.7 Log Normal Distribution: models positive quantities whose logarithm is normally distributed, such as the price of a stock or commodity. 2.8 Logarithmic Series Distribution: models counts such as claim frequencies in insurance companies. 2.9 Negative Binomial Distribution: the number of failures before a specified number of successes in independent trials. 2.10 Normal Distribution: models many natural and social measurements that cluster symmetrically around a mean. 2.11 Poisson Distribution: determines the probability that a certain number of events will occur in a specific time period, given a constant average rate. 2.12 Skellam Distribution: the distribution of the difference between two independent Poisson-distributed variables. 2.13 Beta-binomial Distribution: models the number of successes in n binomial trials when the probability of success p is a Beta random variable. 2.14 Continuous Uniform Distribution: assigns equal probability to all values between its minimum and maximum. 2.15 Discrete Uniform Distribution: models a finite number of outcomes that are all equally likely. 2.16 Student's t-Distribution: used when estimating a mean from a small sample with unknown population standard deviation. 2.17 Weibull Distribution: used to assess product reliability, analyse life data and model failure times.
  • 33. ML Step #1: Data Discovery and Feature Engineering. Correlations: 3.1 Pearson Correlation Coefficient: measures the strength and direction (positive, negative or none) of the linear relationship between two variables. 3.2 Matthews Correlation Coefficient: measures the positive, negative or absent relationship between two binary variables (0 & 1). 3.3 Spearman's Rank Correlation Coefficient: measures the strength of a monotonic relationship between paired data by correlating their ranks. Feature Exploration: 5.1 Principal Component Analysis: reduces the dimensionality of large data sets, for example before building predictive models. Profiling Metrics: 6.1 Geometric Mean: determines average growth rates. 6.2 Skewness Metric: calculates Pearson's coefficient of skewness on numeric values. 6.3 T-Digest Metric: estimates percentile and quantile values with high accuracy in compact memory.
  • 34. Scalability: Graph OLAP is horizontally scalable. Have more data? Need better performance? Add more servers. Deploy on VMs or bare metal with a TAR file that is compatible with CentOS. Automated deployment in the cloud: available in the AWS Marketplace, with others coming soon. 60-day full-feature free trial: download or cloud deployment. Visit the booth or AnzoGraph.com.
  • 35. Anzo: The Modern Data Discovery and Integration Layer for the Enterprise Data Fabric. Enterprise Data Sources are ingested through: ON-BOARD (Ingest & Map): • Automated ETL • Collaborative Mapping • Metadata Capture. MODEL (Graph Data Model): • Lift Data into Data Fabric • Design Ontologies • Connect Data Models. BLEND (GraphMarts): • Combine and Align Related Data Sets • In-memory MPP OLAP Query Engine • Data Layers. ACCESS (Hi-Res Analytics): • Analyze All Data Together • Fast, Iterative Queries • Ad Hoc, What-If • Code Free or API. Consumers: Machine Learning and AI, Enterprise Search, "Last Mile" Analytics Tools, Graphical Application Interface. Metadata Catalog: semantic-based metadata management, governance and lineage. Data Storage Layer: cloud or on-prem data storage infrastructure, with automated deployment and operations and storage and compute integration.
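Slide 12 lists "Within" among the geospatial relationships. As a rough illustration of what such a predicate decides, here is a minimal ray-casting point-in-polygon test. It assumes planar (Cartesian) coordinates and a simple polygon. The function name is hypothetical, and this is not AnzoGraph's geospatial implementation.

```python
# Minimal "Within" test for a 2-D point and a simple polygon using ray casting.
# Assumptions: planar coordinates, polygon given as (x, y) vertices without a
# repeated closing vertex; points exactly on an edge are not treated specially.
from typing import List, Tuple

Point = Tuple[float, float]


def point_within_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Return True if pt lies inside polygon (even-odd rule)."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can cross it.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of pt; each one flips inside/outside.
            if x < x_cross:
                inside = not inside
    return inside


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_within_polygon((2.0, 2.0), square))  # True
print(point_within_polygon((5.0, 2.0), square))  # False
```

A production engine would also handle boundary cases, holes, and spherical coordinates. The slide's coordinate-system conversion exists for exactly that last case.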
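Slide 21 pairs PageRank with the question "Who is the most influential person in your customer list?". The sketch below shows the idea with a plain power-iteration PageRank over an in-memory adjacency list. The customer names, the damping factor of 0.85 and the stopping rule are assumptions for illustration. This is not how AnzoGraph computes PageRank internally.

```python
# Minimal PageRank by power iteration over an in-memory adjacency list.
# Illustrative only: node names, damping factor and tolerance are assumptions.
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             max_iter: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Return a PageRank score per node; scores sum to 1."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        # Each node passes an equal share of its rank along every out-link.
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical "who follows whom" edges among customers.
follows = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["alice", "bob"],
    "dave": ["bob"],
}
scores = pagerank(follows)
print(max(scores, key=scores.get))  # bob
```

Bob ranks highest because he collects links from three of the other four customers, including half of Carol's high score.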
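Slide 22's point is that graph algorithm outputs can become extra feature columns for an ML model. A small sketch of that idea follows. It computes degree, triangle count, and local clustering coefficient per node from an undirected edge list. The edge list and the function names are made up for illustration and are not an AnzoGraph API.

```python
# Per-node graph features (degree, triangle count, local clustering
# coefficient) that could be joined to a training table as extra columns.
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; self-loops and duplicate edges are ignored."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_features(edges: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    adj = build_adjacency(edges)
    features: Dict[str, Dict[str, float]] = {}
    for node, nbrs in adj.items():
        degree = len(nbrs)
        # Each pair of neighbors that are themselves linked closes a triangle.
        triangles = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        possible = degree * (degree - 1) / 2
        features[node] = {
            "degree": degree,
            "triangles": triangles,
            "clustering": triangles / possible if possible else 0.0,
        }
    return features


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
for node, row in sorted(node_features(edges).items()):
    print(node, row)
```

Node "c" sits in one triangle but has three neighbors, so its clustering coefficient is 1/3. That kind of structural signal is hard to derive from a flat table.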
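Slide 26 argues that RDF* lets edges carry properties, as an LPG does. In RDF*, a whole triple can be the subject of another triple. In Turtle* the first annotation would read `<< :Jack :friendOf :Jill >> :metAt :TheHill .` The sketch below mimics this with plain Python tuples instead of an RDF store, which is an assumption made for brevity. It assigns the 1975 birthday and the `isA <Man>` label to Jack, which is one reading of the slide's diagram.

```python
# Edge annotations in the style of RDF*: a triple can itself be the subject
# of further triples. Plain tuples stand in for an RDF store (an assumption).
from typing import Dict, Set, Tuple, Union

Triple = Tuple[str, str, str]
Subject = Union[str, Triple]

graph: Set[Tuple[Subject, str, str]] = {
    ("Jack", "isA", "Man"),
    ("Jill", "isA", "Woman"),
    ("Jack", "friendOf", "Jill"),
    ("Jack", "wentUp", "TheHill"),
    ("Jill", "wentUp", "TheHill"),
    # Annotations whose subject is a whole edge (a quoted triple).
    (("Jack", "friendOf", "Jill"), "metAt", "TheHill"),
    (("Jack", "friendOf", "Jill"), "metDate", "07/04/2018"),
    (("Jack", "wentUp", "TheHill"), "date", "07/04/2018"),
    (("Jill", "wentUp", "TheHill"), "date", "07/04/2018"),
}


def edge_properties(edge: Triple) -> Dict[str, str]:
    """Collect the key/value annotations attached to one edge."""
    return {p: o for (s, p, o) in graph if s == edge}


print(edge_properties(("Jack", "friendOf", "Jill")))
```

Without RDF*, the same facts need an intermediate node per edge (reification), which makes queries over edge properties noticeably more verbose.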
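Slide 27 mentions a user-defined aggregate that computes the arithmetic mean. In an MPP engine, an aggregate generally has to be computable in pieces: each server builds a partial state and the states are merged at the end. The sketch below shows that init / accumulate / merge / finalize pattern. The class and method names are hypothetical. This is not the AnzoGraph UDA API, which the slide says is written in Java or C++.

```python
# Sketch of a mergeable aggregate (arithmetic mean) as an MPP engine might
# run it. Class and method names are hypothetical, not AnzoGraph's API.
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class MeanState:
    total: float = 0.0
    count: int = 0


class MeanAggregate:
    """Arithmetic mean as a mergeable aggregate."""

    def init(self) -> MeanState:
        return MeanState()

    def accumulate(self, state: MeanState, value: float) -> None:
        state.total += value
        state.count += 1

    def merge(self, a: MeanState, b: MeanState) -> MeanState:
        # Combine partial states, never partial means.
        return MeanState(a.total + b.total, a.count + b.count)

    def finalize(self, state: MeanState) -> Optional[float]:
        return state.total / state.count if state.count else None


def run_partitioned(agg: MeanAggregate,
                    partitions: List[Iterable[float]]) -> Optional[float]:
    """Simulate per-server partial aggregation followed by a final merge."""
    merged = agg.init()
    for part in partitions:
        state = agg.init()
        for value in part:
            agg.accumulate(state, value)
        merged = agg.merge(merged, state)
    return agg.finalize(merged)


print(run_partitioned(MeanAggregate(), [[1, 2, 3], [4, 5], []]))  # 3.0
```

Carrying (sum, count) is what makes the merge correct. Averaging the partition means 2.0 and 4.5 would give 3.25 instead of 3.0, because the partitions have different sizes.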
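Slide 28 contrasts a relationship that safely implies its reverse (married) with one that does not (knows). In OWL 2 RL, this is the rule for `owl:SymmetricProperty` (prp-symp): if p is symmetric and (x p y) holds, infer (y p x). The sketch below forward-chains just that one rule over in-memory triples. The predicate names are hypothetical. The function returns inferred triples separately, loosely mirroring insertion into a target graph.

```python
# Forward-chaining a single OWL 2 RL style rule for symmetric predicates.
# Predicate names are hypothetical; not AnzoGraph's inference engine.
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def infer_symmetric(triples: Set[Triple], symmetric: Set[str]) -> Set[Triple]:
    """Return only the new triples implied by symmetric predicates."""
    inferred: Set[Triple] = set()
    for s, p, o in triples:
        if p in symmetric and (o, p, s) not in triples:
            inferred.add((o, p, s))
    return inferred


data = {("Jack", "isMarriedTo", "Jill"), ("Jack", "knows", "Sam")}
# Only isMarriedTo is declared symmetric; knows is deliberately left out.
target_graph = infer_symmetric(data, symmetric={"isMarriedTo"})
print(target_graph)  # {('Jill', 'isMarriedTo', 'Jack')}
```

One pass suffices for this rule, because applying it to an inferred triple only reproduces the original. "Knows" yields nothing unless someone explicitly declares it symmetric, which is the modeling decision the slide highlights.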
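To make the semantics of slide 29's INSERT ... WHERE concrete, the sketch below replays it over plain (subject, predicate, object) tuples. As in SPARQL, "a" stands for rdf:type. The sample data and function name are made up. The point is the matching behavior: a person missing either name produces no full name, which is easy to overlook when preparing features.

```python
# What the slide 29 INSERT ... WHERE computes, replayed over plain tuples.
# "a" stands for rdf:type; the graph contents are hypothetical sample data.
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def build_fullnames(old_graph: Set[Triple]) -> Set[Triple]:
    # Group objects by subject and predicate.
    by_subject: Dict[str, Dict[str, List[str]]] = {}
    for s, p, o in old_graph:
        by_subject.setdefault(s, {}).setdefault(p, []).append(o)

    new_graph: Set[Triple] = set()
    for s, props in by_subject.items():
        if "Person" not in props.get("a", []):
            continue
        # Like a basic graph pattern, a subject missing either name yields
        # no solution; several names yield one solution per combination.
        for first in props.get("firstname", []):
            for last in props.get("lastname", []):
                new_graph.add((s, "a", "Person"))
                new_graph.add((s, "fullname", f"{first} {last}"))
    return new_graph


old = {
    ("p1", "a", "Person"), ("p1", "firstname", "Ada"), ("p1", "lastname", "Lovelace"),
    ("p2", "a", "Person"), ("p2", "firstname", "Alan"),  # no lastname
}
print(sorted(build_fullnames(old)))  # [('p1', 'a', 'Person'), ('p1', 'fullname', 'Ada Lovelace')]
```

If partial names should still produce output, the query would need OPTIONAL patterns and COALESCE. This sketch keeps the original query's strict matching.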
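The materialized view on slide 30 computes age as YEAR(now) minus YEAR(birthdate). That overstates age by one for anyone whose birthday has not yet come round this year, which matters if age feeds a model. The sketch below contrasts the two calculations. The "today" date is an assumption, and the birthday is one of the dates shown on slide 26.

```python
# Year-difference age (as in the slide 30 view) versus exact age.
# The "today" value is an assumed date for illustration.
from datetime import date


def age_by_year_difference(birth: date, today: date) -> int:
    """Same arithmetic as YEAR(now) - YEAR(birthdate)."""
    return today.year - birth.year


def exact_age(birth: date, today: date) -> int:
    """Subtract one more year if this year's birthday has not happened yet."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


birth = date(1975, 9, 17)   # a birthday shown on slide 26
today = date(2019, 6, 1)    # assumed "now"
print(age_by_year_difference(birth, today), exact_age(birth, today))  # 44 43
```

Under this convention, people born on 29 February turn a year older on 1 March in non-leap years.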
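To make a few of slide 32's distribution descriptions concrete, the sketch below writes out the textbook probability mass functions for the binomial, Bernoulli and Poisson distributions using only the standard library. The function names and signatures are hypothetical and are not AnzoGraph's data science functions.

```python
# Textbook probability mass functions for three distributions on slide 32.
# Function names are hypothetical, not AnzoGraph's signatures.
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success prob. p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def bernoulli_pmf(k: int, p: float) -> float:
    """Bernoulli is the binomial with a single trial (k is 0 or 1)."""
    return binomial_pmf(k, 1, p)


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events in an interval whose mean count is lam)."""
    if k < 0:
        return 0.0
    return lam ** k * exp(-lam) / factorial(k)


print(binomial_pmf(2, 4, 0.5))        # 0.375
print(round(poisson_pmf(3, 2.0), 4))  # 0.1804
```

The first line is 6 × 0.5² × 0.5²: there are six ways to place 2 successes among 4 fair coin flips, out of 16 equally likely outcomes.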
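Slide 33 distinguishes Pearson correlation, which captures linear relationships, from Spearman's rank correlation, which captures monotonic ones. The sketch below computes both from their definitions. Spearman is implemented as Pearson applied to ranks, with tied values given the average of the ranks they span. The function names are hypothetical. Constant inputs have zero variance and raise ZeroDivisionError.

```python
# Pearson and Spearman correlation from their definitions, stdlib only.
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but not linear
print(round(pearson(x, y), 3), round(spearman(x, y), 3))  # 0.943 1.0
```

On y = x³, Spearman reports a perfect monotonic relationship while Pearson falls short of 1. This is why the slide's description of Spearman as measuring a linear relationship needed correcting.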