SlideShare a Scribd company logo
1 of 18
USING AMAZON
NEPTUNE TO
BUILD FASHION
KNOWLEDGE
GRAPH
AWS Finland
September Meetup
Helsinki
September 2019
6
FASHION
KNOWLEDGE
GRAPH
7
What is a knowledge graph?
Collection of elementary sentences
Subject Predicate Object
Entity (IRI) Entity (IRI)
or Literal
Named (IRI)
Directed
8
Problem: “Beyonce” was one of our most failing search queries.
Solution: Record the link between Beyonce and Ivy Park in the Knowledge Graph.
Search system can use this information.
Why a Knowledge Graph?
Beyonce isDesignerOf IvyPark
Person Brand
isA
isA
9
10
Technical implementation
11
RDF is used to model the Knowledge graph
An RDF graph is a set of RDF triples
RDF - Resource Description Framework
12
SPARQL - SPARQL Protocol And RDF Query Language
13
SPARQL - Property path
14
SPARQL - Property path
16
The Graph Database
Blazegraph
● Blazegraph is an open source (GPLv2) graph database.
● Its protocol is HTTP-based, no special libraries is needed.
● One of the most complete in terms of SPARQL / RDF support.
● Used by the Wikimedia foundation for their wikidata query service since 2015
, but...
● No replication or cluster support in the open source version.
● The commercial version is not available since 2017...
… because the company that developed Blazegraph is acquired by Amazon
Amazon Neptune
Blazegraph appears as the foundation of Amazon Neptune database:
=
+
★ Replication and the R/O
endpoint to distribute load
among replicas;
★ Redundancy via failover;
★ Cloud backups;
★ Technical support.
+
Fashion Knowledge Graph Deployment
● Neptune endpoint supports no
authentication;
● Let’s introduce a low-level API:
○ Uses Zalando OAuth2 for
authentication and
authorization;
○ Does basic SPARQL
validation.
● High-level APIs may be
deployed in different VPCs.
Neptune is not an easy guy
● There may be bugs! A concurrency problem that
we have reported in mid-2018, has been finally
fixed in the version 296 in May 2019;
● Restoration from backup is VERY SLOW;
● Backups are confined inside AWS - you can’t
download them. We used data dump (using a
SPARQL SELECT) and upload instead for
migration between AWS regions;
● Test environment is rather expensive (instance
types start from r5.large);
● Performance may be really bad even with small
graphs, if your queries are poorly written.
21
No magic, it requires work even if you use the right DB for your task
Issues with complex queries (especially property path)
Performance
Denormalize the data
- Transactions to ensure consistency
- Named graph can help
22
Conclusion
● Amazon Neptune is a good solution for RDF/SPARQL-based
knowledge graph;
● It can scale horizontally for read queries (by increasing the
number of replicas);
● But it’s not magic, you need to work on your queries;
● You still can use Blazegraph as a compatible replacement for
Neptune in tests and development;
● Think over your backup&restore (and data migration) strategy.
Test it works as you expect.
Find out more about our
Culture, People & Jobs:
● ON SOCIAL MEDIA:
Linkedin @Zalando SE
Facebook @ Inside Zalando
Instagram @insidezalando
Twitter @ZalandoTech
● CAREER WEBSITE: jobs.zalando.com
● CORPORATE WEBSITE:
corporate.zalando.com
Feel free to reach us:
Matthieu Guillermin
<matthieu.guillermin@zalando.fi>
Uri Savelchev
<uri.savelchev@zalando.fi>

More Related Content

What's hot

Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019Goutam Tadi
 
Spark at Airbnb
Spark at AirbnbSpark at Airbnb
Spark at AirbnbHao Wang
 
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...Spark Summit
 
Extending Spark Graph for the Enterprise with Morpheus and Neo4j
Extending Spark Graph for the Enterprise with Morpheus and Neo4jExtending Spark Graph for the Enterprise with Morpheus and Neo4j
Extending Spark Graph for the Enterprise with Morpheus and Neo4jDatabricks
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingDatabricks
 
Download It
Download ItDownload It
Download Itbutest
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scaleMark Schroering
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorDatabricks
 
Scalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayScalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayDatabricks
 
FPGA Acceleration of Apache Spark on AWS
FPGA Acceleration of Apache Spark on AWSFPGA Acceleration of Apache Spark on AWS
FPGA Acceleration of Apache Spark on AWSChristoforos Kachris
 
Cost Efficiency Strategies for Managed Apache Spark Service
Cost Efficiency Strategies for Managed Apache Spark ServiceCost Efficiency Strategies for Managed Apache Spark Service
Cost Efficiency Strategies for Managed Apache Spark ServiceDatabricks
 
Asynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkAsynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkDatabricks
 
Graph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.comGraph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.comKarin Patenge
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...Databricks
 
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Databricks
 
Distributed ML/DL with Ignite ML Module Using Apache Spark as Database
Distributed ML/DL with Ignite ML Module Using Apache Spark as DatabaseDistributed ML/DL with Ignite ML Module Using Apache Spark as Database
Distributed ML/DL with Ignite ML Module Using Apache Spark as DatabaseDatabricks
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code GenerationDatabricks
 
Anomaly Detection at Scale!
Anomaly Detection at Scale!Anomaly Detection at Scale!
Anomaly Detection at Scale!Databricks
 
10 big data analytics tools to watch out for in 2019
10 big data analytics tools to watch out for in 201910 big data analytics tools to watch out for in 2019
10 big data analytics tools to watch out for in 2019JanBask Training
 

What's hot (20)

AI at Scale
AI at ScaleAI at Scale
AI at Scale
 
Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019
 
Spark at Airbnb
Spark at AirbnbSpark at Airbnb
Spark at Airbnb
 
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...
 
Extending Spark Graph for the Enterprise with Morpheus and Neo4j
Extending Spark Graph for the Enterprise with Morpheus and Neo4jExtending Spark Graph for the Enterprise with Morpheus and Neo4j
Extending Spark Graph for the Enterprise with Morpheus and Neo4j
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
 
Download It
Download ItDownload It
Download It
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scale
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
 
Scalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayScalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using Ray
 
FPGA Acceleration of Apache Spark on AWS
FPGA Acceleration of Apache Spark on AWSFPGA Acceleration of Apache Spark on AWS
FPGA Acceleration of Apache Spark on AWS
 
Cost Efficiency Strategies for Managed Apache Spark Service
Cost Efficiency Strategies for Managed Apache Spark ServiceCost Efficiency Strategies for Managed Apache Spark Service
Cost Efficiency Strategies for Managed Apache Spark Service
 
Asynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkAsynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache Spark
 
Graph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.comGraph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.com
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
 
Distributed ML/DL with Ignite ML Module Using Apache Spark as Database
Distributed ML/DL with Ignite ML Module Using Apache Spark as DatabaseDistributed ML/DL with Ignite ML Module Using Apache Spark as Database
Distributed ML/DL with Ignite ML Module Using Apache Spark as Database
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Anomaly Detection at Scale!
Anomaly Detection at Scale!Anomaly Detection at Scale!
Anomaly Detection at Scale!
 
10 big data analytics tools to watch out for in 2019
10 big data analytics tools to watch out for in 201910 big data analytics tools to watch out for in 2019
10 big data analytics tools to watch out for in 2019
 

Similar to AWS Finland September Meetup - Using Amazon Neptune to build Fashion Knowledge Graph

GraphTech Ecosystem - part 1: Graph Databases
GraphTech Ecosystem - part 1: Graph DatabasesGraphTech Ecosystem - part 1: Graph Databases
GraphTech Ecosystem - part 1: Graph DatabasesLinkurious
 
spark interview questions & answers acadgild blogs
 spark interview questions & answers acadgild blogs spark interview questions & answers acadgild blogs
spark interview questions & answers acadgild blogsprateek kumar
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8Janu Jahnavi
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8Janu Jahnavi
 
Learn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtsLearn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtssiddharth30121
 
Comparative Study That Aims Rdf Processing For The Java Platform
Comparative Study That Aims Rdf Processing For The Java PlatformComparative Study That Aims Rdf Processing For The Java Platform
Comparative Study That Aims Rdf Processing For The Java PlatformComputer Science
 
ArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQLArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQLArangoDB Database
 
ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...
ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...
ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...OpenLink Software
 
apidays LIVE Paris - The Rise of GraphQL for database APIs by Karthic Rao
apidays LIVE Paris - The Rise of GraphQL for database APIs by Karthic Raoapidays LIVE Paris - The Rise of GraphQL for database APIs by Karthic Rao
apidays LIVE Paris - The Rise of GraphQL for database APIs by Karthic Raoapidays
 
Hadoop vs spark
Hadoop vs sparkHadoop vs spark
Hadoop vs sparkamarkayam
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewMario Cartia
 

Similar to AWS Finland September Meetup - Using Amazon Neptune to build Fashion Knowledge Graph (20)

GraphTech Ecosystem - part 1: Graph Databases
GraphTech Ecosystem - part 1: Graph DatabasesGraphTech Ecosystem - part 1: Graph Databases
GraphTech Ecosystem - part 1: Graph Databases
 
spark interview questions & answers acadgild blogs
 spark interview questions & answers acadgild blogs spark interview questions & answers acadgild blogs
spark interview questions & answers acadgild blogs
 
Spark vs Hadoop
Spark vs HadoopSpark vs Hadoop
Spark vs Hadoop
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
 
Learn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtsLearn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemts
 
Tutorial4
Tutorial4Tutorial4
Tutorial4
 
Comparative Study That Aims Rdf Processing For The Java Platform
Comparative Study That Aims Rdf Processing For The Java PlatformComparative Study That Aims Rdf Processing For The Java Platform
Comparative Study That Aims Rdf Processing For The Java Platform
 
Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
 
ArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQLArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQL
 
ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...
ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...
ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
 
Spark
SparkSpark
Spark
 
apidays LIVE Paris - The Rise of GraphQL for database APIs by Karthic Rao
apidays LIVE Paris - The Rise of GraphQL for database APIs by Karthic Raoapidays LIVE Paris - The Rise of GraphQL for database APIs by Karthic Rao
apidays LIVE Paris - The Rise of GraphQL for database APIs by Karthic Rao
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
 
Hadoop vs spark
Hadoop vs sparkHadoop vs spark
Hadoop vs spark
 
Apache spark
Apache sparkApache spark
Apache spark
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 preview
 
GraphQL.pptx
GraphQL.pptxGraphQL.pptx
GraphQL.pptx
 

Recently uploaded

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
software engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxsoftware engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxnada99848
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 

Recently uploaded (20)

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
software engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxsoftware engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptx
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 

AWS Finland September Meetup - Using Amazon Neptune to build Fashion Knowledge Graph

  • 1. USING AMAZON NEPTUNE TO BUILD FASHION KNOWLEDGE GRAPH AWS Finland September Meetup Helsinki September 2019
  • 3. 7 What is a knowledge graph? Collection of elementary sentences Subject Predicate Object Entity (IRI) Entity (IRI) or Literal Named (IRI) Directed
  • 4. 8 Problem: “Beyonce” was one of our most failing search queries. Solution: Record the link between Beyonce and Ivy Park in the Knowledge Graph. Search system can use this information. Why a Knowledge Graph? Beyonce isDesignerOf IvyPark Person Brand isA isA
  • 5. 9
  • 7. 11 RDF is used to model the Knowledge graph An RDF graph is a set of RDF triples RDF - Resource Description Framework
  • 8. 12 SPARQL - SPARQL Protocol And RDF Query Language
  • 12. Blazegraph ● Blazegraph is an open source (GPLv2) graph database. ● Its protocol is HTTP-based, no special libraries is needed. ● One of the most complete in terms of SPARQL / RDF support. ● Used by the Wikimedia foundation for their wikidata query service since 2015 , but... ● No replication or cluster support in the open source version. ● The commercial version is not available since 2017... … because the company that developed Blazegraph is acquired by Amazon
  • 13. Amazon Neptune Blazegraph appears as the foundation of Amazon Neptune database: = + ★ Replication and the R/O endpoint to distribute load among replicas; ★ Redundancy via failover; ★ Cloud backups; ★ Technical support. +
  • 14. Fashion Knowledge Graph Deployment ● Neptune endpoint supports no authentication; ● Let’s introduce a low-level API: ○ Uses Zalando OAuth2 for authentication and authorization; ○ Does basic SPARQL validation. ● High-level APIs may be deployed in different VPCs.
  • 15. Neptune is not an easy guy ● There may be bugs! A concurrency problem that we have reported in mid-2018, has been finally fixed in the version 296 in May 2019; ● Restoration from backup is VERY SLOW; ● Backups are confined inside AWS - you can’t download them. We used data dump (using a SPARQL SELECT) and upload instead for migration between AWS regions; ● Test environment is rather expensive (instance types start from r5.large); ● Performance may be really bad even with small graphs, if your queries are poorly written.
  • 16. 21 No magic, it requires work even if you use the right DB for your task Issues with complex queries (especially property path) Performance Denormalize the data - Transactions to ensure consistency - Named graph can help
  • 17. 22 Conclusion ● Amazon Neptune is a good solution for RDF/SPARQL-based knowledge graph; ● It can scale horizontally for read queries (by increasing the number of replicas); ● But it’s not magic, you need to work on your queries; ● You still can use Blazegraph as a compatible replacement for Neptune in tests and development; ● Think over your backup&restore (and data migration) strategy. Test it works as you expect.
  • 18. Find out more about our Culture, People & Jobs: ● ON SOCIAL MEDIA: Linkedin @Zalando SE Facebook @ Inside Zalando Instagram @insidezalando Twitter @ZalandoTech ● CAREER WEBSITE: jobs.zalando.com ● CORPORATE WEBSITE: corporate.zalando.com Feel free to reach us: Matthieu Guillermin <matthieu.guillermin@zalando.fi> Uri Savelchev <uri.savelchev@zalando.fi>

Editor's Notes

  1. Front End Image
  2. Backend Image
  3. Ontology: represent entities, ideas and events, with all their interdependent properties and relations, according to a system of categories Important point: predicate are also entities and graph is directed
  4. Could be solved by adding a synonym in search index *but* By creating a “semantic” representation we allow other use cases => Common vocabulary that can be used by many different services/teams. Shared identifiers simplify integration between components
  5. This is a part of our current Knowledge Graph (note that it doesn’t include individual products): 560.261 triples
  6. Example using Turtle format IRI for subject, predicate and some objects Literal for comment Using predicate defined internally or not Can be loaded into the graph
  7. Could mention verbally: aggregations (group by, having,...), Subqueries
  8. Could mention verbally: aggregations (group by, having,...), Subqueries Introduce next slide by saying that a lot of DB implements SPARQL and Uri will talk about which one we decided to use
  9. SQL queries become complex for highly connected data Highly connected data can cause a lot of joins Nested Loop joins are the simplest but expensive for connected data Hash Joins joins would require dataset to remain in memory
  10. Wikipedia lists over a dozen implementations of SPARQL, but only few of them are real GraphDBs.
  11. We didn’t use TinkerPop