AWS Finland September Meetup - Using Amazon Neptune to build Fashion Knowledge Graph


Uri Savelchev

How we use Amazon Neptune at Zalando to empower our Fashion Store with semantic information.


USING AMAZON NEPTUNE TO BUILD FASHION KNOWLEDGE GRAPH
AWS Finland September Meetup
Helsinki, September 2019

What is a knowledge graph?
Collection of elementary sentences: Subject, Predicate, Object
● Subject: Entity (IRI)
● Predicate: Named (IRI), Directed
● Object: Entity (IRI) or Literal
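The subject/predicate/object structure above can be sketched as a small data model. This is a minimal illustration, not the deck's implementation: the class names, the `EX` namespace, and the `label` predicate are assumptions made for the example, and blank nodes are left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    """An entity or predicate identified by an IRI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain value that may only appear in the object position."""
    value: str


@dataclass(frozen=True)
class Triple:
    """One elementary sentence: subject --predicate--> object."""
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]

    def __post_init__(self) -> None:
        # Subjects and predicates must be IRIs; only the object may be a literal.
        if not isinstance(self.subject, IRI) or not isinstance(self.predicate, IRI):
            raise TypeError("subject and predicate must be IRIs")
        if not isinstance(self.obj, (IRI, Literal)):
            raise TypeError("object must be an IRI or a Literal")


# Hypothetical namespace, used for illustration only.
EX = "http://example.org/fashion/"

graph = {
    Triple(IRI(EX + "IvyPark"), IRI(EX + "isA"), IRI(EX + "Brand")),
    Triple(IRI(EX + "IvyPark"), IRI(EX + "label"), Literal("Ivy Park")),
}
```

Because the predicate is itself an IRI, it can appear as the subject of other triples, which is how a predicate can be described inside the same graph.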

RDF - Resource Description Framework
RDF is used to model the knowledge graph.
An RDF graph is a set of RDF triples.

SPARQL - SPARQL Protocol And RDF Query Language

Blazegraph
● Blazegraph is an open-source (GPLv2) graph database.
● Its protocol is HTTP-based; no special libraries are needed.
● One of the most complete in terms of SPARQL / RDF support.
● Used by the Wikimedia Foundation for the Wikidata Query Service since 2015,
but...
● No replication or cluster support in the open-source version.
● The commercial version has not been available since 2017...
… because the company that developed Blazegraph was acquired by Amazon.
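Since the protocol is plain HTTP, a query can be sent with nothing but the Python standard library. The sketch below builds a SPARQL 1.1 Protocol request without sending it. The endpoint URL is an assumption about a local Blazegraph installation and will differ per setup.

```python
import urllib.parse
import urllib.request

# Assumed local Blazegraph endpoint; adjust to your installation.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query request."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
# To execute against a running server (requires `import json`):
#   with urllib.request.urlopen(request) as response:
#       results = json.load(response)
```

The same request shape should work against any endpoint that implements the SPARQL 1.1 Protocol, which is what makes swapping servers between development and production practical.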

Amazon Neptune
Blazegraph appears to be the foundation of the Amazon Neptune database. Neptune adds:
★ Replication and a read-only (R/O) endpoint to distribute load among replicas;
★ Redundancy via failover;
★ Cloud backups;
★ Technical support.
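One way a client could take advantage of the read-only endpoint is to route read queries to the replicas and updates to the writer. The sketch below is an assumption about client-side routing, not something the slides describe. The host names are hypothetical placeholders. The `query` and `update` form fields are the ones defined by the SPARQL 1.1 Protocol.

```python
from typing import Dict, Tuple

# Hypothetical endpoint URLs; a real cluster exposes its own host names.
WRITER_ENDPOINT = "https://my-cluster.cluster-example.eu-central-1.neptune.amazonaws.com:8182/sparql"
READER_ENDPOINT = "https://my-cluster.cluster-ro-example.eu-central-1.neptune.amazonaws.com:8182/sparql"


def route(operation: str, is_update: bool) -> Tuple[str, Dict[str, str]]:
    """Pick the endpoint and form field for a SPARQL operation.

    Updates go to the writer as `update=...`; read-only queries go to the
    read-only endpoint as `query=...`, so replicas share the read load.
    """
    if is_update:
        return WRITER_ENDPOINT, {"update": operation}
    return READER_ENDPOINT, {"query": operation}
```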

Fashion Knowledge Graph Deployment
● The Neptune endpoint supports no authentication;
● Let’s introduce a low-level API:
○ Uses Zalando OAuth2 for authentication and authorization;
○ Does basic SPARQL validation.
● High-level APIs may be deployed in different VPCs.
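The slides do not say what the "basic SPARQL validation" checks. As one plausible example, a low-level API could refuse anything that is not a read-only query form. The sketch below is a keyword check under that assumption, with hypothetical function names. A production service would more likely use a full SPARQL parser.

```python
import re

READ_ONLY_FORMS = {"SELECT", "ASK", "CONSTRUCT", "DESCRIBE"}

# Matches one leading prologue declaration such as
#   PREFIX ex: <http://example.org/>   or   BASE <http://example.org/>
_PROLOGUE = re.compile(
    r"\s*(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)", re.IGNORECASE
)
# Matches whole-line comments only, so '#' inside an IRI is left alone.
_COMMENT_LINE = re.compile(r"^\s*#.*$", re.MULTILINE)


def query_form(sparql: str) -> str:
    """Return the first keyword after the prologue, upper-cased ('' if none)."""
    text = _COMMENT_LINE.sub("", sparql)
    while True:
        match = _PROLOGUE.match(text)
        if match is None:
            break
        text = text[match.end():]
    keyword = re.match(r"\s*([A-Za-z]+)", text)
    return keyword.group(1).upper() if keyword else ""


def is_read_only(sparql: str) -> bool:
    """Accept only SELECT, ASK, CONSTRUCT and DESCRIBE; reject updates."""
    return query_form(sparql) in READ_ONLY_FORMS
```

Rejecting by allow-list rather than by deny-list means an unknown or malformed leading keyword is refused by default.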

Neptune is not an easy guy
● There may be bugs! A concurrency problem that we reported in mid-2018 was finally fixed in version 296 in May 2019;
● Restoration from backup is VERY SLOW;
● Backups are confined inside AWS: you can’t download them. To migrate between AWS regions, we instead dumped the data (using a SPARQL SELECT) and uploaded it;
● A test environment is rather expensive (instance types start from r5.large);
● Performance may be really bad even with small graphs if your queries are poorly written.
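The dump-and-upload approach can be sketched as converting SPARQL JSON results into N-Triples lines. The slides only say a SPARQL SELECT was used. The query shape `SELECT ?s ?p ?o WHERE { ?s ?p ?o }`, the function names, and the choice of N-Triples as the output format are assumptions. This sketch does not preserve named graphs and does not escape unusual characters inside IRIs.

```python
def _escape(text: str) -> str:
    """Escape a literal's lexical form for N-Triples."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def term_to_ntriples(term: dict) -> str:
    """Convert one term of a SPARQL JSON result binding to N-Triples syntax."""
    kind, value = term["type"], term["value"]
    if kind == "uri":
        return f"<{value}>"
    if kind == "bnode":
        return f"_:{value}"
    # Anything else is a literal, with an optional language tag or datatype.
    literal = f'"{_escape(value)}"'
    if "xml:lang" in term:
        return f"{literal}@{term['xml:lang']}"
    if "datatype" in term:
        return f"{literal}^^<{term['datatype']}>"
    return literal


def bindings_to_ntriples(results: dict) -> list:
    """Turn the JSON result of SELECT ?s ?p ?o into N-Triples lines."""
    return [
        " ".join(term_to_ntriples(row[var]) for var in ("s", "p", "o")) + " ."
        for row in results["results"]["bindings"]
    ]
```

Whatever format is chosen, restoring the dump into a fresh database and comparing triple counts is a cheap check that the migration path works.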

No magic: it requires work even if you use the right DB for your task
● Issues with complex queries (especially property paths)
● Performance
● Denormalize the data
- Transactions to ensure consistency
- Named graphs can help
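To illustrate what denormalizing for property paths could mean, the sketch below precomputes the triples that a transitive path such as `ex:subCategoryOf+` would otherwise compute at query time. The slides do not say which data was denormalized. The predicate names and the function are hypothetical.

```python
from collections import defaultdict


def materialize_closure(triples, predicate, derived_predicate):
    """Return new triples linking each subject to everything reachable via `predicate`."""
    successors = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            successors[s].add(o)

    derived = set()
    for start in list(successors):
        # Depth-first walk; `seen` also protects against cycles in the data.
        seen, stack = set(), list(successors[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        derived.update((start, derived_predicate, node) for node in seen)
    return derived


# Hypothetical category hierarchy, for illustration only.
source = {
    ("Sneakers", "subCategoryOf", "Shoes"),
    ("Shoes", "subCategoryOf", "Footwear"),
    ("Footwear", "subCategoryOf", "Fashion"),
}
derived = materialize_closure(source, "subCategoryOf", "inCategory")
```

The derived triples have to stay in sync with their source. Writing both in one transaction, and keeping the derived ones in their own named graph so they can be dropped and rebuilt, is one reading of the two sub-points above.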

Conclusion
● Amazon Neptune is a good solution for an RDF/SPARQL-based knowledge graph;
● It can scale horizontally for read queries (by increasing the number of replicas);
● But it’s not magic: you need to work on your queries;
● You can still use Blazegraph as a compatible replacement for Neptune in tests and development;
● Think over your backup and restore (and data migration) strategy. Test that it works as you expect.

Find out more about our Culture, People & Jobs:
● ON SOCIAL MEDIA:
LinkedIn @Zalando SE
Facebook @ Inside Zalando
Instagram @insidezalando
Twitter @ZalandoTech
● CAREER WEBSITE: jobs.zalando.com
● CORPORATE WEBSITE: corporate.zalando.com
Feel free to reach us:
Matthieu Guillermin <matthieu.guillermin@zalando.fi>
Uri Savelchev <uri.savelchev@zalando.fi>


2. FASHION KNOWLEDGE GRAPH


4. Why a Knowledge Graph?
Problem: “Beyonce” was one of our most frequently failing search queries.
Solution: Record the link between Beyonce and Ivy Park in the Knowledge Graph. The search system can use this information.
Triples: Beyonce isDesignerOf IvyPark; Beyonce isA Person; IvyPark isA Brand
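The Beyonce example can be sketched as a lookup over those three triples. How Zalando's search system actually consumes the graph is not described in the slides. The function name and the matching rule are assumptions made for illustration.

```python
# The three triples from the slide, as (subject, predicate, object) tuples.
TRIPLES = frozenset({
    ("Beyonce", "isA", "Person"),
    ("IvyPark", "isA", "Brand"),
    ("Beyonce", "isDesignerOf", "IvyPark"),
})


def brands_for_query(term, triples=TRIPLES):
    """Return brands designed by the entity whose name matches the search term."""
    wanted = term.strip().lower()
    brands = {s for s, p, o in triples if p == "isA" and o == "Brand"}
    return sorted(
        o
        for s, p, o in triples
        if p == "isDesignerOf" and s.lower() == wanted and o in brands
    )
```

With these triples, a search for "beyonce" that finds no products can fall back to the brands returned by `brands_for_query("beyonce")`.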


6. Technical implementation


9. SPARQL - Property path

10. SPARQL - Property path

11. The Graph Database


Editor's Notes

Front End Image
Backend Image
Ontology: represents entities, ideas and events, with all their interdependent properties and relations, according to a system of categories. Important point: predicates are also entities, and the graph is directed.
Could be solved by adding a synonym in the search index, *but* by creating a “semantic” representation we allow other use cases: a common vocabulary that can be used by many different services/teams. Shared identifiers simplify integration between components.
This is a part of our current Knowledge Graph (note that it doesn’t include individual products): 560,261 triples.
Example using the Turtle format. IRIs for the subject, the predicate and some objects; a literal for the comment. Predicates may be defined internally or elsewhere. Can be loaded into the graph.
Could mention verbally: aggregations (GROUP BY, HAVING, ...), subqueries.
Could mention verbally: aggregations (GROUP BY, HAVING, ...), subqueries. Introduce the next slide by saying that a lot of DBs implement SPARQL and Uri will talk about which one we decided to use.
SQL queries become complex for highly connected data. Highly connected data can cause a lot of joins. Nested loop joins are the simplest but are expensive for connected data. Hash joins would require the dataset to remain in memory.
Wikipedia lists over a dozen implementations of SPARQL, but only a few of them are real graph databases.
We didn’t use TinkerPop
