Graphs, parallelism and business cases
PHP Meetup #6
Who we are
Daniel Toader /dantdr
Dan Belibov /belibov
Who we are
eMag is the biggest online shop and marketplace in
Romania and one of the biggest in Hungary, Bulgaria and
Poland.
Our team works on the internal recommendation engine,
which provides relevant products to customers on the
site, in mobile applications, in emails and in showrooms.
TECHNOLOGIES DERIVE FROM BUSINESS CASES
Usually, when we have a new
business case, our go-to approach
is to use a technology stack that
is familiar to us.
Sometimes, taking a risk and
thinking outside the box can lead
to unexpected results.
SOME OF OUR BUSINESS CASES
Business case #1
Secure integration of external recommendation providers
Common approach
What we want: a maximum response time of ~200 ms
● Symfony framework
● Guzzle HTTP Client
Result: maximum response time exceeded, because SSL validation took >120 ms
Potential solution: pfsockopen() *
* Caveat: when the connection is broken by a physical network failure, pfsockopen() returns a handle as if the connection were still working.
Our final approach
Go is an open source programming
language that makes it easy to build
simple, reliable, and efficient
software.
Our final approach
● A tiny compiled Go web server and CLI command
● Uses Go's native shared connection pool
● SSL validation is executed once per connection pool
● No external libraries needed
● Connection errors are handled by internal logic
Our final approach
Comparison of response times
Business case #2
Analyze product popularity based on customer actions
Common approach
To do this we needed to process more than a million events per day (visits,
orders, ratings, favourites, etc.).
Simple:
● Process each event as it happens
● A lot of events generate high load and service unavailability
Better:
● Queue system (e.g. RabbitMQ)
● Each message needs to be fetched, processed and acknowledged
● Full queues can lead to data loss
Our final approach
Apache Kafka® is a distributed streaming
platform. It is used for building real-time
data pipelines and streaming apps. It is
horizontally scalable, fault-tolerant,
wicked fast, and runs in production in
thousands of companies.
Our final approach
● Publish and subscribe to streams of records, similar to a message
queue or enterprise messaging system
● Store streams of records in a fault-tolerant durable way
● Keep the stream available for a certain time
● You can have an unlimited number of stream cursors
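A toy model of the stream and cursor idea, as a sketch only. This is not Kafka's API: Log and Cursor are hypothetical in-memory types, and retention, partitions and durability are left out. It shows that reading does not remove records, so any number of readers can keep their own position in the same stream.

```go
package main

import "fmt"

// Log is an append-only sequence of records, standing in for one stream.
type Log struct {
	records []string
}

// Append stores a record and returns its offset.
func (l *Log) Append(rec string) int {
	l.records = append(l.records, rec)
	return len(l.records) - 1
}

// Cursor is one reader's private position; reading never removes records.
type Cursor struct {
	log    *Log
	offset int
}

// NewCursor returns a cursor positioned at the start of the log.
func (l *Log) NewCursor() *Cursor { return &Cursor{log: l} }

// Next returns the record at the cursor and advances it; ok is false at the end.
func (c *Cursor) Next() (rec string, ok bool) {
	if c.offset >= len(c.log.records) {
		return "", false
	}
	rec = c.log.records[c.offset]
	c.offset++
	return rec, true
}

func main() {
	stream := &Log{}
	for _, e := range []string{"visit", "order", "rating"} {
		stream.Append(e)
	}
	popularity := stream.NewCursor()
	audit := stream.NewCursor()

	first, _ := popularity.Next()
	second, _ := popularity.Next()
	fmt.Println(first, second) // prints "visit order"

	again, _ := audit.Next()
	fmt.Println(again) // prints "visit": the second cursor is unaffected
}
```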
Our final approach
● With Apache Kafka, messages only need to be fetched and processed; the stream
cursor moves on read operations.
● Instead of using a lot of processes in PHP, we used fewer processes in Golang
with goroutines.
● We built a custom connector for Golang, which we want to make public in the
future.
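A sketch of how goroutines can replace many OS processes: one process fans events out to a fixed number of workers and merges their partial counts. The Event type, the countByProduct function and the sample data are assumptions for illustration; our connector and the Kafka read loop are not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical customer action on a product.
type Event struct {
	ProductID int
	Kind      string
}

// countByProduct fans events out to `workers` goroutines inside one process
// and merges their partial counts. workers must be at least 1.
func countByProduct(events []Event, workers int) map[int]int {
	in := make(chan Event)
	partials := make(chan map[int]int, workers) // buffered: workers never block on send

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := make(map[int]int) // private to this goroutine, no locking needed
			for ev := range in {
				local[ev.ProductID]++
			}
			partials <- local
		}()
	}

	for _, ev := range events {
		in <- ev
	}
	close(in) // lets every worker's range loop finish
	wg.Wait()
	close(partials)

	total := make(map[int]int)
	for p := range partials {
		for id, n := range p {
			total[id] += n
		}
	}
	return total
}

func main() {
	events := []Event{{1, "visit"}, {2, "visit"}, {1, "order"}, {1, "rating"}}
	counts := countByProduct(events, 4)
	fmt.Println(counts[1], counts[2]) // prints "3 1"
}
```

Each worker keeps its own map, so the hot path needs no mutex; the only synchronization is the channel and the final merge.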
Comparison of used resources
PHP: 720 processes, ~100 cores, ~12 GB RAM
Golang: 31 processes, ~20 cores, ~3 GB RAM
Business case #3
Provide profiled recommendations
Common approach
Data: ~5 million customers, ~20 million users, ~5 million products and >200 million relations
Using a relational database:
● Needs associative entity tables, which further increase the cost of join operations
● A lot of updates generate a big load
● Inconsistent data is hard to detect
● Complex queries require processing power
Our final approach: use a graph database
Bulk write: 2nd place | 2nd place | 1st place
Single write: 2nd place | 3rd place | 1st place
Read speed (single read): 2nd place | 2nd place | 1st place
Similar query (graph func): 1st place (100 ms - 10 s) | 3rd place (>25 s) | 2nd place (20-70 s)
DB size: 16 GB | 22 GB | 22 GB
Our final approach
Neo4j is a graph database
management system and is the
most popular graph database
according to DB-Engines ranking,
and the 22nd most popular
database overall.
Our final approach
Using Neo4j for native graph storage:
● Keeps relations as entities with information attached
● Easy to find dependencies and orphan nodes
● Uses Cypher as the query language, Bolt as the binary protocol over TCP and the Java Core API
for low-level graph handling
● Drivers are available for a lot of languages, including PHP
Comparison: relational schema vs graph schema
SELECT rec.*
FROM Customer c
JOIN CustomerVisit cv1 ON c.Id = cv1.CustomerId
JOIN Product p ON p.Id = cv1.ProductId
JOIN CustomerVisit cv2 ON p.Id = cv2.ProductId
JOIN Customer cs ON cs.Id = cv2.CustomerId
JOIN CustomerVisit cv3 ON cs.Id = cv3.CustomerId
JOIN Product rec ON rec.Id = cv3.ProductId
WHERE c.Id = x
GROUP BY rec.Id
ORDER BY count(rec.Id) DESC
LIMIT 10
Comparison: relational schema vs graph schema
MATCH
(:Customer{id:x})-[:VISITED]->(:Product)<-[:VISITED]-
(:Customer)-[o:VISITED]->(rec:Product)
WITH rec, count(o) AS freq
ORDER BY freq DESC
LIMIT 10
RETURN rec
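To make the traversal concrete, here is a small in-memory sketch of the same walk: customer -> product <- other customer -> recommended product, ranked by how many paths reach each product. The Graph type, integer IDs and sample visits are assumptions for illustration, and it assumes at most one VISITED edge per customer and product pair. It mirrors Cypher's rule that one relationship is not traversed twice in a pattern, and breaks ties by product ID only to keep the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph stores VISITED edges in both directions as sets.
type Graph struct {
	visited   map[int]map[int]bool // customer -> products
	visitedBy map[int]map[int]bool // product -> customers
}

func NewGraph() *Graph {
	return &Graph{visited: map[int]map[int]bool{}, visitedBy: map[int]map[int]bool{}}
}

// AddVisit records one VISITED edge; repeated visits collapse into one edge.
func (g *Graph) AddVisit(customer, product int) {
	if g.visited[customer] == nil {
		g.visited[customer] = map[int]bool{}
	}
	if g.visitedBy[product] == nil {
		g.visitedBy[product] = map[int]bool{}
	}
	g.visited[customer][product] = true
	g.visitedBy[product][customer] = true
}

// Recommend returns up to limit products ranked by path count, like count(o).
func (g *Graph) Recommend(customer, limit int) []int {
	freq := map[int]int{}
	for p := range g.visited[customer] {
		for other := range g.visitedBy[p] {
			if other == customer {
				continue // would reuse the first edge
			}
			for rec := range g.visited[other] {
				if rec == p {
					continue // would reuse the second edge
				}
				freq[rec]++
			}
		}
	}
	recs := make([]int, 0, len(freq))
	for rec := range freq {
		recs = append(recs, rec)
	}
	sort.Slice(recs, func(i, j int) bool {
		if freq[recs[i]] != freq[recs[j]] {
			return freq[recs[i]] > freq[recs[j]]
		}
		return recs[i] < recs[j] // tie-break by ID for stable output
	})
	if len(recs) > limit {
		recs = recs[:limit]
	}
	return recs
}

func main() {
	g := NewGraph()
	visits := [][2]int{{1, 10}, {1, 20}, {2, 10}, {2, 20}, {2, 30}, {3, 10}, {3, 40}}
	for _, v := range visits {
		g.AddVisit(v[0], v[1])
	}
	fmt.Println(g.Recommend(1, 3)) // prints "[30 10 20]"
}
```

Like the query above, this does not filter out products the customer has already visited (10 and 20 in the sample), which a real recommender would probably want to do.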
Business requirements lead to technologies
You can use new or cool technologies
based on business cases, not only
because you want to
Know when to play safe and when not to
Q&A
Thank you!
Enjoy the pizza, beer and
socializing.
/dantdr
/belibov
