SlideShare a Scribd company logo
1 of 29
Neil Dahlke, Engineer
2016 November 4
Real-Time Analytics with
MemSQL and Spark
About Me: Neil Dahlke
 Engineer
 MemSQL
• real-time database for transactions / analytics
 Formerly Globus
• high performance data transfer for research scientists
 Past talks
• Real-time, Geospatial, Maps
 Slides: http://www.slideshare.net/MemSQL/realtime-geospatial-maps-
by-neil-dahlke
WHAT WE
ARE SEEING
A WORLD OF CONNECTED
MACHINES AND PEOPLE
WHAT WE ARE SEEING:
Sensors. Applications. Machines. And us.
Generating more data every single day.
By 2020, over 20 billion connected things will
be in use across a range of industries.
REAL-TIME
INPUTS
LIVE
OUTPUTS
Sensors
Logs
Events
Streaming
Inserts
Upserts
Queries
Dashboards
Business
Intelligence
Applications
Predict Analytics
WHAT DO REAL TIME BUSINESSES NEED?
FAST DATA
INGEST
The volume of data
that can be ingested
into the database
WHAT DO REAL TIME BUSINESSES NEED?
LOW LATENCY
QUERIES
The time it takes to
execute queries and
receive results
WHAT DO REAL TIME BUSINESSES NEED?
HIGH
CONCURRENCY
The ability to scale
simultaneous operations
WHAT DO REAL TIME BUSINESSES NEED?
FAST DATA
INGEST
The volume of data
that can be ingested
into the database
LOW LATENCY
QUERIES
The time it takes to
execute queries and
receive results
HIGH
CONCURRENCY
The ability to scale
simultaneous operations
REAL-TIME
INPUTS
LIVE
OUTPUTS
Sensors
Logs
Events
Streaming
Inserts
Upserts
Queries
Dashboards
Business
Intelligence
Applications
Predict Analytics
A massively scalable database and ingest solution allowed for
massive growth, real-time analytic applications and faster, targeted.
+
 Kafka
• Component we kept
 S3
• Persisted all logs to cold storage for eventual analysis
 Hadoop
• Nighly map-reduce jobs
 Redshift
• Took a full day to load data from previous day
• Reaching overlap of times caused data crisis
Before
 No real time access to analytics
 No SQL interface for analysts and data scientists
 Massive nightly Hadoop batch jobs (late data)
 Unfiltered and incomplete data (silos)
 Expensive
Why was this bad for their business operations?
Why was this bad for their data operations?
 Too slow
 Not scalable
 No deduplication
• aka not exactly-once
 Low concurrency
FAST DATA
INGEST
LOW
LATENCY
QUERIES
HIGH
CONCURRENCY
How It Works Now
After
TECHNICAL BENEFITS
 Instant accuracy to the latest re-pin
 1 GB/sec totaling 72 TB/day
THE PINTEREST REAL-TIME ARCHITECTURE
REAL-TIME
ANALYTICS
Accelerated ingest
time by 200,000x
1 GB/sec totaling
72 TB/day
RESULTS
Visualizing The Data
23
Visualizing the Data
 Demo built using
• Mapbox
• Websockets
• Tornado web server
 When an image is re pinned, the circles on the globe
expand, showing higher volume areas
 Reads data from MemSQL directly
24
DEMO
25
Questions?
More Info
 http://www.odbms.org/blog/2015/04/powering-big-data-at-
pinterest-interview-with-krishna-gade/
 https://gigaom.com/2015/02/18/pinterest-is-
experimenting-with-memsql-for-real-time-data-analytics/
 https://www.infoq.com/news/2015/03/pinterest-memsql-
spark-streaming
 http://blog.memsql.com/pinterest-apache-spark-use-case/
 https://engineering.pinterest.com/blog/real-time-analytics-
pinterest
Resources
 https://github.com/memsql/memsql-spark-connector
 http://docs.memsql.com/docs/streamliner-administration
 http://docs.memsql.com/docs/pipelines-overview
 https://github.com/memsql/memsql-docker-quickstart
Thank You

More Related Content

What's hot

What's hot (20)

Enabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoTEnabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoT
 
Driving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive AnalyticsDriving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive Analytics
 
Winning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive AnalyticsWinning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive Analytics
 
Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale
 
Building the Ideal Stack for Real-Time Analytics
Building the Ideal Stack for Real-Time AnalyticsBuilding the Ideal Stack for Real-Time Analytics
Building the Ideal Stack for Real-Time Analytics
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming Architectures
 
Machines and the Magic of Fast Learning
Machines and the Magic of Fast LearningMachines and the Magic of Fast Learning
Machines and the Magic of Fast Learning
 
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingTapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
 
Real-Time, Geospatial, Maps by Neil Dahlke
Real-Time, Geospatial, Maps by Neil DahlkeReal-Time, Geospatial, Maps by Neil Dahlke
Real-Time, Geospatial, Maps by Neil Dahlke
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
Building the Ideal Stack for Machine Learning
Building the Ideal Stack for Machine LearningBuilding the Ideal Stack for Machine Learning
Building the Ideal Stack for Machine Learning
 
Modeling the Smart and Connected City of the Future with Kafka and Spark
Modeling the Smart and Connected City of the Future with Kafka and SparkModeling the Smart and Connected City of the Future with Kafka and Spark
Modeling the Smart and Connected City of the Future with Kafka and Spark
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
 
See who is using MemSQL
See who is using MemSQLSee who is using MemSQL
See who is using MemSQL
 
Building Software to Scale
Building Software to Scale Building Software to Scale
Building Software to Scale
 
O'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data PipelinesO'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data Pipelines
 
Internet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data InfrastructureInternet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data Infrastructure
 
Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for Fluvius
 
Journey to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme GrowthJourney to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme Growth
 

Viewers also liked

Streaming Analytics and Cognitive Computing - Changing the Game
Streaming Analytics and Cognitive Computing - Changing the GameStreaming Analytics and Cognitive Computing - Changing the Game
Streaming Analytics and Cognitive Computing - Changing the Game
Jason Bloomberg
 
Bringing olap fully online analyze changing datasets in mem sql and spark wi...
Bringing olap fully online  analyze changing datasets in mem sql and spark wi...Bringing olap fully online  analyze changing datasets in mem sql and spark wi...
Bringing olap fully online analyze changing datasets in mem sql and spark wi...
SingleStore
 
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonBuilding a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
SingleStore
 

Viewers also liked (14)

Building Real-Time Data Pipelines with Kafka, Spark, and MemSQL
Building Real-Time Data Pipelines with Kafka, Spark, and MemSQLBuilding Real-Time Data Pipelines with Kafka, Spark, and MemSQL
Building Real-Time Data Pipelines with Kafka, Spark, and MemSQL
 
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and SparkReal-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
 
MemSQL - The Real-time Analytics Platform
MemSQL - The Real-time Analytics PlatformMemSQL - The Real-time Analytics Platform
MemSQL - The Real-time Analytics Platform
 
Streaming Analytics and Cognitive Computing - Changing the Game
Streaming Analytics and Cognitive Computing - Changing the GameStreaming Analytics and Cognitive Computing - Changing the Game
Streaming Analytics and Cognitive Computing - Changing the Game
 
Bringing olap fully online analyze changing datasets in mem sql and spark wi...
Bringing olap fully online  analyze changing datasets in mem sql and spark wi...Bringing olap fully online  analyze changing datasets in mem sql and spark wi...
Bringing olap fully online analyze changing datasets in mem sql and spark wi...
 
Introducing MemSQL 4
Introducing MemSQL 4Introducing MemSQL 4
Introducing MemSQL 4
 
MemSQL DB Class, Ankur Goyal
MemSQL DB Class, Ankur GoyalMemSQL DB Class, Ankur Goyal
MemSQL DB Class, Ankur Goyal
 
In-Memory Database System Built for Speed and Scale
In-Memory Database System Built for Speed and ScaleIn-Memory Database System Built for Speed and Scale
In-Memory Database System Built for Speed and Scale
 
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonBuilding a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
 
The Road To RAM - Carlos Bueno, MemSQL
The Road To RAM - Carlos Bueno, MemSQLThe Road To RAM - Carlos Bueno, MemSQL
The Road To RAM - Carlos Bueno, MemSQL
 
INTRODUCING: CREATE PIPELINE
INTRODUCING: CREATE PIPELINEINTRODUCING: CREATE PIPELINE
INTRODUCING: CREATE PIPELINE
 
MemSQL
MemSQLMemSQL
MemSQL
 
Boosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkBoosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and Spark
 
In-Memory Database Performance on AWS M4 Instances
In-Memory Database Performance on AWS M4 InstancesIn-Memory Database Performance on AWS M4 Instances
In-Memory Database Performance on AWS M4 Instances
 

Similar to Real-Time Analytics with MemSQL and Spark

Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1
GurinderG
 
How To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQLHow To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQL
DataStax
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
Srinath Perera
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
David Smelker
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 

Similar to Real-Time Analytics with MemSQL and Spark (20)

Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1
 
How To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQLHow To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQL
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkInfinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
 
PXL Data Engineering Workshop By Selligent
PXL Data Engineering Workshop By Selligent PXL Data Engineering Workshop By Selligent
PXL Data Engineering Workshop By Selligent
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQL
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
 
Keepin’ It Real(-Time) With Nadine Farah | Current 2022
Keepin’ It Real(-Time) With Nadine Farah | Current 2022Keepin’ It Real(-Time) With Nadine Farah | Current 2022
Keepin’ It Real(-Time) With Nadine Farah | Current 2022
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 
Gluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with HadoopGluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with Hadoop
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experience
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
Unlock the value of your big data infrastructure
Unlock the value of your big data infrastructureUnlock the value of your big data infrastructure
Unlock the value of your big data infrastructure
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 

More from SingleStore

More from SingleStore (20)

Five ways database modernization simplifies your data life
Five ways database modernization simplifies your data lifeFive ways database modernization simplifies your data life
Five ways database modernization simplifies your data life
 
How Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and AnalyticsHow Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and Analytics
 
Architecting Data in the AWS Ecosystem
Architecting Data in the AWS EcosystemArchitecting Data in the AWS Ecosystem
Architecting Data in the AWS Ecosystem
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free Life
 
Converging Database Transactions and Analytics
Converging Database Transactions and Analytics Converging Database Transactions and Analytics
Converging Database Transactions and Analytics
 
Building a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQLBuilding a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQL
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks Webcast
 
Introduction to MemSQL
Introduction to MemSQLIntroduction to MemSQL
Introduction to MemSQL
 
An Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsAn Engineering Approach to Database Evaluations
An Engineering Approach to Database Evaluations
 
Building a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed ArchitectureBuilding a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed Architecture
 
Stream Processing with Pipelines and Stored Procedures
Stream Processing with Pipelines  and Stored ProceduresStream Processing with Pipelines  and Stored Procedures
Stream Processing with Pipelines and Stored Procedures
 
Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017
 
Image Recognition on Streaming Data
Image Recognition  on Streaming DataImage Recognition  on Streaming Data
Image Recognition on Streaming Data
 
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image RecognitionSpark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
 
The State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondThe State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and Beyond
 
How Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data Management
 
Teaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AITeaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AI
 
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid CloudGartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
 
Gartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming DataGartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming Data
 
Real-Time Analytics at Uber Scale
Real-Time Analytics at Uber ScaleReal-Time Analytics at Uber Scale
Real-Time Analytics at Uber Scale
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Real-Time Analytics with MemSQL and Spark

Editor's Notes

  1. - Distributed In-Memory Database - Built for real-time analytics and transactions Familiar SQL Interface Spark integration out-of-the-box - Native Kafka Ingestion What did they want to do? - highly scalable infrastructure that collects, stores and processes user engagement data in real-time higher performance event logging Reliable log transport and storage ability to query real-time data
  2. user clicks Pin or repin event is pushed to Apache Kafka Storm, Spark and other custom built log readers process these events in real-time log persistence service called Secor that reliably writes these events to Amazon S3 (zero data loss, overcoming its weak eventual consistency model).  self-serve big data platform loads the data from S3 into many different Hadoop clusters for batch processing In house tools Singer (logger) & Secor (replicator) asynchronously replicating local logs from app servers to centralized S3 location using Kafka for transport Kafka was great for throughput, but needed a way to derive value, e.g. run SQL against these datasets in real time A few days later this data would hit Redshift and be queryable
  3. - took several days to access analytics and make available to data science team (too late, A/B testing, advertising) - no SQL Interface - 5.5 M rows / second for one topic, 1.7 M rows / second for another, with the lowest throughput being 132k rows / second - data needs to be filtered as well as enriched - At LEAST once semantics
  4. user clicks Pin or repin event is pushed to Apache Kafka Storm, Spark and other custom built log readers process these events in real-time log persistence service called Secor that reliably writes these events to Amazon S3 (zero data loss, overcoming its weak eventual consistency model).  self-serve big data platform loads the data from S3 into many different Hadoop clusters for batch processing In house tools Singer (logger) & Secor (replicator) asynchronously replicating local logs from app servers to centralized S3 location using Kafka for transport Kafka was great for throughput, but needed a way to derive value, e.g. run SQL against these datasets in real time A few days later this data would hit Redshift and be queryable
  5. Goes both ways
  6. easily repeatable success days to seconds now has a source of record for sharing relevant user engagement data and metrics their data analyst and with key brands Pinterest and their partners can get a better understanding of user behavior and provide more value to the Pinner community Cheaper the ability to identify (and react to) developing trends as they happen provides insight into how users are engaging with Pins across the globe in real-time helps Pinterest become a better recommendation engine- SQL interface for engineering and data science teams fast ad-hoc query execution on real-time data to allow the execution of SQL queries on the real-time events as they arrive
  7. easily repeatable success days to seconds now has a source of record for sharing relevant user engagement data and metrics their data analyst and with key brands Pinterest and their partners can get a better understanding of user behavior and provide more value to the Pinner community Cheaper the ability to identify (and react to) developing trends as they happen provides insight into how users are engaging with Pins across the globe in real-time helps Pinterest become a better recommendation engine- SQL interface for engineering and data science teams fast ad-hoc query execution on real-time data to allow the execution of SQL queries on the real-time events as they arrive
  8. Pull up Ops Pull up a terminal and create the database Deploy Spark Create a Streamliner pipeline Create a Pipeline pipeline Expose the UI Ad-Hoc queries, Tableau, and custom reporting