Nikita Shamgunov, CTO and Co-founder of MemSQL
Spark Summit East | Boston | 9 February 2017
The Fast Path to Building Operational Applications with Spark
About Me
Nikita Shamgunov
Co-founder and Chief Technology Officer, MemSQL
An Insider’s View at Facebook
▪ Every piece of technology is scalable
▪ Analyzing data from hundreds of thousands of machines
▪ Delivering immense value in real-time
• Real-time code deployment
• Detecting anomalies
• A/B testing results
▪ Fundamentally making the business faster by providing data at your fingertips
Imagine scaling a database on industry-standard hardware.
Need 2x the performance? Add 2x the nodes.
Today in My Talk
▪ About MemSQL
▪ Using MemSQL Spark Connector
▪ Use Cases and Case Studies
▪ Entity Resolution
What is MemSQL?
▪ Scalable and elastic
• Petabyte scale
• High Concurrency
• System of record
▪ Real-time
• Operational
▪ Compatible
• ETL
• Business Intelligence
• Kafka
• Spark
MemSQL - Hybrid Cloud Data Warehouse
▪ Deployment
• Managed service in the Cloud
• On-premises
▪ Community Edition
• Unlimited scale
• Limited high availability and security features
MemSQL Confidential
Product or Services Scores for Operational Data Warehouse
Critical Capabilities for Data Warehouse and Data Management Solutions for Analytics
Gartner, July 2016
Keeping Pace
On-demand economy | Real-Time Data | Predictive Analytics
Understanding MemSQL and Spark
Easy Deployment of Real-Time Data Pipelines
Kafka
▪ High-throughput distributed messaging system
▪ Publish and subscribe to Kafka “topics”
Spark
▪ In-memory execution engine
▪ High-level operators for procedural and programmatic analytics
MemSQL
▪ Hybrid Cloud Data Warehouse
▪ Full transactions and complete durability
Amazon Kinesis
Use Spark and Operational Databases Together
                        Spark                 Operational Databases
Interface               Programmatic          Declarative
Execution Environment   Job Scheduler         SQL Engine and Query Optimizer
Persistent Storage      Use another system    Built-in
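The programmatic versus declarative distinction in the comparison above can be illustrated with a small sketch. It is not MemSQL or Spark code: Python's built-in sqlite3 module stands in for a SQL engine, and the `bids` table and sample rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (campaign, bid_amount)
rows = [("a", 1.0), ("b", 2.5), ("a", 3.0), ("b", 0.5), ("c", 4.0)]


def totals_programmatic(records):
    """Programmatic style: the caller spells out how to compute the result."""
    totals = defaultdict(float)
    for campaign, amount in records:
        totals[campaign] += amount
    return dict(totals)


def totals_declarative(records):
    """Declarative style: state the result wanted; the SQL engine plans it."""
    conn = sqlite3.connect(":memory:")  # sqlite3 is only a stand-in engine
    conn.execute("CREATE TABLE bids (campaign TEXT, amount REAL)")
    conn.executemany("INSERT INTO bids VALUES (?, ?)", records)
    result = dict(conn.execute(
        "SELECT campaign, SUM(amount) FROM bids GROUP BY campaign"))
    conn.close()
    return result


print(totals_programmatic(rows))
print(totals_declarative(rows))
```

Both functions return the same totals; the difference is who decides the execution plan.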
MemSQL Spark 2 Connector
MemSQL Spark Connector Architecture
CLUSTER | CLUSTER
Spark RDD | MemSQL Table(s)
Cluster-wide Parallelization | Bi-Directional
MemSQL and Spark Use Cases
Operationalize Models Built in Spark
Stream and Event Processing
Extend MemSQL Analytics
Live Dashboards and Automated Reports
Operationalize Models Built in Spark
[Diagram: Data into Spark | Model Creation | Model Persistence | Results Set | Enterprise Consumption | CLUSTER]
Stream and Event Processing
[Diagram: Real-Time Streaming Data | Data Transformation | Persistent, Queryable Format | Enterprise Consumption | CLUSTER]
Extend MemSQL Analytics
[Diagram: Applications, Data Streams | Interactive Analytics, ML | Access to Live Production Data | CLUSTER | Real-Time Replica | REPLICATED CLUSTER]
Live Dashboards and Automated Reports
[Diagram: Live Dashboards | Custom Reporting | Access to Live Production Data | SQL Transactions and Analytics | CLUSTER]
MemSQL Spark Connector via Spark Packages
The memsql-spark-connector is now available via Spark Packages:
http://spark-packages.org/
https://spark-packages.org/package/memsql/memsql-spark-connector
You can use it with any Spark command:
> $SPARK_HOME/bin/spark-shell --packages com.memsql:memsql-connector_2.11:2.0.1
Also available on Maven
http://search.maven.org/#artifactdetails%7Ccom.memsql%7Cmemsql-connector_2.11%7C2.0.1%7Cjar
And the GitHub repository
https://github.com/memsql/memsql-spark-connector
Customer Spark Case Studies
Reducing delay in “freshness of data” from two hours to 10 minutes
+
https://www.enterprisetech.com/2016/12/09/managing-30b-bid-requests/
TECHNICAL BENEFITS
▪ 10x faster data refresh, from hours to minutes
▪ Run ad-hoc queries on log-level data within seconds
THE MANAGE REAL-TIME ARCHITECTURE
REAL-TIME ANALYTICS
Real-Time inputs
Goldman Sachs at Kafka Summit April 2016
http://www.confluent.io/kafka-summit-2016-users-real-time-analytics-visualized-with-kafka
Real-Time Analytics Visualized w/ Kafka+Spark+MemSQL+ZoomData
Entity Resolution at Scale
Problem Statement
Employees have many opportunities to take advantage of their insider knowledge and position of trust within a company. This includes:
▪ Preferential treatment to family or friends
▪ Fraud under someone else’s name
In many cases, proximity is one of the most common traits of the people they use as proxies for these activities.
MemSQL can quickly process the massive volume of calculations needed to identify these relationships and iterate on new algorithms.
Problem Size
Target Group (100,000) × Population (50 million) = Comparisons (5 trillion)
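The problem-size arithmetic can be checked with a few lines. The figures come from the slides; the function and variable names are illustrative only.

```python
TARGET_GROUP = 100_000      # people in the target group (from the slide)
POPULATION = 50_000_000     # people in the general population (from the slide)


def pairwise_comparisons(target_count, population_count):
    """Every target is compared against every member of the population."""
    return target_count * population_count


total = pairwise_comparisons(TARGET_GROUP, POPULATION)
print(f"{total:,} comparisons")  # 5,000,000,000,000, i.e. 5 trillion

# The deck later cites a reduction to 50 million comparisons after filtering.
reduction_factor = total // 50_000_000
print(f"filters would need to prune by a factor of {reduction_factor:,}")
```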
Parallelize
● filters
● projections
● entity resolution
Distributed, in-memory, massively parallel processing
From 5 trillion to 50 million
Rank Probabilities
Relationship
Similar entity
Comparisons
Levenshtein
SoundEx
Metaphone
On Email and Name
Geospatial filter
50 meters
Examples for Demo
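A 50-meter geospatial filter of the kind named above might look like the following sketch. The slides do not show how the demo computes distance, so the haversine formula, the record layout (`name`, `lat`, `lon`), and the sample coordinates are all assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(target, population, radius_m=50.0):
    """Keep only the population records within radius_m of the target."""
    return [p for p in population
            if haversine_m(target["lat"], target["lon"],
                           p["lat"], p["lon"]) <= radius_m]


# Hypothetical records: one roughly 22 m away, one roughly 111 m away.
target = {"name": "target", "lat": 42.3601, "lon": -71.0589}
population = [
    {"name": "near", "lat": 42.3603, "lon": -71.0589},
    {"name": "far", "lat": 42.3611, "lon": -71.0589},
]
nearby = within_radius(target, population)
print([p["name"] for p in nearby])
```

Only the records that survive this cheap filter need the more expensive name and email comparisons.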
MemSQL Duke (Spark) Results
Rank Probabilities
Relationship
Similar entity
Comparisons
Levenshtein
SoundEx
Metaphone
On Email and Name
Index filter
Last names are equal
Example 1
Example 2
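The index filter ("last names are equal") followed by a Levenshtein comparison can be sketched as below. This is a plain-Python illustration, not the MemSQL or Duke implementation; the in-memory dictionary stands in for a database index, and the record fields are hypothetical.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def candidate_pairs(targets, population):
    """Yield only (target, person) pairs whose last names are equal."""
    by_last_name = defaultdict(list)  # stand-in for an index on last name
    for person in population:
        by_last_name[person["last"].lower()].append(person)
    for target in targets:
        for person in by_last_name.get(target["last"].lower(), []):
            yield target, person


# Hypothetical records
targets = [{"first": "Jon", "last": "Smith"}]
population = [{"first": "John", "last": "smith"},
              {"first": "Jane", "last": "Doe"}]

pairs = list(candidate_pairs(targets, population))
distances = [levenshtein(t["first"], p["first"]) for t, p in pairs]
print(len(pairs), distances)
```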
Scalability
Cluster
288 cores → 3 mins runtime
Runtime scales linearly with number of cores
8 x c4.8xlarge
Want speed? Add cores!
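If runtime scales linearly with core count as stated, other cluster sizes can be projected from the single reported data point (288 cores, 3 minutes). The projection below is a sketch under that assumption only; the deck reports no measurements at other sizes.

```python
BASELINE_CORES = 288      # from the slide
BASELINE_MINUTES = 3.0    # from the slide


def projected_runtime_minutes(cores):
    """Projected runtime assuming perfectly linear scaling with cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return BASELINE_MINUTES * BASELINE_CORES / cores


for cores in (144, 288, 576):
    print(cores, "cores ->", projected_runtime_minutes(cores), "min")
```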
Cluster Size
Cluster size: 8 c4.8xlarge machines, each with 36 cores and 60 GB RAM
• 2 leaf nodes per machine, each with 9 partitions
• this gives us ~2 cores per partition in the cluster - one core is going to be at 100% CPU during the computation, the other is used for Spark + Duke + Misc
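The cluster-size figures can be cross-checked with simple arithmetic. All inputs are taken from the slide; the variable names are illustrative.

```python
MACHINES = 8
CORES_PER_MACHINE = 36
LEAVES_PER_MACHINE = 2
PARTITIONS_PER_LEAF = 9

total_cores = MACHINES * CORES_PER_MACHINE
total_partitions = MACHINES * LEAVES_PER_MACHINE * PARTITIONS_PER_LEAF
cores_per_partition = total_cores / total_partitions

print(total_cores, "cores")            # matches the 288 cores quoted earlier
print(total_partitions, "partitions")
print(cores_per_partition, "cores per partition")
```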
Conclusion
▪ Speed in covering massive search space
• In memory (On commodity hardware)
• Parallelization
▪ Scales linearly
▪ Huge value in running all of this natively in MemSQL
How does MemSQL do it?
▪ Push down the in-memory, proximity filter to each of the leaves
▪ Leverage indexes
▪ Stream results in parallel to Duke Entity Resolution
Duke Entity Resolution
▪ Using Metaphone, SoundEx, and Levenshtein algorithms to compare first name, last name and email
▪ Duke supports many more comparisons, and makes it very easy to create new ones
▪ With a training dataset, Duke can use a genetic algorithm to optimize comparator weights
▪ https://github.com/larsga/Duke
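To make the comparator idea concrete, here is a minimal American Soundex implementation with a simple weighted match score. This is not Duke's code or its scoring formula: the `weighted_score` function, the field names, and the weights are hypothetical, chosen only to show how per-field comparators and weights could combine.

```python
_CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), 1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(name):
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # vowels reset prev, so repeats across vowels are kept
    return (encoded + "000")[:4]


def weighted_score(a, b, weights):
    """Hypothetical score: weighted share of fields whose Soundex codes match."""
    total = sum(weights.values())
    matched = sum(w for field, w in weights.items()
                  if soundex(a[field]) == soundex(b[field]))
    return matched / total


weights = {"first": 0.4, "last": 0.6}  # illustrative weights, not tuned
a = {"first": "Robert", "last": "Smith"}
b = {"first": "Rupert", "last": "Smyth"}
c = {"first": "Robert", "last": "Jones"}
print(soundex("Robert"), soundex("Rupert"))
print(weighted_score(a, b, weights), weighted_score(a, c, weights))
```

In a tuned system the weights would come from training data rather than being set by hand, which is the role the slides attribute to Duke's genetic algorithm.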
Demo
www.memsql.com
Thank You

More Related Content

What's hot

The Real-Time CDO and the Cloud-Forward Path to Predictive Analytics
The Real-Time CDO and the Cloud-Forward Path to Predictive AnalyticsThe Real-Time CDO and the Cloud-Forward Path to Predictive Analytics
The Real-Time CDO and the Cloud-Forward Path to Predictive AnalyticsSingleStore
 
Real-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQLReal-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQLSingleStore
 
CTO View: Driving the On-Demand Economy with Predictive Analytics
CTO View: Driving the On-Demand Economy with Predictive AnalyticsCTO View: Driving the On-Demand Economy with Predictive Analytics
CTO View: Driving the On-Demand Economy with Predictive AnalyticsSingleStore
 
Machines and the Magic of Fast Learning
Machines and the Magic of Fast LearningMachines and the Magic of Fast Learning
Machines and the Magic of Fast LearningSingleStore
 
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingTapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingSingleStore
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLSingleStore
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017SingleStore
 
Winning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive AnalyticsWinning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive AnalyticsSingleStore
 
Driving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive AnalyticsDriving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive AnalyticsSingleStore
 
Real-Time, Geospatial, Maps by Neil Dahlke
Real-Time, Geospatial, Maps by Neil DahlkeReal-Time, Geospatial, Maps by Neil Dahlke
Real-Time, Geospatial, Maps by Neil DahlkeSingleStore
 
Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale SingleStore
 
Operationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at StarbucksOperationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at StarbucksDatabricks
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesSingleStore
 
See who is using MemSQL
See who is using MemSQLSee who is using MemSQL
See who is using MemSQLjenjermain
 
Building Software to Scale
Building Software to Scale Building Software to Scale
Building Software to Scale SingleStore
 
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demoDatabricks
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeSingleStore
 
Scaling Production Machine Learning Pipelines with Databricks
Scaling Production Machine Learning Pipelines with DatabricksScaling Production Machine Learning Pipelines with Databricks
Scaling Production Machine Learning Pipelines with DatabricksDatabricks
 
Building Real-Time Data Pipelines with Kafka, Spark, and MemSQL
Building Real-Time Data Pipelines with Kafka, Spark, and MemSQLBuilding Real-Time Data Pipelines with Kafka, Spark, and MemSQL
Building Real-Time Data Pipelines with Kafka, Spark, and MemSQLSingleStore
 
Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusDatabricks
 

What's hot (20)

The Real-Time CDO and the Cloud-Forward Path to Predictive Analytics
The Real-Time CDO and the Cloud-Forward Path to Predictive AnalyticsThe Real-Time CDO and the Cloud-Forward Path to Predictive Analytics
The Real-Time CDO and the Cloud-Forward Path to Predictive Analytics
 
Real-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQLReal-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQL
 
CTO View: Driving the On-Demand Economy with Predictive Analytics
CTO View: Driving the On-Demand Economy with Predictive AnalyticsCTO View: Driving the On-Demand Economy with Predictive Analytics
CTO View: Driving the On-Demand Economy with Predictive Analytics
 
Machines and the Magic of Fast Learning
Machines and the Magic of Fast LearningMachines and the Magic of Fast Learning
Machines and the Magic of Fast Learning
 
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingTapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
Winning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive AnalyticsWinning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive Analytics
 
Driving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive AnalyticsDriving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive Analytics
 
Real-Time, Geospatial, Maps by Neil Dahlke
Real-Time, Geospatial, Maps by Neil DahlkeReal-Time, Geospatial, Maps by Neil Dahlke
Real-Time, Geospatial, Maps by Neil Dahlke
 
Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale
 
Operationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at StarbucksOperationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at Starbucks
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming Architectures
 
See who is using MemSQL
See who is using MemSQLSee who is using MemSQL
See who is using MemSQL
 
Building Software to Scale
Building Software to Scale Building Software to Scale
Building Software to Scale
 
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free Life
 
Scaling Production Machine Learning Pipelines with Databricks
Scaling Production Machine Learning Pipelines with DatabricksScaling Production Machine Learning Pipelines with Databricks
Scaling Production Machine Learning Pipelines with Databricks
 
Building Real-Time Data Pipelines with Kafka, Spark, and MemSQL
Building Real-Time Data Pipelines with Kafka, Spark, and MemSQLBuilding Real-Time Data Pipelines with Kafka, Spark, and MemSQL
Building Real-Time Data Pipelines with Kafka, Spark, and MemSQL
 
Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for Fluvius
 

Viewers also liked

The Evolution of Data Architecture
The Evolution of Data ArchitectureThe Evolution of Data Architecture
The Evolution of Data ArchitectureWei-Chiu Chuang
 
CWIN17 Frankfurt / Cloudera
CWIN17 Frankfurt / ClouderaCWIN17 Frankfurt / Cloudera
CWIN17 Frankfurt / ClouderaCapgemini
 
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasPartner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasDataWorks Summit
 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Cloudera, Inc.
 
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets
Cloudera, Inc.
 
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...confluent
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
 
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...Cloudera, Inc.
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoopWei-Chiu Chuang
 
Spark meetup - Zoomdata Streaming
Spark meetup  - Zoomdata StreamingSpark meetup  - Zoomdata Streaming
Spark meetup - Zoomdata StreamingZoomdata
 
빅데이터윈윈 컨퍼런스_데이터시각화자료
빅데이터윈윈 컨퍼런스_데이터시각화자료빅데이터윈윈 컨퍼런스_데이터시각화자료
빅데이터윈윈 컨퍼런스_데이터시각화자료ABRC_DATA
 
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...Spark Summit
 
Cloudera and Qlik: Big Data Analytics for Business
Cloudera and Qlik: Big Data Analytics for BusinessCloudera and Qlik: Big Data Analytics for Business
Cloudera and Qlik: Big Data Analytics for BusinessData IQ Argentina
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleHortonworks
 
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration storyJoan Viladrosa Riera
 

Viewers also liked (20)

Softnix Security Data Lake
Softnix Security Data Lake Softnix Security Data Lake
Softnix Security Data Lake
 
The Evolution of Data Architecture
The Evolution of Data ArchitectureThe Evolution of Data Architecture
The Evolution of Data Architecture
 
CWIN17 Frankfurt / Cloudera
CWIN17 Frankfurt / ClouderaCWIN17 Frankfurt / Cloudera
CWIN17 Frankfurt / Cloudera
 
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasPartner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
 
Ibm watson
Ibm watsonIbm watson
Ibm watson
 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1

 
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets

 
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
Spark meetup - Zoomdata Streaming
Spark meetup  - Zoomdata StreamingSpark meetup  - Zoomdata Streaming
Spark meetup - Zoomdata Streaming
 
Zoomdata
ZoomdataZoomdata
Zoomdata
 
빅데이터윈윈 컨퍼런스_데이터시각화자료
빅데이터윈윈 컨퍼런스_데이터시각화자료빅데이터윈윈 컨퍼런스_데이터시각화자료
빅데이터윈윈 컨퍼런스_데이터시각화자료
 
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
 
Cloudera and Qlik: Big Data Analytics for Business
Cloudera and Qlik: Big Data Analytics for BusinessCloudera and Qlik: Big Data Analytics for Business
Cloudera and Qlik: Big Data Analytics for Business
 
Softnix Messaging Server
Softnix Messaging ServerSoftnix Messaging Server
Softnix Messaging Server
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at Scale
 
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
 

Similar to The Fast Path to Building Operational Applications with Spark

Fighting Fraud with Apache Spark
Fighting Fraud with Apache SparkFighting Fraud with Apache Spark
Fighting Fraud with Apache SparkMiklos Christine
 
Data & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeData & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeSingleStore
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetupGanesan Narayanasamy
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next DecadePaula Koziol
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOSQAware GmbH
 
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes Elasticsearch
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Ahsan Javed Awan
 
Stories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresStories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresSpark Summit
 
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Qubole
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKzmhassan
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsSingleStore
 
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkThe Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkLenovo Data Center
 
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsFrom Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsSingleStore
 

Similar to The Fast Path to Building Operational Applications with Spark (20)

Fighting Fraud with Apache Spark
Fighting Fraud with Apache SparkFighting Fraud with Apache Spark
Fighting Fraud with Apache Spark
 
System mldl meetup
System mldl meetupSystem mldl meetup
System mldl meetup
 
Data & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeData & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real Time
 
NextGenML
NextGenML NextGenML
NextGenML
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next Decade
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOS
 
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
The Fast Path to Building Operational Applications with Spark

  • 1. The Fast Path to Building Operational Applications with Spark. Nikita Shamgunov, CTO and Co-founder of MemSQL. Spark Summit East | Boston | 9 February 2017
  • 2. About Me: Nikita Shamgunov, Co-founder and Chief Technology Officer, MemSQL
  • 3. An Insider’s View at Facebook ▪ Every piece of technology is scalable ▪ Analyzing data from hundreds of thousands of machines ▪ Delivering immense value in real-time • Real-time code deployment • Detecting anomalies • A/B testing results ▪ Fundamentally making the business faster by providing data at your fingertips
  • 4. Imagine scaling a database on industry standard hardware. Need 2x the performance? Add 2x the nodes.
  • 6. Today in My Talk ▪ About MemSQL ▪ Using MemSQL Spark Connector ▪ Use Cases and Case Studies ▪ Entity Resolution
  • 8. MemSQL - Hybrid Cloud Data Warehouse ▪ Scalable and elastic • Petabyte scale • High concurrency • System of record ▪ Real-time • Operational ▪ Compatible • ETL • Business Intelligence • Kafka • Spark ▪ Deployment • Managed service in the cloud • On-premises ▪ Community Edition • Unlimited scale • Limited high availability and security features
  • 9. Product or Services Scores for Operational Data Warehouse. Source: Critical Capabilities for Data Warehouse and Data Management Solutions for Analytics, Gartner, July 2016
  • 10. Keeping Pace: On-demand economy | Real-time data | Predictive analytics
  • 12. Easy Deployment of Real-Time Data Pipelines ▪ Messaging: high-throughput distributed messaging system; publish and subscribe to Kafka “topics” ▪ Processing: in-memory execution engine; high-level operators for procedural and programmatic analytics ▪ Storage: Hybrid Cloud Data Warehouse; full transactions and complete durability ▪ Also shown: Amazon Kinesis
  • 13. Use Spark and Operational Databases Together ▪ Interface: Spark is programmatic; operational databases are declarative ▪ Execution environment: Spark has a job scheduler; operational databases have a SQL engine and query optimizer ▪ Persistent storage: Spark uses another system; operational databases have it built in
  • 14. MemSQL Spark 2 Connector
  • 15. MemSQL Spark Connector Architecture: Spark RDD (cluster) | MemSQL Table(s) (cluster) | Cluster-wide Parallelization | Bi-Directional
  • 16. MemSQL and Spark Use Cases ▪ Operationalize Models Built in Spark ▪ Stream and Event Processing ▪ Extend MemSQL Analytics ▪ Live Dashboards and Automated Reports
  • 17. Operationalize Models Built in Spark: Enterprise Consumption | Data into Spark | Model Creation | Model Persistence | Results Set
  • 18. Stream and Event Processing: Enterprise Consumption | Real-Time Streaming Data | Data Transformation | Persistent, Queryable Format
  • 19. Extend MemSQL Analytics: Applications, Data Streams | Interactive Analytics, ML | Access to Live Production Data | Cluster with Real-Time Replica (replicated cluster)
  • 20. Live Dashboards and Automated Reports: Live Dashboards | Custom Reporting | Access to Live Production Data | SQL Transactions and Analytics
  • 21. MemSQL Spark Connector via Spark Packages. The memsql-spark-connector is now available via Spark Packages (http://spark-packages.org/): https://spark-packages.org/package/memsql/memsql-spark-connector You can use it with any Spark command: $SPARK_HOME/bin/spark-shell --packages com.memsql:memsql-connector_2.11:2.0.1 It is also available on Maven: http://search.maven.org/#artifactdetails%7Ccom.memsql%7Cmemsql-connector_2.11%7C2.0.1%7Cjar And in the GitHub repository: https://github.com/memsql/memsql-spark-connector
  • 23. Reducing delay in “freshness of data” from two hours to 10 minutes https://www.enterprisetech.com/2016/12/09/managing-30b-bid-requests/
  • 24. The Manage Real-Time Architecture: Real-time inputs | Real-time analytics. Technical benefits ▪ 10x faster data refresh, from hours to minutes ▪ Run ad-hoc queries on log-level data within seconds
  • 25. Goldman Sachs at Kafka Summit, April 2016: Real-Time Analytics Visualized w/ Kafka+Spark+MemSQL+ZoomData http://www.confluent.io/kafka-summit-2016-users-real-time-analytics-visualized-with-kafka
  • 27. Problem Statement: Employees have many opportunities to take advantage of their insider knowledge and position of trust within a company. This includes: ▪ Preferential treatment to family or friends ▪ Fraud under someone else’s name In many cases, the people they route these activities through share one common trait: proximity. MemSQL can quickly process the massive volume of calculations needed to identify these relationships and iterate on new algorithms.
  • 28. Problem Size: Target Group 100,000 × Population 50 million = 5 trillion comparisons. Parallelize ● filters ● projections ● entity resolution. Distributed, in-memory, massively parallel processing. From 5 trillion to 50 million.
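
The arithmetic behind the problem-size slide can be checked with a few lines. This is only a worked sketch using the figures quoted on the slide; the variable names are illustrative, and the per-target average is derived here rather than stated in the talk.

```python
# Figures quoted on the slide.
target_group = 100_000          # people in the target group
population = 50_000_000         # people in the full population

# Naive approach: compare every target with every member of the population.
naive_comparisons = target_group * population

# Comparisons that remain after filtering, as quoted on the slide.
filtered_comparisons = 50_000_000

# Derived values (not on the slide): how much work the filters remove,
# and how many candidates each target keeps on average.
reduction_factor = naive_comparisons // filtered_comparisons
candidates_per_target = filtered_comparisons // target_group

print(f"naive comparisons:     {naive_comparisons:,}")
print(f"filtered comparisons:  {filtered_comparisons:,}")
print(f"reduction factor:      {reduction_factor:,}")
print(f"candidates per target: {candidates_per_target:,}")
```

Under these numbers, the filters leave each target with about 500 candidates on average instead of 50 million.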
  • 29. Examples for Demo (pipeline: MemSQL, then Duke (Spark), then Results) ▪ Example 1: MemSQL applies a geospatial filter, 50 meters; Duke (Spark) runs comparisons (Levenshtein, SoundEx, Metaphone) on email and name; Results rank probabilities (relationship, similar entity) ▪ Example 2: MemSQL applies an index filter, last names are equal; Duke (Spark) runs comparisons (Levenshtein, SoundEx, Metaphone) on email and name; Results rank probabilities (relationship, similar entity)
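
The two demo filters can be pictured as predicates that decide which pairs are worth comparing at all. The sketch below is a plain-Python illustration, not the MemSQL query used in the demo: the record layout (`id`, `last`, `lat`, `lon`), the function names, and the haversine formula for distance are all assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_50_m(a, b):
    """Example 1: keep pairs located within 50 metres of each other."""
    return haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= 50.0


def same_last_name(a, b):
    """Example 2: keep pairs whose last names are equal (case-insensitive here)."""
    return a["last"].lower() == b["last"].lower()


def candidate_pairs(targets, population, keep):
    """Return (target id, person id) pairs that pass the filter `keep`."""
    return [(t["id"], p["id"]) for t in targets for p in population if keep(t, p)]


# Hypothetical sample records.
targets = [{"id": "t1", "last": "Smith", "lat": 42.3601, "lon": -71.0589}]
population = [
    {"id": "p1", "last": "Jones", "lat": 42.3603, "lon": -71.0589},  # roughly 22 m away
    {"id": "p2", "last": "smith", "lat": 42.3701, "lon": -71.0589},  # roughly 1.1 km away
]

print(candidate_pairs(targets, population, within_50_m))
print(candidate_pairs(targets, population, same_last_name))
```

Only the pairs that survive a filter would be handed to the more expensive name and email comparisons.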
  • 30. Scalability: Cluster of 8 x c4.8xlarge, 288 cores → 3 mins runtime. Runtime scales linearly with number of cores. Want speed? Add cores!
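
To make the scaling claim concrete, here is a small estimate built on the single data point from the slide (288 cores, 3 minutes). It assumes ideal scaling, meaning the total core-minutes of work stay constant as cores are added; the talk reports only the 288-core measurement, so every other value produced by this function is an extrapolation, and the function name is hypothetical.

```python
BASELINE_CORES = 288      # from the slide
BASELINE_MINUTES = 3.0    # from the slide


def estimated_runtime_minutes(cores):
    """Estimate runtime assuming the total core-minutes of work stay constant."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return BASELINE_CORES * BASELINE_MINUTES / cores


for cores in (144, 288, 576):
    print(cores, "cores ->", estimated_runtime_minutes(cores), "minutes (estimate)")
```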
  • 31. Cluster Size: 8 machines, c4.8xlarge, each with 36 cores and 60 GB RAM • 2 leaf nodes per machine, each with 9 partitions • this gives us ~2 cores per partition in the cluster: one core is going to be at 100% CPU during the computation, the other is used for Spark + Duke + Misc
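
The cluster figures fit together as follows. This is a worked check of the slide's numbers, with illustrative variable names.

```python
# Figures quoted on the slide.
machines = 8
cores_per_machine = 36
leaves_per_machine = 2
partitions_per_leaf = 9

total_cores = machines * cores_per_machine
total_leaves = machines * leaves_per_machine
total_partitions = total_leaves * partitions_per_leaf
cores_per_partition = total_cores / total_partitions

print("total cores:        ", total_cores)
print("total leaf nodes:   ", total_leaves)
print("total partitions:   ", total_partitions)
print("cores per partition:", cores_per_partition)
```

The total of 288 cores matches the scalability slide, and 144 partitions give the quoted 2 cores per partition.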
  • 32. Conclusion ▪ Speed in covering massive search space • In memory (on commodity hardware) • Parallelization ▪ Scales linearly ▪ Huge value in running all of this natively in MemSQL
  • 33. Entity Resolution: How does MemSQL do it? ▪ Push down the in-memory proximity filter to each of the leaves ▪ Leverage indexes ▪ Stream results in parallel to Duke
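
The three steps on this slide can be mimicked in miniature. The sketch below is a toy model, not MemSQL's implementation: each Python list stands in for the data held by one leaf, a dictionary stands in for an index on last name, and a thread pool stands in for leaves working at the same time. All names are hypothetical, and Python threads only illustrate the structure rather than real parallel speed-up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(partition):
    """Index one partition's rows by lower-cased last name."""
    index = defaultdict(list)
    for row in partition:
        index[row["last"].lower()].append(row)
    return index


def filter_on_leaf(index, targets):
    """Run the filter locally on one leaf, using its index instead of a full scan."""
    pairs = []
    for target in targets:
        for row in index.get(target["last"].lower(), []):
            pairs.append((target["id"], row["id"]))
    return pairs


def candidate_stream(partitions, targets):
    """Push the filter to every partition and yield surviving pairs as they arrive."""
    indexes = [build_index(p) for p in partitions]
    with ThreadPoolExecutor(max_workers=max(1, len(indexes))) as pool:
        for pairs in pool.map(lambda idx: filter_on_leaf(idx, targets), indexes):
            yield from pairs


# Hypothetical data spread over two partitions.
partitions = [
    [{"id": "p1", "last": "Smith"}, {"id": "p2", "last": "Jones"}],
    [{"id": "p3", "last": "smith"}],
]
targets = [{"id": "t1", "last": "Smith"}]

for pair in candidate_stream(partitions, targets):
    print(pair)  # in the talk's pipeline, such pairs would be handed to Duke
```

The point of the design is that only the filtered pairs leave each partition, so the downstream comparison step never sees the full cross product.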
  • 34. Duke Entity Resolution ▪ Using Metaphone, SoundEx, and Levenshtein algorithms to compare first name, last name and email ▪ Duke supports many more comparisons, and makes it very easy to create new ones ▪ With a training dataset, Duke can use a genetic algorithm to optimize comparator weights ▪ https://github.com/larsga/Duke
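
For readers unfamiliar with the comparators named on this slide, here is a self-contained sketch of two of them, Levenshtein edit distance and American Soundex, combined into a simple score. It is not Duke's code or API: Duke is a Java library with its own comparator classes and its own way of combining evidence, Metaphone is omitted for brevity, and the weights and the 0.9 phonetic floor below are made-up values of the kind a training step would tune.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


_SOUNDEX_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name):
    """American Soundex code (one letter plus three digits) for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit:
            if digit != prev:
                code += digit
            prev = digit
        elif ch not in "hw":
            prev = ""  # vowels separate repeated codes; h and w do not
    return (code + "000")[:4]


def similarity(a, b):
    """Edit-distance similarity scaled to the range 0..1."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical weights; a simple weighted average, chosen for illustration only.
WEIGHTS = {"first": 0.3, "last": 0.4, "email": 0.3}


def match_score(rec_a, rec_b):
    """Weighted similarity over first name, last name and email."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        s = similarity(rec_a[field], rec_b[field])
        if field != "email" and soundex(rec_a[field]) == soundex(rec_b[field]):
            s = max(s, 0.9)  # hypothetical floor for names that sound alike
        score += weight * s
    return score


a = {"first": "Robert", "last": "Smith", "email": "rsmith@example.com"}
b = {"first": "Rupert", "last": "Smyth", "email": "r.smith@example.com"}
print(levenshtein("kitten", "sitting"), soundex("Robert"), soundex("Rupert"))
print(round(match_score(a, b), 3))
```

Pairs would then be ranked by score, which corresponds loosely to the "rank probabilities" step in the demo examples.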
  • 35. Demo