Getting It Right Exactly Once: Principles for Streaming Architectures

Darryl Smith, Chief Data Platform Architect and Distinguished Engineer, Dell Technologies
September 2016 | Strata+Hadoop World, NY
2
Getting Started
▪ I’m Darryl Smith
• Chief Data Platform Architect and Distinguished Engineer, Dell Technologies
▪ Agenda
• Real-Time And The Need For Streaming
• Adding Real-Time And Streaming To The Data Lake
• Results, Plans, Lessons Learned
• Demonstration
3
Trickle, Flood, or Torrent…
Streaming is about
continuous data motion,
more than speed
or volume
4
The Conversation Around Streaming
Website and Mobile
Application Logs
Internet of Things
Sensors
The Enterprise Reality
5
Batch > Real-Time > Streaming
Enterprise Opportunities
Immediate Business Advantage
Website and Mobile
Application Logs
Internet of Things
Sensors
6
The Enterprise Streaming Play
Moving from batch to real-time streams
avoids surges, normalizes compute,
and drives value
7
Real time and the need for streaming
8
Drive DellEMC towards a
Predictive Enterprise via
intelligent data driving agility,
increasing revenue and
productivity resulting in a
competitive advantage
Analytics Vision
9
▪ Need to use new data for competitive advantage
• Volume, Variety and Velocity
▪ Leverage near-real-time and streaming data sets to optimize predictions
• Make faster, better decisions
▪ Cost-effectively scale to improve query and load performance
▪ Put the data in the hands of the business
Becoming An Analytical Enterprise
DRIVE
COMPETITIVE
ADVANTAGE
COST-
EFFECTIVELY
SCALE
DATA ACCESS
BY BUSINESS
NEAR
REAL-TIME
ANALYTICS
10
Problem Statement
Teams do not have access
to maintenance renewal
quotes in the timeframes
or the degree of quality
which they need for Tech
Refresh and Renewal
sales.
Desired Outcome
Implement a cost-effective,
real-time solution that
improves productivity
and gives confidence to
produce desired outcomes
efficiently.
Scoping The Business Objectives
11
Business Drivers
CURRENT REALITY
VISION FOR THE
FUTURE
TO REALIZE
THIS VISION:
IMPLEMENT
CALM
SOLUTION
PHASES AND
OPTIMIZE
BUSINESS
PROCESSES
HIGH TOUCH
TACTICAL EXECUTION
LOW TOUCH SELF
SERVICE
DATE DRIVEN
PROCESSES
BUSINESS VALUE
DRIVEN PROCESSES
INEFFICIENCIES &
LOST PRODUCTIVITY
INCREASED
PRODUCTIVITY
SILOED DATA /
LIMITED VIEWS
SINGLE VIEW OF
DATA/DATA SCORING
VARIABLE DATA
QUALITY
DATA QUALITY &
CONFIDENCE
12
The Need for “CALM”
Customer Asset Lifecycle Management
For enterprise sales
Who need accurate and timely customer information
CALM is a real-time application
Providing up-to-the-moment customer 360° dashboards
Install Base
Pricing
Device Config
Contacts
Contracts
Analytics Contracts
Component
Data
Offers
Scorecard
13
Data Lake Architecture
DATA PLATFORM
VMWARE VCLOUD SUITE
EXECUTION
PROCESS
GREENPLUM DB
SPRING XD
PIVOTAL HD
GemFire
HADOOP
INGESTION
DATA GOVERNANCE
Cassandra, PostgreSQL, MemSQL
HDFS ON ISILON
HADOOP ON SCALEIO
VCE VBLOCK/VxRACK | XTREMIO | DATA DOMAIN
ANALYTICS TOOLBOX
Network, Web, Sensor, Supplier, Social Media, Market
STRUCTURED, UNSTRUCTURED
CRM, PLM, ERP
APPLICATIONS
Apache Ranger, Attivio, Collibra
Real-Time, Micro-Batch, Batch
14
Data Ingestion
• Small to Big Data (high-throughput)
• Structured and unstructured Data from any Source
• Streams and Batches
• Secure, multi-tenant, configurable Framework
Real-Time Analytics
• Tap into streams for in-memory Analytics
• Real Time Data insights and decisions
Services
• Data Ingestion to Data Lake
• Data Lake APIs
• Data Alerting
Business Data Lake Offerings
Unstructured
Structured
15
Adding Real Time and Streaming
to the Data Lake
16
Seeking A Fast Database
A complement to the business data lake
O P C M
HammerDB Platform Benchmarks
HammerDB workload testing followed the standard practices of EMC’s Oracle and SQL Server
DBA teams.
▪ Definition of workload: a mix of 5 transactions, as follows:
• New order: receive a new order from a customer: 45%
• Payment: update the customer balance to record a payment: 43%
• Delivery: deliver orders asynchronously: 4%
• Order status: retrieve the status of customer’s most recent order: 4%
• Stock level: return the status of the warehouse’s inventory: 4%
▪ Testing scenario:
• Database creation and initial data loading with 100 warehouses and 8 vUsers.
• Timed testing, 20 minutes per testing session.
• Number of virtual users scaled from 1 to 44 across testing sessions.
▪ No changes were made to the system or database configuration while the tests were
running.
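The transaction mix above can be pictured as a weighted random draw. The following is only an illustrative sketch in Python, not HammerDB's own driver code; the names `WORKLOAD_MIX` and `sample_transactions` are hypothetical, and only the percentages come from the slide.

```python
import random
from collections import Counter

# Transaction mix as listed on the slide; the percentages sum to 100.
WORKLOAD_MIX = {
    "new_order": 45,
    "payment": 43,
    "delivery": 4,
    "order_status": 4,
    "stock_level": 4,
}


def sample_transactions(n, seed=0):
    """Draw n transaction types at random, weighted by the workload mix."""
    rng = random.Random(seed)
    names = list(WORKLOAD_MIX)
    weights = list(WORKLOAD_MIX.values())
    return rng.choices(names, weights=weights, k=n)


# Over a long run the observed shares approach the configured percentages.
counts = Counter(sample_transactions(100_000))
for name in WORKLOAD_MIX:
    print(f"{name:13s} {counts[name] / 1000:5.1f}%")
```

Each virtual user in a real benchmark run would issue the chosen transaction against the database; here the draw is only counted.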
HammerDB Workload Testing
▪ Each test system had 16 vCPUs and 32 GB RAM
• Oracle 11g R2 on RedHat 6.4
• SQL Server 2012 Ent. Ed. on Windows Core 2012 R2
• PostgreSQL 9.3.3 on RedHat 6.4
HammerDB Workload - Results
Results
Query                PostgreSQL      MemSQL
Opportunity (5K)     5 seconds       200 ms
Sales Order (170K)   1-1.5 minutes   6 seconds
Territory (60K)      60 seconds      5 seconds
PostgreSQL vs In-Memory DB
We picked 5 top queries run by different business functions.
Presented here are 3 queries that had response times that did not meet the SLA.
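The speedups implied by the three response times on this slide can be worked out with simple division. A small Python sketch follows; the dictionary layout and the function name `speedup_range` are illustrative choices, and the only inputs are the timings reported above, converted to milliseconds.

```python
# Response times from the slide in milliseconds, as (low, high) ranges.
# Sales Order on PostgreSQL was reported as 1-1.5 minutes.
RESULTS_MS = {
    "Opportunity (5K)": ((5_000, 5_000), (200, 200)),
    "Sales Order (170K)": ((60_000, 90_000), (6_000, 6_000)),
    "Territory (60K)": ((60_000, 60_000), (5_000, 5_000)),
}


def speedup_range(postgres_ms, memsql_ms):
    """Return the (smallest, largest) speedup implied by two timing ranges."""
    return postgres_ms[0] / memsql_ms[1], postgres_ms[1] / memsql_ms[0]


for query, (postgres_ms, memsql_ms) in RESULTS_MS.items():
    low, high = speedup_range(postgres_ms, memsql_ms)
    if low == high:
        print(f"{query}: about {low:.0f}x faster")
    else:
        print(f"{query}: about {low:.0f}x to {high:.0f}x faster")
```

On these numbers the in-memory database comes out roughly 25x faster for Opportunity, 10x to 15x for Sales Order, and 12x for Territory.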
21
Business Data Lake – Ingestion to Fulfillment
Raw Data
Summary
Data
DATA GOVERNOR
Consumers
Predictive/
Prescriptive
Analytics
Processed
Data
Analytical Data
GREENPLUM DATABASE
HADOOP
RAW
Data
INGEST
MANAGER
SPRING XD
SPARK
SQOOP
Execution Tier
CASSANDRA GEMFIRE
MEMSQL POSTGRESQL
Real-Time
Tap
22
Here Are The Data Flows We Built
Low Velocity
Batch
Real-Time
23
Data Flow Patterns – Low Velocity
Analytical [BATCH]
Ingestion
Data
Service
JDBC
Application
Presentation [SPEED/SERVING]
GREENPLUM
DATABASE
PIVOTAL HD
POSTGRESQL
MEMSQL
Raw
Data
One-Time
CASSANDRA
GEMFIRE
Analytical [BATCH]
Ingestion
Data
Service
JDBC
Application
GREENPLUM
DATABASE
PIVOTAL HD
24
Data Flow Patterns – Batch
Batch
Presentation [SPEED/SERVING]
POSTGRESQL
MEMSQL CASSANDRA
GEMFIRE
25
Data Flow Patterns – Real Time
Real-time
Initial Load
Analytical [BATCH]
Ingestion
Data
Service
JDBC
Application
GREENPLUM
DATABASE
PIVOTAL HD
Presentation [SPEED/SERVING]
POSTGRESQL
MEMSQL CASSANDRA
GEMFIRE
26
Nothing Closer To Real Time Than Streaming
▪ Let’s look at the leading edge
▪ Apache Kafka
▪ Messaging Semantics
• At most once
• At least once
• Exactly once
27
At most once
28
At least once
01 02 03 04
29
Exactly Once
30
Understanding Streaming Semantics
At most once               | At least once                                        | Exactly once
Message pulled once        | Message pulled one or more times; processed each time | Message pulled one or more times; processed once
May or may not be received | Receipt guaranteed                                   | Receipt guaranteed
No duplicates              | Likely duplicates                                    | No duplicates
Possible missing data      | No missing data                                      | No missing data
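The three semantics can be compared with a toy simulation. This Python sketch makes several assumptions that are not in the slides: failures are modeled as a random loss (for at most once) or a lost acknowledgement that triggers redelivery (for the other two), and exactly once is approximated by the sink remembering which message ids it has already applied. The function name `deliver` and its parameters are hypothetical.

```python
import random

MODES = ("at_most_once", "at_least_once", "exactly_once")


def deliver(messages, mode, fail_rate=0.3, seed=7):
    """Simulate one consumer run and return what the sink ends up storing."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    sink = []        # rows written to the downstream store
    applied = set()  # message ids already written (exactly once only)
    for msg_id, payload in messages:
        if mode == "at_most_once":
            # The offset is committed before processing, so a failure
            # here loses the message for good.
            if rng.random() >= fail_rate:
                sink.append(payload)
            continue
        # The other modes redeliver until the acknowledgement gets through.
        while True:
            if mode == "at_least_once" or msg_id not in applied:
                sink.append(payload)
                applied.add(msg_id)
            if rng.random() >= fail_rate:
                break  # acknowledged, move on to the next message
    return sink


messages = [(i, f"msg-{i:02d}") for i in range(1, 5)]
for mode in MODES:
    print(mode, deliver(messages, mode))
```

At most once can drop messages, at least once can store a message several times, and only the exactly-once path leaves the sink holding each message a single time regardless of how often it was redelivered.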
31
Rendering In Real Time
▪ Picking the right business intelligence layer
• Tableau
• Custom Application (CF, D3, Docker)
• Additional Third Party Solutions
32
Results, Plans, Lessons Learned
33
Business Benefits
DATA QUERYING
Down from 4 hours per quarter
to less than 1 minute per year
SIMPLIFIED
PROVISIONING
Reduced number of tables/report
required
DATA
GOVERNANCE
Provides one version of
the truth
TIME TO MARKET
Reduced number of tables/report
required
TOOL
AGNOSTIC
Business logic in the DB not
the tool provides increased
flexibility
34
Use Case: Customer Account Profile
▪ Streamlined analytics environment to gain a holistic customer view
Service Request
Contracts
Installed Base
Bookings
Billings
EMC DATA
LAKE
BDL
SERVICES
DATA
WORKSPACES
DATA INGESTION
Prof Services
23 BUSINESS MANAGED WORKSPACES
35
Customer Asset Lifecycle Management
Platform Roadmap
Phase 1 : Foundational
Capabilities/Discovery
Phase 2 : Scale Platform /
Automate
Future Phases : Global Standard tool
Integrations , advanced Analytics
BAaaS/Tableau
Scalable
Platform
Integrated
Platform
GBS
Renewals
Inside
Sales
Additional
Business groups
Aug 2015 | Oct 2015 | 2016 | TBD
BDL Platform
Enablement | Collaboration | Acceleration
In-Memory Capabilities
(POC)
We are here
36
Data Services Roadmap
Security
Planned integration into
custom BDL security API for
managing Role Based Access
Control (RBAC) to the
underlying data
Business Data Lake Plans
37
Lessons Learned – Key Takeaways
EDUCATE
• Educate the business
• Use examples of business impact
ASSESS
• Assess in-house big data skills
• Ensure plan to support the organization for 3-5 years
INFRASTRUCTURE
• Choose the best possible infrastructure
• Make sure your Big Data technology platform can evolve
JOURNEY
• Remember it is a journey
• Look for small wins as well as big wins
38
Lessons Learned: Analytics and Data
Sourcing the right skills, working with a different philosophy,
and some new tools will help you meet your analytical goals
TRANSFORM YOUR PEOPLE
▪ Data science in the organization, IT or both?
▪ Helping business units take initiative
CHANGE YOUR PROCESSES
▪ New philosophy to running analytics projects
▪ How and when to share data
ADAPT YOUR TECHNOLOGY
▪ Steadily refine toolsets based on needed analysis
▪ Identify to infrastructure layers
39
Demonstration
40
Demo Agenda
Showcase exactly-once semantics from Kafka
1: Data set of 200,000 transactions summing to zero
2: CREATE TABLE AND CREATE PIPELINE
3: Push to Kafka and confirm exactly-once
4: Validate Resiliency and confirm exactly-once
Step 1: Data Source
▪ Start with a data set of 200,000 transactions representing money/goods that sum to zero
▪ 200,000 transactions
• Transaction number
• Increase / Decrease
• Amount
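One way to build such a data set is to emit every amount twice, once as an increase and once as a matching decrease, then shuffle. The sketch below is an assumption about how the demo data could be produced, not the presenter's actual generator; the function name `make_transactions`, the amount range, and the choice to store decreases as negative integers are all illustrative.

```python
import random


def make_transactions(n=200_000, seed=42):
    """Build n rows of (transaction_number, direction, amount) summing to zero.

    Assumption: decreases are stored as negative amounts, so that a plain
    SUM(amount) over the table is zero when nothing is lost or duplicated.
    """
    if n % 2:
        raise ValueError("n must be even so every increase has a matching decrease")
    rng = random.Random(seed)
    rows = []
    for _ in range(n // 2):
        amount = rng.randint(1, 10_000)  # whole units avoid float rounding
        rows.append(("increase", amount))
        rows.append(("decrease", -amount))
    rng.shuffle(rows)
    return [
        (number, direction, amount)
        for number, (direction, amount) in enumerate(rows, start=1)
    ]


transactions = make_transactions()
print(len(transactions), sum(amount for _, _, amount in transactions))
# prints: 200000 0
```

Because every amount is non-zero, a single dropped or duplicated row changes both the count and the sum, which is what makes this data set a useful probe for exactly-once delivery.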
Step 2: CREATE TABLE AND CREATE PIPELINE
▪ Create a table and pipeline in MemSQL that subscribes to
that Kafka topic
CREATE TABLE
CREATE PIPELINE
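The slide does not show the statements themselves. As a rough sketch, they might look like the strings below, held in Python so they can be sent through any MySQL-compatible client. The column names, the pipeline name, the broker address `kafka-host:9092`, and the topic `transactions` are all hypothetical; the statement shape follows MemSQL's `CREATE PIPELINE ... AS LOAD DATA KAFKA` form, but it should be checked against the documentation for the version in use.

```python
def pipeline_ddl(table="transactions", broker="kafka-host:9092", topic="transactions"):
    """Return DDL for the demo table and a Kafka pipeline that feeds it.

    Every identifier and the broker address are illustrative assumptions.
    """
    create_table = (
        f"CREATE TABLE {table} (\n"
        "  txn_no BIGINT PRIMARY KEY,\n"
        "  direction VARCHAR(8),\n"
        "  amount BIGINT\n"
        ")"
    )
    create_pipeline = (
        f"CREATE PIPELINE {table}_pipeline AS\n"
        f"LOAD DATA KAFKA '{broker}/{topic}'\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ','"
    )
    start_pipeline = f"START PIPELINE {table}_pipeline"
    return [create_table, create_pipeline, start_pipeline]


for statement in pipeline_ddl():
    print(statement + ";\n")
```

Once started, a pipeline of this kind keeps pulling from the topic on its own, so nothing else has to run between Kafka and the table.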
Step 3: Push to Kafka
▪ Push that data set to Kafka
▪ Validate exactly-once delivery by querying MemSQL
• show tables;
• show pipelines;
• select sum(amount) from transactions;
  ▪ Should be 0 in the demo
• select count(*) from transactions;
  ▪ Should be 200,000 in the demo
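The two checks can be wrapped in a single helper. The sketch below uses Python's built-in `sqlite3` as a stand-in for MemSQL so that it runs anywhere; that substitution, the function name `validate_exactly_once`, and the four-row sample are assumptions for illustration. Against the real demo, the same two aggregates would be run over the `transactions` table with an expected count of 200,000.

```python
import sqlite3


def validate_exactly_once(conn, expected_count):
    """Return True when the table sums to zero and holds the expected row count."""
    total, count = conn.execute(
        "SELECT COALESCE(SUM(amount), 0), COUNT(*) FROM transactions"
    ).fetchone()
    return total == 0 and count == expected_count


# Tiny stand-in data set: four rows whose amounts cancel out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_no INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 50), (2, -50), (3, 7), (4, -7)],
)
ok_before = validate_exactly_once(conn, expected_count=4)

# Simulate a duplicated delivery of transaction 3.
conn.execute("INSERT INTO transactions VALUES (3, 7)")
ok_after = validate_exactly_once(conn, expected_count=4)

print(ok_before, ok_after)
# prints: True False
```

A duplicate pushes the sum away from zero and the count above the target; a lost row does the same in the other direction, so either failure is visible from these two numbers alone.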
46
Step 4: Resiliency
▪ Induce failures to show resiliency during exactly-once
workflows
a. randomly_fail_batches.py
b. restart Kafka and show error count
c. continue and validate exactly-once semantics
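The idea behind this step can be sketched without Kafka or MemSQL: load data in batches, make some batches fail at random, and commit each batch's rows together with the next read offset in one transaction so that a retry never double-loads. The code below is not the demo's `randomly_fail_batches.py`; it is a stand-in that uses `sqlite3`, a Python list in place of a topic, and a hypothetical function `run_pipeline`.

```python
import random
import sqlite3


def run_pipeline(rows, batch_size=100, fail_rate=0.3, seed=1):
    """Load rows in batches with injected failures; return (connection, error count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (txn_no INTEGER PRIMARY KEY, amount INTEGER)")
    conn.execute("CREATE TABLE offsets (next_offset INTEGER)")
    conn.execute("INSERT INTO offsets VALUES (0)")
    conn.commit()
    rng = random.Random(seed)
    errors = 0
    while True:
        (offset,) = conn.execute("SELECT next_offset FROM offsets").fetchone()
        if offset >= len(rows):
            break
        batch = rows[offset:offset + batch_size]
        try:
            with conn:  # rows and offset commit together, or not at all
                conn.executemany("INSERT INTO transactions VALUES (?, ?)", batch)
                if rng.random() < fail_rate:
                    raise RuntimeError("injected batch failure")
                conn.execute("UPDATE offsets SET next_offset = ?", (offset + len(batch),))
        except RuntimeError:
            errors += 1  # rollback discarded the batch; retry from the same offset
    return conn, errors


# Alternating +1 / -1 amounts sum to zero over an even number of rows.
rows = [(i, 1 if i % 2 else -1) for i in range(1, 1001)]
conn, errors = run_pipeline(rows, batch_size=10)
total, count = conn.execute("SELECT SUM(amount), COUNT(*) FROM transactions").fetchone()
print("errors:", errors, "count:", count, "sum:", total)
```

The error count grows with every injected failure while the row count and sum stay at their expected values, which mirrors the Errors, Total Transactions, and Sum readings the demo tracks.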
48
Errors
Total
Transactions
Sum
The mission is clear:
We’re moving
from batch to real-time
with streaming
Thank You
Darryl Smith
Chief Data Platform Architect and Distinguished Engineer
Dell Technologies
Stream Processing with Pipelines and Stored Procedures
SingleStore900 views
Curriculum Associates Strata NYC 2017 by SingleStore
Curriculum Associates Strata NYC 2017Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017
SingleStore1.8K views
Image Recognition on Streaming Data by SingleStore
Image Recognition  on Streaming DataImage Recognition  on Streaming Data
Image Recognition on Streaming Data
SingleStore605 views
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition by SingleStore
Spark Summit Dublin 2017 - MemSQL - Real-Time Image RecognitionSpark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
SingleStore843 views
The State of the Data Warehouse in 2017 and Beyond by SingleStore
The State of the Data Warehouse in 2017 and BeyondThe State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and Beyond
SingleStore1.1K views
How Database Convergence Impacts the Coming Decades of Data Management by SingleStore
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data Management
SingleStore628 views
Teaching Databases to Learn in the World of AI by SingleStore
Teaching Databases to Learn in the World of AITeaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AI
SingleStore901 views
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud by SingleStore
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid CloudGartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
SingleStore3K views
Gartner Catalyst 2017: Image Recognition on Streaming Data by SingleStore
Gartner Catalyst 2017: Image Recognition on Streaming DataGartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming Data
SingleStore482 views
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark by SingleStore
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
SingleStore490 views
Real-Time Analytics at Uber Scale by SingleStore
Real-Time Analytics at Uber ScaleReal-Time Analytics at Uber Scale
Real-Time Analytics at Uber Scale
SingleStore23.3K views

Recently uploaded

Customer Data Cleansing Project.pptx by
Customer Data Cleansing Project.pptxCustomer Data Cleansing Project.pptx
Customer Data Cleansing Project.pptxNat O
6 views23 slides
Lack of communication among family.pptx by
Lack of communication among family.pptxLack of communication among family.pptx
Lack of communication among family.pptxahmed164023
14 views10 slides
shivam tiwari.pptx by
shivam tiwari.pptxshivam tiwari.pptx
shivam tiwari.pptxAanyaMishra4
5 views14 slides
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P... by
[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...DataScienceConferenc1
8 views36 slides
[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f... by
[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f...[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f...
[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f...DataScienceConferenc1
5 views18 slides
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ... by
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...DataScienceConferenc1
5 views19 slides

Recently uploaded(20)

Customer Data Cleansing Project.pptx by Nat O
Customer Data Cleansing Project.pptxCustomer Data Cleansing Project.pptx
Customer Data Cleansing Project.pptx
Nat O6 views
Lack of communication among family.pptx by ahmed164023
Lack of communication among family.pptxLack of communication among family.pptx
Lack of communication among family.pptx
ahmed16402314 views
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P... by DataScienceConferenc1
[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f... by DataScienceConferenc1
[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f...[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f...
[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt f...
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ... by DataScienceConferenc1
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
Listed Instruments Survey 2022.pptx by secretariat4
Listed Instruments Survey  2022.pptxListed Instruments Survey  2022.pptx
Listed Instruments Survey 2022.pptx
secretariat452 views
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion by Bertram Ludäscher
Games, Queries, and Argumentation Frameworks: Time for a Family ReunionGames, Queries, and Argumentation Frameworks: Time for a Family Reunion
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion
Data Journeys Hard Talk workshop final.pptx by info828217
Data Journeys Hard Talk workshop final.pptxData Journeys Hard Talk workshop final.pptx
Data Journeys Hard Talk workshop final.pptx
info82821711 views
Dr. Ousmane Badiane-2023 ReSAKSS Conference by AKADEMIYA2063
Dr. Ousmane Badiane-2023 ReSAKSS ConferenceDr. Ousmane Badiane-2023 ReSAKSS Conference
Dr. Ousmane Badiane-2023 ReSAKSS Conference
AKADEMIYA20635 views
Best Home Security Systems.pptx by mogalang
Best Home Security Systems.pptxBest Home Security Systems.pptx
Best Home Security Systems.pptx
mogalang9 views
CRM stick or twist workshop by info828217
CRM stick or twist workshopCRM stick or twist workshop
CRM stick or twist workshop
info82821714 views
Ukraine Infographic_22NOV2023_v2.pdf by AnastosiyaGurin
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdf
AnastosiyaGurin1.4K views
[DSC Europe 23] Ivan Dundovic - How To Treat Your Data As A Product.pptx by DataScienceConferenc1
[DSC Europe 23] Ivan Dundovic - How To Treat Your Data As A Product.pptx[DSC Europe 23] Ivan Dundovic - How To Treat Your Data As A Product.pptx
[DSC Europe 23] Ivan Dundovic - How To Treat Your Data As A Product.pptx
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo... by DataScienceConferenc1
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...

Getting It Right Exactly Once: Principles for Streaming Architectures

  • 1. Getting It Right Exactly Once: Principles for Streaming Architectures Darryl Smith, Chief Data Platform Architect and Distinguished Engineer, Dell Technologies September 2016 | Strata+Hadoop World, NY
  • 2. Getting Started. I’m Darryl Smith, Chief Data Platform Architect and Distinguished Engineer, Dell Technologies. Agenda: Real-Time and the Need for Streaming; Adding Real-Time and Streaming to the Data Lake; Results, Plans, Lessons Learned; Demonstration.
  • 3. Trickle, Flood, or Torrent… Streaming is about continuous data motion, more than speed or volume.
  • 4. The Conversation Around Streaming: Website and Mobile Application Logs; Internet of Things Sensors.
  • 5. The Enterprise Reality: Batch > Real-Time > Streaming. Enterprise Opportunities; Immediate Business Advantage. Website and Mobile Application Logs; Internet of Things Sensors.
  • 6. The Enterprise Streaming Play: Moving from batch to real-time streams avoids surges, normalizes compute, and drives value.
  • 7. Real Time and the Need for Streaming
  • 8. Analytics Vision: Drive DellEMC towards a Predictive Enterprise via intelligent data driving agility, increasing revenue and productivity, resulting in a competitive advantage.
  • 9. Becoming An Analytical Enterprise. Need to use new data for competitive advantage (Volume, Variety and Velocity). Leverage near-real-time and streaming data sets to optimize predictions (make faster, better decisions). Cost-effectively scale to improve query and load performance. Put the data in the hands of the business. Diagram labels: Drive Competitive Advantage; Cost-Effectively Scale; Data Access by Business; Near Real-Time Analytics.
  • 10. Scoping The Business Objectives. Problem Statement: Teams do not have access to maintenance renewal quotes in the timeframes or the degree of quality which they need for Tech Refresh and Renewal sales. Desired Outcome: Implement a cost-effective, real-time solution that improves productivity and gives confidence to produce desired outcomes efficiently.
  • 11. Business Drivers. To realize this vision: implement CALM solution phases and optimize business processes. Current reality → vision for the future: high-touch tactical execution → low-touch self service; date-driven processes → business-value-driven processes; inefficiencies and lost productivity → increased productivity; siloed data / limited views → single view of data / data scoring; variable data quality → data quality and confidence.
  • 12. The Need for “CALM”: Customer Asset Lifecycle Management. For enterprise sales who need accurate and timely customer information, CALM is a real-time application providing up-to-the-moment customer 360° dashboards. Diagram labels: Install Base; Pricing; Device Config; Contacts; Contracts; Analytics; Contracts; Component Data; Offers; Scorecard.
  • 13. Data Lake Architecture (diagram). Labels: Data Platform; VMware vCloud Suite; Execution; Process; Greenplum DB; Spring XD; Pivotal HD; GemFire; Hadoop; Ingestion; Data Governance; Cassandra; PostgreSQL; MemSQL; HDFS on Isilon; Hadoop on ScaleIO; VCE Vblock/VxRack | XtremIO | Data Domain; Analytics Toolbox; Network; Web; Sensor; Supplier; Social Media; Market; Structured; Unstructured; CRM; PLM; ERP; Applications; Apache Ranger; Attivio; Collibra; Real-Time; Micro-Batch; Batch.
  • 14. Business Data Lake Offerings. Data Ingestion: small to big data (high-throughput); structured and unstructured data from any source; streams and batches; secure, multi-tenant, configurable framework. Real-Time Analytics: tap into streams for in-memory analytics; real-time data insights and decisions. Services: data ingestion to data lake; data lake APIs; data alerting. Diagram labels: Unstructured; Structured.
  • 15. Adding Real Time and Streaming to the Data Lake
  • 16. Seeking A Fast Database: A complement to the business data lake. O P C M
  • 17. HammerDB Platform Benchmarks. HammerDB workload testing was done following the standard practices of EMC’s Oracle and SQL Server DBA teams. Definition of workload, a mix of 5 transactions: New order (receive a new order from a customer): 45%; Payment (update the customer balance to record a payment): 43%; Delivery (deliver orders asynchronously): 4%; Order status (retrieve the status of the customer’s most recent order): 4%; Stock level (return the status of the warehouse’s inventory): 4%. Testing scenario: 100 warehouses, 8 vUsers; database creation and initial data loading; timed testing, 20 minutes per testing session; number of virtual users scaled from 1 to 44 across testing sessions. No changes were made to the system or database configuration while running the tests.
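The transaction mix on the benchmark slide can be pictured as a weighted random choice over five transaction types. The following is a minimal sketch of that idea only; it is not HammerDB's code, and the names `MIX` and `sample_workload` are hypothetical.

```python
import random
from collections import Counter

# Transaction mix as listed on the slide (percent of total workload).
MIX = {
    "new_order": 45,
    "payment": 43,
    "delivery": 4,
    "order_status": 4,
    "stock_level": 4,
}


def sample_workload(n, seed=0):
    """Draw n transaction types according to MIX and count each type."""
    rng = random.Random(seed)
    names = list(MIX)
    weights = [MIX[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n))


if __name__ == "__main__":
    print(sample_workload(10_000))
```

Because new-order and payment transactions make up 88% of the mix, the workload is dominated by writes, which is the behavior the benchmark is meant to stress.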
  • 18. HammerDB Workload Testing. Each test was 16 vCPU x 32 GB RAM. Configurations: RedHat 6.4 with Oracle 11g R2; Windows Core 2012 R2 with SQL Server 2012 Ent Ed.; RedHat 6.4 with PostgreSQL 9.3.3.
  • 19. HammerDB Workload - Results
  • 20. PostgreSQL vs In-Memory DB. We picked the top 5 queries run by different business functions. Presented here are 3 queries whose response times did not meet the SLA. Opportunity (5K): PostgreSQL 5 seconds, MemSQL 200 ms. Sales Order (170K): PostgreSQL 1-1.5 minutes, MemSQL 6 seconds. Territory (60K): PostgreSQL 60 seconds, MemSQL 5 seconds.
  • 21. Business Data Lake – Ingestion to Fulfillment (diagram). Labels: Raw Data; Summary Data; Data Governor; Consumers; Predictive/Prescriptive Analytics; Processed Data; Analytical Data; Greenplum Database; Hadoop; Raw Data; Ingest Manager; Spring XD; Spark; Sqoop; Execution Tier; Cassandra; GemFire; MemSQL; PostgreSQL; Real-Time Tap.
  • 22. Here Are The Data Flows We Built: Low Velocity; Batch; Real-Time.
  • 23. Data Flow Patterns – Low Velocity (diagram). Labels: Analytical [BATCH]; Ingestion; Data Service; JDBC; Application; Presentation [SPEED/SERVING]; Greenplum Database; Pivotal HD; PostgreSQL; MemSQL; Raw Data; One-Time; Cassandra; GemFire.
  • 24. Data Flow Patterns – Batch (diagram). Labels: Analytical [BATCH]; Ingestion; Data Service; JDBC; Application; Greenplum Database; Pivotal HD; Batch; Presentation [SPEED/SERVING]; PostgreSQL; MemSQL; Cassandra; GemFire.
  • 25. Data Flow Patterns – Real Time (diagram). Labels: Real-time; Initial Load; Analytical [BATCH]; Ingestion; Data Service; JDBC; Application; Greenplum Database; Pivotal HD; Presentation [SPEED/SERVING]; PostgreSQL; MemSQL; Cassandra; GemFire.
  • 26. Nothing Closer To Real Time Than Streaming. Let’s look at the leading edge: Apache Kafka. Messaging semantics: at most once; at least once; exactly once.
  • 28. At least once (diagram)
  • 30. Understanding Streaming Semantics. At most once: message pulled once; may or may not be received; no duplicates; possible missing data. At least once: message pulled one or more times, processed each time; receipt guaranteed; likely duplicates; no missing data. Exactly once: message pulled one or more times, processed once; receipt guaranteed; no duplicates; no missing data.
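The three semantics in the comparison above can be illustrated with a small simulation. This is a sketch under stated assumptions, not Kafka's or MemSQL's implementation: the `deliver` function, the crash model (a consumer fails with probability `fail_rate` around each message), and de-duplication by message ID are all hypothetical choices made for illustration.

```python
import random


def deliver(messages, mode, fail_rate=0.3, seed=7):
    """Simulate one consumer reading (msg_id, payload) pairs under a delivery mode.

    mode is "at_most_once", "at_least_once", or "exactly_once".
    Returns the list of payloads that were actually processed downstream.
    """
    rng = random.Random(seed)
    processed = []      # side effects visible downstream
    seen_ids = set()    # consulted only in exactly-once mode
    for msg_id, payload in messages:
        if mode == "at_most_once":
            # The offset is committed before processing, so a crash loses the message.
            if rng.random() < fail_rate:
                continue
            processed.append(payload)
        else:
            # The offset is committed after processing, so a crash forces a re-read.
            while True:
                crashed = rng.random() < fail_rate
                if mode == "at_least_once" or msg_id not in seen_ids:
                    processed.append(payload)
                    seen_ids.add(msg_id)
                if not crashed:
                    break
    return processed


if __name__ == "__main__":
    msgs = [(i, i) for i in range(1000)]
    for mode in ("at_most_once", "at_least_once", "exactly_once"):
        print(mode, len(deliver(msgs, mode)))
```

In this model, at-most-once can drop messages, at-least-once can process a message more than once, and exactly-once re-reads after a crash but skips anything already processed.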
  • 31. Rendering In Real Time. Picking the right business intelligence layer: Tableau; Custom Application (CF, D3, Docker); Additional Third Party Solutions.
  • 33. Business Benefits. Data Querying: down from 4 hours per quarter to less than 1 minute per year. Simplified Provisioning: reduced number of tables/reports required. Data Governance: provides one version of the truth. Time to Market: reduced number of tables/reports required. Tool Agnostic: business logic in the DB, not the tool, provides increased flexibility.
  • 34. Use Case: Customer Account Profile. Streamlined analytics environment to gain a holistic customer view. Diagram labels: Service Request; Contracts; Installed Base; Bookings; Billings; Prof Services; EMC Data Lake; BDL Services; Data Workspaces; Data Ingestion; 23 Business Managed Workspaces.
  • 35. Customer Asset Lifecycle Management Platform Roadmap. Phase 1: Foundational Capabilities/Discovery. Phase 2: Scale Platform / Automate. Future Phases: Global Standard tool integrations, advanced analytics. Diagram labels: BAaaS/Tableau; Scalable Platform; Integrated Platform; GBS Renewals; Inside Sales; Additional Business groups; Aug 2015; Oct 2015; 2016; TBD; BDL Platform Enablement; Collaboration; Acceleration; In-Memory Capabilities (POC); We are here.
  • 36. Business Data Lake Plans. Data Services Roadmap. Security: planned integration into custom BDL security API for managing Role Based Access Control (RBAC) to the underlying data.
  • 37. Lessons Learned – Key Takeaways. Educate: educate the business; use examples of business impact. Assess: assess in-house big data skills; ensure plan to support the organization for 3-5 years. Infrastructure: choose the best possible infrastructure; make sure your Big Data technology platform can evolve. Journey: remember it is a journey; look for small wins as well as big wins.
  • 38. Lessons Learned: Analytics and Data. Sourcing the right skills, working with a different philosophy, and some new tools will help you meet your analytical goals. Themes: Transform Your People; Change Your Processes; Adapt Your Technology. Points: Data science in the organization, IT or both? Helping business units take initiative. New philosophy to running analytics projects. How and when to share data. Steadily refine toolsets based on needed analysis. Identify to infrastructure layers.
  • 40. Demo Agenda: Showcase exactly-once semantics from Kafka. 1: Data set of 200,000 transactions summing to zero. 2: CREATE TABLE and CREATE PIPELINE. 3: Push to Kafka and confirm exactly-once. 4: Validate resiliency and confirm exactly-once.
  • 41. Step 1: Data Source. Start with a data set of 200,000 transactions representing money/goods that sum to zero.
  • 42. 200,000 transactions, each with: transaction number; increase/decrease; amount.
  • 43. Step 2: CREATE TABLE and CREATE PIPELINE. Create a table and pipeline in MemSQL that subscribes to the Kafka topic.
  • 45. Step 3: Push to Kafka. Push that data set to Kafka, then validate exactly-once delivery by querying MemSQL: show tables; show pipelines; select sum(amount) from transactions; (should be 0 in the demo) select count(*) from transactions; (should be 200,000 in the demo).
  • 47. Step 4: Resiliency. Induce failures to show resiliency during exactly-once workflows: (a) randomly_fail_batches.py; (b) restart Kafka and show error count; (c) continue and validate exactly-once semantics.
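The demo's logic (a zero-sum data set, batches that fail at random, and a final check that the sum is 0 and the count is 200,000) can be sketched end to end in a few lines. This is an illustration under stated assumptions, not the demo itself: it uses Python's built-in `sqlite3` as a stand-in for MemSQL, a plain list as a stand-in for the Kafka topic, and hypothetical helpers `make_transactions` and `load_exactly_once`. It is not the content of `randomly_fail_batches.py`, which the slides do not show. The technique shown, committing each batch and its read offset in the same database transaction, is one common way to get exactly-once results; the slides do not say how the pipeline achieves it internally.

```python
import random
import sqlite3


def make_transactions(n, seed=1):
    """Build n (txn_id, amount) rows whose amounts sum to zero (n must be even)."""
    rng = random.Random(seed)
    rows = []
    for i in range(0, n, 2):
        amount = rng.randint(1, 1000)
        rows.append((i, amount))        # increase
        rows.append((i + 1, -amount))   # matching decrease
    return rows


def load_exactly_once(rows, batch_size=1000, fail_rate=0.2, seed=2):
    """Load rows in batches; each batch and its offset commit in one transaction."""
    rng = random.Random(seed)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE transactions (txn_id INTEGER, amount INTEGER)")
    db.execute("CREATE TABLE offsets (id INTEGER PRIMARY KEY, next_offset INTEGER)")
    db.execute("INSERT INTO offsets VALUES (0, 0)")
    db.commit()
    failures = 0
    while True:
        (offset,) = db.execute("SELECT next_offset FROM offsets").fetchone()
        if offset >= len(rows):
            break
        batch = rows[offset:offset + batch_size]
        try:
            db.executemany("INSERT INTO transactions VALUES (?, ?)", batch)
            if rng.random() < fail_rate:
                raise RuntimeError("simulated failure mid-batch")
            db.execute("UPDATE offsets SET next_offset = ?", (offset + len(batch),))
            db.commit()
        except RuntimeError:
            db.rollback()   # discards the partial batch; the stored offset is unchanged
            failures += 1
    return db, failures


if __name__ == "__main__":
    db, failures = load_exactly_once(make_transactions(200_000))
    print("failed batches:", failures)
    print("sum:", db.execute("SELECT SUM(amount) FROM transactions").fetchone()[0])
    print("count:", db.execute("SELECT COUNT(*) FROM transactions").fetchone()[0])
```

A zero-sum data set makes the check cheap: any lost or duplicated batch is likely to shift the count, and usually the sum as well, so the two queries together act as a quick integrity test after the induced failures.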
  • 50. The mission is clear: We’re moving from batch to real-time with streaming
  • 51. Thank You. Darryl Smith, Chief Data Platform Architect and Distinguished Engineer, Dell Technologies