1© StreamSets, Inc. All rights reserved.
Introduction to DataOps
Modern Streaming Data Stack
with Kinetica & StreamSets
In-Car Sensors (Edge)
Diagnostic Analytics (BI, AI)
Streaming
Race-track Cluster (On-Prem)
Predictive Analytics (BI, AI)
Streaming
HQ (Cloud Compute)
Descriptive (BI) & Prescriptive
Analytics (AI/ML)
Batch & Streaming
Modern IT Imperative: Pervasive Intelligence
Any technique, Any place, Any speed
Pervasive Intelligence Hindered by Data Drift
Traditional Data Integration cannot deal with Data Drift!
1. Ungoverned, unstructured upstream data sources.
2. Evolving downstream multi-cloud, multi-platform infrastructure.
3. Numerous systems acting midstream on the data without change management.
Data drift (noun) - unexpected and frequent changes to data structure and semantics caused by
normal business operations that break data pipelines and pollute data.
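As a rough illustration of the definition above, structural drift can be pictured as a mismatch between the fields a pipeline expects and the fields a record actually carries. The following is a minimal sketch only: `detect_drift`, the schema, and the sample field names are hypothetical and are not part of any StreamSets API.

```python
def detect_drift(expected, record):
    """Compare one record against an expected {field_name: type} schema.

    Returns the names of fields that were added, went missing,
    or changed type relative to the expected schema.
    """
    added = sorted(set(record) - set(expected))
    missing = sorted(set(expected) - set(record))
    changed = sorted(
        name
        for name in set(expected) & set(record)
        if not isinstance(record[name], expected[name])
    )
    return {"added": added, "missing": missing, "changed": changed}


# Hypothetical schema for in-car sensor readings (an assumption for illustration).
expected_schema = {"car_id": int, "speed_kph": float, "lap": int}

# A record after an unannounced upstream change: car_id became a string,
# lap disappeared, and a new tire_temp_c field showed up.
drifted_record = {"car_id": "42", "speed_kph": 212.5, "tire_temp_c": 96.0}

report = detect_drift(expected_schema, drifted_record)
print(report)
```

A pipeline that hard-codes the expected schema would fail or silently drop data on such a record; reporting the difference instead is one way to surface the change.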
Traditional Data Integration Technology
Rigid, opaque, brittle pipelines
DataOps Technology
Powered by built-in Dataflow Sensors
Throughput
Processing Time
Memory Consumption
Error Handling
PII Detection
Drift Detection
Challenges with Data Ingest
Not all developers are created equal
Integrations are abundant and unnecessarily rigid
Build-to-deploy takes far longer than necessary
Challenges with Managing Dataflow Pipelines at Scale
Coordinating
efforts across
teams
Deploying and
controlling
pipelines at
scale
Mounting
regulatory and
compliance
pressures
StreamSets Control Hub
Disciplined management for dataflow architectures
Collaborative development, automated deployment, scalable
execution and governance of batch and streaming pipelines
Collaborative
pipeline design
and shared
repository
Automate pipeline
deployment
End-to-end lineage
and support for
data governance
Architecture-wide
visibility and
control
What does the StreamSets DataOps Platform do?
Data Lake
How does StreamSets Help?
Operationalization of data in motion
Build
Simplify development
cycles and build dataflow
pipelines in minutes, not
days or months
Execute
Deploy and execute
when and where you
want to optimize the
economics of your
architecture
Operate
Architectures are
constantly changing and
have more stringent
SLAs
Protect
Data must be protected
in flight, not just at rest
Proven Success
“We chose StreamSets over NiFi as our enterprise-wide
standard for our next generation data lake infrastructure
because of their singular focus on solving deployment and
operations challenges.”
“StreamSets allowed us to build and operate over
175,000 pipelines and synchronize 97% of our
structured data in R&D to our Data Lake within 4
months. This will save us billions of dollars.”
“It’s simple and easy enough that we don’t need to find
a StreamSets developer to create their own data
pipelines. Before, it could take 90 days just to find a
traditional ETL developer.”
StreamSets DataOps Platform
Control Hub
Enterprise-Grade
Build, Test, Deploy
BUILD OPERATE
Dataflow Performance Manager
Data Quality and Availability SLAs
Data Collector
Drift Handling
Engineering Productivity
Cloud and on-premise
Data Collector Edge
Devices, Mobile
Dataflow Sensors
Data Protector
Data Security SLAs
Open Source
Unpredictable Data
Complexity of Analysis
EXTREME DATA
Data-Powered Business
BIG DATA
Data-Informed Business
TRADITIONAL
DATA
Data-Validated Business
THE WORLD OF DATA
HAS NEW DIMENSIONS
(Volume, Variety & Velocity)
COMPANY OVERVIEW
100+ employees
worldwide
Exponential
growth
$50M Series A
Top Tier Investors
CORE CONCEPTS
GPU-Accelerated Memory-first Columnar Database
KINETICA CORE DIFFERENTIATION
Location-based
Analysis, Rendering,
Discovery & Insights
Data-driven,
Streamlined Machine
Learning
BREAKTHROUGH
SPEED WITH +
Advanced Analytics on
Extreme Data:
Static & Streaming
INSIDE KINETICA
OLAP
optimized
Native Geospatial
Datatypes & Functions
Distributed,
Linear Scale Out
User Defined
Functions (UDFs) &
Models
Native REST API Full Text
Search
SQL92
Visual
Dashboards
Ecosystem
Connectors
1 NODE (1TB/2GPU)
PARALLEL
INGEST
1 NODE (1TB/2GPU)
1 NODE (1TB/2GPU)
Each node of the system can share the task of data ingest,
providing more and faster throughput. Ingest can always be
made faster simply by adding more nodes.
PARALLEL INGEST
PROVIDES HIGH
PERFORMANCE
STREAMING
[Chart: timing comparison against a leading in-memory DB and a NoSQL DB; bar values: 150s, 753s, 1029s]
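The parallel-ingest idea on this slide (every node takes a share of the incoming records) can be sketched roughly as follows. This is an illustration under stated assumptions, not Kinetica's implementation or API: `shard_for` and `parallel_ingest` are hypothetical names, the CRC32-based routing is an assumed sharding scheme, and plain in-memory lists stand in for the nodes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(key, node_count):
    """Map a record key to a node index using a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def parallel_ingest(records, node_count):
    """Split records by shard key, then 'ingest' every shard concurrently.

    Each list in `shards` stands in for one node's storage.
    """
    shards = [[] for _ in range(node_count)]
    for record in records:
        shards[shard_for(record["key"], node_count)].append(record)

    def ingest(shard):
        # Placeholder for a per-node write; here it only counts rows.
        return len(shard)

    # One worker per node, so no node waits on another node's share.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        counts = list(pool.map(ingest, shards))
    return shards, counts


# Hypothetical sample data: 1000 sensor readings keyed by car.
records = [{"key": f"car-{i}", "value": i} for i in range(1000)]
shards, counts = parallel_ingest(records, node_count=3)
print(counts)
```

In this sketch, adding nodes shrinks each node's share of the records, which is the intuition behind the slide's scale-out claim; real throughput gains depend on the network, the storage layer, and how evenly the keys distribute.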
KINETICA & STREAMSETS IN YOUR ECOSYSTEM
ETL / STREAM
PROCESSING
SQL
Native
APIs
PARALLEL
INGEST
Geospatial
WMS
Custom
Connectors
BI DASHBOARDS
BI / GIS / APPS
CUSTOM APPS
& GEOSPATIAL
KINETICA ‘REVEAL’
STREAMING DATA
UDFs
ON DEMAND SCALE OUT +
Built-in Machine Learning
CUSTOM
LOGIC
BIDMach
ERP / CRM /
TRANSACTIONAL
BUILT FOR BUSINESS USERS & DATA SCIENTISTS
MACHINE
LEARNING
MASSIVE
PARALLEL
COMPUTING
CUSTOM
APPLICATIONS
GEOSPATIAL
VISUALIZATION
STREAMING DATA
ANALYSIS
ADVANCED
ANALYTICS
BUSINESS USERS
DATA SCIENTISTS / DEVELOPERS
StreamSets Demonstration
StreamSets Dataflow Architecture for Kinetica
IoT
Gateways
StreamSets
Data Collector
StreamSets
Data Collector
StreamSets Data
Collector Edge
Kubernetes-based elastic
deployment and scale out
Multi-head ingest
SQL
DEVELOPER
FRONT-END
USER
DATA
SCIENTIST
PERSONAS
Kinetica Demonstration
Thank you
Learn more:
www.kinetica.com/partner/streamsets
Twitter: @KineticaHQ
Email: info@kinetica.com
www.streamsets.com
Twitter: @StreamSets
Email: info@streamsets.com

Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
