SlideShare a Scribd company logo
1 of 20
Download to read offline
Director Product Management
Unified Batch & Stream Processing
Platform
Himanshu Bari
© 2015 DataTorrent Confidential – Do Not Distribute
Most Big Data Use Cases Are About
Improving/Re-write EXISTING
solutions To KNOWN problems…
© 2015 DataTorrent Confidential – Do Not Distribute
Current Solutions Were Built On
A. Imperfect information
B. Expensive s/w & h/w infrastructure
C. Relational data stores
© 2015 DataTorrent Confidential – Do Not Distribute
Inevitable Course of the Re-write
Specialized solutions
Near perfect information
Commodity hardware Open source software
Next gen data management platform
HadoopNoSQL SearchGraph
EDW +
RDBMS
In Memory Processing
© 2015 DataTorrent Confidential – Do Not Distribute
Data Processing Categories in Big Data Use Cases
Known Unknown
Questions known?
Data
Velocity
Static
Streaming
Batch
Processing
Stream
Processing
Ad-hoc
Query
N/A
© 2015 DataTorrent Confidential – Do Not Distribute
Every Batch Process *Could* Have Been A Stream Process
• Every ‘Static’ data point was ‘Streaming’ at some point
• We choose to wait and collect a bunch of data points and then process them at
once in ‘Batch Mode’
• Move processing time closer to the data generation or ‘Event’ occurrence time
• Reduces time to insight and allows you to be proactive with timely actions
© 2015 DataTorrent Confidential – Do Not Distribute
But We Still Need Batch
• Historical data analysis
• What-if analysis
• Experimentation
• Data re-statement
• Transaction processing and re-conciliation
• Audit
• Machine learning model training
• ……..
© 2015 DataTorrent Confidential – Do Not Distribute
YARN
HDFS
Need A Unified Platform
Enterprise Repositories
RDBMS
EDW
NoSQL Graph
Search
Other…
Any Notification Engine
Any BPM Engine
Stale Data
Change
Data
‘Fresh’
Streaming
Data
Unified Batch & Stream Processing
Engine to execute data pipelines
• Streaming only pipelines
OR
• Batch processing only pipelines
OR
• Dual Batch & Stream processing
pipelines
© 2015 DataTorrent Confidential – Do Not Distribute
DataTorrent RTS Provides A Unified Batch & Streaming Platform
Graphical Application Assembly Real-Time Data Visualization
Re-Usable Java Operator Library
Apache 2.0 licensed – Project Malhar
Scalable, High Performance, Fault Tolerant In-Memory Data Processing Platform
Apache 2.0 licensed - Project Apex
Hadoop 2.0 —YARN + HDFS
Physical Virtual Cloud
Transform
Normalize
Descriptive &
Predictive
Analytics
Alert
Action
Management&Monitoring
Unified Batch & Streaming Data Ingestion & Distribution Pipeline for
Hadoop
© 2015 DataTorrent Confidential – Do Not Distribute
DataTorrent Project Apex- Unified Batch & Stream Processing Engine
In-memory compute engine1
Hadoop native (YARN +
HDFS) architecture2
Sub-second event
processing3
Event order guarantees4
Flexible stream partitioning
mechanism5
Stateful fault tolerance with
NO data loss
6
Automatic & incremental
recovery
7
Rolling & tumbling
windows
8
Processing guarantees 9
Ability to update application
without downtime
10
© 2015 DataTorrent Confidential – Do Not Distribute
Unified Data Ingestion & Distribution Pipeline
• FTP, S3 etc.
• Kafka & JMS
• Change data capture
Input & Output
Variety
• Overcome HDFS small file
problem
• Skew management
Tackle data size
& speed
fluctuations
• Parameters like filtering criteria,
bandwidth utilization & polling
interval should be updateable at
runtime
Run-time
updates
•Graphical UI & API
•End to end metrics
Simple to
build &
Operate
• Easily extend and insert
operations for data preparation
Easily
customizable
•Runs within the Hadoop cluster
over YARN & HDFS
Hadoop
Native
© 2015 DataTorrent Confidential – Do Not Distribute
Hadoop Data Ingestion & Distribution Application
© 2015 DataTorrent Confidential – Do Not Distribute
Data Prep & Analytics Layer Requirements
• Truly scale horizontally across the Hadoop cluster
• Pre-built operators
ᵒ Re-ordering
ᵒ Normalization
ᵒ Transpose
ᵒ De-duplication
ᵒ Tagging
ᵒ Filtering
ᵒ Enrichment
• Operators work seamlessly in both streaming & batch mode
ᵒ Data local HDFS read & process
ᵒ Ability to do computations on time window as well as file boundaries
• Ability to re-use existing business logic
• Simple workflow & scheduling capabilities
ᵒ Built-in or integrations with Oozie or other schedulers
© 2015 DataTorrent Confidential – Do Not Distribute
Development API Requirements
• Consistent between Streaming & Batch pipelines
• No mapreduce
• No exposure of low level processing engine concepts
• Easily extendible
© 2015 DataTorrent Confidential – Do Not Distribute
Malhar Operator Library Overview
Malhar Operator Library
Input / Output Connectors
Compute Operators
Parsers
Stream
Manipulators
Query &
Scripting
Stats & MathPattern Matching
Machine Learning &
Algorithms
File Systems RDBMS NoSQL Messaging
Notification
In Memory
Databases
Social Media
Protocol
read/write
© 2015 DataTorrent Confidential – Do Not Distribute
Visual Application Assembly
• Simple to extend
ᵒ Simple API to enable any existing DataTorrent operator
ᵒ Ability to plug any business logic using a custom operator
• Easy to Use
ᵒ Web based drag-n-drop
development environment
ᵒ Automatic port compatibility
validation
ᵒ Simple schema management
ᵒ Generic property
configurator
• Easy to Operate
ᵒ No external component
dependency - Runs natively
in Hadoop
ᵒ Integrated with DataTorrent
management platform
© 2015 DataTorrent Confidential – Do Not Distribute
Streaming or Batch Data Processing Visualization
• Intuitive
user interface
ᵒ Auto-generate or custom create
ᵒ One dashboard for multiple apps
ᵒ Supports bar, line, pie, area
charts & data tables
• Easy to Operate
ᵒ No external component
dependency - Runs natively
in Hadoop
ᵒ Integrated with DataTorrent
management platform
• Simple to extend
ᵒ Any DAG operator can be
made a real-time datasource
© 2015 DataTorrent Confidential – Do Not Distribute
Summary
• World is moving from ‘Batch’ to
‘Streaming’ BUT both are required
• Need a Hadoop native in memory
compute engine that is scalable &
fault tolerant in BOTH batch &
streaming modes
• With out -of-the box data prep &
analytics operators
• Using a consistent & functional
development API
• Operationalized through a common
management & monitoring layer
Project Apex
https://www.datatorrent.com/product/project
-apex/
DataTorrent RTS Sandbox
https://www.datatorrent.com/download/
Graphical Application Assembly Real-Time Data Visualization
Malhar Apache 2.0 license Re-Usable Java Operator Library
Scalable, High Performance, Fault Tolerant In-Memory Data Processing Platform
Apache 2.0 licensed - Project Apex
Hadoop 2.0 —YARN + HDFS
Physical Virtual Cloud
Transform
Normalize
Descriptive &
Predictive
Analytics
Alert
Action
Management&Monitoring
Unified Batch & Streaming Data Ingestion & Distribution Pipeline
for Hadoop
Big Data. Big Actions. Now.Big Data. Big Actions. Now.
© 2015 DataTorrent Confidential – Do Not Distribute
Some Verticals & Use Cases
Ad-Tech Telco & Cable
• Real-time customer facing dashboards on key
performance indicators
• Click fraud detection
• Billing optimization
• Call detail record (CDR) & extended data record (XDR)
analysis for
• Service quality improvement
• Capacity planning/optimization
• Understanding customer behavior AND context
• Packaging and selling anonymous customer data
Financial Services IoT
• Fraud & risk monitoring
• Sentiment based trading strategies
• Usage based insurance
• Improved credit risk assessment
• Improving turn around time of trade settlement
processes
• Process optimization
• Proactive maintenance prediction
• Remote monitoring & diagnostics

More Related Content

What's hot

Break Free From Oracle with Attunity and Microsoft
Break Free From Oracle with Attunity and MicrosoftBreak Free From Oracle with Attunity and Microsoft
Break Free From Oracle with Attunity and MicrosoftAttunity
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...DataWorks Summit
 
Building Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion PipelinesBuilding Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion PipelinesArvind Prabhakar
 
Transform Your Mainframe Data for the Cloud with Precisely and Apache Kafka
Transform Your Mainframe Data for the Cloud with Precisely and Apache KafkaTransform Your Mainframe Data for the Cloud with Precisely and Apache Kafka
Transform Your Mainframe Data for the Cloud with Precisely and Apache KafkaPrecisely
 
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...DataWorks Summit
 
Simplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hSimplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hPrecisely
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieDataWorks Summit/Hadoop Summit
 
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...DataWorks Summit
 
What’s New in Syncsort Integrate? New User Experience for Fast Data Onboarding
What’s New in Syncsort Integrate? New User Experience for Fast Data OnboardingWhat’s New in Syncsort Integrate? New User Experience for Fast Data Onboarding
What’s New in Syncsort Integrate? New User Experience for Fast Data OnboardingPrecisely
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp
 
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...DataWorks Summit
 
Streaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaStreaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaAttunity
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus WebinarBuild and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus WebinarImpetus Technologies
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 

What's hot (20)

Break Free From Oracle with Attunity and Microsoft
Break Free From Oracle with Attunity and MicrosoftBreak Free From Oracle with Attunity and Microsoft
Break Free From Oracle with Attunity and Microsoft
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
 
Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 
Building Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion PipelinesBuilding Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion Pipelines
 
Transform Your Mainframe Data for the Cloud with Precisely and Apache Kafka
Transform Your Mainframe Data for the Cloud with Precisely and Apache KafkaTransform Your Mainframe Data for the Cloud with Precisely and Apache Kafka
Transform Your Mainframe Data for the Cloud with Precisely and Apache Kafka
 
1200x630 1
1200x630 11200x630 1
1200x630 1
 
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
 
Simplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hSimplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-h
 
IBM Power8 announce
IBM Power8 announceIBM Power8 announce
IBM Power8 announce
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache Oozie
 
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
 
What’s New in Syncsort Integrate? New User Experience for Fast Data Onboarding
What’s New in Syncsort Integrate? New User Experience for Fast Data OnboardingWhat’s New in Syncsort Integrate? New User Experience for Fast Data Onboarding
What’s New in Syncsort Integrate? New User Experience for Fast Data Onboarding
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
 
Streaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaStreaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache Kafka
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus WebinarBuild and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 

Similar to IMCSummit 2015 - Day 1 IT Business Track - Designing a Big Data Analytics Platform: A Unified Architecture for both Stream and Batch Processing

Apache Apex Meetup at Cask
Apache Apex Meetup at CaskApache Apex Meetup at Cask
Apache Apex Meetup at CaskApache Apex
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupThomas Weise
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyAlluxio, Inc.
 
Stream Processing with Apache Apex
Stream Processing with Apache ApexStream Processing with Apache Apex
Stream Processing with Apache ApexPramod Immaneni
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseRizaldy Ignacio
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformEMC
 
Apache Apex - Hadoop Users Group
Apache Apex - Hadoop Users GroupApache Apex - Hadoop Users Group
Apache Apex - Hadoop Users GroupPramod Immaneni
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...Yahoo Developer Network
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyInside Analysis
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceWilfried Hoge
 
Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Will Du
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...MapR Technologies
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalCaserta
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop PlatformApache Apex
 
A new platform for a new era emc
A new platform for a new era   emcA new platform for a new era   emc
A new platform for a new era emcTaldor Group
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Dataconomy Media
 

Similar to IMCSummit 2015 - Day 1 IT Business Track - Designing a Big Data Analytics Platform: A Unified Architecture for both Stream and Batch Processing (20)

Apache Apex Meetup at Cask
Apache Apex Meetup at CaskApache Apex Meetup at Cask
Apache Apex Meetup at Cask
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application Meetup
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Hadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data WarehouseHadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data Warehouse
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
Stream Processing with Apache Apex
Stream Processing with Apache ApexStream Processing with Apache Apex
Stream Processing with Apache Apex
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
 
Apache Apex - Hadoop Users Group
Apache Apex - Hadoop Users GroupApache Apex - Hadoop Users Group
Apache Apex - Hadoop Users Group
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobal
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 
A new platform for a new era emc
A new platform for a new era   emcA new platform for a new era   emc
A new platform for a new era emc
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 

More from In-Memory Computing Summit

IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage TierIMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage TierIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent MemoryIMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent MemoryIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise GradeIMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise GradeIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...In-Memory Computing Summit
 

More from In-Memory Computing Summit (20)

IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
 
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
 
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
 
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
 
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
 
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
 
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
 
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
 
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
 
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage TierIMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
 
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
 
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
 
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
 
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent MemoryIMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
 
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise GradeIMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
 
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
 
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
 
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
 
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
 

Recently uploaded

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Recently uploaded (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

IMCSummit 2015 - Day 1 IT Business Track - Designing a Big Data Analytics Platform: A Unified Architecture for both Stream and Batch Processing

  • 1. Director Product Management Unified Batch & Stream Processing Platform Himanshu Bari
  • 2. © 2015 DataTorrent Confidential – Do Not Distribute Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems…
  • 3. © 2015 DataTorrent Confidential – Do Not Distribute Current Solutions Were Built On A. Imperfect information B. Expensive s/w & h/w infrastructure C. Relational data stores
  • 4. © 2015 DataTorrent Confidential – Do Not Distribute Inevitable Course of the Re-write Specialized solutions Near perfect information Commodity hardware Open source software Next gen data management platform HadoopNoSQL SearchGraph EDW + RDBMS In Memory Processing
  • 5. © 2015 DataTorrent Confidential – Do Not Distribute Data Processing Categories in Big Data Use Cases Known Unknown Questions known? Data Velocity Static Streaming Batch Processing Stream Processing Ad-hoc Query N/A
  • 6. © 2015 DataTorrent Confidential – Do Not Distribute Every Batch Process *Could* Have Been A Stream Process • Every ‘Static’ data point was ‘Streaming’ at some point • We choose to wait and collect a bunch of data points and then process them at once in ‘Batch Mode’ • Move processing time closer to the data generation or ‘Event’ occurrence time • Reduces time to insight and allows you to be proactive with timely actions
  • 7. © 2015 DataTorrent Confidential – Do Not Distribute But We Still Need Batch • Historical data analysis • What-if analysis • Experimentation • Data re-statement • Transaction processing and re-conciliation • Audit • Machine learning model training • ……..
  • 8. © 2015 DataTorrent Confidential – Do Not Distribute YARN HDFS Need A Unified Platform Enterprise Repositories RDBMS EDW NoSQL Graph Search Other… Any Notification Engine Any BPM Engine Stale Data Change Data ‘Fresh’ Streaming Data Unified Batch & Stream Processing Engine to execute data pipelines • Streaming only pipelines OR • Batch processing only pipelines OR • Dual Batch & Stream processing pipelines
  • 9. © 2015 DataTorrent Confidential – Do Not Distribute DataTorrent RTS Provides A Unified Batch & Streaming Platform Graphical Application Assembly Real-Time Data Visualization Re-Usable Java Operator Library Apache 2.0 licensed – Project Malhar Scalable, High Performance, Fault Tolerant In-Memory Data Processing Platform Apache 2.0 licensed - Project Apex Hadoop 2.0 —YARN + HDFS Physical Virtual Cloud Transform Normalize Descriptive & Predictive Analytics Alert Action Management&Monitoring Unified Batch & Streaming Data Ingestion & Distribution Pipeline for Hadoop
  • 10. © 2015 DataTorrent Confidential – Do Not Distribute DataTorrent Project Apex- Unified Batch & Stream Processing Engine In-memory compute engine1 Hadoop native (YARN + HDFS) architecture2 Sub-second event processing3 Event order guarantees4 Flexible stream partitioning mechanism5 Stateful fault tolerance with NO data loss 6 Automatic & incremental recovery 7 Rolling & tumbling windows 8 Processing guarantees 9 Ability to update application without downtime 10
  • 11. © 2015 DataTorrent Confidential – Do Not Distribute Unified Data Ingestion & Distribution Pipeline • FTP, S3 etc. • Kafka & JMS • Change data capture Input & Output Variety • Overcome HDFS small file problem • Skew management Tackle data size & speed fluctuations • Parameters like filtering criteria, bandwidth utilization & polling interval should be updateable at runtime Run-time updates •Graphical UI & API •End to end metrics Simple to build & Operate • Easily extend and insert operations for data preparation Easily customizable •Runs within the Hadoop cluster over YARN & HDFS Hadoop Native
  • 12. © 2015 DataTorrent Confidential – Do Not Distribute Hadoop Data Ingestion & Distribution Application
  • 13. © 2015 DataTorrent Confidential – Do Not Distribute Data Prep & Analytics Layer Requirements • Truly scale horizontally across the Hadoop cluster • Pre-built operators ᵒ Re-ordering ᵒ Normalization ᵒ Transpose ᵒ De-duplication ᵒ Tagging ᵒ Filtering ᵒ Enrichment • Operators work seamlessly in both streaming & batch mode ᵒ Data local HDFS read & process ᵒ Ability to do computations on time window as well as file boundaries • Ability to re-use existing business logic • Simple workflow & scheduling capabilities ᵒ Built-in or integrations with Oozie or other schedulers
  • 14. © 2015 DataTorrent Confidential – Do Not Distribute Development API Requirements • Consistent between Streaming & Batch pipelines • No mapreduce • No exposure of low level processing engine concepts • Easily extendible
  • 15. © 2015 DataTorrent Confidential – Do Not Distribute Malhar Operator Library Overview Malhar Operator Library Input / Output Connectors Compute Operators Parsers Stream Manipulators Query & Scripting Stats & MathPattern Matching Machine Learning & Algorithms File Systems RDBMS NoSQL Messaging Notification In Memory Databases Social Media Protocol read/write
  • 16. © 2015 DataTorrent Confidential – Do Not Distribute Visual Application Assembly • Simple to extend ᵒ Simple API to enable any existing DataTorrent operator ᵒ Ability to plug any business logic using a custom operator • Easy to Use ᵒ Web based drag-n-drop development environment ᵒ Automatic port compatibility validation ᵒ Simple schema management ᵒ Generic property configurator • Easy to Operate ᵒ No external component dependency - Runs natively in Hadoop ᵒ Integrated with DataTorrent management platform
  • 17. © 2015 DataTorrent Confidential – Do Not Distribute Streaming or Batch Data Processing Visualization • Intuitive user interface ᵒ Auto-generate or custom create ᵒ One dashboard for multiple apps ᵒ Supports bar, line, pie, area charts & data tables • Easy to Operate ᵒ No external component dependency - Runs natively in Hadoop ᵒ Integrated with DataTorrent management platform • Simple to extend ᵒ Any DAG operator can be made a real-time datasource
  • 18. © 2015 DataTorrent Confidential – Do Not Distribute Summary • World is moving from ‘Batch’ to ‘Streaming’ BUT both are required • Need a Hadoop native in memory compute engine that is scalable & fault tolerant in BOTH batch & streaming modes • With out -of-the box data prep & analytics operators • Using a consistent & functional development API • Operationalized through a common management & monitoring layer Project Apex https://www.datatorrent.com/product/project -apex/ DataTorrent RTS Sandbox https://www.datatorrent.com/download/ Graphical Application Assembly Real-Time Data Visualization Malhar Apache 2.0 license Re-Usable Java Operator Library Scalable, High Performance, Fault Tolerant In-Memory Data Processing Platform Apache 2.0 licensed - Project Apex Hadoop 2.0 —YARN + HDFS Physical Virtual Cloud Transform Normalize Descriptive & Predictive Analytics Alert Action Management&Monitoring Unified Batch & Streaming Data Ingestion & Distribution Pipeline for Hadoop
  • 19. Big Data. Big Actions. Now.Big Data. Big Actions. Now.
  • 20. © 2015 DataTorrent Confidential – Do Not Distribute Some Verticals & Use Cases Ad-Tech Telco & Cable • Real-time customer facing dashboards on key performance indicators • Click fraud detection • Billing optimization • Call detail record (CDR) & extended data record (XDR) analysis for • Service quality improvement • Capacity planning/optimization • Understanding customer behavior AND context • Packaging and selling anonymous customer data Financial Services IoT • Fraud & risk monitoring • Sentiment based trading strategies • Usage based insurance • Improved credit risk assessment • Improving turn around time of trade settlement processes • Process optimization • Proactive maintenance prediction • Remote monitoring & diagnostics