SlideShare a Scribd company logo
1 of 56
© 2014 MapR Technologies 1© 2014 MapR Technologies
© 2014 MapR Technologies 2
Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC for Apache’s Drill, Zookeeper & others
VP of Incubator at Apache Foundation
Email tdunning@apache.org tdunning@maprtech.com
Twitter @ted_dunning
© 2014 MapR Technologies 3
Goals
• Real-time or near-time
– Includes situations with deadlines
– Also includes situations where delay is simply undesirable
– Even includes situations where delay is just fine
• Micro-services
– Streaming is a convenient idiom for design
– Micro-services … you know we wanted it
– Service isolation is a key requirement
© 2014 MapR Technologies 4
Real-time or Near-time?
• The real point is flow versus state (catch me later today)
• One consequence of flow-based computing is real-time and
near-time become relatively easy
• Life may be a bitch, but it doesn’t happen in batches!
© 2014 MapR Technologies 6
Agenda
• Background / micro-services
• Global requirements
• Scale
© 2014 MapR Technologies 7
A microservice is
loosely coupled
with bounded context
© 2014 MapR Technologies 8
How to Couple Services and Break micro-ness
• Shared schemas, relational stores
• Ad hoc communication between services
• Enterprise service busses
• Brittle protocols
• Poor protocol versioning
• Single PoF schema repositories
Don’t do this!
© 2014 MapR Technologies 9
How to Decouple Services
• Use self-describing data
• Private database tables
• Infrastructural communication between services
• Use modern protocols
• Adopt future-proof protocol practices
• Use shared storage only where necessary due to scale
© 2014 MapR Technologies 11
What is the Right Structure for Flow Compute?
• Traditional message queues?
– Message queues are classic answer
– Key feature/bug is out-of-order acknowledgement
– Many implementations
– You pay a huge performance hit for persistence
• Kafka-esque Logs?
– Logs are like queues, but with ordering
– Out of order consumption is possible, acknowledgement not so much
– Canonical base implementation is Kafka
– Performance plus persistence
© 2014 MapR Technologies 12
Scenarios
Profile Database
© 2014 MapR Technologies 13
The task
?
POS 1
location, t, card #
yes/no?
POS 2
location, t, card #
yes/no?
© 2014 MapR Technologies 14
Traditional Solution
POS
1..n
Fraud
detector
Last card
use
© 2014 MapR Technologies 15
What Happens Next?
POS
1..n
Fraud
detector
Last card
use
POS
1..n
Fraud
detector
POS
1..n
Fraud
detector
© 2014 MapR Technologies 16
What Happens Next?
POS
1..n
Fraud
detector
Last card
use
POS
1..n
Fraud
detector
POS
1..n
Fraud
detector
© 2014 MapR Technologies 17
How to Get Service Isolation
POS
1..n
Fraud
detector
Last card
use
Updater
card activity
© 2014 MapR Technologies 18
New Uses of Data
POS
1..n
Fraud
detector
Last card
use
Updater
Card
location
history
Other
card activity
© 2014 MapR Technologies 19
Scaling Through Isolation
POS
1..n
Last card
use
Updater
POS
1..n
Last card
use
Updater
card activity
Fraud
detector
Fraud
detector
© 2014 MapR Technologies 20
Lessons
• De-coupling and isolation are key
• Private data stores/tables are important,
– But node local storage of private data is a bug
• Propagate events, not table updates
• Note that tables and streams should be as easy as files
– It should not be necessary to provision a cluster to get them
© 2014 MapR Technologies 21
Scenarios
IoT Data Aggregation
© 2014 MapR Technologies 22
Basic Situation
Each location
has many
pumps
pump data
Multiple
locations
© 2014 MapR Technologies 23
What Does a Pump Look Like
inlet
out let
m ot or
Temperature
Pressure
Flow
Temperature
Pressure
Flow
Winding temperature
Voltage
Current
© 2014 MapR Technologies 24
Basic Situation
Each location
has many
pumps
pump data
Multiple
locations
© 2014 MapR Technologies 25
pump data
pump data
pump data
pump data
Basic Architecture Reflects Business Structure
© 2014 MapR Technologies 26
Lessons
• Data architecture should reflect business structure
• Even very modest designs involve multiple data centers
• Schemas cannot be frozen in the real world
• Security must follow data ownership
© 2014 MapR Technologies 27
Scenarios
Global Data Recovery
© 2014 MapR Technologies 28
Tokyo
Corporate
HQ
© 2014 MapR Technologies 29
Singapore
Tokyo
Corporate
HQ
© 2014 MapR Technologies 30
Singapore
Tokyo
Corporate
HQ
© 2014 MapR Technologies 31
Singapore
Tokyo
Corporate
HQ
© 2014 MapR Technologies 32
Lessons
• Streams and tables are primitive building blocks
• Arbitrary number of topics important for simplicity + performance
• Updates happen in many places
• Mobility implies change in replication patterns
• Multi-master updates simplify design massively
© 2014 MapR Technologies 33
Converged Requirements
© 2014 MapR Technologies 34
What Have We Learned?
• Need persistence and performance
– Possibly for years and to 100’s of millions t/s
• Must have convergence
– Need files, tables AND streams
– Need volumes, snapshots, mirrors, permissions and …
• Must have platform security
– Cannot depend on perimeter
– Must follow business structure
• Must have global scale and scope
– Millions of topics for natural designs
– Multi-master replication and update
© 2014 MapR Technologies 35
The Importance of Common API’s
• Commonality and interoperability are critical
– Compare Hadoop eco-system and the noSQL world
• Table stakes
– Persistence
– Performance
– Polymorphism
• Major trend so far is to adopt Kafka API
– 0.9 API and beyond remove major abstraction leaks
– Kafka API supported by all major Hadoop vendors
© 2014 MapR Technologies 36
What we do at MapR
© 2014 MapR Technologies 37
Evolution of Data Storage
Functionality
Compatibility
Scalability
Linux
POSIX
Over decades of progress,
Unix-based systems have set the
standard for compatibility and
functionality
© 2014 MapR Technologies 38
Functionality
Compatibility
Scalability
Linux
POSIX
Hadoop
Hadoop achieves much higher
scalability by trading away
essentially all of this compatibility
Evolution of Data Storage
© 2014 MapR Technologies 39
Evolution of Data Storage
Functionality
Compatibility
Scalability
Linux
POSIX
Hadoop
MapR enhanced Apache Hadoop by
restoring the compatibility while
increasing scalability and performance
Functionality
Compatibility
Scalability
POSIX
© 2014 MapR Technologies 40
Functionality
Compatibility
Scalability
Linux
POSIX
Hadoop
Evolution of Data Storage
Adding converged tables and streams
enhances the functionality of the base
file system
© 2014 MapR Technologies 41
http://bit.ly/fastest-big-data
© 2014 MapR Technologies 42
How we do this with MapR
• MapR Streams is a C++ reimplementation of Kafka API
– Advantages in predictability, performance, scale
– Common security and permissions with entire MapR converged data
platform
• Semantic extensions
– A cluster contains volumes, files, tables … and now streams
– Streams contain topics
– Can have default stream or can name stream by path name
• Core MapR capabilities preserved
– Consistent snapshots, mirrors, multi-master replication
© 2014 MapR Technologies 43
MapR core Innovations
• Volumes
– Distributed management
– Data placement
• Read/write random access file system
– Allows distributed meta-data
– Improved scaling
– Enables NFS access
• Application-level NIC bonding
• Transactionally correct snapshots and mirrors
© 2014 MapR Technologies 44
MapR's Containers
 Each container contains
 Directories & files
 Data blocks
 Replicated on servers
 No need to manage
directly
Files/directories are sharded into blocks, which
are placed into containers on disks
Containers are 16-
32 GB segments of
disk, placed on
nodes
© 2014 MapR Technologies 45
MapR's Containers
 Each container has a
replication chain
 Updates are transactional
 Failures are handled by
rearranging replication
© 2014 MapR Technologies 46
Container locations and replication
CLDB
N1, N2
N3, N2
N1, N2
N1, N3
N3, N2
N1
N2
N3Container location database
(CLDB) keeps track of nodes
hosting each container and
replication chain order
© 2014 MapR Technologies 47
MapR Scaling
Containers represent 16 - 32GB of data
 Each can hold up to 1 Billion files and directories
 100M containers = ~ 2 Exabytes (a very large cluster)
250 bytes DRAM to cache a container
 25GB to cache all containers for 2EB cluster
 But not necessary, can page to disk
 Typical large 10PB cluster needs 2GB
Container-reports are 100x - 1000x < HDFS block-reports
 Serve 100x more data-nodes
 Increase container size to 64G to serve 4EB cluster
© 2014 MapR Technologies 48
But Wait, There’s More
• Directories and files are implemented in terms of B-trees
– Key is offset, value is data blob
– Internal transactional semantics guarantees safety and consistency
– Layout algorithms give very high layout linearization
• Tables are implemented in terms of B-trees
– Twisted B-tree implementation allows virtues of log-structured merge
tree without the compaction delays
– Tablet splitting without pausing, integration with file system transactions
• Common security and permissions scheme
© 2014 MapR Technologies 49
And More …
• Streams are implemented in terms of B-trees as well
– Topics and consumer offsets are kept in stream, not ZK
– Similar splitting technology as MapR DB tables
– Consistent permissions, security, data replication
• Standard Kafka 0.9 API
• Plans to add OJAI for high-level structuring
• Performance is very high
© 2014 MapR Technologies 50
Example
Files
Table
Streams
Directories
Cluster
Volume mount point
© 2014 MapR Technologies 51
Cluster
Volume mount point
© 2014 MapR Technologies 52
Lessons
• API’s matter more than implementations
• There is plenty of room to innovate ahead of the community
• Posix, HDFS, HBASE all define useful API’s
• Kafka 0.9+ also defines a useful and broadly adopted API
© 2014 MapR Technologies 53
Call to action:
Require convergence
© 2014 MapR Technologies 54
© 2014 MapR Technologies 55
Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014 - 2016
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-
world-hadoop
http://bit.ly/mapr-tsdb-
ebook
http://bit.ly/ebook-
anomaly
http://bit.ly/recommend
ation-ebook
© 2014 MapR Technologies 56
Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
Free copies at book
signing today
http://bit.ly/mapr-ebook-streams
© 2014 MapR Technologies 57
Thank You!
© 2014 MapR Technologies 58
Q&A
@mapr maprtech
tdunning@maprtech.com
Engage with us!
MapR
maprtech
mapr-technologies

More Related Content

What's hot

Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...DataWorks Summit
 
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...MapR Technologies
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetDataWorks Summit
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightDataWorks Summit/Hadoop Summit
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with HadoopDataWorks Summit
 
Benefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceBenefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceDataWorks Summit/Hadoop Summit
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudDataWorks Summit/Hadoop Summit
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionDataWorks Summit/Hadoop Summit
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNDataWorks Summit
 
Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on HadoopTyler Mitchell
 

What's hot (20)

Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
 
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
 
Big Data at your Desk with KNIME
Big Data at your Desk with KNIMEBig Data at your Desk with KNIME
Big Data at your Desk with KNIME
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
 
Smart Cities: An APAC Necessity
Smart Cities: An APAC Necessity Smart Cities: An APAC Necessity
Smart Cities: An APAC Necessity
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Modernise your EDW - Data Lake
Modernise your EDW - Data LakeModernise your EDW - Data Lake
Modernise your EDW - Data Lake
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
 
Benefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceBenefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business Intelligence
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in Production
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to Production
 
Data-In-Motion Unleashed
Data-In-Motion UnleashedData-In-Motion Unleashed
Data-In-Motion Unleashed
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
 
Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on Hadoop
 

Viewers also liked

Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewVMware Tanzu
 
Wall Street Derivative Risk Solutions Using Apache Geode
Wall Street Derivative Risk Solutions Using Apache GeodeWall Street Derivative Risk Solutions Using Apache Geode
Wall Street Derivative Risk Solutions Using Apache GeodeAndre Langevin
 
Driving Real Insights Through Data Science
Driving Real Insights Through Data ScienceDriving Real Insights Through Data Science
Driving Real Insights Through Data ScienceVMware Tanzu
 
Troubleshooting App Health and Performance with PCF Metrics 1.2
Troubleshooting App Health and Performance with PCF Metrics 1.2Troubleshooting App Health and Performance with PCF Metrics 1.2
Troubleshooting App Health and Performance with PCF Metrics 1.2VMware Tanzu
 
SpringCamp 2016 - Apache Geode 와 Spring Data Gemfire
SpringCamp 2016 - Apache Geode 와 Spring Data GemfireSpringCamp 2016 - Apache Geode 와 Spring Data Gemfire
SpringCamp 2016 - Apache Geode 와 Spring Data GemfireJay Lee
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonDataWorks Summit/Hadoop Summit
 
Pivotal Big Data Roadshow
Pivotal Big Data Roadshow Pivotal Big Data Roadshow
Pivotal Big Data Roadshow VMware Tanzu
 
Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?VMware Tanzu
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveDataWorks Summit/Hadoop Summit
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the MonolithVMware Tanzu
 
Ensuring Cloud Native Success: Organization Transformation
Ensuring Cloud Native Success:  Organization TransformationEnsuring Cloud Native Success:  Organization Transformation
Ensuring Cloud Native Success: Organization TransformationVMware Tanzu
 
Cloud foundry architecture and deep dive
Cloud foundry architecture and deep diveCloud foundry architecture and deep dive
Cloud foundry architecture and deep diveAnimesh Singh
 
The Cloud Native Journey
The Cloud Native JourneyThe Cloud Native Journey
The Cloud Native JourneyVMware Tanzu
 
Cloud foundry presentation
Cloud foundry presentation Cloud foundry presentation
Cloud foundry presentation Vivek Parihar
 
Pivotal Cloud Foundry: A Technical Overview
Pivotal Cloud Foundry: A Technical OverviewPivotal Cloud Foundry: A Technical Overview
Pivotal Cloud Foundry: A Technical OverviewVMware Tanzu
 
Cloud Foundry Compared With Other PaaSes (Cloud Foundry Summit 2014)
Cloud Foundry Compared With Other PaaSes (Cloud Foundry Summit 2014)Cloud Foundry Compared With Other PaaSes (Cloud Foundry Summit 2014)
Cloud Foundry Compared With Other PaaSes (Cloud Foundry Summit 2014)VMware Tanzu
 

Viewers also liked (20)

Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical Overview
 
Wall Street Derivative Risk Solutions Using Apache Geode
Wall Street Derivative Risk Solutions Using Apache GeodeWall Street Derivative Risk Solutions Using Apache Geode
Wall Street Derivative Risk Solutions Using Apache Geode
 
Driving Real Insights Through Data Science
Driving Real Insights Through Data ScienceDriving Real Insights Through Data Science
Driving Real Insights Through Data Science
 
Troubleshooting App Health and Performance with PCF Metrics 1.2
Troubleshooting App Health and Performance with PCF Metrics 1.2Troubleshooting App Health and Performance with PCF Metrics 1.2
Troubleshooting App Health and Performance with PCF Metrics 1.2
 
Why is my Hadoop* job slow?
Why is my Hadoop* job slow?Why is my Hadoop* job slow?
Why is my Hadoop* job slow?
 
SpringCamp 2016 - Apache Geode 와 Spring Data Gemfire
SpringCamp 2016 - Apache Geode 와 Spring Data GemfireSpringCamp 2016 - Apache Geode 와 Spring Data Gemfire
SpringCamp 2016 - Apache Geode 와 Spring Data Gemfire
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
 
Workload Automation + Hadoop?
Workload Automation + Hadoop?Workload Automation + Hadoop?
Workload Automation + Hadoop?
 
SQL and Search with Spark in your browser
SQL and Search with Spark in your browserSQL and Search with Spark in your browser
SQL and Search with Spark in your browser
 
Pivotal Big Data Roadshow
Pivotal Big Data Roadshow Pivotal Big Data Roadshow
Pivotal Big Data Roadshow
 
Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
 
Ensuring Cloud Native Success: Organization Transformation
Ensuring Cloud Native Success:  Organization TransformationEnsuring Cloud Native Success:  Organization Transformation
Ensuring Cloud Native Success: Organization Transformation
 
Cloud foundry architecture and deep dive
Cloud foundry architecture and deep diveCloud foundry architecture and deep dive
Cloud foundry architecture and deep dive
 
The Cloud Native Journey
The Cloud Native JourneyThe Cloud Native Journey
The Cloud Native Journey
 
Producing Spark on YARN for ETL
Producing Spark on YARN for ETLProducing Spark on YARN for ETL
Producing Spark on YARN for ETL
 
Cloud foundry presentation
Cloud foundry presentation Cloud foundry presentation
Cloud foundry presentation
 
Pivotal Cloud Foundry: A Technical Overview
Pivotal Cloud Foundry: A Technical OverviewPivotal Cloud Foundry: A Technical Overview
Pivotal Cloud Foundry: A Technical Overview
 
Cloud Foundry Compared With Other PaaSes (Cloud Foundry Summit 2014)
Cloud Foundry Compared With Other PaaSes (Cloud Foundry Summit 2014)Cloud Foundry Compared With Other PaaSes (Cloud Foundry Summit 2014)
Cloud Foundry Compared With Other PaaSes (Cloud Foundry Summit 2014)
 

Similar to Keys for Success from Streams to Queries

Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop DataWorks Summit/Hadoop Summit
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise ArchitectureMapR Technologies
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series DatabaseDataWorks Summit
 
HUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_DunningHUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_DunningJohn Mulhall
 
Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down InternetMapR Technologies
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownDataWorks Summit
 
Zeta architecture - Hive London May15
Zeta architecture - Hive London May15Zeta architecture - Hive London May15
Zeta architecture - Hive London May15MapR Technologies
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureMapR Technologies
 
Building HBase Applications - Ted Dunning
Building HBase Applications - Ted DunningBuilding HBase Applications - Ted Dunning
Building HBase Applications - Ted DunningMapR Technologies
 
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...NoSQLmatters
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014John Berns
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRData Con LA
 
Ted Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink DriftTed Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink DriftFlink Forward
 
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...Chris Fregly
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016Mathieu Dumoulin
 

Similar to Keys for Success from Streams to Queries (20)

Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise Architecture
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series Database
 
HUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_DunningHUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_Dunning
 
Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down Internet
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
 
Zeta architecture - Hive London May15
Zeta architecture - Hive London May15Zeta architecture - Hive London May15
Zeta architecture - Hive London May15
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data Architecture
 
Building HBase Applications - Ted Dunning
Building HBase Applications - Ted DunningBuilding HBase Applications - Ted Dunning
Building HBase Applications - Ted Dunning
 
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Zeta architecture -2015
Zeta architecture -2015Zeta architecture -2015
Zeta architecture -2015
 
Ted Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink DriftTed Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink Drift
 
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 

Recently uploaded

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Keys for Success from Streams to Queries

  • 1. © 2014 MapR Technologies 1© 2014 MapR Technologies
  • 2. © 2014 MapR Technologies 2 Contact Information Ted Dunning Chief Applications Architect at MapR Technologies Committer & PMC for Apache’s Drill, Zookeeper & others VP of Incubator at Apache Foundation Email tdunning@apache.org tdunning@maprtech.com Twitter @ted_dunning
  • 3. © 2014 MapR Technologies 3 Goals • Real-time or near-time – Includes situations with deadlines – Also includes situations where delay is simply undesirable – Even includes situations where delay is just fine • Micro-services – Streaming is a convenient idiom for design – Micro-services … you know we wanted it – Service isolation is a key requirement
  • 4. © 2014 MapR Technologies 4 Real-time or Near-time? • The real point is flow versus state (catch me later today) • One consequence of flow-based computing is real-time and near-time become relatively easy • Life may be a bitch, but it doesn’t happen in batches!
  • 5. © 2014 MapR Technologies 6 Agenda • Background / micro-services • Global requirements • Scale
  • 6. © 2014 MapR Technologies 7 A microservice is loosely coupled with bounded context
  • 7. © 2014 MapR Technologies 8 How to Couple Services and Break micro-ness • Shared schemas, relational stores • Ad hoc communication between services • Enterprise service busses • Brittle protocols • Poor protocol versioning • Single PoF schema repositories Don’t do this!
  • 8. © 2014 MapR Technologies 9 How to Decouple Services • Use self-describing data • Private database tables • Infrastructural communication between services • Use modern protocols • Adopt future-proof protocol practices • Use shared storage only where necessary due to scale
  • 9. © 2014 MapR Technologies 11 What is the Right Structure for Flow Compute? • Traditional message queues? – Message queues are classic answer – Key feature/bug is out-of-order acknowledgement – Many implementations – You pay a huge performance hit for persistence • Kafka-esque Logs? – Logs are like queues, but with ordering – Out of order consumption is possible, acknowledgement not so much – Canonical base implementation is Kafka – Performance plus persistence
  • 10. © 2014 MapR Technologies 12 Scenarios Profile Database
  • 11. © 2014 MapR Technologies 13 The task ? POS 1 location, t, card # yes/no? POS 2 location, t, card # yes/no?
  • 12. © 2014 MapR Technologies 14 Traditional Solution POS 1..n Fraud detector Last card use
  • 13. © 2014 MapR Technologies 15 What Happens Next? POS 1..n Fraud detector Last card use POS 1..n Fraud detector POS 1..n Fraud detector
  • 14. © 2014 MapR Technologies 16 What Happens Next? POS 1..n Fraud detector Last card use POS 1..n Fraud detector POS 1..n Fraud detector
  • 15. © 2014 MapR Technologies 17 How to Get Service Isolation POS 1..n Fraud detector Last card use Updater card activity
  • 16. © 2014 MapR Technologies 18 New Uses of Data POS 1..n Fraud detector Last card use Updater Card location history Other card activity
  • 17. © 2014 MapR Technologies 19 Scaling Through Isolation POS 1..n Last card use Updater POS 1..n Last card use Updater card activity Fraud detector Fraud detector
  • 18. © 2014 MapR Technologies 20 Lessons • De-coupling and isolation are key • Private data stores/tables are important, – But node local storage of private data is a bug • Propagate events, not table updates • Note that tables and streams should be as easy as files – It should not be necessary to provision a cluster to get them
  • 19. © 2014 MapR Technologies 21 Scenarios IoT Data Aggregation
  • 20. © 2014 MapR Technologies 22 Basic Situation Each location has many pumps pump data Multiple locations
  • 21. © 2014 MapR Technologies 23 What Does a Pump Look Like inlet out let m ot or Temperature Pressure Flow Temperature Pressure Flow Winding temperature Voltage Current
  • 22. © 2014 MapR Technologies 24 Basic Situation Each location has many pumps pump data Multiple locations
  • 23. © 2014 MapR Technologies 25 pump data pump data pump data pump data Basic Architecture Reflects Business Structure
  • 24. © 2014 MapR Technologies 26 Lessons • Data architecture should reflect business structure • Even very modest designs involve multiple data centers • Schemas cannot be frozen in the real world • Security must follow data ownership
  • 25. © 2014 MapR Technologies 27 Scenarios Global Data Recovery
  • 26. © 2014 MapR Technologies 28 Tokyo Corporate HQ
  • 27. © 2014 MapR Technologies 29 Singapore Tokyo Corporate HQ
  • 28. © 2014 MapR Technologies 30 Singapore Tokyo Corporate HQ
  • 29. © 2014 MapR Technologies 31 Singapore Tokyo Corporate HQ
  • 30. © 2014 MapR Technologies 32 Lessons • Streams and tables are primitive building blocks • Arbitrary number of topics important for simplicity + performance • Updates happen in many places • Mobility implies change in replication patterns • Multi-master updates simplify design massively
  • 31. © 2014 MapR Technologies 33 Converged Requirements
  • 32. © 2014 MapR Technologies 34 What Have We Learned? • Need persistence and performance – Possibly for years and to 100’s of millions t/s • Must have convergence – Need files, tables AND streams – Need volumes, snapshots, mirrors, permissions and … • Must have platform security – Cannot depend on perimeter – Must follow business structure • Must have global scale and scope – Millions of topics for natural designs – Multi-master replication and update
  • 33. © 2014 MapR Technologies 35 The Importance of Common API’s • Commonality and interoperability are critical – Compare Hadoop eco-system and the noSQL world • Table stakes – Persistence – Performance – Polymorphism • Major trend so far is to adopt Kafka API – 0.9 API and beyond remove major abstraction leaks – Kafka API supported by all major Hadoop vendors
  • 34. © 2014 MapR Technologies 36 What we do at MapR
  • 35. © 2014 MapR Technologies 37 Evolution of Data Storage Functionality Compatibility Scalability Linux POSIX Over decades of progress, Unix-based systems have set the standard for compatibility and functionality
  • 36. © 2014 MapR Technologies 38 Functionality Compatibility Scalability Linux POSIX Hadoop Hadoop achieves much higher scalability by trading away essentially all of this compatibility Evolution of Data Storage
  • 37. © 2014 MapR Technologies 39 Evolution of Data Storage Functionality Compatibility Scalability Linux POSIX Hadoop MapR enhanced Apache Hadoop by restoring the compatibility while increasing scalability and performance Functionality Compatibility Scalability POSIX
  • 38. © 2014 MapR Technologies 40 Functionality Compatibility Scalability Linux POSIX Hadoop Evolution of Data Storage Adding converged tables and streams enhances the functionality of the base file system
  • 39. © 2014 MapR Technologies 41 http://bit.ly/fastest-big-data
  • 40. © 2014 MapR Technologies 42 How we do this with MapR • MapR Streams is a C++ reimplementation of Kafka API – Advantages in predictability, performance, scale – Common security and permissions with entire MapR converged data platform • Semantic extensions – A cluster contains volumes, files, tables … and now streams – Streams contain topics – Can have default stream or can name stream by path name • Core MapR capabilities preserved – Consistent snapshots, mirrors, multi-master replication
  • 41. © 2014 MapR Technologies 43 MapR core Innovations • Volumes – Distributed management – Data placement • Read/write random access file system – Allows distributed meta-data – Improved scaling – Enables NFS access • Application-level NIC bonding • Transactionally correct snapshots and mirrors
  • 42. © 2014 MapR Technologies 44 MapR's Containers  Each container contains  Directories & files  Data blocks  Replicated on servers  No need to manage directly Files/directories are sharded into blocks, which are placed into containers on disks Containers are 16- 32 GB segments of disk, placed on nodes
  • 43. © 2014 MapR Technologies 45 MapR's Containers  Each container has a replication chain  Updates are transactional  Failures are handled by rearranging replication
  • 44. © 2014 MapR Technologies 46 Container locations and replication CLDB N1, N2 N3, N2 N1, N2 N1, N3 N3, N2 N1 N2 N3Container location database (CLDB) keeps track of nodes hosting each container and replication chain order
  • 45. © 2014 MapR Technologies 47 MapR Scaling Containers represent 16 - 32GB of data  Each can hold up to 1 Billion files and directories  100M containers = ~ 2 Exabytes (a very large cluster) 250 bytes DRAM to cache a container  25GB to cache all containers for 2EB cluster  But not necessary, can page to disk  Typical large 10PB cluster needs 2GB Container-reports are 100x - 1000x < HDFS block-reports  Serve 100x more data-nodes  Increase container size to 64G to serve 4EB cluster
  • 46. © 2014 MapR Technologies 48 But Wait, There’s More • Directories and files are implemented in terms of B-trees – Key is offset, value is data blob – Internal transactional semantics guarantees safety and consistency – Layout algorithms give very high layout linearization • Tables are implemented in terms of B-trees – Twisted B-tree implementation allows virtues of log-structured merge tree without the compaction delays – Tablet splitting without pausing, integration with file system transactions • Common security and permissions scheme
  • 47. © 2014 MapR Technologies 49 And More … • Streams are implemented in terms of B-trees as well – Topics and consumer offsets are kept in stream, not ZK – Similar splitting technology as MapR DB tables – Consistent permissions, security, data replication • Standard Kafka 0.9 API • Plans to add OJAI for high-level structuring • Performance is very high
  • 48. © 2014 MapR Technologies 50 Example Files Table Streams Directories Cluster Volume mount point
  • 49. © 2014 MapR Technologies 51 Cluster Volume mount point
  • 50. © 2014 MapR Technologies 52 Lessons • API’s matter more than implementations • There is plenty of room to innovate ahead of the community • Posix, HDFS, HBASE all define useful API’s • Kafka 0.9+ also defines a useful and broadly adopted API
  • 51. © 2014 MapR Technologies 53 Call to action: Require convergence
  • 52. © 2014 MapR Technologies 54
  • 53. © 2014 MapR Technologies 55 Short Books by Ted Dunning & Ellen Friedman • Published by O’Reilly in 2014 - 2016 • For sale from Amazon or O’Reilly • Free e-books currently available courtesy of MapR http://bit.ly/ebook-real- world-hadoop http://bit.ly/mapr-tsdb- ebook http://bit.ly/ebook- anomaly http://bit.ly/recommend ation-ebook
  • 54. © 2014 MapR Technologies 56 Streaming Architecture by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly) Free copies at book signing today http://bit.ly/mapr-ebook-streams
  • 55. © 2014 MapR Technologies 57 Thank You!
  • 56. © 2014 MapR Technologies 58 Q&A @mapr maprtech tdunning@maprtech.com Engage with us! MapR maprtech mapr-technologies