Keys for Success from Streams to Queries
DataWorks Summit/Hadoop Summit
1.
© 2014 MapR Technologies
2.
Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC for Apache’s Drill, ZooKeeper & others
VP of Incubator at Apache Foundation
Email: tdunning@apache.org, tdunning@maprtech.com
Twitter: @ted_dunning
3.
Goals
• Real-time or near-time
  – Includes situations with deadlines
  – Also includes situations where delay is simply undesirable
  – Even includes situations where delay is just fine
• Micro-services
  – Streaming is a convenient idiom for design
  – Micro-services … you know we wanted it
  – Service isolation is a key requirement
4.
Real-time or Near-time?
• The real point is flow versus state (catch me later today)
• One consequence of flow-based computing is that real-time and near-time become relatively easy
• Life may be a bitch, but it doesn’t happen in batches!
5.
Agenda
• Background / micro-services
• Global requirements
• Scale
6.
A microservice is loosely coupled with bounded context
7.
How to Couple Services and Break Micro-ness
• Shared schemas, relational stores
• Ad hoc communication between services
• Enterprise service buses
• Brittle protocols
• Poor protocol versioning
• Single point-of-failure schema repositories
Don’t do this!
8.
How to Decouple Services
• Use self-describing data
• Private database tables
• Infrastructural communication between services
• Use modern protocols
• Adopt future-proof protocol practices
• Use shared storage only where necessary due to scale
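As a rough illustration of "self-describing data" and "future-proof protocol practices", the sketch below wraps each message in an envelope that carries its own type and version, and reads it with a tolerant reader that ignores fields it does not know. The envelope layout, the `card_swipe` event and every field name are assumptions made for this example, not something the slides specify.

```python
import json


def encode_event(event_type, version, payload):
    """Wrap a payload in an envelope that names its own type and version."""
    return json.dumps({"type": event_type, "version": version, "payload": payload})


def decode_card_swipe(raw):
    """Tolerant reader: take the fields we need and ignore anything else."""
    envelope = json.loads(raw)
    if envelope.get("type") != "card_swipe":
        return None  # not an event this service understands; skip it
    payload = envelope.get("payload", {})
    return {
        "card": payload["card"],
        "location": payload["location"],
        "t": payload["t"],
        # Hypothetical field added in a later version; older producers omit it.
        "terminal": payload.get("terminal", "unknown"),
    }


old = encode_event("card_swipe", 1, {"card": "1234", "location": "Tokyo", "t": 100})
new = encode_event(
    "card_swipe", 2,
    {"card": "1234", "location": "Tokyo", "t": 160, "terminal": "POS 2", "currency": "JPY"},
)

print(decode_card_swipe(old))
print(decode_card_swipe(new))
```

Because the reader neither rejects the extra `currency` field nor requires `terminal`, producers and consumers can be upgraded independently, which is one way to avoid the brittle-protocol coupling listed on the previous slide.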
9.
What is the Right Structure for Flow Compute?
• Traditional message queues?
  – Message queues are the classic answer
  – Key feature/bug is out-of-order acknowledgement
  – Many implementations
  – You pay a huge performance hit for persistence
• Kafka-esque logs?
  – Logs are like queues, but with ordering
  – Out-of-order consumption is possible, acknowledgement not so much
  – Canonical base implementation is Kafka
  – Performance plus persistence
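The difference between the two structures can be sketched with a toy in-memory log. This is not Kafka's API; the `Log` class and the consumer names are hypothetical. What it tries to show is that a log keeps messages in order, that reading never removes them, and that each consumer's progress is a single offset rather than a set of per-message acknowledgements.

```python
class Log:
    """Append-only, ordered log; each consumer tracks its own offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def read(self, offset, max_messages=10):
        """Return messages starting at offset; reading never removes them."""
        return self._messages[offset:offset + max_messages]


log = Log()
for swipe in ["swipe-a", "swipe-b", "swipe-c"]:
    log.append(swipe)

# Two independent consumers. An offset means "everything before this is done".
offsets = {"fraud-detector": 0, "archiver": 0}

batch = log.read(offsets["fraud-detector"], max_messages=2)
offsets["fraud-detector"] += len(batch)

# The archiver is unaffected and can still read (or re-read) from the start.
replay = log.read(offsets["archiver"])

print(batch, replay, offsets)
```

A consumer may read from any offset it likes, but it cannot mark message 2 as done while leaving message 1 outstanding, which is the "acknowledgement not so much" point above.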
10.
Scenarios: Profile Database
11.
The Task
Diagram labels: POS 1 (location, t, card #; yes/no?), POS 2 (location, t, card #; yes/no?), "?"
12.
Traditional Solution
Diagram labels: POS 1..n, Fraud detector, Last card use
13.
What Happens Next?
Diagram labels: POS 1..n and Fraud detector (three copies), Last card use
14.
What Happens Next?
Diagram labels: POS 1..n and Fraud detector (three copies), Last card use
15.
How to Get Service Isolation
Diagram labels: POS 1..n, Fraud detector, Last card use, Updater, card activity
16.
New Uses of Data
Diagram labels: POS 1..n, Fraud detector, Last card use, Updater, Card location history, Other, card activity
17.
Scaling Through Isolation
Diagram labels: POS 1..n, Fraud detector, Last card use and Updater (two copies each), card activity
18.
Lessons
• De-coupling and isolation are key
• Private data stores/tables are important,
  – But node local storage of private data is a bug
• Propagate events, not table updates
• Note that tables and streams should be as easy as files
  – It should not be necessary to provision a cluster to get them
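A minimal sketch of "propagate events, not table updates", loosely following the labels in the service-isolation diagrams: the fraud detector only reads the last-card-use table and publishes each swipe to a card-activity stream, while a separate updater is the only writer of the table. The in-memory `deque`, the 60-second rule and the field names are assumptions for illustration; a real stream would also let other consumers read the same events.

```python
from collections import deque

card_activity = deque()   # stands in for the "card activity" stream
last_card_use = {}        # table written only by the updater


def fraud_detector(swipe):
    """Answer yes/no from the last known use, then publish the event."""
    previous = last_card_use.get(swipe["card"])
    # Hypothetical rule: refuse if the card was used elsewhere under 60 s ago.
    suspicious = (
        previous is not None
        and previous["location"] != swipe["location"]
        and swipe["t"] - previous["t"] < 60
    )
    card_activity.append(swipe)  # propagate the event, not a table update
    return not suspicious


def updater():
    """Consume card activity and maintain the last-card-use table."""
    while card_activity:
        event = card_activity.popleft()
        last_card_use[event["card"]] = {"location": event["location"], "t": event["t"]}


first = fraud_detector({"card": "1234", "location": "Tokyo", "t": 0})
updater()
second = fraud_detector({"card": "1234", "location": "Singapore", "t": 30})
updater()

print(first, second)
```

Since the detector never writes the table, new consumers such as a card location history can be attached to the same events without touching the detector.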
19.
Scenarios: IoT Data Aggregation
20.
Basic Situation
Diagram labels: Each location has many pumps, pump data, Multiple locations
21.
What Does a Pump Look Like
• Inlet: temperature, pressure, flow
• Outlet: temperature, pressure, flow
• Motor: winding temperature, voltage, current
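One possible shape for a single pump reading, using the measurements named on this slide (temperature, pressure and flow at the inlet and outlet; winding temperature, voltage and current for the motor). The `location`, `pump_id` and `t` fields, the class names and all sample values are hypothetical; the slides give no units or identifiers.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class FluidSide:
    temperature: float
    pressure: float
    flow: float


@dataclass
class Motor:
    winding_temperature: float
    voltage: float
    current: float


@dataclass
class PumpReading:
    location: str   # hypothetical identifier, not from the slides
    pump_id: str    # hypothetical identifier, not from the slides
    t: float        # hypothetical timestamp field
    inlet: FluidSide
    outlet: FluidSide
    motor: Motor


reading = PumpReading(
    location="site-1", pump_id="pump-7", t=0.0,
    inlet=FluidSide(20.0, 1.0, 3.0),
    outlet=FluidSide(22.0, 4.0, 3.0),
    motor=Motor(65.0, 400.0, 12.0),
)

# asdict() recurses into the nested dataclasses, so field names travel with the data.
message = json.dumps(asdict(reading))
print(message)
```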
22.
Basic Situation
Diagram labels: Each location has many pumps, pump data, Multiple locations
23.
Basic Architecture Reflects Business Structure
Diagram labels: pump data (four copies)
24.
Lessons
• Data architecture should reflect business structure
• Even very modest designs involve multiple data centers
• Schemas cannot be frozen in the real world
• Security must follow data ownership
25.
Scenarios: Global Data Recovery
26.
Diagram labels: Tokyo Corporate HQ
27.
Diagram labels: Singapore, Tokyo Corporate HQ
28.
Diagram labels: Singapore, Tokyo Corporate HQ
29.
Diagram labels: Singapore, Tokyo Corporate HQ
30.
Lessons
• Streams and tables are primitive building blocks
• Arbitrary number of topics important for simplicity + performance
• Updates happen in many places
• Mobility implies change in replication patterns
• Multi-master updates simplify design massively
31.
Converged Requirements
32.
What Have We Learned?
• Need persistence and performance
  – Possibly for years and to 100s of millions t/s
• Must have convergence
  – Need files, tables AND streams
  – Need volumes, snapshots, mirrors, permissions and …
• Must have platform security
  – Cannot depend on perimeter
  – Must follow business structure
• Must have global scale and scope
  – Millions of topics for natural designs
  – Multi-master replication and update
33.
The Importance of Common APIs
• Commonality and interoperability are critical
– Compare the Hadoop ecosystem and the NoSQL world
• Table stakes
– Persistence
– Performance
– Polymorphism
• Major trend so far is to adopt the Kafka API
– 0.9 API and beyond remove major abstraction leaks
– Kafka API supported by all major Hadoop vendors
34.
What We Do at MapR
35.
Evolution of Data Storage
[Chart axes: Functionality, Compatibility, Scalability. Plotted: Linux, POSIX]
Over decades of progress, Unix-based systems have set the standard for compatibility and functionality
36.
Evolution of Data Storage
[Chart axes: Functionality, Compatibility, Scalability. Plotted: Linux, POSIX, Hadoop]
Hadoop achieves much higher scalability by trading away essentially all of this compatibility
37.
Evolution of Data Storage
[Chart axes: Functionality, Compatibility, Scalability. Plotted: Linux, POSIX, Hadoop]
MapR enhanced Apache Hadoop by restoring the compatibility while increasing scalability and performance
38.
Evolution of Data Storage
[Chart axes: Functionality, Compatibility, Scalability. Plotted: Linux, POSIX, Hadoop]
Adding converged tables and streams enhances the functionality of the base file system
39.
http://bit.ly/fastest-big-data
40.
How We Do This with MapR
• MapR Streams is a C++ reimplementation of the Kafka API
– Advantages in predictability, performance, scale
– Common security and permissions with the entire MapR converged data platform
• Semantic extensions
– A cluster contains volumes, files, tables … and now streams
– Streams contain topics
– Can have a default stream or can name a stream by path name
• Core MapR capabilities preserved
– Consistent snapshots, mirrors, multi-master replication
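The idea of naming a stream by path, with topics inside it, can be sketched as a small parser. This is an illustration under assumptions: the slide does not state how a path and a topic are combined, so the `stream_path:topic` form, the helper `parse_topic`, and the example names below are all hypothetical and should be checked against the platform documentation.

```python
from typing import NamedTuple, Optional


class TopicName(NamedTuple):
    stream_path: Optional[str]  # None means "use the default stream"
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split an assumed 'stream_path:topic' name into its two parts.

    A name without a leading path is treated as a topic in the
    default stream.
    """
    if name.startswith("/") and ":" in name:
        stream_path, topic = name.rsplit(":", 1)
        return TopicName(stream_path, topic)
    return TopicName(None, name)


qualified = parse_topic("/apps/pumps/telemetry:site-42")
unqualified = parse_topic("site-42")
```

Under this assumed convention, a stream lives in the same namespace as files and tables, so existing path-based permissions could apply to it.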
41.
MapR Core Innovations
• Volumes
– Distributed management
– Data placement
• Read/write random access file system
– Allows distributed meta-data
– Improved scaling
– Enables NFS access
• Application-level NIC bonding
• Transactionally correct snapshots and mirrors
42.
MapR's Containers
• Files/directories are sharded into blocks, which are placed into containers on disks
• Containers are 16-32 GB segments of disk, placed on nodes
• Each container contains
– Directories & files
– Data blocks
• Replicated on servers
• No need to manage directly
43.
MapR's Containers
• Each container has a replication chain
• Updates are transactional
• Failures are handled by rearranging replication
44.
Container locations and replication
[Diagram: CLDB listing replication chains (N1, N2), (N3, N2), (N1, N2), (N1, N3), (N3, N2) over nodes N1, N2, N3]
Container location database (CLDB) keeps track of nodes hosting each container and replication chain order
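A toy model may help make the CLDB's role concrete. The sketch below is not MapR's implementation: the class, its methods, and the policy of appending a spare node at the tail of a chain are assumptions chosen only to show a map from container to ordered replication chain, and a failure being handled by rearranging that chain.

```python
class ContainerLocationDB:
    """Toy model of a container location database (illustrative only)."""

    def __init__(self):
        # container id -> ordered list of nodes; the first node heads the chain
        self.chains = {}

    def register(self, container_id, nodes):
        self.chains[container_id] = list(nodes)

    def lookup(self, container_id):
        """Return the nodes hosting a container, in replication-chain order."""
        return list(self.chains[container_id])

    def node_failed(self, node, candidate_nodes):
        """Drop a failed node from every chain and append one spare at the tail."""
        for chain in self.chains.values():
            if node not in chain:
                continue
            chain.remove(node)
            for spare in candidate_nodes:
                if spare != node and spare not in chain:
                    chain.append(spare)
                    break


cldb = ContainerLocationDB()
cldb.register(1, ["N1", "N2"])
cldb.register(2, ["N3", "N2"])
cldb.register(3, ["N1", "N3"])

# Node N1 fails: chains that used it are rearranged, others are untouched.
cldb.node_failed("N1", ["N1", "N2", "N3"])
```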
45.
MapR Scaling
• Containers represent 16 - 32 GB of data
• Each can hold up to 1 billion files and directories
• 100M containers = ~2 exabytes (a very large cluster)
• 250 bytes of DRAM to cache a container
• 25 GB to cache all containers for a 2 EB cluster
• But not necessary, can page to disk
• Typical large 10 PB cluster needs 2 GB
• Container-reports are 100x - 1000x smaller than HDFS block-reports
• Serve 100x more data-nodes
• Increase container size to 64 GB to serve a 4 EB cluster
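The headline numbers on this slide can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes), which the slide does not specify; the variable names are illustrative.

```python
GB = 10**9   # assumption: decimal gigabyte
EB = 10**18  # assumption: decimal exabyte

containers = 100_000_000           # 100M containers
min_size, max_size = 16 * GB, 32 * GB
cache_bytes_per_container = 250    # DRAM needed to cache one container entry

# Total capacity at the small and large ends of the container size range.
low_capacity_eb = containers * min_size / EB
high_capacity_eb = containers * max_size / EB

# DRAM needed to cache location data for every container.
cache_gb = containers * cache_bytes_per_container / GB

print(f"capacity: {low_capacity_eb:.1f} to {high_capacity_eb:.1f} EB")
print(f"cache for all containers: {cache_gb:.0f} GB")
```

The range of 1.6 to 3.2 EB brackets the slide's "~2 exabytes", and the cache figure matches the 25 GB quoted.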
46.
But Wait, There’s More
• Directories and files are implemented in terms of B-trees
– Key is offset, value is data blob
– Internal transactional semantics guarantee safety and consistency
– Layout algorithms give very high layout linearization
• Tables are implemented in terms of B-trees
– Twisted B-tree implementation allows the virtues of a log-structured merge tree without the compaction delays
– Tablet splitting without pausing, integration with file system transactions
• Common security and permissions scheme
47.
And More …
• Streams are implemented in terms of B-trees as well
– Topics and consumer offsets are kept in the stream, not ZooKeeper (ZK)
– Similar splitting technology as MapR DB tables
– Consistent permissions, security, data replication
• Standard Kafka 0.9 API
• Plans to add OJAI for high-level structuring
• Performance is very high
48.
Example
[Diagram labels: Cluster, Volume mount point, Directories, Files, Table, Streams]
49.
[Diagram labels: Cluster, Volume mount point]
50.
Lessons
• APIs matter more than implementations
• There is plenty of room to innovate ahead of the community
• POSIX, HDFS, HBase all define useful APIs
• Kafka 0.9+ also defines a useful and broadly adopted API
51.
Call to action: Require convergence
52.
53.
Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014 - 2016
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
54.
Streaming Architecture by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
Free copies at book signing today
http://bit.ly/mapr-ebook-streams
55.
Thank You!
56.
Q&A
Engage with us!
@mapr, maprtech, MapR, maprtech, mapr-technologies
tdunning@maprtech.com