Real-Time Predictions
H2O // Storm
H2O.ai
Spencer Aiello
spencer@h2o.ai
Jan 15, 2015
Overview:
● Introductions
● Real Time Analytics
● The Speed of Information
● The Analytics Workflow
● H2O // Storm
● Demo
Real Time Analytics: Then & Now
● 1930 - 1940s: Kerrison Predictor
● 1950s: ENIAC - Weather Modeling (pseudo real time)
● 1970s: Real Time Roulette Wheel Prediction With A Computer In A Shoe
● 1990s: Real Time Analytics to Fight Fraud
● Traffic Management, Dynamic Pricing, Shopping & Movie Recommendations
The Speed of Information
Factors to consider:
● Speed of Light
○ 3 × 10^8 m/s
● Infrastructure
○ Line-of-sight relays
○ Submarine Cables
○ Where is the information coming from?
○ Where is it going?
○ Lossless?
● Power Consumption
○ Efficiency
● Amount of Information
○ Bandwidth considerations (impacts infrastructure)
○ How quickly can you schlepp around 1TB? 1PB?
■ How quickly do you _need_ to do that?
■ I.e., are you making efficient use of resources?
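To make the "how quickly can you move 1 TB or 1 PB" question concrete, here is a minimal sketch that divides payload size by link rate. The link rates (1 Gbit/s and 10 Gbit/s) are illustrative assumptions, not figures from the talk, and the calculation ignores protocol overhead and latency.

```python
# Back-of-the-envelope transfer times. The link rates used below are
# assumed examples, not measurements.
TB = 10**12  # bytes in a terabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)


def transfer_seconds(num_bytes: int, bits_per_second: float) -> float:
    """Ideal time to move num_bytes over a link, ignoring overhead and latency."""
    if bits_per_second <= 0:
        raise ValueError("bits_per_second must be positive")
    return num_bytes * 8 / bits_per_second


if __name__ == "__main__":
    for label, size in (("1 TB", TB), ("1 PB", PB)):
        for rate_name, rate in (("1 Gbit/s", 1e9), ("10 Gbit/s", 1e10)):
            hours = transfer_seconds(size, rate) / 3600
            print(f"{label} over {rate_name}: {hours:,.1f} hours")
```

Under these assumptions 1 TB takes a bit over two hours at 1 Gbit/s, while 1 PB takes months, which is why the "do you _need_ to do that?" question matters.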
The Speed of Information
The Shannon Limit:
Sup({ Bounds on bits/s })
C = B log2(1 + S/N)
- C = Channel Capacity (bits/s)
- B = Bandwidth (Hz)
- S = Signal in Joules/s (Watts)
- N = Noise in Joules/s (Watts)
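The quantities defined on this slide are those of the Shannon-Hartley theorem, C = B log2(1 + S/N). A minimal sketch of that formula follows; the example channel (3 kHz bandwidth, signal-to-noise ratio of 1000) is an assumed illustration, not a figure from the talk.

```python
import math


def shannon_capacity(bandwidth_hz: float, signal_watts: float, noise_watts: float) -> float:
    """Channel capacity C in bits/s, computed as C = B * log2(1 + S/N)."""
    if bandwidth_hz < 0 or signal_watts < 0:
        raise ValueError("bandwidth and signal power must be non-negative")
    if noise_watts <= 0:
        raise ValueError("noise power must be positive")
    return bandwidth_hz * math.log2(1 + signal_watts / noise_watts)


if __name__ == "__main__":
    # Assumed example: a 3 kHz channel with S/N = 1000 (30 dB).
    capacity = shannon_capacity(3000, 1000.0, 1.0)
    print(f"Capacity: {capacity:,.0f} bits/s")
```

Note that capacity grows linearly with bandwidth but only logarithmically with signal power, so adding watts is a poor substitute for adding hertz.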
The Speed of Information
Consider: The Warning Beacons of Gondor
7 beacons (13 in the movie)
Probably 1 cord of wood (~3.6 m^3)
1 bit of information (@ Shannon Limit)
optical transmission
Compare to the current World Record:
1 Petabit / second Fiber Transmission over 50-km
(~5,000 HDTV Videos/Second over single fiber)
About 25 orders of magnitude difference!
(source: http://www.ntt.co.jp/news2012/1209e/120920a.html)
The Speed of Information
AT&T “Long Lines”:
● 838 mile route connecting Chicago to New York
● 4GHz microwave line-of-sight radio relays
● ~25 miles separation (due to curvature of the Earth)
● 34 hops in all
High Frequency Trading (HFT):
● Light propagation delays between distant points are relevant
sources:
- Relativistic Statistical Arbitrage (http://www.alexwg.org/publications/PhysRevE_82-056104.pdf)
- Information Transmission Between Financial Markets in Chicago and New York (http://arxiv.org/pdf/1302.5966v1.pdf)
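To see why propagation delay matters for HFT, here is a rough sketch of the one-way delay over the 838-mile route and the average hop spacing implied by 34 hops. The route length and hop count come from the slide; the fiber velocity factor of 0.68 is an assumed typical value, not a figure from the talk.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
METERS_PER_MILE = 1609.344

ROUTE_MILES = 838  # Chicago to New York, from the slide
HOPS = 34          # relay hops, from the slide


def one_way_delay_ms(distance_miles: float, velocity_factor: float = 1.0) -> float:
    """One-way propagation delay in milliseconds.

    velocity_factor is the fraction of c at which the signal travels
    (1.0 for vacuum; close to 1.0 for microwaves in air).
    """
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity_factor must be in (0, 1]")
    meters = distance_miles * METERS_PER_MILE
    return meters / (SPEED_OF_LIGHT_M_PER_S * velocity_factor) * 1000


if __name__ == "__main__":
    print(f"Average hop spacing: {ROUTE_MILES / HOPS:.1f} miles")
    print(f"At c: {one_way_delay_ms(ROUTE_MILES):.2f} ms one way")
    # 0.68 is an assumed velocity factor for optical fiber.
    print(f"In fiber (assumed 0.68c): {one_way_delay_ms(ROUTE_MILES, 0.68):.2f} ms one way")
```

The lower bound is roughly 4.5 ms one way, so any strategy that depends on reacting faster than that between the two cities is ruled out by physics, not by engineering.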
The Speed of Information
Observations:
● Moving bits around is a big deal!
● ∃ insurmountable physical and theoretical limitations
○ Shannon Limit
○ Speed of Light
○ Landauer’s Principle
○ Relativistic Effects
○ Curvature of the Earth
● Other limitations or complications?
○ Hairpinning: Non-optimal routing to far flung nodes
■ Geographic locality ≠ Internet locality
○ Bad hardware
○ Bad software
The Speed of Information
(n.d.). Retrieved from http://www.us.ntt.net/support/looking-glass/
(n.d.). Retrieved from http://www.submarinecablemap.com/
The Analytics Workflow
The Analytics Process:
1. Define your problem
2. Gather data and explore
3. Prepare your data for modeling
4. Modeling
5. Model Validation
6. Implementation & Tracking
Here’s where H2O fits into the analytics process: http://learn.h2o.ai/content/
The Analytics Workflow
:::Prep:::
Data Preparation:
● A sequence of transformations applied to your data
● This step will define your Storm topology
● Take raw information and give it structure
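As a rough illustration of "a sequence of transformations", the sketch below composes small functions into one preparation step. It is plain Python, not Storm code: each function only plays the role that one processing step would play in a topology. The schema (`id`, `amount`, `country`) and all function names are hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Transform = Callable[[Record], Record]

FIELDS = ["id", "amount", "country"]  # hypothetical schema


def parse(raw: str) -> Record:
    """Give a raw comma-separated line some structure."""
    values = [value.strip() for value in raw.split(",")]
    return dict(zip(FIELDS, values))


def cast_amount(rec: Record) -> Record:
    """Convert the amount field from text to a float."""
    return {**rec, "amount": float(str(rec["amount"]))}


def normalize_country(rec: Record) -> Record:
    """Upper-case the country code so categories match at scoring time."""
    return {**rec, "country": str(rec["country"]).upper()}


def make_pipeline(steps: List[Transform]) -> Transform:
    """Compose steps left to right into a single transformation."""
    return lambda rec: reduce(lambda acc, step: step(acc), steps, rec)


prepare = make_pipeline([cast_amount, normalize_country])

if __name__ == "__main__":
    print(prepare(parse(" 42, 19.99 , us ")))
```

Keeping each transformation a pure function makes it easy to apply the same steps to training data in batch and to incoming tuples at scoring time.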
The Analytics Workflow
:::Modeling:::
Questions to ask yourself:
● How fast must a scoring engine classify incoming tuples?
● How do I optimize between scoring latency and predictive power?
● E.g., what are the trade-offs between a GLM and a GBM?
Science!
The Analytics Workflow
:::Validation:::
Types of Validation:
● N-fold cross validation
● Train/Validate/Test -- What Features are Important?
● Model Comparison -- Does your model optimize all needs?
○ Business needs
○ Resource needs
● Repeat steps 3 - 5 until satisfied
WRONG: You should never be satisfied!
Your model will go out of date (if it hasn’t already)!
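For the N-fold cross validation mentioned on this slide, here is a minimal sketch of how rows can be split into folds. It only produces index lists; model training and scoring are left out, and the rows are not shuffled, which a real workflow would usually want. The function name is illustrative.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_rows: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) once per fold.

    Every row lands in exactly one validation fold; fold sizes differ
    by at most one row.
    """
    if not 2 <= n_folds <= n_rows:
        raise ValueError("need 2 <= n_folds <= n_rows")
    indices = list(range(n_rows))
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


if __name__ == "__main__":
    for train, validation in k_fold_indices(10, 3):
        print(f"train={train} validation={validation}")
```

Averaging the validation metric across folds gives a steadier estimate than a single split, at the cost of training the model once per fold.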
The Analytics Workflow
:::Tracking:::
An Extension of Validation:
● Do not open the fire-hose and blast your model with 100% of your data
○ Expect the unexpected
○ Your topology will break (oops, forgot about Unicode… derp)
○ Start off with 10% and ramp up; course-correct along the way
● Perform batch modeling in off-peak hours (Jenkins never sleeps)
● Models should be replaced “gradually”
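One way to "start off with 10% and ramp up" is to route a fixed fraction of incoming tuples to the new model based on a hash of their ID. This is a sketch of that idea under the assumption that each tuple carries a stable string identifier; it is not taken from the talk or from any Storm or H2O API.

```python
import hashlib


def use_new_model(tuple_id: str, rollout_fraction: float) -> bool:
    """Deterministically decide whether a tuple goes to the new model.

    The same ID always gets the same answer for a given fraction, and
    raising the fraction only adds IDs; it never removes any.
    """
    if not 0.0 <= rollout_fraction <= 1.0:
        raise ValueError("rollout_fraction must be between 0 and 1")
    digest = hashlib.sha256(tuple_id.encode("utf-8")).digest()
    # The first 4 bytes give an integer in [0, 2**32); dividing maps it to [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < rollout_fraction


if __name__ == "__main__":
    ids = [f"id-{i}" for i in range(10_000)]
    for fraction in (0.1, 0.5, 1.0):
        share = sum(use_new_model(i, fraction) for i in ids) / len(ids)
        print(f"rollout {fraction:.0%}: {share:.1%} of tuples use the new model")
```

Hashing rather than random sampling keeps routing reproducible, which helps when a broken tuple has to be traced back to the model that scored it.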
H2O // Storm
For a complete tutorial please visit:
http://learn.h2o.ai/content/demos/streaming_data.html
Use H2O
Awesome
Thank you!
DEMO
http://learn.h2o.ai/content/demos/streaming_data.html

More Related Content

What's hot

Predictive Analytics - Display Advertising & Credit Card Acquisition Use cases
Predictive Analytics - Display Advertising & Credit Card Acquisition Use cases Predictive Analytics - Display Advertising & Credit Card Acquisition Use cases
Predictive Analytics - Display Advertising & Credit Card Acquisition Use cases Big Data Pulse
 
Come diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo PellegriniCome diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo PellegriniDonatella Cambosu
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project LifecycleJason Geng
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud DetectionNitesh Kumar
 
Predictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial IntelligencePredictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial IntelligenceManish Jain
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
 
Data Science Application in Business Portfolio & Risk Management
Data Science Application in Business Portfolio & Risk ManagementData Science Application in Business Portfolio & Risk Management
Data Science Application in Business Portfolio & Risk ManagementData Science Thailand
 
The NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoTThe NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoTPrasant Misra
 
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...BAINIDA
 
Case of success: Visualization as an example for exercising democratic transp...
Case of success: Visualization as an example for exercising democratic transp...Case of success: Visualization as an example for exercising democratic transp...
Case of success: Visualization as an example for exercising democratic transp...Big Data Spain
 
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...PAPIs.io
 
Big Data Use-Cases across industries (Georg Polzer, Teralytics)
Big Data Use-Cases across industries (Georg Polzer, Teralytics)Big Data Use-Cases across industries (Georg Polzer, Teralytics)
Big Data Use-Cases across industries (Georg Polzer, Teralytics)Swiss Big Data User Group
 
Data sciences and marketing analytics
Data sciences and marketing analyticsData sciences and marketing analytics
Data sciences and marketing analyticsMJ Xavier
 
from_physics_to_data_science
from_physics_to_data_sciencefrom_physics_to_data_science
from_physics_to_data_scienceMartina Pugliese
 
Machine Learning for Auditors: What you need to know - ISACA North America CA...
Machine Learning for Auditors: What you need to know - ISACA North America CA...Machine Learning for Auditors: What you need to know - ISACA North America CA...
Machine Learning for Auditors: What you need to know - ISACA North America CA...Andrew Clark
 
Tools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsTools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsRavi Teja
 
Introduction to Data Science (Data Science Thailand Meetup #1)
Introduction to Data Science (Data Science Thailand Meetup #1)Introduction to Data Science (Data Science Thailand Meetup #1)
Introduction to Data Science (Data Science Thailand Meetup #1)Data Science Thailand
 
Real time analytics of big data
Real time analytics of big dataReal time analytics of big data
Real time analytics of big dataDeependra Jyoti
 
An Obligatory Introduction to Data Science
An Obligatory Introduction to Data ScienceAn Obligatory Introduction to Data Science
An Obligatory Introduction to Data ScienceWesley Eldridge
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBernard Marr
 

What's hot (20)

Predictive Analytics - Display Advertising & Credit Card Acquisition Use cases
Predictive Analytics - Display Advertising & Credit Card Acquisition Use cases Predictive Analytics - Display Advertising & Credit Card Acquisition Use cases
Predictive Analytics - Display Advertising & Credit Card Acquisition Use cases
 
Come diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo PellegriniCome diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo Pellegrini
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud Detection
 
Predictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial IntelligencePredictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial Intelligence
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
Data Science Application in Business Portfolio & Risk Management
Data Science Application in Business Portfolio & Risk ManagementData Science Application in Business Portfolio & Risk Management
Data Science Application in Business Portfolio & Risk Management
 
The NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoTThe NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoT
 
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...
 
Case of success: Visualization as an example for exercising democratic transp...
Case of success: Visualization as an example for exercising democratic transp...Case of success: Visualization as an example for exercising democratic transp...
Case of success: Visualization as an example for exercising democratic transp...
 
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
 
Big Data Use-Cases across industries (Georg Polzer, Teralytics)
Big Data Use-Cases across industries (Georg Polzer, Teralytics)Big Data Use-Cases across industries (Georg Polzer, Teralytics)
Big Data Use-Cases across industries (Georg Polzer, Teralytics)
 
Data sciences and marketing analytics
Data sciences and marketing analyticsData sciences and marketing analytics
Data sciences and marketing analytics
 
from_physics_to_data_science
from_physics_to_data_sciencefrom_physics_to_data_science
from_physics_to_data_science
 
Machine Learning for Auditors: What you need to know - ISACA North America CA...
Machine Learning for Auditors: What you need to know - ISACA North America CA...Machine Learning for Auditors: What you need to know - ISACA North America CA...
Machine Learning for Auditors: What you need to know - ISACA North America CA...
 
Tools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsTools for Unstructured Data Analytics
Tools for Unstructured Data Analytics
 
Introduction to Data Science (Data Science Thailand Meetup #1)
Introduction to Data Science (Data Science Thailand Meetup #1)Introduction to Data Science (Data Science Thailand Meetup #1)
Introduction to Data Science (Data Science Thailand Meetup #1)
 
Real time analytics of big data
Real time analytics of big dataReal time analytics of big data
Real time analytics of big data
 
An Obligatory Introduction to Data Science
An Obligatory Introduction to Data ScienceAn Obligatory Introduction to Data Science
An Obligatory Introduction to Data Science
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must Know
 

Similar to H2o storm

Adding Velocity to BigBench
Adding Velocity to BigBenchAdding Velocity to BigBench
Adding Velocity to BigBencht_ivanov
 
Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...
Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...
Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...DataBench
 
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...StormForge .io
 
Elasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ SignalElasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ SignalJoachim Draeger
 
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles Sonigo
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data qualityLars Albertsson
 
IPLC Analytic Dashboard - Mohd Rizal bin Mohd Ramly
IPLC Analytic Dashboard - Mohd Rizal bin Mohd RamlyIPLC Analytic Dashboard - Mohd Rizal bin Mohd Ramly
IPLC Analytic Dashboard - Mohd Rizal bin Mohd RamlyMyNOG
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015Kanwal Prakash Singh
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015Kanwal Prakash Singh
 
Data Lessons Learned at Scale - Big Data DC
Data Lessons Learned at Scale - Big Data DCData Lessons Learned at Scale - Big Data DC
Data Lessons Learned at Scale - Big Data DCCharlie Reverte
 
Data engineering in 10 years.pdf
Data engineering in 10 years.pdfData engineering in 10 years.pdf
Data engineering in 10 years.pdfLars Albertsson
 
Keeping the Internet Fast and Resilient for You and Your Customers
Keeping the Internet Fast and Resilient for You and Your CustomersKeeping the Internet Fast and Resilient for You and Your Customers
Keeping the Internet Fast and Resilient for You and Your CustomersCloudflare
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at TwitterPrasad Wagle
 
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Seattle Apache Flink Meetup
 
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...Bowen Li
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!DataWorks Summit
 
Container world 2019 Canary Release
Container world 2019 Canary ReleaseContainer world 2019 Canary Release
Container world 2019 Canary ReleaseBilly Yuen
 

Similar to H2o storm (20)

Adding Velocity to BigBench
Adding Velocity to BigBenchAdding Velocity to BigBench
Adding Velocity to BigBench
 
Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...
Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...
Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...
 
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
 
Elasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ SignalElasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ Signal
 
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
 
Build machine learning pipelines from research to production
Build machine learning pipelines from research to productionBuild machine learning pipelines from research to production
Build machine learning pipelines from research to production
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data quality
 
IPLC Analytic Dashboard - Mohd Rizal bin Mohd Ramly
IPLC Analytic Dashboard - Mohd Rizal bin Mohd RamlyIPLC Analytic Dashboard - Mohd Rizal bin Mohd Ramly
IPLC Analytic Dashboard - Mohd Rizal bin Mohd Ramly
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015
 
Data Lessons Learned at Scale - Big Data DC
Data Lessons Learned at Scale - Big Data DCData Lessons Learned at Scale - Big Data DC
Data Lessons Learned at Scale - Big Data DC
 
Data engineering in 10 years.pdf
Data engineering in 10 years.pdfData engineering in 10 years.pdf
Data engineering in 10 years.pdf
 
Keeping the Internet Fast and Resilient for You and Your Customers
Keeping the Internet Fast and Resilient for You and Your CustomersKeeping the Internet Fast and Resilient for You and Your Customers
Keeping the Internet Fast and Resilient for You and Your Customers
 
Sensing the world with data of things
Sensing the world with  data of thingsSensing the world with  data of things
Sensing the world with data of things
 
Sensing the world with Data of Things
Sensing the world with Data of ThingsSensing the world with Data of Things
Sensing the world with Data of Things
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
 
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!
 
Container world 2019 Canary Release
Container world 2019 Canary ReleaseContainer world 2019 Canary Release
Container world 2019 Canary Release
 

Recently uploaded

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 

Recently uploaded (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Why Teams call analytics are critical to your entire business

H2o storm

  • 1. Real-Time Predictions H2O // Storm H2O.ai Spencer Aiello spencer@h2o.ai Jan 15, 2015
  • 2. Overview: ● Introductions ● Real Time Analytics ● The Speed of Information ● The Analytics Workflow ● H2O // Storm ● Demo
  • 3. Real Time Analytics: Then & Now 1930 - 1940s Kerrison Predictor ENIAC - Weather Modeling (pseudo real time) 1950s Real Time Analytics to Fight Fraud 1990s Traffic Management Dynamic Pricing Shopping & Movie Recommendations 1970s Real Time Roulette Wheel Prediction With A Computer In A Shoe
  • 4. The Speed of Information Factors to consider: ● Speed of Light ○ 3×10^8 m/s ● Infrastructure ○ Line-of-sight relays ○ Submarine Cables ○ Where is the information coming from? ○ Where is it going? ○ Lossless? ● Power Consumption ○ Efficiency ● Amount of Information ○ Bandwidth considerations (impacts infrastructure) ○ How quickly can you schlepp around 1TB? 1PB? ■ How quickly do you _need_ to do that? ■ I.e., are you making efficient use of resources?
  • 5. The Speed of Information The Shannon Limit: Sup({ Bounds on bits/s }) - C = Channel Capacity (bits/s) - B = Bandwidth (Hz) - S = Signal in Joules/s (Watts) - N = Noise in Joules/s (Watts)
  • 6. The Speed of Information Consider: The Warning Beacons of Gondor 7 beacons (13 in the movie) Probably 1 cord of wood (~3.6 m³) 1 bit of information (@ Shannon Limit) optical transmission Compare to the current World Record: 1 Petabit / second Fiber Transmission over 50 km (~5,000 HDTV Videos/Second over single fiber) About 25 orders of magnitude difference! (source: http://www.ntt.co.jp/news2012/1209e/120920a.html)
  • 7. The Speed of Information AT&T “Long Lines”: ● 838 mile route connecting Chicago to New York ● 4GHz microwave line-of-sight radio relays ● ~25 miles separation (due to curvature of the Earth) ● 34 hops in all High Frequency Trading (HFT): ● Light propagation delays between distant points are relevant Sources: - Relativistic Statistical Arbitrage (http://www.alexwg.org/publications/PhysRevE_82-056104.pdf) - Information Transmission Between Financial Markets in Chicago and New York (http://arxiv.org/pdf/1302.5966v1.pdf)
  • 8. The Speed of Information Observations: ● Moving bits around is a big deal! ● ∃ insurmountable physical and theoretical limitations ○ Shannon Limit ○ Speed of Light ○ Landauer’s Principle ○ Relativistic Effects ○ Curvature of the Earth ● Other limitations or complications? ○ Hairpinning: Non-optimal routing to far-flung nodes ■ Geographic locality ≠ Internet locality ○ Bad hardware ○ Bad software
  • 9. The Speed of Information Sources: (n.d.). Retrieved from http://www.us.ntt.net/support/looking-glass/ ; (n.d.). Retrieved from http://www.submarinecablemap.com/
  • 10. The Analytics Workflow The Analytics Process: 1. Define your problem 2. Gather data and explore 3. Prepare your data for modeling 4. Modeling 5. Model Validation 6. Implementation & Tracking
  • 11. The Analytics Workflow The Analytics Process: 1. Define your problem 2. Gather data and explore 3. Prepare your data for modeling 4. Modeling 5. Model Validation 6. Implementation & Tracking } Here’s where H2O fits into the analytics process http://learn.h2o.ai/content/
  • 12. The Analytics Workflow :::Prep::: Data Preparation: ● A sequence of transformations applied to your data ● This step will define your Storm topology ● Take raw information and give it structure
  • 13. The Analytics Workflow :::Modeling::: Questions to ask yourself: ● How fast must a scoring engine classify incoming tuples? ● How do I optimize between scoring latency and predictive power? ● E.g., what are the trade-offs between a GLM and a GBM? Science!
  • 14. The Analytics Workflow :::Validation::: Types of Validation: ● N-fold cross validation ● Train/Validate/Test -- What Features are Important? ● Model Comparison -- Does your model optimize all needs? ○ Business needs ○ Resource needs ● Repeat steps 3 - 5 until satisfied
  • 15. The Analytics Workflow :::Validation::: Types of Validation: ● N-fold cross validation ● Train/Validate/Test -- What Features are Important? ● Model Comparison -- Does your model optimize all needs? ○ Business needs ○ Resource needs ● Repeat steps 3 - 5 until satisfied WRONG: You should never be satisfied! Your model will go out of date (if it hasn’t already)!
  • 16. The Analytics Workflow :::Tracking::: An Extension of Validation: ● Do not open the fire-hose and blast your model with 100% of your data ○ Expect the unexpected ○ Your topology will break (oops, forgot about Unicode… derp) ○ Start off with 10% and ramp up; course-correct along the way ● Perform batch modeling in off-peak hours (Jenkins never sleeps) ● Models should be replaced “gradually”
  • 18. H2O // Storm For a complete tutorial please visit: http://learn.h2o.ai/content/demos/streaming_data.html
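
Slide 5 lists the symbols for the Shannon limit, but the formula itself did not survive in this transcript. Assuming the slide showed the standard Shannon–Hartley form for a band-limited channel with Gaussian noise, it reads as follows, with C, B, S and N as defined on that slide:

```latex
C = B \log_2\!\left(1 + \frac{S}{N}\right)
```

Capacity grows linearly with bandwidth B but only logarithmically with the signal-to-noise ratio S/N, which is one reason more power alone does not buy much more throughput.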
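
The propagation-delay point on slide 7 can be made concrete with a little arithmetic. The sketch below computes a lower bound on one-way delay over the 838 mile Chicago to New York route. The function name and the fiber velocity factor of 0.67 are illustrative assumptions, not figures from the slides, and per-hop relay latency is ignored.

```python
# Lower bound on one-way propagation delay for the 838 mile route on slide 7.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact value in vacuum
METERS_PER_MILE = 1609.344


def one_way_delay_ms(distance_miles: float, velocity_factor: float = 1.0) -> float:
    """Return the propagation delay in milliseconds.

    velocity_factor is the signal speed as a fraction of c
    (1.0 for an idealized line-of-sight link).
    """
    distance_m = distance_miles * METERS_PER_MILE
    return distance_m / (SPEED_OF_LIGHT_M_PER_S * velocity_factor) * 1000.0


route_miles = 838
# Idealized microwave path: signal travels at roughly c through air.
microwave_ms = one_way_delay_ms(route_miles)
# Assumed velocity factor for glass fiber (hypothetical round figure).
fiber_ms = one_way_delay_ms(route_miles, velocity_factor=0.67)

print(f"microwave lower bound: {microwave_ms:.2f} ms")
print(f"fiber lower bound:     {fiber_ms:.2f} ms")
```

Even in the ideal case the route costs a few milliseconds each way, which is why the medium and the path length matter for latency-sensitive workloads such as HFT.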
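
The "start off with 10% and ramp up" advice on slide 16 could be implemented in many ways. One minimal sketch, assuming each incoming tuple carries a stable string id, is to hash the id into a bucket in [0, 1) and route the tuple to the new model only when the bucket falls below the current ramp fraction. The function name and the id format are hypothetical and are not part of H2O or Storm.

```python
import hashlib


def routed_to_new_model(tuple_id: str, fraction: float) -> bool:
    """Deterministically route roughly `fraction` of tuples to the new model."""
    digest = hashlib.sha256(tuple_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


# Hypothetical tuple ids, used only to illustrate the split.
ids = [f"tuple-{i}" for i in range(10_000)]
share = sum(routed_to_new_model(t, 0.10) for t in ids) / len(ids)
print(f"share routed at 10%: {share:.3f}")
```

Because the bucket depends only on the id, raising the fraction from 0.10 to 0.50 keeps every tuple that was already routed to the new model and adds more, so the ramp is gradual rather than a reshuffle.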