SlideShare a Scribd company logo
© 2014 MapR Technologies 1© 2014 MapR Technologies
© 2014 MapR Technologies 2
Agenda
A sample problem
A general approach
Complications arise
Light is cast on the villains
Who flee from the scene
© 2014 MapR Technologies 3
Agenda Script
A sample problem
A general approach
Complications arise
Light is cast on the villains
Who flee from the scene
© 2014 MapR Technologies 4
Model Building in a Nutshell
Gather
data
Build
models
Predict
future
World
domination!
Fight fraud
Save the
planet
✔
© 2014 MapR Technologies 5
A Sample Problem
© 2014 MapR Technologies 6
Modeling Energy Use
• Modeling office and home energy use can save energy
• Guides retrofits
• Finds bad leaks
• Increases awareness and understanding of problems
• Demonstrated results of 20% or more savings
• Savings = less CO2 = less planet warming
© 2014 MapR Technologies 7
Modeling Energy Use
See ASHRAE RP-1050
http://bit.ly/1ovwGfy
© 2014 MapR Technologies 8
Modeling Energy Use (or not)
© 2014 MapR Technologies 9
Modeling Energy Use (complete hash)
© 2014 MapR Technologies 10
Some Notes on the Method
• Can’t change method since this is ASHRAE standard
• Small changes in cutoff can have ragged effect on model fit
– Linear methods out of the question
– Gradient based methods find local minima
• All parameters interact strongly
– Can’t solve for one at a time
© 2014 MapR Technologies 11
Evolutionary Algorithms
• Basic algorithm:
fill population with random solutions
do {
keep best x% of solutions
mutate survivors to fill population
} until happy with results
• Works great
• Converges very slowly
– If mutation is small, takes many, many steps to find best, gets trapped
– If mutation is too big, keeps jumping away from optimum
© 2014 MapR Technologies 12
Doesn’t work in practice
© 2014 MapR Technologies 13
Meta-Evolutionary Algorithms
• Meta mutation algorithm:
fill population with random solutions
do {
keep best x% of solutions
mutate survivors to fill population
use mutation size to set mutation rate per candidate
} until happy with results
• Works great
• Converges very fast
– If small jump works, we get more of that
– If big jump works, we get more of that
© 2014 MapR Technologies 14
Meta-Evolutionary Algorithms
• Meta mutation algorithm:
fill population with random solutions
do {
keep best x% of solutions
mutate survivors to fill population
use mutation size to set mutation rate per candidate
} until happy with results
• Works great
• Converges very fast
– If small jump works, we get more of that
– If big jump works, we get more of that
© 2014 MapR Technologies 15
Meta-Evolutionary Algorithms
• Algorithm may go wrong way
• May take wrong-size steps
• But it quickly learns to correct
• Bad strategies die out along with
bad solutions
© 2014 MapR Technologies 16
But There’s a Rub
• This new algorithm may be gang busters
– But it comes with new knobs to turn
• How can we tell where to turn them?
• How do we make sense of a seething mass of 5 dimensional
spiders?
© 2014 MapR Technologies 17
We need to look inside
© 2014 MapR Technologies 18
Demo Reel Synopsis
• Constant mutation rate failure example
• Meta-mutation succeeds
• Meta-mutation can handle highly correlated narrow valleys
• Very complex landscapes can be navigated
• Strategy shifts fluidly to find solutions
© 2014 MapR Technologies 19
Let’s put on a show!
© 2014 MapR Technologies 20
Not quite that simple
• Current problem is 5-dimensional
• Problem parameters don’t make sense directly
• So we need to show the human face of the problem
(that is where we started!)
• We also need dynamics to understand how the algorithm gets
where it goes
© 2014 MapR Technologies 21
Main-line Model and Visualization Flow
Data
repo
Solver
grep
Solver
JSON
model
d3 +
twistd
JSON
model
Conventional
Scalable
© 2014 MapR Technologies 22
How does R make video?
© 2014 MapR Technologies 23
© 2014 MapR Technologies 24
© 2014 MapR Technologies 25
Diagnostic Visualizations
Solver
JSON
model
Scalable
Logs
ScaleR ffmpeg
© 2014 MapR Technologies 26
Of Note
• RevoScaleR solves most of the parallelism issues
• We still want to run arbitrary R
• Some legacy functions are Particularly Unfriendly to hdfs
– png(filename) – requires conventional file access
– system(command) – assumes conventional file access
– ffmpeg (1) – assumes conventional file access
© 2014 MapR Technologies 27
Simple Solution
• MapR provides hdfs and NFS access to cluster
• All path names are the same
• Map reduce programs can use legacy POSIX code
© 2014 MapR Technologies 28
Diagnostic Videos
• 5D x 100 can get trapped in local minimum
– ’470 example
• 5D x 500 avoids trapping issues
– ’470 quiescence and resurgence
• 3D x 500 and 3D x 100 also avoid trapping
• Need to distinguish empty house from occupied
– ’771 shows poor fit to either regime, classic real world issue
© 2014 MapR Technologies 29
Lessons I Learned by Watching Movies
• Lower dimensional problems are easier
– Evolve baseline level and cut-points, solve for wing slopes
– Hybrid solutions are not “cheating”
• Real-world data always has surprises and I am always surprised
by this
• Can use 5P models as cluster “centroids” to handle 2-state
homes
© 2014 MapR Technologies 30
And there’s a
PRIZE in every
box!

More Related Content

What's hot

Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Ted Dunning
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
Ted Dunning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
Ted Dunning
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossible
Ted Dunning
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly Detection
Ted Dunning
 
Strata New York 2012
Strata New York 2012Strata New York 2012
Strata New York 2012
MapR Technologies
 
Surprising Advantages of Streaming - ACM March 2018
Surprising Advantages of Streaming - ACM March 2018Surprising Advantages of Streaming - ACM March 2018
Surprising Advantages of Streaming - ACM March 2018
Ellen Friedman
 
11 2016 jit-dumping_ss360
11 2016 jit-dumping_ss36011 2016 jit-dumping_ss360
11 2016 jit-dumping_ss360
Yvonne C. Salazar
 
Architecting R into Storm Application Development Process
Architecting R into Storm Application Development ProcessArchitecting R into Storm Application Development Process
Architecting R into Storm Application Development Process
DataWorks Summit
 
Real-time path tracing using a hybrid deferred approach, GTC EUR 2017
Real-time path tracing using a hybrid deferred approach, GTC EUR 2017Real-time path tracing using a hybrid deferred approach, GTC EUR 2017
Real-time path tracing using a hybrid deferred approach, GTC EUR 2017
Thomas Willberger
 

What's hot (10)

Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossible
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly Detection
 
Strata New York 2012
Strata New York 2012Strata New York 2012
Strata New York 2012
 
Surprising Advantages of Streaming - ACM March 2018
Surprising Advantages of Streaming - ACM March 2018Surprising Advantages of Streaming - ACM March 2018
Surprising Advantages of Streaming - ACM March 2018
 
11 2016 jit-dumping_ss360
11 2016 jit-dumping_ss36011 2016 jit-dumping_ss360
11 2016 jit-dumping_ss360
 
Architecting R into Storm Application Development Process
Architecting R into Storm Application Development ProcessArchitecting R into Storm Application Development Process
Architecting R into Storm Application Development Process
 
Real-time path tracing using a hybrid deferred approach, GTC EUR 2017
Real-time path tracing using a hybrid deferred approach, GTC EUR 2017Real-time path tracing using a hybrid deferred approach, GTC EUR 2017
Real-time path tracing using a hybrid deferred approach, GTC EUR 2017
 

Similar to Hadoop and R Go to the Movies

Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
Ted Dunning
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
Ted Dunning
 
Deep Learning for Fraud Detection
Deep Learning for Fraud DetectionDeep Learning for Fraud Detection
Deep Learning for Fraud Detection
DataWorks Summit/Hadoop Summit
 
How to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detectionHow to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detection
DataWorks Summit
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
MapR Technologies
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
DataWorks Summit
 
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
Allen Day, PhD
 
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
MapR Technologies
 
Building HBase Applications - Ted Dunning
Building HBase Applications - Ted DunningBuilding HBase Applications - Ted Dunning
Building HBase Applications - Ted Dunning
MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
MapR Technologies
 
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
MapR Technologies
 
Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down Internet
MapR Technologies
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
DataWorks Summit
 
HUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_DunningHUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_Dunning
John Mulhall
 
Practical Computing with Chaos
Practical Computing with ChaosPractical Computing with Chaos
Practical Computing with Chaos
MapR Technologies
 
Practical Computing With Chaos
Practical Computing With ChaosPractical Computing With Chaos
Practical Computing With Chaos
DataWorks Summit
 
Realistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentRealistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure Development
MapR Technologies
 
Realistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentRealistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure Development
DataWorks Summit
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
MapR Technologies
 

Similar to Hadoop and R Go to the Movies (20)

Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Deep Learning for Fraud Detection
Deep Learning for Fraud DetectionDeep Learning for Fraud Detection
Deep Learning for Fraud Detection
 
How to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detectionHow to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detection
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
 
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
 
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Building HBase Applications - Ted Dunning
Building HBase Applications - Ted DunningBuilding HBase Applications - Ted Dunning
Building HBase Applications - Ted Dunning
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
 
Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down Internet
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
 
HUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_DunningHUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_Dunning
 
Practical Computing with Chaos
Practical Computing with ChaosPractical Computing with Chaos
Practical Computing with Chaos
 
Practical Computing With Chaos
Practical Computing With ChaosPractical Computing With Chaos
Practical Computing With Chaos
 
Realistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentRealistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure Development
 
Realistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentRealistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure Development
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
Intelisync
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 

Recently uploaded (20)

Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 

Hadoop and R Go to the Movies

  • 1. © 2014 MapR Technologies 1© 2014 MapR Technologies
  • 2. © 2014 MapR Technologies 2 Agenda A sample problem A general approach Complications arise Light is cast on the villains Who flee from the scene
  • 3. © 2014 MapR Technologies 3 Agenda Script A sample problem A general approach Complications arise Light is cast on the villains Who flee from the scene
  • 4. © 2014 MapR Technologies 4 Model Building in a Nutshell Gather data Build models Predict future World domination! Fight fraud Save the planet ✔
  • 5. © 2014 MapR Technologies 5 A Sample Problem
  • 6. © 2014 MapR Technologies 6 Modeling Energy Use • Modeling office and home energy use can save energy • Guides retrofits • Finds bad leaks • Increases awareness and understanding of problems • Demonstrated results of 20% or more savings • Savings = less CO2 = less planet warming
  • 7. © 2014 MapR Technologies 7 Modeling Energy Use See ASHRAE RP-1050 http://bit.ly/1ovwGfy
  • 8. © 2014 MapR Technologies 8 Modeling Energy Use (or not)
  • 9. © 2014 MapR Technologies 9 Modeling Energy Use (complete hash)
  • 10. © 2014 MapR Technologies 10 Some Notes on the Method • Can’t change method since this is ASHRAE standard • Small changes in cutoff can have ragged effect on model fit – Linear methods out of the question – Gradient based methods find local minima • All parameters interact strongly – Can’t solve for one at a time
  • 11. © 2014 MapR Technologies 11 Evolutionary Algorithms • Basic algorithm: fill population with random solutions do { keep best x% of solutions mutate survivors to fill population } until happy with results • Works great • Converges very slowly – If mutation is small, takes many, many steps to find best, gets trapped – If mutation is too big, keeps jumping away from optimum
  • 12. © 2014 MapR Technologies 12 Doesn’t work in practice
  • 13. © 2014 MapR Technologies 13 Meta-Evolutionary Algorithms • Meta mutation algorithm: fill population with random solutions do { keep best x% of solutions mutate survivors to fill population use mutation size to set mutation rate per candidate } until happy with results • Works great • Converges very fast – If small jump works, we get more of that – If big jump works, we get more of that
  • 14. © 2014 MapR Technologies 14 Meta-Evolutionary Algorithms • Meta mutation algorithm: fill population with random solutions do { keep best x% of solutions mutate survivors to fill population use mutation size to set mutation rate per candidate } until happy with results • Works great • Converges very fast – If small jump works, we get more of that – If big jump works, we get more of that
  • 15. © 2014 MapR Technologies 15 Meta-Evolutionary Algorithms • Algorithm may go wrong way • May take wrong-size steps • But it quickly learns to correct • Bad strategies die out along with bad solutions
  • 16. © 2014 MapR Technologies 16 But There’s a Rub • This new algorithm may be gang busters – But it comes with new knobs to turn • How can we tell where to turn them? • How do we make sense of a seething mass of 5 dimensional spiders?
  • 17. © 2014 MapR Technologies 17 We need to look inside
  • 18. © 2014 MapR Technologies 18 Demo Reel Synopsis • Constant mutation rate failure example • Meta-mutation succeeds • Meta-mutation can handle highly correlated narrow valleys • Very complex landscapes can be navigated • Strategy shifts fluidly to find solutions
  • 19. © 2014 MapR Technologies 19 Let’s put on a show!
  • 20. © 2014 MapR Technologies 20 Not quite that simple • Current problem is 5-dimensional • Problem parameters don’t make sense directly • So we need to show the human face of the problem (that is where we started!) • We also need dynamics to understand how the algorithm gets where it goes
  • 21. © 2014 MapR Technologies 21 Main-line Model and Visualization Flow Data repo Solver grep Solver JSON model d3 + twistd JSON model Conventional Scalable
  • 22. © 2014 MapR Technologies 22 How does R make video?
  • 23. © 2014 MapR Technologies 23
  • 24. © 2014 MapR Technologies 24
  • 25. © 2014 MapR Technologies 25 Diagnostic Visualizations Solver JSON model Scalable Logs ScaleR ffmpeg
  • 26. © 2014 MapR Technologies 26 Of Note • RevoScaleR solves most of the parallelism issues • We still want to run arbitrary R • Some legacy functions are Particularly Unfriendly to hdfs – png(filename) – requires conventional file access – system(command) – assumes conventional file access – ffmpeg (1) – assumes conventional file access
  • 27. © 2014 MapR Technologies 27 Simple Solution • MapR provides hdfs and NFS access to cluster • All path names are the same • Map reduce programs can use legacy POSIX code
  • 28. © 2014 MapR Technologies 28 Diagnostic Videos • 5D x 100 can get trapped in local minimum – ’470 example • 5D x 500 avoids trapping issues – ’470 quiescence and resurgence • 3D x 500 and 3D x 100 also avoid trapping • Need to distinguish empty house from occupied – ’771 shows poor fit to either regime, classic real world issue
  • 29. © 2014 MapR Technologies 29 Lessons I Learned by Watching Movies • Lower dimensional problems are easier – Evolve baseline level and cut-points, solve for wing slopes – Hybrid solutions are not “cheating” • Real-world data always has surprises and I am always surprised by this • Can use 5P models as cluster “centroids” to handle 2-state homes
  • 30. © 2014 MapR Technologies 30 And there’s a PRIZE in every box!