
H2O: A Platform for Big Math


Published in: Technology

  1. H2O.ai Machine Intelligence
 H2O: A Platform for Big Math, or: How to make A.I. and TensorFlow work for you
 Arno Candel, PhD, Chief Architect
 +20 more
  2. Who Am I?
 Arno Candel, Chief Architect, Physicist & Hacker at H2O.ai
 • PhD Physics, ETH Zurich, 2005
 • 10+ yrs Supercomputing (HPC)
 • 6 yrs at SLAC (Stanford Linear Accelerator)
 • 4.5 yrs Machine Learning
 • 2.5 yrs at H2O.ai
 • Fortune Magazine Big Data All Star
 Follow me @ArnoCandel
  3. Overview
 [Diagram: Computer Science (CS), Artificial Intelligence (A.I.), Machine Learning (ML), Deep Learning (DL) and H2O.ai, each marked "hot"]
  4. A Simple Deep Learning Model: Artificial Neural Network
 IN: data (heartbeat, blood pressure, oxygen)
 OUT: prediction (send to regular care, or send to intensive care unit (ICU))
 • nodes: neuron activations (real numbers), represent features
 • arrows: connecting weights (real numbers), learned during training
 • non-linearity x -> f(x): adds model complexity
 From the 1970s, now rebranded as DL
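 The network on this slide can be sketched in a few lines of plain Python. This is only an illustration of nodes, weights and a non-linearity, not H2O's implementation: the weights, biases, input scaling and the choice of ReLU and sigmoid are all hypothetical, picked by hand rather than learned.

```python
import math

def relu(x):
    # Non-linearity f(x) applied to each hidden neuron.
    return max(0.0, x)

def sigmoid(x):
    # Squashes the output score into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, hidden_biases, out_weights, out_bias):
    """One hidden layer: weighted sums (the arrows), then a non-linearity."""
    hidden = [
        relu(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    score = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return sigmoid(score)  # interpreted here as P(send to ICU)

# Hypothetical, hand-picked weights; a real model learns them during training.
HIDDEN_W = [[0.8, 0.5, -1.0], [-0.3, 0.9, -0.6]]
HIDDEN_B = [0.0, 0.1]
OUT_W = [1.2, 0.7]
OUT_B = -1.5

# Illustrative standardized inputs: heartbeat, blood pressure, oxygen.
patient = [1.5, 1.0, -2.0]
p_icu = forward(patient, HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
print("send to ICU" if p_icu > 0.5 else "send to regular care")
```

 Training would adjust the weights so that predictions match known outcomes; only the prediction step is shown here.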
  5. Brief History of A.I., ML and DL
 A step back: A.I. was coined over 60 years ago
 John McCarthy (Princeton, Bell Labs, Dartmouth, later: MIT, Stanford)
 1955: “A proposal for the Dartmouth summer research project on Artificial Intelligence” with Marvin Minsky (MIT), Claude Shannon (Bell Labs) and Nathaniel Rochester (IBM)
 http://www.asiapacific-mathnews.com/04/0403/0015_0020.pdf
  6. Step 1: Great Algorithms + Fast Computers
 “No computer will ever beat me at playing chess.”
 1997: Playing Chess (IBM Deep Blue beats Kasparov)
 Computer Science: 30 custom CPUs, 60 billion moves in 3 mins
 http://nautil.us/issue/18/genius/why-the-chess-computer-deep-blue-played-like-a-human
  7. Step 2: More Data + Real-Time Processing
 “No computer will ever drive a car!?”
 2005: Self-driving Cars, DARPA Grand Challenge, 132 miles (won by Stanford A.I. lab*)
 Sensors & Computer Science: video, radar, laser, GPS, 7 Pentium computers
 *A.I. lab was established by McCarthy et al. in the early 60s
 http://cs.stanford.edu/group/roadrunner/old/presskit.html
  8. Step 3: Big Data + In-Memory Clusters
 “No computer will ever answer random questions!?”
 2011: Jeopardy (IBM Watson)
 In-Memory Analytics/ML: 4 TB of data (incl. Wikipedia), 90 servers, 16 TB RAM, Hadoop, 6 million logic rules
 Note: IBM Watson received the question in electronic written form, and was often able to press the answer button faster than the competing humans.
 https://www.youtube.com/watch?v=P18EdAKuC1U
 https://en.wikipedia.org/wiki/Watson_(computer)
  9. Step 4: Deep Learning
 “No computer will ever speak any language!?”
 2014: Google (acquired Quest Visual)
 Deep Learning: Convolutional and Recurrent Neural Networks, with training data from users
 • Translate between 103 languages by typing
 • Instant camera translation: Use your camera to translate text instantly in 29 languages
 • Camera Mode: Take pictures of text for higher-quality translations in 37 languages
 • Conversation Mode: Two-way instant speech translation in 32 languages
 • Handwriting: Draw characters instead of using the keyboard in 93 languages
  10. Step 5: Augmented Deep Learning
 “No computer will ever beat the best Go master!?”
 2014: Atari Games (DeepMind)
 2016: AlphaGo (Google DeepMind)
 Deep Learning + reinforcement learning, tree search, Monte Carlo, GPUs, playing against itself, …
 Trained from raw pixel values, no human rules
 Go board has approx. 2E170 possible positions (a 2 followed by 170 zeros).
 https://deepmind.com
  11. Step 6: A.I. Chatbots have Opinions too!
 Microsoft had won the Visual Recognition challenge: http://image-net.org/challenges/LSVRC/2015/
  12. What Will Change?
 Today / Tomorrow
 Better Data, Better Models, Better Results
 Example: Fraud Prediction
  13. H2O.ai - Makers of H2O
 H2O - AI for Business Transformation
 • Scalable and Distributed Data Science and Machine Learning: Deep Learning, Gradient Boosting, Random Forest, Decision Trees, Logistic Regression, Generalized Linear Modeling, K-Means, PCA, GLRM, …
 • Fast, accurate, robust, proven, fully featured
 • Apache v2 open source (github.com/h2oai)
 Easy to Use and Deploy
 • h2o.ai/download and run anywhere, immediately
 • Client APIs: R, Python, Java, Scala, REST, Flow GUI
 • Spark (cf. Sparkling Water), Hadoop, Standalone
 • Java scoring code auto-generated
  14. H2O.ai - Growing Rapidly
 Many, many talents at H2O… (photo captions): author data.table, r2d3.us, ceo, grammar of graphics, pure software, kaggle master, found Pentium bug, POSIX, + 10 more recent hires
 h2o.ai/careers
  15. High Level Architecture of H2O
 [Diagram labels]
 • Data in: HDFS, S3, NFS, Local, SQL
 • LDAP, Kerberos, SSL, HTTPS, HTTP
 • Distributed In-Memory, Parallel Parser, Lossless Compression
 • H2O Compute Engine: Exploratory & Descriptive Analysis, Feature Engineering & Selection, Supervised & Unsupervised Modeling, Model Evaluation & Selection, Predict, Data & Model Storage
 • Production Scoring Environment: Model Export: Plain Old Java Object; Data Prep Export: Plain Old Java Object
 • Your Imagination
  16. Native APIs: Java, Scala; REST APIs: R, Python, Flow, JavaScript, Java
 R:
   library(h2o)
   h2o.init()
   h2o.deeplearning(x = 1:4, y = 5, training_frame = as.h2o(iris))
 Python:
   import h2o
   from h2o.estimators.deeplearning import H2ODeepLearningEstimator
   h2o.init()
   dl = H2ODeepLearningEstimator()
   # iris: an H2OFrame loaded beforehand; columns 0-3 are the predictors
   dl.train(x=list(range(4)), y="Species", training_frame=iris)
 Scala:
   import _root_.hex.deeplearning.DeepLearning
   import _root_.hex.deeplearning.DeepLearningParameters
   val dlParams = new DeepLearningParameters()
   dlParams._train = iris.hex
   dlParams._response_column = 'Species
   val dl = new DeepLearning(dlParams)
   val dlModel = dl.trainModel.get
 All heavy lifting is done by the backend!
 Built-in interactive GUI and notebook - no coding necessary!
  17. Easily Bring Models into Production
 Gradient Boosting Machine Tree Model (nano-fast)
 Auto-generated Java scoring code to easily Operationalize Data Science
  18. Spark + H2O = Sparkling Water
 Sparkling Water 2.0
 • Spark 2.0 API compatibility
 • Use H2O algorithms in conjunction with, or instead of, MLLib algorithms on Spark
 • Build Ensembles using H2O and MLLib Algorithms
 • Visual Intelligence for Spark. Run Spark, MLLib, Scala in Flow
 • Export MLLib models as POJOs
 • Toolchain for ML pipelines and debugging support
  19. Live H2O Deep Learning Demo: Predict Airplane Delays
 • 116M rows, 6GB CSV file, 800+ predictors (numeric + categorical)
 • 10 nodes: all 320 cores busy
 • Deep Learning model trained in <1 min: 2M+ samples/second
 • Real-time, interactive model inspection in Flow
  20. Significant Performance Gains with Deep Learning
 Predict departure delay (Y/N) on 20 years of airline flight data (116M rows, 12 cols, categorical + numerical data with missing values)
 • H2O Elastic Net (GLM): 10 secs, alpha=0.5, lambda=1.379e-4 (auto), AUC: 0.656
 • H2O Deep Learning: 45 secs, 4 hidden ReLU layers of 20 neurons, 1 epoch, AUC: 0.703
 (AUC: higher is better, ranges from 0.5 to 1)
 Feature importances: features have non-linear impact; Chicago, Atlanta, Dallas: often delayed
 10 nodes: Dual E5-2650 (8 cores, 2.6GHz), 10GbE
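 For readers unfamiliar with the AUC metric quoted on this slide, here is a minimal sketch of one standard way to compute it: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The labels and scores below are made up for illustration and have nothing to do with the airline data.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half. Quadratic in the number of rows, which is fine for
    a small illustration but not for 116M rows.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = delayed, 0 = on time, with model scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
print(round(auc(labels, scores), 3))
```

 A model that ranks every delayed flight above every on-time flight scores 1.0, and random scores land near 0.5.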
  21. So How Does It Work? Distributed In-Memory Data Frames
 • Data matrix is chunked into columnar blocks
 • Algorithms can parallelize over these blocks
 • Scalable to many TBs: Each node fills its memory with data
 • Columns are separate entities (fast add/remove/modify)
 • Similar to data frames in R, Pandas, and now also Spark
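 The chunked, columnar layout described here can be illustrated with a toy, single-process column store. The class and method names are hypothetical and this is not H2O's data structure; it only shows why columns can be added or removed independently and why per-chunk work is easy to parallelize.

```python
class ChunkedFrame:
    """Toy column store: each column is a list of fixed-size row chunks."""

    def __init__(self, chunk_rows):
        self.chunk_rows = chunk_rows
        self.columns = {}  # column name -> list of chunks (lists of values)

    def add_column(self, name, values):
        n = self.chunk_rows
        self.columns[name] = [values[i:i + n] for i in range(0, len(values), n)]

    def remove_column(self, name):
        del self.columns[name]  # other columns are untouched

    def map_chunks(self, name, fn):
        """Apply fn to every chunk independently (parallelizable in principle)."""
        return [fn(chunk) for chunk in self.columns[name]]

frame = ChunkedFrame(chunk_rows=3)
frame.add_column("age", [12, 25, 40, 63, 118, 31, 57])
frame.add_column("income", [1, 30, 55, 80, 20, 45, 1000])

# Each chunk yields a partial result; combining them gives the column total.
partial_sums = frame.map_chunks("age", sum)
print(partial_sums, sum(partial_sums))
```

 In a cluster, the chunks of one column would live on different nodes, and each node would process only the chunks it holds.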
  22. Parallel Parse into Distributed Rows
 [Diagram: a massively parallel parser reads a table of N rows and p cols from HDFS, S3, NFS, SQL, … and distributes it as 6 blocks of N/6 rows and p cols each]
  23. Compute Paradigm: Fine-Grain Map/Reduce
 [Diagram: the driver's algo calls an M/R task; a tree of map() and reduce() calls produces the final result]
 • map(): process data, reduce(): aggregate results
 • Data Parallelism - all CPU cores are at work
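 The map/reduce pattern on this slide can be sketched with the Python standard library. This is an assumption-laden toy, not H2O's execution layer: the function names are hypothetical, and a thread pool is used only to show the structure (CPython threads will not keep all CPU cores busy the way the slide describes).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_chunk(chunk):
    """map(): process one chunk of data into a partial result (count, sum)."""
    return (len(chunk), sum(chunk))

def reduce_pair(a, b):
    """reduce(): combine two partial results; must be associative."""
    return (a[0] + b[0], a[1] + b[1])

def distributed_mean(chunks, workers=4):
    # Map every chunk independently, then fold the partial results together.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    count, total = reduce(reduce_pair, partials)
    return total / count

chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(distributed_mean(chunks))
```

 Because reduce_pair is associative, partial results can be combined in any grouping, which is what allows a tree of reduce() calls like the one in the diagram.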
  24. Implementation Details
 • Distributed in-memory data store holds data, models, etc.
 • Columnar compression (often better than gzip on disk)
 • Low-level Java code (byte[], float[], bit operations, etc.)
 • Data read/write access at memory bandwidth speeds
 • Custom serialization, networking and execution layer
 • Auto-generated REST client-server API (R, Python, Flow,…)
 • Standalone scoring code auto-generated for every model
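 To give a feel for columnar compression, here is a hypothetical toy scheme: integers in a column are stored as one-byte offsets from the column minimum whenever the value range allows it. This is an illustrative assumption, not a description of H2O's actual codecs.

```python
def compress_column(values):
    """Store integers as 1-byte offsets from the column minimum when they fit."""
    lo, hi = min(values), max(values)
    if hi - lo > 255:
        return None  # range too wide for one byte per value in this toy scheme
    return lo, bytes(v - lo for v in values)

def decompress_column(compressed):
    """Rebuild the original integers from the minimum and the byte offsets."""
    lo, payload = compressed
    return [lo + b for b in payload]

ages = [12, 25, 40, 63, 118, 31, 57]
packed = compress_column(ages)
print(len(packed[1]), "bytes for", len(ages), "values")
print(decompress_column(packed) == ages)
```

 The scheme is lossless and values stay addressable by position, so data can be read without decompressing a whole block first.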
  25. Distributed Gradient Boosting Machine
 • H2O: First open-source implementation of scalable, distributed Gradient Boosting Machine - fully featured
 • Parallelized Individual Tree Construction
 • Discretization (binning) for speedup without loss of accuracy
 [Diagram: find optimal split (feature & value) on all data, e.g. "age < 25 ?" (Y/N), with age ranging from 12 to 118 and income from 1k to 1M. The analytical error landscape has its best split at age 25; H2O discretizes the age range 12 to 118 into bins.]
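 The binning idea can be sketched as follows: instead of testing every distinct value as a split point, collect per-bin statistics once and evaluate only the bin edges. This is a simplified illustration using squared error on a single feature; the function name, equal-width bins and data are assumptions, not H2O's algorithm.

```python
def best_binned_split(xs, ys, n_bins):
    """Return the bin edge that minimizes squared error of a two-way split."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return None  # constant feature, nothing to split on
    width = (hi - lo) / n_bins
    # Per-bin sufficient statistics: count, sum(y), sum(y^2).
    cnt = [0] * n_bins
    s1 = [0.0] * n_bins
    s2 = [0.0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), n_bins - 1)
        cnt[b] += 1
        s1[b] += y
        s2[b] += y * y

    def sse(n, a, q):
        # Sum of squared errors around the mean, from the three statistics.
        return q - a * a / n if n else 0.0

    total_n, total_a, total_q = sum(cnt), sum(s1), sum(s2)
    best_err, best_edge = float("inf"), None
    ln = la = lq = 0.0
    for b in range(n_bins - 1):
        ln, la, lq = ln + cnt[b], la + s1[b], lq + s2[b]
        rn, ra, rq = total_n - ln, total_a - la, total_q - lq
        if ln == 0 or rn == 0:
            continue  # a split must leave rows on both sides
        err = sse(ln, la, lq) + sse(rn, ra, rq)
        if err < best_err:
            best_err, best_edge = err, lo + (b + 1) * width
    return best_edge

# Made-up data: the response is 1 for ages below 25 and 0 otherwise.
ages = [12, 14, 18, 20, 23, 28, 35, 50, 70, 112]
resp = [1.0 if a < 25 else 0.0 for a in ages]
print(best_binned_split(ages, resp, n_bins=20))
```

 On this data the chosen edge separates the rows exactly as the rule "age < 25" does, while only 19 candidate edges were evaluated.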
  26. Scalable Distributed Histogram Calculation
 [Diagram: the driver's algo calls the histogram method; many map() tasks each build a local histogram, and these are combined into a global histogram]
 • Data parallelism - global histogram computation
 • Local histograms: one each for w, w*y and w*y^2 (w: observation weights, y: response)
 • Same results as if computed on a single compute node
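 The local-to-global histogram step can be sketched like this: each node accumulates sums of w, w*y and w*y^2 per bin for its own rows, and the global histogram is the element-wise sum of the local ones. Function names, bin edges and rows are hypothetical; this shows the principle, not H2O's code.

```python
import bisect

def local_histogram(rows, edges):
    """One node's per-bin sums of (w, w*y, w*y^2).

    rows: iterable of (x, y, w); edges: sorted bin boundaries (n_bins + 1).
    """
    n_bins = len(edges) - 1
    hist = [[0.0, 0.0, 0.0] for _ in range(n_bins)]
    for x, y, w in rows:
        b = bisect.bisect_right(edges, x) - 1
        b = max(0, min(b, n_bins - 1))  # clamp values on the outer edges
        hist[b][0] += w
        hist[b][1] += w * y
        hist[b][2] += w * y * y
    return hist

def merge(h1, h2):
    """Combine two histograms by adding their per-bin statistics."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(h1, h2)]

edges = [12, 25, 50, 118]
node_a = [(13, 1.0, 1.0), (30, 0.0, 2.0)]
node_b = [(20, 1.0, 1.0), (95, 0.0, 1.0), (118, 2.0, 1.0)]

global_hist = merge(local_histogram(node_a, edges), local_histogram(node_b, edges))
single_node = local_histogram(node_a + node_b, edges)
print(global_hist == single_node)
```

 With the small integer-valued numbers used here the merged result matches the single-node result exactly; with arbitrary floating-point data the two can differ by rounding, since addition order changes.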
  27. Over 7000 enterprises use H2O
 Financial, Insurance, Marketing, Telecom, Healthcare
  28. User Based Insurance (Today’s Keynote!)
 “H2O is an enabler in how people are thinking about data.”
 “We have many plans to use H2O across the different business units.”
  29. Digital Marketing - Campaigns
 “H2O gave us the capability to do Big Modeling. There is no limit to scaling in H2O.”
 “Working with the H2O team has been amazing.”
 “The business value that we have gained from advanced analytics is enormous.”
  30. Matching TV Watching Behavior with Buying Behavior
 “Unlike other systems where I had to buy the whole package and just use 10-20%, I can customize H2O to suit my needs.”
 “I am a big fan of open source. H2O is the best fit in terms of cost as well as ease of use and scalability and usability.”
  31. Insurance - Risk Assessment
 “Predictive analytics is the differentiator for insurance companies going forward in the next couple of decades.”
 “Advanced analytics was one of the key investments that we decided to make.”
  32. Fintech - Fraud/Risk/Churn/etc. (Today’s Keynote!)
 “H2O is a great solution because it's designed to be enterprise ready and can operate on very large datasets.”
 “H2O has been a one-stop shop that helps us do all our modeling in one framework.”
 “H2O is the best solution to be able to iterate very quickly on large datasets and produce meaningful models.”
  33. H2O Booklets
 Come get your booklets at our booth!
 R, Python, Deep Learning, GLM, GBM, Sparkling Water
  34. Data Scientists Love This Stuff
 H2O GBM Model Tuning Tutorial for R/Python/Flow
 https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/tutorials/gbm/
  35. KDNuggets Poll about Deep Learning Tools & Platforms
 Usage of Deep Learning tools in past year: H2O and TensorFlow are tied
 http://www.kdnuggets.com
  36. TensorFlow + H2O + Apache Spark = Anything is possible
 In progress: Integration with GPU DL tools (TensorFlow/Caffe/mxnet/etc.)
 https://github.com/h2oai/sparkling-water/blob/master/py/examples/notebooks/TensorFlowDeepLearning.ipynb
 https://www.youtube.com/watch?v=62TFK641gG8
  37. H2O.ai Now Focused On Experience Beyond Algorithms and Data
 H2O: ALGORITHMS, EXPERIENCE, DATA, VERTICALS, DATA PRODUCTS
 • H2O Flow: Single web-based Document for code execution, text, mathematics, plots and rich media
 • Visual Intelligence: UX and Interpretability for AI
 • Steam: Elastic ML & Auto ML, Operationalize Data Science
  38. Steam - Automated Platform to Build and Scale Smart Data Products (Coming Soon)
 DevOps/Data Engineers, Data Scientists, Advanced Data Scientists, Software Engineers, Application Software Engineers
 DATA, BUSINESS INSIGHTS, AI – Machine Learning, Automation, Scalability, Visualization
  39. H2O OPEN TOUR
 www.open.h2o.ai
 We’re coming to a town near you in NYC / TX
 Visit our Booth Today!
  40. Summary
 A.I. and Deep Learning are hot (again)!
 Make your own smart data products with H2O!
 Try H2O today - installs in minutes! h2o.ai/download
 https://www.youtube.com/user/0xdata/videos
 https://github.com/h2oai/h2o-3
 H2O Google Group
 @h2oai
 We’re hiring: h2o.ai/careers/
