
CHASE-CI: A Distributed Big Data Machine Learning Platform



Opening Talk With Professor Ken Kreutz-Delgado
CHASE-CI Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
May 14, 2018



  1. “CHASE-CI: A Distributed Big Data Machine Learning Platform.” Opening Talk With Professor Ken Kreutz-Delgado, CHASE-CI Workshop, Calit2’s Qualcomm Institute, University of California, San Diego, May 14, 2018. Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD. http://lsmarr.calit2.net
  2. DOE ESnet’s Science DMZ Creates a Separate Network for Big Data Applications. A Science DMZ integrates four key concepts into a unified whole: a network architecture designed for high-performance applications, with the science network distinct from the general-purpose network; the use of dedicated systems as data transfer nodes (DTNs); performance measurement and network testing systems that are regularly used to characterize and troubleshoot the network; and security policies and enforcement mechanisms that are tailored for high-performance science environments. http://fasterdata.es.net/science-dmz/ (“Science DMZ” coined 2010)
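Not from the slides: a minimal sketch of the kind of memory-to-memory throughput probe DTN operators use to separate network performance from disk bottlenecks. Real Science DMZ deployments use tools such as iperf3 and perfSONAR; this stdlib-only version just streams bytes over a localhost TCP socket, so the function name and parameters are illustrative.

```python
import socket
import threading
import time

def dtn_style_probe(total_mb=64, chunk_kb=256):
    """Memory-to-memory TCP throughput probe (Gbit/s) over localhost,
    mimicking the disk-free transfer tests run on data transfer nodes."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))       # OS picks a free port
    server.listen(1)
    port = server.getsockname()[1]
    chunk = b"\0" * (chunk_kb * 1024)
    n_chunks = (total_mb * 1024) // chunk_kb

    def sink():
        conn, _ = server.accept()
        while conn.recv(1 << 20):        # drain until sender closes
            pass
        conn.close()

    t = threading.Thread(target=sink)
    t.start()
    client = socket.create_connection(("127.0.0.1", port))
    start = time.perf_counter()
    for _ in range(n_chunks):
        client.sendall(chunk)
    client.close()
    t.join()
    elapsed = time.perf_counter() - start
    server.close()
    return (total_mb * 8) / elapsed / 1000  # megabits -> gigabits per second

if __name__ == "__main__":
    print(f"{dtn_style_probe():.2f} Gbit/s memory-to-memory")
```

Localhost numbers only bound the host's software stack, of course; end-to-end DMZ testing measures the actual wide-area path.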
  3. Based on Community Input and on ESnet’s Science DMZ Concept, NSF Has Made Over 200 Campus-Level Awards in 44 States. Source: Kevin Thompson, NSF
  4. Science DMZ Data Transfer Nodes (DTNs): Flash I/O Network Appliances (FIONAs). UCSD designed FIONAs to solve the disk-to-disk data transfer problem at full speed on 10G, 40G and 100G networks. FIONAs: 10/40G, $8,000; FIONette: 1G, $250. Five racked FIONAs at Calit2, each containing dual 12-core CPUs, 96GB RAM, 1TB SSD, and two 10GbE interfaces; total ~$10,500, or ~$18,500 with 8 GPUs. Phil Papadopoulos, SDSC, and Tom DeFanti, Joe Keefe & John Graham, Calit2
  5. Logical Next Step: The Pacific Research Platform Networks Campus DMZs to Create a Regional End-to-End Science-Driven “Big Data Superhighway” System (GDC). NSF CC*DNI Grant, $5M, 10/2015-10/2020. PI: Larry Smarr, UC San Diego Calit2. Co-PIs: Camille Crittenden, UC Berkeley CITRIS; Tom DeFanti, UC San Diego Calit2/QI; Philip Papadopoulos, UCSD SDSC; Frank Wuerthwein, UCSD Physics and SDSC. Letters of commitment from 50 researchers from 15 campuses and 32 IT/network organization leaders. NSF Program Officer: Amy Walton. Source: John Hess, CENIC
  6. PRP National-Scale Experimental Distributed Testbed: Using Internet2 to Connect Early-Adopter Quilt Regional R&E Networks. Original PRP and extended PRP testbed announced at the Internet2 Global Summit, May 8, 2018
  7. PRP’s First 2.5 Years: Connecting Multi-Campus Application Teams and Devices. Earth Sciences
  8. 100 Gbps FIONA at UCSC Allows for Downloads to the UCSC Hyades Cluster from the LBNL NERSC Supercomputer for Telescope Survey Analysis. 300 images per night, 100MB per raw image, 120GB per night; 250 images per night, 530MB per raw image, 800GB per night. Source: Peter Nugent, LBNL, Professor of Astronomy, UC Berkeley. NSF-funded cyberengineer Shaw Dong @UCSC receiving FIONA, Feb 7, 2017. CENIC 2018 Innovations in Networking Award for Research Applications
  9. Game Changer: Using Kubernetes to Manage Containers Across the PRP. “Kubernetes is a way of stitching together a collection of machines into, basically, a big computer.” --Craig McLuckie, Google, now CEO and Founder of Heptio. “Everything at Google runs in a container.” --Joe Beda, Google. “Kubernetes has emerged as the container orchestration engine of choice for many cloud providers including Google, AWS, Rackspace, and Microsoft, and is now being used in HPC and Science DMZs.” --John Graham, Calit2/QI, UC San Diego
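To make the “stitching machines into one big computer” idea concrete, here is a hypothetical minimal Kubernetes Deployment manifest (names and image are illustrative, not from the talk): the operator declares the desired number of replicas, and Kubernetes schedules the containers across whatever cluster nodes are available.

```yaml
# Illustrative manifest; resource names are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prp-worker
spec:
  replicas: 3                 # Kubernetes keeps 3 copies running, cluster-wide
  selector:
    matchLabels:
      app: prp-worker
  template:
    metadata:
      labels:
        app: prp-worker
    spec:
      containers:
      - name: worker
        image: ubuntu:18.04
        command: ["sleep", "infinity"]
```

If a node fails, the scheduler recreates the lost replicas elsewhere, which is what makes the abstraction attractive for a distributed testbed like the PRP.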
  10. Rook Is Cloud-Native Ceph Object Storage ‘Inside’ Kubernetes. https://rook.io/ Source: John Graham, Calit2/QI
  11. Running Kubernetes/Rook/Ceph on PRP Allows Us to Deploy a Distributed PB+ of Storage for Posting Science Data. Rook/Ceph provides block/object/file storage, with a Swift API compatible with SDSC, AWS, and Rackspace, on Kubernetes/CentOS 7. [Diagram: FIONA8 nodes and storage at Calit2 (100G Gold, sdx-controller, controller-0), SDSC, SDSU (100G Gold NVMe), Caltech (100G NVMe 6.4T), UCAR, UCI, UCR, USC, UCLA, Stanford, UCSB (100G NVMe 6.4T), UCSC, and Hawaii, connected by 40G/100G links with 160TB or 6.4TB NVMe per site. John Graham, UCSD, March 2018]
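As a sketch of how such a cluster is declared, this is roughly the shape of a minimal Rook `CephCluster` custom resource; exact field names vary across Rook versions, so treat this as an assumption-laden illustration rather than a deployable manifest.

```yaml
# Approximate Rook v1 CephCluster shape; version-dependent, illustrative only.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v13      # Ceph container image to run
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3                  # three monitors for quorum
  storage:
    useAllNodes: true         # let Rook claim storage on every node
    useAllDevices: true
```

The point of the CRD approach is that the same declarative object, applied to the PRP-wide Kubernetes control plane, drives Ceph daemons onto FIONAs at every participating campus.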
  12. The Rise of Brain-Inspired Computers: Left & Right Brain Computing: Arithmetic vs. Pattern Recognition. Adapted from D-Wave
  13. New NSF CHASE-CI Grant Creates a Community Cyberinfrastructure: Adding a Machine Learning Layer Built on Top of the Pacific Research Platform. NSF grant for a high-speed “cloud” of 256 GPUs for 30 ML faculty and their students at 10 campuses (Caltech, UCB, UCI, UCR, UCSD, UCSC, Stanford, MSU, UCM, SDSU) for training AI algorithms on big data. NSF Program Officer: Mimi McClure
  14. FIONA8: Adding GPUs to FIONAs Supports Data Science Machine Learning. Multi-tenant containerized GPU JupyterHub running Kubernetes/CoreOS; eight Nvidia GTX-1080 Ti GPUs, 32GB RAM, 3TB SSD, 40G and dual 10G ports. Source: John Graham, Calit2
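Multi-tenant GPU sharing on such a node works through Kubernetes resource requests. A hypothetical pod spec (names are illustrative; it assumes the NVIDIA device plugin is installed, which advertises the `nvidia.com/gpu` resource) might look like:

```yaml
# Illustrative pod requesting 2 of a FIONA8's GPUs; requires the
# NVIDIA device plugin so nodes advertise nvidia.com/gpu.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-notebook
spec:
  containers:
  - name: notebook
    image: jupyter/tensorflow-notebook
    resources:
      limits:
        nvidia.com/gpu: 2    # scheduler places this pod on a node with 2 free GPUs
```

Each user's notebook pod gets exclusive use of its requested GPUs, which is how one eight-GPU box serves several students at once.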
  15. UCSD Adding >350 Game GPUs to Data Sciences Cyberinfrastructure, Devoted to Data Analytics and Machine Learning: SunCAVE (70 GPUs), WAVE + Vroom (48 GPUs), and FIONAs with 8 game GPUs each (48 GPUs for OSG applications, 95 GPUs for students). The CHASE-CI grant provides 96 GPUs at UCSD for training AI algorithms on big data, plus 288 64-bit GPUs on SDSC’s Comet
  16. Next Step: Surrounding the PRP Machine Learning Platform With Clouds of GPUs and Non-von Neumann Processors. Microsoft installs Altera FPGAs into Bing servers and 384 into TACC for academic access; CHASE-CI 64-TrueNorth cluster; 64-bit GPUs: 4352x NVIDIA Tesla V100
  17. The Future of Supercomputing Will Blend Traditional HPC and Data Analytics, Integrating Non-von Neumann Architectures. “High Performance Computing Will Evolve Towards a Hybrid Model, Integrating Emerging Non-von Neumann Architectures, with Huge Potential in Pattern Recognition, Streaming Data Analysis, and Unpredictable New Applications.” Horst Simon, Deputy Director, U.S. Department of Energy’s Lawrence Berkeley National Laboratory
  18. Calit2’s Qualcomm Institute Has Established a Pattern Recognition Lab for Machine Learning on GPUs and von Neumann and NvN Processors. UCSD ECE Professor Ken Kreutz-Delgado brought the IBM TrueNorth chip to start Calit2’s Qualcomm Institute Pattern Recognition Laboratory, September 16, 2015. Source: Dr. Dharmendra Modha, Founding Director, IBM Cognitive Computing Group, August 8, 2014
  19. Calit2/QI Pattern Recognition Laboratory (PRLab). Ken Kreutz-Delgado, Director, Calit2/QI Pattern Recognition Laboratory; Professor of Electrical & Computer Engineering, Irwin & Joan Jacobs School of Engineering, University of California, San Diego
  20. Pattern Recognition Lab (PRLab) – A Nexus for a Community of Researchers and Practitioners in Theory and Applications of Pattern Recognition & Machine Learning – All Disciplines and Application Areas (Medicine, Education, Finance, Economics, Science, Engineering, Art…) Can Be Involved – Computing “On-The-Edge-of-The-Edge”: Real-Time, Local, Fast and Robust Processing for Critical Control and Decision Making (e.g., Robotic Surgical Assistance, Autonomous Aircraft)
  21. The PRLab Community I. Calit2 Technical Leadership: Tom DeFanti, Engineering Systems Scientist; Srinjoy Das, Principal Chip Algorithms Designer; John Graham, Systems Development & Integration; Joe Keefe, Systems Integration. Regional UC Campuses
  22. The PRLab Community II: UC, Calit2, CENIC/PRP/CHASE-CI. All Networks Networked
  23. Mapping Machine Learning Algorithm Families onto Novel Architectures for Real-Time On-the-Edge Embedded Computing • Deep & Recurrent Neural Networks (DNN, RNN) • Graph Theoretic Approaches (Bayes Nets, Markov Random Fields) • Reinforcement Learning and Control (RL) • Markov Decision Processes; Time Series Analysis • Clustering and Other Neighborhood Approaches • Support Vector Machines (SVM) • Sparse Signal Processing, Source Localization & Compressive Sensing • Stochastic Sampling & Variational Approximation for Bayesian Reasoning • Dimensionality Reduction & Manifold Learning • Ensemble Learning (Boosting, Bagging) • Latent Variable Analysis (PCA, ICA)
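Not part of the talk: as a tiny worked instance of one family from the list, dimensionality reduction via PCA can be sketched in a few lines of numpy, computing principal components from the SVD of centered data (function and variable names are mine).

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                        # center each feature
    # Rows of Vt are principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # scores in the top-k subspace

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```

The first projected coordinate carries at least as much variance as the second, since singular values are returned in descending order.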
  24. Example Hard Problem: Real-Time EEG-Based BCI. Research performed by grad students Jason Palmer, Nima Bigdely-Shamlo, Ozgur Balkan, Luca Pion-Tonachini, Alejandro Pineda, Ramon Martinez, and Ching-fu Chen, in collaboration with Dr. Scott Makeig, Director, SCCN. Localize and isolate dynamically changing brain sources in real time. Image source: emotiv.com
  25. Computing on the Edge-of-the-Edge – New Computational Paradigms are Needed for Real-Time Pattern Recognition and Machine Learning Algorithms – Exploit and Enhance the Performance of: – Advanced SOC Mobile Device Processors (e.g., Qualcomm Snapdragon) – Non-von Neumann (NvN) Processors, Including: – Field Programmable Gate Arrays (FPGAs) – Digital Neuromorphic Processors (e.g., IBM TrueNorth)
  26. Qualcomm Institute Brain-Inspired Computation • Straightforward Extrapolation Results in a Real-Time Human-Brain-Scale Simulation at 1–10 Exaflop/s with 4 PB of Memory • A Digital Computer with this Performance Might be Available in 2022–2024 with a Power Consumption of >20–30 MW • The Human Brain Runs on 20 W • Our Brain is a Million Times More Power Efficient! Horst Simon, Deputy Director, Lawrence Berkeley National Laboratory
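The "million times" claim on the slide is simple arithmetic, shown here explicitly (the 20 MW and 20 W figures are from the slide; the flops-per-watt line is my own illustrative extension assuming 1 exaflop/s at 20 MW):

```python
# Back-of-envelope power comparison from the slide.
machine_power_w = 20e6          # ~20 MW projected exascale machine
brain_power_w = 20.0            # human brain runs on ~20 W
efficiency_ratio = machine_power_w / brain_power_w
print(f"brain is {efficiency_ratio:,.0f}x more power efficient")  # 1,000,000x

# Illustrative: 1 exaflop/s at 20 MW is 50 GFLOP/s per watt
flops_per_watt = 1e18 / machine_power_w
print(f"{flops_per_watt:.0e} FLOP/s per watt")
```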
  27. Brain-Inspired Computing (13 May 2016). A Heterogeneous Mix of Cybercores: Deep Networks, Support Vector Machines, Stochastic Sampling, Spike-Based (Snapdragon), and Other Canonical Families of Pattern Recognition Algorithms
  28. Pushing the NvN Envelope: Deep Generative Neural Networks for Real-Time Embedded Hardware-Based IoT Applications. Approximate and Efficient Arithmetic (Adders, Comparators, Multipliers); Finite Precision; Memory Optimization; Data Flow Optimization. Functional Approximation; Weights and Modes Criticality & Sensitivity Analysis; Sparsity & Pruning; Application- and Processor-Specific Architecture Determination and Optimization. Training and Inference Methodologies; Gibbs Sampling; Variational Approximation; Transfer Learning; Reinforcement Learning. Performance-at-Power and Power-at-Performance Measures; Distributional Similarity Measures; Statistical Hypothesis Testing; Design Optimization Criteria and Tools. Low-Power, Embedded Real-Time Decision Making, Control & Scene/Scenario Generation and Situation Analysis
  29. Brain-Inspired Processors Are Accelerating the Non-von Neumann Architecture Era. “On the drawing board are collections of 64, 256, 1024, and 4096 chips. ‘It’s only limited by money, not imagination,’ Modha says.” Source: Dr. Dharmendra Modha, IBM Chief Scientist for Brain-Inspired Computing, August 8, 2014
  30. Example: Stochastic Sampling for Deep Learning Algorithms on Analog & Digital (IBM TrueNorth) Neuromorphic Chips. Hierarchical, probabilistic learning and inference (Restricted Boltzmann Machines, Deep Belief Networks) on massively parallel computational substrates (“brain-like” VLSI platforms), compared with conventional von Neumann hardware: analog neurons (UCSD IFAT; neural sampling with event-driven contrastive divergence for RBM learning/inference, Neftci et al., Frontiers in Neuroscience 2014) and digital neurons (IBM TrueNorth; digital Gibbs sampling for RBM/DBN inference, Das et al., ISCAS 2015). MNIST digit recognition, completion, and generation on low-power neuromorphic hardware
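Not the authors' implementation: a minimal numpy sketch of the block Gibbs sampling at the heart of RBM inference, the operation the slide maps onto neuromorphic substrates. Weights here are random, so the samples are from an untrained model; sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c):
    """One block-Gibbs sweep of a Bernoulli RBM: sample the hidden layer
    given the visible layer, then the visible layer given the hidden."""
    p_h = sigmoid(v @ W + c)                       # P(h=1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b)                     # P(v=1 | h)
    v = (rng.random(p_v.shape) < p_v).astype(float)
    return v, h

n_vis, n_hid = 6, 4
W = rng.normal(scale=0.1, size=(n_vis, n_hid))     # untrained random weights
b = np.zeros(n_vis)                                # visible biases
c = np.zeros(n_hid)                                # hidden biases
v = rng.integers(0, 2, size=n_vis).astype(float)   # random initial state
for _ in range(100):                               # let the chain mix
    v, h = gibbs_step(v, W, b, c)
print(v)
```

Each sweep is embarrassingly parallel across units, which is exactly the structure that makes spiking/event-driven hardware a natural substrate for it.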
  31. TrueNorth Chip is on the PRP • Our IBM TrueNorth Platform is Available for Use by Anyone on the PRP • We Encourage All Who are Interested, Particularly Students, to Do Similar Research • Last Summer: – Three MS Students Remotely Accessed Our TrueNorth Chip From Berkeley – Four UCSD Students Learned How to Program the TrueNorth Chip
  32. Current Focus on FPGA Applications • Application: Real-Time, Low-Power Inference in Restricted Boltzmann Machines (RBM), Deep Belief Networks (DBN) and Generative Adversarial Networks (GAN) Using FPGAs • Current Shortcomings of the Neuromorphic Approach: – Analog Platforms (Homeostasis Necessary) – Digital Platforms Like TrueNorth Require Algorithm Conversion to a “Spiking” Version for Maximum Effectiveness (Because Rate-Based Coding is Inefficient) • FPGA: A Very Flexible NvN Platform – Rapid Prototyping Enabling Algorithm Exploration – If Necessary, Mapping to ASIC is Possible
  33. Accelerator for Deconvolutional Neural Network
  34. Can Optimize Power/Performance Trade-Off. Synthetically Generated Faces: Lower Bitwidth = Less Power; Higher Bitwidth = More Realistic; Metric Says Use 12 Bits
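Not from the slides: the bitwidth trade-off can be illustrated with a simple uniform quantizer, where shrinking the bitwidth (cheaper hardware, less power) increases reconstruction error (less realistic output). This sketch uses symmetric fixed-point quantization on random weights; the 12-bit choice on the slide came from a distributional-similarity metric, which this toy RMS error only stands in for.

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization of weights to a given bitwidth."""
    levels = 2 ** (bits - 1) - 1            # e.g. 7 levels per sign at 4 bits
    scale = np.max(np.abs(w)) / levels      # step size covering the full range
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000)                   # stand-in for a layer's weights
errs = {}
for bits in (4, 8, 12):
    q = quantize(w, bits)
    errs[bits] = float(np.sqrt(np.mean((w - q) ** 2)))  # RMS quantization error
    print(bits, "bits ->", errs[bits])
```

Error falls roughly by 2x per extra bit, so a metric over output quality versus power can pick an operating point such as 12 bits.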
  35. Google Released Its AI Software as Open Source, Accelerating Development (November 9, 2015). From programming computers step by step to achieve a goal, to showing the computer some examples of what you want it to achieve and then letting the computer figure it out on its own. --Jeremy Howard, Singularity Univ., 2015. https://exponential.singularityu.org/medicine/big-data-machine-learning-with-jeremy-howard/
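Howard's "show the computer examples" shift can be made concrete in a few lines: instead of hand-coding the rule y = 2x + 1, we hand the machine example pairs and let gradient descent recover the parameters. A minimal numpy sketch (not tied to any particular framework):

```python
import numpy as np

# Examples generated by the hidden rule y = 2x + 1; the learner never sees the rule.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + 1.0

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = w * x + b
    grad_w = np.mean(2 * (pred - y) * x)   # d(mean squared error)/dw
    grad_b = np.mean(2 * (pred - y))       # d(mean squared error)/db
    w -= lr * grad_w
    b -= lr * grad_b
print(round(w, 2), round(b, 2))            # approaches 2.0 and 1.0
```

Deep learning frameworks like the one Google open-sourced automate exactly this loop (differentiation, updates) at scale.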
  36. Google Designed an NvN Machine Learning Accelerator; Calit2 is Negotiating Access for CHASE-CI
  37. Join the Fun – and Do Good Science! • The PRLab is a Nexus for Pattern Recognition, Machine Learning, and Neural Networks That Arise in Any Domain – From Medicine to the Arts • As We Expand Our Suite of Processors, Opportunities for Students and Others To Do Important Research and Development Will Expand • The Applications are Important and Will Become Even More So… This is the Future!
  38. Our Support: • US National Science Foundation (NSF) awards CNS-0821155, CNS-1338192, CNS-1456638, CNS-1730158, ACI-1540112, & ACI-1541349 • University of California Office of the President CIO • UCSD Chancellor’s Integrated Digital Infrastructure Program • UCSD Next Generation Networking Initiative • Calit2 and Calit2 Qualcomm Institute • CENIC, Pacific Wave and StarLight • DOE ESnet
