SlideShare a Scribd company logo
1 of 25
Download to read offline
Storage and Data
Challenges for Production
Machine Learning
Nisha Talagala
CEO, Pyxeda AI
Machine Learning Growth
Data: Sources
and Storage
Compute:
Cloud, Hardware
Innovation
Algorithms and
Open Source
Growth of AI/ML
technologies/products
Each logo is a (separate) service offered by GCP, AWS or Azure for part of an AI workflow
Realities of Production
Use
https://www.oreilly.com/library/view/the-new-artificial/9781492048978/
https://emerj.com/ai-sector-overviews/valuing-the-artificial-intelligence-market-graphs-and-predictions/
Despite the advanced services available, AI usage still minimal
In This Talk:
• AI and ML: A quick overview
• Trends as relevant for Storage
• Workloads
• Trust, Governance and Data Management
• Edge
• The users
What is Machine Learning and AI?
• AI: Natural Language Processing, Image
Recognition, Anomaly Detection, etc.
• Machine Learning: Supervised,
Unsupervised, Reinforcement, Transfer, etc.
• Deep Learning: CNNs, RNNs etc.
• Common Threads
• Training
• Inference (aka Scoring, Model Serving,
Prediction)
AI
Machine
Learning
Deep
Learning
A typical flow
• Use case definition
• Data prep
• Modeling
• Training
• Deploy
• Integrate
• Monitor/Optimize
• Iterate
Data
Train
Model(s)
Develop
Model(s)
Test
Model(s)
Deploy
Model(s)
Connect
to
Business
app
App developers
Data Scientists
ML Engineers
Operations
Business
Need
Monitor
and
Optimize
A Typical ML Operational Pipeline
Data
Data Cleaning
Feature Eng
Model
Training
Model
Validation
Model
Prediction
Feature
Eng
Live
DataBusiness
Application
Model
Prediction
Training
Inference
Trend 1: How ML/DL Workloads Think About Data
• Data Sizes
• Incoming datasets can range from MB to TB
• Statistical ML Models are typically small. Largest models tend to be in deep neural
networks (DL) and range from 10s MB to GBs
• Common Structured Data Types
• Time series and Streams
• Multi-dimensional Arrays, Matrices and Vectors
• Common distributed patterns
• Data Parallel, periodic synchronization
• Model Parallel
• Straggler performance issues can be significant
Trend 1: How ML/DL Workloads Think About Data
• The older data gets – the more its “role” changes
• Older data for batch- historical analytics and model reboots
• Used for model training (sort of), not for inference
• Guarantees can be “flexible” on older data
• Availability can be reduced (most algorithms can deal with some data loss)
• A few data corruptions don’t really hurt J
• Data is evaluated in aggregate and algorithms are tolerant of outliers
• Holes are a fact of real life data – algorithms deal with it
• Quality of service exists but is different
• Random access is very rare
• Heavily patterned access (most operations are some form of array/matrix)
• Shuffle phase in some analytic engines
• Publicized “mistakes” that
damage corporate brands and
generate business risk
• Example Racism in Microsoft
Tay bot and Bias in Amazon HR
hiring tool
• Intersection of AI decisions and
human social values
AI Trust
Pillars for AI Trust
• Together ensure that the ML is
operating correctly and free
from intrusion
• Details about how and why
predictions and made
• Reproduce cases if needed
What does this mean for data?
Data
Data Cleaning
Feature Eng
Model
Training
Model
Validation
Model
Prediction
Feature
Eng
Live
DataBusiness
Application
Model
Prediction
Training
Inference
D
A
T
A
N
E
W
D
AT
A
N
E
W
D
AT
A
N
E
W
D
AT
A
N
E
W
D
AT
A
D
A
T
A
Access control, Lineage, Tracking of all data artifacts is critical for AI Trust
Trend 2: Need for Governance
• ML is only as good as its data
• Managing ML requires understanding data provenance
• How was it created? Where did it come from? When was it valid?
• Who can access it? (all or subsets)? Which features were used for what?
• How was it transformed?
• What ML was it used for and when?
• Solutions require both storage management and ML management
Trend 2: Need for Governance
• Examples
• Established: Example: Model Risk Management in Financial Services
• https://www.federalreserve.gov/supervisionreg/srletters/sr1107a1.pdf
• Example GDPR/CCPA on Data, Reproducing and Explaining ML
Decisions
• https://iapp.org/news/a/is-there-a-right-to-explanation-for-machine-learning-in-
the-gdpr/
• Example: New York City Algorithm Fairness Monitoring
• https://techcrunch.com/2017/12/12/new-york-city-moves-to-establish-
algorithm-monitoring-task-force/
Trend 3: The Growing Role of the Edge
• Closest to data ingest, lowest latency.
• Benefits to real time ML inference and
(maybe later) training
• Varied hardware architectures and
resource constraints
• Differs from geographically distributed
data center architecture
• Creates need for cross cloud/edge data
storage and management strategies IoT Reference Model
Trend 4: The Changing Role of Persistence
• For ML functions, most computations today are in-memory
• Data load and store are primary storage interaction
• Intermediate data storage sometimes used
• Tiered memory can be used within engines
• For in-memory databases, persistence is part of the core engine
• Log based persistence is common
• Loading & cleaning of data is still a very large fraction of the
pipeline time
• Most of this involves manipulating stored data
Trend 5: Who accesses the data
• Multiple ML roles interact with data
• Data Scientist
• Decision Scientist, Decision Intelligence
• Data Engineer / ML Engineer
• ML roles need to collaborate with Operations roles for successful
Operational ML.
• Requires data access controls, access management to ensure ML
consistency and governance
Storage for ML: Challenges and Opportunities
• Data access Speeds (Particularly for Deep Learning Workloads)
• Data Management
• Reproducibility and Lineage
• Governance and the Challenges of Regulation, Data Access Control
and Access Management
• The Edge
• The new data managers
Storage for ML: Example systems
• Databricks Delta
• Apache Atlas
• RDMA data acceleration for Deep Learning (Ex. from Mellanox)
• Time series optimized databases (Ex. BTrDB, GorrillaDB)
• API pushdown techniques and Native RDD Access APIs (Ex. Iguaz.io)
• Lineage: Link data and compute history (Ex. Alluxio/formerly Tachyon)
• Memory expansion (Ex. Many studies on DRAM/Persistent Memory/Flash
tiering for analytics)
Takeaways
• The use of ML/DL in enterprise is at its infancy
• The first and most obvious storage challenge is performance
• The larger challenge is likely data management and governance
• Edge and distribution are also emerging challenges
• Opportunities exist to significantly improve storage and memory for
these use cases
Additional Resources
• NFS Vision report on Storage for 2025
• See Storage and AI track
• Proceedings/Slides of USENIX OpML 2019
• Research at HotStorage, HotEdge, FAST, USENIX ATC
Thank You
Nisha Talagala
nisha@pyxeda.ai
Data
Data Repositories SQL
Data
Data Streams NoSQL
A Sample Analytics Stack: (Partial) Ecosystem
Data from
Repositories or
Live Streams
Flink / Apex
Spark Streaming
Storm / Samza / NiFi
Caffe
Tensor Flow
Pytorch
Hadoop
Spark
Tensor Flow
SparkML, TensorFlow
Processing
Engines
Algorithms
and Libraries
Containerized
Models (Python
etc.)
Edited version of slide from Balint Fleischer’s talk:
Flash Memory Summit 2016, Santa Clara, CA
CL
Teaching Assistants
Elderly Companions
Service Robots
Personal Social Robots
Smart Cities
Robot Drones
Smart Homes
Intelligent Vehicles
Personal Assistants (bots) Smart Enterprise
X
Growing Sources of Data
Edge Cloud

More Related Content

What's hot

Dive into H2O: NYC
Dive into H2O: NYCDive into H2O: NYC
Dive into H2O: NYCSri Ambati
 
IBM Big Data Analytics Concepts and Use Cases
IBM Big Data Analytics Concepts and Use CasesIBM Big Data Analytics Concepts and Use Cases
IBM Big Data Analytics Concepts and Use CasesTony Pearson
 
IoT meets AI in the Clouds
IoT meets AI in the CloudsIoT meets AI in the Clouds
IoT meets AI in the CloudsDr. Mirko Kämpf
 
Executive Briefing: What Is Fast Data And Why Is It Important
Executive Briefing: What Is Fast Data And Why Is It ImportantExecutive Briefing: What Is Fast Data And Why Is It Important
Executive Briefing: What Is Fast Data And Why Is It ImportantLightbend
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Itai Yaffe
 
Accelerating the ML Lifecycle with an Enterprise-Grade Feature Store
Accelerating the ML Lifecycle with an Enterprise-Grade Feature StoreAccelerating the ML Lifecycle with an Enterprise-Grade Feature Store
Accelerating the ML Lifecycle with an Enterprise-Grade Feature StoreDatabricks
 
Introduction to NetGuardians' Big Data Software Stack
Introduction to NetGuardians' Big Data Software StackIntroduction to NetGuardians' Big Data Software Stack
Introduction to NetGuardians' Big Data Software StackJérôme Kehrli
 
Deep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the EnterpriseDeep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the EnterpriseGanesan Narayanasamy
 
Near realtime AI deployment with huge data and super low latency - Levi Brack...
Near realtime AI deployment with huge data and super low latency - Levi Brack...Near realtime AI deployment with huge data and super low latency - Levi Brack...
Near realtime AI deployment with huge data and super low latency - Levi Brack...Sri Ambati
 
Machine Learning Operations (MLOps) - Active Failures and Latent Conditions
Machine Learning Operations (MLOps) - Active Failures and Latent ConditionsMachine Learning Operations (MLOps) - Active Failures and Latent Conditions
Machine Learning Operations (MLOps) - Active Failures and Latent ConditionsFlavio Clesio
 
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughtonReal-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughtonSynerzip
 
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...Databricks
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Productioniguazio
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_futureNisha Talagala
 
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...Sri Ambati
 
ML-Ops: From Proof-of-Concept to Production Application
ML-Ops: From Proof-of-Concept to Production ApplicationML-Ops: From Proof-of-Concept to Production Application
ML-Ops: From Proof-of-Concept to Production ApplicationHunter Carlisle
 
Introducción al Machine Learning Automático
Introducción al Machine Learning AutomáticoIntroducción al Machine Learning Automático
Introducción al Machine Learning AutomáticoSri Ambati
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 

What's hot (20)

Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 
Dive into H2O: NYC
Dive into H2O: NYCDive into H2O: NYC
Dive into H2O: NYC
 
IBM Big Data Analytics Concepts and Use Cases
IBM Big Data Analytics Concepts and Use CasesIBM Big Data Analytics Concepts and Use Cases
IBM Big Data Analytics Concepts and Use Cases
 
IoT meets AI in the Clouds
IoT meets AI in the CloudsIoT meets AI in the Clouds
IoT meets AI in the Clouds
 
Executive Briefing: What Is Fast Data And Why Is It Important
Executive Briefing: What Is Fast Data And Why Is It ImportantExecutive Briefing: What Is Fast Data And Why Is It Important
Executive Briefing: What Is Fast Data And Why Is It Important
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?
 
Accelerating the ML Lifecycle with an Enterprise-Grade Feature Store
Accelerating the ML Lifecycle with an Enterprise-Grade Feature StoreAccelerating the ML Lifecycle with an Enterprise-Grade Feature Store
Accelerating the ML Lifecycle with an Enterprise-Grade Feature Store
 
Introduction to NetGuardians' Big Data Software Stack
Introduction to NetGuardians' Big Data Software StackIntroduction to NetGuardians' Big Data Software Stack
Introduction to NetGuardians' Big Data Software Stack
 
Deep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the EnterpriseDeep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the Enterprise
 
Near realtime AI deployment with huge data and super low latency - Levi Brack...
Near realtime AI deployment with huge data and super low latency - Levi Brack...Near realtime AI deployment with huge data and super low latency - Levi Brack...
Near realtime AI deployment with huge data and super low latency - Levi Brack...
 
Machine Learning Operations (MLOps) - Active Failures and Latent Conditions
Machine Learning Operations (MLOps) - Active Failures and Latent ConditionsMachine Learning Operations (MLOps) - Active Failures and Latent Conditions
Machine Learning Operations (MLOps) - Active Failures and Latent Conditions
 
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughtonReal-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
 
AI in the Enterprise at Scale
AI in the Enterprise at ScaleAI in the Enterprise at Scale
AI in the Enterprise at Scale
 
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_future
 
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
 
ML-Ops: From Proof-of-Concept to Production Application
ML-Ops: From Proof-of-Concept to Production ApplicationML-Ops: From Proof-of-Concept to Production Application
ML-Ops: From Proof-of-Concept to Production Application
 
Introducción al Machine Learning Automático
Introducción al Machine Learning AutomáticoIntroducción al Machine Learning Automático
Introducción al Machine Learning Automático
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 

Similar to Msst 2019 v4

Fms invited talk_2018 v5
Fms invited talk_2018 v5Fms invited talk_2018 v5
Fms invited talk_2018 v5Nisha Talagala
 
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsTraditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsGanesan Narayanasamy
 
Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016Nisha Talagala
 
Building a Data Driven Culture and AI Revolution With Gregory Little | Curren...
Building a Data Driven Culture and AI Revolution With Gregory Little | Curren...Building a Data Driven Culture and AI Revolution With Gregory Little | Curren...
Building a Data Driven Culture and AI Revolution With Gregory Little | Curren...HostedbyConfluent
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSPhilip Filleul
 
Tech essentials for Product managers
Tech essentials for Product managersTech essentials for Product managers
Tech essentials for Product managersNitin T Bhat
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningEdunomica
 
Gse uk-cedrinemadera-2018-shared
Gse uk-cedrinemadera-2018-sharedGse uk-cedrinemadera-2018-shared
Gse uk-cedrinemadera-2018-sharedcedrinemadera
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Got data?… now what? An introduction to modern data platforms
Got data?… now what?  An introduction to modern data platformsGot data?… now what?  An introduction to modern data platforms
Got data?… now what? An introduction to modern data platformsJamesAnderson599331
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Big data and machine learning / Gil Chamiel
Big data and machine learning / Gil Chamiel   Big data and machine learning / Gil Chamiel
Big data and machine learning / Gil Chamiel geektimecoil
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarioskcmallu
 
EVAIN Artificial intelligence and semantic annotation: are you serious about it?
EVAIN Artificial intelligence and semantic annotation: are you serious about it?EVAIN Artificial intelligence and semantic annotation: are you serious about it?
EVAIN Artificial intelligence and semantic annotation: are you serious about it?FIAT/IFTA
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning ModelsTash Bickley
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...Databricks
 
Ideas spracklen-final
Ideas spracklen-finalIdeas spracklen-final
Ideas spracklen-finalsupportlogic
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine LearningYuriy Guts
 

Similar to Msst 2019 v4 (20)

Fms invited talk_2018 v5
Fms invited talk_2018 v5Fms invited talk_2018 v5
Fms invited talk_2018 v5
 
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsTraditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
 
Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016
 
Building a Data Driven Culture and AI Revolution With Gregory Little | Curren...
Building a Data Driven Culture and AI Revolution With Gregory Little | Curren...Building a Data Driven Culture and AI Revolution With Gregory Little | Curren...
Building a Data Driven Culture and AI Revolution With Gregory Little | Curren...
 
Global ai conf_final
Global ai conf_finalGlobal ai conf_final
Global ai conf_final
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Tech essentials for Product managers
Tech essentials for Product managersTech essentials for Product managers
Tech essentials for Product managers
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
 
Gse uk-cedrinemadera-2018-shared
Gse uk-cedrinemadera-2018-sharedGse uk-cedrinemadera-2018-shared
Gse uk-cedrinemadera-2018-shared
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Got data?… now what? An introduction to modern data platforms
Got data?… now what?  An introduction to modern data platformsGot data?… now what?  An introduction to modern data platforms
Got data?… now what? An introduction to modern data platforms
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Big data and machine learning / Gil Chamiel
Big data and machine learning / Gil Chamiel   Big data and machine learning / Gil Chamiel
Big data and machine learning / Gil Chamiel
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
EVAIN Artificial intelligence and semantic annotation: are you serious about it?
EVAIN Artificial intelligence and semantic annotation: are you serious about it?EVAIN Artificial intelligence and semantic annotation: are you serious about it?
EVAIN Artificial intelligence and semantic annotation: are you serious about it?
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
IT webinar 2016
IT webinar 2016IT webinar 2016
IT webinar 2016
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
 
Ideas spracklen-final
Ideas spracklen-finalIdeas spracklen-final
Ideas spracklen-final
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 

Recently uploaded

Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Msst 2019 v4

  • 1. Storage and Data Challenges for Production Machine Learning Nisha Talagala CEO, Pyxeda AI
  • 2. Machine Learning Growth Data: Sources and Storage Compute: Cloud, Hardware Innovation Algorithms and Open Source
  • 3. Growth of AI/ML technologies/products Each logo is a (separate) service offered by GCP, AWS or Azure for part of an AI workflow
  • 5. In This Talk: • AI and ML: A quick overview • Trends as relevant for Storage • Workloads • Trust, Governance and Data Management • Edge • The users
  • 6. What is Machine Learning and AI? • AI: Natural Language Processing, Image Recognition, Anomaly Detection, etc. • Machine Learning: Supervised, Unsupervised, Reinforcement, Transfer, etc. • Deep Learning: CNNs, RNNs etc. • Common Threads • Training • Inference (aka Scoring, Model Serving, Prediction) AI Machine Learning Deep Learning
  • 7. A typical flow • Use case definition • Data prep • Modeling • Training • Deploy • Integrate • Monitor/Optimize • Iterate Data Train Model(s) Develop Model(s) Test Model(s) Deploy Model(s) Connect to Business app App developers Data Scientists ML Engineers Operations Business Need Monitor and Optimize
  • 8. A Typical ML Operational Pipeline Data Data Cleaning Feature Eng Model Training Model Validation Model Prediction Feature Eng Live DataBusiness Application Model Prediction Training Inference
  • 9. Trend 1: How ML/DL Workloads Think About Data • Data Sizes • Incoming datasets can range from MB to TB • Statistical ML Models are typically small. Largest models tend to be in deep neural networks (DL) and range from 10s MB to GBs • Common Structured Data Types • Time series and Streams • Multi-dimensional Arrays, Matrices and Vectors • Common distributed patterns • Data Parallel, periodic synchronization • Model Parallel • Straggler performance issues can be significant
  • 10. Trend 1: How ML/DL Workloads Think About Data • The older data gets – the more its “role” changes • Older data for batch- historical analytics and model reboots • Used for model training (sort of), not for inference • Guarantees can be “flexible” on older data • Availability can be reduced (most algorithms can deal with some data loss) • A few data corruptions don’t really hurt J • Data is evaluated in aggregate and algorithms are tolerant of outliers • Holes are a fact of real life data – algorithms deal with it • Quality of service exists but is different • Random access is very rare • Heavily patterned access (most operations are some form of array/matrix) • Shuffle phase in some analytic engines
  • 11. • Publicized “mistakes” that damage corporate brands and generate business risk • Example Racism in Microsoft Tay bot and Bias in Amazon HR hiring tool • Intersection of AI decisions and human social values AI Trust
  • 12. Pillars for AI Trust • Together ensure that the ML is operating correctly and free from intrusion • Details about how and why predictions and made • Reproduce cases if needed
  • 13. What does this mean for data? Data Data Cleaning Feature Eng Model Training Model Validation Model Prediction Feature Eng Live DataBusiness Application Model Prediction Training Inference D A T A N E W D AT A N E W D AT A N E W D AT A N E W D AT A D A T A Access control, Lineage, Tracking of all data artifacts is critical for AI Trust
  • 14. Trend 2: Need for Governance • ML is only as good as its data • Managing ML requires understanding data provenance • How was it created? Where did it come from? When was it valid? • Who can access it? (all or subsets)? Which features were used for what? • How was it transformed? • What ML was it used for and when? • Solutions require both storage management and ML management
  • 15. Trend 2: Need for Governance • Examples • Established: Example: Model Risk Management in Financial Services • https://www.federalreserve.gov/supervisionreg/srletters/sr1107a1.pdf • Example GDPR/CCPA on Data, Reproducing and Explaining ML Decisions • https://iapp.org/news/a/is-there-a-right-to-explanation-for-machine-learning-in- the-gdpr/ • Example: New York City Algorithm Fairness Monitoring • https://techcrunch.com/2017/12/12/new-york-city-moves-to-establish- algorithm-monitoring-task-force/
  • 16. Trend 3: The Growing Role of the Edge • Closest to data ingest, lowest latency. • Benefits to real time ML inference and (maybe later) training • Varied hardware architectures and resource constraints • Differs from geographically distributed data center architecture • Creates need for cross cloud/edge data storage and management strategies IoT Reference Model
  • 17. Trend 4: The Changing Role of Persistence • For ML functions, most computations today are in-memory • Data load and store are primary storage interaction • Intermediate data storage sometimes used • Tiered memory can be used within engines • For in-memory databases, persistence is part of the core engine • Log based persistence is common • Loading & cleaning of data is still a very large fraction of the pipeline time • Most of this involves manipulating stored data
  • 18. Trend 5: Who accesses the data • Multiple ML roles interact with data • Data Scientist • Decision Scientist, Decision Intelligence • Data Engineer / ML Engineer • ML roles need to collaborate with Operations roles for successful Operational ML. • Requires data access controls, access management to ensure ML consistency and governance
  • 19. Storage for ML: Challenges and Opportunities • Data access Speeds (Particularly for Deep Learning Workloads) • Data Management • Reproducibility and Lineage • Governance and the Challenges of Regulation, Data Access Control and Access Management • The Edge • The new data managers
  • 20. Storage for ML: Example systems • Databricks Delta • Apache Atlas • RDMA data acceleration for Deep Learning (Ex. from Mellanox) • Time series optimized databases (Ex. BTrDB, GorrillaDB) • API pushdown techniques and Native RDD Access APIs (Ex. Iguaz.io) • Lineage: Link data and compute history (Ex. Alluxio/formerly Tachyon) • Memory expansion (Ex. Many studies on DRAM/Persistent Memory/Flash tiering for analytics)
  • 21. Takeaways • The use of ML/DL in enterprise is at its infancy • The first and most obvious storage challenge is performance • The larger challenge is likely data management and governance • Edge and distribution are also emerging challenges • Opportunities exist to significantly improve storage and memory for these use cases
  • 22. Additional Resources • NFS Vision report on Storage for 2025 • See Storage and AI track • Proceedings/Slides of USENIX OpML 2019 • Research at HotStorage, HotEdge, FAST, USENIX ATC
  • 24. Data Data Repositories SQL Data Data Streams NoSQL A Sample Analytics Stack: (Partial) Ecosystem Data from Repositories or Live Streams Flink / Apex Spark Streaming Storm / Samza / NiFi Caffe Tensor Flow Pytorch Hadoop Spark Tensor Flow SparkML, TensorFlow Processing Engines Algorithms and Libraries Containerized Models (Python etc.)
  • 25. Edited version of slide from Balint Fleischer’s talk: Flash Memory Summit 2016, Santa Clara, CA CL Teaching Assistants Elderly Companions Service Robots Personal Social Robots Smart Cities Robot Drones Smart Homes Intelligent Vehicles Personal Assistants (bots) Smart Enterprise X Growing Sources of Data Edge Cloud