Analytics and Machine Learning with
Red Hat Infrastructure
Kyle Bader, Senior Architect
Sean Pryor, AI Developer
Sherard Griffin, Senior Manager, Open Data Hub
BOSTON, 2019
● PROBLEM STATEMENT
○ Multi-tenant data analytics and machine learning
○ Shared data context
○ Sensitive data can’t leave the country, data governance restrictions
● DATA STRUCTURES
○ Shared data context with Ceph
○ Preparing your data
■ Structured data with Hive Metastore*
■ Semi-structured data
■ Data processing jobs
■ Spark
○ AI/ML
■ Features/Labels/other important terms
■ Background on AI and how it works
■ TensorFlow
● DATA PLATFORM ARCHITECTURE
○ Open Data Hub (Spark, Ceph, JupyterHub, TensorFlow)
○ Follow-up slides for them to learn more
■ ISVs
■ ODH
■ Frameworks
■ Other talks, etc.
PROBLEM STATEMENT
ANALYTICS AND ML CHALLENGES
EXPLOSIVE GROWTH
in analytics teams and analytic tools
MULTIPLE TEAMS COMPETING
for use of the same big data resources
CONGESTION
in busy analytic clusters causing frustration and missed SLAs
HADOOP
SPARK
HIVE
PRESTO
IMPALA
KAFKA
NIFI
TENSORFLOW
PYTORCH
OPTIONS TO ADDRESS CHALLENGES
#1 Get a bigger cluster for many teams to share
#2 Give each team its own dedicated cluster, each with copies of PBs of data
#3 Give teams the ability to spin up and spin down clusters that share a common data store
MULTI-WORKLOAD TENANCY
SHARED DATA CONTEXT
HIT SERVICE-LEVEL AGREEMENTS
Give teams their own compute clusters.
ELIMINATE IDLE RESOURCES
By right-sizing de-coupled compute and storage.
BUY 10s OF PBs INSTEAD OF 100s
Share data sets across clusters instead of duplicating them.
INCREASE AGILITY
With spin-up/spin-down clusters.
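The "10s of PBs instead of 100s" claim is simple arithmetic, sketched below. The data set size and team count are hypothetical numbers chosen for illustration, and the sketch ignores replication or erasure-coding overhead inside the store.

```python
def capacity_pb(dataset_pb: float, teams: int, shared: bool) -> float:
    """Capacity needed to make one data set available to `teams` clusters."""
    # A shared store keeps one copy; dedicated clusters keep one copy each.
    copies = 1 if shared else teams
    return dataset_pb * copies


dataset_pb = 10.0  # hypothetical size of the common data set, in PB
teams = 10         # hypothetical number of analytics teams

dedicated = capacity_pb(dataset_pb, teams, shared=False)
shared = capacity_pb(dataset_pb, teams, shared=True)
print(f"dedicated clusters: {dedicated:.0f} PB, shared store: {shared:.0f} PB")
```

With these assumed inputs the duplicated layout needs ten times the capacity of the shared one; the ratio is always the number of teams.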
HYBRID CLOUD ANALYTICS AND ML
OPERATOR FRAMEWORK
Provides a managed-service-like experience
STATEFUL STORAGE SERVICES
Object, block, and file interfaces
DEVICE PLUGIN
GPU acceleration
LOCAL PVS
High performance scratch storage
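As a rough sketch of how a workload might ask for these two capabilities, the snippet below builds a Kubernetes Pod manifest as a Python dictionary. It assumes the NVIDIA device plugin, which advertises GPUs under the `nvidia.com/gpu` resource name; the pod name, image, and claim name are hypothetical placeholders, and the claim is assumed to be bound to a local persistent volume.

```python
import json


def gpu_pod_manifest(name: str, image: str, claim: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that requests GPUs and mounts scratch storage."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",
                "image": image,
                # Extended resource exposed by the (assumed) NVIDIA device plugin.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
                "volumeMounts": [{"name": "scratch", "mountPath": "/scratch"}],
            }],
            # The claim is assumed to be backed by a local persistent volume.
            "volumes": [{
                "name": "scratch",
                "persistentVolumeClaim": {"claimName": claim},
            }],
        },
    }


manifest = gpu_pod_manifest("train-job", "example.com/trainer:latest", "scratch-local")
print(json.dumps(manifest, indent=2))
```

Kubernetes accepts manifests in JSON as well as YAML, so the printed output could be saved to a file and applied as-is once the placeholder names are replaced.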
DATA STRUCTURES
CLEANING AND CONFORMING
SEMI-STRUCTURED DATA
● Sources
○ Stateless applications
○ Sensors
● Common formats
○ CSV, JSON, XML
○ ORC, Avro, Parquet
DATA PROCESSING
● Variety of sources and formats
● Schema detection
● Distributed streaming and batch ETL
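Schema detection can be illustrated with a minimal sketch that infers a column type from sample values. This is not how Spark or any particular engine implements it; the sample data and function names are hypothetical, and only three types are considered.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of int, float, string that fits every value."""
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "string"


def detect_schema(csv_text: str) -> dict:
    """Map each column of a CSV sample (with a header row) to an inferred type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {column: infer_type([row[column] for row in rows]) for column in rows[0]}


sample = "sensor_id,temperature,site\n1,21.5,boston\n2,19.0,raleigh\n"
print(detect_schema(sample))
```

The sketch assumes at least one data row and no missing values; a production ETL job would also sample large inputs rather than read them whole.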
STRUCTURED DATA
● Cataloged into databases and tables
● External locations map to object URIs
● Table and column statistics
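The three points above can be pictured with a toy in-memory catalog. It is only a sketch of the idea, not the Hive Metastore API; the database, table, bucket, and row count are hypothetical, and the `s3a://` scheme is assumed as the way the object store is addressed.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Table:
    database: str
    name: str
    location: str   # object URI holding the table's external data
    row_count: int  # example of a table-level statistic


# Hypothetical catalog entry: one table in one database.
CATALOG = {
    ("sales", "orders"): Table("sales", "orders", "s3a://analytics/sales/orders/", 1_000_000),
}


def resolve(database: str, table: str):
    """Return the (bucket, key prefix) that backs a cataloged table."""
    uri = urlparse(CATALOG[(database, table)].location)
    return uri.netloc, uri.path.lstrip("/")


print(resolve("sales", "orders"))
```

Because the catalog stores only a URI, any compute cluster that can reach the object store can read the same table without copying it.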
MODELING AND SERVING
● Select Model
● Select Features
● Model Training
● Model Evaluation
● Model Tuning
● Trained Models
● Model Serving & Scoring
● Frameworks: Keras, Microsoft Cognitive Toolkit, Horovod
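The modeling steps can be walked through with a deliberately tiny example in plain Python. The data is synthetic (labels follow y = 2x + 1), the model is ordinary least squares on a single feature, and the tuning step is omitted; none of this reflects a specific framework from the deck.

```python
# Features (x) and labels (y): synthetic data following y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = data[:7], data[7:]


def fit(points):
    """Model training: ordinary least squares for y = w * x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    w = covariance / variance
    return w, mean_y - w * mean_x


def predict(model, x):
    """Model serving and scoring: apply the trained parameters to new input."""
    w, b = model
    return w * x + b


def mse(model, points):
    """Model evaluation: mean squared error on held-out points."""
    return sum((predict(model, x) - y) ** 2 for x, y in points) / len(points)


model = fit(train)
print("parameters:", model)
print("held-out MSE:", mse(model, test))
print("score for x=20:", predict(model, 20))
```

Holding back the last three points keeps evaluation separate from training, which is the same discipline the larger frameworks enforce with train and test splits.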
DATA PLATFORM
ARCHITECTURE
ARCHITECTURE
OPEN DATA HUB
Collaborate on a Data & AI platform for the Hybrid Cloud
● Open source community for AI-as-a-service platform
● Cloud-agnostic - AI for the Hybrid Cloud
● No cloud vendor lock-in
● OpenDataHub.io
Sentiment analysis and entity detection
on customer engagements, support
tickets, marketing surveys and more.
Trained on the specific Red Hat product
terminology.
AWS, Microsoft Azure, OpenStack, Datacenter, Laptop
CONTAINERIZED APPS
AT RED HAT’S CORE PROCESSES
Internal Use Cases
Improve Red Hat’s core Engineering and
Operations processes by applying
analytics, machine learning, and AI.
- rules
- heuristics
- ML
CORE DEPLOYMENT
● Container platform
● Certified Kubernetes
● Hybrid cloud
● Unified, distributed storage
● RESTful gateway
● S3 and Swift compatible
● Radanalytics.io community
● Unified analytics engine
● Large-scale data
● Runs on Kubernetes
● Multi-user Jupyter
● Used for data science and research
Available Now at OpenDataHub.io
Add-Ons
● Part of Open Data Hub
● Set of deployed pre-defined AI models available to use
● Monitoring and alerting toolkit
● Records numeric time series data
● Used to diagnose problems
● Analytics platform for all metrics
● Query, visualize and alert on metrics
● Deploying machine learning models on Kubernetes
● Expose models via REST and gRPC
● Full model lifecycle management
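To make "expose models via REST" concrete, here is a minimal sketch of the request-handling logic behind a scoring endpoint. It is not the API of any particular serving product; the `instances` and `predictions` field names, the weights, and the function names are assumptions made for illustration.

```python
import json


def score(features):
    """Stand-in model: a fixed linear scorer with hypothetical weights."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))


def handle_predict(body: bytes):
    """Turn a JSON request body into (HTTP status, JSON response body)."""
    try:
        rows = json.loads(body)["instances"]
        predictions = [score(row) for row in rows]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON or an unexpected payload shape is a client error.
        return 400, json.dumps({"error": "expected an 'instances' list"}).encode()
    return 200, json.dumps({"predictions": predictions}).encode()


status, payload = handle_predict(b'{"instances": [[2, 4], [1, 0]]}')
print(status, payload.decode())
```

Keeping the handler a pure function of the request body makes it easy to test and to wrap in whichever HTTP or gRPC server the platform provides.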
Available Now at OpenDataHub.io
Open Data Hub
AI Library
RUNNING AT RED HAT
PLANNED RELEASES
Highlights
January 2019
Version 0.1 - Initial ODH Release
- OCP 3.10 and 3.11 support
- JupyterHub + Spark + Ceph-nano deployment
April 2019
Operator Support + Monitoring
- OCP 4.0+ support
- Open Data Hub operator
- AI Library
- Rook for Ceph deployment
- TwoSigma BeakerX integration
- JupyterHub with GPU support
- Prometheus deployment with Spark monitoring
July 2019
Data Engineering Additions
- Cloudera Hue deployment
- Spark SQL Thrift Server deployment
- Argo deployment
- MLFlow deployment
- Kubeflow integration
- Kafka (Strimzi) deployment
- Seldon-core deployment
October 2019
To be determined
AI AND MACHINE LEARNING
IN THIS LAB
AI IN THIS LAB
WHAT NEXT?
● Try Open Data Hub yourself!
○ https://try.openshift.com
○ https://gitlab.com/opendatahub/opendatahub-operator
● Building the Next Generation of Innovation Together
○ Thursday at 8:30 AM
● Kaleidoscope of Innovation: AI and Machine Learning on OpenShift
○ Part 1: Thursday at 2:00 PM
○ Part 2: Thursday at 3:15 PM
Red Hat data analytics infrastructure solution
red.ht/videos-RHDAIS
MACHINE LEARNING CYCLE
Ingest Prepare Preprocess Discover Develop Train Test Deploy
MKL-DNN
cuDNN

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 

Red Hat infrastructure for analytics

  • 1. Analytics and Machine Learning with Red Hat Infrastructure Kyle Bader, Senior Architect Sean Pryor, AI Developer Sherard Griffin, Senior Manager, Open Data Hub BOSTON, 2019
  • 2. ● PROBLEM STATEMENT ○ Multi-tenant data analytics and machine learning ○ Shared data context ○ Sensitive data can’t leave the country, data governance restrictions ● DATA STRUCTURES ○ Shared data context with Ceph ○ Preparing your data ■ Structured data with Hive Metastore* ■ Semi-structured data ■ Data processing jobs ■ Spark ○ AI/ML ■ Features/Labels/other important terms ■ Background on AI and how it works ■ TensorFlow ● DATA PLATFORM ARCHITECTURE ○ Open Data Hub (Spark, Ceph, JupyterHub, TensorFlow) ○ Follow-up slides for them to learn more ■ ISVs ■ ODH ■ Frameworks ■ Other talks, etc.
  • 4. ANALYTICS AND ML CHALLENGES EXPLOSIVE GROWTH in analytics teams and analytic tools MULTIPLE TEAMS COMPETING for use of the same big data resources CONGESTION in busy analytic clusters causing frustration and missed SLAs HADOOP SPARK HIVE PRESTO IMPALA KAFKA NIFI TENSORFLOW PYTORCH
  • 5. OPTIONS TO ADDRESS CHALLENGES #1 Get a bigger cluster for many teams to share #2 Give each team own dedicated cluster, each with copies of PBs of data #3 Give teams ability to spin-up/spin-down clusters which can share common data store
  • 6. MULTI-WORKLOAD TENANCY SHARED DATA CONTEXT HIT SERVICE-LEVEL AGREEMENTS Give teams their own compute clusters. ELIMINATE IDLE RESOURCES By right-sizing de-coupled compute and storage. BUY 10’s OF PBS INSTEAD OF 100’s Share data sets across clusters instead of duplicating them. INCREASE AGILITY With spin-up/spin-down clusters.
  • 7. HYBRID CLOUD ANALYTICS AND ML OPERATOR FRAMEWORK Provides a managed service like experience STATEFUL STORAGE SERVICES Object, block, and file interfaces DEVICE PLUGIN GPU acceleration LOCAL PVS High performance scratch storage
  • 10. SEMI-STRUCTURED DATA ● Sources ○ Stateless applications ○ Sensors ● Common formats ○ CSV, JSON, XML ○ ORC, Avro, Parquet
  • 11. DATA PROCESSING ● Variety of sources and formats ● Schema detection ● Distributed streaming and batch ETL
  • 12. STRUCTURED DATA ● Cataloged into databases and tables ● External locations map to object URIs ● Table and column statistics
  • 20. OPEN DATA HUB Collaborate on a Data & AI platform for the Hybrid Cloud ● Open source community for AI-as-a-service platform ● Cloud-agnostic - AI for the Hybrid Cloud ● No cloud vendor lock-in ● OpenDataHub.io
  • 21. Internal Use Cases AT RED HAT’S CORE PROCESSES Sentiment analysis and entity detection on customer engagements, support tickets, marketing surveys and more. Trained on the specific Red Hat product terminology. [Diagram labels: CONTAINERIZED APPS on Laptop, Datacenter, OpenStack, Microsoft Azure, AWS]
  • 22. Internal Use Cases AT RED HAT’S CORE PROCESSES Improve Red Hat’s core Engineering and Operations processes by applying analytics, machine learning, and AI. - rules - heuristics - ML [Diagram labels: CONTAINERIZED APPS on Laptop, Datacenter, OpenStack, Microsoft Azure, AWS]
  • 23. CORE DEPLOYMENT ● Container platform ● Certified Kubernetes ● Hybrid cloud ● Unified, distributed storage ● RESTful gateway ● S3 and Swift compatible ● Radanalytics.io community ● Unified analytics engine ● Large-scale data ● Runs on Kubernetes ● Multi-user Jupyter ● Used for data science and research Available Now at OpenDataHub.io
  • 24. Add-Ons ● Part of Open Data Hub ● Set of deployed pre-defined AI models available to use ● Monitoring and alerting toolkit ● Records numeric time series data ● Used to diagnose problems ● Analytics platform for all metrics ● Query, visualize and alert on metrics ● Deploying machine learning models on Kubernetes ● Expose models via REST and gRPC ● Full model lifecycle management Available Now at OpenDataHub.io Open Data Hub AI Library
  • 26. PLANNED RELEASES Highlights ● January 2019: Version 0.1 ○ Initial ODH Release ○ OCP 3.10 and 3.11 support ○ JupyterHub + Spark + Ceph-nano deployment ● April 2019: Operator Support + Monitoring ○ OCP 4.0+ support ○ Open Data Hub operator ○ AI Library ○ Rook for Ceph deployment ○ TwoSigma BeakerX integration ○ JupyterHub with GPU support ○ Prometheus deployment with Spark monitoring ● July 2019: Data Engineering Additions ○ Cloudera Hue deployment ○ Spark SQL Thrift Server deployment ○ Argo deployment ○ MLFlow deployment ○ Kubeflow integration ○ Kafka (Strimzi) deployment ○ Seldon-core deployment ● October 2019: To be determined
  • 27. AI AND MACHINE LEARNING IN THIS LAB
  • 28. AI IN THIS LAB
  • 29. WHAT NEXT? ● Try Open Data Hub yourself! ○ https://try.openshift.com ○ https://gitlab.com/opendatahub/opendatahub-operator ● Building the Next Generation of Innovation Together ○ Thursday at 8:30 AM ● Kaleidoscope of Innovation: AI and Machine Learning on OpenShift ○ Part 1: Thursday at 2:00 PM ○ Part 2: Thursday at 3:15 PM Red Hat data analytics infrastructure solution red.ht/videos-RHDAIS
  • 32. MACHINE LEARNING CYCLE Ingest Prepare Preprocess Discover Develop Train Test Deploy MKL-DNN cuDNN
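
A minimal sketch of the "schema detection" step named on slide 11, assuming semi-structured records arrive as JSON Lines (one of the common formats on slide 10). The sample sensor readings, the field names, and the infer_schema helper are hypothetical illustrations and do not come from the talk. A real pipeline would more likely rely on the schema inference built into an engine such as Spark.

```python
import json
from collections import defaultdict


def infer_schema(records):
    """Return {field: sorted list of Python type names} seen across records."""
    seen = defaultdict(set)
    for record in records:
        for field, value in record.items():
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


# Hypothetical sensor readings, one JSON document per line.
lines = [
    '{"sensor": "a1", "temp": 21.5}',
    '{"sensor": "a2", "temp": 22, "unit": "C"}',
    '{"sensor": "a3", "temp": null}',
]

records = [json.loads(line) for line in lines]
schema = infer_schema(records)

# Fields with more than one observed type (or NoneType) need a decision
# during cleaning and conforming, e.g. widening int to float.
for field, types in sorted(schema.items()):
    print(field, types)
```

A field that reports several types, such as temp here, is the kind of inconsistency a cleaning and conforming job would resolve before the data is cataloged as a structured table.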