Data Operations Problems Created by Deep
Learning, and How to Fix Them!
Jim Scott
@kingmesal
2 © 2018 MapR Technologies, Inc. // MapR Confidential
Public Service Announcement
You may see Artificial Intelligence, Machine
Learning and Deep Learning used interchangeably
within this presentation. Please feel free to
mentally substitute the phrase of your choice if
it is more applicable to you.
I’m not trying to split hairs on terminology.
Thanks for understanding!
Terminology
[Diagram: Data Science, Artificial Intelligence, Machine Learning, Deep Learning]
Data Science
The use of algorithms to extract knowledge and insights from data in various forms.
Some Subfields: Statistics, Artificial Intelligence (AI), Computational Math
Artificial Intelligence (AI)
The simulation of intelligent human behavior for problem-solving and decision-making.
Some Subfields: Robotics, Natural Language Processing (NLP), Machine Learning
Machine Learning (ML)
The process by which machines are taught to make calculated suggestions and/or predictions by examining large amounts of input data.
Some Subfields: Logistic Regression, Deep Learning, Reinforcement Learning
90% of the effort in successful
machine learning isn’t in the
training or model development…
It’s the logistics!
The Importance of Data Logistics
“Machine learning offers a fantastically powerful toolkit for
building complex systems quickly. … it is remarkably easy
to incur massive ongoing maintenance costs at the
system level when applying machine learning.”
Why?
Just getting the training data is hard:
● Which data? How to make it accessible? Multiple sources!
● New kinds of observations force restarts
● Requires a ton of domain knowledge
The myth of a single model:
● You cannot train just one
● You will have dozens of models, likely hundreds or more
● Handoff to new versions is tricky
● You need real runtime on each version to be sure which one is better
Problem 1
Lack of support for the
Artificial Intelligence
Software Development Lifecycle
(AI-SDLC)
Building a Machine Learning Solution
1. Identify the data sources
2. Identify the tools
3. Write some code
4. Train a model
5. Test the model
6. Analyze the output of the tests
7. Repeat steps 3 through 6 until happy-ish
a. Maybe swap out a tool if you cannot achieve happiness
8. Figure out how to get this solution into production
Choosing the Best Machine Learning Tool
Most successful groups keep several “favorite” machine learning tools on hand:
● No single tool is the best in every situation
The most important tool is a platform that supports logistics well:
● Not everything needs to be done at the application level
● Lots of what matters can be handled at the platform level
A good design for the logistics can make a big difference
Problem 2
The deep learning
workload is only one
type of workload
Workloads
Massaging your data
● This will normally include cleansing, normalizing, and even optimizing data formats for downstream consumption by GPU-based workloads
● Distributed compute is often the best approach for this type of activity due to the volume of data and variety of data types
● Be sure to keep your original data, in case of mistakes
Separate your training and holdback data sets
● This is based on the massaged data
Analyze model outputs to determine the quality of your models
● Especially valuable over time to know that the models are moving the right way
● Great candidate for a distributed workload
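One way to keep the training and holdback sets separate and repeatable is to assign each record by hashing a stable key, so the same record always lands in the same set no matter when or where the split is run. This is a minimal sketch, not something from the talk: the `id` field, the 20% holdback fraction, and the function names are all assumptions made for illustration.

```python
import hashlib


def assign_split(record_id: str, holdback_fraction: float = 0.2) -> str:
    """Deterministically assign a record to 'train' or 'holdback'."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    threshold = int(holdback_fraction * 2**64)
    return "holdback" if bucket < threshold else "train"


def split_records(records, key="id", holdback_fraction=0.2):
    """Split records into (train, holdback) lists using the hash of `key`."""
    train, holdback = [], []
    for record in records:
        if assign_split(str(record[key]), holdback_fraction) == "holdback":
            holdback.append(record)
        else:
            train.append(record)
    return train, holdback


# Hypothetical massaged records; the field names are illustrative only.
records = [{"id": i, "value": i * 0.5} for i in range(1000)]
train, holdback = split_records(records)
print(len(train), len(holdback))
```

Because the assignment depends only on the record key, re-running the split after new data arrives never moves an old record from holdback into training.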
Problem 3
Putting machine learning into
production is not quite the same
as other enterprise software
Gotchas with Making it to Production
● Ops-oriented people will not necessarily “get it” regarding modeling subtleties
● Data scientists will not necessarily “get it” regarding operational realities
● Therefore, modelers have to deliver self-contained models
● And, ops has to provide pre-wired structure
Handling Real-time
Stream instead of database as the shared “truth”
Image © 2016 Ted Dunning & Ellen Friedman from Chap 6 of O’Reilly book Streaming Architecture used with permission
Real-time can be started on
your schedule; that is the key
Problem 4
Running machine learning
models in more than
one location is tough
Streaming Isolates Services
With MapR, Geo-Distributed Data Appears Local
Global Data Center
Regional Data Center
Features of Good Streaming
Persistent
● Messages stick around for other consumers
● Consumers don’t affect producers
● Consumer doesn’t have to be online when message arrives
Performant
● You should NEVER need to worry if a stream can keep up
Pervasive
● It is there whenever you need it, no need to deploy anything
● How much work is it to create a new file? Why harder for a stream?
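The "persistent" properties above can be illustrated with a toy append-only log in which every consumer tracks its own read position. This is an in-memory sketch for intuition only. The `Stream` class and its methods are hypothetical and do not correspond to the API of any real streaming system.

```python
class Stream:
    """In-memory stand-in for a persistent, append-only stream."""

    def __init__(self):
        self._messages = []   # messages are never removed when read
        self._offsets = {}    # next unread position, per consumer

    def publish(self, message):
        """Append a message; the producer never waits for consumers."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, consumer, max_messages=100):
        """Return unread messages for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch


stream = Stream()
stream.publish({"txn": 1})
stream.publish({"txn": 2})
first = stream.poll("fraud-model")   # reads both messages
stream.publish({"txn": 3})
late = stream.poll("audit")          # a late joiner still sees all three
again = stream.poll("fraud-model")   # only the message it has not yet seen
print(first, late, again)
```

Reading does not consume messages, so adding a new consumer later needs no change on the producer side.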
Improving Machine Learning Logistics
Stream-first architecture is a powerful approach with surprisingly widespread
advantages
● Innovative technologies are emerging for streaming data
Microservices approach provides flexibility
● Streaming supports microservices (if done right)
Containers remove surprises
● Predictable environment for running models
Problem 5
Data dependencies cost more
than code dependencies,
a lot more!
Top Reason to Use a Streaming Architecture
Data dependencies cost more than code dependencies!
● Code dependencies are easy to track, because tracking them is a well-known and well-practiced discipline
● The data may be unstable
Undeclared consumers can wreak havoc on your models
● Downstream users may create a data dependency on the data from your model
● Updates to your model may break their system, if they made an assumption about the behavior of your model
● Who do you think will suffer?
First Look with Streams
Then Rendezvous
Faster Throughput Through Failure
Suppose we have one model that can handle 10,000 t/s @ 2ms
● But this isn’t the most accurate model. Not bad, but not the best.
And our champion model can handle 1,000 t/s @ 10ms
● Then imagine a burst of 2,000 t/s for several minutes
Champion can only evaluate half of all requests
● Should skip to keep up
● Fast model will cover for champion
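The arithmetic behind this example can be checked with a few lines of code. The sketch below assumes the simplest possible policy: the champion answers as many requests as its capacity allows and the fast model answers the rest. The function and field names are made up for illustration.

```python
def coverage(arrival_rate, champion_capacity, fallback_capacity):
    """Share of requests answered by each model at a given arrival rate (t/s)."""
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    champion = min(arrival_rate, champion_capacity)
    overflow = arrival_rate - champion
    fallback = min(overflow, fallback_capacity)
    return {
        "champion_share": champion / arrival_rate,
        "fallback_share": fallback / arrival_rate,
        "dropped_share": (overflow - fallback) / arrival_rate,
    }


# The burst from the slide: 2,000 t/s against a 1,000 t/s champion
# and a 10,000 t/s fast model.
burst = coverage(arrival_rate=2000, champion_capacity=1000, fallback_capacity=10000)
print(burst)
```

With these numbers the champion answers half of the requests, the fast model answers the other half, and nothing is dropped.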
Rendezvous – Mainly for Making Decisions
Decisioning models
● Looking for a “right answer”
● Simpler than reinforcement learning
Examples
● Fraud detection
● Predictive analytics / market prediction
● Churn prediction (as in telecommunications)
● Yield optimization
● Deep learning in the form of speech or image recognition, in some cases
Some Key Points
● Note that all models see identical inputs
● All models run in production setting
● All models send scores to same stream
● The rendezvous server decides which scores to ignore
● Roll forward, roll back, correlated comparison are all now trivial
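The key points above can be sketched as a small selection function: every model scores the same request, and the rendezvous step returns the score from the most preferred model that answered in time. This is a simplified illustration under stated assumptions, not the reference implementation. The `Score` fields, the preference list, and the deadline parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Score:
    request_id: str
    model: str
    value: float
    latency_ms: float


def rendezvous(scores: List[Score], preference: List[str],
               deadline_ms: float) -> Optional[Score]:
    """Return the score of the most preferred model that met the deadline."""
    on_time = {s.model: s for s in scores if s.latency_ms <= deadline_ms}
    for model in preference:
        if model in on_time:
            return on_time[model]
    return None  # no acceptable score arrived in time


# All models saw the same request and wrote to the same scores stream.
scores = [
    Score("r1", "fast", 0.71, 2.0),
    Score("r1", "champion", 0.93, 10.0),
    Score("r1", "challenger", 0.88, 9.0),
]

picked = rendezvous(scores, ["champion", "fast"], deadline_ms=15)
fallback = rendezvous(scores, ["champion", "fast"], deadline_ms=5)
# Rolling forward is only a change to the preference list.
rolled_forward = rendezvous(scores, ["challenger", "champion", "fast"], deadline_ms=15)
print(picked.model, fallback.model, rolled_forward.model)
```

Because the models themselves are untouched, rolling back means restoring the previous preference list.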
Problem 6
Wash,
Rinse,
Repeat!
Concerns About Repeatability
Are you performing all of these steps in your AI-SDLC manually?
● Consider a workflow tool
○ e.g. Airflow, Kubeflow, Argo, etc.
Is all of your data in static files or will there be real-time data?
● Prepare for real-time in development to be ready for production
Version everything!
● I’m sorry, this isn’t a job for Git!
● Includes source data: static and real-time
● Also includes models and their output
● Ensures sanity checks
● Long-term performance analytics
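One lightweight way to picture "version everything" is a manifest that fingerprints the source data, the model, and the model output together, so any later result can be traced to the exact inputs that produced it. The sketch below is only an illustration of the idea using content hashes. In practice a platform would typically provide this through snapshots or similar features, and the artifact names here are invented.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Content hash that changes whenever the bytes change."""
    return hashlib.sha256(payload).hexdigest()


def build_manifest(artifacts: dict) -> dict:
    """Record a fingerprint per artifact plus one id for the whole run."""
    entries = {name: fingerprint(data) for name, data in sorted(artifacts.items())}
    run_id = fingerprint(json.dumps(entries, sort_keys=True).encode("utf-8"))
    return {"run_id": run_id, "artifacts": entries}


# Hypothetical artifacts; real ones would be files, tables, or stream offsets.
training_data = b"id,label\n1,0\n2,1\n"
manifest_a = build_manifest(
    {"training_data": training_data, "model": b"weights-v1", "scores": b"0.12\n0.87\n"}
)
manifest_b = build_manifest(
    {"training_data": training_data, "model": b"weights-v2", "scores": b"0.10\n0.91\n"}
)
print(manifest_a["run_id"] != manifest_b["run_id"])
```

Two runs that share training data but differ in the model get different run ids, while the shared data keeps the same fingerprint in both manifests.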
Quality & Reproducibility of Input Data is Important!
Recording raw-ish data is really a big deal
● Data as seen by a model is worth gold
● Data reconstructed later often has time-machine leaks
● Databases were made for updates, streams are safer
Raw data is useful for non-ML cases as well (think flexibility)
Decoy model records training data as seen by models under development &
evaluation
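A decoy model can be pictured as a "model" that sits on the input stream like any other but only archives what it sees. The sketch below is a minimal illustration with an in-memory sink. The class name, the `score` method, and the JSON-lines format are assumptions rather than details from the talk.

```python
import io
import json


class DecoyModel:
    """Looks like a model to the framework, but only records its inputs."""

    def __init__(self, sink):
        self._sink = sink  # any object with a write() method

    def score(self, request):
        # Archive the request exactly as a real model would have seen it.
        self._sink.write(json.dumps(request, sort_keys=True) + "\n")
        return None  # a decoy never produces a score


archive = io.StringIO()  # stand-in for a file or stream
decoy = DecoyModel(archive)

requests = [
    {"request_id": "r1", "amount": 42.0},
    {"request_id": "r2", "amount": 7.5},
]
for request in requests:
    decoy.score(request)

recorded = [json.loads(line) for line in archive.getvalue().splitlines()]
print(recorded)
```

Training later from this archive avoids reconstructing inputs from a database that may have been updated in the meantime.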
A Quick Review
The Proxy Talks to the Outside World
The Input Stream Feeds All Models Identically
The Scores Stream Contains All Results
The Rendezvous Picks A Result
Results Return Via A Stream and Return Address
Problem 7
In the real world
conditions may
(will) change!
Not Such Bad Ideas
Keep models running “in the wings”
● Do not wait until conditions change to start building the next model
● Keep new short-history models ready to roll, some graybeards as well
Hot hand-off
● With rendezvous, stop ignoring the new best model
Deploy a canary server
● Keep an old model active as a reference
● If it was 90% correct, difference with any better model should be small
● Score distribution should be roughly constant
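The canary idea can be made concrete by comparing the canary's scores with those of a newer model on the same requests. The sketch below uses two deliberately simple measures, the share of requests on which the two models disagree at a decision threshold and the shift in mean score. The 0.5 threshold and the sample scores are invented for illustration, and a real check would likely use a fuller distribution comparison.

```python
from statistics import mean


def disagreement_rate(canary_scores, candidate_scores, threshold=0.5):
    """Share of requests where the two models fall on opposite sides of the threshold."""
    if not canary_scores or len(canary_scores) != len(candidate_scores):
        raise ValueError("need two non-empty score lists of equal length")
    differing = sum(
        (a >= threshold) != (b >= threshold)
        for a, b in zip(canary_scores, candidate_scores)
    )
    return differing / len(canary_scores)


def mean_shift(reference_scores, current_scores):
    """Change in average score between a reference window and the current one."""
    return mean(current_scores) - mean(reference_scores)


# Scores from both models on the same five requests (made-up numbers).
canary = [0.1, 0.9, 0.8, 0.2, 0.7]
candidate = [0.2, 0.95, 0.85, 0.6, 0.75]

rate = disagreement_rate(canary, candidate)
print(rate, mean_shift(canary, candidate))
```

A sudden rise in either number suggests that the input data or the newer model has changed and deserves a closer look.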
Prepare for Scaling Up
Model Variety
● Multiple rendezvous frameworks for different tasks
Throughput
● Fast default models
● Partition input stream to allow parallel model evaluation
● Input batching
Extreme Volumes
● Cannibalize fancy models to run more fast/simple models
● Speed before beauty
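Partitioning the input stream for parallel evaluation usually comes down to hashing a key so that related requests always reach the same model instance. The sketch below shows one way this might look, with batching per partition. The `account` key, the partition count, and the function names are assumptions for illustration.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a stable partition number in [0, num_partitions)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def batch_by_partition(requests, num_partitions, key="account"):
    """Group requests into one batch per partition, preserving arrival order."""
    batches = defaultdict(list)
    for request in requests:
        batches[partition_for(request[key], num_partitions)].append(request)
    return dict(batches)


# Fifty hypothetical requests spread over seven accounts.
requests = [{"account": f"acct-{i % 7}", "amount": i} for i in range(50)]
batches = batch_by_partition(requests, num_partitions=4)
for partition, batch in sorted(batches.items()):
    print(partition, len(batch))
```

Each partition's batch can then be handed to a separate model instance, and all requests for a given account stay on the same instance.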
Making Improvements
1. Data + the right question + domain knowledge matter!
2. Prioritize – put serious effort into infrastructure
a. DataOps requires more than just data science
3. Persist – use streams to keep data around
4. Measure – everything, and record it
5. Analyze – understand and see what is happening
6. Containerize – make deployment predictable, repeatable and easy
Problems 8, 9 and 10
Copying data from your
streaming system,
data lake,
and edge systems to your
machine learning environment
PLEASE, PLEASE, PLEASE…
...tell me you are not copying
all your data between these systems
Traditional Storage Vendor Solution
[Diagram labels: Edge, Ingest, Core, Cloud, Unified Data Lake, Data Prep, Training + Testing, Training Cluster, Production Deployment, Storage Appliance, Servers, Servers w/ GPU; data moves between stages by Copy]
Does NOT support real-time workflows
Doesn’t support distributed data preparation workloads
Hadoop Based Solutions
[Diagram labels: Edge, Ingest, Core, Cloud, Unified Data Lake, HDFS Cluster, Data Prep, Training + Testing, Training Cluster, Production Deployment, Servers, Servers w/ GPU; data moves between stages by Copy or in-motion through Kafka]
Minimum of seven non-homogeneous environments to administer and secure
Full data copies without versioning, lineage control or multi-master support
Where does the master copy of the data live?
MapR Solution
[Diagram labels: Data Fabric, Global Namespace, Edge, Core, Cloud, Data Prep, Training + Testing, Deployment]
One homogeneous environment to manage and secure
Supports real-time processing with data protection, lineage, and versioning
Runs directly on DGX servers to create a unified DGX cluster
MapR AI + RAPIDS
[Diagram: Typical Training and Evaluation Workflow. Labels: Data Management, Document DB, Events, Structured Data, Unstructured Data, Data Preparation, Training Data Set, Model Training, Evaluate/Visualize, Production Deployment, Inference, Applications; RAPIDS components: cuDF (Analytics), cuML (Machine Learning), cuGRAPH (Graph Analytics), Apache Arrow (GPU Memory)]
How Data is Accessed
Advantages of the MapR Data Fabric
● Linear Scalability
● Architected for performance, scale,
and reliability
● Distributed metadata in the fabric
How Data is Stored
How Data is Accessed
● Distributed location support
● Multi-master Replication
● Location awareness
How Data is Distributed
● Capability to serve as a system of record
● Data security and governance within the
fabric
● Mixed Data access from multiple
protocols
● Distributed Multi-tenancy
● Global Namespace
● Integrated data streaming for AI
Architecture Matters
On-premise or Cloud Infrastructure
● Combines Distributed Compute, AI, HPC, and general purpose workloads
● MapR provides complementary data logistics to better manage and deploy deep learning across the entire ecosystem
● Enables deployment agility with data management extending from on-premise, to cross-cloud, to the edge
MapR Advantages for AI
Simplified administration and security models
● One and done - no need for a different model in each location
● GDPR “compliant”!
Scales linearly with customer needs
● No reason to create a bunch of separate clusters
Sustainability - All data, files, database and event streaming
● Both at-rest and in-motion
An enabling and flexible architecture
● Only way to bring distributed data and GPUs together
● Easy to meet customers’ needs
● Supports both Kubernetes and containers
Low cost of entry and linear cost of scaling
On-Premise, Cloud or Both
Same platform and architecture in all locations:
● On-premise works the same as the cloud
● Second cloud works the same as a first cloud
● Data mirroring between locations is built-in
● Real-time event management and lineage is built-in
○ Scale out streaming applications without rearchitecting them
● Kubernetes is a simple way to inject MapR storage and GPU support into a container
○ Leverage resources anywhere with Global Namespace
○ Application portability across all locations, no rework required
Simplifying Model Development and Deployment
Complex data pipelines, large data volumes serving GPUs
● Mixed workloads - distributed data prep plus real-time
Simultaneous data and model versioning
● Data at-rest and in-motion
Model output lands in a stream
● Creates pluggable model flow
Works across on-premise and cloud infrastructures, simultaneously
The Key is Data Logistics
“90+% of Machine Learning
Success is Data Logistics”
https://mapr.com/ebook/machine-learning-logistics
Need Help Solving Your Data Logistics Problems?
● Over 35 FREE on-demand training courses for AI and analytic development, data engineering and administration
● Certification tracks for developers, administrators, and data scientists
● Expanded support portal and knowledge base
● Containerized clusters, for free download, solution templates and code examples for hands-on experience
https://mapr.com/training/
Thank you!

More Related Content

Similar to Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW TO FIX THEM!

DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven Organizations
Ellen Friedman
 
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Matt Stubbs
 
Surprising Advantages of Streaming - ACM March 2018
Surprising Advantages of Streaming - ACM March 2018Surprising Advantages of Streaming - ACM March 2018
Surprising Advantages of Streaming - ACM March 2018
Ellen Friedman
 
7 Habits for Big Data in Production - keynote Big Data London Nov 2018
7 Habits for Big Data in Production - keynote Big Data London Nov 20187 Habits for Big Data in Production - keynote Big Data London Nov 2018
7 Habits for Big Data in Production - keynote Big Data London Nov 2018
Ellen Friedman
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
MapR Technologies
 
Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018
TigerGraph
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
MapR Technologies
 
Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...
DataWorks Summit
 
Cheryl Wiebe - Advanced Analytics in the Industrial World
Cheryl Wiebe - Advanced Analytics in the Industrial WorldCheryl Wiebe - Advanced Analytics in the Industrial World
Cheryl Wiebe - Advanced Analytics in the Industrial World
Rehgan Avon
 
Container and Kubernetes without limits
Container and Kubernetes without limitsContainer and Kubernetes without limits
Container and Kubernetes without limits
Antje Barth
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural Networks
Justin Brandenburg
 
The future of FinTech product using pervasive Machine Learning automation - A...
The future of FinTech product using pervasive Machine Learning automation - A...The future of FinTech product using pervasive Machine Learning automation - A...
The future of FinTech product using pervasive Machine Learning automation - A...
Shift Conference
 
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
Matt Stubbs
 
Steve Jenkins - Business Opportunities for Big Data in the Enterprise
Steve Jenkins - Business Opportunities for Big Data in the Enterprise Steve Jenkins - Business Opportunities for Big Data in the Enterprise
Steve Jenkins - Business Opportunities for Big Data in the Enterprise
WeAreEsynergy
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logistics
Ted Dunning
 
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
Mathieu Dumoulin
 
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning SolutionsKamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Greg Makowski
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
MapR Technologies
 
In memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGainIn memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGain
Data Con LA
 

Similar to Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW TO FIX THEM! (20)

DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven Organizations
 
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
 
Surprising Advantages of Streaming - ACM March 2018
Surprising Advantages of Streaming - ACM March 2018Surprising Advantages of Streaming - ACM March 2018
Surprising Advantages of Streaming - ACM March 2018
 
7 Habits for Big Data in Production - keynote Big Data London Nov 2018
7 Habits for Big Data in Production - keynote Big Data London Nov 20187 Habits for Big Data in Production - keynote Big Data London Nov 2018
7 Habits for Big Data in Production - keynote Big Data London Nov 2018
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...
 
Cheryl Wiebe - Advanced Analytics in the Industrial World
Cheryl Wiebe - Advanced Analytics in the Industrial WorldCheryl Wiebe - Advanced Analytics in the Industrial World
Cheryl Wiebe - Advanced Analytics in the Industrial World
 
Container and Kubernetes without limits
Container and Kubernetes without limitsContainer and Kubernetes without limits
Container and Kubernetes without limits
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural Networks
 
The future of FinTech product using pervasive Machine Learning automation - A...
The future of FinTech product using pervasive Machine Learning automation - A...The future of FinTech product using pervasive Machine Learning automation - A...
The future of FinTech product using pervasive Machine Learning automation - A...
 
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
 
Steve Jenkins - Business Opportunities for Big Data in the Enterprise
Steve Jenkins - Business Opportunities for Big Data in the Enterprise Steve Jenkins - Business Opportunities for Big Data in the Enterprise
Steve Jenkins - Business Opportunities for Big Data in the Enterprise
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logistics
 
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
 
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning SolutionsKamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
In memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGainIn memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGain
 

More from Matt Stubbs

Blueprint Series: Banking In The Cloud – Ultra-high Reliability Architectures
Blueprint Series: Banking In The Cloud – Ultra-high Reliability ArchitecturesBlueprint Series: Banking In The Cloud – Ultra-high Reliability Architectures
Blueprint Series: Banking In The Cloud – Ultra-high Reliability Architectures
Matt Stubbs
 
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
Matt Stubbs
 
Blueprint Series: Expedia Partner Solutions, Data Platform
Blueprint Series: Expedia Partner Solutions, Data PlatformBlueprint Series: Expedia Partner Solutions, Data Platform
Blueprint Series: Expedia Partner Solutions, Data Platform
Matt Stubbs
 
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Matt Stubbs
 
Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.
Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.
Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.
Matt Stubbs
 
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCE
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCEBig Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCE
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCE
Matt Stubbs
 
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQL
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQLBig Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQL
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQL
Matt Stubbs
 
Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTS
Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTSBig Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTS
Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTS
Matt Stubbs
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Matt Stubbs
 
Big Data LDN 2018: AI VS. GDPR
Big Data LDN 2018: AI VS. GDPRBig Data LDN 2018: AI VS. GDPR
Big Data LDN 2018: AI VS. GDPR
Matt Stubbs
 
Big Data LDN 2018: REALISING THE PROMISE OF SELF-SERVICE ANALYTICS WITH DATA ...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW TO FIX THEM!

  • 1. Data Operations Problems Created by Deep Learning, and How to Fix Them! Jim Scott @kingmesal
  • 2. 2 © 2018 MapR Technologies, Inc. // MapR Confidential Public Service Announcement: You may see Artificial Intelligence, Machine Learning and Deep Learning used interchangeably within this presentation. Please feel free to mentally substitute the phrase of your choice if it is more applicable to you. I’m not trying to split hairs on terminology. Thanks for understanding!
  • 3. Terminology (diagram labels: Data Science, Artificial Intelligence, Machine Learning, Deep Learning). Data Science: The use of algorithms to extract knowledge and insights from data in various forms. Some Subfields: Statistics, Artificial Intelligence (AI), Computational Math. Artificial Intelligence (AI): The simulation of intelligent human behavior for problem-solving and decision-making. Some Subfields: Robotics, Natural Language Processing (NLP), Machine Learning. Machine Learning (ML): The process by which machines are taught to make calculated suggestions and/or predictions by examining large amounts of input data. Some Subfields: Logistic Regression, Deep Learning, Reinforcement Learning.
  • 4. 90% of the effort in successful machine learning isn’t in the training or model development… It’s the logistics!
  • 5. The Importance of Data Logistics: “Machine learning offers a fantastically powerful toolkit for building complex systems quickly. … it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning.”
  • 6. Why? Just getting the training data is hard: ● Which data? How to make it accessible? Multiple sources! ● New kinds of observations force restarts ● Requires a ton of domain knowledge The myth of a single model: ● You cannot train just one ● You will have dozens of models, likely hundreds or more ● Handoff to new versions is tricky ● You need runtime in production to be sure which version is better
  • 7. Problem 1: Lack of support for the Artificial Intelligence Software Development Lifecycle (AI-SDLC)
  • 8. Building a Machine Learning Solution: 1. Identify the data sources 2. Identify the tools 3. Write some code 4. Train a model 5. Test the model 6. Analyze the output of the tests 7. Repeat steps 3 through 6 until happy-ish a. Maybe swap out a tool if you cannot achieve happiness 8. Figure out how to get this solution into production
  • 9. Choosing the Best Machine Learning Tool: Most successful groups keep several “favorite” machine learning tools on hand: ● No single tool is the best in every situation The most important tool is a platform that supports logistics well: ● Not everything needs to be done at the application level ● Lots of what matters can be handled at the platform level A good design for the logistics can make a big difference
  • 10. Problem 2: The deep learning workload is only one type of workload
  • 11. Workloads: Massaging your data ● This will normally include cleansing, normalizing, and even optimizing data formats for downstream consumption by GPU-based workloads ● Distributed compute is often the best approach for this type of activity due to the volume of data and variety of data types ● Be sure to keep your original data, in case of mistakes Separate your training and holdback data sets ● This split is based on the massaged data Analyze model outputs to determine the quality of your models ● Especially valuable over time to know that the models are moving the right way ● Great candidate for a distributed workload
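One way to keep the training/holdback split on slide 11 repeatable is to derive it from a stable hash of each record's identifier rather than from a random shuffle. The sketch below is only an illustration: the `id` field, the 20% holdback fraction, and the function names are assumptions, not something the slides specify.

```python
import hashlib


def assign_split(record_id: str, holdback_fraction: float = 0.2) -> str:
    """Return 'holdback' or 'train' based on a stable hash of the record ID."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "holdback" if bucket < holdback_fraction else "train"


def split_records(records, holdback_fraction: float = 0.2):
    """Split massaged records (dicts with a hypothetical 'id' key) into two lists."""
    train, holdback = [], []
    for rec in records:
        if assign_split(rec["id"], holdback_fraction) == "holdback":
            holdback.append(rec)
        else:
            train.append(rec)
    return train, holdback
```

Because the assignment depends only on the ID, rerunning data preparation after a fix puts every record in the same set it was in before, so holdback data does not leak into training.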
  • 12. Problem 3: Putting machine learning into production is not quite the same as other enterprise software
  • 13. Gotchas with Making it to Production ● Ops-oriented people will not necessarily “get it” regarding modeling subtleties ● Data scientists will not necessarily “get it” regarding operational realities ● Therefore, modelers have to deliver self-contained models ● And, ops has to provide pre-wired structure
  • 14. Handling Real-time: Stream instead of database as the shared “truth”. Image © 2016 Ted Dunning & Ellen Friedman from Chap 6 of O’Reilly book Streaming Architecture, used with permission
  • 15. Real-time can be started on your schedule; that is the key
  • 16. Problem 4: Running machine learning models in more than one location is tough
  • 17. Streaming Isolates Services
  • 18. With MapR, Geo-Distributed Data Appears Local
  • 19. With MapR, Geo-Distributed Data Appears Local (diagram labels: Global Data Center, Regional Data Center)
  • 20. Features of Good Streaming: Persistent ● Messages stick around for other consumers ● Consumers don’t affect producers ● Consumer doesn’t have to be online when message arrives Performant ● You should NEVER need to worry if a stream can keep up Pervasive ● It is there whenever you need it, no need to deploy anything ● How much work is it to create a new file? Why harder for a stream?
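The "persistent" properties on slide 20 can be illustrated with a tiny in-memory append-only log in which each consumer keeps its own offset. This is a teaching sketch, not how MapR or any real streaming system is implemented; `StreamLog`, `publish`, and `poll` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Dict, List


class StreamLog:
    """Append-only log; every consumer tracks its own read offset."""

    def __init__(self) -> None:
        self._messages: List[Any] = []
        self._offsets: Dict[str, int] = defaultdict(int)

    def publish(self, message: Any) -> int:
        """Append a message and return its offset. Producers never wait on consumers."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, consumer: str, max_messages: int = 100) -> List[Any]:
        """Return unread messages for this consumer and advance only its offset."""
        start = self._offsets[consumer]
        batch = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch
```

A consumer that first polls long after messages were published still sees all of them, and reading never removes a message for anyone else.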
  • 21. Improving Machine Learning Logistics: Stream-first architecture is a powerful approach with surprisingly widespread advantages ● Innovative technologies are emerging for streaming data Microservices approach provides flexibility ● Streaming supports microservices (if done right) Containers remove surprises ● Predictable environment for running models
  • 22. Problem 5: Data dependencies cost more than code dependencies, a lot more!
  • 23. Top Reason to Use a Streaming Architecture: Data dependencies cost more than code dependencies! ● Code dependencies are easy to track, because tracking them is a well-known and well-practiced discipline ● The data may be unstable Undeclared consumers can wreak havoc on your models ● Downstream users may create a data dependency on the data from your model ● Updates to your model may break their system, if they made an assumption about the function of your model ● Who do you think will suffer?
  • 24. First Look with Streams
  • 25. Then Rendezvous
  • 26. Faster Throughput Through Failure: Suppose we have one model that can handle 10,000 t/s @ 2ms ● But this isn’t the most accurate model. Not bad, but not the best. And our champion model can handle 1,000 t/s @ 10ms ● Then imagine a burst of 2,000 t/s for several minutes Champion can only evaluate half of all requests ● Should skip to keep up ● Fast model will cover for champion
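The arithmetic on slide 26 can be checked with a small simulation of the "skip to keep up" policy. It is a simplified sketch under stated assumptions: the champion is modeled as a single server limited only by its throughput (1,000 t/s means one request per 1,000 µs), arrivals are evenly spaced, and any request that arrives while the champion is busy is skipped and left to the fast model. The function name is hypothetical.

```python
def simulate_burst(arrival_rate: int, champion_rate: int, seconds: int):
    """Return (requests scored by the champion, total requests) for a steady burst."""
    interval_us = 1_000_000 // arrival_rate   # time between arrivals, in microseconds
    service_us = 1_000_000 // champion_rate   # champion time per request, in microseconds
    champion_free_at = 0
    scored_by_champion = 0
    total = arrival_rate * seconds
    for i in range(total):
        arrival = i * interval_us
        if arrival >= champion_free_at:
            # Champion is idle: it scores this request.
            champion_free_at = arrival + service_us
            scored_by_champion += 1
        # Otherwise the champion skips the request; the fast model's score is used.
    return scored_by_champion, total
```

With the slide's numbers, `simulate_burst(2000, 1000, 1)` has the champion scoring every other request, which matches "Champion can only evaluate half of all requests". The fast model, at 10,000 t/s, has capacity for all of them.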
  • 27. Rendezvous – Mainly for Making Decisions: Decisioning models ● Looking for a “right answer” ● Simpler than reinforcement learning Examples ● Fraud detection ● Predictive analytics / market prediction ● Churn prediction (as in telecommunications) ● Yield optimization ● Deep learning in form of speech or image recognition, in some cases
  • 28. Some Key Points ● Note that all models see identical inputs ● All models run in production setting ● All models send scores to same stream ● The rendezvous server decides which scores to ignore ● Roll forward, roll back, correlated comparison are all now trivial
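The rendezvous idea in these key points can be sketched as a function that looks at all scores produced for one request and returns the score from the most preferred model that answered within a deadline. This is a minimal illustration, not the implementation from the talk or the referenced ebook; the `Score` fields, the preference list, and the deadline parameter are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Score:
    request_id: str
    model: str
    value: float
    latency_ms: float


def rendezvous(
    scores: Sequence[Score],
    preference: Sequence[str],
    deadline_ms: float,
) -> Optional[Score]:
    """Pick the score from the most preferred model that met the deadline."""
    on_time = {s.model: s for s in scores if s.latency_ms <= deadline_ms}
    for model in preference:
        if model in on_time:
            return on_time[model]
    return None  # No model answered in time; the caller decides what to do.
```

Under this design, rolling forward or back is a change to the `preference` list only. Every model keeps running and keeps writing to the scores stream, so the ignored scores remain available for comparison.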
  • 29. Problem 6: Wash, Rinse, Repeat!
  • 30. Concerns About Repeatability: Are you performing all of these steps in your AI-SDLC manually? ● Consider a workflow tool ○ e.g. Airflow, Kubeflow, Argo, etc. Is all of your data in static files or will there be real-time data? ● Prepare for real-time in development to be ready for production Version everything! ● I’m sorry, this isn’t a job for Git! ● Includes source data: static and real-time ● Also includes models and their output ● Ensures sanity checks ● Long-term performance analytics
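"Version everything" can be made concrete by fingerprinting each artifact of a run (source data, model, model output) and recording the fingerprints together. The sketch below shows one possible approach using content hashes; it is not the platform-level versioning the talk has in mind, and the artifact names and helper functions are hypothetical.

```python
import hashlib
import json
from typing import Dict


def fingerprint(data: bytes) -> str:
    """Content hash of one artifact."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: Dict[str, bytes]) -> Dict[str, str]:
    """Map each artifact name to its content hash."""
    return {name: fingerprint(blob) for name, blob in sorted(artifacts.items())}


def manifest_id(manifest: Dict[str, str]) -> str:
    """Single identifier for the exact combination of data, model, and output."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return fingerprint(canonical)
```

Two runs share a `manifest_id` only if every recorded artifact is byte-for-byte identical, which gives a cheap sanity check before comparing model performance over time.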
  • 31. Quality & Reproducibility of Input Data is Important! Recording raw-ish data is really a big deal ● Data as seen by a model is worth gold ● Data reconstructed later often has time-machine leaks ● Databases were made for updates, streams are safer Raw data is useful for non-ML cases as well (think flexibility) Decoy model records training data as seen by models under development & evaluation
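A decoy model, as described on slide 31, sits alongside the real models, receives the same inputs, and produces no score; it only archives what it saw. A minimal sketch, assuming requests are JSON-serializable dicts and that any writable text sink (a file, or a stream producer wrapped to look like one) is acceptable; the class and method names are hypothetical.

```python
import json
from typing import Any, Dict, TextIO


class DecoyModel:
    """Looks like a model to the input stream, but only records its inputs."""

    def __init__(self, sink: TextIO) -> None:
        self.sink = sink

    def score(self, request: Dict[str, Any]) -> None:
        # Write the request exactly as received, one JSON document per line.
        self.sink.write(json.dumps(request, sort_keys=True) + "\n")
        return None  # A decoy never contributes a score.
```

Because the record is written at scoring time, later training runs read the data as the models saw it rather than a reconstruction from a database that has since been updated.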
  • 32. A Quick Review
  • 33. The Proxy Talks to the Outside World
  • 34. The Input Stream Feeds All Models Identically
  • 35. The Scores Stream Contains All Results
  • 36. The Rendezvous Picks A Result
  • 37. Results Return Via A Stream and Return Address
  • 38. Problem 7: In the real world conditions may (will) change!
  • 39. Not Such Bad Ideas: Keep models running “in the wings” ● Do not wait until conditions change to start building the next model ● Keep new short-history models ready to roll, some graybeards as well Hot hand-off ● With rendezvous, stop ignoring the new best model Deploy a canary server ● Keep an old model active as a reference ● If it was 90% correct, its difference from any better model should be small ● Score distribution should be roughly constant
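The canary check "score distribution should be roughly constant" can be approximated by comparing a histogram of recent canary scores against a baseline histogram. The sketch below uses total variation distance as the comparison; that choice, the assumption that scores lie in [0, 1], the 10-bin histogram, and the function names are all illustrative rather than taken from the talk.

```python
from typing import List, Sequence


def histogram(scores: Sequence[float], bins: int = 10) -> List[float]:
    """Normalized histogram of scores assumed to lie in [0, 1]."""
    if not scores:
        raise ValueError("need at least one score")
    counts = [0] * bins
    for s in scores:
        index = min(int(s * bins), bins - 1)  # a score of exactly 1.0 goes in the last bin
        counts[index] += 1
    return [c / len(scores) for c in counts]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two histograms: 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

A monitoring job could compute this distance for each time window and raise an alert when it exceeds a threshold chosen from historical variation. Since the canary model itself never changes, a large shift points at the input data rather than the model.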
  • 40. Prepare for Scaling Up: Model Variety ● Multiple rendezvous frameworks for different tasks Throughput ● Fast default models ● Partition input stream to allow parallel model evaluation ● Input batching Extreme Volumes ● Cannibalize fancy models to run more fast/simple models ● Speed before beauty
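"Partition input stream to allow parallel model evaluation" usually means routing each request to a partition by a stable hash of some key, so that several model instances can each consume one partition. The sketch below uses CRC32 purely for illustration; it is not the partitioner used by Kafka or MapR, and the key choice and function names are assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(key: str, num_partitions: int) -> int:
    """Stable partition index for a key (same key always maps to the same partition)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_requests(keys: Iterable[str], num_partitions: int) -> Dict[int, List[str]]:
    """Group request keys by partition so each model instance handles one group."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        groups[partition_for(key, num_partitions)].append(key)
    return dict(groups)
```

Keying by something like an account or session ID keeps related requests in order on one partition while spreading the overall load across model instances.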
  • 41. Making Improvements: 1. Data + the right question + domain knowledge matters! 2. Prioritize – put serious effort into infrastructure a. DataOps requires more than just data science 3. Persist – use streams to keep data around 4. Measure – everything, and record it 5. Analyze – understand and see what is happening 6. Containerize – make deployment predictable, repeatable and easy
  • 42. Problems 8, 9 and 10: Copying data from your streaming system, data lake, and edge systems to your machine learning environment
  • 43. PLEASE, PLEASE, PLEASE… ...tell me you are not copying all your data between these systems
  • 44. Traditional Storage Vendor Solution (diagram labels: Edge, Core, Cloud; Ingest, Unified Data Lake, Data Prep, Training + Testing, Production Training Cluster, Deployment; Storage Appliance, Servers, Servers w/ GPU; repeated “Copy” steps between stages) ● Does NOT support real-time workflows ● Doesn’t support distributed data preparation workloads
  • 45. Hadoop Based Solutions (diagram labels: Edge, Core, Cloud; Ingest, Unified Data Lake, Data Prep, Training + Testing, Production Training Cluster, Deployment; HDFS Cluster, Kafka, Servers, Servers w/ GPU; repeated “Copy” and Kafka “in-motion” steps between stages) ● Minimum of seven non-homogeneous environments to administer and secure ● Full data copies without versioning, lineage control or multi-master support ● Where does the master copy of the data live?
  • 46. MapR Solution (diagram labels: Data Fabric, Global Namespace; Edge, Core, Cloud; Data Prep, Training + Testing, Deployment) ● One homogeneous environment to manage and secure ● Supports real-time processing with data protection, lineage, and versioning ● Runs directly on DGX servers to create a unified DGX cluster
  • 47. MapR AI + RAPIDS: Typical Training and Evaluation Workflow (diagram labels: Document DB, Events, Structured Data, Unstructured Data; Data Management, Applications; Data Preparation, Training Data Set, Model Training, Evaluate/Visualize, Inference, Production Deployment; RAPIDS: cuDF Analytics, cuML Machine Learning, cuGRAPH Graph Analytics, Apache Arrow, GPU Memory)
  • 48. Advantages of the MapR Data Fabric (slide sections: How Data is Stored, How Data is Accessed, How Data is Distributed) ● Linear Scalability ● Architected for performance, scale, and reliability ● Distributed metadata in the fabric ● Distributed location support ● Multi-master Replication ● Location awareness ● Capability to serve as a system of record ● Data security and governance within the fabric ● Mixed Data access from multiple protocols ● Distributed Multi-tenancy ● Global Namespace ● Integrated data streaming for AI
  • 49. Architecture Matters: On-premise or Cloud Infrastructure • Combines Distributed Compute, AI, HPC, and general purpose workloads • MapR provides complementary data logistics to better manage and deploy deep learning across the entire ecosystem • Enables deployment agility with data management extending from on-premise, to cross-cloud, to the edge
  • 50. MapR Advantages for AI: Simplified administration and security models ● One and done - no need for a different model in each location ● GDPR “compliant”! Scales linearly with customer needs ● No reason to create a bunch of separate clusters Sustainability - All data, files, database and event streaming ● Both at-rest and in-motion An enabling and flexible architecture ● Only way to bring distributed data and GPUs together ● Easy to meet customers’ needs ● Supports both Kubernetes and containers Low cost of entry and linear cost of scaling
  • 51. On-Premise, Cloud or Both: Same platform and architecture in all locations: ● On-premise works the same as the cloud ● Second cloud works the same as a first cloud ● Data mirroring between locations is built-in ● Real-time event management and lineage is built-in ○ Scale out streaming applications without rearchitecting them ● Kubernetes is a simple way to inject MapR storage and GPU support into a container ○ Leverage resources anywhere with Global Namespace ○ Application portability across all locations, no rework required
  • 52. Simplifying Model Development and Deployment: Complex data pipelines, large data volumes serving GPUs ● Mixed workloads - distributed data prep plus real-time Simultaneous data and model versioning ● Data at-rest and in-motion Model output lands in a stream ● Creates pluggable model flow Works across on-premise and cloud infrastructures, simultaneously
  • 53. The Key is Data Logistics: “90+% of Machine Learning Success is Data Logistics” https://mapr.com/ebook/machine-learning-logistics
  • 54. Need Help Solving Your Data Logistics Problems? ● Over 35 FREE on-demand training courses for AI and analytic development, data engineering and administration ● Certification tracks for developers, administrators, and data scientists ● Expanded support portal and knowledge base ● Containerized clusters available as a free download, plus solution templates and code examples for hands-on experience https://mapr.com/training/