Operationalizing Analytics To Scale
Scott Hoover
Abstract
Many companies have invested time and money into building sophisticated data pipelines that can move massive amounts of data in (near) real time. However, for the analyst or data scientist who builds models offline, integrating those analyses into these pipelines for operational purposes can be a challenge.
In this workshop, we will discuss key technologies and workflows companies can leverage to build end-to-end solutions for automating analytical, statistical, and machine learning work: from collection and storage to analysis and real-time predictions.
Agenda
● Introduction
● What Are We Talking About Exactly?
● The Problem at Hand
● Operationalizing Analytics
● Operationalizing Predictive Analytics
● Questions
Introduction
● I work on the Internal Data team at Looker.
● Before Looker, I worked in consulting and research.
● Looker is a business intelligence tool.
What are we talking about?
● What do I mean when I say “operationalizing”?
● Why is this important?
The Problem at Hand
● Analysts are providing basic reports for the entire
business.
● Analysts and Data Scientists are building offline models.
The Problem With Offline Models
● Offline analyses rarely allow for quick turnaround times.
● Offline analyses aren’t particularly collaborative.
● Offline analyses aren’t particularly portable.
A Potential Setup (Straw Man)
[Diagram: Data Sources → (http) → Data Stores → (query) → Analysis → Consumption]
Operationalizing Analytics - The Simple Case
● These metrics (the basic reports) are vanilla.
● These metrics are critical.
● The business would probably be better served if Data Scientists and Analysts spent their time answering questions that require deep technical knowledge.
Operationalizing Analytics - A How To
● Build or buy a workhorse ETL tool.
● Move toward an Operational Data Store (ODS), reducing
the need for postprocessing and data “mashups.”
● Emphasize self-service wherever possible.
● Analytics should slot into the existing infrastructure with minimal friction.
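As a minimal sketch of the "workhorse ETL into an ODS" idea, the Python below extracts raw rows from a transactional table, transforms them into a daily metric, and loads the result into an ODS table that a BI tool could query directly. It uses SQLite purely for illustration; the table names `raw_signups` and `ods_daily_signups` and the function `run_etl` are hypothetical, not part of any specific tool.

```python
import sqlite3
from collections import Counter


def run_etl(conn: sqlite3.Connection) -> None:
    """Rebuild the hypothetical ODS table ods_daily_signups from raw_signups."""
    # Extract: pull raw rows from the (hypothetical) transactional table.
    rows = conn.execute(
        "SELECT date(created_at), country FROM raw_signups"
    ).fetchall()

    # Transform: normalize country codes and count signups per (day, country).
    counts = Counter((day, country.strip().upper()) for day, country in rows)

    # Load: fully refresh the ODS table so reports query pre-aggregated data
    # instead of re-deriving ("mashing up") it from raw events every time.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ods_daily_signups "
        "(day TEXT, country TEXT, signups INTEGER, PRIMARY KEY (day, country))"
    )
    conn.execute("DELETE FROM ods_daily_signups")
    conn.executemany(
        "INSERT INTO ods_daily_signups (day, country, signups) VALUES (?, ?, ?)",
        [(day, country, n) for (day, country), n in sorted(counts.items())],
    )
    conn.commit()
```

A real ETL tool adds what this sketch omits: incremental loads, scheduling, retries, and monitoring.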
Operationalizing Predictive Analytics
Where to Begin
● Out-of-the-box tools.
● Build from scratch.
● A middle ground between these two extremes.
A Model Standard - PMML
● An XML-based model storage format.
● Created and maintained by the Data Mining Group.
● Most commonly used statistical/machine learning
models are supported.
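To make "XML-based model storage format" concrete, here is a minimal sketch of a PMML document for a one-variable linear regression, plus a tiny evaluator that reads it with Python's standard-library ElementTree. The model, its field names (`budget`, `score`), and its coefficients are hypothetical, and `evaluate_regression` handles only this narrow `RegressionModel` case with numeric predictors; real consumers such as JPMML implement the full specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.2 regression model: score = 0.5 + 0.1 * budget
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header description="Hypothetical lead score model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="budget" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="budget"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="budget" coefficient="0.1"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML elements live in a versioned XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def evaluate_regression(pmml_text: str, arguments: dict) -> float:
    """Evaluate the first RegressionTable of a PMML RegressionModel.

    Only NumericPredictor terms are supported: intercept + sum(coef * x**exponent).
    """
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    result = float(table.get("intercept", "0"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        name = predictor.get("name")
        coefficient = float(predictor.get("coefficient"))
        exponent = int(predictor.get("exponent", "1"))  # PMML default is 1
        result += coefficient * arguments[name] ** exponent
    return result
```

Because the model is plain XML, the same file can be produced by one tool (R, Python, and so on) and consumed by another without re-implementing the model.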
PMML Integrations
[Figure: PMML producers and consumers]
JPMML
● JPMML is an open-source API for evaluating PMML files.
● In essence, we load our PMML file into the JPMML application, send it new data, and it returns predictions.
● Openscoring.io distributes various JPMML APIs and UDFs, for example a RESTful API, Heroku, Hive, Pig, Cascading, and PostgreSQL.
● All we have to do is write some code that fetches new values, sends them to the JPMML API, captures the predictions, and pushes them back to a database.
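The fetch, score, and write-back loop described above can be sketched in Python. This is a minimal sketch, not the author's implementation: it assumes a hypothetical `leads` table with `id`, `country`, `budget`, and `score` columns (SQLite stands in for the real database), and it assumes the request and response shapes shown in this deck's Test Model example. `openscoring_score`, `score_unscored_leads`, and `MODEL_URL` are illustrative names.

```python
import json
import sqlite3
import urllib.request
from typing import Callable

# Hypothetical endpoint; substitute your own Openscoring deployment.
MODEL_URL = "http://ec2_endpoint/openscoring/model/BayesLeadScore"


def openscoring_score(request_id: str, arguments: dict) -> dict:
    """POST one record to the scoring API and return its "result" object."""
    body = json.dumps({"id": request_id, "arguments": arguments}).encode("utf-8")
    req = urllib.request.Request(
        MODEL_URL,
        data=body,
        method="POST",
        headers={"Content-type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]


def score_unscored_leads(
    conn: sqlite3.Connection,
    score_fn: Callable[[str, dict], dict] = openscoring_score,
) -> int:
    """Fetch unscored leads, score each one, write Probability_1 back.

    Returns the number of leads updated.
    """
    # Fetch: only leads that have not been scored yet.
    leads = conn.execute(
        "SELECT id, country, budget FROM leads WHERE score IS NULL"
    ).fetchall()
    for lead_id, country, budget in leads:
        # Serve: send the lead's features to the scoring API.
        result = score_fn(str(lead_id), {"country": country, "budget": budget})
        # Capture and push back: store the positive-class probability.
        conn.execute(
            "UPDATE leads SET score = ? WHERE id = ?",
            (result["Probability_1"], lead_id),
        )
    conn.commit()
    return len(leads)
```

Passing `score_fn` as a parameter keeps the database logic testable without a live scoring server.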
Example Architecture - Lead Scoring
[Diagram: lead-scoring flow between two APIs: GET leads, GET lead, UPDATE lead]
Deploy Model - PUT /model/${id}
Heroku: git push heroku master
REST: curl -X PUT --data-binary @BayesLeadScore.pmml -H "Content-type: text/xml" http://ec2_endpoint/openscoring/model/BayesLeadScore
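For teams that would rather script deployment than call curl by hand, the same PUT request can be built with Python's standard library. A minimal sketch, assuming the `ec2_endpoint` host and `BayesLeadScore` model id from the curl example (both placeholders) and a local `BayesLeadScore.pmml` file; `build_deploy_request` and `deploy_model` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path


def build_deploy_request(base_url: str, model_id: str, pmml_path: str) -> urllib.request.Request:
    """Build the equivalent of:
    curl -X PUT --data-binary @<file> -H "Content-type: text/xml" <base_url>/model/<model_id>
    """
    return urllib.request.Request(
        f"{base_url}/model/{model_id}",
        data=Path(pmml_path).read_bytes(),  # --data-binary: send the file unmodified
        method="PUT",
        headers={"Content-type": "text/xml"},
    )


def deploy_model(base_url: str, model_id: str, pmml_path: str) -> int:
    """Send the PUT request and return the HTTP status code."""
    req = build_deploy_request(base_url, model_id, pmml_path)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


# Example (requires a running Openscoring server):
# deploy_model("http://ec2_endpoint/openscoring", "BayesLeadScore", "BayesLeadScore.pmml")
```

Separating request construction from sending makes it easy to check the request without a live server.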
View Model - GET /model/${id}
Sending a GET request (with curl or a browser) to
http://heroku_endpoint/openscoring/model/BayesLeadScore
or
http://ec2_endpoint/openscoring/model/BayesLeadScore
will display our PMML model.
Test Model - POST /model/${id}
Send request to JPMML API:
curl -X POST \
  --data-binary @newLead.json \
  -H "Content-type: application/json" \
  http://ec2_endpoint/openscoring/model/BayesLeadScore
newLead.json
{
  "id" : "001",
  "arguments" : {
    "country" : "US",
    "budget" : 7.8
  }
}
Example Response
{
  "id" : "001",
  "result" : {
    "meeting" : "1",
    "Probability_0" : 0.33062906130485653,
    "Probability_1" : 0.6693709386951435
  }
}
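On the client side, the single-record request and response above map directly onto Python dictionaries. A minimal sketch, assuming the response carries the predicted label under the target field name (`meeting` in this example) plus one `Probability_<label>` entry per class, as in the example response; `build_request` and `parse_response` are hypothetical helper names.

```python
import json


def build_request(request_id: str, country: str, budget: float) -> str:
    """Serialize one lead in the request shape used by newLead.json."""
    return json.dumps(
        {"id": request_id, "arguments": {"country": country, "budget": budget}}
    )


def parse_response(body: str, target: str = "meeting") -> tuple:
    """Return (predicted_label, probability_of_that_label) from a scoring response."""
    result = json.loads(body)["result"]
    label = result[target]
    # The probability key is built from the predicted label, e.g. "Probability_1".
    return label, result[f"Probability_{label}"]
```

For the example response, `parse_response` yields the label `"1"` with probability `0.6693709386951435`.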
Batch Request - POST /model/${id}/batch
Send request to JPMML API:
curl -X POST \
  --data-binary @batchLeads.json \
  -H "Content-type: application/json" \
  http://ec2_endpoint/openscoring/model/BayesLeadScore/batch
batchLeads.json
{
  "id": "batch-1",
  "requests": [
    {
      "id": "001",
      "arguments": {
        "country": "US",
        "budget": 7.8
      }
    },
    {
      "id": "002",
      "arguments": {
        "country": "CA",
        "budget": 3.2
      }
    }
  ]
}
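Rather than hand-writing batchLeads.json, the batch payload can be generated from query results. A minimal sketch, assuming leads arrive as `(id, country, budget)` tuples and reusing the `{"id": ..., "requests": [...]}` shape shown above; `build_batch` is a hypothetical helper.

```python
import json


def build_batch(batch_id: str, leads) -> str:
    """Serialize (id, country, budget) tuples into the batch request shape."""
    requests = [
        {"id": lead_id, "arguments": {"country": country, "budget": budget}}
        for lead_id, country, budget in leads
    ]
    return json.dumps({"id": batch_id, "requests": requests}, indent=2)
```

Sending one batch instead of many single-record requests cuts per-request HTTP overhead when scoring many leads at once.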
Scale Considerations
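● Horizontal scaling: run more scoring-server instances behind a load balancer.
● Vertical scaling: give each scoring server more CPU and memory.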
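Horizontal scaling pairs naturally with batch scoring: split the work into fixed-size batches and spread them across several scoring-server replicas. A minimal sketch of that assignment step, assuming hypothetical replica URLs and simple round-robin placement (in practice a load balancer usually does this); `chunk` and `assign_round_robin` are illustrative names.

```python
from itertools import cycle


def chunk(items: list, size: int) -> list:
    """Split items into consecutive batches of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def assign_round_robin(batches: list, endpoints: list) -> list:
    """Pair each batch with an endpoint, cycling through the endpoints in order."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # zip stops when the batches run out, so cycle() never loops forever.
    return list(zip(cycle(endpoints), batches))
```

Adding a replica to `endpoints` spreads the same batches more thinly, which is the essence of scaling out rather than up.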
What About Truly Big Data?
● For the rare few of us who need to make real-time predictions against millions of rows per second, there's a popular suite of Apache tools to handle this.
[Figure: architecture diagram, borrowed from the Oryx project]
Closing Thoughts
[Diagram: Applications, Transactional DB / Event Storage, ODS, Analysis, Review / Versioning, Scoring Server, APIs, Business Intelligence, Consumers]
Questions
Learn more at looker.com/demo

Machine Learning and Analytics Breakout Session
 

More from Looker

Join 2017_Deep Dive_Table Calculations 201
Join 2017_Deep Dive_Table Calculations 201Join 2017_Deep Dive_Table Calculations 201
Join 2017_Deep Dive_Table Calculations 201Looker
 
Join 2017_Deep Dive_Table Calculations 101
Join 2017_Deep Dive_Table Calculations 101Join 2017_Deep Dive_Table Calculations 101
Join 2017_Deep Dive_Table Calculations 101Looker
 
Join 2017_Deep Dive_Smart Caching
Join 2017_Deep Dive_Smart CachingJoin 2017_Deep Dive_Smart Caching
Join 2017_Deep Dive_Smart CachingLooker
 
Join 2017_Deep Dive_Sessionization
Join 2017_Deep Dive_SessionizationJoin 2017_Deep Dive_Sessionization
Join 2017_Deep Dive_SessionizationLooker
 
Join 2017_Deep Dive_Redshift Optimization
Join 2017_Deep Dive_Redshift OptimizationJoin 2017_Deep Dive_Redshift Optimization
Join 2017_Deep Dive_Redshift OptimizationLooker
 
Join 2017_Deep Dive_Integrating Looker with R and Python
Join 2017_Deep Dive_Integrating Looker with R and PythonJoin 2017_Deep Dive_Integrating Looker with R and Python
Join 2017_Deep Dive_Integrating Looker with R and PythonLooker
 
Join 2017_Deep Dive_Customer Retention
Join 2017_Deep Dive_Customer Retention Join 2017_Deep Dive_Customer Retention
Join 2017_Deep Dive_Customer Retention Looker
 
Join 2017_Deep Dive_Workflows with Zapier
Join 2017_Deep Dive_Workflows with ZapierJoin 2017_Deep Dive_Workflows with Zapier
Join 2017_Deep Dive_Workflows with ZapierLooker
 
Join2017_Deep Dive_AWS Operations
Join2017_Deep Dive_AWS OperationsJoin2017_Deep Dive_AWS Operations
Join2017_Deep Dive_AWS OperationsLooker
 
Join 2017 - Deep Dive - Action Hub
Join 2017 - Deep Dive - Action HubJoin 2017 - Deep Dive - Action Hub
Join 2017 - Deep Dive - Action HubLooker
 
Winning the 3rd Wave of BI
Winning the 3rd Wave of BIWinning the 3rd Wave of BI
Winning the 3rd Wave of BILooker
 
Wisdom of Crowds Webinar Deck
Wisdom of Crowds Webinar DeckWisdom of Crowds Webinar Deck
Wisdom of Crowds Webinar DeckLooker
 
Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016Looker
 
Meet Looker 4
Meet Looker 4Meet Looker 4
Meet Looker 4Looker
 
Data Stack Considerations: Build vs. Buy at Tout
Data Stack Considerations: Build vs. Buy at ToutData Stack Considerations: Build vs. Buy at Tout
Data Stack Considerations: Build vs. Buy at ToutLooker
 
Embedding Data & Analytics With Looker
Embedding Data & Analytics With LookerEmbedding Data & Analytics With Looker
Embedding Data & Analytics With LookerLooker
 
The Three Pillars of Customer Success Analytics
The Three Pillars of Customer Success AnalyticsThe Three Pillars of Customer Success Analytics
The Three Pillars of Customer Success AnalyticsLooker
 
The Power of Smart Counting at The RealReal
The Power of Smart Counting at The RealRealThe Power of Smart Counting at The RealReal
The Power of Smart Counting at The RealRealLooker
 
Data Democracy: Hadoop + Redshift
Data Democracy: Hadoop + RedshiftData Democracy: Hadoop + Redshift
Data Democracy: Hadoop + RedshiftLooker
 
Creating a Single Source of Truth: Leverage all of your data with powerful an...
Creating a Single Source of Truth: Leverage all of your data with powerful an...Creating a Single Source of Truth: Leverage all of your data with powerful an...
Creating a Single Source of Truth: Leverage all of your data with powerful an...Looker
 

More from Looker (20)

Join 2017_Deep Dive_Table Calculations 201
Join 2017_Deep Dive_Table Calculations 201Join 2017_Deep Dive_Table Calculations 201
Join 2017_Deep Dive_Table Calculations 201
 
Join 2017_Deep Dive_Table Calculations 101
Join 2017_Deep Dive_Table Calculations 101Join 2017_Deep Dive_Table Calculations 101
Join 2017_Deep Dive_Table Calculations 101
 
Join 2017_Deep Dive_Smart Caching
Join 2017_Deep Dive_Smart CachingJoin 2017_Deep Dive_Smart Caching
Join 2017_Deep Dive_Smart Caching
 
Join 2017_Deep Dive_Sessionization
Join 2017_Deep Dive_SessionizationJoin 2017_Deep Dive_Sessionization
Join 2017_Deep Dive_Sessionization
 
Join 2017_Deep Dive_Redshift Optimization
Join 2017_Deep Dive_Redshift OptimizationJoin 2017_Deep Dive_Redshift Optimization
Join 2017_Deep Dive_Redshift Optimization
 
Join 2017_Deep Dive_Integrating Looker with R and Python
Join 2017_Deep Dive_Integrating Looker with R and PythonJoin 2017_Deep Dive_Integrating Looker with R and Python
Join 2017_Deep Dive_Integrating Looker with R and Python
 
Join 2017_Deep Dive_Customer Retention
Join 2017_Deep Dive_Customer Retention Join 2017_Deep Dive_Customer Retention
Join 2017_Deep Dive_Customer Retention
 
Join 2017_Deep Dive_Workflows with Zapier
Join 2017_Deep Dive_Workflows with ZapierJoin 2017_Deep Dive_Workflows with Zapier
Join 2017_Deep Dive_Workflows with Zapier
 
Join2017_Deep Dive_AWS Operations
Join2017_Deep Dive_AWS OperationsJoin2017_Deep Dive_AWS Operations
Join2017_Deep Dive_AWS Operations
 
Join 2017 - Deep Dive - Action Hub
Join 2017 - Deep Dive - Action HubJoin 2017 - Deep Dive - Action Hub
Join 2017 - Deep Dive - Action Hub
 
Winning the 3rd Wave of BI
Winning the 3rd Wave of BIWinning the 3rd Wave of BI
Winning the 3rd Wave of BI
 
Wisdom of Crowds Webinar Deck
Wisdom of Crowds Webinar DeckWisdom of Crowds Webinar Deck
Wisdom of Crowds Webinar Deck
 
Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016
 
Meet Looker 4
Meet Looker 4Meet Looker 4
Meet Looker 4
 
Data Stack Considerations: Build vs. Buy at Tout
Data Stack Considerations: Build vs. Buy at ToutData Stack Considerations: Build vs. Buy at Tout
Data Stack Considerations: Build vs. Buy at Tout
 
Embedding Data & Analytics With Looker
Embedding Data & Analytics With LookerEmbedding Data & Analytics With Looker
Embedding Data & Analytics With Looker
 
The Three Pillars of Customer Success Analytics
The Three Pillars of Customer Success AnalyticsThe Three Pillars of Customer Success Analytics
The Three Pillars of Customer Success Analytics
 
The Power of Smart Counting at The RealReal
The Power of Smart Counting at The RealRealThe Power of Smart Counting at The RealReal
The Power of Smart Counting at The RealReal
 
Data Democracy: Hadoop + Redshift
Data Democracy: Hadoop + RedshiftData Democracy: Hadoop + Redshift
Data Democracy: Hadoop + Redshift
 
Creating a Single Source of Truth: Leverage all of your data with powerful an...
Creating a Single Source of Truth: Leverage all of your data with powerful an...Creating a Single Source of Truth: Leverage all of your data with powerful an...
Creating a Single Source of Truth: Leverage all of your data with powerful an...
 

Recently uploaded

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 

Recently uploaded (20)

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 

Operationalizing analytics to scale

  • 2. Operationalizing Analytics To Scale Many companies have invested time and money into building sophisticated data pipelines that can move massive amounts of data in (near) real time. However, for the analyst or data scientist who builds models offline, integrating their analyses into these pipelines for operational purposes can pose a challenge. In this workshop, we will discuss some key technologies and workflows companies can leverage to build end-to-end solutions for automating analytical, statistical and machine learning solutions: from collection and storage to analysis and real-time predictions. Abstract
  • 9. Agenda ● Introduction ● What Are We Talking About Exactly? ● The Problem at Hand ● Operationalizing Analytics ● Operationalizing Predictive Analytics ● Questions
  • 12. Introduction ● I work on the Internal Data team at Looker. ● Before Looker, I worked in consulting and research. ● Looker is a business intelligence tool.
  • 14. What are we talking about? ● What do I mean when I say “operationalizing”? ● Why is this important?
  • 16. The Problem at Hand ● Analysts are providing basic reports for the entire business. ● Analysts and data scientists are building offline models.
  • 19. The Problem With Offline Models ● Offline analyses aren’t associated with particularly quick turnaround times. ● Offline analyses aren’t particularly collaborative. ● Offline analyses aren’t particularly portable.
  • 20. A Potential Set-up (Straw Man) [Diagram: Data Sources → (http) → Data Stores → (query) → Analysis → Consumption]
  • 24. Operationalizing Analytics - The Simple Case ● These metrics are vanilla. ● These metrics are critical. ● The business would probably be better served if data scientists and analysts spent their time answering questions that require deep technical knowledge.
  • 28. Operationalizing Analytics - A How To ● Build or buy a workhorse ETL tool. ● Move toward an Operational Data Store (ODS), reducing the need for postprocessing and data “mashups.” ● Emphasize self-service wherever possible. ● Analytics should slot into the existing infrastructure with minimal friction.
  • 32. Where to Begin ● Out-of-the-box tools. ● Build from scratch. ● A mean between the extremes.
  • 35. A Model Standard - PMML ● PMML (Predictive Model Markup Language) is an XML-based model-storage format. ● It is created and maintained by the Data Mining Group. ● Most commonly used statistical and machine-learning models are supported.
  • 40. JPMML ● JPMML is an open-source API for evaluating PMML files. ● In essence, we equip the JPMML application with our PMML file, serve it new data, and it returns predictions. ● Openscoring.io distributes various JPMML APIs and UDFs, for example: RESTful API, Heroku, Hive, Pig, Cascading and PostgreSQL. ● All we have to do is write some code that fetches new values, serves them up to the JPMML API, captures the predictions, then pushes them back to a database.
  • 41. Example Architecture - Lead Scoring API [Diagram: the scoring API, with calls labeled GET leads, GET lead and UPDATE lead]
  • 45. Deploy Model - PUT /model/${id} ● Heroku: git push heroku master ● REST: curl -X PUT --data-binary @BayesLeadScore.pmml -H "Content-type: text/xml" http://ec2_endpoint/openscoring/model/BayesLeadScore
  • 46. View Model - GET /model/${id} ● Running curl against, or browsing to, http://heroku_endpoint/openscoring/model/BayesLeadScore or http://ec2_endpoint/openscoring/model/BayesLeadScore displays our PMML model.
  • 47. Test Model - POST /model/${id} ● Send request to JPMML API: curl -X POST --data-binary @newLead.json -H "Content-type: application/json" http://ec2_endpoint/openscoring/model/BayesLeadScore ● newLead.json: { "id" : "001", "arguments" : { "country" : "US", "budget" : 7.8 } }
  • 48. Example Response { "id" : "001", "result" : { "meeting" : "1", "Probability_0" : 0.33062906130485653, "Probability_1" : 0.6693709386951435 } }
  • 49. Batch Request - POST /model/${id}/batch ● Send request to JPMML API: curl -X POST --data-binary @batchLeads.json -H "Content-type: application/json" http://ec2_endpoint/openscoring/model/BayesLeadScore/batch ● batchLeads.json: { "id":"batch-1", "requests":[ { "id":"001", "arguments":{ "country":"US", "budget":7.8 } }, { "id":"002", "arguments":{ "country":"CA", "budget":3.2 } } ] }
  • 52. Scale Considerations ● Horizontal scaling. ● Vertical scaling.
  • 53. What About Truly Big Data? ● For the rare few of us who need to make real-time predictions against millions of rows per second, there’s a popular Apache suite to handle this. *Image borrowed from OryxProject
  • 54. [Architecture diagram with components: Applications; ODS; Analysis; APIs; Transactional DB / Event Storage; Business Intelligence; Scoring Server; Consumers; Review / Versioning]
  • 57. Learn more at looker.com/demo
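Slides 45 through 49 drive the Openscoring REST API with curl. For teams that would rather script these calls, here is a minimal Python sketch using only the standard library. It assumes the slides' placeholder host `ec2_endpoint`, the `BayesLeadScore` model id, and that response field names match slide 48 (in practice they depend on the outputs declared in the PMML file). The helper names are hypothetical, not part of any Openscoring client library.

```python
import json
import urllib.request
from typing import Dict, Sequence, Tuple

# Placeholder host taken from the slides; point this at a real Openscoring deployment.
BASE_URL = "http://ec2_endpoint/openscoring"


def _request(url: str, body: bytes, content_type: str, method: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP request."""
    return urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method=method)


def deploy_request(model_id: str, pmml: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """PUT /model/${id}: upload a PMML document (slide 45)."""
    return _request(f"{base_url}/model/{model_id}", pmml, "text/xml", "PUT")


def evaluate_request(model_id: str, request_id: str, arguments: Dict[str, object],
                     base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /model/${id}: score a single record (slide 47)."""
    body = json.dumps({"id": request_id, "arguments": arguments}).encode("utf-8")
    return _request(f"{base_url}/model/{model_id}", body, "application/json", "POST")


def batch_request(model_id: str, batch_id: str, rows: Sequence[Dict[str, object]],
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /model/${id}/batch: score many records in one call (slide 49).

    Each row must carry an "id" key; the remaining keys become the model arguments.
    """
    requests = [{"id": r["id"], "arguments": {k: v for k, v in r.items() if k != "id"}} for r in rows]
    body = json.dumps({"id": batch_id, "requests": requests}).encode("utf-8")
    return _request(f"{base_url}/model/{model_id}/batch", body, "application/json", "POST")


def parse_result(raw: str, target: str = "meeting") -> Tuple[str, str, float]:
    """Return (request id, predicted label, probability of that label) from a
    response shaped like slide 48. Field names are assumed, not guaranteed."""
    payload = json.loads(raw)
    result = payload["result"]
    label = str(result[target])
    return payload["id"], label, float(result[f"Probability_{label}"])


def send(req: urllib.request.Request) -> str:
    """Perform the HTTP call; this requires a reachable scoring server."""
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

For example, `evaluate_request("BayesLeadScore", "001", {"country": "US", "budget": 7.8})` builds the same POST as slide 47, and `send` performs it. Building requests separately from sending them keeps the payload logic testable without a live server.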
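Slide 40 says that all that remains is glue code that fetches new values, scores them, and pushes predictions back to a database. Here is a minimal sketch of that loop. It assumes a hypothetical `leads` table with `id`, `country`, `budget` and `score` columns, with SQLite standing in for the production database. It also assumes a `score_fn` callable that, in practice, would wrap the HTTP call to the scoring server. Injecting the scorer lets the loop run without one.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer: takes the model's input arguments, returns a probability.
Scorer = Callable[[Dict[str, object]], float]


def fetch_unscored(conn: sqlite3.Connection) -> List[Tuple[str, str, float]]:
    """Fetch leads that have not been scored yet."""
    return conn.execute("SELECT id, country, budget FROM leads WHERE score IS NULL").fetchall()


def score_and_push(conn: sqlite3.Connection, score_fn: Scorer) -> int:
    """Score every unscored lead, write predictions back, and return how many were scored."""
    rows = fetch_unscored(conn)
    updates = [(score_fn({"country": country, "budget": budget}), lead_id)
               for lead_id, country, budget in rows]
    conn.executemany("UPDATE leads SET score = ? WHERE id = ?", updates)
    conn.commit()
    return len(updates)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE leads (id TEXT PRIMARY KEY, country TEXT, budget REAL, score REAL)")
    conn.executemany("INSERT INTO leads (id, country, budget) VALUES (?, ?, ?)",
                     [("001", "US", 7.8), ("002", "CA", 3.2)])
    # Toy stand-in scorer: a made-up rule, not a trained model.
    print(score_and_push(conn, lambda args: 0.9 if args["budget"] > 5 else 0.2))
```

Because only rows with a NULL `score` are fetched, rerunning the job on a schedule scores just the new arrivals.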

Editor's Notes

  1. What I mean is to automate, as much as possible, the creation, dissemination and application of analyses so that they can be used in high-volume or fast-paced tactical decision-making. This carries with it a hard requirement that data pipelines and workflows scale.
  2. This seems obvious. Many people require data to inform their choices. Reducing friction in the production and dissemination of analyses benefits both consumers and producers of analytics: consumers can respond to changes more quickly, and producers free up time to do more or to focus on depth of analysis.
  3. Their time is better spent doing more in-depth analyses. They are an information bottleneck.
  4. The process by which people create predictive analyses is not as efficient as it could be. Typically, when new batches of data come in, analysts retrieve these data, rebuild their model and make new predictions, then they disseminate this information somehow. These people need to start thinking like engineers.
  5. There’s not much to do about this. However, if we do some work up front, we might be able to automate more of this process.
  6. While analysts can share R or Python scripts, it’s not immediately obvious from reading their code what is going on. To collaborate, one must first reproduce a colleague’s analysis before contributing to it.
  7. That is, they are not easily ported from R to Python to Matlab, etc.
  8. A fairly standard analytics pipeline that companies may have looks like this: data from mobile and web applications, APIs, and public data sources is collected; data is stored in relational and/or nonrelational data stores; data is queried, transformed, and analyzed; and decision-makers consume the data and analyses, often as a report or dashboard, which then feeds back into the pipeline as product changes, etc. There are potential efficiency gains in getting data out of a store and into analysts’ hands, in presenting reports and analyses to decision-makers, and in feeding results back into product development, engineering, sales, marketing, etc. Predictive analyses remain strictly offline.
  9. This class of metric is not particularly sexy. However, these are the metrics people need for their day-to-day operations: “Are we on track to meet our sales targets?” “How many users saw a particular marketing campaign yesterday?” “Is supply low in a certain region?”
  10. For most businesses, these metrics address the majority of questions people need answered—they are the metrics that keep the business humming.
  11. Typically, we encounter situations where a few developers and analysts supply an entire organization with data and analytics. This tends to create a bottleneck where one doesn't need to exist. [stated this problem earlier. no need to dwell on it too much.]
  12. On the buying front, there is a litany of choices, and a lot depends on which data sources and destinations are at play. Some solutions we commonly encounter are Fivetran, Alooma, bigsynx, Informatica Cloud, and DataVirtuality. For those who prefer to build rather than buy, we’re talking about custom jobs written in a scripting language (shell, Ruby, Python) with some sort of dependency/workflow management tool, like Luigi or Airflow. As companies grow, scaling ETL processes may pose a few problems if they’ve built their own tool. Admittedly, moving unstructured data is relatively easy: even as the size and complexity of data increase, transformations don’t really come into play, and we can just stuff more key-value pairs into our JSON objects and stash them in a NoSQL database. For companies using relational data stores, however, ever-changing schemas will undoubtedly make the ETL process more difficult. There are clear tradeoffs here: an unstructured data store trades lower accessibility for simplicity in the data movement phase. Conversely, relational databases are rigid and require work to get large amounts of complex data into the correct format. Typically, however, SQL is the query language most analysts are familiar with, so accessibility becomes the upside.
  13. This will reduce the need for continual data pulls and postprocessing in some desktop tool, both of which are slow and labor-intensive. By storing everything in an ODS, disparate data can be joined in the database. This is likely faster than the alternative, and it’s also more conducive to automation. Redshift is a favorite at Looker: it scales quickly and cost-effectively relative to other MPP RDBMS offerings. Additionally, we’ve seen great promise with both Spark and Impala. Spark has a leg up on many of its competitors: it’s new; it does most of its heavy lifting in memory rather than reading from and writing to disk; it scales very well; it’s feature-rich, with a built-in machine-learning library; and it has easy-to-use SQL, Scala and Python interfaces.
  14. Most reporting and simpler analyses can be automated to a degree. Done correctly, they can be accessed by business teams and even tinkered with in a self-service manner (segmenting and filtering, so that the analyst doesn’t have to respond to requests for what is effectively the same report with a minor tweak). All that’s left is to teach a man to fish (admittedly, in my experience, this is the most difficult component).
  15. When set up correctly, a business intelligence or querying tool can scale quite well to support a large organization and automate much of the day-to-day operational analysis. Ideally, such a tool would slot into the existing data infrastructure, exploiting an ODS or connecting to multiple data sources without subsequently moving the data again for further processing.
  16. Imagine, now, that we’re in a world where most analyses are largely self-service or automated, and analysts and data scientists are instead focusing on predictive analyses. How do we take these seemingly inherently offline processes and integrate them into existing data pipelines and applications?
  17. There are more and more tools that automate statistical and machine-learning processes, some more black-box than others; BigML is perhaps the most popular. Additionally, there are PredictionIO, AlchemyAPI, indico, RapidMiner, Yhat, Azure ML, Amazon Machine Learning, etc. Some of these tools integrate with existing data-science workflows better than others. Let’s suppose, however, that we have an aversion to yet another canned analysis tool and prefer a more customizable solution.
  18. Building an operational machine-learning platform from the ground up is no trivial task; it likely requires significant resources from both engineering and analytics. A reasonable starting place would be to write some Python that trains and tests models and handles model selection. This is doable with some great existing libraries, such as NumPy and scikit-learn. The daunting task is writing an API that can fetch or accept new data from various sources, score that data using the previously trained model, and finally pipe predictions back into a database for consumption or into an external application. For persistence, scikit-learn models can be saved with Python’s pickle module, which gets us close.
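The train, test, select, and persist loop can be sketched with the standard library alone. To keep it self-contained, `MeanModel` and `LinearModel` are hypothetical stand-ins for scikit-learn estimators and the data is made up; scikit-learn models can be pickled the same way.

```python
import pickle
from statistics import mean

class MeanModel:
    """Baseline candidate: always predicts the training mean."""
    def fit(self, xs, ys):
        self.mu = mean(ys)
        return self
    def predict(self, xs):
        return [self.mu for _ in xs]

class LinearModel:
    """Ordinary least squares with one feature: y = a + b * x."""
    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.b = sxy / sxx
        self.a = my - self.b * mx
        return self
    def predict(self, xs):
        return [self.a + self.b * x for x in xs]

def mse(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

# Hypothetical data, split into training and validation sets.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2]
x_tr, y_tr, x_va, y_va = xs[:6], ys[:6], xs[6:], ys[6:]

# Model selection: keep the candidate with the lowest validation error.
candidates = [MeanModel().fit(x_tr, y_tr), LinearModel().fit(x_tr, y_tr)]
best = min(candidates, key=lambda m: mse(y_va, m.predict(x_va)))

# Persist the chosen model so later batches are scored without retraining.
blob = pickle.dumps(best)
restored = pickle.loads(blob)
print(type(restored).__name__, restored.predict([9, 10]))
```

Pickles are Python-specific and should never be loaded from untrusted sources, which is part of why a language-neutral model format is attractive.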
  19. I’d assert that there’s a suite of tools that strikes a mean between these two extremes. Such a tool probably provides a few base features: 1. It standardizes machine-learning models irrespective of the language in which they were written, making them portable and easier to collaborate on. 2. Models are serialized (marshalled, persisted), so they do not need to be retrained for subsequent prediction batches. 3. It provides a basic API for ingesting and applying those serialized models. Everything else (which models are used, how models are chosen, which and how many data sources flow into the API, and how predictions are handled) is left to the user.
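The three base features can be sketched in a few lines. This is a minimal sketch under loud assumptions: the JSON model document and its keys (`model_type`, `target`, `intercept`, `coefficients`) are a hypothetical stand-in for a real standard, and `ScoringAPI` is an illustrative in-process class rather than a network service.

```python
import json

# Hypothetical language-neutral model document (feature 1): any tool could emit it,
# and it is already serialized (feature 2), so no retraining is needed to score.
MODEL_DOC = json.dumps({
    "model_type": "linear_regression",
    "target": "y",
    "intercept": 1.0,
    "coefficients": {"x1": 2.0, "x2": -0.5},
})

class ScoringAPI:
    """Minimal scoring service (feature 3): deploy a model once, score many batches."""
    def __init__(self):
        self._models = {}

    def deploy(self, model_id, document):
        spec = json.loads(document)
        if spec["model_type"] != "linear_regression":
            raise ValueError(f"unsupported model type: {spec['model_type']}")
        self._models[model_id] = spec

    def score(self, model_id, records):
        spec = self._models[model_id]
        coefs = spec["coefficients"]
        return [
            {spec["target"]: spec["intercept"] + sum(w * rec[name] for name, w in coefs.items())}
            for rec in records
        ]

api = ScoringAPI()
api.deploy("demo", MODEL_DOC)
print(api.score("demo", [{"x1": 1.0, "x2": 2.0}, {"x1": 0.0, "x2": 0.0}]))
# [{'y': 2.0}, {'y': 1.0}]
```

Everything the note leaves to the user (choosing models, feeding data sources, routing predictions) stays outside the class.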
  20. Everything one must know in order to describe and translate a model is captured in a well-structured format: a data dictionary, how to handle missing values, model coefficients, conditional probabilities, etc.
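To make that structure concrete, here is a hand-written PMML 4.2 linear regression, parsed with Python’s `xml.etree` and scored by a deliberately tiny evaluator. The document, field names, and coefficients are made up, and the evaluator handles only a `RegressionModel` with `NumericPredictor` elements; real scoring should go through a full evaluator such as JPMML.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML: data dictionary, missing-value rules, coefficients.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header description="toy linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1" missingValueReplacement="0"/>
      <MiningField name="x2" missingValueReplacement="4"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_2"}

def score(pmml_text, record):
    root = ET.fromstring(pmml_text)
    model = root.find("p:RegressionModel", NS)
    # Missing-value handling is read from the model document, not hard-coded.
    replacements = {
        f.get("name"): float(f.get("missingValueReplacement"))
        for f in model.findall("p:MiningSchema/p:MiningField", NS)
        if f.get("missingValueReplacement") is not None
    }
    table = model.find("p:RegressionTable", NS)
    y = float(table.get("intercept"))
    for pred in table.findall("p:NumericPredictor", NS):
        name = pred.get("name")
        value = record.get(name)
        if value is None:
            value = replacements[name]
        exponent = int(pred.get("exponent", "1"))  # PMML default exponent is 1
        y += float(pred.get("coefficient")) * value ** exponent
    return y

print(score(PMML_DOC, {"x1": 1.0, "x2": 2.0}))  # 2.0
print(score(PMML_DOC, {"x1": 1.0}))             # x2 missing, replaced by 4: 1.0
```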
  21. It’s actively maintained and updated by a community composed of academics and industry professionals.
  22. It supports a wide range of models, e.g., regression, SVMs, association rules, naive Bayes, clustering, decision trees, random forests, neural networks, and ensembles.
  23. A large number of pre-existing tools produce and/or consume PMML files. This means adopting PMML as a model standard would likely not disrupt the analytics workflow.
  24. JPMML is the best mean between these extremes that I’ve seen to date.
  25. A basic example in R.
  26. A basic example in Spark’s MLlib.
  27. What the PMML looks like.
  28. All that remains is to push the predicted value into a database, or back into its source API, so that it makes its way to our ODS and, ultimately, to end users.
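Writing scores back can be as simple as an upsert into a predictions table. In this sketch SQLite stands in for the ODS, and the `predictions` table and `write_predictions` helper are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory SQLite stands in for the ODS; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE predictions (
    record_id TEXT PRIMARY KEY,
    model_id TEXT NOT NULL,
    predicted_value REAL NOT NULL,
    scored_at TEXT NOT NULL
)""")

def write_predictions(conn, model_id, scored):
    """Upsert (record_id, prediction) pairs; re-scoring a record overwrites its old value."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO predictions VALUES (?, ?, ?, ?)",
            [(rid, model_id, value, now) for rid, value in scored],
        )

write_predictions(conn, "demo", [("r1", 2.0), ("r2", 1.0)])
write_predictions(conn, "demo", [("r2", 1.5)])  # re-scored record replaces the old row
print(conn.execute(
    "SELECT record_id, predicted_value FROM predictions ORDER BY record_id").fetchall())
# [('r1', 2.0), ('r2', 1.5)]
```

Recording the model ID and scoring time alongside each prediction lets end users, and the BI layer, see which model produced a number and when.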
  29. While the JPMML API doesn’t scale horizontally on its own, it’s feasible to set up a parallel environment and route incoming data accordingly.
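One way to route incoming data is to partition records by a stable hash of their key, giving each scoring instance its own partition. In this sketch, `ScoringInstance` is a hypothetical stand-in for an independent JPMML-backed server, and threads stand in for separate machines only to keep the example self-contained.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

class ScoringInstance:
    """Hypothetical stand-in for one independent scoring server holding a model copy."""
    def __init__(self, name, intercept=1.0, slope=2.0):
        self.name = name
        self.intercept, self.slope = intercept, slope

    def score_batch(self, records):
        return [(r["id"], self.name, self.intercept + self.slope * r["x"]) for r in records]

def route(records, n_instances):
    """Partition records by a stable hash of their id, one partition per instance."""
    partitions = [[] for _ in range(n_instances)]
    for r in records:
        partitions[zlib.crc32(r["id"].encode()) % n_instances].append(r)
    return partitions

instances = [ScoringInstance(f"scorer-{i}") for i in range(3)]
records = [{"id": f"rec-{i}", "x": float(i)} for i in range(10)]

# Each partition is scored concurrently by its own instance, then results are merged.
with ThreadPoolExecutor(max_workers=len(instances)) as pool:
    futures = [pool.submit(inst.score_batch, part)
               for inst, part in zip(instances, route(records, len(instances)))]
    results = sorted((row for f in futures for row in f.result()),
                     key=lambda row: int(row[0].split("-")[1]))

print(len(results), results[0])
```

Hashing with `zlib.crc32` rather than Python’s built-in `hash()` keeps routing stable across processes, since string hashing is randomized per process. If the instances are stateless, simple round-robin works just as well.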
  30. On moderate hardware, JPMML can score thousands of records per second. For the vast majority of use cases, the simplest way to scale is to throw more powerful hardware at the problem. For batch jobs of tens of millions of records or more, an Openscoring solution for Hive, Pig, or Cascading would likely be a better choice.
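A back-of-envelope estimate shows why batch size drives that choice. Both numbers are assumptions picked to match the note’s wording (“thousands of records per second”, “tens of millions of records”), not benchmarks:

```latex
t = \frac{N}{r}
  = \frac{5 \times 10^{7}\ \text{records}}{5 \times 10^{3}\ \text{records/s}}
  = 10^{4}\ \text{s} \approx 2.8\ \text{h}
```

Doubling throughput with better hardware only halves that time, so for batches this large a distributed batch evaluator is the more natural fit.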
  31. In the rare case that we work at a company that needs to make predictions on millions of incoming records per second, we may find that other tools are better suited to our needs. Nathan Marz proposed a framework for such a task, known as “the lambda architecture.” It involves a number of components: 1. input distribution, which sends real-time or microbatch events to both the speed and batch layers; 2. stream processing, to transform or make predictions against incoming data; 3. batch processing, to transform or train against historical data; 4. a serving layer, which reconciles speed- and batch-layer results and serves ad hoc queries. A popular setup relies on ZooKeeper for cluster management, Kafka or Flume for event/message handling, and Spark Streaming or Storm for real-time analysis. Cloudera has bundled ZooKeeper, Kafka, and Spark Streaming into a single framework called Oryx. Spark Streaming can also use MLlib for microbatch machine-learning tasks. Storm, too, has a comparable setup relying on Trident-ML, which abstracts much of Storm’s low-level programming into declarative, Pig Latin-like statements with machine-learning capabilities.
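The division of labor between the layers can be shown with a toy, in-memory pipeline that counts events per key. Everything here is an illustrative assumption: Python lists and `Counter`s stand in for Kafka, the batch engine, and the serving store, and counting stands in for transformations or predictions.

```python
from collections import Counter

class LambdaPipeline:
    """Toy lambda architecture: a batch view and a speed view, merged at query time."""
    def __init__(self):
        self.master_log = []          # immutable, append-only history of all events
        self.batch_view = Counter()   # recomputed from the full log by the batch layer
        self.speed_view = Counter()   # incremental counts for events since the last batch run

    def ingest(self, event):
        # Input distribution: each event reaches both the batch and speed layers.
        self.master_log.append(event)
        self.speed_view[event["key"]] += 1

    def run_batch(self):
        # Batch layer: recompute over all history, then clear the speed layer,
        # whose events are now covered by the batch view.
        self.batch_view = Counter(e["key"] for e in self.master_log)
        self.speed_view = Counter()

    def query(self, key):
        # Serving layer: reconcile the complete-but-stale batch view with recent events.
        return self.batch_view[key] + self.speed_view[key]

p = LambdaPipeline()
for k in ["a", "b", "a"]:
    p.ingest({"key": k})
p.run_batch()
p.ingest({"key": "a"})
print(p.query("a"), p.query("b"))  # 3 1
```

Clearing the speed view after a batch run assumes the run is instantaneous; real systems must track which events each batch view already covers so that nothing is double-counted or dropped.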
  32. Our straw-man setup, re-architected.
  33. [vamp]