Data Science as a Service
Dr. Pouria Amirian (Pouria.Amirian@ndm.ox.ac.uk)
Big Data Project Coordinator, The Global Health Network, University of Oxford
Intersection Of Cloud Computing And Data Science
Outline
 Data Science
 What data science is
 Steps in a Data Science project
 Experiments
 Using AzureML
 Big Data issues
 In a data science project
 Methods in analysis
What is Data Science?
 Practice of obtaining useful insights from data
 3 Vs of Big Data:
 Volume
 Variety
 Velocity
 + other Vs
 It applies to large volume data (volume)
 It applies to semi-structured and unstructured data (variety)
 It sometimes applies to real-time or fast-changing data (velocity)
 It also applies to small, traditional static data
Data Science as a team sport
Math
Statistical Learning
Linguistics
Machine Learning
Signal Processing
Programming
Storage/Data Structure
Operations Research
Distributed and High Performance Computing
Data Science from analytics point of view
 Analytics Spectrum:
Descriptive: What happened?
Diagnostic: Why did it happen?
Predictive: What will happen?
Prescriptive: What should I do?
Data Science Vs. Business Intelligence
 Analytics Spectrum:
Descriptive: What happened?
Diagnostic: Why did it happen?
Predictive: What will happen?
Prescriptive: What should I do?
Traditional BI
Why is it so popular? Why does it matter?
 A) More Available and Usable Data
 McKinsey: Organizations that use data science to make decisions are more productive and deliver higher ROI
 Gartner: Organizations that invest in modern data infrastructure will outperform their peers by up to 20%
Why is it so popular? Why does it matter?
 B) Increased Awareness of Machine Learning Techniques
 A subset of machine learning algorithms is now more widely understood, since they have been tried and tested by early adopters such as Netflix and Amazon (recommendation engines).
 While many people may not know the details of the algorithms used, they increasingly understand their research/business value.
Why is it so popular? Why does it matter?
 C) More Accurate Analysis
 The large volumes of data being collected also enable you to build more accurate predictive models.
 The larger the sample size, the smaller the margin of error. This in turn increases the accuracy of predictions from your model.
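The link between sample size and margin of error can be illustrated with a short calculation. This is a minimal sketch that assumes a simple random sample and the usual normal approximation for a proportion; neither assumption is stated on the slide, and the 95% z-value of 1.96 and the proportion 0.5 are illustrative choices.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p from n observations."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes only; not taken from the slides.
for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:>7}: +/- {margin_of_error(0.5, n):.4f}")
```

Under these assumptions the margin shrinks with the square root of the sample size, so 100 times more data narrows it by roughly a factor of 10. More data reduces sampling error only; it does not correct bias in how the data was collected.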
Why is it so popular? Why does it matter?
 D) Faster and Cheaper Computation
 Today, a smartphone’s processor is up to five times more powerful than that of a desktop computer 20 years ago.
 The price of computation has decreased
 The capacity of computation has increased
 Dramatic gains in technology, productivity, innovation, etc.
The Data Science Workflow
Problem Definition
Data Collection and Preparation
Model Development
Model Deployment
Performance Improvement
Critical
Very Important
Time Consuming
Fun :D
Iterative
Cumbersome :(
Critical
The Data Science Workflow
Problem Definition
• Domain Knowledge
• Separation of Concerns
• Prioritize each problem
Data Collection and Preparation
• Selection of right data
• Data Transformation
• Missing Values
• Exploratory analysis
Model Development
• Right algorithm
• Test accuracy
• Test other algorithms
• Validate
Model Deployment
• Turning data scientist model to developer code (R to C#)
Performance Improvement
• Monitor the performance of deployed model
• Re-Training model
• Re-Deploying model
• Re-monitoring
The Data Science Workflow
Big Data Issues (I)
Problem Definition
Data Collection and Preparation
Model Development
Model Deployment
Performance Improvement
Solutions to overcome the big data issues
 1- Use advanced research computing
(http://www.arc.ox.ac.uk/)
Solutions to overcome the big data issues
 2- Create and use a Hadoop Cluster
 Open source (Apache)
 It is based on two components
HDFS
MapReduce
MapReduce
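The MapReduce idea can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases using word counting; it is not the Hadoop API, and the input documents are made up. On a real cluster the map and reduce tasks would run in parallel across nodes, reading their input from HDFS.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

documents = ["big data is big", "data science is fun"]  # made-up input
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be spread over many machines without coordination beyond the shuffle step.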
Hortonworks
Cloudera
MapR
Open source won't prevent vendor lock-in!
Third Solution
 Microsoft’s Cloud Computing
AzureML (Azure Machine Learning)
 Azure ML provides an easy-to-use and powerful set of cloud-based data transformation and machine learning tools.
 AzureML Studio (or Studio for short)
 It has many modules for data transformation, analysis, visualization, …
 It supports R and Python
 It is under heavy development
 www.studio.azureml.net
AzureML Workflow
Data Input
Data Transformation (Project)
Split Data (training and test)
Learning Algorithm
Train the Learning Algorithm
Validate the Algorithm (Score)
Evaluate Model Performance
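The same sequence of steps can be sketched outside the Studio in plain Python. This is an illustration under stated assumptions, not the AzureML modules themselves: the data is synthetic (not the Automobile dataset used in the experiments), the 70/30 split is an arbitrary choice, the learning algorithm is a one-variable least-squares line, and the transformation step is skipped because the synthetic rows already contain only the two columns needed.

```python
import random

random.seed(0)  # reproducible made-up data

# Data input: synthetic (engine_size, price) rows.
rows = [(size, 100 * size + random.gauss(0, 500)) for size in range(60, 260, 4)]

# Split data into training and test sets (70/30).
random.shuffle(rows)
cut = int(0.7 * len(rows))
train, test = rows[:cut], rows[cut:]

def fit_line(data):
    """Train: ordinary least squares for y = slope * x + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train)

# Score: predict the held-out rows.
predictions = [slope * x + intercept for x, _ in test]

# Evaluate: mean absolute error on the test set.
mae = sum(abs(p - y) for p, (_, y) in zip(predictions, test)) / len(test)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, test MAE={mae:.1f}")
```

Keeping the test rows out of training is what makes the final error figure a fair estimate of how the model behaves on unseen data.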
First Experiment:
Predicting Price of Car
AutomobileFullModuleModel02-03-2015
Second Experiment: Using R in ML Studio
AutomobileRTransformation02-03-2015
Third experiment: comparing two models
AutomobileFullModuleTwoModel02-03-2015
Fourth experiment: Creating Web service
 Very easy: just a few clicks!
 Make: bmw
 Engine-size: 164
 Horse-power: 121
 Highway-mpg: 25
 Its actual price is 24,565
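Calling a published web service with the values above might look like the following sketch. The endpoint URL, the API key, the bearer-token header, and the JSON layout are all hypothetical placeholders, since the slides do not show the service's request format; the real values have to be copied from the published service's own page. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values shown for the real service.
SCORING_URL = "https://example.invalid/score"
API_KEY = "YOUR-API-KEY"

# Feature values from the slide; the field names and JSON layout are assumptions.
features = {"make": "bmw", "engine-size": 164, "horse-power": 121, "highway-mpg": 25}

def build_request(url, api_key, record):
    """Build (but do not send) a JSON POST request for one record."""
    body = json.dumps({"input": record}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

request = build_request(SCORING_URL, API_KEY, features)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending would be urllib.request.urlopen(request), once the placeholders are real.
```

The predicted price in the response could then be compared with the actual price of 24,565 quoted on the slide.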
Tips
 Data input can come from a variety of data interfaces, including HTTP connections (any file-sharing service such as Dropbox, Google Drive, or OneDrive), SQL Azure, and Hive Query.
 You can use functionality in all supported R modules (410)
 You can write your own utility functions and upload them as another module
 It is under heavy development
 Two weeks ago the process for web service publication changed
 Two months ago there was no support for Python
 Two months ago around 400 R packages were supported
 …
Big Data Issues (II)
 High dimensional data or wide data
 Using various methods needs knowledge of those methods
 Traditional methods are not efficient enough (unstable)
 Least Squares for example
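One reason least squares becomes unstable on wide data is that, with more features than observations, many different coefficient vectors fit the training data equally well. A minimal sketch with made-up numbers (2 observations, 3 features) shows two exact fits that disagree on a new observation:

```python
# Made-up "wide" data: 2 samples, 3 features (more columns than rows).
X = [[1, 2, 3],
     [4, 5, 6]]
y = [14, 32]

def predict(row, weights):
    """Linear model prediction: dot product of features and weights."""
    return sum(x * w for x, w in zip(row, weights))

w_a = [1, 2, 3]        # fits the training data exactly
w_b = [11, -18, 13]    # w_a + 10 * (1, -2, 1); also fits exactly

for w in (w_a, w_b):
    print(w, [predict(row, w) for row in X])

# The two "equally good" fits disagree on a new observation.
new_row = [1, 0, 0]
print(predict(new_row, w_a), predict(new_row, w_b))
```

The direction (1, -2, 1) changes no training prediction, so the training data alone cannot tell the two coefficient vectors apart. Choosing among such solutions requires extra information, which is why analysing wide data calls for knowledge of methods beyond plain least squares.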
Advantages of AzureML
 Solutions can be quickly deployed as web services.
 Models run in a highly scalable cloud environment.
 Solution-specific functionality can be added using the R and Python languages.
 It generates minimal code for consuming the web service in R and Python (and C#)
 It can be run from anywhere
“Big Data is not about Data. The value in big data is in Analytics.”
GARY KING
Thanks for your attention
Time for Q/A

Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 

Data Science as a Service: Intersection of Cloud Computing and Data Science

  • 1. Data Science as a Service: Intersection of Cloud Computing and Data Science. Dr. Pouria Amirian (Pouria.Amirian@ndm.ox.ac.uk), Big Data Project Coordinator, The Global Health Network, University of Oxford
  • 2. Outline
      • Data Science
      • What data science is
      • Steps in a data science project
      • Experiments
      • Using AzureML
      • Big Data issues
      • In a data science project
      • Methods in analysis
  • 3. What is Data Science?
      • The practice of obtaining useful insights from data
      • 3 Vs of Big Data:
          • Volume
          • Variety
          • Velocity
          • + other Vs
      • It applies to large-volume data (volume)
      • It applies to semi-structured and unstructured data (variety)
      • It sometimes applies to real-time or fast-changing data (velocity)
      • It also applies to small and traditional static data
  • 4. Data Science as a team sport: Math, Statistical Learning, Linguistics, Machine Learning, Signal Processing, Programming, Storage/Data Structure, Operations Research, Distributed and High Performance Computing
  • 5. Data Science from an analytics point of view. Analytics spectrum:
      • Descriptive: What happened?
      • Diagnostic: Why did it happen?
      • Predictive: What will happen?
      • Prescriptive: What should I do?
  • 6. Data Science vs. Business Intelligence. Analytics spectrum (figure label: Traditional BI):
      • Descriptive: What happened?
      • Diagnostic: Why did it happen?
      • Predictive: What will happen?
      • Prescriptive: What should I do?
  • 7. Why is it so popular? Why does it matter? A) More Available and Usable Data
      • McKinsey: Organizations that use data science to make decisions are more productive and deliver higher ROI
      • Gartner: Organizations that invest in modern data infrastructure will outperform their peers by up to 20%
  • 8. Why is it so popular? Why does it matter? B) Increased Awareness of Machine Learning Techniques
      • A subset of machine learning algorithms is now more widely understood, since they have been tried and tested by early adopters such as Netflix and Amazon (recommendation engines).
      • While many people may not know the details of the algorithms used, they increasingly understand their research/business value.
  • 9. Why is it so popular? Why does it matter? C) More Accurate Analysis
      • The large volumes of data being collected also enable you to build more accurate predictive models.
      • The larger the sample size, the smaller the margin of error. This in turn increases the accuracy of predictions from your model.
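The link between sample size and margin of error on slide 9 can be made concrete for the simplest case, the mean of a sample. The sketch below assumes a known standard deviation and a normal approximation (z = 1.96 for roughly 95% confidence); the function name and the numbers are illustrative and do not come from the slides. It only covers sampling error: more data does not remove bias in how the data was collected.

```python
import math

def margin_of_error(std_dev: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate confidence interval for a sample mean.

    Assumes a known standard deviation and a normal approximation.
    """
    return z * std_dev / math.sqrt(n)

# Illustrative values only: standard deviation 10, growing sample sizes.
for n in (100, 400, 10_000):
    print(n, round(margin_of_error(std_dev=10.0, n=n), 3))
```

Because the margin shrinks with the square root of n, quadrupling the sample size only halves the margin of error.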
  • 10. Why is it so popular? Why does it matter? D) Faster and Cheaper Computation
      • Today, a smartphone’s processor is up to five times more powerful than that of a desktop computer 20 years ago.
      • The price of computation has decreased
      • The capacity of computation has increased
      • Dramatic gains in technology, productivity, innovation, etc.
  • 11. The Data Science Workflow: Problem Definition → Data Collection and Preparation → Model Development → Model Deployment → Performance Improvement. Annotations on the slide: Critical, Very Important, Time Consuming, Fun :D, Iterative, Cumbersome :(, Critical
  • 12. The Data Science Workflow
      • Problem Definition: domain knowledge; separation of concerns; prioritize each problem
      • Data Collection and Preparation: selection of the right data; data transformation; missing values; exploratory analysis
      • Model Development: right algorithm; test accuracy; test other algorithms; validate
      • Model Deployment: turning the data scientist's model into developer code (R to C#)
      • Performance Improvement: monitor the performance of the deployed model; re-train the model; re-deploy the model; re-monitor
  • 13. The Data Science Workflow, Big Data Issues (I): Problem Definition → Data Collection and Preparation → Model Development → Model Deployment → Performance Improvement
  • 14. Solutions to overcome the big data issues
      • 1. Use advanced research computing (http://www.arc.ox.ac.uk/)
  • 15. Solutions to overcome the big data issues
      • 2. Create and use a Hadoop cluster
          • Open source (Apache)
          • It is based on two components: HDFS and MapReduce
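To illustrate the MapReduce half of the two components named on slide 15, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using the customary word-count example. It is not Hadoop code: a real cluster would read its input from HDFS and run the phases in parallel across nodes. All function names are made up for this illustration.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data is big", "data science"])
print(counts)
```

The map and reduce functions never share state, which is what lets a framework run them on different machines.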
  • 20. Open source won't prevent vendor lock-in!
  • 22. AzureML (Azure Machine Learning)
      • Azure ML provides an easy-to-use and powerful set of cloud-based data transformation and machine learning tools.
      • AzureML Studio (or Studio for short)
      • It has many modules for data transformation, analysis, visualization, …
      • It supports R and Python
      • It is under heavy development
      • www.studio.azureml.net
  • 23. AzureML Workflow: Data Input → Data Transformation (Project) → Split Data (training and test) → Learning Algorithm → Train the Learning Algorithm → Validate the Algorithm (Score) → Evaluate Model Performance
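The workflow on slide 23 is assembled from modules in Studio rather than written by hand, but the same sequence of steps can be sketched in plain Python. The data below is synthetic (made-up engine sizes and prices, not the dataset used in the talk), and simple one-variable least squares stands in for whatever learning algorithm an experiment would actually use.

```python
import random

# 1. Data input: synthetic (engine_size, price) pairs, invented for this sketch.
rng = random.Random(0)
rows = [(size, 150.0 * size + 2000.0 + rng.uniform(-500, 500))
        for size in range(80, 200, 4)]

# 2. Data transformation (project): keep the two columns we need, as floats.
data = [(float(size), float(price)) for size, price in rows]

# 3. Split data into training and test sets (70/30).
rng.shuffle(data)
cut = int(0.7 * len(data))
train, test = data[:cut], data[cut:]

# 4-5. Learning algorithm and training: one-variable least squares.
def fit(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit(train)

# 6. Score: predict prices for the held-out test rows.
predictions = [slope * x + intercept for x, _ in test]

# 7. Evaluate: mean absolute error on the test set.
mae = sum(abs(p - y) for p, (_, y) in zip(predictions, test)) / len(test)
print(f"slope={slope:.1f} intercept={intercept:.1f} test MAE={mae:.1f}")
```

Keeping the test rows out of training is what makes the final error figure a fair estimate of how the model behaves on new data.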
  • 24. First experiment: predicting the price of a car (AutomobileFullModuleModel02-03-2015)
  • 25. Second experiment: using R in ML Studio (AutomobileRTransformation02-03-2015)
  • 26. Third experiment: comparing two models (AutomobileFullModuleTwoModel02-03-2015)
  • 27. Fourth experiment: creating a web service
      • Very easy, just a few clicks!
      • Make: bmw
      • Engine-size: 164
      • Horse-power: 121
      • Highway-mpg: 25
      • Its actual price is 24,565
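Once an experiment is published as a web service, a client sends the input values over HTTP and receives a predicted price. The sketch below shows one way to package the inputs from slide 27 as a JSON POST request with the Python standard library. The URL, the API key, and the payload layout are placeholders assumed for this illustration; they are not the actual AzureML request contract, which should be taken from the sample code the service itself provides.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values of your own published service.
SERVICE_URL = "https://example.invalid/score"
API_KEY = "YOUR-API-KEY"

def build_request(features: dict) -> urllib.request.Request:
    """Wrap one row of input features in a JSON POST request.

    The {"inputs": [...]} layout is an assumed schema, not a documented one.
    """
    body = json.dumps({"inputs": [features]}).encode("utf-8")
    return urllib.request.Request(
        SERVICE_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )

# Input values taken from the slide.
request = build_request(
    {"make": "bmw", "engine-size": 164, "horse-power": 121, "highway-mpg": 25}
)

# Sending is left commented out because the URL above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```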
  • 28. Tips
      • Data input can come from a variety of data interfaces, including HTTP connections (any file-sharing service such as Dropbox, Google Drive, or OneDrive), SQL Azure, and Hive queries.
      • You can use the functionality of all supported R modules (410)
      • You can write your own utility functions and upload them as another module
      • It is under heavy development
          • Two weeks ago the process for web service publication changed
          • Two months ago there was no support for Python
          • Two months ago around 400 R packages were supported
          • …
  • 29. Big Data Issues (II)
      • High-dimensional data, or wide data
      • Using various methods requires knowledge of those methods
      • Traditional methods are not efficient enough (unstable)
      • Least squares, for example
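The instability of least squares mentioned on slide 29 can be seen even in a two-feature toy problem. Wide data tends to contain nearly redundant columns, and when two columns are almost identical the least squares coefficients swing wildly under tiny changes in the targets. The sketch below uses made-up numbers and adds ridge regression (an L2 penalty) as one common remedy; ridge is this example's choice and is not named in the slides.

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [w1, w2] = [e, f] with Cramer's rule."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det

def fit(xs, ys, penalty=0.0):
    """Two-feature least squares without intercept.

    Solves the normal equations; penalty > 0 adds a ridge (L2) term
    to the diagonal.
    """
    s11 = sum(x1 * x1 for x1, _ in xs) + penalty
    s22 = sum(x2 * x2 for _, x2 in xs) + penalty
    s12 = sum(x1 * x2 for x1, x2 in xs)
    t1 = sum(x1 * y for (x1, _), y in zip(xs, ys))
    t2 = sum(x2 * y for (_, x2), y in zip(xs, ys))
    return solve_2x2(s11, s12, s12, s22, t1, t2)

# Made-up data: the second column is the first plus or minus 0.001.
xs = [(1.0, 1.001), (2.0, 1.999), (3.0, 3.001), (4.0, 3.999)]
ys_a = [2.0, 4.0, 6.0, 8.0]
ys_b = [2.01, 3.99, 6.01, 7.99]  # same targets, shifted by only 0.01

ols_a, ols_b = fit(xs, ys_a), fit(xs, ys_b)
ridge_a, ridge_b = fit(xs, ys_a, penalty=1.0), fit(xs, ys_b, penalty=1.0)

print("least squares:", ols_a, ols_b)
print("ridge:        ", ridge_a, ridge_b)
```

With plain least squares the fitted coefficients jump from roughly (2, 0) to roughly (-8, 10) although the targets moved by only 0.01, while the ridge coefficients barely change between the two fits.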
  • 30. Advantages of AzureML
      • Solutions can be quickly deployed as web services.
      • Models run in a highly scalable cloud environment.
      • R and Python can be used for solution-specific functionality.
      • It generates minimal code for consuming the web service in R and Python (and C#)
      • It can be run from anywhere
  • 31. “Big Data is not about Data. The value in big data is in Analytics.” Gary King. Thanks for your attention. Time for Q/A.