SlideShare a Scribd company logo
Data Science in the cloud
with
Microsoft Azure
MARTIN THORNALLEY
DATA SOLUTION ARCHITECT, MICROSOFT
Introduction
Data Science Definition
“Data science is an interdisciplinary field about
processes and systems to extract knowledge or
insights from data in various forms, either structured
or unstructured, which is a continuation of some of
the data analysis fields such as statistics, machine
learning, data mining, and predictive analytics”
https://en.wikipedia.org/wiki/Data_science
Data Science Skillset
http://berkeleysciencereview.com/how-to-become-a-data-scientist-before-you-graduate/
The Cloud
Why does the Cloud matter for Data Science?
 High capacity and cost effective data storage
 Flexible, elastic compute capacity
 Ready to use technologies
 Choice of Infrastructure or Platform
 Enables Agile & DevOps
 Operational reliability and security
 Pay as you go
Microsoft Azure Cloud Platform
 Wide range of services covering Compute, Web & Mobile, Data &
Storage, Analytics, Internet of Things & Intelligence plus many more,
see http://azureplatform.azurewebsites.net/en-us/
 Easy to get started, free to try for 30 days but limited spend, also
MSDN licence free credits, see https://azure.microsoft.com/en-
gb/free/
 Comprehensive documentation and examples
 Global presence with many recognisable brands fully committed
 Huge investment and growing rapidly
Data Science Process
https://azure.microsoft.com/en-us/documentation/articles/data-science-process-overview/
Worked
Example
NYC taxis
 2013 NYC taxi trips and fares – open but non-trivial dataset
 24 CSV files - 12 trip, 12 fare, 1 for each month
 ~20GB compressed, ~50GB uncompressed, 170+ million records
 medallion – vehicle identifier
 hack license – driver identifier
 passenger count
 pickup & dropoff – datetime, longitude, latitude
 trip – time and distance
 fare - payment type, fare amount, surcharge, mta tax, tip amount, tolls
amount, total amount
http://www.andresmh.com/nyctaxitrips/
Predictions
 Predict whether a specific journey will result in a tip – binary
classification
 Predict what class of tip will be for a specific journey – multiclass
classification
 Predict how much a tip will be for a specific journey – regression
A Data Science
Environment
Data Science Virtual Machine
Create Linux and Windows virtual machines in minutes
 Wide range of configurations - CPU cores, memory, disks, network
speeds
 Scale to what you need
 Pay only for what you use
 Enhance security and compliance
 Preloaded with full set of tools and utilities from Azure MarketPlace
e.g. SQL Server 2016 Developer edition, Azure SDK, Python, R,
Jupyter, etc.
Storage Accounts
Massively scalable cloud storage for your applications
 Security-enhanced, durable, and highly available across the globe
 Industry-leading performance with exabytes of capacity
 Pay only for what you use
 Open, multi-platform support
HDInsight
A managed Apache Hadoop, Spark, R, HBase, and Storm cloud service
made easy
 Scale to petabytes on demand
 Crunch all data—structured, semi-structured, unstructured
 Skip buying and maintaining hardware
 Spin up Apache Hadoop, Spark, and R clusters in the cloud
 Use Excel or your favourite BI tool to visualize Hadoop data
 Connect on-premises Hadoop clusters with the cloud
Azure Machine Learning
A fully managed cloud service that enables you to easily build, deploy,
and share predictive analytics solutions.
 Powerful cloud based analytics, now part of Cortana Intelligence
Suite
 Azure Machine Learning Studio includes hundreds of built-in
packages and support for custom code
 Share your solution with the world in the Gallery or on the Azure
Marketplace
The Process
Preparation & Exploration
 Copy data using Azcopy and decompress
 Inspect files and load in to RStudio
 Create external Hive tables and load
 Query over full dataset for further exploration
 Remove erroneous data e.g. passenger numbers, lat/long
 Engineer features using Hive
 Distance from start to finish using Haversine calculation
 Binary indicator for tips
 Tip level based on ranges for multiclass classification
 Downsample dataset and save as internal table for Machine Learning
Machine Learning & Deployment
 Import Data using Hive Query
 Build Training Experiments
 Evaluate model performance
 Create Predictive Experiments
 Publish Web Service
 Test Web Service
 Call from Excel
Next Steps
To build a fully fledged enterprise solution with regular data ingestion
and model execution consider the following:
 Data Catalog
 Data Factory
 Event Hubs & Stream Analytics
 Power BI
 Cognitive Services
Conclusion
Summary
 Microsoft Azure provides a wide range of technologies for Data
Science activities
 Platform services reduce the management overhead
 No capacity limitations and flexible provisioning – pay as you go
 Choice of Open Source and Microsoft – use the best tool for the task
 The tools are well integrated
 Azure Machine Learning makes it trivial to deploy your models
 It’s quick and easy to get started
Getting Started
 Sign up for free
https://azure.microsoft.com/en-gb/free/
 Create a Data Science VM
https://azure.microsoft.com/en-us/marketplace/partners/microsoft-
ads/standard-data-science-vm/
 Visit Cortana Intelligence Gallery
https://gallery.cortanaintelligence.com/
Q&A
Thank You
Martin Thornalley
Data Solution Architect, Microsoft
@mthornal
martin.thornalley@microsoft.com
https://www.linkedin.com/in/martinthornalley

More Related Content

What's hot

Big data talking stories in Healthcare
Big data talking stories in Healthcare Big data talking stories in Healthcare
Big data talking stories in Healthcare
Mostafa
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)
Daniel Toomey
 
Steve Woolege Of Aster Data Gives Lightning Talk At BigDataCamp
Steve Woolege Of Aster Data Gives Lightning Talk At BigDataCampSteve Woolege Of Aster Data Gives Lightning Talk At BigDataCamp
Steve Woolege Of Aster Data Gives Lightning Talk At BigDataCamp
BigDataCamp
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake Analytics
Waqas Idrees
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
MSAdvAnalytics
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Databricks
 
Azure Data Lake Store and Analytics
Azure Data Lake Store and AnalyticsAzure Data Lake Store and Analytics
Azure Data Lake Store and Analytics
Sergio Zenatti Filho
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Data Con LA
 
Adventures in Azure Machine Learning from NE Bytes
Adventures in Azure Machine Learning from NE BytesAdventures in Azure Machine Learning from NE Bytes
Adventures in Azure Machine Learning from NE Bytes
Derek Graham
 
Raven DB; day to day
Raven DB; day to dayRaven DB; day to day
Raven DB; day to day
Andrea Magnorsky
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
Data Con LA
 
Cloudian HyperStore Operating Environment
Cloudian HyperStore Operating EnvironmentCloudian HyperStore Operating Environment
Cloudian HyperStore Operating Environment
Cloudian
 
How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?
Vincent Terrasi
 
Perth Microsoft Data & Analytics User Group - Building Solutions with Azure D...
Perth Microsoft Data & Analytics User Group - Building Solutions with Azure D...Perth Microsoft Data & Analytics User Group - Building Solutions with Azure D...
Perth Microsoft Data & Analytics User Group - Building Solutions with Azure D...
Sergio Zenatti Filho
 
Alluxio - Virtual Unified File System
Alluxio - Virtual Unified File System Alluxio - Virtual Unified File System
Alluxio - Virtual Unified File System
Alluxio, Inc.
 
Big data in Azure
Big data in AzureBig data in Azure
Big data in Azure
Venkatesh Narayanan
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystemmagda3695
 
The Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big DataThe Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big Data
InMobi Technology
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
James Serra
 
Digital Transformation with Microsoft Azure
Digital Transformation with Microsoft AzureDigital Transformation with Microsoft Azure
Digital Transformation with Microsoft Azure
Luan Moreno Medeiros Maciel
 

What's hot (20)

Big data talking stories in Healthcare
Big data talking stories in Healthcare Big data talking stories in Healthcare
Big data talking stories in Healthcare
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)
 
Steve Woolege Of Aster Data Gives Lightning Talk At BigDataCamp
Steve Woolege Of Aster Data Gives Lightning Talk At BigDataCampSteve Woolege Of Aster Data Gives Lightning Talk At BigDataCamp
Steve Woolege Of Aster Data Gives Lightning Talk At BigDataCamp
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake Analytics
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
 
Azure Data Lake Store and Analytics
Azure Data Lake Store and AnalyticsAzure Data Lake Store and Analytics
Azure Data Lake Store and Analytics
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
 
Adventures in Azure Machine Learning from NE Bytes
Adventures in Azure Machine Learning from NE BytesAdventures in Azure Machine Learning from NE Bytes
Adventures in Azure Machine Learning from NE Bytes
 
Raven DB; day to day
Raven DB; day to dayRaven DB; day to day
Raven DB; day to day
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
 
Cloudian HyperStore Operating Environment
Cloudian HyperStore Operating EnvironmentCloudian HyperStore Operating Environment
Cloudian HyperStore Operating Environment
 
How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?
 
Perth Microsoft Data & Analytics User Group - Building Solutions with Azure D...
Perth Microsoft Data & Analytics User Group - Building Solutions with Azure D...Perth Microsoft Data & Analytics User Group - Building Solutions with Azure D...
Perth Microsoft Data & Analytics User Group - Building Solutions with Azure D...
 
Alluxio - Virtual Unified File System
Alluxio - Virtual Unified File System Alluxio - Virtual Unified File System
Alluxio - Virtual Unified File System
 
Big data in Azure
Big data in AzureBig data in Azure
Big data in Azure
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
The Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big DataThe Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big Data
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 
Digital Transformation with Microsoft Azure
Digital Transformation with Microsoft AzureDigital Transformation with Microsoft Azure
Digital Transformation with Microsoft Azure
 

Viewers also liked

Juliet Hougland, Data Scientist, Cloudera at MLconf NYC
Juliet Hougland, Data Scientist, Cloudera at MLconf NYCJuliet Hougland, Data Scientist, Cloudera at MLconf NYC
Juliet Hougland, Data Scientist, Cloudera at MLconf NYC
MLconf
 
Presentación univalle
Presentación univallePresentación univalle
Presentación univalle
hugoandresdb
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?Hortonworks
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612
Mark Tabladillo
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
SoftServe
 
Unit5
Unit5Unit5
Understanding the Lean Startup
Understanding the Lean StartupUnderstanding the Lean Startup
Understanding the Lean Startup
Ric Pratte
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
Sri Ambati
 

Viewers also liked (8)

Juliet Hougland, Data Scientist, Cloudera at MLconf NYC
Juliet Hougland, Data Scientist, Cloudera at MLconf NYCJuliet Hougland, Data Scientist, Cloudera at MLconf NYC
Juliet Hougland, Data Scientist, Cloudera at MLconf NYC
 
Presentación univalle
Presentación univallePresentación univalle
Presentación univalle
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Unit5
Unit5Unit5
Unit5
 
Understanding the Lean Startup
Understanding the Lean StartupUnderstanding the Lean Startup
Understanding the Lean Startup
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 

Similar to Data Science in the cloud with Microsoft Azure

MongoDB IoT City Tour STUTTGART: The Microsoft Azure Platform for IoT
MongoDB IoT City Tour STUTTGART: The Microsoft Azure Platform for IoTMongoDB IoT City Tour STUTTGART: The Microsoft Azure Platform for IoT
MongoDB IoT City Tour STUTTGART: The Microsoft Azure Platform for IoT
MongoDB
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
James Serra
 
To the Cloud and beyond (Nantes, Rebuild 2018)
To the Cloud and beyond (Nantes, Rebuild 2018)To the Cloud and beyond (Nantes, Rebuild 2018)
To the Cloud and beyond (Nantes, Rebuild 2018)
Alex Danvy
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
FedoRam1
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
Nilesh Shah
 
Azure Overview Csco
Azure Overview CscoAzure Overview Csco
Azure Overview Cscorajramab
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020
Riccardo Zamana
 
Microsoft Partner Roadshow - To the Cloud
Microsoft Partner Roadshow  - To the CloudMicrosoft Partner Roadshow  - To the Cloud
Microsoft Partner Roadshow - To the Cloud
Nigel Watson
 
Cepta The Future of Data with Power BI
Cepta The Future of Data with Power BICepta The Future of Data with Power BI
Cepta The Future of Data with Power BI
Kellyn Pot'Vin-Gorman
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
James Serra
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorial
rustd
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
Unidev
 
Sky High With Azure
Sky High With AzureSky High With Azure
Sky High With Azure
Clint Edmonson
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
Abhimanyu Singhal
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
James Serra
 
Deep Learning Technical Pitch Deck
Deep Learning Technical Pitch DeckDeep Learning Technical Pitch Deck
Deep Learning Technical Pitch Deck
Nicholas Vossburg
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
Trivadis
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 

Similar to Data Science in the cloud with Microsoft Azure (20)

MongoDB IoT City Tour STUTTGART: The Microsoft Azure Platform for IoT
MongoDB IoT City Tour STUTTGART: The Microsoft Azure Platform for IoTMongoDB IoT City Tour STUTTGART: The Microsoft Azure Platform for IoT
MongoDB IoT City Tour STUTTGART: The Microsoft Azure Platform for IoT
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
To the Cloud and beyond (Nantes, Rebuild 2018)
To the Cloud and beyond (Nantes, Rebuild 2018)To the Cloud and beyond (Nantes, Rebuild 2018)
To the Cloud and beyond (Nantes, Rebuild 2018)
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Adam azure presentation
Adam   azure presentationAdam   azure presentation
Adam azure presentation
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
 
Azure Overview Csco
Azure Overview CscoAzure Overview Csco
Azure Overview Csco
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020
 
Microsoft Partner Roadshow - To the Cloud
Microsoft Partner Roadshow  - To the CloudMicrosoft Partner Roadshow  - To the Cloud
Microsoft Partner Roadshow - To the Cloud
 
Cepta The Future of Data with Power BI
Cepta The Future of Data with Power BICepta The Future of Data with Power BI
Cepta The Future of Data with Power BI
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorial
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Sky High With Azure
Sky High With AzureSky High With Azure
Sky High With Azure
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
 
Deep Learning Technical Pitch Deck
Deep Learning Technical Pitch DeckDeep Learning Technical Pitch Deck
Deep Learning Technical Pitch Deck
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 

More from TechExeter

Exeter Science Centre, by Natalie Whitehead
Exeter Science Centre, by Natalie WhiteheadExeter Science Centre, by Natalie Whitehead
Exeter Science Centre, by Natalie Whitehead
TechExeter
 
South West InternetOfThings Network by Wo King
South West InternetOfThings Network by Wo KingSouth West InternetOfThings Network by Wo King
South West InternetOfThings Network by Wo King
TechExeter
 
Generative Adversarial Networks by Tariq Rashid
Generative Adversarial Networks by Tariq RashidGenerative Adversarial Networks by Tariq Rashid
Generative Adversarial Networks by Tariq Rashid
TechExeter
 
Conf 2019 - Workshop: Liam Glanfield - know your threat actor
Conf 2019 - Workshop: Liam Glanfield - know your threat actorConf 2019 - Workshop: Liam Glanfield - know your threat actor
Conf 2019 - Workshop: Liam Glanfield - know your threat actor
TechExeter
 
Conf 2018 Track 1 - Unicorns aren't real
Conf 2018 Track 1 - Unicorns aren't realConf 2018 Track 1 - Unicorns aren't real
Conf 2018 Track 1 - Unicorns aren't real
TechExeter
 
Conf 2018 Track 1 - Aerospace Innovation
Conf 2018 Track 1 - Aerospace InnovationConf 2018 Track 1 - Aerospace Innovation
Conf 2018 Track 1 - Aerospace Innovation
TechExeter
 
Conf 2018 Track 2 - Try Elm
Conf 2018 Track 2 - Try ElmConf 2018 Track 2 - Try Elm
Conf 2018 Track 2 - Try Elm
TechExeter
 
Conf 2018 Track 3 - Creating marine geospatial services
Conf 2018 Track 3 - Creating marine geospatial servicesConf 2018 Track 3 - Creating marine geospatial services
Conf 2018 Track 3 - Creating marine geospatial services
TechExeter
 
Conf 2018 Track 2 - Machine Learning with TensorFlow
Conf 2018 Track 2 - Machine Learning with TensorFlowConf 2018 Track 2 - Machine Learning with TensorFlow
Conf 2018 Track 2 - Machine Learning with TensorFlow
TechExeter
 
Conf 2018 Track 2 - Custom Web Elements with Stencil
Conf 2018 Track 2 - Custom Web Elements with StencilConf 2018 Track 2 - Custom Web Elements with Stencil
Conf 2018 Track 2 - Custom Web Elements with Stencil
TechExeter
 
Conf 2018 Track 1 - Tessl / revolutionising the house moving process
Conf 2018 Track 1 - Tessl / revolutionising the house moving processConf 2018 Track 1 - Tessl / revolutionising the house moving process
Conf 2018 Track 1 - Tessl / revolutionising the house moving process
TechExeter
 
Conf 2018 Keynote - Andy Stanford-Clark, CTO IBM UK
Conf 2018 Keynote - Andy Stanford-Clark, CTO IBM UKConf 2018 Keynote - Andy Stanford-Clark, CTO IBM UK
Conf 2018 Keynote - Andy Stanford-Clark, CTO IBM UK
TechExeter
 
Conf 2018 Track 3 - Microservices - What I've learned after a year building s...
Conf 2018 Track 3 - Microservices - What I've learned after a year building s...Conf 2018 Track 3 - Microservices - What I've learned after a year building s...
Conf 2018 Track 3 - Microservices - What I've learned after a year building s...
TechExeter
 
Gps behaving badly - Guy Busenel
Gps behaving badly - Guy BusenelGps behaving badly - Guy Busenel
Gps behaving badly - Guy Busenel
TechExeter
 
Why Isn't My Query Using an Index?: An Introduction to SQL Performance
Why Isn't My Query Using an Index?: An Introduction to SQL Performance Why Isn't My Query Using an Index?: An Introduction to SQL Performance
Why Isn't My Query Using an Index?: An Introduction to SQL Performance
TechExeter
 
Turning Developers into Testers
Turning Developers into TestersTurning Developers into Testers
Turning Developers into Testers
TechExeter
 
Remote working
Remote workingRemote working
Remote working
TechExeter
 
Developing an Agile Mindset
Developing an Agile Mindset Developing an Agile Mindset
Developing an Agile Mindset
TechExeter
 
Think like a gardener
Think like a gardenerThink like a gardener
Think like a gardener
TechExeter
 
The trials and tribulations of providing engineering infrastructure
 The trials and tribulations of providing engineering infrastructure  The trials and tribulations of providing engineering infrastructure
The trials and tribulations of providing engineering infrastructure
TechExeter
 

More from TechExeter (20)

Exeter Science Centre, by Natalie Whitehead
Exeter Science Centre, by Natalie WhiteheadExeter Science Centre, by Natalie Whitehead
Exeter Science Centre, by Natalie Whitehead
 
South West InternetOfThings Network by Wo King
South West InternetOfThings Network by Wo KingSouth West InternetOfThings Network by Wo King
South West InternetOfThings Network by Wo King
 
Generative Adversarial Networks by Tariq Rashid
Generative Adversarial Networks by Tariq RashidGenerative Adversarial Networks by Tariq Rashid
Generative Adversarial Networks by Tariq Rashid
 
Conf 2019 - Workshop: Liam Glanfield - know your threat actor
Conf 2019 - Workshop: Liam Glanfield - know your threat actorConf 2019 - Workshop: Liam Glanfield - know your threat actor
Conf 2019 - Workshop: Liam Glanfield - know your threat actor
 
Conf 2018 Track 1 - Unicorns aren't real
Conf 2018 Track 1 - Unicorns aren't realConf 2018 Track 1 - Unicorns aren't real
Conf 2018 Track 1 - Unicorns aren't real
 
Conf 2018 Track 1 - Aerospace Innovation
Conf 2018 Track 1 - Aerospace InnovationConf 2018 Track 1 - Aerospace Innovation
Conf 2018 Track 1 - Aerospace Innovation
 
Conf 2018 Track 2 - Try Elm
Conf 2018 Track 2 - Try ElmConf 2018 Track 2 - Try Elm
Conf 2018 Track 2 - Try Elm
 
Conf 2018 Track 3 - Creating marine geospatial services
Conf 2018 Track 3 - Creating marine geospatial servicesConf 2018 Track 3 - Creating marine geospatial services
Conf 2018 Track 3 - Creating marine geospatial services
 
Conf 2018 Track 2 - Machine Learning with TensorFlow
Conf 2018 Track 2 - Machine Learning with TensorFlowConf 2018 Track 2 - Machine Learning with TensorFlow
Conf 2018 Track 2 - Machine Learning with TensorFlow
 
Conf 2018 Track 2 - Custom Web Elements with Stencil
Conf 2018 Track 2 - Custom Web Elements with StencilConf 2018 Track 2 - Custom Web Elements with Stencil
Conf 2018 Track 2 - Custom Web Elements with Stencil
 
Conf 2018 Track 1 - Tessl / revolutionising the house moving process
Conf 2018 Track 1 - Tessl / revolutionising the house moving processConf 2018 Track 1 - Tessl / revolutionising the house moving process
Conf 2018 Track 1 - Tessl / revolutionising the house moving process
 
Conf 2018 Keynote - Andy Stanford-Clark, CTO IBM UK
Conf 2018 Keynote - Andy Stanford-Clark, CTO IBM UKConf 2018 Keynote - Andy Stanford-Clark, CTO IBM UK
Conf 2018 Keynote - Andy Stanford-Clark, CTO IBM UK
 
Conf 2018 Track 3 - Microservices - What I've learned after a year building s...
Conf 2018 Track 3 - Microservices - What I've learned after a year building s...Conf 2018 Track 3 - Microservices - What I've learned after a year building s...
Conf 2018 Track 3 - Microservices - What I've learned after a year building s...
 
Gps behaving badly - Guy Busenel
Gps behaving badly - Guy BusenelGps behaving badly - Guy Busenel
Gps behaving badly - Guy Busenel
 
Why Isn't My Query Using an Index?: An Introduction to SQL Performance
Why Isn't My Query Using an Index?: An Introduction to SQL Performance Why Isn't My Query Using an Index?: An Introduction to SQL Performance
Why Isn't My Query Using an Index?: An Introduction to SQL Performance
 
Turning Developers into Testers
Turning Developers into TestersTurning Developers into Testers
Turning Developers into Testers
 
Remote working
Remote workingRemote working
Remote working
 
Developing an Agile Mindset
Developing an Agile Mindset Developing an Agile Mindset
Developing an Agile Mindset
 
Think like a gardener
Think like a gardenerThink like a gardener
Think like a gardener
 
The trials and tribulations of providing engineering infrastructure
 The trials and tribulations of providing engineering infrastructure  The trials and tribulations of providing engineering infrastructure
The trials and tribulations of providing engineering infrastructure
 

Recently uploaded

zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 

Recently uploaded (20)

zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 

Data Science in the cloud with Microsoft Azure

  • 1. Data Science in the cloud with Microsoft Azure MARTIN THORNALLEY DATA SOLUTION ARCHITECT, MICROSOFT
  • 3. Data Science Definition “Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, machine learning, data mining, and predictive analytics” https://en.wikipedia.org/wiki/Data_science
  • 5. The Cloud Why does the Cloud matter for Data Science?  High capacity and cost effective data storage  Flexible, elastic compute capacity  Ready to use technologies  Choice of Infrastructure or Platform  Enables Agile & DevOps  Operational reliability and security  Pay as you go
  • 6. Microsoft Azure Cloud Platform  Wide range of services covering Compute, Web & Mobile, Data & Storage, Analytics, Internet of Things & Intelligence plus many more, see http://azureplatform.azurewebsites.net/en-us/  Easy to get started, free to try for 30 days but limited spend, also MSDN licence free credits, see https://azure.microsoft.com/en- gb/free/  Comprehensive documentation and examples  Global presence with many recognisable brands fully committed  Huge investment and growing rapidly
  • 9. NYC taxis  2013 NYC taxi trips and fares – open but non-trivial dataset  24 CSV files - 12 trip, 12 fare, 1 for each month  ~20GB compressed, ~50GB uncompressed, 170+ million records  medallion – vehicle identifier  hack license – driver identifier  passenger count  pickup & dropoff – datetime, longitude, latitude  trip – time and distance  fare - payment type, fare amount, surcharge, mta tax, tip amount, tolls amount, total amount http://www.andresmh.com/nyctaxitrips/
  • 10. Predictions  Predict whether a specific journey will result in a tip – binary classification  Predict what class of tip will be for a specific journey – multiclass classification  Predict how much a tip will be for a specific journey – regression
  • 12. Data Science Virtual Machine Create Linux and Windows virtual machines in minutes  Wide range of configurations - CPU cores, memory, disks, network speeds  Scale to what you need  Pay only for what you use  Enhance security and compliance  Preloaded with full set of tools and utilities from Azure MarketPlace e.g. SQL Server 2016 Developer edition, Azure SDK, Python, R, Jupyter, etc.
  • 13. Storage Accounts Massively scalable cloud storage for your applications  Security-enhanced, durable, and highly available across the globe  Industry-leading performance with exabytes of capacity  Pay only for what you use  Open, multi-platform support
  • 14. HDInsight A managed Apache Hadoop, Spark, R, HBase, and Storm cloud service made easy  Scale to petabytes on demand  Crunch all data—structured, semi-structured, unstructured  Skip buying and maintaining hardware  Spin up Apache Hadoop, Spark, and R clusters in the cloud  Use Excel or your favourite BI tool to visualize Hadoop data  Connect on-premises Hadoop clusters with the cloud
  • 15. Azure Machine Learning A fully managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions.  Powerful cloud based analytics, now part of Cortana Intelligence Suite  Azure Machine Learning Studio includes hundreds of built-in packages and support for custom code  Share your solution with the world in the Gallery or on the Azure Marketplace
  • 17. Preparation & Exploration  Copy data using Azcopy and decompress  Inspect files and load in to RStudio  Create external Hive tables and load  Query over full dataset for further exploration  Remove erroneous data e.g. passenger numbers, lat/long  Engineer features using Hive  Distance from start to finish using Haversine calculation  Binary indicator for tips  Tip level based on ranges for multiclass classification  Downsample dataset and save as internal table for Machine Learning
  • 18. Machine Learning & Deployment  Import Data using Hive Query  Build Training Experiments  Evaluate model performance  Create Predictive Experiments  Publish Web Service  Test Web Service  Call from Excel
  • 19. Next Steps To build a fully fledged enterprise solution with regular data ingestion and model execution consider the following:  Data Catalog  Data Factory  Event Hubs & Stream Analytics  Power BI  Cognitive Services
  • 21. Summary  Microsoft Azure provides a wide range of technologies for Data Science activities  Platform services reduce the management overhead  No capacity limitations and flexible provisioning – pay as you go  Choice of Open Source and Microsoft – use the best tool for the task  The tools are well integrated  Azure Machine Learning makes it trivial to deploy your models  It’s quick and easy to get started
  • 22. Getting Started  Sign up for free https://azure.microsoft.com/en-gb/free/  Create a Data Science VM https://azure.microsoft.com/en-us/marketplace/partners/microsoft- ads/standard-data-science-vm/  Visit Cortana Intelligence Gallery https://gallery.cortanaintelligence.com/
  • 23. Q&A
  • 24. Thank You Martin Thornalley Data Solution Architect, Microsoft @mthornal martin.thornalley@microsoft.com https://www.linkedin.com/in/martinthornalley