July 23, 2015
“Let’s turn Real User Data into a Science!”
Dan Boutin – Senior Product Evangelist
mPulse
What’s a Beacon?
www.w3.org/TR/Beacon
Total Beacons Collected since 6/2013:
~ 85 Billion
Run rate over 3B per week and growing
Projected ~ 175B by 1/1/16
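A quick sanity check of the projection, using only the figures on this slide. It is a rough sketch, and it assumes the ~85 billion count is as of the talk date (July 23, 2015); the variable names are illustrative.

```python
from datetime import date

# Figures quoted on the slide.
collected_so_far = 85e9    # beacons collected since 6/2013
weekly_run_rate = 3e9      # "over 3B per week"
projected_total = 175e9    # projected total by 1/1/16

# Assumption: the 85B count is as of the talk date.
as_of = date(2015, 7, 23)
target = date(2016, 1, 1)

weeks_left = (target - as_of).days / 7

# Total if the run rate stayed flat at 3B per week.
flat_total = collected_so_far + weekly_run_rate * weeks_left

# Average weekly rate needed to actually reach the projection.
required_rate = (projected_total - collected_so_far) / weeks_left

print(f"weeks remaining: {weeks_left:.1f}")
print(f"flat-rate total: {flat_total / 1e9:.0f}B")
print(f"average rate needed: {required_rate / 1e9:.1f}B per week")
```

Under that assumption, a flat 3B per week lands near 154B, so the 175B projection implies the weekly rate climbs to an average of roughly 3.9B, which fits the "and growing" note.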
Big Data Challenges
Data Scientists spend too much time ‘data wrangling’
“Data scientists, according to interviews and expert
estimates, spend from 50 percent to 80 percent of their
time mired in this more mundane labor of collecting and
preparing unruly digital data, before it can be explored for
useful nuggets.”
NY Times – August 17th, 2014
Big Data Challenges
Building a data science platform is very difficult
Infrastructure
• Choosing big data technologies and setting up a cluster can easily take 9
months or more
Data Pipeline
• Building a high-performing big data schema requires specialized skills
• Extracting, transforming, and loading data (data wrangling) is an
enormous time sink and a poor use of data scientists’ time
Analysis and Workflow
• Figuring out how to ask questions of the data and how to visualize the
results takes time that data scientists should be using to generate actionable
insights from their studies
Trade-Offs
Julia Language & iJulia Notebook UI
Julia is a rising star in scientific programming:
• Processing speed
• Support for parallel processing
• Compatibility with 400+ prebuilt statistical packages
• A large and growing number of visualization libraries
Trade-Offs
Why Julia?
R vs Python vs Julia
Modern compiler technology
Data Connectivity
Package Ecosystem
Functional Programming Constructs
Integration with Python, C, C++, R, …
© 2014 SOASTA. All rights reserved. July 28, 2015 8
Trade-Offs
o Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud.
• Columnar Database
• Extremely fast query times
• Attractive Economics
Why Redshift?
Hadoop vs BigQuery vs Redshift vs …
Capability – managed big data up to 2 petabytes
Cloud Economics – $1,000 per TB per month
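To make the economics concrete, here is a small sketch that applies the slide's $1,000 per TB per month figure to a few warehouse sizes. The sizes and the function name are illustrative assumptions, and 1 PB is taken as 1,000 TB.

```python
# Figures quoted on the slide.
COST_PER_TB_MONTH_USD = 1_000   # "$1,000 TB per month"
MAX_CAPACITY_PB = 2             # "up to 2 petabytes"

# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1_000


def monthly_cost_usd(size_tb: float) -> float:
    """Estimated monthly cost for a warehouse of size_tb terabytes."""
    return size_tb * COST_PER_TB_MONTH_USD


# Illustrative sizes, ending at the quoted 2 PB ceiling.
for size_tb in (1, 10, 100, MAX_CAPACITY_PB * TB_PER_PB):
    print(f"{size_tb:>5} TB -> ${monthly_cost_usd(size_tb):,.0f} per month")
```

The estimate is linear in data volume, so at the quoted rate a warehouse at the 2 PB ceiling would run about $2 million per month.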
Now Let’s Talk Architecture
Data Science Workbench
Data Science without the data wrangling, and much more
Infrastructure
• Data Science Workbench comes with the
state-of-the-art technology you need to
analyze your customer experiences
Data Pipeline
• All of the real user beacon data is loaded into
Data Science Workbench in a highly
optimized schema ready for analysis
Analysis and Workflow
• Data science is done with Julia, a remarkably
fast, in-memory solution for analyzing huge
data sets
• Access to an ever-growing library of analysis
functions and visualizations based on
SOASTA’s and our customers’ expertise
The Result!
• Every customer beacon is unpacked, transformed, and loaded nightly by
SOASTA into a SOASTA-designed schema in Amazon Redshift. This
process is designed, supplied, and supported by SOASTA
• Amazon Redshift is an extremely inexpensive and powerful big data
database that can scale to almost 2 petabytes in size. Amazon
estimates compute and storage costs of $1,000/TB/month for our
implementation
• An online, interactive interface for exploring, discovering, and developing,
based on the Julia scientific programming language developed at MIT and the
iJulia Notebook UI
• SOASTA-developed functions and statistical models
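The "unpacked, transformed" part of the nightly process might look roughly like the sketch below. It is only an illustration: the beacon format, the field names (page_url, load_ms, browser), and the row layout are hypothetical, not the actual mPulse beacon or the SOASTA schema, and the load step into Redshift is left out.

```python
from urllib.parse import parse_qsl

# Hypothetical raw beacons as URL-encoded query strings. The field names
# are illustrative only, not the real mPulse beacon format.
RAW_BEACONS = [
    "page_url=%2Fhome&load_ms=1840&browser=Chrome",
    "page_url=%2Fcart&load_ms=2615&browser=Safari",
    "page_url=%2Fhome&load_ms=notanumber&browser=Firefox",
]


def unpack(raw: str) -> dict:
    """Unpack one URL-encoded beacon into a dict of strings."""
    return dict(parse_qsl(raw))


def transform(fields: dict):
    """Convert an unpacked beacon to a typed row, or None if malformed."""
    try:
        return (fields["page_url"], int(fields["load_ms"]), fields["browser"])
    except (KeyError, ValueError):
        return None


def build_rows(raw_beacons):
    """Unpack and transform every beacon, dropping malformed ones."""
    rows = (transform(unpack(raw)) for raw in raw_beacons)
    return [row for row in rows if row is not None]


rows = build_rows(RAW_BEACONS)
for row in rows:
    print(row)
```

The third beacon is dropped because its load time is not a number. The point of doing this cleanup once, centrally, is that analysts start from typed rows rather than raw beacons.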
Well, let’s see it!
procedure Traffic is
   type Airplane_ID is range 1..10;             -- 10 airplanes
   task type Airplane (ID: Airplane_ID);        -- task representing airplanes, with ID as initialisation parameter
   type Airplane_Access is access Airplane;     -- reference type to Airplane
   protected type Runway is                     -- the shared runway (protected to allow concurrent access)
      entry Assign_Aircraft (ID: Airplane_ID);  -- all entries are guaranteed mutually exclusive
      entry Cleared_Runway (ID: Airplane_ID);
      entry Wait_For_Clear;
   private
      Clear: Boolean := True;                   -- protected private data - generally more than just a flag...
   end Runway;
   type Runway_Access is access all Runway;
Trivia Time!
@DanBoutinSOASTA
1983
1995
Thank You!
Dan Boutin – Senior Product Evangelist
dboutin@soasta.com
Mobile (404) 304-9529
@DanBoutinSOASTA

 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 

Turn Real User Data into a Science with Data Science Workbench

  • 1. July 23, 2015 “Let’s turn Real User Data into a Science!” Dan Boutin – Senior Product Evangelist
  • 2. mPulse: What’s a Beacon? www.w3.org/TR/Beacon Total beacons collected since 6/2013: ~85 billion. Run rate over 3B per week and growing. Projected ~175B by 1/1/16.
  • 3. Big Data Challenges Data Scientists spend too much time ‘data wrangling’ “Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.” NY Times – August 17th, 2014
  • 4. Big Data Challenges: Building a data science platform is very difficult. Infrastructure • Choosing big data technologies and setting up a cluster can easily take 9 months or more. Data Pipeline • Building a high-performing big data schema requires specialized skills. • Extracting, transforming, and loading data (data wrangling) is an enormous time sink and a poor use of data scientists' time. Analysis and Workflow • Figuring out how to ask questions of the data and how to visualize the results takes time that data scientists should be using to generate actionable insights from their studies.
  • 6. Julia Language & iJulia Notebook UI: Julia is a rising star in scientific programming, offering processing speed, support for parallel processing, compatibility with 400+ prebuilt statistical packages, and a large and growing number of visualization libraries. Trade-Offs
  • 7. Why Julia? R vs Python vs Julia Modern compiler technology Data Connectivity Package Ecosystem Functional Programming Construct Integration with Python, C, C++, R, …
  • 8. © 2014 SOASTA. All rights reserved. July 28, 2015. Trade-Offs: Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. • Columnar Database • Extremely fast query times • Attractive Economics
  • 9. Why Redshift? Hadoop vs BigQuery vs Redshift vs … Capability: managed Big Data up to 2 petabytes. Cloud Economics: $1,000 per TB per month.
  • 10. Now Let’s Talk Architecture
  • 11. Data Science Workbench: Data science without the data wrangling, and much more. Infrastructure, Data Pipeline, Analysis and Workflow • Data Science Workbench comes with the state-of-the-art technology you need to analyze your customer experiences • All of the real user beacon data is loaded into Data Science Workbench in a highly optimized schema, ready for analysis • Data science is done with Julia, a remarkably fast, in-memory solution for analyzing huge data sets • Access to an ever-growing library of analysis functions and visualizations based on SOASTA’s and our customers’ expertise
  • 12.
  • 13. The Result! • Every customer beacon is unpacked, transformed, and loaded nightly by SOASTA into a SOASTA-designed schema in Amazon Redshift. This process is designed, supplied, and supported by SOASTA. • Amazon Redshift is an extremely inexpensive and powerful big data database that can scale to almost 2 petabytes in size. Amazon estimates compute and storage costs of $1,000/TB/month for our implementation. • An online, interactive explore, discover, and develop interface based on the Julia scientific programming language developed at MIT and the iJulia Notebook UI • SOASTA-developed functions and statistical models
  • 15.
  • 16. Trivia Time! 1983 1995 @DanBoutinSOASTA
      procedure Traffic is
         type Airplane_ID is range 1..10;             -- 10 airplanes
         task type Airplane (ID: Airplane_ID);        -- task representing airplanes, with ID as initialisation parameter
         type Airplane_Access is access Airplane;     -- reference type to Airplane
         protected type Runway is                     -- the shared runway (protected to allow concurrent access)
            entry Assign_Aircraft (ID: Airplane_ID);  -- all entries are guaranteed mutually exclusive
            entry Cleared_Runway (ID: Airplane_ID);
            entry Wait_For_Clear;
         private
            Clear: Boolean := True;                   -- protected private data - generally more than just a flag...
         end Runway;
         type Runway_Access is access all Runway;
  • 17. Thank You! Dan Boutin – Senior Product Evangelist dboutin@soasta.com Mobile (404) 304-9529 @DanBoutinSOASTA
  • 18. July 23, 2015 “Let’s turn Real User Data into a Science!” Dan Boutin – Senior Product Evangelist