BUILDING BETTER PREDICTIVE MODELS WITH
COGNITIVE ASSISTANCE IN A DATA SCIENCE
ECOSYSTEM
Dr. Alex Liu
Chief Data Scientist
Analytics Services @ IBM
aliu@us.ibm.com
Sep 12, 2018
NASA JPL SVCP
ALEX LIU INTRODUCTION
Chief Data Scientist – Analytics Services at IBM
A data science thought leader
Chief Data Scientist for a few corporations before joining IBM
Taught advanced data analytics for the University of Southern California and the University of California at Irvine
Consulted for the United Nations, Ingram Micro …
M.S. and Ph.D. from Stanford University
DATA SCIENCE: TURNING DATA INTO VALUE WITH MODELS
Data science produces insights and value via
a complicated process
a big set of tools
BigInsights (HDFS)
Cloudant (DBaaS)
dashDB (Analytics)
Swift (Object Storage)
SQDB (Managed DB2)
DATA SCIENCE PROJECTS RETURN VERY VALUABLE RESULTS, BUT MANY FAIL
Netflix, for example, integrates data science into each part of their
business; they estimate a billion dollars in incremental value from their
personalization and recommendation alone.
Knight Capital Group, for instance, lost $440 million in 45 minutes after
a mistake in updating a model.
Gartner estimated that 60% of big data projects failed in 2016, and in 2017 as well.
Reproducibility crisis & fast insight demands
DATA SCIENCE – COMPLICATED
VERY COMPLICATED FLOWS JUST FOR MODEL BUILDING STAGE
• More than 50 different algorithms: SVM, Neural Net, Decision Trees/Forests, Naïve Bayes,
Regression, SMO, k-nearest Neighbor, Clustering, Rules, …
• Combinatorially explosive number of parameter choices per algorithm: kernel type, pruning
strategy, number of trees in a forest, learning rate, …
• Wide variation in performance across different algorithm implementations (e.g., SPSS vs
Python vs WEKA vs SPARK …)
• User-Defined algorithms
• Substantial cost in user and compute time
• User spends time on trying new combinations and parameters
• Computational cost for training a single SVM can exceed 24h
• Selection commonly based on data scientist bias
• Each additional pipeline stage increases complexity dramatically!
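To make the combinatorial growth on this slide concrete, the sketch below counts the configurations in a small, hypothetical search space. The stage names, algorithms, and option counts are illustrative assumptions, not figures from the talk; only the arithmetic is the point. Parameter options within one algorithm multiply, alternative algorithms within a stage add, and the stages of a pipeline multiply again.

```python
from math import prod

# Hypothetical search space (assumed for illustration, not from the talk).
# Each stage maps an algorithm name to the number of options per tunable parameter.
SEARCH_SPACE = {
    "preprocessing": {
        "none": [],
        "scaling": [3],              # e.g. three scaling methods
        "pca": [5],                  # e.g. five choices of component count
    },
    "feature_selection": {
        "none": [],
        "top_k": [4],
    },
    "model": {
        "svm": [4, 6, 6],            # kernel type, C, gamma
        "random_forest": [5, 4, 3],  # number of trees, max depth, split rule
        "naive_bayes": [3],
        "knn": [10, 2],              # k, distance metric
    },
}


def configs_per_algorithm(option_counts):
    """Parameter choices multiply; an algorithm with no parameters counts once."""
    return prod(option_counts)  # prod([]) == 1


def configs_per_stage(stage):
    """Alternative algorithms within a stage add up."""
    return sum(configs_per_algorithm(counts) for counts in stage.values())


def pipeline_size(space):
    """Stages are chosen independently, so their counts multiply."""
    return prod(configs_per_stage(stage) for stage in space.values())


if __name__ == "__main__":
    running = 1
    for name, stage in SEARCH_SPACE.items():
        running *= configs_per_stage(stage)
        print(f"after adding {name}: {running} pipelines")
```

With these made-up counts, three stages already yield 9 × 5 × 227 = 10,215 candidate pipelines, which is why each additional stage makes exhaustive manual exploration impractical.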
IMPORTANCE OF AUTOMATION & COMMUNITIES
AUTOMATION ~ Compare data scientists with and without computer-based augmentation
Show that computer-augmented data science can reduce time-to-result by an order of magnitude and improve quality of results
COMMUNITY ~ Self-learn and validate using open competitions or evaluations (e.g., Kaggle, OpenML) and IBM customer engagements
DS ASSISTED BY AI WITHIN A DS COMMUNITY
1) Bring automation into key areas of large-scale data analysis tasks
• Overcome “analytic decision overload” for data scientists
• Enable data scientists to:
  • view and interact with the decision-making process in an online fashion
  • obtain rapid insights from data to answer key questions
2) Integrated system of tools, working with DS communities
• An integrated system for scientists to easily handle data, analytical, and application needs
• Upload and prepare data from various sources
• Cross-platform modeling and machine learning implementation
• Cross-platform analytic deployments on Big Data platforms
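One way to picture the automation described above is a search loop that tries configurations on the data scientist's behalf and reports its best result so far after every trial, so progress can be watched and the run stopped early. The sketch below is a minimal, assumed illustration using random search over a toy scoring function. `toy_score`, the parameter names, and the ranges are hypothetical stand-ins for real model training and validation, and this is not the IBM Research system named on the slide.

```python
import random
from typing import Callable, Dict, Iterator, Tuple

Config = Dict[str, float]


def toy_score(cfg: Config) -> float:
    """Hypothetical stand-in for 'train a model and return validation accuracy'.

    Peaks at learning_rate=0.1 and depth=6; a real system would fit a model here.
    """
    return 1.0 - (cfg["learning_rate"] - 0.1) ** 2 - 0.01 * (cfg["depth"] - 6) ** 2


def sample_config(rng: random.Random) -> Config:
    """Draw one configuration from assumed parameter ranges."""
    return {
        "learning_rate": rng.uniform(0.001, 0.5),
        "depth": float(rng.randint(1, 12)),
    }


def automated_search(
    score: Callable[[Config], float],
    n_trials: int,
    seed: int = 0,
) -> Iterator[Tuple[int, Config, float]]:
    """Yield (trial number, best config so far, best score so far) after each trial.

    Yielding after every trial is what lets a person follow the search online
    instead of waiting for the whole run to finish.
    """
    rng = random.Random(seed)
    best_cfg: Config = {}
    best_score = float("-inf")
    for trial in range(1, n_trials + 1):
        cfg = sample_config(rng)
        s = score(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
        yield trial, best_cfg, best_score


if __name__ == "__main__":
    for trial, cfg, best in automated_search(toy_score, n_trials=50):
        if trial % 10 == 0:
            print(f"trial {trial:3d}: best score so far {best:.4f} with {cfg}")
```

Because the search is a generator, the caller decides when to stop consuming results, which keeps the human in the loop rather than replacing them.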
IBM Research
Augmentation vs. Automation
Watson Studio platform diagram: One Platform for IBM Analytics Team
Data sources: data from the IBM Cloud & third party clouds; on-premises data; 32 different connections
Persistence and cloud services: Db2 Warehouse on Cloud, IBM Cloudant, IBM Cloud Object Storage, IBM Compose
Data science tools: RStudio, Jupyter Notebooks
Platform: IBM Cloud, Watson Studio, Watson Studio Local, Watson Data Platform, IBM Analytics Engine
Components: Spark ML, Hadoop, Data Refinery, Cognos, Watson Analytics, Dashboards, Plugin
Roles: Data Steward, Data Scientist, Data Engineer, Developer
IBM Data Science Experience summary
TAKING A DATA SCIENCE ECOSYSTEM APPROACH
A DATA SCIENCE ECOSYSTEM HAS THREE BASIC ELEMENTS
1) DATA PORTAL
2) DATA SCIENCE COMMUNITY
3) DATA SCIENCE PLATFORM
RMDS COMMUNITIES AT IBM GLENDALE
Pasadena/Glendale Meetup Community
Local face to face community – more than 1100 members
https://www.meetup.com/RMDS_LA/
https://www.linkedin.com/groups/1895501 has 29K participants
Aim to create an environment for utilizing big data analytics to create smart cities and smart commerce
EX1: Citizen data science ecosystem with open data
105,000+ collections
349 citizen apps
500,000 data resources
175 agencies
450 APIs
Source: City of LA Mayor’s Tech Advisor presentation at RMDS Meetup.
EXAMPLE – 1KM VISIBLE (GOES-R WILL BE
EVEN BETTER)
http://www.ibm.com/weather
EX2: A data science ecosystem with weather data
Diagram labels: Weather Data; Transaction; IoT Data; Platform ~ IBM DSX; Connecting all the data scientists from a DS community; Analytical Insights for Transformation; Applications; Optimizing Operations; Solutions
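As a rough illustration of what connecting weather data to transaction data can look like in practice, the sketch below joins two small tables on date and measures how strongly daily sales track temperature. Every value, field name, and function name is made up for the example; none of it is IBM or weather-service data, and a real ecosystem would pull both feeds through the platform's data connections.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

# Hypothetical daily weather feed: date -> maximum temperature in Celsius.
WEATHER: Dict[str, float] = {
    "2018-07-01": 24.0,
    "2018-07-02": 27.5,
    "2018-07-03": 31.0,
    "2018-07-04": 33.5,
    "2018-07-05": 29.0,
}

# Hypothetical transaction feed: date -> units of cold drinks sold.
SALES: Dict[str, int] = {
    "2018-07-01": 110,
    "2018-07-02": 135,
    "2018-07-03": 170,
    "2018-07-04": 190,
    "2018-07-06": 120,  # no matching weather record; dropped by the join
}


def join_on_date(
    weather: Dict[str, float], sales: Dict[str, int]
) -> List[Tuple[str, float, int]]:
    """Inner join: keep only dates present in both feeds, in date order."""
    return [(d, weather[d], sales[d]) for d in sorted(weather.keys() & sales.keys())]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    rows = join_on_date(WEATHER, SALES)
    temps = [t for _, t, _ in rows]
    units = [u for _, _, u in rows]
    print(f"{len(rows)} matched days, correlation = {pearson(temps, units):.3f}")
```

The inner join silently drops days that appear in only one feed, so a real pipeline would also want to report how many records were lost at that step.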
A MAJORITY OF RETAIL AND CP EXECUTIVES INDICATE WEATHER HAS A SIGNIFICANT IMPACT ON BUSINESS DECISION-MAKING
Weather either influences all human decisions or triggers automated actions in the following areas
51% Worker allocation and staff scheduling
50% Work safely
50% Inventory pricing
45% Customer interactions
41% Marketing / messaging
40% Inventory placements
39% Routes and transportation
35% Supply chain / sourcing
33% Product development

