Copyright © 2018 , Oracle and/or its affiliates. All rights reserved. |
Siyuan Yin
Solution Engineer, Oracle Reston Hub
Stuti Deshpande
Solution Engineer, Oracle Reston Hub
Agenda for today
- Data Science in Sandbox vs. Industry
- Data Science Challenges Break-down
- Oracle Approach to DS workloads
- Demo time!
- Q&A
A Little Bit About Me...
- Background in Cognitive Psychology and Information Science
- Oracle Cloud Solution Engineer focusing on data science solutions using cloud
- Multiple DS/ML projects in program & with startups in industry, mostly on user behavior & engagement
...little did I know the real world is way more interesting
Data Science in Sandbox
- Where the grass is green, the data is clean, your teammates are data geniuses, and you can focus 100% on playing with fancy machine learning models
Workflow in Sandbox
Data: download to laptop
Open source frenzy!
Try different models/classifiers, choose deep learning
Evaluate & Tuning
Report performance with confusion matrix, AUC, MSE, etc.
Scoring for competition, or write a 13-page report
Hooray!
This workflow oversimplifies the industry:
1. Machine learning as a toy, and the only solution
2. Goal: cutting-edge technology
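The three metrics named in the sandbox workflow (confusion matrix, AUC, MSE) are simple enough to compute by hand. The sketch below is illustrative only: the function names, the 0.5 threshold, and the toy labels and scores are assumptions made for this example and do not come from the talk or from any Oracle product. In practice a library such as scikit-learn would usually be used instead.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mse(y_true, y_pred):
    """Mean squared error for a regression target."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, invented for illustration.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_matrix(labels, predicted))
print(round(auc(labels, scores), 3))
print(mse([3.0, 5.0], [2.0, 7.0]))
```

Note that the confusion matrix depends on the chosen threshold, while AUC is computed from the raw scores and does not.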
Workflow in Industry
Data Collection
Workflow in Industry
Business problem
- What value can you add?
Improve product
- Inferential stats
- A/B testing
Build new feature
- Market/domain analysis
Evaluation metrics?
- User engagement
- User impression
What data?
- User behavior, feature usage, images
How to get data?
- Pipelines, connectors
What if no data?
Predicting housing price from a picture
- How good is this feature?
Data Collection
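The "Inferential stats" and "A/B testing" items on this slide can be made concrete with a two-proportion z-test, one common way to check whether a product change moved a conversion-style metric. This is a minimal sketch under stated assumptions: the function name and the visitor and conversion counts are hypothetical, and the talk does not say which test the presenters actually use.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / n_a: conversions and visitors in the control group (A).
    conv_b / n_b: conversions and visitors in the variant group (B).
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1000 users per arm, invented numbers.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only says the difference is unlikely to be noise; whether the lift is worth shipping is still the business question the slide starts from.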
Workflow in Industry
Data Collection
Open source frenzy!
Build models/classifiers, choose deep learning
Problem Definition
Data Storage
Workflow in Industry
Tools installed and supported both in production environment?
Access to compute/storage resources when needed?
Diversity: how to collaborate when teammates use different tools?
Python (2.7 vs. 3.4), R, Julia, SAS, Excel... Compatible?
Evaluate current price prediction feature
Open source vs. Proprietary
Environment & Resource
Workflow in Industry
Data Collection
Open source/Proprietary DS tools
Problem Definition
Resource Allocation
Data approaches based on problem
Evaluate & Optimization
Performance Analysis
Productionize
Profit!
Version Control
Performance Monitoring
Sandbox vs. Industry
Sandbox: 100% on FANCY MODELS!
Industry, 80% of time:
- Begging for data,
- Begging for resources,
- 50 meetings with marketing people to prove “this will bring $$$”,
- Begging software engineers to help deploy your result
Industry, 20% of time:
Data Collection
Data Storage
Discovery Data Analysis
Inference
Machine Learning & Optimization
AI & Deep Learning
+ Marketing
+ Infrastructure & Architecture
+ more...
One-size-fits-all solutions?
Problems break-down
Talent
• Business sense
• Domain knowledge
• Data Science skill sets
• Be skeptical about results
Technology
• Resources
• Environment supporting major DS tools
• Performant model/algorithm
Operation
• Long wait for resources
• Difficult collaborating with talents
• No standardized workflow
What Oracle is solving!
Oracle’s Approach
Oracle Data Science Cloud
Oracle PaaS & IaaS
Projects
Notebooks
Open Source Languages & Libraries
Version Control
Use Case Templates
Model Build & Train
Self-Service Scalable Compute (OCI)
Object Store
Catalog
Data Lake
Streaming
Autonomous Data Warehouse
Model Deployment
Model Monitoring
Access Controls & Security
Resources
- Scalable
- Self-service access
Collaboration
- Native support for GitHub, GitLab, Bitbucket
- Project-driven UI
DS-oriented environments
- Support latest open source tools
- Preconfigured environments
Operationalize ML models
- Easy deployment as API
Resource Request
Data Collection
Open source/Proprietary DS tools
Problem Definition
Data approaches based on problem
Evaluate & Optimization
Performance Analysis
Productionize
Profit!
Version Control
Oracle Data Science Cloud
Performance Monitoring
As our own customer, we are happy
Data Scientists
● Self-sufficient
● Collaborative environment
● Sustainable, standardized workflow
IT Admins
● Preconfigured, data-scientist-approved environments
● Scalable resources, low maintenance
Software Engineers
Business Stakeholders
● Integrate data workflow with business decision-making process
● Easy collaboration with data scientists
● Access to performant, production-ready solutions
● Easy deployment in all scenarios, no more being “dragged and dropped”
Demo Time!
Demo 1 - Customer Churn Analysis
Business Problem:
- Which customers will terminate their service in the next 30 days?
- How can we know in advance?
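Before any model can answer the churn question, "terminate in the next 30 days" has to become a label. The sketch below shows one way to derive that target from termination dates relative to a snapshot date. It is an assumption-laden illustration, not the demo's code: the function name, the customer ids, the dates, and the choice to drop customers who had already left at the snapshot are all hypothetical.

```python
from datetime import date, timedelta


def label_churn(termination_dates, snapshot, horizon_days=30):
    """Map customer id -> 1 if service ends within the horizon, else 0.

    termination_dates: dict of customer id -> date the service ended,
        or None if the customer is still active.
    snapshot: the date on which the prediction would be made.
    """
    cutoff = snapshot + timedelta(days=horizon_days)
    labels = {}
    for customer_id, ended in termination_dates.items():
        if ended is not None and ended <= snapshot:
            # Already gone at snapshot time: nothing left to predict.
            continue
        churned = ended is not None and snapshot < ended <= cutoff
        labels[customer_id] = int(churned)
    return labels


# Hypothetical customers, invented for illustration.
customers = {
    "a": None,                # still active
    "b": date(2018, 1, 15),   # leaves inside the 30-day window
    "c": date(2018, 3, 1),    # leaves later, outside the window
    "d": date(2017, 12, 1),   # left before the snapshot
}
print(label_churn(customers, snapshot=date(2018, 1, 1)))
```

Only information available on or before the snapshot date should feed the features; anything later would leak the answer to "How can we know in advance?".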
Q&A
BTW, there is one more thing...
Oracle Cloud Trials!
- $500 credits
- Unlimited access to Oracle cloud products
- Sign up at Oracle table

Oracle Data Science Platform


Editor's Notes

  • #6 These datasets are relatively small, clean, and machine-learning heavy. They are commonly used in the “Sandbox”. Picture sources: Titanic: http://davidabramsbooks.blogspot.com/2012/04/soup-and-salad-titanic-books-reading.html Iris Classification: https://www.mygardenlife.com/plant-library/2244/iris/species MNIST Dataset: https://github.com/Orrimp/mnist_neural_net, http://yann.lecun.com/exdb/mnist/ Zillow: https://www.kaggle.com/zillow/zecon (banner)
  • #8 What is wrong with this workflow? It oversimplifies the industry.
  • #9 Picture source: Stop sign: http://vampireknight.wikia.com/wiki/File:Stop_Sign.png
  • #11 Picture Source: Stop sign: https://memegenerator.net/img/images/12726641/stop-signs.jpg
  • #14 Now we can see why people love the sandbox mindset. Picture Source: Beer: https://i.imgflip.com/zghl1.jpg Cat: https://kittentoob.com/wp-content/uploads/2013/04/cat.jpg
  • #22 Picture Source: Question & Answers: http://medgyne.com/wp-content/uploads/2017/04/IMG_5892.jpg