DESIGN FOR X
exploring data science product design with apache spark + graphlab {create}
@amcasari @Concur
data science summit 2016, san francisco
nasa
data science via random walks
➤ senior product mgr + data scientist @ Concur Labs
➤ control systems engineering + robotics + legos
➤ officer in USN
➤ operations research analyst
➤ wandering dirtbag + conservation volunteer
➤ EE + applied math + complex systems
➤ underwater robotics engineer
➤ technology consultant
➤ SAHM
INSANELY QUICK INTRO TO +
➤ Concur Accelerator Team
➤ Concur Labs
➤ Incubator (still brewing)
Typical Day at Concur
➤ 850K users log into Concur
➤ 300K expense reports processed
➤ 120K trips booked
➤ 170M trips & expense reports warehoused
How do we encourage a culture of innovation
while delivering quality service to our existing
33,000 business clients and 40M users?
DESIGN SPRINTS FOR DATA SCIENCEY PROTOTYPES
courtesy google ventures {we iterated…because data}
INSANELY QUICK INTRO TO APACHE SPARK
➤ “fast and general engine for large-scale data processing”
➤ advanced DAG execution engine with cyclic data flow + in-memory computing > runs up to 100x faster than Hadoop MR in memory, 10x on disk
➤ interactive shells in several languages (incl. SQL)
➤ performant + scalable
courtesy databricks
ALMOST AS INSANELY QUICK INTRO TO +
➤ graphlab create is based on a python data science library developed + (partly) open-sourced by turi
➤ SFrame <<>> Spark DataFrame | SparkRDD
➤ (yes it works with Open Source SFrame and GLC)
courtesy turi
WHAT PROBLEM DO WE WANT TO DATA SCIENCE?
➤ Knowledge Gaps
➤ IOT Networks
➤ Bots
➤ Fairness
+
DESIGN FOR KNOWLEDGE GAPS
➤ “We could {build this} {answer this better} if….”
➤ Reciprocal Data Applications
{diagram: three rda blocks, each labeled "choose your data storage", alongside "the app you really want to make"}
DESIGN FOR IOT NETWORKS
➤ “Can we trust our sensors?”
➤ “Has our network been hacked?”
{diagram: device, device, device; data services; alerts, notifications, monitoring dashboards}
Anomaly Detection Toolkit
TimeSeries <<>> SFrame
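The "can we trust our sensors?" question can be sketched with a rolling z-score over a series of readings. This is a stand-in technique in plain Python, not the Anomaly Detection Toolkit API; the function name, window size, threshold, and sample readings are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` readings."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as suspicious.
            if readings[i] != mu:
                anomalies.append(i)
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical temperature readings from one device, with a single spike.
sensor = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0, 20.2, 19.8]
print(rolling_zscore_anomalies(sensor))
```

Only past readings feed each window, so the same check could run on streaming data. One caveat of this simple version: a spike inflates the windows that follow it, which can hide a second anomaly arriving right after the first.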
DESIGN FOR BOTS
➤ “How do we create a conversational interface?”
….nothing new, just the burning question since Turing, 1950
what NOT to do….
{diagram: non-creepy unisex animal mascot; conversational ui; choose or create your framework; choose your data storage}
Advanced Deep Learning
Text Analysis Toolkit
Graph Analytics Toolkit
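A very small starting point for a conversational interface is keyword-overlap intent matching. The sketch below is plain Python and does not use the toolkits named above; the intent names, keyword sets, and sample utterances are hypothetical.

```python
import re

# Hypothetical intents and keywords, made up for this sketch.
INTENTS = {
    "book_trip": {"book", "flight", "trip", "hotel", "travel"},
    "submit_expense": {"expense", "receipt", "report", "submit", "reimburse"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match_intent(utterance, intents=INTENTS):
    """Return the intent sharing the most keywords with the utterance,
    or None when nothing overlaps."""
    tokens = tokenize(utterance)
    best, best_score = None, 0
    for name, keywords in intents.items():
        score = len(tokens & keywords)
        if score > best_score:
            best, best_score = name, score
    return best


print(match_intent("I need to book a flight to Seattle"))
print(match_intent("hello there"))
```

A `None` result is the cue for the bot to ask a clarifying question rather than guess. A real prototype would likely swap the keyword sets for a trained text classifier.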
DESIGN FOR FAIRNESS
➤ know your biases + limitations
➤ in your data, their data, all the data
➤ in your feature selection
➤ in your algorithm
…..because ethics (these ALL bias your results + communications)
learn more at data & society’s case studies
open source. reproducible. transparent.
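One concrete way to start checking for bias in results is to compare how often a model flags each group. The sketch below computes per-group positive rates and the gap between them in plain Python; the field names (`region`, `flagged`) and the toy records are assumptions, and a gap is a prompt for investigation, not proof of unfairness.

```python
from collections import defaultdict


def positive_rate_by_group(records, group_key, outcome_key):
    """Return {group: share of records with a truthy outcome}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Difference between the highest and lowest group rates."""
    return max(rates.values()) - min(rates.values())


# Toy, made-up model outputs: was each record flagged for review?
records = [
    {"region": "a", "flagged": True},
    {"region": "a", "flagged": False},
    {"region": "a", "flagged": False},
    {"region": "a", "flagged": False},
    {"region": "b", "flagged": True},
    {"region": "b", "flagged": True},
    {"region": "b", "flagged": False},
    {"region": "b", "flagged": False},
]

rates = positive_rate_by_group(records, "region", "flagged")
print(rates, parity_gap(rates))
```

The same check can be repeated after each change to the data, the features, or the algorithm to see which step moves the gap.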
{THANKS MUCH}
➤ Concur is hiring!
➤ SAP + SAP Ariba are hiring!
➤ example notebooks will be posted on our github in the future
concurlabs.com
github.com/concurlabs
@amcasari
