Data science is changing how businesses operate. Artificial intelligence, machine learning, and data-driven decision making are being incorporated into nearly every aspect of business, yet many software developers have not learned how to apply these practices to their own work. In this presentation, we’ll look at how expert developers use data science to create better software: applying data analytics, machine learning, and anticipatory design to build more intelligent software, and using data from the DevOps pipeline to improve software development practices.
27. Data Science Skills
Programming
Working with data
Descriptive statistics
Data visualization
Statistical modeling
Handling Big Data
Machine learning
Deploying to production
29. Programming Languages
Bar chart: share of respondents using each language (axis: 0% to 70%)
SQL
Python
R
Bash
JavaScript
Java
Scala
Visual Basic/VBA
C++
Matlab
Source: O’Reilly 2017 Data Science Salary Survey
30. Relational Databases
Bar chart: share of respondents using each relational database (axis: 0% to 40%)
MySQL
Microsoft SQL Server
PostgreSQL
Oracle
SQLite
Teradata
IBM DB2
Netezza (IBM)
Vertica
SAP HANA
Source: O’Reilly 2017 Data Science Salary Survey
31. Big Data Platforms
Bar chart: share of respondents using each big data platform (axis: 0% to 30%)
Spark
Hive
MongoDB
Amazon Redshift
Kafka
Pig
Redis
Zookeeper
Impala
HBase
Source: O’Reilly 2017 Data Science Salary Survey
32. Spreadsheets, BI, Reporting
Bar chart: share of respondents using each spreadsheet, BI, or reporting tool (axis: 0% to 70%)
Excel
Power BI
QlikView
BusinessObjects
PowerPivot
Cognos
Alteryx
Microstrategy
Adobe Analytics
Oracle BI
Source: O’Reilly 2017 Data Science Salary Survey
41. The Data Science Process
Find a question
Collect the data
Prepare the data
Create a model
Evaluate the model
Deploy the model
Data
44. The Data Science Process
Iterative process
Non-sequential
Early termination
Find a question
Explore the data
Prepare the data
Create a model
Evaluate the model
Deploy the model
Data
74. Show me sales by gender and marital status.
“Show me sales by gender and marital status.”
Internet Sales
Displaying sum of sales by gender and marital status
Bar chart: sum of sales ($0k to $15k) for Male and Female, split by Marital Status (Married, Single)
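The request on this slide amounts to a grouped aggregation: sum the sales amount for each combination of gender and marital status. A minimal sketch in Python follows, using only the standard library. The rows, the amounts, and the function name `sum_sales_by` are made up for illustration and are not taken from the presentation or from any real dataset.

```python
from collections import defaultdict

# Hypothetical rows: (gender, marital_status, sale_amount).
# These values are invented for illustration only.
sales = [
    ("Male", "Married", 1200.0),
    ("Male", "Single", 800.0),
    ("Female", "Married", 1500.0),
    ("Female", "Single", 700.0),
    ("Male", "Married", 300.0),
]


def sum_sales_by(rows):
    """Sum sale amounts for each (gender, marital status) pair."""
    totals = defaultdict(float)
    for gender, marital_status, amount in rows:
        totals[(gender, marital_status)] += amount
    return dict(totals)


totals = sum_sales_by(sales)

# Print one line per group, sorted by gender and then marital status.
for (gender, marital_status), total in sorted(totals.items()):
    print(f"{gender:<6} {marital_status:<7} ${total:,.0f}")
```

A natural-language interface would still need to map the phrase "by gender and marital status" onto the grouping keys; this sketch covers only the aggregation step that follows.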
94. Hypothesis-Driven Development
Cycle: Hypothesis, Experiment, Analysis
Problem: Two features; similar benefits
Hypothesis: Users will prefer feature A over feature B
Experiment: Survey 100 users and ask for their preference
Level of Support: At least 60% must prefer feature A
Analysis: 80% of users prefer feature A
Decision: Only build feature A
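The decision rule on this slide can be written down as a short check: compare the observed share of users who prefer feature A against the required level of support. The sketch below uses the slide's numbers (100 users surveyed, 80 prefer A, 60% required). The function names are hypothetical, and the binomial tail calculation is an added assumption, not something the slide prescribes; it estimates how likely a result this strong would be if the true preference were only at the 60% threshold.

```python
from math import comb


def evaluate_hypothesis(prefer_a, surveyed, required_share):
    """Return the observed share and whether it meets the level of support."""
    observed_share = prefer_a / surveyed
    return observed_share, observed_share >= required_share


def binomial_tail(successes, n, p):
    """P(X >= successes) for X ~ Binomial(n, p). Added assumption, see lead-in."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1)
    )


# Numbers from the slide: survey 100 users, 80 prefer A, at least 60% required.
observed, supported = evaluate_hypothesis(prefer_a=80, surveyed=100, required_share=0.60)

# Chance of seeing 80 or more "prefer A" answers if the true share were exactly 60%.
tail = binomial_tail(successes=80, n=100, p=0.60)

decision = "Only build feature A" if supported else "Hypothesis not supported"
print(f"Observed share: {observed:.0%}, supported: {supported}")
print(f"Tail probability at the 60% threshold: {tail:.2e}")
print(f"Decision: {decision}")
```

Fixing the level of support before running the survey is what keeps the analysis honest: the threshold is part of the hypothesis, not something chosen after seeing the result.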
102. Hypothesis Stories
Pair Programming Hypothesis
We assume that pair programming will result in fewer software defects.
We will have succeeded when we have seen a 20% or greater decrease in defects after 4 sprints.
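The success criterion in this hypothesis story is a percentage decrease against a baseline, which is easy to check mechanically. The sketch below assumes the baseline is the defect count over a comparable period before pair programming started; the slide does not say how the baseline is measured, and the defect counts and function names here are hypothetical.

```python
def percent_decrease(baseline, current):
    """Percentage decrease from baseline to current (negative if it increased)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) * 100 / baseline


def hypothesis_succeeded(baseline_defects, defects_after, target_decrease_pct=20.0):
    """True when defects fell by at least the target percentage."""
    return percent_decrease(baseline_defects, defects_after) >= target_decrease_pct


# Hypothetical counts, invented for illustration only.
baseline_defects = 50  # defects in the 4 sprints before pair programming
defects_after = 38     # defects in the 4 sprints with pair programming

decrease = percent_decrease(baseline_defects, defects_after)
succeeded = hypothesis_succeeded(baseline_defects, defects_after)
print(f"Decrease in defects: {decrease:.1f}% (target: 20% or greater)")
print(f"Hypothesis succeeded: {succeeded}")
```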
125. Advice for Success
Get buy-in from leadership
Focus on low-hanging fruit
Don’t silo data science teams
Democratize your data
Embrace smart failure
Focus on feedback
Embed data collection
Avoid the Observer Effect