A look back at how the practice of data science has evolved over the years, at current trends, and at where it might be headed. The talk starts before anyone had the title "data scientist" on their resume, moves through the dawn of the cloud and big data, and covers the new tools and companies trying to push the state of the art forward. It closes with some wild speculation on the future of data science.
Presentation given to the Seattle Data Science Meetup on Friday, July 24, 2015.
The Evolution of Data Science: From Mainframes to the Cloud
1. The Evolution of Data Science
Kenny Daniel
CTO, Algorithmia
July 24, 2015
2. Kenny Daniel - CTO, Algorithmia
• Graduate research in Artificial Intelligence and Mechanism Design
• Multiple published algorithms and papers in Machine Learning
• Received $1 million from DOT “Engineering Tomorrow’s Transportation Market”
• B.S. Carnegie Mellon University, M.S., Ph.D. (on leave) USC
• Data Scientist and Computer Vision specialist for Delectable, Inc
• Initial and current overall architect of Algorithmia Platform
6. Source and Inspiration: http://www.slideshare.net/AlbertWenger/the-no-stackstartup
1990s
Connectivity: $10,000 per month
Servers: $20,000 per box
Storage: $1,000/GB
2000s
Connectivity: $1,000 per month
Servers: $1,000 per box
Storage: $10/GB
2010s
Connectivity: 10 cents/GB
Servers: 20 cents/hour
Storage: 12 cents/GB
NOW
Backend using Parse
Search using Algolia
Synchronization using Firebase
Video calls and SMS using Twilio
Payments using Stripe
Video recording using Ziggeo
Send and track emails using Mailgun
Customer service using Intercom
Ship product using Shyp
7. “no one got fired for using AWS”
cost, security, convenience
8. “We used to leak memory.
Now we leak instances.
Soon we will leak entire data centers.”
- Dan Kaminsky
9. A new field is born
Previously, data analysis was done by domain experts
Now, shift toward data science as its own field
13. “Data is inherently dumb. It doesn’t actually do anything unless
you know how to use it...
The next digital gold rush will be focused on how you do
something with data.”
- Peter Sondergaard (Gartner Research)
14. 1990s
Technology: HPC, Mainframes
Users: Researchers, hw engineers, committees
2000s
Technology: In-house clusters
Users: Corporations, tech startups
2010s
Technologies: Cloud, Hadoop, Spark
Users: Individual data scientists
NOW
Generalist Big Data such as Amazon EMR
Large Data Processing such as Databricks
Real Time Processing such as Amazon Kinesis
Data Repositories such as Socrata
Data Collectors such as Kimono
DSaaS for Customer Analytics such as Captricity
DSaaS for Marketing such as Acxiom
DSaaS for Security such as Fortscale
Hosted Machine Learning such as BigML, Dato
Algorithms-as-a-Service such as Algorithmia
16. Future of Data Science
● How will these trends continue?
● What will future tools look like?
● What is the role of data scientists going forward?
17. Future… new data sources
Data is less structured, and less amenable to traditional data analysis without pre-processing
● Unstructured text
● Images
● Video