Description
Slides from RubyConf 2012 talk:
"Big data and data science have become hot topics in the developer community during the past year. This talk will show how Ruby is used to build real data-driven products at scale.
Data scientist Ryan Weald walks through the building of data-driven products at Sharethrough, from exploratory analysis to production systems, with an emphasis on the role Ruby plays in each phase of the data-driven product cycle.
He discusses how Ruby interacts with other data analysis tools -- such as Hadoop, Cascading, Python, and JavaScript -- with a constructive look at Ruby's weaknesses, and presents suggestions on how Ruby can contribute more to data science in the areas of visualization and machine learning."
4. Sharethrough
Native video advertising platform
Friday, November 2, 12 4
5. Outline
1) What is a data-driven product?
2) What does the development cycle look like for a data-driven product?
3) Where does Ruby fit in the world of “data science”?
4) How can Ruby be improved to stay relevant in the age of “big data”?
21. Data dump
% of users on publisher X and Y
What is the value of a user on an ad network?
What is the supply of a given type of user?
Can we predict it?
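The first question on the slide above can be read as "what share of users were seen on both publisher X and publisher Y?" A minimal Ruby sketch of that reading follows. The `VIEWS` log, its `[user_id, publisher]` layout, and the `percent_on_both` helper are all hypothetical, invented here for illustration; nothing in the slides specifies how the data was stored or queried.

```ruby
require "set"

# Hypothetical event log: one [user_id, publisher] pair per ad view.
# The layout is an assumption, not taken from the talk.
VIEWS = [
  ["u1", "X"], ["u1", "Y"],
  ["u2", "X"],
  ["u3", "Y"], ["u3", "Y"],
  ["u4", "X"], ["u4", "Y"]
].freeze

# Returns the percentage of all distinct users who were seen on both
# pub_a and pub_b. Returns 0.0 when the log is empty.
def percent_on_both(views, pub_a, pub_b)
  # Group distinct user ids by publisher.
  users_by_pub = Hash.new { |hash, key| hash[key] = Set.new }
  views.each { |user, pub| users_by_pub[pub] << user }

  # Union of every publisher's users gives the distinct user base.
  all_users = users_by_pub.values.reduce(Set.new, :|)
  return 0.0 if all_users.empty?

  # Set intersection gives users present on both publishers.
  both = users_by_pub[pub_a] & users_by_pub[pub_b]
  100.0 * both.size / all_users.size
end

puts percent_on_both(VIEWS, "X", "Y")
```

With the sample log, two of the four distinct users appear on both publishers, so the script prints 50.0. At production scale the same grouping would more plausibly run in a batch system such as Hadoop, which the talk description mentions, with Ruby used for exploration.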
22. Phase 2
Data Collection & Cleaning
53. Ruby Improvements
• Graphing library
• Unified matrix and vector library
• More publishing around Ruby & ML
• Academic buy-in
Ruby + Data = Agile Data Products