Data Science Day New York: Data Scientist - The New Data Analyst


Learn what a Data Scientist is, what Data Scientists do, and what kind of problems Data Scientists are solving today.

Published in: Technology

  • 90% of industrial machine learning is feature extraction. What I really do is ETL.
  • Drive revenue and new business. Data is too important to be left to business people. The data warehouse is where data goes to die: it is used only for operational reporting or diagnosing problems. When we’re talking about data products, we’re talking about creating new revenue streams, optimizing existing ones, and solving problems for customers and for the business.
  • 1) Means I collect everything: I don’t want to waste time getting data from operational systems every time I need something new. 2) Means I keep all of the phases of data available to me, from the raw stuff, to the cleansed stuff, to the joined stuff. 3) Has implications for denormalization: we go beyond dimensional modeling to full-on denormalization, usually along the lines of one of our conformed dimensions (product, customer, etc.).
  • Similar to the EDW team. For a small data warehouse or data mart, the DW architect is the ETL developer, the DBA, the dashboard builder, and the business analyst all rolled into one. When we are talking about data products (classifiers, recommenders, interactive or real-time data tools), we need to bring in the ability to take things to production.
  • Most important decision: the metrics you’re going to use to measure performance. It is an anti-pattern to solve a problem exactly once. You should either solve a problem 0 times or N times.
  • Time is money. Your time costs a lot more than data storage does. Acquiring data, processing data, and reusing code are all things you do to save money over the long term.
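
The notes above on collecting everything, keeping every phase of the data, and denormalizing along a conformed dimension can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's own pipeline: the records, the field names, and the helpers `cleanse` and `denormalize` are all hypothetical.

```python
# Phase 1: raw records, kept exactly as collected (hypothetical sample data).
RAW_ORDERS = [
    {"order_id": "1", "customer_id": "c1", "amount": " 20.00 "},
    {"order_id": "2", "customer_id": "c2", "amount": "5.50"},
    {"order_id": "3", "customer_id": "c1", "amount": "n/a"},  # unparseable amount
]

# A hypothetical customer dimension, keyed by customer_id.
CUSTOMERS = {
    "c1": {"name": "Ada", "region": "East"},
    "c2": {"name": "Grace", "region": "West"},
}


def cleanse(raw_rows):
    """Return new, typed rows; rows whose amount cannot be parsed are skipped."""
    cleaned = []
    for row in raw_rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # the raw row is still available in RAW_ORDERS
        cleaned.append(
            {
                "order_id": int(row["order_id"]),
                "customer_id": row["customer_id"],
                "amount": amount,
            }
        )
    return cleaned


def denormalize(clean_rows, customers):
    """Copy the customer attributes onto every order, giving one wide row per order."""
    return [{**row, **customers[row["customer_id"]]} for row in clean_rows]


# Phase 2 and phase 3 are derived without overwriting the earlier phases.
CLEAN_ORDERS = cleanse(RAW_ORDERS)
WIDE_ORDERS = denormalize(CLEAN_ORDERS, CUSTOMERS)
```

Each phase is a separate value, so a later question can start from the raw, cleansed, or joined data without going back to the operational system.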

    1. Data Scientist – The New Data Analyst. Josh Wills, Senior Director of Data Science
    2. About Me
    3. What Do Data Scientists Do?
    4. What I Think I Do
    5. What Other People Think I Do
    6. What I Actually Do
    7. Think Like a Data Scientist
    8. Solving Problems vs. Finding Insights
    9. Parallelize Everything
    10. Abundance vs. Scarcity
    11. Building Data Products
    12. Create a Data Science Team
    13. Choose Good Problems
    14. Design the Model
    15. Mind the Gap
    16. Amortize Costs
    17. Measure Everything
    18. Rinse and Repeat
    19. Work Like a Data Scientist
    20. Introduction to Data Science: Building Recommender Systems, December 12-14, New York, NY
    21. Thank you! Josh Wills, Director of Data Science, Cloudera @josh_wills