Training a New Generation of Data Scientists
 


Data scientists use data as a platform to answer previously unimaginable questions. These multi-talented data professionals are in demand like never before because they identify or create some of the most exciting and potentially profitable business opportunities across industries. However, a scarcity of existing external talent will require companies of all sizes to find, develop, and train people with backgrounds in software engineering, statistics, or traditional business intelligence to become the next generation of data scientists.

In this video, Cloudera's Senior Director of Data Science, Josh Wills, discusses what data scientists do, how they think about problems, the relationship between data science and Hadoop, and how Cloudera training can help you join this increasingly important profession. Following the video, Josh answers questions about machine learning, analytics platforms, applications of data science in different industries, and Cloudera's Introduction to Data Science course.

    Presentation Transcript

    • Training a New Generation of Data Scientists: Josh Wills | Senior Director of Data Science
    • About Me
    • What Do Data Scientists Do?
    • What I Think I Do
    • What Other People Think I Do
    • What I Actually Do
    • The Emergence of Data Science
    • Data Storage in 2001: Databases
        • Structured schemas
        • Intensive processing done where data is stored
        • Somewhat reliable
        • Expensive at scale
    • Data Storage in 2001: Filers
        • No schemas, stores any kind of file
        • No data processing capability
        • Reliable
        • Expensive at scale
    • And Then, This Happened
    • Data Economics, Return on Byte
    • Big Data Economics
        • Value = f(Bytes)
        • No individual record is particularly valuable
        • Having every record is incredibly valuable
            • Web index
            • Recommendation systems
            • Sensor data
            • Market basket analysis
            • Online advertising
    • Enter Hadoop
    • The Hadoop Distributed File System
        • Based on the Google File System
        • Data stored in large files
            • Large block size: 64MB to 256MB per block
            • Blocks are replicated to multiple nodes in the cluster
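To make the block and replication bullets on the HDFS slide concrete, here is a rough sketch of the arithmetic involved. The function name `hdfs_block_layout` is hypothetical, the 128MB block size is just one value inside the 64MB to 256MB range the slide gives, and the replication factor of 3 is an assumption (the slide only says "multiple nodes").

```python
MB = 1024 * 1024
GB = 1024 * MB


def hdfs_block_layout(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Estimate how a single file would be laid out as replicated blocks.

    Returns (num_blocks, block_replicas, raw_bytes_stored).
    block_size_bytes and replication are assumed values, not from the slides.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")
    # Integer ceiling division: a partial final block still counts as a block.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # Every block is copied to `replication` different nodes.
    block_replicas = num_blocks * replication
    # The final block only occupies the bytes it holds, so raw storage
    # scales with the file size rather than the rounded-up block count.
    raw_bytes_stored = file_size_bytes * replication
    return num_blocks, block_replicas, raw_bytes_stored


if __name__ == "__main__":
    blocks, replicas, raw = hdfs_block_layout(1 * GB)
    print(blocks, replicas, raw // GB)
```

Under these assumptions, larger blocks mean fewer blocks to track per file, while replication multiplies the raw storage the cluster needs.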
    • Simple, Reliable, Distributed Processing: MapReduce
        • Map Stage
            • Embarrassingly parallel
        • Shuffle Stage: Large-scale distributed sort
        • Reduce Stage
            • Process all the values that have the same key in a single step
        • Process the data where it is stored
        • Write once and you’re done.
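The three stages on the MapReduce slide can be illustrated with a single-process word count. This is only a toy sketch in plain Python, not Hadoop's API: the function names (`map_stage`, `shuffle_stage`, `reduce_stage`, `run_job`) are hypothetical, and the word-count task is an example chosen here rather than one taken from the slides.

```python
from collections import defaultdict


def map_stage(record):
    """Map: turn one input record into (key, value) pairs.

    Each record is handled independently, which is what makes this
    stage embarrassingly parallel on a real cluster.
    """
    for word in record.lower().split():
        yield word, 1


def shuffle_stage(pairs):
    """Shuffle: group values by key and order the keys.

    On a cluster this is a large distributed sort; here it is an
    in-memory grouping followed by a sort.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_stage(key, values):
    """Reduce: process all values that share a key in a single step."""
    return key, sum(values)


def run_job(records):
    """Chain map, shuffle, and reduce over an iterable of text records."""
    mapped = (pair for record in records for pair in map_stage(record))
    return [reduce_stage(key, values) for key, values in shuffle_stage(mapped)]


if __name__ == "__main__":
    for word, count in run_job(["big data", "Big cluster"]):
        print(word, count)
```

The split into three small functions mirrors the slide: only the map and reduce logic is specific to the problem, while the shuffle is generic and would be handled by the framework.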
    • Thinking Like a Data Scientist
    • Solving Problems vs. Finding Insights
    • Parallelize Everything
    • Abundance vs. Scarcity
    • Building Data Products
    • Create a Data Science Team
    • Choose Good Problems
    • Design the Model
    • Mind the Gap
    • Amortize Costs
    • Measure Everything
    • Rinse and Repeat
    • Work Like a Data Scientist
    • Train Like a Data Scientist
        • Introduction to Data Science
        • Hive and Pig Training
        • Hadoop Developer Training
    • Introduction to Data Science: Building Recommender Systems (http://university.cloudera.com/)
    • Thank you for attending!
        • Submit questions in the Q&A panel
        • Watch on-demand video of this webinar at http://cloudera.com
        • Follow Josh on Twitter @josh_wills
        • Follow Cloudera University @ClouderaU
        • Register now for Cloudera training at http://university.cloudera.com
        • Use discount code DSvideo_10 to save 10% on new enrollments in Cloudera-delivered training classes until June 1
        • Use discount code 15off2 to save 15% on enrollments in two or more Cloudera-delivered training classes until June 1