Data Scientist Enablement DSE 400 - Week 1


Published in: Education, Technology

  1. Content of this document is under Creative Commons BY-NC-SA. Data Scientist Enablement, DSE 400 - Fast Track to Data Science: Week 1 Roadmap. Advanced Center of Excellence, Modern Renaissance Corporation, in collaboration with the SONO team and others.
  2. Agenda: Welcome; Mission and Objectives; DSE Roadmap; DSE 400 at a glance; Week 1 at a glance; Discussions; Learning; Practice; Assignments and Submission; Looking ahead; References; Acknowledgement. You can always find the latest version of this document at [link missing].
  3. Welcome: Welcome to the DSE 2014 Track. You are in one of the most exciting programs to disseminate knowledge, diffuse advancements and stimulate adoption of Data/Decision Sciences, Big Data Analytics and what we call Evidence-Oriented Systems Engineering. The content and the courses are designed to be easy, engaging and engendering. We hope this program will also be most rewarding for you from intellectual, pragmatic and professional development perspectives.
  4. Mission and Objectives: The mission of our program is to provide free, open and world-class enablement of Data Scientists and to help advance the profession of Data Science and allied disciplines. We aim to equip participants with analytical and practical skills, emphasizing breadth and depth in a range of relevant disciplines and capabilities in Data/Decision Sciences, Big Data Analytics, Architecture and Systems Engineering.
  5. Data Scientist Enablement Roadmap - 2014: Fast Track to Data Science; Ramping up Machine Learning with R; Modern Data Platforms; Advanced Techniques in Big Data Analytics. “A Data Scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human.” - Rachel Schutt and Cathy O’Neil, Doing Data Science
  6. DSE 2014 with tentative timeline: Fast Track to Data Science, Jan 19 - Mar 15; Ramping up Machine Learning with R, Mar 30 - May 10; Modern Data Platforms, May 25 - July 5; Advanced Techniques in Big Data Analytics, July 20 - Aug 30.
  7. DSE 400 at a glance: An introductory course with no prerequisites. It employs a socialized learning paradigm involving individual effort, teamwork, discussions and collaboration on the SONO (Social Knowledge) platform. Topics include Algorithms, Statistical Inference, Data Analysis, Hadoop, R, Data Engineering, Machine Learning, Visualization, Applications and Case Studies, covered with a variety of tools and techniques.
  8. DSE 400 - Week 1 at a glance. Discussions (on SONO): welcome, introductions, programming and analytics background, etc. Reading plan: read Chapters 1-3 of An Introduction to Data Science by Jeffrey Stanton, and Big Data [sorry] & Data Science: What Does a Data Scientist Do? Activities: installing R and RStudio; Fun with Math; playing with ML datasets; research on data visualization tools, etc. Assignment 1: download the Housing dataset from the UCI Machine Learning Repository to your local machine or cloud drive, import it into your R environment and display it.
  9. Social Engagement on SONO: Log in to the SONO Community. Visit our Jump Pad (or Knowledge Domain) called DSE 400. Go to DSE 2014 Global, then join the right participant group based on the first letter of your last name. Also feel free to explore other knowledge-rich communities on SONO.
  10. Social Engagement on SONO - Week 1. Discussion 1: Welcome to the DSE program. Discussion 2: What programming languages are you familiar with? What languages do you use on a day-to-day basis? Do you have any experience using the R language? What kind of analytics tools, if any, have you used before? <Optional> Discussion 3: Q&A. General questions as well as questions specific to Week 1 are welcome. To participate in these discussions, visit DSE 400 Week 1.
  11. Week 1 Reading Plan: DSE 400 is designed to be a broad introduction to Data Science, Analytics Architecture and Visualization from both learning and pragmatic perspectives. The following plan is recommended for Week 1 to kick-start the program. Read Chapters 1-3 of An Introduction to Data Science by Jeffrey Stanton. Read Big Data [sorry] & Data Science: What Does a Data Scientist Do?
  12. Activities. <Required> Visit [link missing] and follow the instructions to download and install R and RStudio. For specific advice on your system and its configuration, several how-to videos on installing R and RStudio can be found on YouTube. Skip this activity if you already have R and RStudio. <Collaborative Research> <Required> Create a presentation on Data Visualization Tools - A Comparative Study. Incorporate your unique ideas, research and collective insights to arrive at the right evaluation methodology, explain your thought process and justify your choices. Note: You will build this presentation over 4 weeks. You and your team will present it during the 5th week.
  13. Activities - contd. <Practice> Math is Fun. Quickly create a bar chart with 10 random values using the Data Graphs widget on the Math is Fun website. Change the graph to a pie chart. Display percentages only, not the original values. <Practice> Visit the UCI Machine Learning Repository. Familiarize yourself with the various datasets on this site. Feel free to download any dataset you like. We will be using this repository extensively in the DSE program. For Week 1, our focus is on just the “Housing” dataset.
  14. Assignment 1 - Submission Required: Download RStudio, in case you have not already done so. Download the Housing dataset from the UCI Machine Learning Repository to your local machine or cloud drive. Import this dataset into your R environment and display it. Show a screenshot of your environment. (See the sample image in the next slide.)
  15. Assignment 1 - Example screenshot
  16. Submissions: The deadline is Saturday, Jan 25, 11:59 PM your local time. Submit <mail to> the screenshots of your R workspace (on your machine/laptop/desktop) showing the Housing dataset. You can either paste the image into the body of the email or create a document in PDF format and send it as an attachment. No links please.
  17. Fun@Work: DSE Participant Distribution Pattern
  18. Fun@Work: Tag cloud of professional backgrounds of DSE Participants
  19. DSE 400 - Weeks 2-8 ahead. Week 2: Basic Statistics, Hypothesis Testing, Regression, Playing with Spreadsheets, Visualization with R. If you are new to Statistics or need a refresher, read ahead in Think Stats: Probability and Statistics for Programmers or watch the Statistics playlist by Khan Academy. Weeks 3-4: Intro to Machine Learning (ML) - Classification, Clustering, Prediction, Naive Bayes, Recommendations and Boosting algorithms. Week 5: Visualizations. Present your research, Data Visualization Tools - A Comparative Study. Weeks 6-7: Processing large data sets, the Hadoop ecosystem, stream computing, etc. Week 8: Ethics, Privacy and Building Data Products.
  20. References and Additional Reading: An Introduction to Data Science by Jeffrey Stanton (a good introduction to Data Science for non-technical readers, available under a Creative Commons Licence); Learning R - Video Tutorial Lessons on YouTube; R for Machine Learning by Allison Chung; The Value of Big Data Isn't the Data (HBR article); [MIT OCW] Prediction, Machine Learning and Statistics.
  21. Citation. Housing Data Set Information: Concerns housing values in suburbs of Boston. Origin: This dataset was taken from the StatLib library, which is maintained at Carnegie Mellon University. Creator: Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978. Only content that appears as is in this document is under Creative Commons BY-NC-SA. This license may not apply to material referenced here.
  22. For More Information: The DSE 2014 stream is all set to commence on Jan 19, 2014. For more details, visit the DSE 400 Announcement Page. Visit DSE 2014 Global to participate in DSE and to get to know the DSE Core Team and participants. Week 1 discussions can be found at DSE 400 Week 1. We welcome questions, thoughts and suggestions. Post these on SONO in the right forum/discussion or write to us at <>. You can always find the latest version of this document at [link missing].
  23. Acknowledgement: We thank our community of committed and passionate volunteers, experts, educators, innovators, benefactors, advisers, advocates, mentors and supporters. We are also grateful for the outstanding support and encouragement from the SONO team as well as other organizations such as R-Project, Open Courseware Consortium, MIT, IBM, HortonWorks, Stanford University, Caltech and Data Science Central.
  24. Thank You
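
A minimal sketch of the import step in Assignment 1 (slide 14), assuming the Housing dataset has already been downloaded and saved as `housing.data` in the R working directory. The file name, the `load_housing` helper and the `housing_columns` vector are illustrative assumptions rather than part of the course material. The column names follow the attribute list in the dataset's accompanying description, since the raw file is whitespace-delimited and has no header row.

```r
# Attribute names for the 14 columns (assumption: taken from the dataset's
# description file; the data file itself carries no header).
housing_columns <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                     "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV")

# Hypothetical helper: read the whitespace-delimited file into a data frame.
load_housing <- function(path) {
  read.table(path, header = FALSE, col.names = housing_columns)
}

# Assumed location of the downloaded file; adjust to your own path.
housing_path <- "housing.data"

if (file.exists(housing_path)) {
  housing <- load_housing(housing_path)
  str(housing)          # column types and a short preview of each column
  print(head(housing))  # first six rows
} else {
  message("File not found: ", housing_path)
}
```

In RStudio, `View(housing)` opens the data frame in a spreadsheet-style tab, which may be convenient for the screenshot the assignment asks for.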