Data Scientist Enablement DSE 400: Week 3 Roadmap


  1. Data Scientist Enablement DSE 400 - Fast Track to Data Science: Week 3 Roadmap. Advanced Center of Excellence, Modern Renaissance Corporation, in collaboration with the SONO team and others. The content of this document is under Creative Commons License CC BY 4.0.
  2. Agenda
     • Recap
     • Week 3 Overview
     • Discussions on SONO
     • Learning Path
     • Activities and Practice
     • Assignment Submission
     • Looking Ahead
     • References
     • Citation
     You can always find the latest version of this document at
     "It is not in the stars to hold our destiny but in ourselves." - William Shakespeare
  3. Recap
     During weeks 1-2 we covered the following areas:
     • Data Science and its landscape
     • Playing with datasets in R-Studio
     • Employing R packages
     • Basic statistical concepts
     • Visually describing datasets
     • Exploring SONO and participating in discussions
     ...
  4. DSE 400 - Week 3 at a glance
     • Discussions: Big Data in 2014. Netflix $1M Prize case study. Optional Q&A.
     • Learning plan: Read R for Machine Learning by Allison Chang.
     • Activities: Explore Amazon. Survey ML in your industry. Apply for the Schmidt Fellowship ...
     • Assignment 3: Download the Mushroom dataset from the MIT OCW Prediction Dataset page, import it into your R-Studio environment, and apply the Apriori algorithm.
  5. Social Engagement on SONO - Week 3
     • Discussion 1: Read "Big Data In 2014: 6 Bold Predictions" and share your thoughts on how impactful these predictions will be in your industry or area of focus. If you don’t have a preferred industry, focus on either the Healthcare or the Education sector.
     • Discussion 2: Research the Netflix $1M Prize and the BellKor solution. Discuss how the BellKor solution benefited Netflix by improving recommendations. Can this algorithm or technique be applied elsewhere? Share your thoughts.
     These discussions are required. If you already have access to SONO > DSE 400, you will be required to participate in them. There will also be an Optional Q&A. Please do not create additional threads in weekly KCs.
  6. SONO Tweaks
     SONO or SOKNO (Social Knowledge platform) was chosen for the DSE program to enable social engagement, collaboration, and knowledge dissemination, all of which are important to an open initiative like this. To make navigation easier, here are some tweaks you can use to reach the right destination.
     • To enter a Knowledge Cell (KC), log in first, then use the full URL to enter the right KC. For week 3 you would use this link
     • Weekly KCs DSE 400 Week 1, 2 ... etc. map to knocell numbers 1001, 1002 and so on in these URLs.
     • Once you are in a KC, click the Threads link in the left panel to go to the current discussions.
     We appreciate your patience during this transitional phase.
  7. Recommended Learning Plan
     • Read R for Machine Learning by Allison Chang (Sections 4.1 - 4.5, page 7).
     • Look up and research the recommended ML algorithms and their associated R packages.
     • Also refer to the blog post Machine Learning for Beginners and the presentation Machine Learning With R by David Chiu.
     • <Optional> Watch Machine Learning: The Basics by Ron Bekkerman.
     • <Optional> Watch Introduction to R for Data Mining by Joseph Rickert.
     "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." - Tom Mitchell, Machine Learning, 1997
  8. Activities
     • <Practice> Visit Data Science Central. Examine “Visualization of the day”.
     • <Practice> Using publicly available resources, investigate what algorithmic techniques Amazon employs to recommend related products when you search for one. Do not employ private intellectual capital (iCap).
     • <Practice> Survey the Machine Learning algorithmic techniques your organization or industry employs. List the top 10 with their use cases. Briefly discuss the outcomes. Do not access or disclose any iCap.
     • <Optional> Explore The State of the World's Children 2014 in Numbers. Where do the poorest children live? What is being done to improve their lives? What systemic problems still need to be solved?
  9. Activities - contd ...
     • <Optional> Check out The Eric and Wendy Schmidt 'Data Science for Social Good' Summer Fellowship. If interested, apply to this fellowship.
     • <Optional> Eminent economist and Nobel laureate Amartya Sen of Harvard University has a theory that effectively says, “poverty and famines are caused artificially by the inefficiency inherent in the economic system, not the result of natural forces.” Research Prof. Sen’s methodologies and examine what data he employs to reach these rather remarkable conclusions.
     Need more? Reach out to our Research Fellow Ms. Rachel Fleming <> and ask for advanced activities, challenges and research topics.
  10. Assignment 3 - Submission Required
      • Download the Mushroom dataset from the MIT OCW Prediction Dataset page.
      • Import this dataset into your R-Studio.
      • Apply the Apriori algorithm to this dataset. You will need the arules package to apply this algorithm.
      <Help On Demand> You may reach out to our Research Fellow Ms. Rachel Fleming <> if you have any difficulties with this assignment.
  11. Submission Deadline: Saturday, 11:59 PM your local time.
      • Mail Assignment 3 to <> Notice the change in email address.
      • Submit a single PDF document showing the screenshot(s) of your R-Studio workspace and the output from your Apriori analysis.
      • Use this naming convention for your document: DSE 400 > Assignment 3 > Your Full Name.
      • No document links should be sent. Just one single PDF document.
      • Please add DSE 400 > Assignment 3 in the subject line.
      • Use only PDF format and kindly avoid other formats.
  12. DSE 400 - Weeks 4-8 ahead
      • Week 4: Machine Learning - contd ... Refer to R for Machine Learning by Allison Chang.
      • Week 5: Visualizations. Submit your research: Data Visualization Tools - A Comparative Study.
      • Weeks 6-7: Processing large data sets. Hadoop ecosystem. Stream computing etc.
      • Week 8: Ethics, Privacy and Building Data Products.
  13. References, Resources and Additional Reading
      • [MIT OCW] R for Machine Learning by Allison Chang
      • An Introduction to Machine Learning. Hilary Mason, O’Reilly Media Inc., 2011
      • Machine Learning. Tom Mitchell, McGraw-Hill, 1997
      • Advanced Machine Learning. Hilary Mason, O’Reilly Media Inc., 2012
      • Scaling Up Machine Learning. Bekkerman, Bilenko, and Langford, Cambridge University Press, 2011
      • [MIT OCW] Prediction: Machine Learning and Statistics
      • Stanford University Machine Learning Video Collection
      • Caltech Machine Learning Video Collection
  14. Citation
      • The dataset titled Mushroom (agaricus-lepiota) Data, used here for Assignment 3, is drawn from The Audubon Society Field Guide to North American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred A. Knopf. Donor: Jeff Schlimmer. Date: 27 April 1987.
      • R for Machine Learning by Allison Chang is recommended by the MIT course Prediction: Machine Learning and Statistics from the Sloan School of Management. It is adopted in DSE 400 as per OCW guidelines.
      • Only content that appears as is in this document is under Creative Commons License CC BY 4.0. This license may not necessarily apply to other material referenced in this document.
  15. For More Information
      • Week 3 discussions take place during this week on SONO DSE 400 Week 3.
      • <Help On Demand> You may reach out to our Research Scholar Ms. Rachel Fleming <> if you have any difficulties with the assignments.
      • We welcome questions, thoughts and suggestions. Post these on SONO in the right forum/discussion or write to us at <>
      You can always find the latest version of this document at
  16. Fun@Work
  17. Richard Feynman was awarded the Nobel Prize in Physics in 1965, along with Sin-Itiro Tomonaga and Julian Schwinger, "for their fundamental work in quantum electrodynamics, with deep-ploughing consequences for the physics of elementary particles".
      Thank You
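
Assignment 3 (slide 10) asks you to apply the Apriori algorithm with the arules package in R. To make it easier to read the output you will get there, the following is a minimal, standalone sketch of what Apriori computes, written in plain Python rather than R. It is not the arules implementation. The function names `apriori` and `association_rules`, the toy transactions, and the thresholds (0.4 support, 1.0 confidence) are all assumptions made for illustration; the toy rows are invented and are not taken from the Mushroom dataset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) rules with one item on the right."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            lhs = itemset - {item}
            # Every subset of a frequent itemset is frequent, so lhs is present.
            confidence = supp / frequent[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, frozenset([item]), supp, confidence))
    return rules


# Hypothetical toy transactions in "attribute=value" form (not real Mushroom rows).
toy = [
    {"odor=foul", "gill=narrow", "class=poisonous"},
    {"odor=foul", "gill=broad", "class=poisonous"},
    {"odor=none", "gill=broad", "class=edible"},
    {"odor=none", "gill=broad", "class=edible"},
    {"odor=none", "gill=narrow", "class=edible"},
]

frequent = apriori(toy, min_support=0.4)
found = association_rules(frequent, min_confidence=1.0)

for lhs, rhs, supp, conf in found:
    print(sorted(lhs), "=>", sorted(rhs), f"support={supp:.2f}", f"confidence={conf:.2f}")
```

Support is the share of rows containing an itemset, and confidence is the support of the whole rule divided by the support of its left-hand side. The arules package exposes a function named `apriori()` that likewise takes minimum support and confidence thresholds, so the rules it reports on the Mushroom data can be read in the same way.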