Data Scientist Enablement
DSE 400 - Fast Track to Data Science
Week 3 Roadmap
Advanced Center of Excellence
Modern Renaissance Corporation
In Collaboration with SONO team and others
Content of this document is under Creative Commons Licence CC BY 4.0
You can always find the latest version of this document at http://bit.ly/1dILgbT
Week 3 Overview
Discussions on SONO
Activities and Practice
"It is not in the stars to hold our destiny but in ourselves."
- William Shakespeare
During weeks 1-2 we covered the following areas:
Data Science and its Landscape
Play with datasets in R-Studio
Employ R packages
Basic Statistical Concepts
Visually describing the datasets
Explored SONO and participated in Discussions
DSE 400 - Week 3 at a glance
Big Data in 2014. Netflix $1M Prize Case Study. Optional Q&A.
Read R for Machine Learning by Allison Chang
Explore Amazon. Survey ML in your industry. Apply for Schmidt Fellowship ...
Download the Mushroom dataset from the MIT OCW Prediction Dataset page. Import it into
your R-Studio environment and apply the Apriori algorithm.
Discussion 1: Read Big Data In 2014: 6 Bold Predictions and share your thoughts on
how impactful these predictions are going to be in your industry or area of focus.
If you don’t have a preferred industry, focus on either Healthcare or Education.
Discussion 2: Research the Netflix $1M Prize and the BellKor solution. Discuss how the
BellKor solution benefited Netflix by improving recommendations. Can this
algorithm/technique be applied elsewhere? Share your thoughts.
These discussions are required. If you already have access to SONO > DSE 400, you
will be required to participate in these discussions. There will also be an Optional Q&A.
Please do not create additional threads in weekly KCs.
Social Engagement on SONO - Week 3
SONO or SOKNO (Social Knowledge platform) is chosen for the DSE program to enable
Social Engagement, Collaboration as well as Knowledge Dissemination which are all
important to an Open initiative like this.
To make navigation easier, here is how to reach the right destination. To enter a
Knowledge Cell (KC), log in first, then use the full URL of that KC. For week 3, use
this link: http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1003
Weekly KCs (DSE 400 Week 1, 2, etc.) map to knocell numbers 1001, 1002 and so on in
these URLs. Once you are in a KC, click the Threads link on the left panel to go to the
current discussions. We appreciate your patience during this transitional phase.
Recommended Learning Plan
Read R for Machine Learning by Allison Chang (Sections 4.1 - 4.5, page 7)
Look up and research recommended ML algorithms and associated R packages
Also refer to the blog post Machine Learning for Beginners
and presentation on Machine Learning With R by David Chiu
<Optional> Watch Machine Learning: The Basics by Ron Bekkerman
<Optional> Watch Introduction to R for Data Mining by Joseph Rickert
A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance
at tasks in T, as measured by P, improves with experience E.
- Tom Mitchell, Machine Learning, 1997
<Practice> Visit Data Science Central. Examine “Visualization of the day”
<Practice> Using publicly available resources, investigate what algorithmic
techniques Amazon employs to recommend related products when you search
for one. Do not use private intellectual capital (iCap).
<Practice> Survey the Machine Learning algorithmic techniques your organization
or industry employs. List the top 10 with their use cases. Briefly discuss
the outcomes. Do not access or disclose any iCap.
<Optional> Explore The State of the World's Children 2014 in Numbers. Where do the
poorest children live? What is being done to improve their lives? What
systemic problems still need to be solved?
Activities - contd ...
<Optional> Check out The Eric and Wendy Schmidt 'Data Science for Social
Good' Summer Fellowship. If interested, apply to this fellowship.
<Optional> Eminent economist and Nobel laureate Amartya Sen of Harvard
University has a theory that effectively says, “poverty and famines are caused
artificially by the inefficiency inherent in the economic system, not the result of
natural forces.” Research Prof. Sen’s methodologies and examine what data
he employs to reach these rather remarkable conclusions.
Need more? Reach out to our Research Fellow Ms. Rachel Fleming
< Rachel@emodern.biz> and ask for advanced activities, challenges and
Assignment 3 - Submission Required
Download the Mushroom dataset from the MIT OCW Prediction Dataset page. Import
this dataset into your R-Studio. Apply the Apriori algorithm to this dataset. You
will need the arules package to apply this algorithm.
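For orientation, the Apriori step can be sketched in R as shown here. This is a minimal illustration, not the expected solution: the toy data frame, its column names and the support/confidence thresholds are all assumptions made for the example, and the real Mushroom data has many more attributes. With the downloaded file you would replace the toy data frame with something like read.csv(..., stringsAsFactors = TRUE), using whatever file name you saved it under.

```r
# Requires the arules package: install.packages("arules")
library(arules)

# Hypothetical toy data standing in for the Mushroom dataset.
# Every column must be a factor for the coercion to transactions below.
toy <- data.frame(
  class   = factor(c("edible", "edible", "poisonous", "poisonous", "edible", "poisonous")),
  odor    = factor(c("none", "almond", "foul", "foul", "none", "foul")),
  bruises = factor(c("yes", "yes", "no", "no", "no", "no"))
)

# Each row becomes one transaction; items are named like "odor=foul".
trans <- as(toy, "transactions")

# Mine association rules. The thresholds are illustrative assumptions:
# an itemset must appear in at least 30% of rows, and a rule must hold
# at least 90% of the time when its left-hand side is present.
rules <- apriori(
  trans,
  parameter = list(support = 0.3, confidence = 0.9, minlen = 2),
  control   = list(verbose = FALSE)
)

# Show the rules, strongest lift first.
inspect(sort(rules, by = "lift"))
```

On the full dataset, lowering support produces many more rules, so it usually helps to start with high thresholds and relax them gradually.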
<Help On Demand> You may reach out to our Research Fellow Ms. Rachel
Fleming <firstname.lastname@example.org> if you have any difficulties with this assignment.
Deadline: Saturday, 11:59 PM your local time.
Mail Assignment 3 to <email@example.com>. Notice the change in
email address. Submit a single PDF document showing the screenshot(s) of your
R-Studio workspace and the output from your Apriori analysis. Use this
naming convention for your document: DSE 400 > Assignment 3 > Your Full Name.
Do not send document links or other formats, just one PDF document.
Please add DSE 400 > Assignment 3 in the subject line.
DSE 400 - Weeks 4-8 ahead
Week 4 Machine Learning - contd … Refer to R for Machine Learning by Allison Chang
Week 5 Visualizations. Submit your research: Data Visualization Tools - A Comparative Study
Weeks 6-7 Processing large data sets. Hadoop Ecosystem. Stream Computing etc.
Week 8 Ethics, Privacy and Building Data Products.
References, Resources and Additional Reading
[MIT OCW] R for Machine Learning by Allison Chang
An Introduction to Machine Learning. Hilary Mason, O’Reilly Media Inc., 2011
Machine Learning. Tom Mitchell, McGraw-Hill, 1997
Advanced Machine Learning. Hilary Mason, O’Reilly Media Inc., 2012
Scaling Up Machine Learning. Bekkerman, Bilenko, and Langford, Cambridge University Press, 2011
[MIT OCW] Prediction: Machine Learning and Statistics
Stanford University Machine Learning Video Collection
Caltech Machine Learning Video Collection
The dataset titled Mushroom (agaricus-lepiota) Data, used here for Assignment 3, is drawn
from The Audubon Society Field Guide to North American Mushrooms (1981). G. H.
Lincoff (Pres.), New York: Alfred A. Knopf.
Donor: Jeff Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu). Date: 27 April 1987.
R for Machine Learning by Allison Chang is recommended by the MIT course Prediction:
Machine Learning and Statistics from the Sloan School of Management. It is adopted in
DSE 400 as per OCW guidelines.
Only content that appears as is in this document is under Creative Commons License CC
BY 4.0. This license may not necessarily apply to other material referenced in this
document.
For More Information
Week 3 discussions take place during this week on SONO DSE 400 Week 3
<Help On Demand> You may reach out to our Research Fellow Ms. Rachel Fleming
<firstname.lastname@example.org> if you have any difficulties with the assignments.
We welcome questions, thoughts and suggestions. Post these on SONO in the right
forum/discussion or write to us at <email@example.com>
Richard Feynman was awarded the Nobel Prize in Physics in 1965, along with
Sin-Itiro Tomonaga and Julian Schwinger, "for their fundamental work in quantum
electrodynamics, with deep-ploughing consequences for the physics of elementary
particles."