Data Scientist Enablement
DSE 400 - Fast Track to Data Science
Week 3 Roadmap
Advanced Center of Excellence
Modern Renaissance Corporation
In Collaboration with SONO team and others
Content of this document is under Creative Commons Licence CC BY 4.0
Agenda
You can always find the latest version of this document at http://bit.ly/1dILgbT
Recap
Week 3 Overview
Discussions on SONO
Learning Path
Activities and Practice
Assignment
Submission
Looking ahead
References
Citation
It is not in the stars to hold our destiny but in
ourselves. - William Shakespeare
During weeks 1-2 we covered the following areas
Data Science and its Landscape
Play with datasets in R-Studio
Employ R packages
Basic Statistical Concepts
Visually describing the datasets
Explored SONO and participated in Discussions
...
Recap
Discussions:
Big Data in 2014. Netflix 1 M Case Study. Optional Q&A.
Learning plan:
Read R for Machine Learning by Allison Chang
Activities:
Explore Amazon. Survey ML in your industry. Apply for Schmidt Fellowship ...
Assignment 3:
Download the Mushroom dataset from the MIT OCW Prediction Dataset page. Import it into your R-Studio
environment and apply the Apriori algorithm.
DSE 400 - Week 3 at a glance
Discussion 1: Read Big Data In 2014: 6 Bold Predictions and share your thoughts on
how impactful these predictions are going to be in your industry or the area of your
focus. If you don’t have a preferred industry, focus on either the Healthcare or the Education
sector.
Discussion 2: Research the Netflix $1M Prize and the BellKor solution. Discuss how the BellKor
solution benefited Netflix by improving recommendations. Can this algorithm/technique
be applied elsewhere? Share your thoughts.
These discussions are required. If you already have access to SONO > DSE 400, you
will be required to participate in these discussions. There will also be an Optional Q&A.
Please do not create additional threads in weekly KCs.
Social Engagement on SONO - Week 3
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1003
SONO or SOKNO (Social Knowledge platform) is chosen for the DSE program to enable
Social Engagement, Collaboration as well as Knowledge Dissemination which are all
important to an Open initiative like this.
To facilitate easy navigation, here are some tweaks you could employ to reach the right
destination. To enter a Knowledge Cell (KC), log in first, then use the full URL to enter the right KC. For
week 3 you would use this link: http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1003
Weekly KCs (DSE 400 Week 1, 2, etc.) map to knocell numbers 1001, 1002 and so on in
these URLs. Once you are in a KC, click the Threads link on the left panel to go to the current
discussions. We appreciate your patience during this transitional phase.
SONO Tweaks
Recommended Learning Plan
Read R for Machine Learning by Allison Chang (Sections 4.1 - 4.5, page 7)
Look up and research recommended ML algorithms and associated R packages
Also refer to the blog post Machine Learning for Beginners
and presentation on Machine Learning With R by David Chiu
<Optional> Watch Machine Learning: The Basics by Ron Bekkerman
<Optional> Watch Introduction to R for Data Mining by Joseph Rickert
A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance
at tasks in T, as measured by P, improves with experience E.
- Tom Mitchell, Machine Learning, 1997
<Practice> Visit Data Science Central. Examine “Visualization of the day”
<Practice> Using publicly available resources, investigate what algorithmic
techniques Amazon employs to recommend related products when you search
for one. Do not employ private intellectual capital (iCap)
<Practice> Survey Machine Learning Algorithmic Techniques your organization
or industry employs. List the top 10 of these with their use cases. Briefly discuss
the outcomes. Do not access or disclose any iCap.
<Optional> Explore State of World Children 2014 in Numbers. Where do the
poorest children live? What is being done to improve their lives? What are
systemic problems that still need to be solved?
Activities
Activities - contd ...
<Optional> Check out The Eric and Wendy Schmidt 'Data Science for Social
Good' Summer Fellowship. If interested, apply to this fellowship.
<Optional> Eminent Economist and Nobel Laureate, Amartya Sen from Harvard
University has a theory that effectively says, “poverty and famines are caused
artificially by the inefficiency inherent in the economic system, not the result of
natural forces.” Research Prof. Sen’s methodologies and examine what data
he employs to reach these rather remarkable conclusions.
Need more? Reach out to our Research Fellow Ms. Rachel Fleming
<rachel@emodern.biz> and ask for advanced activities, challenges and
research topics.
Assignment 3 - Submission Required
Download the Mushroom dataset from the MIT OCW Prediction Dataset page. Import
this dataset into your R-Studio. Apply the Apriori algorithm to this dataset. You
will need the arules package to apply this algorithm.
<Help On Demand> You may reach out to our Research Fellow Ms. Rachel
Fleming <rachel@emodern.biz> if you have any difficulties with this assignment.
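To make the idea behind Apriori concrete before you run it in R, here is a minimal sketch in plain Python. It is not the arules implementation and is not a substitute for the assignment: the toy transactions, the attribute names, and the support and confidence thresholds are all invented for illustration. It shows the two steps the algorithm is known for: growing candidate itemsets level by level, and pruning any candidate that has an infrequent subset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c: support(c) for c in candidates}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples derived from frequent itemsets."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


# Invented toy data, loosely styled after mushroom attributes.
transactions = [
    {"odor=foul", "class=poisonous", "gill=narrow"},
    {"odor=foul", "class=poisonous"},
    {"odor=none", "class=edible"},
    {"odor=none", "class=edible", "gill=broad"},
    {"odor=none", "class=poisonous", "gill=narrow"},
]

frequent = apriori(transactions, min_support=0.3)
found = association_rules(frequent, min_confidence=0.9)
for lhs, rhs, supp, conf in found:
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f}", f"confidence={conf:.2f}")
```

In the assignment itself, the arules package does this work for you; the sketch is only meant to help you read its output, where each rule is reported with its support and confidence.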
Submission
Deadline: Saturday, 11:59 PM your local time.
Mail Assignment 3 to <dse400.datascience@gmail.com>. Notice the change in
email address. Submit a single PDF document showing the screenshot(s) of your
R-Studio workspace and the output from your Apriori analysis. Use this
naming convention for your document: DSE 400 > Assignment 3 > Your Full Name.
Do not send document links; send one single PDF document.
Please add DSE 400 > Assignment 3 in the subject line. Use only PDF format
and kindly avoid other formats.
Week 4 Machine Learning - contd … Refer to R for Machine Learning by Allison Chang
Week 5 Visualizations. Submit your research: Data Visualization Tools - A Comparative Study
Weeks 6-7 Processing large data sets. Hadoop Ecosystem. Stream Computing etc.
Week 8 Ethics, Privacy and Building Data Products.
DSE 400 - Weeks 4-8 ahead
References, Resources and Additional Reading
[MIT OCW] R for Machine Learning by Allison Chang
An Introduction to Machine Learning. Hilary Mason, O’Reilly Media Inc., 2011
Machine Learning, Tom Mitchell, McGraw-Hill Publishers, 1997
Advanced Machine Learning. Hilary Mason, O’Reilly Media Inc., 2012
Scaling Up Machine Learning. Bekkerman, Bilenko, and Langford, Cambridge University Press, 2011
[MIT OCW] Prediction: Machine Learning and Statistics
Stanford University Machine Learning Video Collection
Caltech Machine Learning Video Collection
Citation
The dataset titled Mushroom (agaricus-lepiota) Data, used here for Assignment 3, is drawn
from The Audubon Society Field Guide to North American Mushrooms (1981). G. H.
Lincoff (Pres.), New York: Alfred A. Knopf.
Donor: Jeff Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu). Date: 27 April 1987.
R for Machine Learning by Allison Chang is recommended by the MIT course Prediction:
Machine Learning and Statistics from the Sloan School of Management. It is adopted in DSE
400 as per OCW guidelines.
Content that appears as is in this document only is under Creative Commons License CC
BY 4.0. This license may not necessarily apply to other material referenced in this
document.
For More Information
Week 3 discussions take place during this week on SONO DSE 400 Week 3
<Help On Demand> You may reach out to our Research Scholar Ms. Rachel Fleming
<rachel@emodern.biz> if you have any difficulties with the assignments.
We welcome questions, thoughts and suggestions. Post these on SONO in the right
forum/discussion or write to us at <dse400.datascience@gmail.com>
You can always find the latest version of this document at
http://bit.ly/1dILgbT
Fun@Work
Richard Feynman was awarded the Nobel Prize
in Physics in 1965 along with Sin-Itiro
Tomonaga and Julian Schwinger,
"for their fundamental work in quantum
electrodynamics, with deep-ploughing
consequences for the physics of elementary
particles".
Thank You
