Data Scientist Enablement
DSE 400 - Fast Track to Data Science
Week 2 Roadmap
Advanced Center of Excellence
Modern Renaissance Corporation
In Collaboration with SONO team and others
Content of this document is under Creative Commons Licence CC BY 4.0
Agenda
You can always find the latest version of this document at http://bit.ly/1dVHJwO
Week 1 Recap
Week 2 At a Glance
Discussions
Required Reading
Practice
Assignments and Submission
Looking ahead
References
Citation
Acknowledgement
A strong will, a settled purpose, and an invincible determination can accomplish almost
anything. - Thomas Fuller
During Week 1 you were able to
Understand what Data Science is and articulate what Data
Scientists do on a day-to-day basis
Install R and R-Studio
Explore the UCI Machine Learning Repository
Import the Housing Dataset into R
Explore SONO and participate in Discussions
DSE 400 - Week 1 Recap
Discussions:
Fuss about Big Data. Statistical sampling etc. Optional Q&A
Reading plan:
Read Chapters 4-7 from An Introduction to Data Science
R for Machine Learning by Allison Chung
Activities:
Play with spreadsheets, continue research on Data Viz. tools, connect with local groups etc.
Assignment 2:
Download the Haberman dataset from the UCI Machine Learning Repository into your R-Studio
environment and visually describe this dataset.
DSE 400 - Week 2 at a glance
Discussion 1: What’s all this fuss about Big Data? How would you go beyond
talking about the 3 or 4 Vs of Big Data: Volume, Variety, Velocity, and Veracity
(by the way, veracity means the trustworthiness of the data)? How about Value?
Do people talk about it in the context of Big Data? Share your thoughts.
Discussion 2: “Statistics is defined as the discipline of using data samples to
support claims about populations.” Comments?
These discussions are required. These will be posted sequentially. If you have
access to SONO you are encouraged to participate in these discussions.
There will also be an Optional Q&A
For the sake of simplicity and ease of navigation, please do not create additional threads.
Social Engagement on SONO - Week 2
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1002
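As a starting point for Discussion 2, the idea of using a sample to support a claim about a population can be tried out in R. The sketch below is only an illustration: the population is simulated (10,000 hypothetical values, not course data), and the seed, sample size and number of repetitions are arbitrary choices made here.

```r
set.seed(400)  # arbitrary seed so the run is repeatable

# Hypothetical population: 10,000 simulated measurements (not course data).
population <- rnorm(10000, mean = 50, sd = 10)

# Draw one simple random sample of 100 values without replacement.
one_sample <- sample(population, size = 100)

# The sample mean serves as an estimate of the population mean.
pop_mean <- mean(population)
sample_mean <- mean(one_sample)

# Repeat the sampling many times to see how much the estimate varies.
sample_means <- replicate(1000, mean(sample(population, size = 100)))

cat("Population mean:       ", round(pop_mean, 2), "\n")
cat("One sample's mean:     ", round(sample_mean, 2), "\n")
cat("Spread of sample means:", round(sd(sample_means), 2), "\n")
```

Changing the sample size and rerunning shows how the spread of the sample means shrinks as samples get larger, which may be a useful point to bring to the thread.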
SONO or SOKNO (Social Knowledge platform) was chosen for the DSE program to enable
Social Engagement, Collaboration and Knowledge Dissemination, which are all
important to an Open initiative like this.
We understand that many of you may initially have trouble navigating. To ease things,
here are some tweaks that the SONO team and the DSE community are developing.
To enter a Knowledge Cell (KC), log in first, then use the full URL to enter the right KC. For Week 2 you
would use the following link:
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1002
The weekly KCs (DSE 400 Week 1, Week 2, etc.) map to knocell numbers 1001, 1002 and so on in
these URLs. Once you are in a KC, click on threads to go to the current discussions. We
certainly appreciate your patience during this transitory phase.
SONO Tweaks
Read Chapters 4-7 from An Introduction to Data Science
Read R for Machine Learning by Allison Chung (Sections 1 to 3.4, pages 1-5)
<Optional> Read Introduction to Probability and Statistics Using R (Chapters 1-3)
<Optional> Read Chapters 2-7 from Think Stats: Probability and Statistics for
Programmers
If you are unfamiliar with basic Statistical concepts or if you need a quick refresher
on this topic, please refer to Statistics Playlist by Khan Academy
Week 2 - Recommended Reading Plan
Data Scientist (n.): Person who is better at statistics than any software engineer and
better at software engineering than any statistician. - Josh Wills
<Practice> Given the following dataset, find manually the mean, median, mode, variance and standard
deviation for this population.
{ 3, 15, 17, 18, 20, 20, 12, 20, 20, 16, 17, 12, 4, 7, 15, 20, 12, 6, 1, 20 }
Also try using a spreadsheet (such as Excel or Google Spreadsheet) to find the above measures
for the same dataset.
<Practice> Math is Fun. Learn what a Relative Frequency Distribution is. Try the example at the bottom
of this page.
<Community Outreach> <Optional> Explore and connect with your local R Group (or Data
Scientist/Big Data groups) and check out their projects, talks and seminars that might interest you. Also
discuss with them how you can engage with them and help them out in their endeavors.
Activities
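Once you have worked the first practice exercise by hand and in a spreadsheet, R offers a third way to check the results. The following is a minimal sketch, not a prescribed solution; the variable names are chosen here for illustration. One assumption worth stating: because the exercise calls the data a population, the variance is rescaled to divide by n, whereas R's var() and sd() divide by n - 1.

```r
x <- c(3, 15, 17, 18, 20, 20, 12, 20, 20, 16,
       17, 12, 4, 7, 15, 20, 12, 6, 1, 20)
n <- length(x)

x_mean   <- mean(x)
x_median <- median(x)

# Base R has no built-in statistical mode; take the most frequent value(s).
counts <- table(x)
x_mode <- as.numeric(names(counts)[counts == max(counts)])

# var() uses the sample formula (divide by n - 1).
# Rescale so that the sum of squared deviations is divided by n instead.
pop_var <- var(x) * (n - 1) / n
pop_sd  <- sqrt(pop_var)

# Relative frequency of each distinct value (the proportions sum to 1).
rel_freq <- prop.table(counts)

print(c(mean = x_mean, median = x_median, mode = x_mode,
        variance = pop_var, sd = pop_sd))
print(round(rel_freq, 2))
```

If your spreadsheet answer for the variance differs from your manual one, check whether you used a population function or a sample function; Excel, for example, provides both VAR.P and VAR.S.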
<Optional> If you are not fully happy with the statistical functionality of your usual
spreadsheet package, download PSPP, a free statistical analysis tool, from SourceForge and
play with it.
<Optional> <Advanced> Import Housing Data into R-Studio and describe it statistically.
You may need packages like pastecs which let you use stat.desc function.
<Optional> Register for Big Data in Motion. This is a free online webinar scheduled for Jan
30, 2014, 1 PM EST. Attendance is optional but recommended.
Need more? Reach out to our Research Scholar Ms. Rachel Fleming
<rachel@emodern.biz> and ask for more activities and challenges.
Activities - contd ...
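For the optional advanced activity, a statistical description might look roughly like the sketch below. It is hedged in two ways: the built-in mtcars data frame is used purely as a stand-in so the code runs anywhere (replace it with the housing data you imported in Week 1), and the pastecs step runs only if that add-on package is already installed.

```r
# Stand-in data frame; replace with the housing data you imported in Week 1.
housing <- mtcars

# stat.desc() works on numeric columns, so keep only those.
numeric_cols <- housing[sapply(housing, is.numeric)]

# Base R: minimum, quartiles, median, mean and maximum for each column.
print(summary(numeric_cols))

# pastecs is a CRAN package; install it once with install.packages("pastecs").
if (requireNamespace("pastecs", quietly = TRUE)) {
  print(round(pastecs::stat.desc(numeric_cols), 2))
} else {
  message("pastecs is not installed; showing the base R summary only.")
}
```

summary() already covers the basic measures; stat.desc() adds items such as the standard error and the coefficient of variation in one table.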
Assignment 2 - Submission Required
Assignment: Download Haberman's Survival dataset from the UCI Machine Learning Repository. Import this
dataset into your R-Studio. Generate three graphic representations: Histogram, Scatter Plot and Box
Plot, as depicted above. Refer to R for Machine Learning by Allison Chung before you attempt this
assignment.
Image credit: R for Machine Learning by Allison Chung
<Help On Demand> You may reach out to our Research Scholar Ms. Rachel Fleming
<rachel@emodern.biz> if you have any difficulties with this assignment.
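If you are unsure where to begin with Assignment 2, the sketch below shows one possible shape for the R code. Several things in it are assumptions made here rather than requirements: the file is assumed to be saved locally as haberman.data, the column labels are invented for readability (the raw file is comma-separated with no header row), and the choice of which variables go into each plot is only one option among several.

```r
# Column labels chosen for this sketch; the raw file has no header row.
haberman_cols <- c("age", "op_year", "nodes", "survival")

# Read the comma-separated file from a local path.
read_haberman <- function(path) {
  read.csv(path, header = FALSE, col.names = haberman_cols)
}

# Draw the three plot types named in the assignment.
plot_haberman <- function(df) {
  hist(df$age, main = "Histogram of patient age", xlab = "Age (years)")
  plot(df$age, df$nodes, main = "Age vs. positive axillary nodes",
       xlab = "Age (years)", ylab = "Positive axillary nodes")
  boxplot(nodes ~ survival, data = df, main = "Nodes by survival status",
          xlab = "Survival status (1 or 2)", ylab = "Positive axillary nodes")
  invisible(df)
}

# Usage, assuming haberman.data is saved in the working directory.
data_file <- "haberman.data"
if (file.exists(data_file)) {
  haberman <- read_haberman(data_file)
  str(haberman)
  plot_haberman(haberman)
}
```

In R-Studio the plots appear in the Plots pane one after another; use the arrows there to step back through them when taking your screenshots.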
Submissions
Deadline: Saturday, 11:59 PM your local time.
Mail Assignment 2 to <datascience400@gmail.com>
Submit a PDF document of the screenshots of your R-Studio
workspace showing the three visualizations
discussed. Use the naming convention DSE 400 >
Assignment 2 > Your Full Name for your document.
Do not send document links. Please add DSE 400
> Assignment 2 in the subject line.
Weeks 3-4: Intro to Machine Learning (ML) - Classification, Clustering, Prediction, Naive Bayes,
Recommendations and Boosting algorithms. Refer to R for Machine Learning by Allison Chung.
Watch Caltech Machine Learning Videos on YouTube.
Week 5: Visualizations. Present your research: Data Visualization Tools - A Comparative Study.
Weeks 6-7: Processing large data sets. Hadoop Ecosystem. Stream Computing etc.
Week 8: Ethics, Privacy and Building Data Products.
DSE 400 - Weeks 3-8 ahead
References, Resources and Additional Reading
An Introduction to Data Science
Think Stats: Probability and Statistics for Programmers
Statistics Playlist by Khan Academy
R for Machine Learning by Allison Chung
Introduction to Hypothesis Testing
Single Sample Hypothesis Testing Part 1 and Part 2
R for Beginners by Emmanuel Paradis
R - Reference Cards
Introduction to R Playlist (Video Collection) on Youtube
Caltech Machine Learning Playlist on Youtube
[MIT OCW] Prediction: Machine Learning and Statistics from MIT Sloan School of
Management
Citation
The dataset titled Haberman's Survival Data used here for Assignment 2 comes from UCI
Machine Learning Repository
Donor of Haberman's Survival Data: Tjen-Sien Lim (limt@stat.wisc.edu). It was added to the UCI
Machine Learning Repository on March 4, 1999.
R for Machine Learning by Allison Chung is recommended by the MIT course Prediction:
Machine Learning and Statistics from the Sloan School of Management. It is adopted in DSE
400 as per OCW guidelines.
Only content that originates in this document is under Creative Commons License CC
BY 4.0. This license may not necessarily apply to other material referenced in this
document.
For More Information
Presentation deck for DSE 400 > Week 1 Roadmap can be found at http://bit.ly/1hC5wAV
Week 2 discussions take place during this week on SONO DSE 400 Week 2
<Help On Demand> You may reach out to our Research Scholar Ms. Rachel Fleming
<rachel@emodern.biz> if you have any difficulties with the assignments.
We welcome questions, thoughts and suggestions. Post these on SONO in the right
forum/discussion or write to us at <datascience400@gmail.com>
You can always find the latest version of this document at http://bit.ly/1dVHJwO
Fun@Work
Geographic distribution of clicks for Week 1 Roadmap
Fun@Work
Open Source Humor
We thank our community of committed and passionate volunteers, experts,
educators, innovators, benefactors, advisers, advocates, mentors and
supporters.
We are also grateful for the outstanding support and encouragement from the
SONO team as well as other organizations such as MIT Sloan School of Management,
IBM, HortonWorks, R-Project, Creative Commons, Open Courseware
Consortium, Stanford University, Caltech, O’Reilly Publications and Data
Science Central.
Acknowledgement
Thank You

Data scientist enablement dse 400 week 2 roadmap

  • 1. Data Scientist Enablement DSE 400 - Fast Track to Data Science Week 2 Roadmap Advanced Center of Excellence Modern Renaissance Corporation In Collaboration with SONO team and others Content of this document is under Creative Commons Licence CC BY 4.0
  • 2. Agenda You can always find the latest version of this document at http://bit.ly/1dVHJwO Week 1 Recap Week 2 At a Glance Discussions Required Reading Practice Assignments and Submission Looking ahead References Citation Acknowledgement A strong will, a settled purpose, and an invincible determination can accomplish almost anything. - Thomas Fuller
  • 3. During week 1 you were able to Understand Data Science is and articulate what Data Scientists do on day-to-day basis Installed R and R-Studio Explored UCI Machine Learning Repository Import Housing Dataset into R Explored SONO and participated in Discussions DSE 400 - Week 1 Recap
  • 4. Discussions: Fuss about Big Data. Statistical sampling etc. Optional Q&A Reading plan: Read Chapters 4-7 from An Introduction to Data Science R for Machine Learning by Allison Chung Activities: Play with spreadsheets, continue research on Data Viz. tools, connect with local groups etc. Assignment 2: Download Haberman dataset from UCI Machine Learning Repository into your R-Studio environment and visually describe this dataset. DSE 400 - Week 2 at a glance
  • 5. Discussion 1: What’s all this fuss about Big Data? How would you go beyond talking about 3 or 4 Vs of Big Data? Volume, Variety, Velocity, and Veracity (by the way veracity means trustworthiness of this data). How about Value? Do the people talk about it in the context of Big Data? Share your thoughts. Discussion 2: “Statistics is defined as the discipline of using data samples to support claims about populations.” Comments? These discussions are required. These will be posted sequentially. If you have access to SONO you are encouraged to participate in these discussions. There will also be an Optional Q&A For the sake of simplicity and ease of navigation, please do not create additional threads. Social Engagement on SONO - Week 2 http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1002
  • 6. SONO Tweaks. SONO or SOKNO (Social Knowledge platform) was chosen for the DSE program to enable Social Engagement, Collaboration and Knowledge Dissemination, all of which are important to an Open initiative like this. We understand that many of you may initially have navigation issues. To ease things, here are some tweaks that the SONO team and the DSE community are developing as we speak. To enter a Knowledge Cell (KC), log in first, then use the full URL to enter the right KC. For week 2, use the following link: http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1002 The weekly KCs for DSE 400 Week 1, Week 2, etc. map to knocell numbers 1001, 1002 and so on in these URLs. Once you are in a KC, click on threads to go to the current discussions. We certainly appreciate your patience during this transitory phase.
  • 7. Week 2 - Recommended Reading Plan. Read Chapters 4-7 from An Introduction to Data Science. Read R for Machine Learning by Allison Chung (Sections 1 to 3.4, pages 1-5). <Optional> Introduction to Probability and Statistics Using R (Chapters 1-3). <Optional> Read Chapters 2-7 from Think Stats: Probability and Statistics for Programmers. If you are unfamiliar with basic statistical concepts or need a quick refresher on this topic, please refer to the Statistics Playlist by Khan Academy. "Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician." - Josh Wills
  • 8. Activities. <Practice> Given the following dataset, manually find the mean, median, mode, variance and standard deviation for this population: { 3, 15, 17, 18, 20, 20, 12, 20, 20, 16, 17, 12, 4, 7, 15, 20, 12, 6, 1, 20 }. Also try using a spreadsheet (such as Excel or Google Spreadsheet) to find the above measures for the same dataset. <Practice> Math is Fun: learn what a Relative Frequency Distribution is. Try the example at the bottom of this page. <Community Outreach> <Optional> Explore and connect with your local R Group (or Data Scientist/Big Data groups) and check out the projects, talks and seminars that might interest you. Also discuss with them how you can engage with them and help them out in their endeavors.
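A quick way to check the manual and spreadsheet answers for the practice dataset is sketched below. The course itself works in R; this sketch uses Python's standard library purely as an independent cross-check. The function names `describe_population` and `relative_frequencies` are illustrative and not part of the course material. Because the slide calls the dataset a population, the population forms of variance and standard deviation (dividing by n, not n - 1) are assumed.

```python
from collections import Counter
import statistics

# Dataset from the practice activity, treated as a full population (n = 20).
data = [3, 15, 17, 18, 20, 20, 12, 20, 20, 16,
        17, 12, 4, 7, 15, 20, 12, 6, 1, 20]


def describe_population(values):
    """Return the five measures the activity asks for, using population formulas."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.pvariance(values),  # divides by n
        "std_dev": statistics.pstdev(values),      # square root of the above
    }


def relative_frequencies(values):
    """Map each distinct value to its share of all observations."""
    counts = Counter(values)
    n = len(values)
    return {value: counts[value] / n for value in sorted(counts)}


summary = describe_population(data)
rel_freq = relative_frequencies(data)

for name, value in summary.items():
    print(f"{name}: {value}")

for value, share in rel_freq.items():
    print(f"value {value}: relative frequency {share:.2f}")
```

If a spreadsheet gives a different variance or standard deviation, check which formula it used: the population versions are `VAR.P` and `STDEV.P`, while `VAR.S` and `STDEV.S` compute the sample versions and return slightly larger values.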
  • 9. Activities - contd. <Optional> If you are not fully happy with the statistical functionality of your familiar spreadsheet package, download PSPP, a free statistical analysis tool, from SourceForge and play with it. <Optional> <Advanced> Import the Housing data into R-Studio and describe it statistically. You may need packages like pastecs, which lets you use the stat.desc function. <Optional> Register for Big Data in Motion, a free online webinar scheduled for Jan 30, 2014, 1 PM EST. Attendance is optional but recommended. Need more? Reach out to our Research Scholar Ms. Rachel Fleming <Rachel@emodern.biz> and ask for more activities and challenges.
  • 10. Assignment 2 - Submission Required. Assignment: Download the Haberman Survival dataset from the UCI Machine Learning Repository. Import this dataset into your R-Studio. Generate three graphic representations: Histogram, Scatter Plot and Box Plot, as depicted above. Refer to R for Machine Learning by Allison Chung before you attempt this assignment. Image credit: R for Machine Learning by Allison Chung. <Help On Demand> You may reach out to our Research Scholar Ms. Rachel Fleming <rachel@emodern.biz> if you have any difficulties with this assignment.
  • 11. Submissions. Deadline: Saturday, 11:59 PM your local time. Mail Assignment 2 to <datascience400@gmail.com>. Submit a PDF document of the screenshots of your R-Studio workspace showing the three visualizations discussed. Use the naming convention DSE 400 > Assignment 2 > Your Full Name for your document. Do not send document links. Please add DSE 400 > Assignment 2 in the subject line.
  • 12. DSE 400 - Weeks 3-8 ahead. Weeks 3-4: Intro to Machine Learning (ML) - Classification, Clustering, Prediction, Naive Bayes, Recommendations and Boosting algorithms. Refer to R for Machine Learning by Allison Chung. Watch the Caltech Machine Learning videos on Youtube. Week 5: Visualizations. Present your research: Data Visualization Tools - A Comparative Study. Weeks 6-7: Processing large data sets, the Hadoop Ecosystem, Stream Computing, etc. Week 8: Ethics, Privacy and Building Data Products.
  • 13. References, Resources and Additional Reading: An Introduction to Data Science; Think Stats: Probability and Statistics for Programmers; Statistics Playlist by Khan Academy; R for Machine Learning by Allison Chung; Introduction to Hypothesis Testing; Single Sample Hypothesis Testing Part 1 and Part 2; R for Beginners by Emmanuel Paradis; R - Reference Cards; Introduction to R Playlist (Video Collection) on Youtube; Caltech Machine Learning Playlist on Youtube; [MIT OCW] Prediction: Machine Learning and Statistics from MIT Sloan School of Management.
  • 14. Citation. The dataset titled Haberman's Survival Data used here for Assignment 2 comes from the UCI Machine Learning Repository. Donor for Haberman's Survival Data: Tjen-Sien Lim (limt@stat.wisc.edu). It was added to the UCI Machine Learning Repository on March 4, 1999. R for Machine Learning by Allison Chung is recommended by the MIT course Prediction: Machine Learning and Statistics from the Sloan School of Management. It is adopted in DSE 400 as per OCW guidelines. Only content that appears as is in this document is under Creative Commons License CC BY 4.0. This license may not necessarily apply to other material referenced in this document.
  • 15. For More Information. The presentation deck for DSE 400 > Week 1 Roadmap can be found at http://bit.ly/1hC5wAV. Week 2 discussions take place during this week on SONO: DSE 400 Week 2. <Help On Demand> You may reach out to our Research Scholar Ms. Rachel Fleming <rachel@emodern.biz> if you have any difficulties with the assignments. We welcome questions, thoughts and suggestions. Post these on SONO in the right forum/discussion or write to us at <datascience400@gmail.com>. You can always find the latest version of this document at http://bit.ly/1dVHJwO
  • 16. Fun@Work: Geographic distribution of clicks for Week 1 Roadmap
  • 18. Acknowledgement. We thank our community of committed and passionate volunteers, experts, educators, innovators, benefactors, advisers, advocates, mentors and supporters. We are also grateful for the outstanding support and encouragement from the SONO team as well as other organizations such as MIT Sloan School of Management, IBM, HortonWorks, R-Project, Creative Commons, Open Courseware Consortium, Stanford University, Caltech, O’Reilly Publications and Data Science Central.