Data Scientist Enablement
DSE 400 - Fast Track to Data Science
Week 1 Roadmap
Advanced Center of Excellence
Modern Renaissance Corporation
In Collaboration with SONO team and others
Content of this document is under Creative Commons Licence CC-BY-4.0
Agenda
You can always find the latest version of this document at bit.ly/1hC5wAV
Welcome
Mission and Objectives
DSE Roadmap
DSE 400 at a glance
Week 1 at a glance
Discussions
Learning
Practice
Assignments and Submission
Looking ahead
References
Acknowledgement
"In God we trust; all others must bring data." - W. Edwards Deming
Welcome
Welcome to the DSE 2014 Track. You are part of one of the most exciting programs for disseminating knowledge, diffusing advancements and stimulating adoption of Data/Decision Sciences, Big Data Analytics and what we call Evidence-Oriented Systems Engineering. The content and the courses are designed to be easy, engaging and engendering. Consequently, we hope this program will also be most rewarding for you from intellectual, pragmatic and professional development perspectives.
Mission and Objectives
The mission of our program is to provide free, open and world-class enablement of Data Scientists and to help advance the profession of Data Science as well as allied disciplines.
We aim to equip participants with analytical and practical skills, emphasizing breadth and depth across a range of relevant disciplines and capabilities in Data/Decision Sciences, Big Data Analytics, Architecture and Systems Engineering.
Data Scientist Enablement Roadmap - 2014
Fast Track to Data Science
Machine Learning with R
Modern Data Platforms
Advanced Techniques in Big Data Analytics
"A Data Scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human."
- Rachel Schutt and Cathy O'Neil, Doing Data Science
DSE 2014 with tentative timeline
Fast Track to Data Science (DSE 400): Jan 19 - Mar 15
Machine Learning with R (DSE 501): Mar 30 - May 10
Modern Data Platforms (DSE 502): May 25 - July 5
Advanced Techniques in Big Data Analytics (DSE 600): July 20 - Aug 30
DSE 400 at a glance
An introductory course with no prerequisites. It employs a socialized learning paradigm involving individual effort, teamwork, discussions and collaboration on the SONO (Social Knowledge) platform.
Topics include Algorithms, Statistical Inference, Data Analysis, Hadoop, R, Data Engineering, Machine Learning, Visualization, Applications and Case Studies, employing a variety of tools and techniques.
DSE 400 - Week 1 at a glance
Discussions (on SONO):
Welcome, Introductions, Programming and Analytics background, etc.
Reading plan:
Read Chapters 1-3 of An Introduction to Data Science by Jeffrey Stanton, and Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Activities:
Installing R and R-Studio; Fun with Math; Playing with ML Datasets; Research on Data Visualization tools, etc.
Assignment 1:
Download the Housing dataset from the UCI Machine Learning Repository to your local machine or cloud drive. Import this dataset into your R environment and display it.
Social Engagement on SONO
Log in to the SONO Community. Visit our Jump Pad (or Knowledge Domain) called DSE 400. Go to DSE 2014 Global, then join the participant group that matches the first letter of your last name. Also feel free to explore other knowledge-rich communities on SONO.
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=992
Social Engagement on SONO - Week 1
Discussion 1: Welcome to the DSE program.
Discussion 2: What programming languages are you familiar with? Which languages do you use on a day-to-day basis? Do you have any experience using the R language? What kinds of analytics tools, if any, have you used before?
<Optional> Discussion 3: Q&A. We will focus on topics central to Week 1, but general questions are also welcome.
To participate in these discussions, visit DSE 400 Week 1 at
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1001
Week 1 Reading Plan
DSE 400 is designed to be a broad introduction to Data Science, Analytics Architecture and Visualization from both learning and pragmatic perspectives. The following plan is recommended for Week 1 to kick-start the program.
Read Chapters 1-3 of An Introduction to Data Science by Jeffrey Stanton.
Read Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Activities
<Required> Visit http://www.rstudio.com/ and follow the instructions to download and install R and R-Studio. For advice specific to your system and its configuration, several how-to videos on installing R and R-Studio can be found on YouTube. Skip this activity if you already have R and R-Studio.
<Collaborative Research> <Required> Create a presentation titled Data Visualization Tools - A Comparative Study. Incorporate your unique ideas, research and collective insights to arrive at the right evaluation methodology, explain your thought process and justify your choices. Note: You will build this presentation over 4 weeks. You and your team will present it during the 5th week.
Activities - contd
<Practice> Math is Fun. Quickly create a bar chart of 10 random values using the Data Graphs widget on the Math is Fun website. Change the graph to a Pie Chart. Display percentages only, not the original values.
<Practice> Visit the UCI Machine Learning Repository and familiarize yourself with the various datasets on the site. Feel free to download any dataset you like. We will use this repository extensively in the DSE program. For Week 1, our focus is only on the "Housing" dataset.
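The pie-chart step in the Math is Fun practice item asks for percentages instead of raw values: each slice is its value divided by the total, times 100. The Python sketch below reproduces that arithmetic for 10 random values (the course itself works in R; the helper name `to_percentages` is our own, hypothetical choice, not part of the widget). Because each percentage is rounded, the displayed slices may add up to slightly more or less than 100.

```python
import random


def to_percentages(values, digits=1):
    """Convert raw values to pie-chart percentages (value / total * 100).

    Hypothetical helper for illustration. Results are rounded to `digits`
    decimal places, so they may not sum to exactly 100.
    """
    if any(v < 0 for v in values):
        raise ValueError("pie-chart values must be non-negative")
    total = sum(values)
    if total == 0:
        raise ValueError("values must not all be zero")
    return [round(v / total * 100, digits) for v in values]


# Ten random whole numbers, standing in for the values typed into the widget.
values = [random.randint(1, 50) for _ in range(10)]
for label, pct in zip("ABCDEFGHIJ", to_percentages(values)):
    print(f"{label}: {pct}%")
```

With one decimal place, each rounded slice is off by at most 0.05, so ten slices stay within half a percentage point of 100.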
Assignment 1 - Submission Required
Download R-Studio, in case you have not already done so.
Download the Housing dataset from the UCI Machine Learning Repository to your local machine or cloud drive. Import this dataset into your R environment and display it. Take a screenshot of your environment.
(See the sample image in the next slide.)
http://archive.ics.uci.edu/ml/datasets.html
Assignment 1 - Example screenshot
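Assignment 1 is done in R, where `read.table("housing.data")` reads a whitespace-separated file with no header row. To show what that import involves step by step, here is a minimal Python sketch. It assumes the file was saved locally as `housing.data` with one record per line and 14 whitespace-separated numeric fields, and it uses the column names from the dataset's description file; `parse_housing`, `load_housing` and `show` are hypothetical helper names, not part of the course material.

```python
# Hypothetical helpers; column names follow the dataset's description file.
COLUMNS = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV",
]


def parse_housing(text):
    """Parse whitespace-separated rows (no header) into dicts keyed by column."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines, e.g. a trailing newline
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {line_no}: expected {len(COLUMNS)} fields, got {len(fields)}"
            )
        records.append(dict(zip(COLUMNS, map(float, fields))))
    return records


def load_housing(path="housing.data"):
    """Read the locally downloaded file; the default path is an assumption."""
    with open(path) as fh:
        return parse_housing(fh.read())


def show(records, n=5):
    """Print the column header and the first n records as a quick preview."""
    print(" ".join(f"{c:>8}" for c in COLUMNS))
    for rec in records[:n]:
        print(" ".join(f"{rec[c]:>8.3f}" for c in COLUMNS))
```

Usage: `rows = load_housing()`, then `print(len(rows))` and `show(rows)`. If the downloaded file matches the published description, it should contain 506 rows. Raising an error on a wrong field count, rather than silently skipping the line, makes a truncated or mis-saved download easy to spot.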
Submissions
Deadline: Saturday, Jan 25, 11:59 PM your local time.
Email the screenshots of your R workspace (on your machine/laptop/desktop) showing the Housing dataset to datascience400@gmail.com.
You can either paste the image into the body of the email or create a PDF document and send it as an attachment. No links please.
Fun@Work
DSE Participant Distribution Pattern
Fun@Work
Tag cloud of the professional backgrounds of DSE participants
DSE 400 - Weeks 2-8 ahead
Week 2: Basic Statistics, Hypothesis Testing, Regression, Playing with Spreadsheets, Visualization with R. If you are new to Statistics or need a refresher, read ahead in Think Stats: Probability and Statistics for Programmers, or watch the Statistics playlist by Khan Academy.
Weeks 3-4: Intro to Machine Learning (ML) - Classification, Clustering, Prediction, Naive Bayes, Recommendations and Boosting algorithms.
Week 5: Visualizations. Present your research, Data Visualization Tools - A Comparative Study.
Weeks 6-7: Processing large data sets. Hadoop Ecosystem. Stream Computing, etc.
Week 8: Ethics, Privacy and Building Data Products.
References and Additional Reading
An Introduction to Data Science by Jeffrey Stanton. A good introduction to Data Science for non-technical readers, available under a Creative Commons Licence.
Learning R - Video Tutorial Lessons on YouTube
R for Machine Learning by Allison Chung
The Value of Big Data Isn't the Data (HBR article)
[MIT OCW] Prediction, Machine Learning and Statistics
Citation
Housing Data Set Information: Concerns housing values in suburbs of Boston.
Origin: This dataset was taken from the StatLib library, which is maintained at Carnegie Mellon University.
Creator: Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978.
Only content that appears as-is in this document is under Creative Commons BY-NC-SA. This license may not apply to material referenced here.
For More Information
The DSE 2014 stream is all set to commence on Jan 19, 2014.
For more details, visit DSE 400 Announcement Page bit.ly/18zPE1j
Visit DSE 2014 Global to participate in DSE and to get to know the DSE Core Team and participants. Week 1 discussions can be found at DSE 400 Week 1.
We welcome questions, thoughts and suggestions. Post these on SONO in the
right forum/discussion or write to us at <datascience400@gmail.com>
You can always find the latest version of this document at bit.ly/1hC5wAV
Acknowledgement
We thank our community of committed and passionate volunteers, experts, educators, innovators, benefactors, advisers, advocates, mentors and supporters.
We are also grateful for the outstanding support and encouragement from the SONO team as well as other organizations such as R-Project, Open Courseware Consortium, MIT, IBM, Creative Commons, HortonWorks, Stanford University, Caltech and Data Science Central.
Thank You
