Data Scientist Enablement
DSE 400 - Fast Track to Data Science
Week 7 Roadmap
Advanced Center of Excellence
Modern Renaissance Corporation
In collaboration with the SONO team and others
The content of this document is licensed under Creative Commons Licence CC BY 4.0
Agenda
You can always find the latest version of this document at http://bit.ly/1fyOSnN
Week 7 Overview
Discussions
Learning Path
Activities
Assignment
Submission
Adaptive Learning
References
Citation
“Action is the foundational key to all success” - Pablo Picasso
DSE 400 - Week 7 at a glance
Social Discourse:
Discuss IBM Watson. Continue building the R-COP and the Modern Data Platforms-COP.
Learning plan:
Read about MapReduce, Lambda Architecture and Google BigQuery.
Activities:
Continue the Hortonworks Tutorials. Explore Google Public Datasets and BigQuery.
Assignment 7:
Perform queries on the Baseball Statistics dataset.
Social Engagement - Week 7
SONO | LinkedIn | Facebook | Google+
Discussion: Watch Ken Jennings: Watson, Jeopardy and me, the obsolete know-it-all, and share
your thoughts and reflections on the evolving domain of “Cognitive Computing”.
In line with our Open Innovation model, we are expanding our Social Discourse to LinkedIn,
Facebook and Google+. Discussions on SONO will continue as planned on the DSE 400 Jump Pad.
This gives participants more choice, and we hope it will increase social engagement.
Check out the Language R and Modern Data Platforms Communities of Practice (COPs) to
increase your competence in R, Machine Learning, the Hadoop ecosystem and other platforms.
Reach out to Olivia Ramirez, Ellen Brock or Manju Rupani if you want to contribute to these
communities.
Recommended Learning Plan
Read Practical illustration of Map-Reduce (Hadoop-style), on real data by Dr. Vincent Granville
Read Lambda Architecture for Big Data Systems by Michael Walker
Read Google BigQuery Tutorial
<Optional> Watch Hadoop - The Data Scientist's Dream
<Optional> Watch Hadoop MapReduce Example - How good are a city's farmer's markets by Helen Zeng
<Optional> Watch Google BigQuery in Ten Minutes
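The core MapReduce pattern from these readings (map each record to a key/value pair, group the pairs by key, then reduce each group) can be sketched on a single machine in plain Python. This is a minimal, hypothetical illustration that borrows this week's "maximum runs per year" question; the sample rows and function names are invented for the example and are not Hadoop APIs. In a real Hadoop job the framework performs the shuffle and distributes map and reduce tasks across the cluster.

```python
from collections import defaultdict

# Hypothetical (playerID, yearID, runs) records; illustrative values only.
SAMPLE_BATTING = [
    ("playerA", 2010, 98),
    ("playerB", 2010, 115),
    ("playerC", 2011, 87),
    ("playerA", 2011, 123),
]


def map_phase(records):
    """Map: emit one (key, value) pair per record, keyed by year."""
    for _player_id, year, runs in records:
        yield year, runs


def shuffle_phase(pairs):
    """Shuffle/sort: group values by key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups, reducer):
    """Reduce: collapse each key's list of values into a single result."""
    return {key: reducer(values) for key, values in groups.items()}


def run_job(records, reducer):
    """Chain map -> shuffle -> reduce for an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)), reducer)


if __name__ == "__main__":
    print(run_job(SAMPLE_BATTING, max))                        # {2010: 115, 2011: 123}
    print(run_job(SAMPLE_BATTING, lambda v: sum(v) / len(v)))  # {2010: 106.5, 2011: 105.0}
```

Only the reducer changes between "maximum" and "average". One design point carries over to Hadoop: `max` can also be applied as a combiner on partial groups, whereas an average must be combined as (sum, count) pairs to stay correct.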
Activities
<Practice> Check out Visualization of the Day at Data Science Central. As the name suggests, it is
going to be different every day. Explore alternative ways of representing it. Could you have
presented it better?
<Practice> Visit the Google Public Data Directory and explore Greenhouse Gas Emissions by country.
How does your country fare per capita compared to the leading contributors? Also check out the IMF
World Economic Outlook dataset and visualize the data on the Unemployment rate (found under the
People category).
<Practice> Continue the Hortonworks Tutorials on HDP 2.0. We will return to Hadoop and its
ecosystem in DSE 502, which will focus on Modern Data Platforms. In the meantime, you can also
participate in the Modern Data Platforms-Community of Practice and contribute to discussions on
this subject.
Assignment 7 - Submission Required
Platforms: HDP 2.0 | R-sqldf | BigQuery
Download Sean Lahman’s baseball statistics dataset. Using HDP 2.0 (or an equivalent Hadoop
platform), R with sqldf, or Google BigQuery, compute the following:
a) Group the data in the Batting table to show the maximum runs for each year.
b) Similarly, group the data in the Batting table to show the average runs for each year.
c) Display the maximum runs for each year together with the associated player (last_name and
first_name) by joining the Batting and Master tables.
You may reach out to Rachel Fleming <rachel@emodern.biz> if you have any difficulties
with the assignments or are looking for more challenging assignments or activities.
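Because sqldf runs SQL against SQLite by default, the three queries can be prototyped with Python's built-in `sqlite3` module before porting them to R, Hive or BigQuery. This is a minimal sketch under stated assumptions: the column names (`playerID`, `yearID`, `R`, `nameFirst`, `nameLast`) follow common releases of the Lahman database and may differ in your copy (the assignment text refers to last_name and first_name), and the rows below are hypothetical stand-ins for the real data.

```python
import sqlite3

# Assumed Lahman-style schema; check the column names in your copy of the dataset.
SCHEMA = """
CREATE TABLE Master  (playerID TEXT PRIMARY KEY, nameFirst TEXT, nameLast TEXT);
CREATE TABLE Batting (playerID TEXT, yearID INTEGER, R INTEGER);
"""

# Hypothetical rows for illustration only.
MASTER_ROWS = [("aaa01", "Ann", "Able"), ("bbb01", "Bob", "Baker"), ("ccc01", "Cy", "Cole")]
BATTING_ROWS = [("aaa01", 2010, 98), ("bbb01", 2010, 115),
                ("ccc01", 2011, 87), ("aaa01", 2011, 123)]

# (a) maximum runs per year
MAX_RUNS_SQL = "SELECT yearID, MAX(R) AS max_runs FROM Batting GROUP BY yearID ORDER BY yearID"

# (b) average runs per year
AVG_RUNS_SQL = "SELECT yearID, AVG(R) AS avg_runs FROM Batting GROUP BY yearID ORDER BY yearID"

# (c) join each year's maximum back to Batting, then to Master for the player's name
TOP_PLAYER_SQL = """
SELECT b.yearID, b.R AS max_runs, m.nameLast, m.nameFirst
FROM Batting AS b
JOIN (SELECT yearID, MAX(R) AS max_r FROM Batting GROUP BY yearID) AS t
  ON b.yearID = t.yearID AND b.R = t.max_r
JOIN Master AS m ON m.playerID = b.playerID
ORDER BY b.yearID, m.nameLast
"""


def run_queries():
    """Load the sample rows into an in-memory database and run queries (a), (b), (c)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO Master VALUES (?, ?, ?)", MASTER_ROWS)
        conn.executemany("INSERT INTO Batting VALUES (?, ?, ?)", BATTING_ROWS)
        return {
            "max": conn.execute(MAX_RUNS_SQL).fetchall(),
            "avg": conn.execute(AVG_RUNS_SQL).fetchall(),
            "top": conn.execute(TOP_PLAYER_SQL).fetchall(),
        }
    finally:
        conn.close()


if __name__ == "__main__":
    for name, rows in run_queries().items():
        print(name, rows)
```

Query (c) joins the per-year maximum back to Batting, so a tie for the top spot returns every tied player. In the Lahman data a player who changed teams mid-season has one Batting row per stint; if you want season totals, aggregate with `SUM(R)` grouped by `playerID` and `yearID` before taking the maximum.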
Submission in PDF format is required
Recommended deadline: Saturday, 11:59 PM your local time. If you can’t submit your
assignment on time, please complete it and turn it in as soon as possible. There is no
penalty for late submission, but turning assignments in on time will help you focus on
next week’s lessons.
Mail Assignment 7 to <dse400.datascience@gmail.com> with "DSE 400 > Assignment 7" in the
subject line. Submit a single PDF document showing your queries and sample results,
including screenshots as necessary. For consistency, name your document using the
convention "DSE 400 - Assignment 7 - Your Full Name". Do not send document links; only a
single document in PDF format is accepted.
Adaptive Learning Options
Data Scientist Enablement program
Maturity | Composite Score* | Proficiency | Certificate
Level 5 | > 90 | Innovating Capability | Black Belt
Level 4 | > 80 and <= 90 | Architectural Capability | Green Belt
Level 3 | > 70 and <= 80 | Solutioning Capability | Yellow Belt
Level 2 | > 60 and <= 70 | Basic Understanding | Completion
Level 1 | <= 60 | Basic Familiarity | Audit
* The composite score reflects participants' performance in assignments, activities, projects, social
engagement, collaboration, team development, publications, advanced research, etc. across all 4 modules
of the DSE program.
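The maturity bands use strict lower bounds and inclusive upper bounds, so a score of exactly 90 earns Level 4, not Level 5. A small lookup makes those boundaries explicit. This is a hypothetical helper (`dse_level` and `BANDS` are illustrative names) that assumes the composite score has already been computed on a 0-100 scale; how the composite itself is weighted is not specified in this roadmap.

```python
# Bands checked from highest to lowest: (exclusive lower bound, level, proficiency, certificate).
BANDS = [
    (90, 5, "Innovating Capability", "Black Belt"),
    (80, 4, "Architectural Capability", "Green Belt"),
    (70, 3, "Solutioning Capability", "Yellow Belt"),
    (60, 2, "Basic Understanding", "Completion"),
]


def dse_level(score):
    """Return (level, proficiency, certificate) for a composite score on an assumed 0-100 scale."""
    for lower, level, proficiency, certificate in BANDS:
        if score > lower:  # strict ">" matches the table's "> 90", "> 80", ... bounds
            return level, proficiency, certificate
    return 1, "Basic Familiarity", "Audit"  # "<= 60"


if __name__ == "__main__":
    for s in (95, 90, 70.5, 60):
        print(s, dse_level(s))
```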
References, Resources and Additional Reading
17 short tutorials all Data Scientists should read (and practice). Dr. Granville. Data Science Central
Hadoop Illuminated. Kerzner and Maniyam. Hadoop Illuminated LLC. 2013
Hadoop: The Definitive Guide. 3rd Edition. Tom White. O’Reilly Media. 2012
Programming Hive. Capriolo et al. O’Reilly Media. 2012
MapReduce: Simplified Data Processing on Large Clusters. Dean and Ghemawat. Google. 2004
[MIT OCW] How to Process, Analyze and Visualize Data. Marcus and Wu. 2012
The Modern Data Architecture for Predictive Analytics
Big Data - Hadoop, Hive, Pig and HBase video collection
Language R-Community of Practice
Modern Data Platforms-Community of Practice
Data Science Enablement playlist
Citation
Content that appears as is in this document only is licensed under Creative Commons
License CC BY 4.0. This license does not necessarily apply to other material
referenced in this document.
The baseball dataset used in this week’s activities and assignment is attributed to
Sean Lahman and is adapted under Creative Commons License 3.0.
Content from IBM, Hortonworks, Google, Data Science Central, O’Reilly Media, etc. is
excluded from the above Creative Commons License.
For More Information
Week 7 discussions take place this week in the DSE 400 forums on LinkedIn, Facebook, Google+
and SONO. There is also an active Q&A session for everyone's benefit. Also check out the
Language R-Community of Practice if you would like to advance your competence in R or
contribute to this community.
<Mentoring On Demand> You may reach out to Rachel Fleming <rachel@emodern.biz> if you have
any difficulties with the assignments or are looking for more challenging activities. If you need a
mentor or someone to help you accelerate through the DSE program, you may reach out to
Vishal Kumar <wishall.kumar@gmail.com> or Ligia Buzan <ligia.buzan@gmail.com>.
We welcome questions, thoughts and suggestions. Post them in the appropriate forums/discussions
or write to us at <dse400.datascience@gmail.com>.
You can always find the latest version of this document and other DSE 400 roadmaps at
http://bitly.com/bundles/o_4ldaljhta4/1
Thank You
The Analytical Engine has no pretensions whatever to
originate anything. It can do whatever we know how to
order it to perform. It can follow analysis, but it has no
power of anticipating any analytical revelations or truths.
Its province is to assist us in making available what we
are already acquainted with. - Ada Lovelace
