DATA+SCIENCE
A FIRST COURSE
What is Data Science?
Data Science is, in general terms,
the extraction of knowledge from
data.
What is Data Science?
Data is increasingly cheap and ubiquitous. We
are collecting and analyzing data of
unprecedented variety, complexity and
scale.
At the same time, new technologies are
emerging to organize and make sense of this
avalanche of data.
What is Data Science?
Data Science is an interdisciplinary subject
employing concepts and techniques from
mathematics, statistics, computer science
and economics.
It is used to identify patterns and regularities in
data, affecting all aspects of work and society
from medicine to marketing to scientific
research.
Who is a Data Scientist?
A data scientist is someone who is
better at statistics than most
software engineers and better at
software engineering than most
statisticians.
Who is a Data Scientist?
A Data Scientist is a professional
with the training and curiosity to
make discoveries while swimming in
an ocean of data, communicating
what they learn and suggesting its
implications for new decisions.
Who is a Data Scientist?
They identify and combine rich and potentially
incomplete data sources, and bring structure to
large quantities of formless data, making
analysis possible.
They engage decision makers in an ongoing
conversation based on the implications of the
data for products, processes, and decisions.
Who is a Data Scientist?
★ A Data Scientist should have solid
quantitative and analytic skills
Statistical Modelling
Experimental Design
Bayesian Inference
Machine Learning
Information Theory
Complex Systems
Who is a Data Scientist?
★ A Data Scientist should be a good
programmer
Scripting: e.g. Python
Statistical Packages: e.g. R
Databases: SQL and NoSQL
MapReduce concepts
Hadoop and Hive/Pig
Computer Science
Who is a Data Scientist?
In addition, a Data Scientist should
★ excel at communication and visualization
★ understand economics and business
concepts
★ be curious and creative
Demand for Data Scientists
There is a growing demand for data-savvy
professionals in businesses, public agencies,
and nonprofits.
There is a limited supply of professionals who
can efficiently work with data at scale.
Thus, the salaries for data engineers, data
scientists, statisticians, and data analysts
have increased rapidly.
A recent study by the McKinsey Global
Institute estimates that there will be four to
five million jobs in the U.S. requiring data
analysis skills by 2018, and that large numbers
of positions will only be filled through training
or retraining.
In a survey of 816 data professionals in 53
countries, O’Reilly Media reports a median
annual salary of $98,000 for Data Science
professionals.
SQL, R, Python and Excel are the top-earning
skills.
Data Science in India
According to a survey by Gartner:
★ In 2013, the Data Analytics market in India
was $1.6 billion, with a growth rate of 8%
★ By 2018, the market is projected to be $3.7
billion
"For the fourth year in a row, analytics ranks as the No.
1 priority in Gartner's CIO [India] Survey," explains
Bhavish Sood, research director at Gartner.
India is one of the strongest countries in the Data
Science marketplace, with clients including
Facebook, GE, NASA, Tesco and Merck. It can
potentially build a talent pipeline for data scientists,
one that is virtually non-existent today.
India will need 200,000 data scientists in the next few
years. A single company, Wipro, already has as many as
8,000 people in analytics functions.
Data Science in India
The median annual salary for a Data Scientist in
India is Rs 670,665.
The highest-paying skills are
Python, Machine Learning,
Statistical Analysis, Big Data
Analytics, and R.
Bengal Chamber proposes smart and
green city for business analytics firms
The Bengal Chamber of Commerce and Industry has
taken an initiative to set up a smart city for business
analytics in West Bengal.
The project would involve service providers like KPMG
Advisory Services and PricewaterhouseCoopers,
corporate consumers, and educational institutions such
as the Indian Institute of Technology Kharagpur, the
Indian Statistical Institute, and the Indian Institute of
Management, Calcutta.
How can you be a Data Scientist?
A Master’s degree is a natural route to becoming a Data
Scientist.
Massive Open Online Courses (MOOCs) give access to
self-learning at a low cost (often free), but leave it to the
student to identify a suitable set of courses and tools to
round out a coherent skill set.
Bootcamps offer students a practical and structured
learning environment at a far lower cost than a
Master’s degree.
Master’s Degree
Duration    9 - 20 months
Faculty     University Professors
Learning    Theory and Assignments
Outcome     Degree
Projects    Practicum and Internship
Placement   University Recruiting
Examples    UC Berkeley, NYU, NCSU, IIT+IIM+ISI
Tuition     $20,000 - $70,000 (US); ₹20,000,000 (India)
Self-Learning (MOOCs)
Duration    6 - 18 months (part time)
Faculty     University Professors (recorded lectures)
Learning    Self-guided
Outcome     Certificate
Projects    Projects on own time
Placement   Self-driven job search
Examples    Coursera, Udacity
Tuition     Free - $500 (US)
Bootcamps
Duration    2 - 3 months
Faculty     Professors & Data Scientists
Learning    Experiential Learning
Outcome     Certificate and Portfolio
Projects    Built-In Projects
Placement   Hiring Day and Placement Assistance
Examples    Zipfian, Metis, Data Incubator
Tuition     Free - $16,000 (US)
The Course
Data+Science: A First Course is an intensive
eight-week program based on the bootcamp
model, organized by The Data+Science
Initiative.
It is designed to teach and train graduates in
quantitative fields to take an entry-level
position as a data scientist.
Objectives of the Course
Upon graduating, a student will:
1. Have a clear understanding of and practical
experience with the process of designing,
implementing, and communicating the results of a
data science project.
2. Understand the landscape of data science tools and
their applications, and be prepared to identify and
dig into new technologies and algorithms needed
for the job at hand.
Overview
Data science gives valuable meaning to large sets
of complex and unstructured data.
The focus is on concepts and techniques to
mine, store, analyse and visualize data.
Data science is highly interdisciplinary, drawing
from fields such as computer science (algorithms
and databases), statistics (hypothesis testing and
inference), and artificial intelligence (pattern
recognition and machine learning).
Course Content
Data Mining (⅛):
identifying data sources; extracting, cleaning
and verifying structured and unstructured data
Data Storage (¼):
structuring, storage and retrieval of data;
including big data and NoSQL
Data Analysis (½):
descriptive and inferential analysis; predictive
modelling, risk analysis and decision making
Data Visualization (⅛)
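As a rough illustration of the module weights above, the sketch below converts each fraction into a number of classes. It assumes the fractions are shares of the 24 classes listed under Course Details; the deck does not say so explicitly, and the names TOTAL_CLASSES, MODULE_SHARES and classes_per_module are hypothetical.

```python
from fractions import Fraction

# Assumption: the module fractions are shares of the 24 classes
# stated under Course Details (24 classes over 8 weeks).
TOTAL_CLASSES = 24

# Module weights as listed on the Course Content slide.
MODULE_SHARES = {
    "Data Mining": Fraction(1, 8),
    "Data Storage": Fraction(1, 4),
    "Data Analysis": Fraction(1, 2),
    "Data Visualization": Fraction(1, 8),
}


def classes_per_module(total_classes, shares):
    """Split total_classes across modules in proportion to their shares."""
    if sum(shares.values()) != 1:
        raise ValueError("module shares must sum to 1")
    return {name: share * total_classes for name, share in shares.items()}


allocation = classes_per_module(TOTAL_CLASSES, MODULE_SHARES)
for name, count in allocation.items():
    print(f"{name}: {count} classes")
```

Under that assumption, Data Analysis takes up half of the class days, with the remainder split between mining, storage and visualization.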
Course Content
Graduating students will:
1. Be proficient in statistical concepts and
mathematical techniques including correlation
functions, inference and hypothesis testing.
2. Be able to make predictive analyses by modelling
stochastic processes based on available data.
3. Learn and apply Machine Learning concepts to
solve data science problems
Course Content
4. Be capable coders in Python and R, including the
related packages and toolsets most commonly
used in data science.
5. Know the fundamentals of data visualization and
have experience creating static and dynamic data
visuals using JavaScript and D3.js.
6. Have introductory exposure to big data tools and
architecture such as the Hadoop stack, know when
these tools are necessary, and be poised to quickly
train up and utilize them in a big data project.
Prerequisites
Basic Statistics and Probability
descriptive statistics and distributions
Linear Algebra
vectors and matrices
Calculus and Differential Equations
basic calculus and finding extrema, ordinary
differential equations
Programming
basic proficiency in any programming language
Preferred Subjects
Computer Science
algorithms, data structures and databases
Advanced Statistics
Bayesian inference and stochastic processes
Statistical Mechanics/Information Theory
entropy, information, complexity
Economics
supply/demand, game theory
Web Development
HTML, CSS and JavaScript
Eligibility
Anyone meeting the prerequisite criteria, as
determined by a qualifying exam, is eligible, with
preference given to those with knowledge of
the preferred subjects.
However, we would prefer applicants to have a
bachelor’s degree in a quantitative field, such
as: Engineering, Physics, Mathematics,
Statistics, Economics or Computer
Applications.
Course Details
The course consists of 24 classes over 8 weeks.
Each class (Mondays, Wednesdays, Fridays) is 6
hours in duration (10AM-4PM) including a lunch
hour.
Morning sessions consist of lectures and
discussions, while afternoons are guided
programming sessions.
In addition, instructors will be available for office
hours at scheduled times.
Course Projects
The course is divided into three parts.
Part A (Weeks 1-4): daily programming projects
executed individually or in groups
Part B (Weeks 5-8): weekly group projects
drawn from industry
Part C (Weeks 9-11, optional): course project in
groups with biweekly meetings with instructors
Benefits
Employment: Students will have the skill set and
portfolio to find employment as an entry-level
data scientist. Such a skill set is in great demand,
both domestically and in developed
countries.
Research: Since Data Science is at the core of
academic research, our students, armed with the
knowledge, portfolio and recommendations, will
find admission to universities easier, especially
abroad.
