NASSCOM Future Skills Training
Course – Data Science & Analytics
Dhruv Saxena
Assistant Professor (TEQIP-NPIU)
Introduction to Data Science
Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 8
OBJECTIVES
The objective of this course is to impart necessary knowledge of the
mathematical foundations needed for data science and develop
programming skills required to build data science applications.
Duration – 60 Hours (40L + 20C)
LEARNING OUTCOMES
At the end of this course, the students will be able to:
● Demonstrate understanding of the mathematical foundations
needed for data science.
● Collect, explore, clean, munge and manipulate data.
● Implement models such as k-nearest Neighbors, Naïve Bayes,
linear and logistic regression, decision trees, neural networks and
clustering.
● Build data science applications using Python based toolkits.
Data, Big Data and Challenges
Data Science
◦ Introduction
◦ Why Data Science
Data Scientists
◦ What do they do?
Major/Concentration in Data Science
◦ What courses to take.
Data All Around
Lots of data is being collected and warehoused
◦Web data, e-commerce
◦Financial transactions, bank/credit transactions
◦Online trading and purchasing
◦Social Network
How Much Data Do We have?
Google processes 20 PB a day (2008)
Facebook has 60 TB of daily logs
eBay has 6.5 PB of user data + 50 TB/day (5/2009)
1000 genomes project: 200 TB
Cost of 1 TB of disk: $35
Time to read 1 TB disk: about 3 hrs (at 100 MB/s)
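The read-time figure above can be checked with a short calculation. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant sequential throughput; the function name `read_time_hours` is illustrative and not part of the course material.

```python
# Back-of-the-envelope check of the sequential read time quoted on the slide.
# Assumption: decimal units and a constant throughput with no seek overhead.
TB = 10**12  # bytes in one terabyte (decimal)
MB = 10**6   # bytes in one megabyte (decimal)


def read_time_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Return the hours needed to read size_bytes at a constant throughput."""
    seconds = size_bytes / throughput_bytes_per_s
    return seconds / 3600


one_tb_hours = read_time_hours(1 * TB, 100 * MB)
print(f"1 TB at 100 MB/s: {one_tb_hours:.2f} hours")  # about 2.78 hours

# The same arithmetic applied to the 20 PB/day figure, read from a single disk.
twenty_pb_days = read_time_hours(20_000 * TB, 100 * MB) / 24
print(f"20 PB at 100 MB/s on one disk: {twenty_pb_days:.0f} days")
```

The second figure suggests why data at this scale is spread across many machines and read in parallel rather than from one disk.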
Big Data
Big Data is any data that is expensive to manage and hard to extract value
from
◦ Volume: the size of the data
◦ Velocity: the latency of data processing relative to the growing demand for interactivity
◦ Variety and Complexity: the diversity of sources, formats, quality, and structures
Big Data vs Data Science vs Data Analytics
What is Data Science?
Data Science deals with both unstructured and structured data. It is a
field that comprises everything related to data cleansing,
preparation, and analysis.
Data Science is the combination of statistics, mathematics,
programming, problem-solving, capturing data in ingenious ways,
the ability to look at things differently, and the activity of cleansing,
preparing, and aligning the data.
In simple terms, it is the umbrella of techniques used when trying
to extract insights and information from data.
What is Big Data?
Big Data refers to humongous volumes of data that cannot be processed effectively with
the traditional applications that exist. The processing of Big Data begins with the raw data
that isn’t aggregated and is most often impossible to store in the memory of a single
computer.
A buzzword that is used to describe immense volumes of data, both unstructured and
structured, Big Data inundates a business on a day-to-day basis. Big Data is something that
can be used to analyze insights that can lead to better decisions and strategic business
moves.
The definition of Big Data, given by Gartner, is, “Big data is high-volume, high-velocity
and/or high-variety information assets that demand cost-effective, innovative forms of
information processing that enable enhanced insight, decision making, and process
automation.”
Big Data
What is Data Analytics?
Data Analytics is the science of examining raw data to draw conclusions
about that information.
Data Analytics involves applying an algorithmic or mechanical process to
derive insights, for example running through several data sets to look for
meaningful correlations between them.
It is used in several industries to allow organizations and companies to
make better decisions as well as to verify or disprove existing theories or
models. The focus of Data Analytics lies in inference, which is the process of
deriving conclusions based solely on what the researcher already
knows.
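One common way to "look for meaningful correlations" between two data sets is the Pearson correlation coefficient. The sketch below uses only the Python standard library; the `ad_spend` and `sales` figures are made-up illustrative values, not data from the course.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: advertising spend and sales for five periods.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 55]

r = pearson(ad_spend, sales)
print(f"correlation between ad spend and sales: {r:.3f}")
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship. A high correlation alone does not show that one quantity causes the other.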
Types of Data We Have
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
◦ Social Network, Semantic Web (RDF), …
Streaming Data
◦ You can afford to scan the data only once
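The single-scan constraint on streaming data can be illustrated with a one-pass algorithm. The sketch below uses Welford's method to keep a running mean and variance without storing the stream; the `RunningStats` class and the `sensor_stream` generator are hypothetical names used only for illustration.

```python
class RunningStats:
    """Running mean and sample variance, updated one value at a time."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new value into the statistics (Welford's update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance of the values seen so far (0.0 if fewer than 2)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def sensor_stream():
    """Stand-in for a real stream; each value is produced exactly once."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)  # each reading is seen once and then discarded

print(f"count={stats.n} mean={stats.mean:.2f} variance={stats.variance:.2f}")
```

Memory use stays constant however long the stream runs, which is the property that matters when the data cannot be revisited.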
What To Do With These Data?
Aggregation and Statistics
◦ Data warehousing and OLAP
Indexing, Searching, and Querying
◦ Keyword based search
◦ Pattern matching (XML/RDF)
Knowledge discovery
◦ Data Mining
◦ Statistical Modeling
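Keyword-based search, listed above, is commonly built on an inverted index that maps each word to the documents containing it. The following is a minimal sketch under simplifying assumptions (whitespace tokenization, lowercase matching, AND semantics across query words); the sample documents are invented for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical mini-corpus.
docs = {
    1: "big data needs distributed storage",
    2: "data science extracts knowledge from data",
    3: "statistical modeling of big data",
}

index = build_index(docs)
hits = search(index, "big data")
print(sorted(hits))  # [1, 3]
```

Building the index costs one pass over the documents. After that, a query touches only the entries for its own words instead of scanning every document.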
Big Data and Data Science
“… the sexy job in the next 10 years will be statisticians,” Hal Varian, Google Chief
Economist
The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts
by 2018 (McKinsey Global Institute, June 2011).
India will need more than 160,000 Data Scientists by 2020, and worldwide demand is
predicted to be around 2.7 million by 2020.
New Data Science institutes being created or repurposed – NYU, Columbia, Washington,
UCB,...
New degree programs, courses, boot-camps:
◦ e.g., at Berkeley: Stats, I-School, CS, Astronomy…
◦ One proposal (elsewhere) for an MS in “Big Data Science”
What is Data Science?
An area that manages, manipulates, extracts, and interprets knowledge from
tremendous amounts of data.
Data science (DS) is a multidisciplinary field of study with the goal of addressing the
challenges in big data.
Data science principles apply to all data – big and small.
Simply – Extraction of knowledge from large volumes of data that are structured or
unstructured.
What is Data Science?
Theories and techniques from many fields and disciplines are used to
investigate and analyze a large amount of data to help decision makers in
many industries such as science, engineering, economics, politics, finance,
and education.
◦ Computer Science: pattern recognition, visualization, data warehousing,
high-performance computing, databases, AI
◦ Mathematics: mathematical modeling
◦ Statistics: statistical and stochastic modeling, probability
Why is it sexy?
Gartner’s 2014 Hype Cycle
Data Science
Real Life Examples
Companies learn your secrets, shopping patterns, and preferences
◦ For example, can we know if a woman is pregnant, even if she doesn’t want us to know?
Target case study
Data Science and election (2008, 2012)
◦ 1 million people installed the Obama Facebook app that gave access to info on “friends”
Applications of Data Science
Internet Search
Search engines make use of data science algorithms to deliver the best results for search queries
in a fraction of a second.
Digital Advertisements
The entire digital marketing spectrum uses data science algorithms, from display banners to
digital billboards. This is the main reason digital ads get a higher click-through rate (CTR)
than traditional advertisements.
Recommender Systems
Recommender systems not only make it easy to find relevant products among billions of
available products but also add a lot to the user experience. Many companies use such systems to
promote their products and suggestions in accordance with the user’s demands and the relevance of
information. The recommendations are based on the user’s previous search results.
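The idea of recommending from a user's previous activity can be sketched with a simple co-occurrence count: items that often appear together in past histories are suggested to someone who has seen one of them. This is only one of many possible approaches, shown under simplifying assumptions. The function names and the sample histories are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(histories):
    """Count how often each pair of items appears in the same history."""
    counts = {}
    for items in histories:
        for a, b in combinations(sorted(set(items)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(counts, seen, k=1):
    """Suggest up to k unseen items that co-occur most with the seen items."""
    scores = Counter()
    for item in seen:
        for other, count in counts.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    return [item for item, _ in scores.most_common(k)]


# Hypothetical search/purchase histories of four earlier users.
histories = [
    ["laptop", "mouse", "keyboard"],
    ["laptop", "mouse"],
    ["laptop", "bag"],
    ["phone", "case"],
]

counts = co_occurrence(histories)
recs = recommend(counts, {"laptop"})
print(recs)  # ['mouse']
```

Production recommender systems typically add weighting, popularity normalization, and far larger data. The sketch only shows the basic "users who looked at X also looked at Y" signal.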
Big Data for Retail
Whether a brick-and-mortar store or an online e-tailer, the answer to
staying in the game and being competitive is understanding the customer
better in order to serve them. This requires the ability to analyze all the
disparate data sources that companies deal with every day, including
weblogs, customer transaction data, social media, store-branded
credit card data, and loyalty program data.
Applications of Big Data
Big Data for Financial Services
Credit card companies, retail banks, private wealth management
advisories, insurance firms, venture funds, and institutional investment
banks all use big data for their financial services. Their common problem
is the massive amount of multi-structured data living in multiple
disparate systems, which big data techniques can help address. Big data
is used in several ways, such as:
Customer analytics
Compliance analytics
Fraud analytics
Operational analytics
Big Data in Communications
Gaining new subscribers, retaining customers, and
expanding within current subscriber bases are top
priorities for telecommunication service providers. The
solutions to these challenges lie in the ability to combine
and analyze the masses of customer-generated data and
machine-generated data that is being created every day.
Applications of Data Analytics
Healthcare
The main challenge for hospitals, as cost pressures tighten, is to treat as many patients
as they can efficiently while improving the quality of care. Instrument
and machine data are increasingly used to track and optimize patient flow,
treatment, and equipment use in hospitals. It is estimated that a 1%
efficiency gain could yield more than $63 billion in global healthcare savings.
Travel
Data analytics can optimize the buying experience through analysis of mobile, weblog, and
social media data. Travel sites can gain insights into customers’ desires and
preferences. Products can be up-sold by correlating current sales with subsequent
browsing, increasing browse-to-buy conversions via customized packages and offers.
Personalized travel recommendations can also be delivered by data analytics based on
social media data.
Gaming
Data Analytics helps in collecting data to optimize spending within as well as
across games. Game companies gain insight into the likes, dislikes, and
relationships of their users.
Energy Management
Most firms are using data analytics for energy management, including smart-
grid management, energy optimization, energy distribution, and building
automation in utility companies. The application here is centered on
controlling and monitoring network devices and dispatch crews, and on managing
service outages. Utilities gain the ability to integrate millions of data
points on network performance, which lets engineers use analytics to
monitor the network.
Data Scientists
Data Scientist
◦ The Sexiest Job of the 21st Century
“They find stories, extract knowledge. They are not reporters.”
Data Scientists
Data scientists are the key to realizing the opportunities presented by big data. They bring
structure to it, find compelling patterns in it, and advise executives on the implications for
products, processes, and decisions.
What do Data Scientists do?
National Security
Cyber Security
Business Analytics
Engineering
Healthcare
And more ….
Concentration in Data Science
Mathematics and Applied Mathematics
Applied Statistics/Data Analysis
Solid Programming Skills (R, Python, Julia, SQL)
Data Mining
Database Storage and Management
Machine Learning and discovery
Machine Learning
What is Machine Learning?
Machine learning (ML) is the study of computer algorithms
that improve automatically through experience.
It is seen as a subset of artificial intelligence.
Machine learning algorithms build a mathematical model
based on sample data, known as "training data", in order to
make predictions or decisions without being explicitly
programmed to do so.
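The idea of predicting from training data rather than from hand-written rules can be illustrated with k-nearest neighbors, one of the models named in the learning outcomes. The sketch below is a minimal from-scratch version using only the Python standard library; the two-feature training points and the labels "small" and "large" are invented for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(training_data, query, k=3):
    """Predict a label for query by majority vote of its k nearest examples.

    training_data is a list of (features, label) pairs, where features is a
    tuple of numbers with the same length as query.
    """
    # Sort the examples by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(training_data, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (feature vector, label).
training_data = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((2.0, 1.0), "small"),
    ((6.0, 6.0), "large"),
    ((7.0, 5.5), "large"),
    ((6.5, 7.0), "large"),
]

print(knn_predict(training_data, (1.2, 1.4)))  # small
print(knn_predict(training_data, (6.4, 6.1)))  # large
```

No rule such as "values above 4 are large" was written by hand. The prediction comes entirely from the labeled examples, which is what "without being explicitly programmed" refers to.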
What is Machine Learning?
Machine learning algorithms are used in a wide variety of
applications, such as email filtering and computer vision,
where it is difficult or infeasible to develop conventional
algorithms to perform the needed tasks.
Machine learning is closely related to computational
statistics, which focuses on making predictions using
computers.
Real-time applications
Video
NASSCOM Formative Assessments (Mid-training)
Formative assessment of students shall be conducted for 100 marks and the test duration shall be
between 45-60 min.
Post training assessment and certification shall be conducted after the successful completion of
training.
Only those students who are Registered and Attending training on Future Skills shall be eligible for
mid-training and post-training assessment.
All assessments shall be conducted online and Auto Proctored through NASSCOM SSC.
The assessment results shall be shared within 3 working days with the SPOC of the institute.
Formative Assessment scores are independent and shall not be counted in the final assessment
scores for certification.
Tentative Date – 16th August 2020
NASSCOM Formative Assessment
Syllabus for Data Sci. & Analytics
Module | No. of Questions | Type of Questions | Indicative Time/Module | Marks
Introduction to Data Science | 2 | MCQ & DC | 2 min | 6
Mathematical Foundations | 18 | MCQ, DC & ScB | 20 min | 44
Multiple Choice Questions (MCQ)
In this type of question, the candidate is asked to choose one or more
responses from a limited list of choices. It also includes True/False
(T/F) questions, depending on the level of difficulty.
Scenario Based (ScB)
This question asks the candidate to describe how they might respond
to a hypothetical situation.
Direct Concept (DC)
This type of question revolves around the concept that a particular subject
deals with. The candidate is asked a direct question pertaining
to that concept. It can be an MCQ, a Fill in the Blank, or a Multiple
Response question.
Next Lecture
Mathematical Foundations
Introduction & Syllabus
Linear Algebra – Vectors & Matrices
Inference rules in artificial intelligenceInference rules in artificial intelligence
Inference rules in artificial intelligence
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
testingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdftestingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdf
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsbaAdobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
prediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approachprediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approach
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
Adobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvwAdobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvw
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 

Introduction to Data Science and Analytics

  • 1. NASSCOM Future Skills Training Course – Data Science & Analytics. Dhruv Saxena, Assistant Professor (TEQIP-NPIU)
  • 8. Introduction to Data Science
  • 10. OBJECTIVES: The objective of this course is to impart the necessary knowledge of the mathematical foundations needed for data science and to develop the programming skills required to build data science applications. Duration – 60 Hours (40L + 20C)
  • 11. LEARNING OUTCOMES: At the end of this course, the students will be able to: ● Demonstrate understanding of the mathematical foundations needed for data science. ● Collect, explore, clean, munge and manipulate data. ● Implement models such as k-nearest neighbors, Naïve Bayes, linear and logistic regression, decision trees, neural networks and clustering. ● Build data science applications using Python-based toolkits.
  • 12. Data, Big Data and Challenges. Data Science ◦ Introduction ◦ Why Data Science. Data Scientists ◦ What do they do? Major/Concentration in Data Science ◦ What courses to take.
  • 13. Data All Around: Lots of data is being collected and warehoused ◦ Web data, e-commerce ◦ Financial transactions, bank/credit transactions ◦ Online trading and purchasing ◦ Social networks
  • 14. How Much Data Do We Have? Google processes 20 PB a day (2008). Facebook has 60 TB of daily logs. eBay has 6.5 PB of user data + 50 TB/day (5/2009). 1000 Genomes Project: 200 TB. Cost of 1 TB of disk: $35. Time to read 1 TB disk: 3 hrs (100 MB/s).
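The "3 hrs" figure on this slide can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a perfectly sequential read; the function names are illustrative and not part of the course material.

```python
import math

# Decimal storage units (an assumption; the slide does not say which convention it uses).
MB = 10**6
TB = 10**12
PB = 10**15

def read_time_hours(size_bytes, throughput_bytes_per_s):
    """Hours needed to scan size_bytes once at a constant throughput."""
    return size_bytes / throughput_bytes_per_s / 3600

def disks_needed(size_bytes, throughput_bytes_per_s, window_s):
    """Disks that must be read in parallel to scan size_bytes within window_s seconds."""
    per_disk = throughput_bytes_per_s * window_s
    return math.ceil(size_bytes / per_disk)

one_disk_hours = read_time_hours(1 * TB, 100 * MB)
print(f"1 TB at 100 MB/s: {one_disk_hours:.2f} hours")

# Same reasoning applied to the 20 PB/day figure quoted for Google.
parallel_disks = disks_needed(20 * PB, 100 * MB, 24 * 3600)
print(f"Disks read in parallel to scan 20 PB in one day: {parallel_disks}")
```

At 100 MB/s a single disk takes a little under three hours per terabyte, which is why petabyte-scale processing has to be spread across many machines.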
  • 15. Big Data: Big Data is any data that is expensive to manage and hard to extract value from. ◦ Volume: the size of the data ◦ Velocity: the latency of data processing relative to the growing demand for interactivity ◦ Variety and Complexity: the diversity of sources, formats, quality, structures.
  • 16. Big Data vs Data Science vs Data Analytics
  • 17. What is Data Science? Dealing with both unstructured and structured data, Data Science is a field that comprises everything related to data cleansing, preparation, and analysis. Data Science is the combination of statistics, mathematics, programming, problem-solving, capturing data in ingenious ways, the ability to look at things differently, and the activity of cleansing, preparing, and aligning the data. In simple terms, it is the umbrella of techniques used when trying to extract insights and information from data.
  • 18. What is Big Data? Big Data refers to humongous volumes of data that cannot be processed effectively with the traditional applications that exist. The processing of Big Data begins with raw data that isn’t aggregated and is most often impossible to store in the memory of a single computer. A buzzword used to describe immense volumes of data, both unstructured and structured, Big Data inundates a business on a day-to-day basis. Big Data can be analyzed for insights that lead to better decisions and strategic business moves. The definition of Big Data, given by Gartner, is, “Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”
  • 20. Big Data
  • 21. What is Data Analytics? Data Analytics is the science of examining raw data to draw conclusions about that information. Data Analytics involves applying an algorithmic or mechanical process to derive insights, for example, running through several data sets to look for meaningful correlations between them. It is used in several industries to allow organizations and companies to make better decisions as well as to verify or disprove existing theories or models. The focus of Data Analytics lies in inference, which is the process of deriving conclusions that are solely based on what the researcher already knows.
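As a rough illustration of "looking for meaningful correlations" between data sets, the sketch below computes the Pearson correlation coefficient in plain Python. The two series (`ad_spend`, `sales`) are made-up numbers used only for demonstration, and Pearson correlation is one common choice, not necessarily the measure the course has in mind.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)

# Hypothetical data: monthly advertising spend and sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 55]
print(f"correlation = {pearson(ad_spend, sales):.3f}")
```

A value near +1 or -1 indicates a strong linear relationship; it does not by itself show that one quantity causes the other.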
  • 22. Types of Data We Have: Relational Data (Tables/Transaction/Legacy Data); Text Data (Web); Semi-structured Data (XML); Graph Data: Social Network, Semantic Web (RDF), …; Streaming Data: you can afford to scan the data only once.
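The remark that streaming data can be scanned only once means statistics have to be updated incrementally rather than recomputed from stored values. A minimal sketch, assuming Welford's online algorithm as the technique (the slide does not name one) and a hypothetical `RunningStats` class:

```python
class RunningStats:
    """Single-pass mean and sample variance (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def sensor_stream():
    """Stand-in for an unbounded data source; values are illustrative."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)  # each value is seen exactly once and never stored

print(stats.n, stats.mean, stats.variance)
```

Memory use stays constant regardless of how many values arrive, which is the property a one-pass constraint calls for.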
  • 23. What To Do With These Data? Aggregation and Statistics ◦ Data warehousing and OLAP. Indexing, Searching, and Querying ◦ Keyword-based search ◦ Pattern matching (XML/RDF). Knowledge Discovery ◦ Data Mining ◦ Statistical Modeling.
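Keyword-based search is commonly built on an inverted index, which maps each word to the documents containing it. The sketch below is a toy version under simple assumptions (whitespace tokenization, AND semantics for multi-word queries); the documents and function names are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from a copy so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical mini-corpus.
docs = {
    1: "data science extracts knowledge from data",
    2: "big data is hard to manage",
    3: "statistics and data science",
}
index = build_index(docs)
print(search(index, "data science"))
print(search(index, "big data"))
```

Building the index costs one pass over the corpus; after that, each query only touches the sets for its own words instead of rescanning every document.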
  • 24. Big Data and Data Science: “… the sexy job in the next 10 years will be statisticians,” Hal Varian, Google Chief Economist. The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts by 2018 (McKinsey Global Institute, June 2011). India will need 160,000+ Data Scientists by 2020, and world demand is predicted to be around 2.7 million by 2020. New Data Science institutes are being created or repurposed: NYU, Columbia, Washington, UCB, ... New degree programs, courses, boot-camps: ◦ e.g., at Berkeley: Stats, I-School, CS, Astronomy… ◦ One proposal (elsewhere) for an MS in “Big Data Science”
  • 25. What is Data Science? An area that manages, manipulates, extracts, and interprets knowledge from tremendous amounts of data. Data science (DS) is a multidisciplinary field of study with the goal of addressing the challenges in big data. Data science principles apply to all data, big and small. Simply put: extraction of knowledge from large volumes of data that are structured or unstructured.
  • 26. What is Data Science? Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education. ◦ Computer Science: pattern recognition, visualization, data warehousing, high-performance computing, databases, AI ◦ Mathematics: mathematical modeling ◦ Statistics: statistical and stochastic modeling, probability.
  • 27. Why is it sexy? Gartner’s 2014 Hype Cycle
  • 28. Data Science
  • 30. Real Life Examples: Companies learn your secrets, shopping patterns, and preferences ◦ For example, can we know if a woman is pregnant, even if she doesn’t want us to know? (Target case study). Data Science and elections (2008, 2012) ◦ 1 million people installed the Obama Facebook app that gave access to info on “friends”.
  • 31. Applications of Data Science. Internet Search: Search engines make use of data science algorithms to deliver the best results for search queries in a fraction of a second. Digital Advertisements: The entire digital marketing spectrum uses data science algorithms, from display banners to digital billboards. This is the main reason digital ads get a higher CTR than traditional advertisements. Recommender Systems: Recommender systems not only make it easy to find relevant products among the billions available but also add a lot to the user experience. Many companies use such systems to promote their products and suggestions in accordance with the user’s demands and the relevance of information. The recommendations are based on the user’s previous search results.
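To make the recommender idea concrete, here is a deliberately small sketch that suggests items based on what a user has already looked at. It assumes a simple co-occurrence heuristic (other users' histories vote for items in proportion to their overlap with yours); production recommenders are far more elaborate, and all names and data below are hypothetical.

```python
from collections import Counter

def recommend(history, all_histories, top_n=2):
    """Suggest unseen items from histories that overlap with the user's own."""
    seen = set(history)
    scores = Counter()
    for other in all_histories:
        other_items = set(other)
        overlap = len(seen & other_items)
        if overlap == 0:
            continue  # this history shares nothing with the user
        for item in other_items - seen:
            scores[item] += overlap  # weight each vote by the size of the overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]

# Hypothetical browsing histories.
histories = [
    ["laptop", "mouse", "keyboard"],
    ["laptop", "mouse", "monitor"],
    ["phone", "case"],
    ["laptop", "monitor"],
]
print(recommend(["laptop", "mouse"], histories))
```

Items the user has already seen are never suggested, and histories with no overlap contribute nothing to the ranking.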
  • 32. Big Data for Retail: Whether brick-and-mortar or an online e-tailer, the answer to staying in the game and being competitive is understanding the customer better in order to serve them. This requires the ability to analyze all the disparate data sources that companies deal with every day, including weblogs, customer transaction data, social media, store-branded credit card data, and loyalty program data.
  • 33. Applications of Big Data. Big Data for Financial Services: Credit card companies, retail banks, private wealth management advisories, insurance firms, venture funds, and institutional investment banks use big data for their financial services. The common problem among them all is the massive amounts of multi-structured data living in multiple disparate systems, which can be solved by big data. Thus big data is used in several ways, such as: customer analytics, compliance analytics, fraud analytics, and operational analytics.
  • 34. Big Data in Communications: Gaining new subscribers, retaining customers, and expanding within current subscriber bases are top priorities for telecommunication service providers. The solutions to these challenges lie in the ability to combine and analyze the masses of customer-generated data and machine-generated data that are being created every day.
  • 35. Applications of Data Analytics. Healthcare: The main challenge for hospitals, as cost pressures tighten, is to treat as many patients as they can efficiently while improving the quality of care. Instrument and machine data are increasingly being used to track and optimize patient flow, treatment, and equipment use in hospitals. It is estimated that a 1% efficiency gain could yield more than $63 billion in global healthcare savings. Travel: Data analytics can optimize the buying experience through mobile/weblog and social media data analysis. Travel sites can gain insights into the customer’s desires and preferences. Products can be up-sold by correlating current sales with subsequent browsing, increasing browse-to-buy conversions via customized packages and offers. Personalized travel recommendations can also be delivered by data analytics based on social media data.
  • 36. Gaming: Data analytics helps in collecting data to optimize spend within as well as across games. Game companies gain insight into the likes, dislikes, and relationships of the users. Energy Management: Most firms are using data analytics for energy management, including smart-grid management, energy optimization, energy distribution, and building automation in utility companies. The application here is centered on controlling and monitoring network devices, dispatching crews, and managing service outages. Utilities gain the ability to integrate millions of data points on network performance, which lets engineers use analytics to monitor the network.
  • 37. Data Scientists: Data Scientist ◦ The Sexiest Job of the 21st Century. “They find stories, extract knowledge. They are not reporters.”
  • 38. Data Scientists: Data scientists are the key to realizing the opportunities presented by big data. They bring structure to it, find compelling patterns in it, and advise executives on the implications for products, processes, and decisions.
  • 39. What do Data Scientists do? National Security, Cyber Security, Business Analytics, Engineering, Healthcare, and more …
  • 40. Concentration in Data Science: Mathematics and Applied Mathematics; Applied Statistics/Data Analysis; Solid Programming Skills (R, Python, Julia, SQL); Data Mining; Database Storage and Management; Machine Learning and Discovery.
  • 41. Machine Learning
  • 42. What is Machine Learning? Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so.
  • 43. What is Machine Learning? Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks. Machine learning is closely related to computational statistics, which focuses on making predictions using computers.
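The idea of predicting from training data without hand-written rules can be sketched with k-nearest neighbors, one of the models listed in the learning outcomes. This is a toy illustration only: the two numeric features per email and the labels are invented, and k-NN keeps the training examples themselves rather than fitting an explicit mathematical model.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query and keep the closest k.
    neighbors = sorted(train, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (feature vector, label).
# Features might stand for, say, link count and exclamation-mark count per email.
train = [
    ((1.0, 1.0), "ham"),
    ((1.2, 0.8), "ham"),
    ((0.9, 1.1), "ham"),
    ((4.0, 4.2), "spam"),
    ((4.1, 3.9), "spam"),
    ((3.8, 4.0), "spam"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (4.0, 4.0)))
```

No rule such as "more than three links means spam" is written anywhere; the prediction comes entirely from the labeled examples, which is the sense in which the program is not explicitly programmed for the task.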
  • 50. NASSCOM Formative Assessments (Mid-training): Formative assessment of students shall be conducted for 100 marks, and the test duration shall be 45-60 min. Post-training assessment and certification shall be conducted after the successful completion of training. Only those students who are registered for and attending training on Future Skills shall be eligible for mid-training and post-training assessment. All assessments shall be conducted online and auto-proctored through NASSCOM SSC. The assessment results shall be shared within 3 working days with the SPOC of the institute. Formative Assessment scores are independent and shall not be counted in the final assessment scores for certification. Tentative Date – 16th August 2020
  • 51. NASSCOM Formative Assessment Syllabus for Data Sci. & Analytics
      Module | No. of Questions | Type of Questions | Indicative Time/Module | Marks
      Introduction to Data Science | 2 | MCQ & DC | 2 min | 6
      Mathematical Foundations | 18 | MCQ, DC & ScB | 20 min | 44
  • 52. Multiple Choice Questions (MCQ): In this type of question, the candidate is asked to choose one or more responses from a limited list of choices. It also includes True/False (T/F) questions, depending on the level of difficulty. Scenario Based (ScB): This question asks the candidate to describe how they might respond to a hypothetical situation. Direct Concept (DC): This type of question revolves around the concept that a particular subject deals with. The candidate is asked a direct question pertaining to the concept of that particular subject. This can be an MCQ, Fill in the Blank, or Multiple Response question.
  • 53. Next Lecture: Mathematical Foundations. Introduction & Syllabus; Linear Algebra – Vectors & Matrices