Mr.M.NOHAN M.TECH,(Ph.D)
Assistant Professor / CSE
Madanapalle Institute of Technology and Science
Python for Data Science
Laboratory – 20CSE604
Introduction
Data Vs Information
 Data is a collection of facts, while information
puts those facts into context.
 While data is raw and unorganized, information is
organized.
 Data points are individual and sometimes
unrelated. Information maps out that data to
provide a big-picture view of how it all fits
together.
Database Vs Data warehouse
 A database is any collection of data organized for
storage, accessibility, and retrieval.
Example: Employee Database
 A data warehouse is a type of database that
integrates copies of transaction data from
disparate source systems and provisions them for
analytical use.
Example: a TCS data warehouse that integrates
data from the databases at its multiple locations
Big Data
 Big data is voluminous data, information,
or related statistics acquired by large
organizations and ventures.
 Because big data is difficult to process
manually, dedicated software and data storage
systems are built for it. It is used to discover patterns and
trends and to make decisions related to human
behavior and interaction technology.
Data Science
 Data Science is a field that involves working
with huge amounts of data and using it to build
predictive and prescriptive
analytical models.
 It is about digging into and capturing data (building the model),
analyzing it (validating the model), and utilizing the
data (deploying the best model). It is an
intersection of data and computing.
 It is a blend of the field of Computer Science,
Data Science Vs Big Data
Course Objectives
 To train students to solve computational
problems.
 To elucidate solving mathematical problems
using the Python programming language.
 To understand the fundamentals of Python
programming concepts and their applications.
 To gain a practical understanding of building different
types of models and evaluating them.
Syllabus
UNIT I - INTRODUCTION TO DATA
SCIENCE
 Introduction to Data Science and its
importance - Data Science and Big Data - The
life cycle of Data Science - The Art of Data
Science - Work with data: data cleaning,
data managing, data manipulation.
Establishing computational environments for
data scientists using Python with IPython and
Jupyter.
UNIT I - INTRODUCTION TO DATA
SCIENCE
 Launch the IPython shell and the Jupyter notebook.
 Write a Python script to control the behaviour of
IPython using magic commands.
Create a file called hello.py.
 Replace the missing values with the expected, or
mean, income of the custdata dataset.
 Import data in Python.
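
The missing-value exercise above can be sketched with pandas. This is a minimal illustration, not the lab's reference solution: the slides do not show the custdata file, so the small DataFrame and its column names (custid, income) are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the custdata dataset; the real file and its
# column names are not given in the slides.
custdata = pd.DataFrame({
    "custid": [1, 2, 3, 4, 5],
    "income": [42000.0, np.nan, 58000.0, np.nan, 50000.0],
})

# Mean of the observed (non-missing) incomes; pandas skips NaN by default.
mean_income = custdata["income"].mean()

# Replace each missing income with that mean.
custdata["income"] = custdata["income"].fillna(mean_income)

print(mean_income)
print(custdata)
```

Filling with the mean leaves the column average unchanged but shrinks its variance, so it is usually treated as a simple baseline rather than the only option.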
UNIT II INTRODUCTION TO
NUMPY
 NumPy Basics: Arrays and Vectorized
Computation - The NumPy ndarray - Creating
ndarrays - Data Types for ndarrays - Arithmetic
with NumPy Arrays - Basic Indexing and
Slicing - Boolean Indexing - Transposing Arrays
and Swapping Axes. Universal Functions:
Fast Element-Wise Array Functions -
Mathematical and Statistical Methods -
Sorting - Unique and Other Set Logic.
UNIT II INTRODUCTION TO
NUMPY
 Create NumPy arrays from Python Data Structures,
Intrinsic NumPy objects and Random Functions.
 Manipulation of NumPy arrays- Indexing, Slicing,
Reshaping, Joining and Splitting.
 Computation on NumPy arrays using Universal
Functions and Mathematical methods.
 Import a CSV file and perform various Statistical and
Comparison operations on rows/columns.
 Load an image file and do crop and flip operation using
NumPy Indexing.
 Write a program to compute summary statistics such as
mean, median, mode, standard deviation and variance
of the given different types of data.
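
Several of the NumPy exercises above can be sketched in a few lines. The example below is only an illustration under stated assumptions: a synthetic 4x6 integer array stands in for a loaded image file, the sample values are made up, and the mode is computed with np.unique because NumPy has no dedicated mode function.

```python
import numpy as np

# Arrays from a Python list, an intrinsic constructor, and a seeded random generator.
from_list = np.array([[1, 2, 3], [4, 5, 6]])
zeros = np.zeros((2, 3))
rng = np.random.default_rng(0)
noise = rng.random((2, 3))

# Reshaping, joining, and splitting.
flat = from_list.reshape(-1)                               # shape (6,)
stacked = np.concatenate([from_list, from_list], axis=0)   # shape (4, 3)
left, right = np.split(np.arange(6), 2)                    # two halves of 0..5

# Universal function: element-wise square root.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# A synthetic 4x6 grayscale "image" stands in for a loaded file (assumption).
image = np.arange(24).reshape(4, 6)
crop = image[1:3, 2:5]        # rows 1-2, columns 2-4
flipped_lr = image[:, ::-1]   # mirror left to right
flipped_ud = image[::-1, :]   # mirror top to bottom

# Summary statistics; std and var are the population versions (ddof=0).
data = np.array([2, 4, 4, 5, 7, 8])
values, counts = np.unique(data, return_counts=True)
mode = values[np.argmax(counts)]
print(data.mean(), np.median(data), mode, data.std(), data.var())
```

Cropping and flipping here are plain slices, so they return views of the original array rather than copies.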
UNIT III DATA MANIPULATION
WITH PYTHON
 Introduction to pandas Data Structures:
Series, DataFrame. Essential Functionality:
Dropping Entries - Indexing, Selection, and
Filtering - Function Application and Mapping -
Sorting and Ranking. Summarizing and
Computing Descriptive Statistics - Unique
Values, Value Counts, and Membership.
Reading and Writing Data in Text Format.
UNIT III DATA MANIPULATION
WITH PYTHON
a. Create Pandas Series and DataFrame from
various inputs.
b. Import any CSV file to Pandas DataFrame and
perform the following:
 Visualize the first and last 10 records
 Get the shape, index and column details.
 Select/Delete the records(rows)/columns based
on conditions.
 Perform ranking and sorting operations.
 Do required statistical operations on the given
columns.
 Find the count and uniqueness of the given
categorical values.
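
The DataFrame tasks listed above might look like the sketch below. The CSV content, column names, and salary figures are hypothetical; an inline string is read through io.StringIO so the example runs without an external file.

```python
import io
import pandas as pd

# Inline CSV so the sketch runs without an external file (hypothetical data).
csv_text = """name,dept,salary
Asha,CSE,52000
Ravi,ECE,48000
Meena,CSE,61000
Kiran,MECH,45000
Divya,ECE,48000
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(10))   # first 10 records (only 5 exist here)
print(df.tail(10))   # last 10 records
print(df.shape, df.index, df.columns.tolist())

# Select rows by a condition, and delete a column.
high_paid = df[df["salary"] > 50000]
without_dept = df.drop(columns=["dept"])

# Sort and rank by salary (highest first; ties share the lowest rank number).
by_salary = df.sort_values("salary", ascending=False)
df["salary_rank"] = df["salary"].rank(ascending=False, method="min")

# Descriptive statistics, then counts and uniqueness of a categorical column.
print(df["salary"].describe())
dept_counts = df["dept"].value_counts()
n_depts = df["dept"].nunique()
print(dept_counts, n_depts)
```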
UNIT IV DATA CLEANING,
PREPARATION AND
VISUALIZATION
 Data Cleaning and Preparation: Handling
Missing Data - Data Transformation:
Removing Duplicates, Transforming Data
Using a Function or Mapping, Replacing
Values, Detecting and Filtering Outliers- String
Manipulation: Vectorized String Functions in
pandas. Plotting with pandas: Line Plots, Bar
Plots, Histograms and Density Plots, Scatter
or Point Plots.
UNIT IV DATA CLEANING,
PREPARATION AND
VISUALIZATION
a. Import any CSV file to Pandas DataFrame
and perform the following:
 Handle missing data by detecting and
dropping/ filling missing values.
 Transform data using the apply() and map()
methods.
 Detect and filter outliers.
 Perform Vectorized String operations on
Pandas Series.
 Visualize data using Line Plots, Bar Plots,
Histograms, Density Plots and Scatter Plots.
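
A rough sketch of the cleaning steps above follows. The data is invented, and the 1.5 x IQR rule used for outliers is one common convention chosen here for illustration; the slides do not say which outlier test the lab expects. Plotting is left out to keep the example free of a plotting backend.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with a missing city, a missing temperature, and one outlier.
df = pd.DataFrame({
    "city": [" delhi", "Mumbai ", "chennai", None, "Pune"],
    "temp_c": [31.0, np.nan, 29.0, 30.0, 95.0],
})

# Detect, drop, or fill missing values.
missing_per_column = df.isna().sum()
dropped = df.dropna()
filled = df.fillna({"city": "unknown", "temp_c": df["temp_c"].median()})

# Transform a column with apply() and map().
filled["temp_f"] = filled["temp_c"].apply(lambda c: c * 9 / 5 + 32)
filled["is_hot"] = filled["temp_c"].map(lambda c: c > 30)

# Filter outliers with the 1.5 * IQR rule (an assumed convention).
q1, q3 = filled["temp_c"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = filled["temp_c"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
no_outliers = filled[in_range]

# Vectorized string operations on a Series.
filled["city"] = filled["city"].str.strip().str.title()
print(filled)
```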
UNIT V MACHINE LEARNING
USING PYTHON
 Introduction to Machine Learning: Categories of
Machine Learning algorithms, Dimensionality
reduction - Introducing Scikit-Learn - Application:
Exploring Hand-written Digits. Feature
Engineering - Naive Bayes Classification -
Linear Regression - k-Means Clustering.
UNIT V MACHINE LEARNING
USING PYTHON
 Write a program to demonstrate Linear
Regression analysis with residual plots on a given
data set.
 Write a program to implement the Naïve Bayesian
classifier for a sample training data set stored as
a .CSV file. Compute the accuracy of the
classifier, considering a few test data sets.
 Write a program to implement k-Nearest
Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions using
Python ML library classes.
 Write a program to implement the k-Means clustering
algorithm to cluster the set of data stored in a .CSV
file. Compare the results of various “k” values for
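
As an illustration of the first exercise in this list, the sketch below fits a straight line by ordinary least squares and computes the residuals that a residual plot would show. It uses NumPy's polyfit rather than a machine learning library, the data points are invented, and the plotting step is omitted; all of these are assumptions made to keep the example small.

```python
import numpy as np

# Hypothetical data: y is roughly 2x + 1 with small deviations.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Fit y = slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Residuals are the observed values minus the fitted values.
predicted = slope * x + intercept
residuals = y - predicted

# Coefficient of determination (R squared) as a simple goodness-of-fit measure.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(slope, intercept, r_squared)
print(residuals)
```

Plotting residuals against x and checking that they scatter around zero without a visible pattern is the usual way to read such a plot.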
Text Book(s)
 Wes McKinney, “Python for Data Analysis:
Data Wrangling with Pandas, NumPy, and
IPython”, 2nd Edition, O’Reilly, 2018.
 Jake VanderPlas, “Python Data Science
Handbook: Essential Tools for Working with
Data”, O’Reilly, 2017.
Reference Books
 Y. Daniel Liang, “Introduction to Programming
using Python”, Pearson, 2012.
 Francois Chollet, “Deep Learning with Python”, 1/e,
Manning Publications Company, 2017.
 Peter Wentworth, Jeffrey Elkner, Allen B. Downey
and Chris Meyers, “How to Think Like a
Computer Scientist: Learning with Python 3”, 3rd
edition. Available at
https://www.ict.ru.ac.za/Resources/cspw/thinkcspy3/thinkcspy3.pdf
 Paul Barry, “Head First Python: A Brain-Friendly
Guide”, 2nd Edition, O’Reilly, 2016.
 Daniel Y. Chen, “Pandas for Everyone: Python Data
Analysis”, Pearson Education, 2019.
Introduction for Data Science
Outline
 Data, Big Data and Challenges
 Data Science
 Introduction
 Why Data Science
 Data Scientists
 What do they do?
 Major/Concentration in Data Science
 What courses to take.
Data All Around
 Lots of data is being collected
and warehoused
 Web data, e-commerce
 Financial transactions, bank/credit transactions
 Online trading and purchasing
 Social Network
How Much Data Do We Have?
 Google processes 20 PB a day (2008)
 Facebook has 60 TB of daily logs
 eBay has 6.5 PB of user data + 50 TB/day (5/2009)
 1000 Genomes Project: 200 TB
 Cost of 1 TB of disk: $35
 Time to read 1 TB disk: about 3 hrs (at 100 MB/s)
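
The read-time figure follows from a single division. A quick check, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a sustained sequential read at the stated rate:

```python
# Assumed decimal units; binary units (TiB, MiB) would change the result slightly.
TB = 10**12
MB = 10**6

throughput_bytes_per_s = 100 * MB
seconds = TB / throughput_bytes_per_s   # 10,000 seconds
hours = seconds / 3600                  # a little under 2.8 hours

print(seconds, round(hours, 2))
```

The result is a little under 2.8 hours, which the slide rounds to 3.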
Big Data
Big Data is any data that is expensive to manage and
hard to extract value from
 Volume
 The size of the data
 Velocity
 The latency of data processing relative to the growing demand for
interactivity
 Variety and Complexity
 the diversity of sources, formats, quality, structures.
Types of Data We Have
 Relational Data (Tables/Transaction/Legacy Data)
 Text Data (Web)
 Semi-structured Data (XML)
 Graph Data
 Social Network, Semantic Web (RDF), …
 Streaming Data
 You can afford to scan the data once
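
The single-scan constraint on streaming data can be illustrated with a one-pass computation. The sketch below uses Welford's online algorithm, chosen here as an example and not taken from the slides, to get the mean and variance of a stream without storing it; the function name and the sample values are hypothetical.

```python
def running_mean_and_variance(stream):
    """Consume an iterable once; return (count, mean, population variance)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


# A generator can only be iterated once, like a real data stream.
readings = (v for v in [2, 4, 4, 5, 7, 8])
count, mean, variance = running_mean_and_variance(readings)
print(count, mean, variance)
```

Only three numbers are kept in memory regardless of how long the stream is.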
What To Do With These Data?
 Aggregation and Statistics
 Data warehousing and OLAP
 Indexing, Searching, and Querying
 Keyword based search
 Pattern matching (XML/RDF)
 Knowledge discovery
 Data Mining
 Statistical Modeling
Big Data and Data Science
 “… the sexy job in the next 10 years will be
statisticians,” Hal Varian, Google Chief Economist
 The U.S. will need 140,000-190,000 predictive
analysts and 1.5 million managers/analysts by
2018 (McKinsey Global Institute, June 2011)
 New Data Science institutes being created or
repurposed – NYU, Columbia, Washington,
UCB,...
 New degree programs, courses, boot-camps:
 e.g., at Berkeley: Stats, I-School, CS, Astronomy…
 One proposal (elsewhere) for an MS in “Big Data
Science”
What is Data Science?
 An area that manages, manipulates, extracts, and
interprets knowledge from tremendous amounts of
data
 Data science (DS) is a multidisciplinary field of
study whose goal is to address the challenges of big
data
 Data science principles apply to all data – big and
small
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
What is Data Science?
 Theories and techniques from many fields and
disciplines are used to investigate and
analyze a large amount of data to help
decision makers in many industries such as
science, engineering, economics, politics,
finance, and education
 Computer Science
 Pattern recognition, visualization, data warehousing, High
performance computing, Databases, AI
 Mathematics
 Mathematical Modeling
 Statistics
 Statistical and Stochastic modeling, Probability.
Why is it sexy?
 Gartner’s 2014 Hype Cycle
Data Science
Real Life Examples
 Companies learn your secrets, shopping patterns,
and preferences
 For example, can we know if a woman is pregnant,
even if she doesn’t want us to know? Target case
study
 Data Science and election (2008, 2012)
 1 million people installed the Obama Facebook app
that gave access to info on “friends”
Data Scientists
 Data Scientist
 The Sexiest Job of the 21st Century
 They find stories, extract knowledge. They are not
reporters
Data Scientists
 Data scientists are the key to realizing the
opportunities presented by big data. They
bring structure to it, find compelling patterns in it,
and advise executives on the implications for
products, processes, and decisions
What Do Data Scientists Do?
 National Security
 Cyber Security
 Business Analytics
 Engineering
 Healthcare
 And more ….
Concentration in Data Science
 Mathematics and Applied Mathematics
 Applied Statistics/Data Analysis
 Solid Programming Skills (R, Python, Julia, SQL)
 Data Mining
 Database Storage and Management
 Machine Learning and discovery
PDS Unit - 1 Introdiction to DS.ppt
PDS Unit - 1 Introdiction to DS.ppt

  • 1. Mr.M.NOHAN M.TECH,(Ph.D) Assistant Professor / CSE Madanapalle Institute of Technology and Science Python for Data Science Laboratory – 20CSE604
  • 3. Data Vs Information  Data is a collection of facts, while information puts those facts into context.  While data is raw and unorganized, information is organized.  Data points are individual and sometimes unrelated. Information maps out that data to provide a big-picture view of how it all fits together.
  • 4. Database Vs Data warehouse  A database is any collection of data organized for storage, accessibility, and retrieval. Example: Employee Database  A data warehouse is a type of database the integrates copies of transaction data from disparate source systems and provisions them for analytical use. Example: TCS data warehouse that integrates from their multiple location databases
  • 5. Big Data  It is huge, large, or voluminous data, information, or the relevant statistics acquired by large organizations and ventures.  Many software and data storages is created and prepared as it is difficult to compute the big data manually. It is used to discover patterns and trends and make decisions related to human behavior and interaction technology.
  • 6. Data Science  Data Science: Data Science is a field or domain which includes and involves working with a huge amount of data and using it for building predictive, prescriptive, and prescriptive analytical models.  It’s about digging, capturing, (building the model) analyzing (validating the model), and utilizing the data (deploying the best model). It is an intersection of Data and computing.  It is a blend of the field of Computer Science,
  • 7. Data Science Vs Big Data
  • 8. Python for Data Science Laboratory – 20CSE604
  • 9. Course Objectives  To train the students in solving computational problems.  To elucidate solving mathematical problems using Python programming language.  To understand the fundamentals of Python programming concepts and its applications.  Practical understanding of building different types of models and their evaluation.
  • 11. UNIT I - INTRODUCTION TO DATA SCIENCE  Introduction to Data Science and its importance - Data Science and Big data-, The life cycle of Data Science- The Art of Data Science - Work with data – data Cleaning, data Managing, data manipulation. Establishing computational environments for data scientists using Python with IPython and Jupyter.
  • 12. UNIT I - INTRODUCTION TO DATA SCIENCE  Launch the IPython shell and the Jupyter notebook.  Write a python script to control the behaviour of IPython using magic commands. Create a file called hello.py  Replace the missing values with the expected, or mean income of custdata dataset.  Import data in python.
  • 13. UNIT II INTRODUCTION TO NUMPY  NumPy Basics: Arrays and Vectorized Computation- The NumPy ndarray- Creating ndarrays- Data Types for ndarrays- Arithmetic with NumPy Arrays- Basic Indexing and Slicing - Boolean Indexing-Transposing Arrays and Swapping Axes. Universal Functions: Fast Element-Wise Array Functions- Mathematical and Statistical Methods- SortingUnique and Other Set Logic.
  • 14. UNIT II INTRODUCTION TO NUMPY  Create NumPy arrays from Python Data Structures, Intrinsic NumPy objects and Random Functions.  Manipulation of NumPy arrays- Indexing, Slicing, Reshaping, Joining and Splitting.  Computation on NumPy arrays using Universal Functions and Mathematical methods.  Import a CSV file and perform various Statistical and Comparison operations on rows/columns.  Load an image file and do crop and flip operation using NumPy Indexing.  Write a program to compute summary statistics such as mean, median, mode, standard deviation and variance of the given different types of data.
  • 15. UNIT III DATA MANIPULATION WITH PYTHON  Introduction to pandas Data Structures: Series, DataFrame, Essential Functionality: Dropping Entries Indexing, Selection, and Filtering- Function Application and Mapping- Sorting and Ranking. Summarizing and Computing Descriptive Statistics- Unique Values, Value Counts, and Membership. Reading and Writing Data in Text Format.
  • 16. UNIT III DATA MANIPULATION WITH PYTHON a. Create Pandas Series and DataFrame from various inputs. b. Import any CSV file to Pandas DataFrame and perform the following:  Visualize the first and last 10 records  Get the shape, index and column details.  Select/Delete the records(rows)/columns based on conditions.  Perform ranking and sorting operations.  Do required statistical operations on the given columns.  Find the count and uniqueness of the given categorical values.
  • 17. UNIT IV DATA CLEANING, PREPARATION AND VISUALIZATION  Data Cleaning and Preparation: Handling Missing Data - Data Transformation: Removing Duplicates, Transforming Data Using a Function or Mapping, Replacing Values, Detecting and Filtering Outliers- String Manipulation: Vectorized String Functions in pandas. Plotting with pandas: Line Plots, Bar Plots, Histograms and Density Plots, Scatter or Point Plots.
  • 18. UNIT IV DATA CLEANING, PREPARATION AND VISUALIZATION a. Import any CSV file to Pandas DataFrame and perform the following:  Handle missing data by detecting and dropping/ filling missing values.  Transform data using apply() and map() method.  Detect and filter outliers.  Perform Vectorized String operations on Pandas Series.  Visualize data using Line Plots, Bar Plots, Histograms, Density Plots and Scatter Plots.
  • 19. UNIT V MACHINE LEARNING USING PYTHON  Introduction to Machine Learning: Categories of Machine Learning algorithms, Dimensionality reduction - Introducing Scikit-Learn - Application: Exploring Hand-written Digits. Feature Engineering - Naive Bayes Classification - Linear Regression - k-Means Clustering.
  • 20. UNIT V MACHINE LEARNING USING PYTHON  Write a program to demonstrate Linear Regression analysis with residual plots on a given data set.  Write a program to implement the Naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.  Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions using Python ML library classes.  Write a program to implement the k-Means clustering algorithm to cluster the set of data stored in a .CSV file. Compare the results of various “k” values for
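As a rough sketch of how the Unit V programs could be approached with scikit-learn, the example below uses the library's bundled iris data in place of a CSV file and synthetic points for the regression. The split ratio, `n_neighbors=5`, the random seeds and the candidate `k` values are all assumptions made for illustration, and the residual plot itself is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Linear regression on synthetic data, with residuals ready to be plotted.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50).reshape(-1, 1)
y_line = 3.0 * x.ravel() + 2.0 + rng.normal(0.0, 1.0, size=50)
reg = LinearRegression().fit(x, y_line)
residuals = y_line - reg.predict(x)

# Train/test split of the iris data for the two classifiers.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# k-Nearest Neighbour: collect correct and wrong predictions separately.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
knn_pred = knn.predict(X_test)
correct = [(int(t), int(p)) for t, p in zip(y_test, knn_pred) if t == p]
wrong = [(int(t), int(p)) for t, p in zip(y_test, knn_pred) if t != p]
print("correct:", len(correct), "wrong:", wrong)

# Gaussian Naive Bayes accuracy on the same test set.
nb = GaussianNB().fit(X_train, y_train)
nb_accuracy = accuracy_score(y_test, nb.predict(X_test))
print("Naive Bayes accuracy:", round(nb_accuracy, 3))

# k-Means for several k values; inertia is the within-cluster sum of squares.
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in (2, 3, 4)
}
print("inertia by k:", inertias)
```

Inertia always falls as `k` grows, so comparing values of `k` usually means looking for the point where the improvement levels off rather than picking the minimum.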
  • 21. Text Book(s)  Wes McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, O’Reilly, 2nd Edition, 2018.  Jake VanderPlas, “Python Data Science Handbook: Essential Tools for Working with Data”, O’Reilly, 2017.
  • 22. Reference Books  Y. Daniel Liang, “Introduction to Programming using Python”, Pearson, 2012.  Francois Chollet, “Deep Learning with Python”, 1st Edition, Manning Publications Company, 2017.  Peter Wentworth, Jeffrey Elkner, Allen B. Downey and Chris Meyers, “How to Think Like a Computer Scientist: Learning with Python 3”, 3rd Edition, available at https://www.ict.ru.ac.za/Resources/cspw/thinkcspy3/thinkcspy3.pdf  Paul Barry, “Head First Python: A Brain-Friendly Guide”, 2nd Edition, O’Reilly, 2016.  Daniel Y. Chen, “Pandas for Everyone: Python Data Analysis”, Pearson Education, 2019.
  • 24. Outline  Data, Big Data and Challenges  Data Science  Introduction  Why Data Science  Data Scientists  What do they do?  Major/Concentration in Data Science  What courses to take.
  • 25. Data All Around  Lots of data is being collected and warehoused  Web data, e-commerce  Financial transactions, bank/credit transactions  Online trading and purchasing  Social Network
  • 26. How Much Data Do We have?  Google processes 20 PB a day (2008)  Facebook has 60 TB of daily logs  eBay has 6.5 PB of user data + 50 TB/day (5/2009)  1000 genomes project: 200 TB  Cost of 1 TB of disk: $35  Time to read 1 TB disk: 3 hrs (100 MB/s)
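The read-time figure on this slide can be checked with a little arithmetic. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a single disk read sequentially at the quoted 100 MB/s; the final line extends the same assumption to the 20 PB/day figure as a back-of-the-envelope estimate.

```python
import math

TB = 10**12          # bytes, decimal convention (assumed)
PB = 10**15
MB = 10**6

throughput = 100 * MB                      # bytes per second for one disk

# Time to read one 1 TB disk end to end.
seconds_per_tb = (1 * TB) / throughput
hours_per_tb = seconds_per_tb / 3600
print(f"1 TB at 100 MB/s: {hours_per_tb:.2f} hours")

# Disks that would have to be read in parallel to scan 20 PB within one day.
seconds_per_day = 24 * 3600
disks_needed = math.ceil((20 * PB) / throughput / seconds_per_day)
print(f"Disks needed for 20 PB/day: {disks_needed}")
```

The result of roughly 2.8 hours matches the slide's rounded "3 hrs", and the disk count shows why data at this scale has to be processed in parallel.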
  • 27. Big Data  Big Data is any data that is expensive to manage and hard to extract value from  Volume  The size of the data  Velocity  The latency of data processing relative to the growing demand for interactivity  Variety and Complexity  The diversity of sources, formats, quality, structures.
  • 29. Types of Data We Have  Relational Data (Tables/Transaction/Legacy Data)  Text Data (Web)  Semi-structured Data (XML)  Graph Data  Social Network, Semantic Web (RDF), …  Streaming Data  You can afford to scan the data once
  • 30. What To Do With These Data?  Aggregation and Statistics  Data warehousing and OLAP  Indexing, Searching, and Querying  Keyword based search  Pattern matching (XML/RDF)  Knowledge discovery  Data Mining  Statistical Modeling
  • 31. Big Data and Data Science  “… the sexy job in the next 10 years will be statisticians,” Hal Varian, Google Chief Economist  The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts by 2018 (McKinsey Global Institute, June 2011)  New Data Science institutes being created or repurposed – NYU, Columbia, Washington, UCB,...  New degree programs, courses, boot-camps:  e.g., at Berkeley: Stats, I-School, CS, Astronomy…  One proposal (elsewhere) for an MS in “Big Data Science”
  • 32. What is Data Science?  An area that manages, manipulates, extracts, and interprets knowledge from a tremendous amount of data  Data science (DS) is a multidisciplinary field of study with the goal of addressing the challenges in big data  Data science principles apply to all data – big and small https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
  • 33. What is Data Science?  Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education  Computer Science  Pattern recognition, visualization, data warehousing, High performance computing, Databases, AI  Mathematics  Mathematical Modeling  Statistics  Statistical and Stochastic modeling, Probability.
  • 34. Why is it sexy?  Gartner’s 2014 Hype Cycle
  • 37. Real Life Examples  Companies learn your secrets, shopping patterns, and preferences  For example, can we know if a woman is pregnant, even if she doesn’t want us to know? Target case study  Data Science and election (2008, 2012)  1 million people installed the Obama Facebook app that gave access to info on “friends”
  • 38. Data Scientists  Data Scientist  The Sexiest Job of the 21st Century  They find stories, extract knowledge. They are not reporters
  • 39. Data Scientists  Data scientists are the key to realizing the opportunities presented by big data. They bring structure to it, find compelling patterns in it, and advise executives on the implications for products, processes, and decisions
  • 40. What do Data Scientists do?  National Security  Cyber Security  Business Analytics  Engineering  Healthcare  And more ….
  • 41. Concentration in Data Science  Mathematics and Applied Mathematics  Applied Statistics/Data Analysis  Solid Programming Skills (R, Python, Julia, SQL)  Data Mining  Database Storage and Management  Machine Learning and discovery