Data Science
Muhammad Suleman Memon
Assistant Professor
Department of Information Technology,
Dadu Campus,
University of Sindh
What is Data Science?
Data science is the field of study that deals with vast volumes of data.
It aims to find unseen patterns, derive meaningful information, and support business decisions.
Data science uses complex machine learning algorithms to build predictive models.
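As a minimal illustration of what "building a predictive model" means (far simpler than the complex machine learning algorithms mentioned above), the sketch below fits an ordinary least-squares line with NumPy to a handful of hypothetical, made-up numbers (advertising spend vs. sales) and uses the fitted line to predict an unseen value.

```python
import numpy as np

# Hypothetical training data: advertising spend (x) and resulting sales (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Fit y ~ slope * x + intercept by least squares (a degree-1 polynomial).
slope, intercept = np.polyfit(x, y, deg=1)

def predict(new_x):
    """Predict y for new inputs using the fitted line."""
    return slope * np.asarray(new_x) + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print("prediction for x=6:", round(float(predict(6.0)), 2))
```

Real projects would use richer models and hold out test data to check the predictions; the straight line is chosen here only because its fit can be verified by hand.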
Data Science Applications
Sources of the Data
Data Science Lifecycle
Prerequisites for Data Science
1. Machine Learning
2. Modeling
3. Statistics
4. Programming
5. Databases
Who Oversees the Data Science Process?
• Business Managers
  • Collaborate with the data science team to characterize the problem and establish an analytical method.
• IT Managers
  • Develop the infrastructure and architecture that enable data science activities.
• Data Science Managers
  • Supervise the working procedures of all data science team members.
  • Manage and keep track of the day-to-day activities of the three data science teams.
What is a Data Scientist?
Data scientists are professionals who have the technical ability to handle complicated problems as well as the desire to investigate which questions need to be answered.
They're a mix of mathematicians, computer scientists, and trend forecasters.
They're also in high demand and well paid because they work across both the business and IT sectors.
On a daily basis, a data scientist may do the following tasks:
• Discover patterns and trends in datasets to gain insights.
• Create forecasting algorithms and data models.
• Improve the quality of data or product offerings using machine learning techniques.
• Share recommendations with other teams and top management.
• Use data analysis tools such as R, SAS, Python, or SQL.
• Stay on top of innovations in the field of data science.
What Does a Data Scientist Do?
1. Determine the problem.
2. Determine the correct set of variables and datasets.
3. Gather structured and unstructured data from many sources.
4. Convert raw data into a suitable format.
5. Apply ML algorithms.
6. Interpret the data to find opportunities and solutions.
7. Prepare the results and insights to share with stakeholders.
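The workflow above can be sketched end to end with pandas. This is a toy illustration under stated assumptions: the raw records, the column names (`region`, `units`, `price`) and the "which region earns the most revenue?" question are all hypothetical, and a plain aggregation stands in for the "apply ML algorithms" step.

```python
import io
import pandas as pd

# Determine the problem (hypothetical): which region earns the most revenue?

# Gather data: an in-memory CSV with messy values stands in for real sources.
raw = io.StringIO(
    "region,units,price\n"
    "North,10,2.5\n"
    "south,4,3.0\n"
    "North,unknown,2.5\n"
    "South,6,3.0\n"
)
df = pd.read_csv(raw)

# Convert raw data into a suitable format: tidy labels, coerce numbers, drop bad rows.
df["region"] = df["region"].str.strip().str.title()          # "south" -> "South"
df["units"] = pd.to_numeric(df["units"], errors="coerce")   # "unknown" -> NaN
df = df.dropna(subset=["units"])

# Analyse: a simple aggregation stands in for an ML algorithm in this sketch.
df = df.assign(revenue=df["units"] * df["price"])
revenue_by_region = df.groupby("region")["revenue"].sum()

# Interpret and prepare the result for stakeholders.
best_region = revenue_by_region.idxmax()
print(revenue_by_region)
print(f"Highest-revenue region: {best_region}")
```

Most of the code is data preparation rather than analysis, which mirrors how much effort the gathering and conversion steps usually take in practice.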
Why Become a Data Scientist?
• According to Glassdoor and Forbes, demand for data scientists will increase by 28 percent by 2026. This speaks to the profession’s durability and longevity, so if you want a secure career, data science offers you that chance.
Uses of Data Science
1. Data science can detect patterns in seemingly unstructured or unconnected data, allowing conclusions and predictions to be made.
2. Tech businesses that acquire user data can use data science techniques to transform that data into valuable or profitable information.
3. Data science has also made inroads into the transportation industry, for example with driverless cars.
4. Data science applications enable a higher level of therapeutic customization through genetics and genomics research.
Data Scientist
Job role: Determine what the problem is, which questions need answers, and where to find the data. Data scientists also mine, clean, and present the relevant data.
Skills needed: Programming skills (SAS, R, Python), storytelling and data visualization, statistical and mathematical skills, and knowledge of Hadoop, SQL, and machine learning.
Data Analyst
Job role: Analysts bridge the gap between the data scientists and the business analysts, organizing and analyzing data to answer the questions the organization poses. They take the technical analyses and turn them into qualitative action items.
Skills needed: Statistical and mathematical skills, programming skills (SAS, R, Python), plus experience in data wrangling and data visualization.
Data Engineer
Job role: Data engineers focus on developing, deploying, managing, and optimizing the organization’s data infrastructure and data pipelines. Engineers support data scientists by helping to transfer and transform data for queries.
Skills needed: NoSQL databases (e.g., MongoDB, Apache Cassandra), programming languages such as Java and Scala, and frameworks such as Apache Hadoop.
Data Science Tools
Data Analysis: SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner
Data Warehousing: Informatica/Talend, AWS Redshift
Data Visualization: Jupyter, Tableau, Cognos, RAW
Machine Learning: Spark MLlib, Mahout, Azure ML Studio
Difference Between Business Intelligence and Data Science
| Business Intelligence | Data Science |
|---|---|
| Uses structured data | Uses both structured and unstructured data |
| Analytical in nature: provides a historical report of the data | Scientific in nature: performs an in-depth statistical analysis of the data |
| Uses basic statistics with emphasis on visualization (dashboards, reports) | Leverages more sophisticated statistical and predictive analysis and machine learning (ML) |
| Compares historical data to current data to identify trends | Combines historical and current data to predict future performance and outcomes |
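To make the last row of the comparison concrete, the sketch below contrasts the two styles on a small, hypothetical yearly sales series: a BI-style comparison of each year with the previous one (`pct_change`), and a data-science-style linear trend fitted with `numpy.polyfit` and extrapolated one year ahead. A straight-line trend is only the simplest possible predictive model, chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly sales figures.
sales = pd.Series([100.0, 110.0, 121.0, 130.0], index=[2019, 2020, 2021, 2022])

# BI style: compare each year with the previous one (historical vs. current).
yoy_change = sales.pct_change()  # first value is NaN: no earlier year to compare
print(yoy_change)

# Data science style: fit a linear trend to the history and predict next year.
x = np.arange(len(sales), dtype=float)  # 0, 1, 2, 3 stand for 2019..2022
slope, intercept = np.polyfit(x, sales.to_numpy(), deg=1)
forecast_2023 = slope * len(sales) + intercept  # x = 4 stands for 2023
print(f"Forecast for 2023: {forecast_2023:.1f}")
```

Using year offsets (0, 1, 2, ...) instead of raw years keeps the fit numerically well conditioned without changing the slope.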
Applications of Data Science
1. Healthcare
2. Gaming
3. Image Recognition
4. Recommendation Systems
5. Logistics
6. Fraud Detection
7. Internet Search
8. Speech Recognition
9. Targeted Advertising
10. Airline Route Planning
11. Augmented Reality
Programming Language for Data Science
Python
Fundamental Python Libraries for Data Scientists
NumPy
SciPy
pandas
scikit-learn
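A quick sketch of how two of these libraries fit together: NumPy supplies fast n-dimensional arrays and vectorised arithmetic, and pandas builds labelled tables on top of them. SciPy and scikit-learn also operate on NumPy arrays; they are left out here only to keep the example short. The student scores are illustrative.

```python
import numpy as np
import pandas as pd

# NumPy: vectorised arithmetic on a whole array, no explicit loop.
scores = np.array([72, 85, 90, 66])
scaled = scores / 100  # element-wise division -> [0.72, 0.85, 0.90, 0.66]

# pandas: a labelled table whose columns are backed by NumPy arrays.
df = pd.DataFrame({"student": ["A", "B", "C", "D"], "score": scores})
df["passed"] = df["score"] >= 70  # vectorised comparison adds a boolean column

print(df)
print("mean score:", df["score"].mean())
```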
IDE
PyCharm
Getting Started
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Getting Started
import pandas as pd

# Build a DataFrame from a dict of equal-length lists (one list per column)
data = {
    'year': [2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012],
    'team': ['FCBarcelona', 'FCBarcelona', 'FCBarcelona',
             'RMadrid', 'RMadrid', 'RMadrid',
             'ValenciaCF', 'ValenciaCF', 'ValenciaCF'],
    'wins': [30, 28, 32, 29, 32, 26, 21, 17, 19],
    'draws': [6, 7, 4, 5, 4, 7, 8, 10, 8],
    'losses': [2, 3, 2, 4, 2, 5, 9, 11, 11]
}
football = pd.DataFrame(data, columns=['year', 'team', 'wins', 'draws', 'losses'])
Output
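Printing the `football` DataFrame shows its nine rows and five columns. The sketch below rebuilds the same data so it runs on its own, then adds an illustrative follow-up that is not on the original slides: total wins per team using `groupby`.

```python
import pandas as pd

# Same data as the Getting Started slide, rebuilt so this block is self-contained.
data = {
    'year': [2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012],
    'team': ['FCBarcelona'] * 3 + ['RMadrid'] * 3 + ['ValenciaCF'] * 3,
    'wins': [30, 28, 32, 29, 32, 26, 21, 17, 19],
    'draws': [6, 7, 4, 5, 4, 7, 8, 10, 8],
    'losses': [2, 3, 2, 4, 2, 5, 9, 11, 11],
}
football = pd.DataFrame(data, columns=['year', 'team', 'wins', 'draws', 'losses'])

print(football)  # 9 rows x 5 columns

# Illustrative follow-up: total wins per team across 2010-2012.
wins_per_team = football.groupby('team')['wins'].sum()
print(wins_per_team)
```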
Read CSV
• import pandas as pd
• mydata = pd.read_csv('data.csv')
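`data.csv` is not supplied with the slides, so the sketch below substitutes an in-memory CSV (`io.StringIO`) with hypothetical contents. `pd.read_csv` accepts a file path or any file-like object, so the call itself is the same as for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for data.csv.
csv_text = "name,age,city\nAli,25,Hyderabad\nSara,31,Karachi\nOmar,28,Dadu\n"
mydata = pd.read_csv(io.StringIO(csv_text))

print(mydata.shape)   # (rows, columns); the header line is not counted as a row
print(mydata.dtypes)  # column types inferred from the text
```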
First Five Rows
• mydata.head()
Last Five Rows
• mydata.tail()
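Both `head()` and `tail()` take an optional row count `n` that defaults to 5. A small sketch on a hypothetical 8-row frame:

```python
import pandas as pd

# Hypothetical frame with 8 rows (index 0..7).
mydata = pd.DataFrame({"x": range(8)})

first_rows = mydata.head()   # default n=5: rows 0-4
last_rows = mydata.tail()    # default n=5: rows 3-7
first_two = mydata.head(2)   # explicit n: rows 0-1
```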
Show Statistical Information
• mydata.describe()
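For numeric data, `describe()` reports the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%) and maximum. The sketch applies it to the `wins` values from the football example, copied into a standalone Series so it runs on its own.

```python
import pandas as pd

# The 'wins' values from the football example.
wins = pd.Series([30, 28, 32, 29, 32, 26, 21, 17, 19], name="wins")

summary = wins.describe()
print(summary)
```

Here the mean is 234 / 9 = 26 and the median (50%) is 28; on a whole DataFrame, `describe()` produces one such column of statistics per numeric column.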
Selecting Data
• mydata['column']
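Indexing with a single column name returns that column as a Series; passing a list of names returns a DataFrame with just those columns. The column names below are hypothetical.

```python
import pandas as pd

mydata = pd.DataFrame({"team": ["A", "B", "C"],
                       "wins": [30, 29, 21],
                       "losses": [2, 4, 9]})

wins = mydata["wins"]              # one name -> Series
subset = mydata[["team", "wins"]]  # list of names -> DataFrame
```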
Subset of Rows
• mydata[5:10]
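With an integer slice, `mydata[5:10]` selects rows by position: from 5 up to but not including 10, i.e. five rows, just like a Python list slice. `mydata.iloc[5:10]` expresses the same positional selection explicitly. A sketch on a hypothetical 12-row frame:

```python
import pandas as pd

mydata = pd.DataFrame({"value": range(100, 112)})  # 12 rows, index 0..11

rows = mydata[5:10]            # positions 5, 6, 7, 8, 9
same_rows = mydata.iloc[5:10]  # explicit positional equivalent
```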
Thank You
