FUNDAMENTALS of DATA SCIENCE
Third Year Computer Science & Engineering (Data Science)
By:
Mr. Ganesh. I. Rathod
H.O.D, Data Science
D Y Patil College of Engineering
Salokhe Nagar, Kolhapur
19-11-2022 Department of Data Science Engineering 1
Contents
• Course Objectives.
• Course Outcomes.
• Introduction to Data Science
• Understanding the Syllabus
• Content Beyond the Syllabus
• Online Resources
Course Objective
Course Description:
• The aim is to bring students up to date with common tools used for Data Science
application development. The course serves as an introduction to the basics of data
science, including programming for data analytics.
Course Objectives:
1. To provide the students with the basic knowledge of Data Science.
2. To enable the students to develop solutions using Data Science tools.
3. To introduce them to Python packages and their usability.
Course Outcomes
1. Study the basics of data science and its scope.
2. Describe the basics of the data science process and recognize common tools
used for Data Science application development.
3. Explore functions of Python libraries & packages.
4. Apply data science concepts and methods to find solutions to real-world
problems and communicate these solutions effectively.
Program Specific Outcomes
• PSO1: Knowledge of recent technology: Demonstrate the knowledge of
recent technologies like web development, mobile computing, grid
computing, cloud computing, big data analytics, mainframe etc.
• PSO2: Knowledge of programming languages: Demonstrate the knowledge
of programming languages in computer based problem solving.
• PSO3: Software development: Demonstrate the ability to analyse, design
and implement software products.
CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1 3 3 3 2 2 - - - - - 2 - - 2
CO2 3 3 2 3 3 - - - - - 2 - 3 3 2
CO3 2 3 2 3 3 - - - - - 2 - 3 -
CO4 3 3 3 3 2 - - - - - 2 - 3 2 2
CO5 3 2 3 3 3 - - - - - 2 - - -
CO6 3 2 3 3 3 - - - - - 2 - 3 - 2
Correlation matrix of Course Outcomes with Programme Outcomes (CO-PO)
1=Low correlation, 2=Medium correlation, 3=High correlation
Introduction to Data Science
Data science is about extracting knowledge and insights from data.
The tools and techniques of data science are used to drive business
and process decisions.
Understanding the Syllabus
Unit 1. Data Science and Its Scope (No. of lectures: 6)
What Is Data Science, Data Science and Statistics, Role of Statistics in Data Science, A Brief History, Difference between Data Science and Data Analytics, Knowledge and Skills for Data Science Professionals, Some Technologies used in Data Science, Benefits and uses of data science, Facets of data.

Unit 2. The Data Science Process (No. of lectures: 7)
Overview, defining research goals and creating a project charter, retrieving data, cleansing, integrating, and transforming data, exploratory data analysis, building the models, presenting findings and building applications on top of them.

Unit 3. Data Analysis Tools for Data Science and Analytics (No. of lectures: 8)
Data Analysis Using Excel: Introduction, Getting Started with Excel, Format Data as a Table, Filter and Sort, Perform Simple Calculations, Data Manipulation: Sorting and Filtering Data, Derived Data, Highlighting Data, Aggregating Data: Count, Total, Sum, Basic Calculation using Excel, Analyzing Data using Pivot Table/Pivot Chart, Descriptive Statistics using Excel, Visualizing Data using Excel Charts and Graphs, Visualizing Categorical Data: Bar Charts, Pie Charts, Cross Tabulation, Exploring the Relationship between Two and Three Variables: Scatter Plot, Bubble Graph, and Time-Series Plot.

Unit 4. Introduction to NumPy (No. of lectures: 7)
Creating Arrays from Scratch, NumPy Standard Data Types, The Basics of NumPy Arrays, Array Indexing, Slicing, Reshaping, Concatenation, Splitting, Computation on NumPy Arrays: Universal Functions, Aggregations: Min, Max, Comparison Operators, Boolean Arrays.

Unit 5. Data Manipulation with Pandas (No. of lectures: 7)
Introducing Pandas Objects, Data Indexing and Selection, Operating on Data in Pandas, Handling Missing Data, Hierarchical Indexing, Combining Datasets: Concat and Append, Combining Datasets: Merge and Join, Aggregation and Grouping, Pivot Tables.

Unit 6. Visualization with Matplotlib (No. of lectures: 7)
General Matplotlib Tips, Simple Line Plots, Simple Scatter Plots, Visualizing Errors, Density and Contour Plots, Histograms, Binnings, and Density.
Text Books
1) Davy Cielen, Arno D. B. Meysman, Mohamed Ali, “Introducing Data Science”, Manning Publications. [Unit 1 and 2]
2) Jake VanderPlas, “Python Data Science Handbook: Essential Tools for Working with Data”, O’Reilly. [Unit 3, 4, 5]
3) Dr. Amar Sahay, “Essentials of Data Science and Analytics”, Business Expert Press. [Unit 1 and 3]
Reference Books
1. Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media, 2015.
2. Glenn J. Myatt, Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining, John Wiley & Sons, 2000.
Content beyond the Syllabus
• R Programming
R is a programming language and free software environment developed by Ross Ihaka and Robert
Gentleman in 1993. R offers an extensive catalog of statistical and graphical methods,
including machine learning algorithms, linear regression, time series, and statistical
inference, to name a few.
• Power BI
"Power BI," Microsoft says, "is a business analytics solution that lets you visualize your data and share
insights across your organization, or embed them in your app or website."
Online Resources
• https://nptel.ac.in/courses/106/106/106106212/
• https://www.coursera.org/specializations/data-science-fundamentals-python-sql
• https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/
• https://www.youtube.com/watch?v=-ETQ97mXXF0&t=561s (Edureka)
• https://www.youtube.com/watch?v=KxryzSO1Fjs (Simplilearn)
Job opportunities
A Brief History
Difference between Data Scientist & Data Analyst
Data Science: Data science is a multi-disciplinary blend that involves algorithm development, data inference, and predictive modeling to solve analytically complex business problems.
Data Analytics: Data analytics involves a few different branches of broader statistics and analysis.

Data Science: Data science focuses more on machine learning and predictive modeling.
Data Analytics: Data analytics focuses more on viewing the historical data.

Data Science: Data science focuses on discovering new questions that you might not have realized needed answering to drive innovation.
Data Analytics: Data analytics involves answering questions generated for better business decision making. It uses existing information to uncover actionable data. Data analytics focuses on specific areas with specific goals.

Data Science: Data science tries to build connections and shape questions in order to answer them for the future.
Data Analytics: Data analytics involves checking a hypothesis.

If data science is a house for all the methods and tools, data analytics is a small room in that house.
Difference between Data Scientist & Data Analyst
Feature: Coding Language
Data Science: Python is the most commonly used language for data science, along with other languages such as C++, Java, and Perl.
Data Analytics: Knowledge of Python and R is essential for data analytics.

Feature: Programming Skills
Data Science: In-depth knowledge of programming is required for data science.
Data Analytics: Basic programming skills are necessary for data analytics.

Feature: Use of Machine Learning
Data Science: Data science makes use of machine learning algorithms to get insights.
Data Analytics: Data analytics generally doesn’t make use of machine learning.
Difference between Data Scientist & Data Analyst
Feature: Scope
Data Science: The scope of data science is large.
Data Analytics: The scope of data analytics is micro, i.e., small.

Feature: Goals
Data Science: Data science deals with explorations and new innovations.
Data Analytics: Data analytics makes use of existing resources.

Feature: Data Type
Data Science: Data science mostly deals with unstructured data.
Data Analytics: Data analytics deals with structured data.
Difference between Data Scientist & Data Analyst
Data Science vs Data Analytics: The Skills
Data Analytics:
• Knowledge of intermediate statistics and excellent
problem-solving skills, along with expertise in Excel and
SQL databases.
• Experience working with BI tools like Power BI for
reporting.
• Knowledge of statistical tools like Python and R.
Difference between Data Scientist & Data Analyst
Data Science vs Data Analytics: The Skills
Data Science:
• Math, advanced statistics, predictive modelling,
machine learning, and programming, along with
proficiency in using big data tools like Hadoop and
Spark.
• Expertise in SQL and NoSQL databases like
Cassandra and MongoDB.
• Experience with data visualization tools like QlikView,
D3.js, and Tableau.
• Expertise in programming languages like Python, R,
Difference between Data Scientist & Data Analyst
Data Science vs Data Analytics — Sample Job
Description
Data Analyst
Difference between Data Scientist & Data Analyst
Data Science vs Data Analytics — Sample Job
Description
Data Scientist
Knowledge and Skills for Data Science Professionals
• At least one programming language – R/ Python
• Data Extraction, Transformation, and Loading
• Data Wrangling and Data Exploration
• Machine Learning Algorithms
• Advanced Machine Learning (Deep Learning)
• Big Data Processing Frameworks
• Data Visualization
Knowledge and Skills for Data Science Professionals
• As a Data Scientist, you’ll be responsible for jobs that span
three domains of skills.
• Statistical/mathematical reasoning,
• Business communication/leadership, and
• Programming
Knowledge and Skills for Data Science Professionals
1. Statistics:
Wikipedia defines it as the study of the collection, analysis,
interpretation, presentation, and organization of data. It
shouldn’t be a surprise that data scientists need to know statistics.
For example, data analysis requires descriptive statistics and
probability theory, at a minimum. These concepts help you make
better business decisions from data.
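As a minimal sketch of what descriptive statistics and basic probability look like in practice, the snippet below summarizes a small list of exam scores with Python's standard `statistics` module. The scores and the threshold of 80 are made-up values for illustration only.

```python
from statistics import mean, median, pstdev

# Hypothetical exam scores, used only to illustrate the idea.
scores = [62, 75, 75, 81, 90]

# Descriptive statistics: summarize the centre and spread of the data.
summary = {
    "mean": mean(scores),
    "median": median(scores),
    "population_std_dev": pstdev(scores),
}

# A simple empirical probability: the share of scores above 80.
p_above_80 = sum(score > 80 for score in scores) / len(scores)

print(summary)
print(p_above_80)
```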
Knowledge and Skills for Data Science Professionals
2. Programming Language R/Python:
Python and R are among the languages most widely used by data
scientists. The primary reason is the number of packages available for
computing.
3. Data Extraction, Transformation, and Loading:
Suppose we have multiple data sources such as MySQL DB and MongoDB. You
have to extract data from such sources and then transform it into a proper
structure for the purposes of querying and analysis. Finally, you have to load
the data into the Data Warehouse, where you will analyze it. So, for people
with an ETL (Extract, Transform, and Load) background, Data Science can be a
good career option.
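The extract, transform, and load steps can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the MySQL source, a JSON string stands in for MongoDB documents, a second in-memory SQLite database plays the role of the data warehouse, and all table names, field names, and values are hypothetical.

```python
import json
import sqlite3

# --- Extract: read from two hypothetical sources ------------------------
# Source 1: a relational table (in-memory SQLite stands in for MySQL).
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE orders (customer TEXT, amount_inr REAL)")
source_db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("asha", 1200.0), ("ravi", 450.5)],
)
relational_rows = source_db.execute(
    "SELECT customer, amount_inr FROM orders"
).fetchall()

# Source 2: JSON documents (stands in for a MongoDB collection).
documents = json.loads(
    '[{"name": "Meera", "total": "300"}, {"name": "asha", "total": "99.5"}]'
)


# --- Transform: bring both sources to one schema (customer, amount) -----
def transform(rows, docs):
    """Normalize names and convert every amount to a float."""
    unified = [(name.strip().title(), float(amount)) for name, amount in rows]
    unified += [(d["name"].strip().title(), float(d["total"])) for d in docs]
    return unified


clean_rows = transform(relational_rows, documents)

# --- Load: write the unified rows into the "warehouse" table ------------
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", clean_rows)

# Analysis then runs against the warehouse, e.g. total spend per customer.
totals = dict(
    warehouse.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer"
    ).fetchall()
)
print(totals)
```

The transform step is where the two sources are reconciled: names are normalized and amounts converted to one numeric type, so the warehouse query can treat them as a single data set.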
Knowledge and Skills for Data Science Professionals
4. Data Wrangling and Data Exploration:
• Cleaning and unifying messy and complex data sets for easy access and
analysis is known as Data Wrangling.
• Exploratory Data Analysis (EDA) is the first step in your data analysis
process. Here you make sense of the data you have and then figure out what
questions you want to ask and how to frame them, as well as how best to
manipulate your available data sources to get the answers you need.
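The wrangling and exploration steps above can be sketched as follows. This is a minimal illustration using only the standard library; the records, field names, and cleaning rules are assumptions made for the example, and a real project would more likely use Pandas for the same job.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical messy records: inconsistent case, stray spaces, missing ages.
raw = [
    {"city": " Kolhapur", "age": "21"},
    {"city": "kolhapur", "age": ""},
    {"city": "PUNE ", "age": "23"},
    {"city": "Pune", "age": "n/a"},
    {"city": "Mumbai", "age": "25"},
]


def wrangle(records):
    """Unify city names and convert age to int (None when unusable)."""
    cleaned = []
    for rec in records:
        age_text = rec["age"].strip()
        cleaned.append({
            "city": rec["city"].strip().title(),
            "age": int(age_text) if age_text.isdigit() else None,
        })
    return cleaned


data = wrangle(raw)

# Exploration: simple summaries to get a first sense of the data.
ages = [r["age"] for r in data if r["age"] is not None]
summary = {
    "rows": len(data),
    "missing_age": sum(r["age"] is None for r in data),
    "mean_age": mean(ages),
    "median_age": median(ages),
    "rows_per_city": Counter(r["city"] for r in data),
}
print(summary)
```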
5. Machine Learning
Machine Learning, as the name suggests, is the process of making
machines intelligent, so that they have the power to think, analyze, and make
decisions. With Machine Learning models, an organization has a better chance of
identifying opportunities or avoiding unknown risks.
You should have good hands-on knowledge of various supervised and
unsupervised algorithms.
Knowledge and Skills for Data Science Professionals
6. Big Data Processing Frameworks:
• Nowadays, most organizations use Big Data analytics to gain
insights. It is, therefore, a must-have skill for a Data Scientist.
• We therefore require frameworks like Hadoop and Spark to handle Big Data.
Benefits and uses of data science and big data
• Commercial companies in almost every industry use data science
and big data to gain insights into their customers, processes, staff,
competition, and products.
• A good example of this is Google AdSense, which collects data from internet users so relevant commercial
messages can be matched to the person browsing the internet.
• Human resource professionals use people analytics and text
mining to screen candidates, monitor the mood of employees, and
study informal networks among coworkers.
• Financial institutions use data science to predict stock markets,
determine the risk of lending money, and learn how to attract new
clients for their services.
Some Technologies used in Data Science
Benefits and uses of data science and big data
• Governmental organizations are also aware of data’s value. A
data scientist in a governmental organization gets to work on
diverse projects such as detecting fraud and other criminal activity
or optimizing project funding.
• Nongovernmental organizations (NGOs) are also no strangers to
using data. They use it to raise money and defend their causes. The
World Wildlife Fund (WWF), for instance, employs data scientists to
increase the effectiveness of their fundraising efforts.
• Universities use data science in their research but also to enhance
the study experience of their students.
• Ex: MOOCs (massive open online courses).
Facets of data
The main categories of data are these:
• Structured
• Semi-structured
• Unstructured
• Natural language
• Machine-generated
• Graph-based
• Audio, video, and images
• Streaming
Structured Data
Structured data is any data that can be stored in a SQL database,
in tables with rows and columns.
It has relational keys and can be easily mapped into
pre-designed fields.
Today, it is the most commonly processed kind of data in development
and the simplest way to manage information.
But structured data represents only 5 to 10% of all
data.
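A small sketch of structured data, using Python's built-in `sqlite3` module: two tables with fixed columns, linked by a relational key. The table names, column names, and rows are hypothetical and chosen only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Pre-designed fields: every row must fit the declared columns.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE mark (
           student_id INTEGER NOT NULL REFERENCES student(id),
           subject    TEXT NOT NULL,
           score      INTEGER NOT NULL
       )"""
)

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany(
    "INSERT INTO mark VALUES (?, ?, ?)",
    [(1, "FDS", 88), (2, "FDS", 74), (1, "DBMS", 91)],
)

# The relational key (student.id = mark.student_id) lets us join the tables.
rows = conn.execute(
    """SELECT s.name, m.subject, m.score
       FROM student AS s JOIN mark AS m ON m.student_id = s.id
       ORDER BY s.name, m.subject"""
).fetchall()
print(rows)
```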
Semi Structured Data
• Semi-structured data is information that doesn’t reside in a
relational database but that does have some organizational
properties that make it easier to analyze.
• With some processing you can store it in a relational database (this
can be very hard for some kinds of semi-structured data), but
the semi-structured form exists to save space, improve clarity, or ease computation.
Examples of semi-structured data: JSON, CSV, and XML documents.
Like structured data, semi-structured data represents only a small
part of all data (5 to 10%).
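To illustrate how semi-structured data carries some organization without a fixed schema, the sketch below parses a small JSON document in which records have optional and nested fields, then flattens each record into a uniform row that could be stored in a relational table. The field names and the `flatten` helper are assumptions made for this example.

```python
import json

# Hypothetical JSON: the second record has no "contact" and an empty tag list.
raw = """
[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"},
   "tags": ["python", "sql"]},
  {"id": 2, "name": "Ravi", "tags": []}
]
"""
records = json.loads(raw)


def flatten(record):
    """Map one nested record onto fixed columns, using None for missing data."""
    return {
        "id": record["id"],
        "name": record["name"],
        "email": record.get("contact", {}).get("email"),
        "tags": ",".join(record.get("tags", [])),
    }


table = [flatten(r) for r in records]
for row in table:
    print(row)
```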
Unstructured data
• Unstructured data represents around 80% of data.
• It often includes text and multimedia content.
• Examples include e-mail messages, word processing documents, videos, photos, audio
files, presentations, webpages and many other kinds of business documents.
• Unstructured data is everywhere.
• In fact, most individuals and organizations conduct their lives around unstructured
data.
• Just as with structured data, unstructured data is either machine generated or human
generated.
Unstructured data
Here are some examples of machine-generated unstructured data:
• Satellite images: This includes weather data or the data that the government captures in its satellite
surveillance imagery. Just think about Google Earth, and you get the picture.
• Photographs and video: This includes security, surveillance, and traffic video.
• Radar or sonar data: This includes vehicular, meteorological, and oceanographic seismic profiles.
The following list shows a few examples of human-generated unstructured data:
• Social media data: This data is generated from social media platforms such as YouTube, Facebook,
Twitter, LinkedIn, and Flickr.
• Mobile data: This includes data such as text messages and location information.
• Website content: This comes from any site delivering unstructured content, like YouTube, Flickr, or
Instagram.
Facets of data
Natural Language
• Natural language is a special type of unstructured data; it’s
challenging to process because it requires knowledge of specific
data science techniques and linguistics.
• The natural language processing community has had success in
entity recognition, topic recognition, summarization, and sentiment
analysis, but models trained in one domain don’t generalize well to
other domains.
Facets of data
Graph based or Network Data
• In graph theory, a graph is a mathematical structure to model
pairwise relationships between objects.
• Graph or network data is, in short, data that focuses on the
relationship or adjacency of objects.
• The graph structures use nodes, edges, and properties to represent
and store graphical data. Graph-based data is a natural way to
represent social networks.
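A minimal sketch of graph-based data for a small social network: people are nodes, friendships are edges, and the graph is stored as an adjacency mapping. The names and the `degrees_of_separation` helper are hypothetical; the helper uses a breadth-first search to count how many friendship links separate two people.

```python
from collections import defaultdict, deque

# Hypothetical friendships: each pair is an undirected edge between two nodes.
edges = [("Asha", "Ravi"), ("Ravi", "Meera"), ("Meera", "John"), ("Asha", "Zoya")]

# Adjacency representation: node -> set of neighbouring nodes.
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)


def degrees_of_separation(graph, start, goal):
    """Breadth-first search; returns the number of edges on a shortest path."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # no path between the two nodes


print(sorted(graph["Asha"]))
print(degrees_of_separation(graph, "Asha", "John"))
```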
Facets of data
Audio, Image & Video
• Audio, image, and video are data types that pose specific challenges to a
data scientist.
• MLBAM (Major League Baseball Advanced Media) announced in 2014
that it would increase video capture to approximately 7 TB per game for the
purpose of live, in-game analytics. High-speed cameras at stadiums capture
ball and athlete movements to calculate in real time, for example,
the path taken by a defender relative to two baselines.
Streaming Data
• Streaming data is data that is generated continuously by thousands
of data sources, which typically send in the data records
simultaneously and in small sizes (on the order of kilobytes).
• Examples are the log files generated by customers using your mobile or
web applications, online game activity, “What’s trending” on Twitter, live
sporting or music events, and the stock market.
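One way to picture stream processing is to handle each small record as it arrives instead of loading the whole data set first. The sketch below simulates this with Python generators: `event_stream` is a hypothetical source of price records, and `moving_average` keeps only the most recent values in memory. The prices and window size are made-up values.

```python
from collections import deque


def event_stream():
    """Hypothetical source that yields small records one at a time."""
    for ts, price in enumerate([100.0, 101.5, 99.0, 102.0, 104.5]):
        yield {"ts": ts, "price": price}


def moving_average(stream, window=3):
    """Yield (timestamp, average of the last `window` prices) per record."""
    recent = deque(maxlen=window)  # older values are dropped automatically
    for event in stream:
        recent.append(event["price"])
        yield event["ts"], sum(recent) / len(recent)


averages = list(moving_average(event_stream(), window=3))
for ts, avg in averages:
    print(ts, round(avg, 2))
```

Because only the last few values are kept, memory use stays constant no matter how long the stream runs.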

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 

Recently uploaded (20)

꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

FDS_dept_ppt.pptx

  • 1. FUNDAMENTALS of DATA SCIENCE Third Year Computer Science & Engineering (Data Science) By: Mr. Ganesh. I. Rathod H.O.D, Data Science D Y Patil College of Engineering Salokhe Nagar, Kolhapur 19-11-2022 Department of Data Science Engineering 1
  • 2. Contents • Course Objectives • Course Outcomes • Introduction to Data Science • Understanding the Syllabus • Content Beyond the Syllabus • Online Resources
  • 3. Course Objective. Course Description: • The aim is to bring students up to date with common tools used for Data Science application development. The course serves as an introduction to the basics of data science, including programming for data analytics. Course Objectives: 1. To provide students with basic knowledge of Data Science. 2. To enable students to develop solutions using Data Science tools. 3. To introduce students to Python packages and their usability.
  • 4. Course Outcomes 1. Study the basics of data science and its scope. 2. Describe the basics of the data science process and recognize common tools used for Data Science application development. 3. Explore functions of Python libraries and packages. 4. Apply data science concepts and methods to find solutions to real-world problems and communicate these solutions effectively.
  • 5. Program Specific Outcomes • PSO1: Knowledge of recent technology: Demonstrate knowledge of recent technologies such as web development, mobile computing, grid computing, cloud computing, big data analytics, mainframe, etc. • PSO2: Knowledge of programming languages: Demonstrate knowledge of programming languages in computer-based problem solving. • PSO3: Software development: Demonstrate the ability to analyse, design and implement software products.
  • 6. Correlation matrix of Course Outcomes with Programme Outcomes (CO-PO). 1 = Low correlation, 2 = Medium correlation, 3 = High correlation. Columns: CO, PO1, PO2, PO3, PO4, PO5, PO6, PO7, PO8, PO9, PO10, PO11, PO12, PSO1, PSO2, PSO3. • CO1: 3 3 3 2 2 - - - - - 2 - - 2 • CO2: 3 3 2 3 3 - - - - - 2 - 3 3 2 • CO3: 2 3 2 3 3 - - - - - 2 - 3 - • CO4: 3 3 3 3 2 - - - - - 2 - 3 2 2 • CO5: 3 2 3 3 3 - - - - - 2 - - - • CO6: 3 2 3 3 3 - - - - - 2 - 3 - 2
  • 7. Introduction to Data Science: Data science is about extracting knowledge and insights from data. The tools and techniques of data science are used to drive business and process decisions.
  • 9. Understanding the Syllabus
  • 10. Syllabus (Unit No., Unit Name & Details, No. of Lectures) • Unit 1. Data Science and Its Scope (6 lectures): What Is Data Science, Data Science and Statistics, Role of Statistics in Data Science, A Brief History, Difference between Data Science and Data Analytics, Knowledge and Skills for Data Science Professionals, Some Technologies used in Data Science, Benefits and Uses of Data Science, Facets of Data. • Unit 2. The Data Science Process (7 lectures): Overview, Defining research goals and creating a project charter, Retrieving data, Cleansing, integrating, and transforming data, Exploratory data analysis, Building the models, Presenting findings and building applications on top of them. • Unit 3. Data Analysis Tools for Data Science and Analytics (8 lectures): Data Analysis Using Excel: Introduction, Getting Started with Excel, Format Data as a Table, Filter and Sort, Perform Simple Calculations, Data Manipulation: Sorting and Filtering Data, Derived Data, Highlighting Data, Aggregating Data: Count, Total Sum, Basic Calculation using Excel, Analyzing Data using Pivot Table/Pivot Chart, Descriptive Statistics using Excel, Visualizing Data using Excel Charts and Graphs, Visualizing Categorical Data: Bar Charts, Pie Charts, Cross Tabulation, Exploring the Relationship between Two and Three Variables: Scatter Plot, Bubble Graph, and Time-Series Plot.
  • 11. Syllabus (continued) • Unit 4. Introduction to NumPy (7 lectures): Creating Arrays from Scratch, NumPy Standard Data Types, The Basics of NumPy Arrays, Array Indexing, Slicing, Reshaping, Concatenation, Splitting, Computation on NumPy Arrays: Universal Functions, Aggregations: Min, Max, Comparison Operators, Boolean Arrays. • Unit 5. Data Manipulation with Pandas (7 lectures): Introducing Pandas Objects, Data Indexing and Selection, Operating on Data in Pandas, Handling Missing Data, Hierarchical Indexing, Combining Datasets: Concat and Append, Combining Datasets: Merge and Join, Aggregation and Grouping, Pivot Tables. • Unit 6. Visualization with Matplotlib (7 lectures): General Matplotlib Tips, Simple Line Plots, Simple Scatter Plots, Visualizing Errors, Density and Contour Plots, Histograms, Binnings, and Density.
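As a rough illustration of the Unit 4 topics (array creation, reshaping, indexing, universal functions, aggregations, Boolean arrays, concatenation and splitting), a minimal sketch follows. It assumes NumPy is installed; the sample values and variable names are made up for illustration and are not taken from the course material.

```python
import numpy as np

# Create an array from scratch and reshape it into 3 rows x 4 columns.
grid = np.arange(12).reshape(3, 4)

# Indexing and slicing: the first row, and the last column of every row.
first_row = grid[0]
last_col = grid[:, -1]

# Universal function (ufunc): element-wise arithmetic without a Python loop.
doubled = np.multiply(grid, 2)

# Aggregations: minimum, maximum, and per-column sums.
smallest, largest = grid.min(), grid.max()
col_sums = grid.sum(axis=0)

# A comparison operator produces a Boolean array, which can be used as a mask.
is_even = grid % 2 == 0
evens = grid[is_even]

# Concatenation and splitting along the row axis.
stacked = np.concatenate([grid, grid], axis=0)  # shape (6, 4)
top, bottom = np.split(stacked, 2, axis=0)      # two arrays of shape (3, 4)

print(first_row, last_col, col_sums, evens)
```

The same operations could be written with Python lists and loops; the point of the array form is that each step is a single vectorized call.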
  • 12. Text Books 1) Davy Cielen, Arno D. B. Meysman, Mohamed Ali, "Introducing Data Science", Manning Publications. [Unit 1 and 2] 2) Jake VanderPlas, "Python Data Science Handbook: Essential Tools for Working with Data", O'Reilly Publication. [Unit 3,4,5] 3) Dr. Amar Sahay, "Essentials of Data Science and Analytics", O'Reilly Publication. [Unit 1 and 3] Reference Books 1. Joel Grus, Data Science from Scratch: First Principles with Python, O'Reilly Media, 2015. 2. Glenn J. Myatt, Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining, John Wiley Publishers, 2000.
  • 13. Content beyond the Syllabus • R Programming: R is a programming language and free software developed by Ross Ihaka and Robert Gentleman in 1993. R offers an extensive catalog of statistical and graphical methods, including machine learning algorithms, linear regression, time series, and statistical inference, to name a few. • Power BI: "Power BI," Microsoft says, "is a business analytics solution that lets you visualize your data and share insights across your organization, or embed them in your app or website."
  • 14. Online Resources • https://nptel.ac.in/courses/106/106/106106212/ • https://www.coursera.org/specializations/data-science-fundamentals-python-sql • https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/ • https://www.youtube.com/watch?v=-ETQ97mXXF0&t=561s (Edureka) • https://www.youtube.com/watch?v=KxryzSO1Fjs (Simplilearn)
  • 15. Job opportunities
  • 16. Job opportunities
  • 17. Job opportunities
  • 18. A Brief History
  • 26. Difference between Data Scientist & Data Analyst. Data Science: • Data science is a multi-disciplinary blend that involves algorithm development, data inference, and predictive modeling to solve analytically complex business problems. • It focuses more on machine learning and predictive modeling. • It focuses on discovering new questions that you might not have realized needed answering, in order to drive innovation. • It tries to build connections and shape questions in order to answer them for the future. Data Analytics: • Data analytics involves a few different branches of broader statistics and analysis. • It focuses more on viewing historical data. • It involves answering questions generated for better business decision making, using existing information to uncover actionable data. • It focuses on specific areas with specific goals. • It involves checking a hypothesis. If data science is a home for all the methods and tools, data analytics is a small room in that house.
  • 27. Difference between Data Scientist & Data Analyst (Feature: Data Science vs. Data Analytics) • Coding Language: Data Science: Python is the most commonly used language, along with other languages such as C++, Java, and Perl. Data Analytics: knowledge of Python and R is essential. • Programming Skills: Data Science: in-depth knowledge of programming is required. Data Analytics: basic programming skills are necessary. • Use of Machine Learning: Data Science: makes use of machine learning algorithms to get insights. Data Analytics: does not make use of machine learning.
  • 28. Difference between Data Scientist & Data Analyst (Feature: Data Science vs. Data Analytics) • Scope: Data Science: the scope is large. Data Analytics: the scope is micro, i.e., small. • Goals: Data Science: deals with exploration and new innovations. Data Analytics: makes use of existing resources. • Data Type: Data Science: mostly deals with unstructured data. Data Analytics: deals with structured data.
  • 30. Difference between Data Scientist & Data Analyst. Data Science vs Data Analytics: The Skills. Data Analytics: • Knowledge of intermediate statistics and excellent problem-solving skills, along with expertise in Excel and SQL databases. • Experience working with BI tools like Power BI for reporting. • Knowledge of statistical tools like Python and R.
  • 31. Difference between Data Scientist & Data Analyst. Data Science vs Data Analytics: The Skills. Data Science: • Math, advanced statistics, predictive modelling, machine learning, and programming, along with proficiency in using big data tools like Hadoop and Spark. • Expertise in SQL and NoSQL databases like Cassandra and MongoDB. • Experience with data visualization tools like QlikView, D3.js, and Tableau. • Expertise in programming languages like Python, R,
  • 32. Difference between Data Scientist & Data Analyst. Data Science vs Data Analytics: Sample Job Description, Data Analyst
  • 33. Difference between Data Scientist & Data Analyst. Data Science vs Data Analytics: Sample Job Description, Data Scientist
  • 34. Knowledge and Skills for Data Science Professionals
  • 35. Knowledge and Skills for Data Science Professionals • At least one programming language – R/Python • Data Extraction, Transformation, and Loading • Data Wrangling and Data Exploration • Machine Learning Algorithms • Advanced Machine Learning (Deep Learning) • Big Data Processing Frameworks • Data Visualization
  • 36. Knowledge and Skills for Data Science Professionals • As a Data Scientist, you'll be responsible for jobs that span three skill domains: • Statistical/mathematical reasoning, • Business communication/leadership, and • Programming
  • 37. Knowledge and Skills for Data Science Professionals 1. Statistics: Wikipedia defines it as the study of the collection, interpretation, presentation, and organization of data. It shouldn't be a surprise that data scientists need to know statistics. For example, data analysis requires descriptive statistics and probability theory, at a minimum. These concepts help you make better business decisions from data.
  • 38. Knowledge and Skills for Data Science Professionals 2. Programming Language R/Python: Python and R are among the languages most widely used by Data Scientists. The primary reason is the number of packages available for computing. 3. Data Extraction, Transformation, and Loading: Suppose we have multiple data sources, such as a MySQL DB and MongoDB. You have to extract data from such sources and then transform it into a proper structure for the purposes of querying and analysis. Finally, you load the data into the Data Warehouse, where you will analyze it. So, for people with an ETL (Extract, Transform and Load) background, Data Science can be a good career option.
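The extract, transform, and load steps described above can be sketched with Python's standard library alone. This is only an illustrative sketch: the two in-memory "sources", the `orders` table, and the function names `transform` and `load` are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import json
import sqlite3

# Extract: two hypothetical sources with different shapes.
# Source A mimics rows fetched from a relational database (tuples).
sql_rows = [(1, "Asha", "2022-11-01", 250.0), (2, "Ravi", "2022-11-02", 120.5)]
# Source B mimics documents fetched from a document store (JSON text).
doc_rows = json.loads('[{"customer": "Meera", "when": "2022-11-03", "total": "99.90"}]')

def transform(sql_rows, doc_rows):
    """Bring both sources into one common (name, order_date, amount) structure."""
    unified = [(name, date, float(amount)) for _id, name, date, amount in sql_rows]
    unified += [(d["customer"], d["when"], float(d["total"])) for d in doc_rows]
    return unified

def load(rows, conn):
    """Load the unified rows into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (name TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stands in for the data warehouse
load(transform(sql_rows, doc_rows), warehouse)

# Analyze: query the loaded data.
order_count, total_amount = warehouse.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders"
).fetchone()
print(order_count, round(total_amount, 2))
```

Keeping `transform` separate from `load` means the unification rules can be checked without touching the database.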
  • 39. Knowledge and Skills for Data Science Professionals 4. Data Wrangling and Data Exploration: • Cleaning and unifying messy and complex data sets for easy access and analysis is known as Data Wrangling. • Exploratory Data Analysis (EDA) is the first step in your data analysis process. Here you make sense of the data you have and then figure out what questions you want to ask and how to frame them, as well as how best to manipulate your available data sources to get the answers you need. 5. Machine Learning: Machine Learning, as the name suggests, is the process of making machines intelligent, so that they have the power to think, analyze and make decisions. By building precise Machine Learning models, an organization has a better chance of identifying profitable opportunities or avoiding unknown risks. You should have good hands-on knowledge of various supervised and unsupervised algorithms.
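For item 4 above, a small sketch of wrangling followed by a first exploratory pass is given here, using only the standard library. The survey records, field names, and the `wrangle` helper are invented for illustration; in practice a library such as pandas would typically be used for the same steps.

```python
import statistics

# Hypothetical raw records: inconsistent case, stray spaces, a missing value.
raw = [
    {"city": " Kolhapur ", "age": "21"},
    {"city": "kolhapur", "age": ""},
    {"city": "PUNE", "age": "23"},
    {"city": "Pune", "age": "22"},
]

def wrangle(records):
    """Clean and unify records: normalize city names, drop rows without an age."""
    cleaned = []
    for rec in records:
        age = rec["age"].strip()
        if not age:
            continue  # skip rows with a missing age
        cleaned.append({"city": rec["city"].strip().title(), "age": int(age)})
    return cleaned

clean = wrangle(raw)

# Exploration: simple descriptive statistics and counts per category.
ages = [r["age"] for r in clean]
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
per_city = {}
for r in clean:
    per_city[r["city"]] = per_city.get(r["city"], 0) + 1

print(mean_age, median_age, per_city)
```

Dropping incomplete rows is only one possible policy; filling in a default or an estimated value is another, and the right choice depends on the question being asked.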
  • 40. Knowledge and Skills for Data Science Professionals 6. Big Data Processing Frameworks: • Nowadays, most organizations use Big Data analytics to gain insights. It is, therefore, a must-have skill for a Data Scientist. • We therefore require frameworks like Hadoop and Spark to handle Big Data.
  • 41. Benefits and uses of data science and big data • Commercial companies in almost every industry use data science and big data to gain insights into their customers, processes, staff, competition, and products. • A good example of this is Google AdSense, which collects data from internet users so that relevant commercial messages can be matched to the person browsing the internet. • Human resource professionals use people analytics and text mining to screen candidates, monitor the mood of employees, and study informal networks among coworkers. • Financial institutions use data science to predict stock markets, determine the risk of lending money, and learn how to attract new clients for their services.
  • 42. Some Technologies used in Data Science
  • 43. Benefits and uses of data science and big data • Governmental organizations are also aware of data's value. A data scientist in a governmental organization gets to work on diverse projects such as detecting fraud and other criminal activity or optimizing project funding. • Nongovernmental organizations (NGOs) are also no strangers to using data. They use it to raise money and defend their causes. The World Wildlife Fund (WWF), for instance, employs data scientists to increase the effectiveness of their fundraising efforts. • Universities use data science in their research but also to enhance the study experience of their students. • Example: MOOCs (massive open online courses).
  • 44. Facets of data. The main categories of data are these: • Structured • Semi-structured • Unstructured • Natural language • Machine-generated • Graph-based • Audio, video, and images • Streaming
  • 45. Structured Data: This covers all data that can be stored in an SQL database, in tables with rows and columns. Such data has relational keys and can easily be mapped into pre-designed fields. Today it is the most commonly processed kind of data in development and the simplest way to manage information. But structured data represents only 5 to 10% of all data.
  • 47. Semi-Structured Data • Semi-structured data is information that doesn't reside in a relational database but that does have some organizational properties that make it easier to analyze. • With some processing it can be stored in a relational database (which can be very hard for some kinds of semi-structured data), but the semi-structured form exists to save space, improve clarity, or ease computation. Examples of semi-structured data: JSON, CSV, and XML documents. Like structured data, semi-structured data represents a small share of all data (5 to 10%).
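The remark that semi-structured data can be stored in a relational form "with some processing" can be made concrete with a short sketch. The JSON records and the `flatten` helper below are hypothetical; the idea shown is only one common approach, namely flattening nested fields into dotted column names and filling missing fields with `None`.

```python
import json

# Hypothetical semi-structured records: nested fields, and keys that vary per record.
text = '''
[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"}},
  {"id": 2, "name": "Ravi", "contact": {"phone": "000-0000"}, "tags": ["new"]}
]
'''

def flatten(record, prefix=""):
    """Turn nested dictionaries into a flat dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

rows = [flatten(r) for r in json.loads(text)]

# Build one fixed set of columns; fields a record lacks become None (NULL-like).
columns = sorted({col for row in rows for col in row})
table = [[row.get(col) for col in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

The list-valued `tags` field is left as a list here; mapping it to a truly relational layout would need a separate table, which is the kind of case the slide calls "very hard".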
  • 48. Unstructured data • Unstructured data represents around 80% of data. • It often includes text and multimedia content. • Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. • Unstructured data is everywhere. • In fact, most individuals and organizations conduct their lives around unstructured data. • Just as with structured data, unstructured data is either machine generated or human generated.
  • 49. Unstructured data. Here are some examples of machine-generated unstructured data: • Satellite images: This includes weather data or the data that the government captures in its satellite surveillance imagery. Just think about Google Earth, and you get the picture. • Photographs and video: This includes security, surveillance, and traffic video. • Radar or sonar data: This includes vehicular, meteorological, and seismic oceanography data. The following list shows a few examples of human-generated unstructured data: • Social media data: This data is generated from social media platforms such as YouTube, Facebook, Twitter, LinkedIn, and Flickr. • Mobile data: This includes data such as text messages and location information. • Website content: This comes from any site delivering unstructured content, like YouTube, Flickr, or Instagram.
  • 50. Facets of data: Natural Language • Natural language is a special type of unstructured data; it's challenging to process because it requires knowledge of specific data science techniques and linguistics. • The natural language processing community has had success in entity recognition, topic recognition, summarization, and sentiment analysis, but models trained in one domain don't generalize well to other domains.
  • 51. Facets of data: Graph-based or Network Data • In graph theory, a graph is a mathematical structure to model pairwise relationships between objects. • Graph or network data is, in short, data that focuses on the relationship or adjacency of objects. • The graph structures use nodes, edges, and properties to represent and store graphical data. Graph-based data is a natural way to represent social networks.
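The nodes, edges, and properties mentioned above can be sketched with plain dictionaries and sets. The tiny social network below is invented sample data, and `add_friendship` and `friends_of_friends` are hypothetical helper names; a dedicated graph library or graph database would normally be used for real workloads.

```python
# Nodes carry properties; edges are stored as an adjacency mapping.
# All names here are hypothetical sample data.
nodes = {
    "asha": {"city": "Kolhapur"},
    "ravi": {"city": "Pune"},
    "meera": {"city": "Mumbai"},
    "john": {"city": "Pune"},
}
edges = {name: set() for name in nodes}

def add_friendship(a, b):
    """Add an undirected edge between two people."""
    edges[a].add(b)
    edges[b].add(a)

add_friendship("asha", "ravi")
add_friendship("ravi", "meera")
add_friendship("meera", "john")

def friends_of_friends(person):
    """People two steps away who are not already direct friends."""
    direct = edges[person]
    two_steps = set()
    for friend in direct:
        two_steps |= edges[friend]
    return two_steps - direct - {person}

print(friends_of_friends("asha"))
```

A query like "friends of friends" follows edges directly, which is why adjacency-focused storage suits social-network data better than scanning rows in a table.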
  • 53. Facets of data: Audio, Image & Video • Audio, image, and video are data types that pose specific challenges to a data scientist. • MLBAM (Major League Baseball Advanced Media) announced in 2014 that they would increase video capture to approximately 7 TB per game for the purpose of live, in-game analytics. High-speed cameras at stadiums capture ball and athlete movements to calculate in real time, for example, the path taken by a defender relative to two baselines. Streaming Data • Streaming data is data that is generated continuously by thousands of data sources, which typically send in the data records simultaneously and in small sizes (on the order of kilobytes). • Examples are log files generated by customers using your mobile or web applications, online game activity, "What's trending" on Twitter, live sporting or music events, and the stock market.
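Because streaming records arrive continuously and in small pieces, they are usually processed one at a time over a bounded window rather than loaded all at once. The sketch below illustrates that idea with a "what's trending" style count; the `trending` function, the window size, and the hashtag stream are all hypothetical, and a plain Python list stands in for a live source.

```python
from collections import Counter, deque

def trending(stream, window_size=5):
    """Yield the most common item among the last `window_size` records seen."""
    window = deque(maxlen=window_size)  # the oldest record drops out automatically
    counts = Counter()
    for record in stream:
        if len(window) == window.maxlen:
            counts[window[0]] -= 1  # un-count the record about to be evicted
        window.append(record)
        counts[record] += 1
        yield counts.most_common(1)[0][0]

# Hypothetical stream of small records (hashtags), processed one at a time.
hashtags = ["#ipl", "#ai", "#ipl", "#ai", "#ai", "#news", "#news", "#news", "#news"]
results = list(trending(hashtags, window_size=4))
print(results[-1])
```

Memory use stays bounded by the window size no matter how long the stream runs, which is the main design constraint for this kind of data.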