SEMINAR
DEPARTMENT OF COMPUTER SCIENCE
TOPIC: Data Science & Analysis
PRESENTED BY: Prashant Yadav
M.Tech (CS)
Roll No. 223410
BABASAHEB BHIMRAO AMBEDKAR UNIVERSITY
LUCKNOW, UTTAR PRADESH
CONTENTS
• Introduction
• Open Source Tools
• Methodology
• Python for Data Science
• Data Analysis
• Applications
• Challenges
INTRODUCTION
• Data science is an interdisciplinary field that involves the use of statistical and
computational methods to extract insights and knowledge from data. It combines
techniques from mathematics, statistics, computer science, and domain-specific
knowledge to analyze and interpret complex data sets.
• Data science involves various stages, including data collection, data cleaning, data
analysis, and data visualization.
• The goal of data science is to uncover patterns, trends, and insights that can be used to
inform decision-making and solve real-world problems. It has applications in a wide range
of fields, including business, healthcare, finance, and social sciences.
• NEED FOR DATA SCIENCE
Data-driven decision making: Data science enables organizations to make informed decisions based on data insights rather than relying on intuition or guesswork.
Predictive analytics: Data science allows organizations to use historical data to predict future events or trends, such as customer behavior or market trends.
Improved efficiency and productivity: By automating repetitive tasks and streamlining processes, data science can help organizations improve efficiency and productivity.
Personalization: Data science enables organizations to personalize their products or services for individual customers, based on their preferences and behavior.
Fraud detection: Data science can be used to detect fraudulent activity, such as credit card fraud or insurance fraud, by analyzing patterns and anomalies in data.
Risk management: Data science can help organizations identify and mitigate risks, such as financial or cybersecurity risks, by analyzing data and identifying potential threats.
Improved customer experience: By analyzing customer data, data science can help organizations improve the customer experience by identifying pain points and areas for improvement.
Competitive advantage: Data science can give organizations a competitive advantage by enabling them to make data-driven decisions.
• Real-World Examples of Data Science
1. Credit Risk Assessment:
• A bank uses data science to analyze customer data and credit history to assess
the risk of default on loans.
• Machine learning algorithms are used to identify patterns in customer behavior
and credit history that are associated with higher risk.
• Based on these insights, the bank can make informed decisions about loan
approvals and interest rates.
2. Predictive Maintenance:
• A manufacturing company uses data science to predict when equipment is likely to
fail.
• Sensor data is collected from the equipment and analyzed using machine learning
algorithms to identify patterns that are associated with equipment failure.
• Based on these insights, the company can schedule maintenance before equipment
failure occurs, reducing downtime and maintenance costs.
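The predictive-maintenance idea can be sketched in a few lines. This is a minimal sketch under stated assumptions: instead of a trained machine learning model, it uses a simple rolling z-score rule as a statistical stand-in, and the function name `flag_anomalies`, the window size, the threshold, and the vibration readings are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings more than `threshold` standard deviations
    away from the mean of the preceding `window` readings (hypothetical rule)."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat windows (sigma == 0) to avoid dividing by zero.
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical vibration readings (mm/s) from one machine; the spike at
# index 8 mimics an early warning sign of failure.
vibration = [2.1, 2.0, 2.2, 2.1, 2.3, 2.2, 2.1, 2.2, 6.5, 2.2]
print(flag_anomalies(vibration))
```

A real system would learn failure patterns from labeled sensor history; this rule only flags sudden deviations. Note that the spike also inflates the standard deviation of the following windows, one reason practical systems often prefer more robust statistics.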
OPEN SOURCE TOOLS
• Python: A popular programming language for data science, with a wide range of libraries and frameworks for data analysis, machine learning, and visualization. Suitable for: programmers.
• R: A programming language and environment for statistical computing and graphics, with a wide range of packages for data analysis and visualization. Suitable for: programmers.
• Jupyter Notebook: An open-source web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text. Suitable for: both programmers and non-programmers.
• Apache Spark: An open-source distributed computing system for big data processing, with support for data analysis, machine learning, and graph processing. Suitable for: programmers.
• Apache Hadoop: An open-source distributed computing system for storing and processing large data sets, with support for data analysis and machine learning. Suitable for: programmers.
• Tableau: A data visualization tool for creating interactive dashboards and reports. Unlike the other tools listed here, Tableau is proprietary rather than open source. Suitable for: non-programmers.
• KNIME: An open-source data analytics platform for building workflows for data analysis, machine learning, and visualization. Suitable for: both programmers and non-programmers.
• RapidMiner: An open-source data science platform for building workflows for data analysis, machine learning, and visualization. Suitable for: both programmers and non-programmers.
• Orange: An open-source data visualization and analysis tool for building workflows for data analysis and machine learning. Suitable for: both programmers and non-programmers.
• Weka: An open-source machine learning tool for creating and applying machine learning models to data sets. Suitable for: both programmers and non-programmers.
METHODOLOGY
• The Business Understanding stage is crucial because it clarifies the customer's goal. In this
stage, we ask the customer detailed questions about every aspect of the problem.
• Next comes the Analytic Approach stage: once the business problem has been clearly stated, the
data scientist defines the analytic approach for solving it.
• In the Data Requirements stage, we identify the data content, formats, and sources needed for
initial data collection; this data is later used by the algorithm of the chosen approach.
• In the Data Collection stage, data scientists identify the available data resources relevant to
the problem domain. To retrieve data, we can scrape a related website or use a repository of
ready-made datasets.
(Decision Tree)
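As a rough illustration of the web-scraping route in the Data Collection stage, the sketch below extracts table cells from an HTML page using only Python's standard `html.parser` module. The `TableScraper` class and the sample page are hypothetical; the HTML is embedded as a string so the example runs offline.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page content; a real scraper would first download the page
# (for example with urllib.request.urlopen) after checking the site's terms.
html = """
<table>
  <tr><th>site</th><th>rainfall_mm</th></tr>
  <tr><td>Station A</td><td>1010</td></tr>
  <tr><td>Station B</td><td>790</td></tr>
</table>
"""
scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)
```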
METHODOLOGY
• In the Data Understanding stage, data scientists explore the data collected in the previous stage:
they check the type of each attribute and learn what each attribute represents.
• In the Data Preparation stage, data scientists prepare the data for modeling. This is one of the most crucial
steps, because the data fed to the model must be clean and free of errors.
• In the Modeling stage, the data scientist finds out whether the work is ready to go or needs review.
Modeling focuses on developing descriptive or predictive models based on the chosen analytic approach,
either statistical or machine learning.
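The Data Understanding and Data Preparation stages might look like this in plain Python. It is a minimal sketch assuming a small hypothetical customer table: `infer_type` (a hypothetical helper) guesses each column's type, then exact duplicate rows are dropped and missing numeric values are filled with the column median, which is one common imputation choice among several.

```python
import csv
import io
from statistics import median

# Hypothetical raw data: customer 2 appears twice and some values are missing.
raw = """customer_id,age,income
1,34,52000
2,,61000
2,,61000
3,45,
4,29,48000
"""

rows = list(csv.DictReader(io.StringIO(raw)))


def infer_type(values):
    """Data Understanding: guess a column's type from its non-empty values."""
    present = [v for v in values if v != ""]
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in present:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


schema = {col: infer_type([r[col] for r in rows]) for col in rows[0]}
print(schema)

# Data Preparation, step 1: drop exact duplicate rows, keeping the first copy.
seen = set()
clean_rows = []
for r in rows:
    key = tuple(r.values())
    if key not in seen:
        seen.add(key)
        clean_rows.append(dict(r))

# Data Preparation, step 2: cast numeric columns and fill gaps with the median.
for col, kind in schema.items():
    if kind == "str":
        continue
    cast = int if kind == "int" else float
    known = [cast(r[col]) for r in clean_rows if r[col] != ""]
    fill = median(known)
    for r in clean_rows:
        r[col] = cast(r[col]) if r[col] != "" else fill

print(clean_rows)
```

In practice, Pandas covers these steps directly (inspecting `dtypes`, `drop_duplicates`, `fillna`).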
METHODOLOGY
• In the Model Evaluation stage, data scientists can evaluate the model in two ways: Hold-Out
and Cross-Validation. In the Hold-Out method, the dataset is divided into three subsets: a training
set, used to build the model in the Modeling stage; a validation set, used to assess the performance
of the trained model (for example, while tuning it); and a test set, used to estimate the model's
likely future performance on unseen data. In k-fold Cross-Validation, the data is split into k folds
and the model is trained k times, each time holding out a different fold for evaluation; the k scores
are then averaged.
• How the model is deployed depends on its purpose: in the Deployment stage, it may first be rolled
out to a limited group of users or in a test environment.
• In the Feedback stage, feedback comes mostly from the customer and is used to refine the model.
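The two evaluation strategies can be sketched as follows. The 60/20/20 split ratio, the fixed random seed, and the helper names `hold_out_split` and `k_fold_indices` are illustrative assumptions, not a prescribed setup.

```python
import random


def hold_out_split(data, train_frac=0.6, val_frac=0.2, seed=42):
    """Hold-Out: shuffle once, then cut into training, validation, and test subsets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def k_fold_indices(n, k=5):
    """k-fold Cross-Validation: yield (train_idx, test_idx) pairs;
    every fold serves as the test set exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


train, val, test = hold_out_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20

for train_idx, test_idx in k_fold_indices(10, k=3):
    print(len(train_idx), len(test_idx))
```

For brevity, `k_fold_indices` does not shuffle; shuffling first is usually advisable when rows are ordered. scikit-learn provides ready-made equivalents (`train_test_split`, `KFold`).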
METHODOLOGY
Common Algorithms:
1. Linear Regression
A statistical method used to model the relationship between a dependent variable and one or
more independent variables.
Simple linear regression can be used to determine:
1. How strong the relationship between two variables is (e.g., the relationship between rainfall
and soil erosion).
2. The value of the dependent variable at a certain value of the independent variable (e.g., the
amount of soil erosion at a certain level of rainfall).
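One common way to quantify how strong a linear relationship is (question 1 above) is the Pearson correlation coefficient r, which ranges from -1 to +1. A minimal sketch; the function name and the rainfall and soil-erosion numbers are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: +1 or -1 means a perfect linear relationship, 0 means none."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


rainfall = [10, 20, 30, 40, 50]       # hypothetical, mm
erosion = [1.2, 1.9, 3.1, 3.9, 5.2]   # hypothetical, tonnes per hectare
print(round(pearson_r(rainfall, erosion), 3))
```

On Python 3.10 and later, `statistics.correlation(x, y)` computes the same quantity.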
METHODOLOGY
Simple linear regression formula:
$y = B_0 + B_1 x + e$
where:
y is the dependent variable; for any given value of the independent variable x, B0 + B1x is the
predicted value of y.
B0 is the intercept: the predicted value of y when x is 0.
B1 is the regression coefficient: how much we expect y to change when x increases by one unit.
x is the independent variable (the variable we expect is influencing y).
e is the error term: the difference between the observed value of y and the value predicted by
B0 + B1x.
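The coefficients B0 and B1 are usually estimated by ordinary least squares: B1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and B0 = mean_y - B1 * mean_x. The sketch below implements those two formulas directly; the function name and the data are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares estimates of B0 (intercept) and B1 (slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return b0, b1


rainfall = [10, 20, 30, 40, 50]       # x, hypothetical
erosion = [1.2, 1.9, 3.1, 3.9, 5.2]   # y, hypothetical
b0, b1 = fit_simple_linear_regression(rainfall, erosion)
predicted = [b0 + b1 * xi for xi in rainfall]
residuals = [yi - pi for yi, pi in zip(erosion, predicted)]  # estimates of e
print(f"y = {b0:.2f} + {b1:.2f}x")
```

The residuals (observed y minus predicted y) estimate the error term e; when the model has an intercept they sum to zero, up to floating-point rounding. On Python 3.10 and later, `statistics.linear_regression(x, y)` returns the same slope and intercept.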
METHODOLOGY
2. Decision Tree
• A decision tree is a machine learning
algorithm that uses a tree-like model of
decisions and their possible consequences
to predict outcomes. It is a supervised
learning algorithm that can be used for both
classification and regression tasks.
• The decision tree algorithm works by
recursively splitting the data into subsets
based on the values of the input features.
The goal is to create a tree that predicts the
target variable with high accuracy.
METHODOLOGY
(Example of a decision tree)
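The core step of the decision tree algorithm, choosing where to split, can be illustrated with a single-feature decision stump. This sketch assumes a classification task and uses Gini impurity as the split criterion (one common choice; entropy is another); `gini`, `best_split`, and the data are hypothetical. A full tree would apply `best_split` recursively to each resulting subset until a stopping rule is met.

```python
def gini(labels):
    """Gini impurity: 0 for a pure node, higher for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Try each midpoint between consecutive distinct feature values and
    return (threshold, weighted_gini) for the purest split."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: hours of machine use vs. whether it failed (1) or not (0).
hours = [1, 2, 3, 8, 9, 10]
failed = [0, 0, 0, 1, 1, 1]
print(best_split(hours, failed))  # (5.5, 0.0)
```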
PYTHON FOR DATA SCIENCE
Python is a popular programming language for data science due to its simplicity, versatility, and
extensive libraries and frameworks for data analysis, machine learning, and visualization.
Here are some of the key libraries and frameworks in Python for data science:
• NumPy: A library for numerical computing in Python, with support for arrays, matrices, and
mathematical functions.
• Pandas: A library for data manipulation and analysis in Python, with support for data structures
such as data frames and series.
• Matplotlib: A library for data visualization in Python, with support for creating a wide range of
charts and graphs.
• Scikit-learn: A library for machine learning in Python, with support for a wide range of algorithms
for classification, regression, clustering, and more.
• TensorFlow: A library for machine learning and deep learning in Python, with support for building
and training neural networks.
DATA ANALYSIS
• Data analysis using Python involves using the language and its associated libraries and
frameworks to manipulate, analyze, and visualize data.
• By using Python for data analysis, we can gain insights into complex data sets and
make informed decisions based on data insights. Python's readable syntax also makes it
accessible to both experienced programmers and beginners.
• The process of data analysis using Python typically involves several steps, including
data cleaning, data manipulation, data analysis, and data visualization. Python libraries
such as Pandas, NumPy, and Matplotlib are commonly used for these tasks.
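The four steps can be sketched end to end on a tiny hypothetical sales table. To keep the example dependency-free it uses only the standard library; with Pandas, steps 1 to 3 would roughly correspond to `df.dropna(subset=["sales"]).groupby("region")["sales"].agg(["sum", "mean"])`, and Matplotlib would replace the text bar chart.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data with one missing sales value.
raw = """region,month,sales
North,Jan,120
North,Feb,
South,Jan,95
South,Feb,110
North,Mar,130
"""

# 1. Cleaning: drop rows with a missing sales value.
records = [r for r in csv.DictReader(io.StringIO(raw)) if r["sales"].strip()]

# 2. Manipulation: convert sales from text to numbers.
for r in records:
    r["sales"] = float(r["sales"])

# 3. Analysis: total and average sales per region.
totals = defaultdict(float)
counts = defaultdict(int)
for r in records:
    totals[r["region"]] += r["sales"]
    counts[r["region"]] += 1
averages = {region: totals[region] / counts[region] for region in totals}

# 4. Visualization: a plain-text bar chart (one '#' per 20 units of sales).
for region, total in sorted(totals.items()):
    print(f"{region:<6} {'#' * int(total // 20)} {total:.0f}")
```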
APPLICATIONS
• Business: Data science and analysis are widely used in business to analyze
customer data, sales data, and market trends to inform decision-making. This
includes customer segmentation, product recommendations, and pricing
optimization.
• Healthcare: Data science and analysis are used in healthcare to analyze patient
data, identify disease patterns, and improve patient outcomes. This includes
disease diagnosis, drug discovery, and personalized medicine.
• Finance: Data science and analysis are used in finance to analyze financial data,
identify market trends, and inform investment decisions. This includes risk
assessment, fraud detection, and portfolio optimization.
• Social media: Data science and analysis are used in social media to analyze user
behavior, identify trends, and improve user engagement. This includes sentiment
analysis, user profiling, and content recommendation.
Many more applications of data science and analysis exist beyond these examples.
CHALLENGES
• Data Quality: One of the biggest challenges in data science is ensuring that the data being used is accurate,
complete, and reliable. Poor data quality can lead to inaccurate results and flawed insights.
• Data Volume: With the increasing amount of data being generated, managing and processing large volumes
of data can be a significant challenge. This requires specialized tools and techniques for data storage,
processing, and analysis.
• Data Variety: Data comes in many different forms, including structured, semi-structured, and unstructured
data. Working with unstructured data, such as text, images, and video, can be particularly challenging and
requires specialized techniques for natural language processing, computer vision, and other areas of artificial
intelligence.
• Data Privacy and Security: As data becomes more valuable, ensuring its privacy and security becomes
increasingly important. Data scientists need to be aware of privacy regulations and take steps to protect
sensitive data from unauthorized access.
• Model Interpretability: Machine learning models can be complex and difficult to interpret, making it
challenging to understand how they arrived at their conclusions. This can be particularly problematic in
applications where decisions have significant consequences, such as healthcare or finance.
• Business Understanding: Data scientists need to have a deep understanding of the business context in
which they are working in order to develop insights that are relevant and actionable.
THANK YOU

More Related Content

What's hot

Introduction to Python for Data Science
Introduction to Python for Data ScienceIntroduction to Python for Data Science
Introduction to Python for Data Science
Arc & Codementor
 
Machine Learning and Data Mining: 04 Association Rule Mining
Machine Learning and Data Mining: 04 Association Rule MiningMachine Learning and Data Mining: 04 Association Rule Mining
Machine Learning and Data Mining: 04 Association Rule Mining
Pier Luca Lanzi
 
Machine learning ppt.
Machine learning ppt.Machine learning ppt.
Machine learning ppt.
ASHOK KUMAR
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
Pratap Dangeti
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
Md. Main Uddin Rony
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
Gregory Piatetsky-Shapiro
 
Pattern Recognition.pptx
Pattern Recognition.pptxPattern Recognition.pptx
Pattern Recognition.pptx
hafeez504942
 
The Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By StatisticiansThe Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By Statisticians
Stat Analytica
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
Carol Hargreaves
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learningbutest
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Srishti44
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Simplilearn
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
zekeLabs Technologies
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
Benjamin Bengfort
 
Semi-supervised Machine Learning
Semi-supervised Machine LearningSemi-supervised Machine Learning
Semi-supervised Machine Learning
Spotle.ai
 
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Edureka!
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.
 
Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3
varshakumar21
 
Data science
Data scienceData science
Data science
Mohamed Loey
 
Classification and regression trees (cart)
Classification and regression trees (cart)Classification and regression trees (cart)
Classification and regression trees (cart)
Learnbay Datascience
 

What's hot (20)

Introduction to Python for Data Science
Introduction to Python for Data ScienceIntroduction to Python for Data Science
Introduction to Python for Data Science
 
Machine Learning and Data Mining: 04 Association Rule Mining
Machine Learning and Data Mining: 04 Association Rule MiningMachine Learning and Data Mining: 04 Association Rule Mining
Machine Learning and Data Mining: 04 Association Rule Mining
 
Machine learning ppt.
Machine learning ppt.Machine learning ppt.
Machine learning ppt.
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 
Pattern Recognition.pptx
Pattern Recognition.pptxPattern Recognition.pptx
Pattern Recognition.pptx
 
The Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By StatisticiansThe Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By Statisticians
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learning
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Semi-supervised Machine Learning
Semi-supervised Machine LearningSemi-supervised Machine Learning
Semi-supervised Machine Learning
 
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3
 
Data science
Data scienceData science
Data science
 
Classification and regression trees (cart)
Classification and regression trees (cart)Classification and regression trees (cart)
Classification and regression trees (cart)
 

Similar to Data Science and Analysis.pptx

Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
javed75
 
Chapter-1 - Notes.pptx
Chapter-1 - Notes.pptxChapter-1 - Notes.pptx
Chapter-1 - Notes.pptx
DATASCIENCE41
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
NagarajanG35
 
How to use Python to conduct regression analysis in management PhD research.pptx
How to use Python to conduct regression analysis in management PhD research.pptxHow to use Python to conduct regression analysis in management PhD research.pptx
How to use Python to conduct regression analysis in management PhD research.pptx
Phd Assistance
 
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Data Science  & AI Road Map by Python & Computer science tutor in MalaysiaData Science  & AI Road Map by Python & Computer science tutor in Malaysia
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Ahmed Elmalla
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
sumitkumar600840
 
Lesson1.2.pptx.pdf
Lesson1.2.pptx.pdfLesson1.2.pptx.pdf
Lesson1.2.pptx.pdf
JhimarPeredoJurado
 
DataScience_RoadMap_2023.pdf
DataScience_RoadMap_2023.pdfDataScience_RoadMap_2023.pdf
DataScience_RoadMap_2023.pdf
MuhammadRizwanAmanat
 
Introduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdfIntroduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdf
AbdulrahimShaibuIssa
 
Data Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdfData Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdf
mustaq4
 
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdfData Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Neha Singh
 
Morden EcoSystem.pptx
Morden EcoSystem.pptxMorden EcoSystem.pptx
Morden EcoSystem.pptx
priti jadhao
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
Basma Gamal
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Mahir Haque
 
D sppt
D spptD sppt
D sppt
sterlingit
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
vishwajeetparmar1
 
Data+Science+in+Python+-+Data+Prep+&+EDA.pdf
Data+Science+in+Python+-+Data+Prep+&+EDA.pdfData+Science+in+Python+-+Data+Prep+&+EDA.pdf
Data+Science+in+Python+-+Data+Prep+&+EDA.pdf
neelakandan2001kpm
 
MODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptxMODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptx
nikshaikh786
 

Similar to Data Science and Analysis.pptx (20)

Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
Chapter-1 - Notes.pptx
Chapter-1 - Notes.pptxChapter-1 - Notes.pptx
Chapter-1 - Notes.pptx
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
 
How to use Python to conduct regression analysis in management PhD research.pptx
How to use Python to conduct regression analysis in management PhD research.pptxHow to use Python to conduct regression analysis in management PhD research.pptx
How to use Python to conduct regression analysis in management PhD research.pptx
 
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Data Science  & AI Road Map by Python & Computer science tutor in MalaysiaData Science  & AI Road Map by Python & Computer science tutor in Malaysia
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
 
Lesson1.2.pptx.pdf
Lesson1.2.pptx.pdfLesson1.2.pptx.pdf
Lesson1.2.pptx.pdf
 
DataScience_RoadMap_2023.pdf
DataScience_RoadMap_2023.pdfDataScience_RoadMap_2023.pdf
DataScience_RoadMap_2023.pdf
 
Introduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdfIntroduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdf
 
Data Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdfData Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdf
 
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdfData Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
 
Morden EcoSystem.pptx
Morden EcoSystem.pptxMorden EcoSystem.pptx
Morden EcoSystem.pptx
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
D sppt
D spptD sppt
D sppt
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 
Data+Science+in+Python+-+Data+Prep+&+EDA.pdf
Data+Science+in+Python+-+Data+Prep+&+EDA.pdfData+Science+in+Python+-+Data+Prep+&+EDA.pdf
Data+Science+in+Python+-+Data+Prep+&+EDA.pdf
 
MODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptxMODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptx
 

Recently uploaded

一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
Data Science and Analysis.pptx

SEMINAR
DEPARTMENT OF COMPUTER SCIENCE
TOPIC: Data Science & Analysis
PRESENTED BY: Prashant Yadav, M.Tech (CS), Roll No. 223410
BABASAHEB BHIMRAO AMBEDKAR UNIVERSITY, LUCKNOW, UTTAR PRADESH
CONTENTS
• Introduction
• Open Source Tools
• Methodology
• Python For Data Science
• Data Analysis
• Applications
• Challenges
INTRODUCTION
• Data science is an interdisciplinary field that involves the use of statistical and computational methods to extract insights and knowledge from data. It combines techniques from mathematics, statistics, computer science, and domain-specific knowledge to analyze and interpret complex data sets.
• Data science involves various stages, including data collection, data cleaning, data analysis, and data visualization.
• The goal of data science is to uncover patterns, trends, and insights that can be used to inform decision-making and solve real-world problems. It has applications in a wide range of fields, including business, healthcare, finance, and the social sciences.
NEED FOR DATA SCIENCE
• Data-driven decision making: Data science enables organizations to make informed decisions based on data insights, rather than relying on intuition or guesswork.
• Predictive analytics: Data science allows organizations to use historical data to make predictions about future events or trends, such as customer behavior or market trends.
• Improved efficiency and productivity: By automating repetitive tasks and streamlining processes, data science can help organizations improve efficiency and productivity.
• Personalization: Data science enables organizations to personalize their products or services for individual customers, based on their preferences and behavior.
• Fraud detection: Data science can be used to detect fraudulent activity, such as credit card fraud or insurance fraud, by analyzing patterns and anomalies in data.
• Risk management: Data science can help organizations identify and mitigate risks, such as financial or cybersecurity risks, by analyzing data and identifying potential threats.
• Improved customer experience: By analyzing customer data, data science can help organizations improve the customer experience by identifying pain points and areas for improvement.
• Competitive advantage: Data science can give organizations a competitive advantage by enabling them to make data-driven decisions.
REAL-WORLD EXAMPLES OF DATA SCIENCE
1. Credit Risk Assessment:
• A bank uses data science to analyze customer data and credit history to assess the risk of default on loans.
• Machine learning algorithms are used to identify patterns in customer behavior and credit history that are associated with higher risk.
• Based on these insights, the bank can make informed decisions about loan approvals and interest rates.
2. Predictive Maintenance:
• A manufacturing company uses data science to predict when equipment is likely to fail.
• Sensor data is collected from the equipment and analyzed using machine learning algorithms to identify patterns that are associated with equipment failure.
• Based on these insights, the company can schedule maintenance before equipment failure occurs, reducing downtime and maintenance costs.
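The predictive-maintenance example can be illustrated with a small Python sketch. It deliberately replaces the machine learning step with a much simpler statistical rule: a reading is flagged when it lies more than a chosen number of standard deviations from the mean of the readings just before it. The function name flag_anomalies, the window of 5, the threshold of 3 and the vibration values are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate strongly from the preceding window.

    A reading is flagged when its z-score relative to the previous `window`
    readings exceeds `threshold`. Both parameters are illustrative assumptions;
    `window` must be at least 2 so that a standard deviation can be computed.
    """
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if readings[i] != mu:
                flagged.append(i)
            continue
        z = abs(readings[i] - mu) / sigma
        if z > threshold:
            flagged.append(i)
    return flagged


# Hypothetical vibration readings; the spike at index 7 mimics a developing fault.
vibration = [0.51, 0.49, 0.50, 0.52, 0.48, 0.50, 0.51, 0.95, 0.50, 0.49]
print(flag_anomalies(vibration))
```

On these readings only the spike at index 7 is flagged. A real system would more likely learn failure patterns from labelled historical data than rely on a fixed threshold.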
OPEN SOURCE TOOLS
• Python: A popular programming language for data science, with a wide range of libraries and frameworks for data analysis, machine learning, and visualization. Suitable for: programmers.
• R: A programming language and environment for statistical computing and graphics, with a wide range of packages for data analysis and visualization. Suitable for: programmers.
• Jupyter Notebook: An open-source web application that allows users to create and share documents that contain live code, equations, visualizations, and narrative text. Suitable for: both programmers and non-programmers.
• Apache Spark: An open-source distributed computing system for big data processing, with support for data analysis, machine learning, and graph processing. Suitable for: programmers.
• Apache Hadoop: An open-source distributed computing system for storing and processing large data sets, with support for data analysis and machine learning. Suitable for: programmers.
• Tableau: A data visualization tool that allows users to create interactive dashboards and reports. Unlike the other tools listed here, Tableau is proprietary commercial software rather than open source. Suitable for: non-programmers.
• KNIME: An open-source data analytics platform that allows users to create workflows for data analysis, machine learning, and visualization. Suitable for: both.
• RapidMiner: An open-source data science platform that allows users to create workflows for data analysis, machine learning, and visualization. Suitable for: both.
• Orange: An open-source data visualization and analysis tool that allows users to create workflows for data analysis and machine learning. Suitable for: both.
• Weka: An open-source machine learning tool that allows users to create and apply machine learning models to data sets. Suitable for: both.
METHODOLOGY
• Business Understanding: This stage is crucial because it clarifies the customer's goal. Here we ask the customer detailed questions about every aspect of the problem.
• Analytic Approach: Once the business problem has been clearly stated, the data scientist defines the analytic approach to solving it.
• Data Requirements: In this stage we identify the data content, formats, and sources needed for initial data collection; this data is then used by the algorithm of the chosen approach.
• Data Collection: Data scientists identify the available data resources relevant to the problem domain. Data can be retrieved by web scraping relevant websites or taken from repositories of ready-made datasets.
(Figure: Decision Tree)
• Data Understanding: Data scientists learn more about the data collected in the previous stage, checking the type of each attribute and what each attribute name means.
• Data Preparation: Data scientists prepare the data for modeling. This is one of the most crucial steps, because the data fed to the model has to be clean and free of errors.
• Modeling: Modeling focuses on developing descriptive or predictive models, based on the analytic approach chosen earlier (statistical or machine learning). This stage also shows the data scientist whether the work is ready to go or needs review.
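The Data Understanding and Data Preparation stages can be sketched with the Python standard library. The sketch infers a rough type for each column, counts missing values, and then fills the gaps. The records, the column names, and the choice of median imputation for numbers and an "unknown" placeholder for text are hypothetical. Median imputation is only one of several ways to handle missing values.

```python
from statistics import median

# Hypothetical raw records, as they might arrive from a CSV file (all strings).
raw = [
    {"age": "34", "city": "Lucknow", "income": "52000"},
    {"age": "",   "city": "Delhi",   "income": "61000"},
    {"age": "29", "city": "",        "income": ""},
    {"age": "41", "city": "Lucknow", "income": "58000"},
]


def infer_type(values):
    """Return 'numeric' if every non-empty value parses as a float, else 'text'."""
    present = [v for v in values if v != ""]
    if not present:
        return "text"  # nothing to inspect; do not guess numeric
    try:
        for v in present:
            float(v)
    except ValueError:
        return "text"
    return "numeric"


def profile(rows):
    """Data Understanding: report the inferred type and missing count per column."""
    report = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        report[col] = {"type": infer_type(values), "missing": values.count("")}
    return report


def prepare(rows, report):
    """Data Preparation: convert numeric columns to floats and fill gaps.

    Numeric gaps get the column median; text gaps get the placeholder 'unknown'.
    """
    medians = {
        col: median(float(r[col]) for r in rows if r[col] != "")
        for col, info in report.items()
        if info["type"] == "numeric"
    }
    cleaned = []
    for r in rows:
        new = {}
        for col, v in r.items():
            if col in medians:
                new[col] = float(v) if v != "" else medians[col]
            else:
                new[col] = v if v != "" else "unknown"
        cleaned.append(new)
    return cleaned


report = profile(raw)
print(report)
clean = prepare(raw, report)
print(clean[1]["age"], clean[2]["city"], clean[2]["income"])
```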
• Model Evaluation: Data scientists can evaluate the model in two ways: Hold-Out and Cross-Validation. In the Hold-Out method, the dataset is divided into three subsets: a training set, used to build the model in the Modeling stage; a validation set, used to assess the performance of the model built in the training phase; and a test set, used to estimate the model's likely future performance. In k-fold Cross-Validation, the data is split into k folds and the model is trained k times, each time holding out a different fold for evaluation; the k scores are then averaged.
• Deployment: How the model is deployed depends on its purpose; it may first be rolled out to a limited group of users or in a test environment.
• Feedback: In this stage, feedback comes mostly from the customer and is used to refine the model.
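The two evaluation schemes can be sketched in Python. A hold-out split shuffles the data once and cuts it into training, validation and test subsets. k-fold cross-validation instead partitions the data into k folds, so that every item is used for evaluation exactly once. The 60/20/20 proportions, the fixed seed, and the function names hold_out_split and k_fold_indices are assumptions for illustration. k_fold_indices assumes the data has already been shuffled.

```python
import random


def hold_out_split(items, train=0.6, val=0.2, seed=0):
    """Shuffle and split items into training, validation and test subsets.

    The 60/20/20 proportions and the fixed seed are illustrative assumptions.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Each of the n indices appears in exactly one test fold; when n is not
    divisible by k, the first n % k folds get one extra index.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


train, val, test = hold_out_split(range(10))
print(len(train), len(val), len(test))
for tr, te in k_fold_indices(10, k=5):
    print(te)
```

To cross-validate, train a model on each train_idx, score it on the matching test_idx, and average the k scores.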
Common Algorithms
1. Linear Regression
A statistical method used to model the relationship between a dependent variable and one or more independent variables. Simple linear regression can be used to find out:
• How strong the relationship is between two variables (e.g., the relationship between rainfall and soil erosion).
• The value of the dependent variable at a certain value of the independent variable (e.g., the amount of soil erosion at a certain level of rainfall).
Simple linear regression formula:
$$ y = B_0 + B_1 x + e $$
• y is the value of the dependent variable for a given value of the independent variable x; the model's prediction is ŷ = B0 + B1x.
• B0 is the intercept: the predicted value of y when x is 0.
• B1 is the regression coefficient (slope): how much y is expected to change when x increases by one unit.
• x is the independent variable (the variable we expect is influencing y).
• e is the error term: the part of y that the linear relationship with x does not explain.
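The formula can be made concrete by estimating B0 and B1 with ordinary least squares, the usual fitting method. Least squares has closed forms: B1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and B0 = ȳ − B1·x̄. The rainfall and erosion numbers below are hypothetical, chosen to lie close to a straight line.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Estimate intercept B0 and slope B1 by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x):
    """Predicted value y-hat = B0 + B1 * x (the error term e cannot be predicted)."""
    return b0 + b1 * x


# Hypothetical data: rainfall (mm) vs. soil erosion (t/ha).
rainfall = [10, 20, 30, 40, 50]
erosion = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = fit_simple_linear_regression(rainfall, erosion)
residuals = [y - predict(b0, b1, x) for x, y in zip(rainfall, erosion)]
print(round(b0, 3), round(b1, 3))
```

For these values the fit is B1 = 0.197 and B0 = 0.09. The residuals, which are the estimates of e, sum to zero, as they always do for a least-squares line with an intercept.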
2. Decision Tree
• A decision tree is a machine learning algorithm that uses a tree-like model of decisions and their possible consequences to predict outcomes. It is a supervised learning algorithm that can be used for both classification and regression tasks.
• The decision tree algorithm works by recursively splitting the data into subsets based on the values of the input features. Each split is chosen to make the resulting subsets as homogeneous as possible with respect to the target variable (measured, for example, by Gini impurity or entropy), and splitting stops when a subset is pure or a stopping rule such as a maximum depth is reached. The goal is a tree that predicts the target variable with high accuracy.
(Figure: example of a decision tree)
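The recursive splitting described above can be sketched as a tiny classification tree in plain Python. The slide does not name a splitting criterion, so this sketch assumes Gini impurity, as used by CART-style trees. Other details are also assumptions: only numeric features with "<= threshold" splits are supported, the maximum depth is 3, and the loan records, the labels and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Find the (feature, threshold) pair that minimises weighted Gini impurity.

    Candidate thresholds are the distinct values of each feature; a row goes
    left when its value is <= threshold. Returns None if no split helps.
    """
    best, best_score, n = None, gini(labels), len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split the data; each leaf holds the majority class."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth),
    }


def classify(tree, row):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical loan data: [income in thousands, existing debts]; label = risk.
X = [[25, 3], [30, 2], [45, 1], [60, 0], [80, 1], [35, 4], [90, 0], [50, 3]]
y = ["high", "high", "low", "low", "low", "high", "low", "high"]
tree = build_tree(X, y)
print(classify(tree, [70, 0]), classify(tree, [28, 3]))
```

On this toy data the root split is on feature 1 (existing debts) at threshold 1. That split separates the two classes perfectly, so the tree stops after one level.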
PYTHON FOR DATA SCIENCE
Python is a popular programming language for data science due to its simplicity, versatility, and extensive libraries and frameworks for data analysis, machine learning, and visualization. Here are some of the key libraries and frameworks in Python for data science:
• NumPy: A library for numerical computing in Python, with support for arrays, matrices, and mathematical functions.
• Pandas: A library for data manipulation and analysis in Python, with support for data structures such as data frames and series.
• Matplotlib: A library for data visualization in Python, with support for creating a wide range of charts and graphs.
• Scikit-learn: A library for machine learning in Python, with support for a wide range of algorithms for classification, regression, clustering, and more.
• TensorFlow: A library for machine learning and deep learning in Python, with support for building and training neural networks.
DATA ANALYSIS
• Data analysis using Python involves using the Python programming language and its associated libraries and frameworks to manipulate, analyze, and visualize data.
• By using Python for data analysis, we can gain insights into complex data sets and make informed decisions based on those insights. Python's ease of use and readability make it accessible to both experienced programmers and beginners.
• The process of data analysis using Python typically involves several steps: data cleaning, data manipulation, data analysis, and data visualization. Python libraries such as Pandas, NumPy, and Matplotlib are commonly used for these tasks.
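The four steps can be illustrated end to end. This sketch uses only the Python standard library so that it runs without extra installs. The sales data, the column names and the function names are hypothetical, and a text bar chart stands in for a Matplotlib plot.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sales data; in practice this would be read from a file.
RAW_CSV = """region,units,price
North,10,2.5
South,,3.0
North,4,2.5
East,7,4.0
South,5,3.0
East,abc,4.0
"""


def load(text):
    """Read CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Data cleaning: drop rows whose numeric fields are missing or malformed."""
    cleaned = []
    for r in rows:
        try:
            cleaned.append({"region": r["region"],
                            "units": int(r["units"]),
                            "price": float(r["price"])})
        except (ValueError, TypeError):
            continue  # skip rows with empty or non-numeric values
    return cleaned


def revenue_by_region(rows):
    """Manipulation + analysis: derive revenue per row, then total it per region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["units"] * r["price"]
    return dict(totals)


def text_bar_chart(totals, width=20):
    """Visualization stand-in: one '#' bar per region, scaled to the maximum."""
    if not totals:
        return ""
    top = max(totals.values())
    lines = []
    for region, value in sorted(totals.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * value / top)
        lines.append(f"{region:<6}{bar} {value:.1f}")
    return "\n".join(lines)


rows = clean(load(RAW_CSV))
totals = revenue_by_region(rows)
print(text_bar_chart(totals))
```

With Pandas, the same steps would typically use pd.read_csv, then pd.to_numeric(..., errors="coerce") followed by dropna, and then groupby(...).sum(), with the chart drawn by Matplotlib.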
APPLICATIONS
• Business: Data science and analysis are widely used in business to analyze customer data, sales data, and market trends to inform decision-making. This includes customer segmentation, product recommendations, and pricing optimization.
• Healthcare: Data science and analysis are used in healthcare to analyze patient data, identify disease patterns, and improve patient outcomes. This includes disease diagnosis, drug discovery, and personalized medicine.
• Finance: Data science and analysis are used in finance to analyze financial data, identify market trends, and inform investment decisions. This includes risk assessment, fraud detection, and portfolio optimization.
• Social media: Data science and analysis are used in social media to analyze user behavior, identify trends, and improve user engagement. This includes sentiment analysis, user profiling, and content recommendation.
Many more applications of data science and analysis exist.
CHALLENGES
• Data Quality: One of the biggest challenges in data science is ensuring that the data being used is accurate, complete, and reliable. Poor data quality can lead to inaccurate results and flawed insights.
• Data Volume: With the increasing amount of data being generated, managing and processing large volumes of data can be a significant challenge. This requires specialized tools and techniques for data storage, processing, and analysis.
• Data Variety: Data comes in many different forms, including structured, semi-structured, and unstructured data. Working with unstructured data, such as text, images, and video, can be particularly challenging and requires specialized techniques from natural language processing, computer vision, and other areas of artificial intelligence.
• Data Privacy and Security: As data becomes more valuable, ensuring its privacy and security becomes increasingly important. Data scientists need to be aware of privacy regulations and take steps to protect sensitive data from unauthorized access.
• Model Interpretability: Machine learning models can be complex and difficult to interpret, making it challenging to understand how they arrived at their conclusions. This can be particularly problematic in applications where decisions have significant consequences, such as healthcare or finance.
• Business Understanding: Data scientists need a deep understanding of the business context in which they are working in order to develop insights that are relevant and actionable.