ARCOT SRI MAHALAKSHMI WOMEN’S COLLEGE
ADVANCE DATA SCIENCE WITH PYTHON
NAAN MUDHALVAN
SUBJECT CODE: 23UNM40A
PRESENTED BY
NAME: K. SABITHA
REG. NO: 30323U09073
BACHELOR OF COMPUTER APPLICATION
Introduction to Data Science
What is Data Science?
Data Science is the study of data to extract meaningful insights.
It combines:
Statistics: For analysis
Computer Science: For programming and automation
Domain Knowledge: For real-world understanding
Goal: Turn raw data into useful information for decision-making
Why is Data Science Important?
Helps businesses make informed decisions.
Enables predictive analytics and automation.
Used in:
1. Healthcare (disease prediction)
2. E-commerce (recommendation engines)
3. Finance (fraud detection)
4. Social media (sentiment analysis)
Data Science Process
1. Data Collection – Gathering raw data from various sources
2. Data Cleaning – Removing errors and formatting data
3. Exploratory Data Analysis (EDA) – Understanding data patterns
4. Modeling – Applying algorithms to make predictions
5. Evaluation – Checking model performance
6. Deployment – Implementing the model in real-world systems
Python for Data Science
Why Python for Data Science?
Easy to learn and use
Large community and open-source libraries
Great for data manipulation, analysis, and visualization
Widely used in industry and academia
Popular Python Libraries
NumPy – Numerical computing
Pandas – Data manipulation and analysis
Matplotlib / Seaborn – Data visualization
Scikit-learn – Machine learning
TensorFlow / PyTorch – Deep learning
Jupyter Notebook – Interactive coding environment
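
As a small illustration of the first two libraries in this list, the sketch below uses NumPy for array arithmetic and Pandas for a labelled table. The student names and scores are made-up values, not data from the slides.

```python
import numpy as np
import pandas as pd

# NumPy: vectorised numerical computing on arrays (illustrative scores)
scores = np.array([72, 85, 90, 64])
mean_score = scores.mean()   # arithmetic mean of the array
scaled = scores / 100        # element-wise division, no explicit loop

# Pandas: labelled, tabular data built on top of NumPy arrays
df = pd.DataFrame({"student": ["A", "B", "C", "D"], "score": scores})
top = df[df["score"] > 80]   # keep only rows whose score exceeds 80

print(mean_score)
print(top)
```

The same pattern (NumPy for raw numbers, Pandas for labelled rows and columns) carries over to most of the later steps.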
Python Workflow in Data Science
1. Import libraries
2. Load and explore data (Pandas)
3. Clean and preprocess data
4. Visualize insights (Matplotlib/Seaborn)
5. Build and evaluate models (Scikit-learn)
6. Deploy models
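
Steps 1 to 4 of this workflow can be sketched roughly as follows. The CSV text, the column names `hours` and `score`, and the choice to drop incomplete rows are all assumptions made for illustration; a real project would read a file from disk and choose its own cleaning rule. Model building and deployment (steps 5 and 6) are left out here.

```python
# Step 1: import libraries
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
raw_csv = io.StringIO(
    "hours,score\n"
    "1,52\n"
    "2,58\n"
    "3,\n"
    "4,71\n"
    "5,80\n"
)

# Step 2: load and explore the data with Pandas
df = pd.read_csv(raw_csv)
print(df.head())
print(df.describe())

# Step 3: clean the data, here by dropping rows with a missing score
clean = df.dropna(subset=["score"])

# Step 4: visualise the cleaned data with Matplotlib
fig, ax = plt.subplots()
ax.scatter(clean["hours"], clean["score"])
ax.set_xlabel("hours")
ax.set_ylabel("score")

# Render to an in-memory PNG instead of opening a window
png = io.BytesIO()
fig.savefig(png, format="png")
```

In a Jupyter Notebook the figure would normally be shown inline, so the `Agg` backend line and the in-memory buffer would not be needed.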
Explore Machine Learning Using Python
What is Machine Learning?
A subset of Artificial Intelligence (AI)
Allows computers to learn from data without being explicitly programmed
Types of ML:
Supervised Learning (e.g., classification, regression)
Unsupervised Learning (e.g., clustering, dimensionality
reduction)
Reinforcement Learning (learning through feedback)
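
The difference between the first two types can be shown on a toy dataset. This is a minimal sketch using scikit-learn; the six points and their labels are invented, and logistic regression and k-means are just one possible choice of algorithm for each type. Reinforcement learning needs an environment to interact with and is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Two well-separated groups of 2-D points (illustrative data)
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, used only by the supervised model

# Supervised learning: fit a classifier on inputs X and known labels y
clf = LogisticRegression().fit(X, y)
pred = clf.predict([[1.1, 0.9], [5.1, 5.0]])

# Unsupervised learning: group the same points without ever seeing y
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
clusters = km.labels_

print(pred)
print(clusters)
```

The cluster numbers assigned by k-means are arbitrary, so they may not match the 0/1 labels even when the grouping itself is the same.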
Python Libraries for Machine Learning
Scikit-learn – Simple and efficient tools for ML tasks
Pandas & NumPy – Data manipulation and numerical
computing
Matplotlib & Seaborn – Visualization
TensorFlow / PyTorch – Deep learning frameworks
Jupyter Notebook – For interactive development
ML Workflow in Python
1. Load data – Use pandas to import and inspect
2. Preprocess data – Handle missing values, encode
features
3. Split dataset – Train/Test split
4. Train model – Use scikit-learn classifiers or regressors
5. Evaluate – Accuracy, confusion matrix, etc.
6. Predict – Use model to make predictions
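
The six steps above can be sketched with scikit-learn's bundled Iris dataset. The dataset, the logistic regression model, the 25% test size, and the sample flower measurements are assumptions chosen for illustration, not part of the slides. Scaling is fitted after the split so that the test set does not influence preprocessing.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Load data into Pandas objects and inspect
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
print(X.head())
print(X.isna().sum())  # this dataset has no missing values

# 3. Split dataset into train and test parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 2. Preprocess: standardise features using statistics from the training set only
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# 4. Train model
model = LogisticRegression(max_iter=200).fit(X_train_s, y_train)

# 5. Evaluate with accuracy and a confusion matrix
y_pred = model.predict(X_test_s)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(acc)
print(cm)

# 6. Predict the class of a new, hypothetical flower
new_flower = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X.columns)
new_pred = model.predict(scaler.transform(new_flower))
print(iris.target_names[new_pred[0]])
```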
Data Visualization Using Python
Importance of Data Visualization
Makes complex data easier to understand.
Helps identify trends, patterns, and outliers.
Improves communication of results.
Essential for storytelling in data science.
Popular Python Libraries for Visualization
Matplotlib – Basic plotting (line, bar, pie, etc.)
Seaborn – Statistical plots with beautiful default styles
Plotly – Interactive, web-based visualizations
Pandas – Built-in plotting for quick visuals
Altair / Bokeh – Declarative and interactive visualizations
Common Chart Types in Python
Line Chart – Trends over time
Bar Chart – Comparing categories
Histogram – Distribution of values
Scatter Plot – Correlation between variables
Heatmap – Visualizing matrix-style data (e.g., correlation)
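
The five chart types listed above can each be drawn with a single Matplotlib call. The sketch below uses randomly generated data purely for illustration; the variable names and values are not from the slides.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)       # seeded generator for repeatable data
months = np.arange(1, 13)
sales = months * 10 + rng.normal(scale=5, size=12)
heights = rng.normal(loc=160, scale=8, size=200)
a = rng.normal(size=50)
b = a + rng.normal(scale=0.3, size=50)
c = rng.normal(size=50)

fig, axes = plt.subplots(1, 5, figsize=(20, 3.5))

axes[0].plot(months, sales)                  # line chart: trend over time
axes[0].set_title("Line")

axes[1].bar(["A", "B", "C"], [5, 3, 8])      # bar chart: comparing categories
axes[1].set_title("Bar")

axes[2].hist(heights, bins=20)               # histogram: distribution of values
axes[2].set_title("Histogram")

axes[3].scatter(a, b)                        # scatter plot: relationship of a and b
axes[3].set_title("Scatter")

corr = np.corrcoef(np.vstack([a, b, c]))     # 3x3 correlation matrix
im = axes[4].imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")  # heatmap
axes[4].set_title("Heatmap")
fig.colorbar(im, ax=axes[4])

fig.tight_layout()
png = io.BytesIO()
fig.savefig(png, format="png")               # render to an in-memory PNG
```

Seaborn offers higher-level equivalents such as `seaborn.histplot` and `seaborn.heatmap` that build on the same Matplotlib axes.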
Exploratory Data Analysis
What is EDA?
EDA is the process of analyzing datasets to summarize
their main characteristics.
Goals of EDA:
Discover patterns
Spot anomalies
Test hypotheses
Check assumptions
Key EDA Techniques
Descriptive Statistics:
Mean, Median, Mode, Standard Deviation
Data Visualization:
Histograms (distribution)
Box plots (outliers)
Scatter plots (relationships)
Heatmaps (correlation)
Missing Value Analysis
Outlier Detection
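
Descriptive statistics and outlier detection can be sketched with Pandas as follows. The eight values are invented, and the 1.5 × IQR rule used here is one common convention for flagging outliers (the same rule a box plot uses for its whiskers), not the only one.

```python
import pandas as pd

# Illustrative values with one obviously extreme entry
values = pd.Series([12, 15, 14, 13, 15, 16, 14, 95], name="value")

# Descriptive statistics
mean = values.mean()
median = values.median()
mode = values.mode().tolist()   # a series can have more than one mode
std = values.std()              # sample standard deviation (ddof=1)

# Outlier detection with the interquartile range (IQR) rule
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(mean, median, mode, std)
print(outliers.tolist())
```

Here the single extreme value pulls the mean (24.25) well above the median (14.5), which is itself a hint that an outlier is present.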
EDA Using Python
Popular Libraries:
Pandas – Data manipulation
Matplotlib & Seaborn – Visualizations
Missingno – Missing value analysis
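
A basic missing value analysis can be done with Pandas alone, as in the sketch below. The table is made-up data. Missingno is not used here; it draws visual summaries of the same information that `isna()` reports as numbers.

```python
import numpy as np
import pandas as pd

# Illustrative table with some missing entries
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Chennai", "Vellore", None, "Arcot"],
    "income": [30000, 42000, np.nan, np.nan],
})

# Count and percentage of missing values per column
missing_count = df.isna().sum()
missing_pct = df.isna().mean() * 100

print(missing_count)
print(missing_pct)

# Summary of every column, numeric and non-numeric
print(df.describe(include="all"))
```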
