1. Introduction to Data Science
2. Python for Data Science
3. Explore Machine Learning Using Python
4. Data Visualization Using Python
5. Exploratory Data Analysis
ARCOT SRI MAHAALAKSHMI WOMEN’S COLLEGE
Subject code: 23UNM40A
Advanced Data Science with Python
INTRODUCTION TO DATA SCIENCE
Definition: An interdisciplinary field that uses scientific methods to extract insights from data
Combines: Statistics, Computer Science, Domain Knowledge
Real-world Applications:
Healthcare (predictive diagnosis)
Business (customer behavior analysis)
Social Media (recommendation systems)
KEY COMPONENTS OF DATA SCIENCE
Data Collection: Gathering raw data from multiple sources
Data Cleaning: Removing errors, filling missing values
Exploratory Data Analysis (EDA): Understanding patterns and trends
Modeling: Applying algorithms such as regression and decision trees
Evaluation: Measuring model performance
Deployment: Integrating the model into real-world applications
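The components above can be sketched end to end in a few lines. This is a minimal illustration only: a small made-up table of house sizes and prices (hypothetical columns size_sqft and price) stands in for collected data, and scikit-learn's LinearRegression is just one possible model. Deployment is left out.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# 1. Data collection: a tiny hypothetical table stands in for data gathered from real sources
raw = pd.DataFrame({
    "size_sqft": [800, 950, None, 1200, 1500, 1700, 2000, 2300],
    "price":     [160, 190, 210, 240, 300, 340, 400, 460],
})

# 2. Data cleaning: fill the missing size with the column median
clean = raw.fillna({"size_sqft": raw["size_sqft"].median()})

# 3. EDA: summary statistics and correlation between the two columns
print(clean.describe())
print(clean.corr())

# 4. Modeling: fit price as a linear function of size on the first six rows
train, test = clean.iloc[:6], clean.iloc[6:]
model = LinearRegression().fit(train[["size_sqft"]], train["price"])

# 5. Evaluation: mean absolute error on the two held-out rows
pred = model.predict(test[["size_sqft"]])
mae = mean_absolute_error(test["price"], pred)
print(f"MAE on held-out rows: {mae:.2f}")
```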
TOOLS AND SKILLS REQUIRED
Programming: Python, R
Libraries: Pandas, NumPy, Matplotlib, Scikit-learn
Databases: SQL, NoSQL
Machine Learning: Supervised & Unsupervised Learning
Soft Skills: Critical Thinking, Communication, Business Understanding
PYTHON FOR DATA SCIENCE
Easy to learn, with readable syntax
Large community and a vast number of libraries
Integrates with data tools such as Jupyter, SQL databases, and Hadoop
Widely used in machine learning, AI, and big data
ESSENTIAL PYTHON LIBRARIES
NumPy – numerical operations and array handling
Pandas – data manipulation and analysis
Matplotlib / Seaborn – data visualization
Scikit-learn – machine learning algorithms
TensorFlow / PyTorch – deep learning frameworks
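A short sketch of how NumPy and pandas divide the work, using a hypothetical set of temperature readings from two made-up sites: NumPy does fast element-wise math on whole arrays, while pandas adds labeled columns and grouping on top of those arrays.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic on a whole array at once (no explicit loop)
celsius = np.array([21.5, 23.0, 19.5, 27.0])
fahrenheit = celsius * 9 / 5 + 32

# pandas: labeled, tabular data built on top of NumPy arrays
readings = pd.DataFrame({
    "site": ["site_a", "site_a", "site_b", "site_b"],  # hypothetical sample rows
    "temp_c": celsius,
})

# Group rows by site and compute the mean temperature per group
mean_by_site = readings.groupby("site")["temp_c"].mean()

print(fahrenheit)
print(mean_by_site)
```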
SAMPLE PYTHON WORKFLOW IN DATA SCIENCE
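1. Import libraries
import pandas as pd
import matplotlib.pyplot as plt
2. Load dataset
data = pd.read_csv('data.csv')  # assumes a file data.csv with an 'age' column
3. Clean & analyze
data.dropna(inplace=True)  # drop rows that contain missing values
print(data.describe())  # summary statistics of the numeric columns
4. Visualize data
data['age'].hist()  # histogram of the age column
plt.show()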
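The workflow above needs a data.csv file with an age column. To try it without that file, the sketch below assumes a small hypothetical CSV held in a string (read through io.StringIO) and uses Matplotlib's non-interactive Agg backend, so the histogram is saved to age_hist.png instead of opening a window.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; use plt.show() instead in Jupyter or a desktop session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV content standing in for data.csv; one row has a missing age
csv_text = """name,age,score
Asha,21,78
Bala,,65
Chitra,23,88
Divya,22,91
"""

# 1-2. Load the dataset from the in-memory text
data = pd.read_csv(io.StringIO(csv_text))

# 3. Clean & analyze: drop rows with any missing value, then summarize
data = data.dropna()
print(data.describe())

# 4. Visualize: histogram of the age column, saved to a file
ax = data["age"].hist()
ax.set_xlabel("age")
plt.savefig("age_hist.png")
```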
EXPLORE MACHINE LEARNING USING PYTHON
Definition: Here, machine learning (ML) techniques are applied to mechanical systems for analysis, prediction, and optimization.
Applications:
Predictive maintenance
Fault detection
Performance optimization
Why Python?
Rich libraries (NumPy, SciPy, Scikit-learn, TensorFlow)
Easy data manipulation and visualization
TOOLS AND TECHNIQUES
Title: Python Tools for Machine Learning on Mechanical Systems
Content:
Data Handling: pandas, numpy
Modeling: scikit-learn, tensorflow, keras
Visualization: matplotlib, seaborn
Example Techniques:
Regression for load prediction
Classification for fault detection
Clustering for operational modes
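Two of the techniques above, regression and clustering, can be sketched with scikit-learn on synthetic data. This is an illustration only: the sensor features (motor speed and temperature), the load formula, and the two operating modes are all made up, and LinearRegression and KMeans are one reasonable choice for each task, not the only one.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# --- Regression for load prediction (synthetic data) ---
# Hypothetical features: motor speed (rpm) and temperature (deg C)
speed = rng.uniform(1000, 3000, size=200)
temp = rng.uniform(20, 80, size=200)
X = np.column_stack([speed, temp])
# Made-up ground truth: load grows with speed and temperature, plus noise
load = 0.02 * speed + 0.5 * temp + rng.normal(0, 1, size=200)

reg = LinearRegression().fit(X, load)
print("Learned coefficients:", reg.coef_)  # should be close to [0.02, 0.5]

# --- Clustering for operational modes (synthetic data) ---
# Two made-up modes: "idle" (low speed, cool) and "full load" (high speed, hot)
idle = rng.normal([800, 30], [50, 3], size=(100, 2))
full = rng.normal([2800, 70], [50, 3], size=(100, 2))
readings = np.vstack([idle, full])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(readings)
print("Mode centers:", km.cluster_centers_)
```

Features are left unscaled here because the two modes differ mainly in speed; with real data, scaling (for example with StandardScaler) is usually applied before KMeans.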
CASE STUDY AND BENEFITS
1. Content:
Case Study: Vibration data is used to detect motor faults with an SVM (support vector machine) classifier.
2. Process:
Collect sensor data
Preprocess with Python
Train ML model
Evaluate performance
3. Benefits:
Cost savings
Reduced downtime
Intelligent systems
4. Visuals: Graph of predicted vs. actual values, or a sensor-to-model flowchart
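The four-step process in this case study can be sketched as follows. The vibration signals are synthetic and purely hypothetical (a faulty motor is simulated as a healthy 50 Hz signal plus an extra 120 Hz component and more noise), the two features (RMS and peak amplitude) are an assumed choice, and scikit-learn's SVC stands in for the SVM classifier. Real sensor data would need its own feature engineering.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
t = np.linspace(0, 1, 500)  # 500 samples covering one second per recording


def simulate(faulty: bool) -> np.ndarray:
    """Hypothetical vibration signal: 50 Hz base, plus a 120 Hz component if faulty."""
    signal = np.sin(2 * np.pi * 50 * t) + rng.normal(0, 0.2, t.size)
    if faulty:
        signal += 0.8 * np.sin(2 * np.pi * 120 * t) + rng.normal(0, 0.3, t.size)
    return signal


# 1. Collect sensor data: 100 healthy (label 0) and 100 faulty (label 1) recordings
signals = [simulate(False) for _ in range(100)] + [simulate(True) for _ in range(100)]
labels = np.array([0] * 100 + [1] * 100)

# 2. Preprocess: reduce each recording to two summary features (RMS, peak amplitude)
features = np.array([[np.sqrt(np.mean(s ** 2)), np.max(np.abs(s))] for s in signals])

# 3. Train an SVM; features are standardized first inside the pipeline
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

# 4. Evaluate performance on held-out recordings
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```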
DATA VISUALIZATION USING PYTHON
Title: What is Data Visualization?
Content:
Definition: Graphical representation of data to identify patterns, trends, and insights.
Importance:
Makes data easier to understand
Supports better decision-making
Why Python?
Simple syntax and powerful libraries
Popular tools: matplotlib, seaborn, plotly, pandas
Visuals: Comparison image (raw data table vs bar chart)
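To illustrate the raw-table-versus-bar-chart comparison, the sketch below takes a small hypothetical table of individual sales, aggregates it with pandas, and draws a bar chart with pandas' built-in plotting (which uses Matplotlib underneath). The output file name sales_bar.png is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; call plt.show() instead when working interactively
import matplotlib.pyplot as plt
import pandas as pd

# Raw data: one row per hypothetical sale, hard to read at a glance
sales = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb", "Mar"],
    "amount": [120, 80, 150, 60, 90, 210],
})

# Aggregate to one total per month, keeping the order in which months first appear
totals = sales.groupby("month", sort=False)["amount"].sum()

# The same information as a bar chart: the comparison between months is immediate
ax = totals.plot.bar(title="Total sales per month")
ax.set_ylabel("amount")
plt.tight_layout()
plt.savefig("sales_bar.png")
```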
POPULAR PYTHON LIBRARIES
Title: Python Libraries for Data Visualization
Content:
Matplotlib: Basic plotting (line, bar, scatter)
Seaborn: Statistical plots (heatmaps, boxplots)
Plotly: Interactive visualizations
Pandas: Built-in plotting for DataFrames
Code Example:
import seaborn as sns
tips = sns.load_dataset('tips')  # seaborn's example 'tips' dataset (downloaded on first use)
sns.boxplot(x='day', y='total_bill', data=tips)
Visuals: Side-by-side visuals of different plot types
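The side-by-side idea can be sketched with Matplotlib alone, which avoids downloading a sample dataset. The data is made up and the file name plot_types.png is arbitrary; each panel shows one of the basic plot types listed for Matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() in Jupyter instead
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 8)                  # e.g. days 1..7 (hypothetical)
y = np.array([3, 5, 4, 6, 8, 7, 9])  # made-up measurements

# One figure, three panels in a row
fig, (ax_line, ax_bar, ax_scatter) = plt.subplots(1, 3, figsize=(12, 3.5))

ax_line.plot(x, y, marker="o")
ax_line.set_title("Line: trend over time")

ax_bar.bar(x, y)
ax_bar.set_title("Bar: compare categories")

ax_scatter.scatter(x, y)
ax_scatter.set_title("Scatter: relationship between x and y")

fig.tight_layout()
fig.savefig("plot_types.png")
```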
APPLICATIONS AND USE CASES
Title: Real-World Applications
Content:
Business Analytics: Sales trend visualization
Healthcare: Patient data visualization
Machine Learning: Model performance plots
Finance: Stock market trends
Visuals: Example dashboard/chart grid
EXPLORATORY DATA ANALYSIS
Title: What is Exploratory Data Analysis?
Content:
Definition: EDA is the process of analyzing data sets to summarize their main characteristics.
Purpose:
Understand data structure
Detect outliers and missing values
Find patterns and relationships
Why Python?
Powerful tools like pandas, matplotlib, seaborn, plotly
Visuals: EDA pipeline (Load → Clean → Visualize → Analyze)
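Two of the purposes above, detecting missing values and outliers, can be sketched with pandas. The readings are hypothetical, and the 1.5 * IQR rule used to flag outliers is one common convention rather than the only option.

```python
import pandas as pd

# Hypothetical sensor readings with one missing value and one suspicious spike
df = pd.DataFrame({"reading": [10.2, 9.8, 10.5, None, 10.1, 35.0, 9.9, 10.3]})

# Missing values: count per column
missing = df.isnull().sum()

# Outliers: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1 = df["reading"].quantile(0.25)
q3 = df["reading"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["reading"] < lower) | (df["reading"] > upper)]

print(missing)
print(outliers)
```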
KEY STEPS IN EDA
Title: EDA Workflow in Python
Content:
1. Loading Data
import pandas as pd
df = pd.read_csv('data.csv')
2. Summary Statistics
print(df.describe())
3. Handling Missing Values
print(df.isnull().sum())  # number of missing values in each column
4. Visualizations:
Histograms
Boxplots
Correlation heatmaps
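The three visualizations in step 4 can be sketched with pandas and Matplotlib. The DataFrame is randomly generated with hypothetical columns (age, income, spend), and the heatmap is drawn with Matplotlib's imshow on df.corr(); seaborn's heatmap is a common alternative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() when working interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
age = rng.integers(18, 70, size=300)
income = age * 1000 + rng.normal(0, 8000, size=300)   # made-up: income rises with age
spend = income * 0.3 + rng.normal(0, 3000, size=300)  # made-up: spend rises with income
df = pd.DataFrame({"age": age, "income": income, "spend": spend})

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Histogram: distribution of a single column
df["age"].plot.hist(ax=axes[0], bins=15, title="Histogram of age")

# Boxplot: median, quartiles and potential outliers for each column
df[["income", "spend"]].plot.box(ax=axes[1], title="Boxplot")

# Correlation heatmap: pairwise Pearson correlations between numeric columns
corr = df.corr()
im = axes[2].imshow(corr.values, vmin=-1, vmax=1, cmap="coolwarm")
axes[2].set_xticks(range(len(corr.columns)))
axes[2].set_xticklabels(corr.columns)
axes[2].set_yticks(range(len(corr.columns)))
axes[2].set_yticklabels(corr.columns)
axes[2].set_title("Correlation heatmap")
fig.colorbar(im, ax=axes[2])

fig.tight_layout()
fig.savefig("eda_plots.png")
print(corr.round(2))
```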
BENEFITS AND USE CASES
Title: EDA in Action
Content:
Use Case: Analyzing the Titanic dataset
Found that age and passenger class affect survival
Detected missing age values
Benefits:
Guides data cleaning and model preparation
Reveals hidden patterns
Supports better decision-making
Visuals: Titanic survival barplot or correlation matrix
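Findings like these usually come from two short pandas checks: grouping survival by passenger class, and counting missing ages. The sketch below uses a handful of made-up rows (not the real Titanic data) with column names pclass, age, and survived, assumed to mirror common versions of the dataset, so the printed numbers are illustrative only.

```python
import pandas as pd

# Made-up rows in a Titanic-like layout; NOT the real dataset
df = pd.DataFrame({
    "pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "age":      [38.0, None, 27.0, 35.0, 22.0, None, 4.0, None],
    "survived": [1, 1, 1, 0, 0, 0, 1, 0],
})

# Survival rate per passenger class (mean of a 0/1 column = proportion survived)
survival_by_class = df.groupby("pclass")["survived"].mean()

# Number of passengers whose age is missing
missing_age = df["age"].isnull().sum()

print(survival_by_class)
print("Missing age values:", missing_age)
```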
THANK YOU

Radhika (30323U09065): Data Science with Python
