Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423 603
(An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Computer Engineering
(NBA Accredited)
Prof. S. A. Shivarkar
Assistant Professor
Contact No. 8275032712
Email- shivarkarsandipcomp@sanjivani.org.in
Subject- Unsupervised Modeling for AIML (CO9301)
Content
• Lectures: 4 Hrs/Week
• Credits: 4
• Examination Scheme:
  • CIA: 40
  • End Semester: 60
Course Outcome
• Understand project management methodology and exploratory data analysis.
• Apply feature engineering techniques.
• Apply clustering techniques.
• Apply dimensionality reduction techniques.
• Apply association rules and recommendation system techniques.
• Apply text mining and NLP techniques.
Course Objective
• To learn the CRISP-ML(Q) methodology for developing machine learning models
• To understand clustering and dimensionality reduction
• To learn association rules and recommendation systems
• To understand various NLP strategies
• To learn how to evaluate models using performance metrics
Unit I: Requirement to Machine Learning
• Project management methodology (CRISP-ML(Q)), prescriptive analytics, predictive analytics, diagnostic analytics, descriptive analytics, introduction to data types, measurement levels, measures of central tendency, expected value, exploratory data analysis, five-number summary, box plot, bar graph, histogram, correlation graph, scatter plots, exploring two or more variables, data sampling and its types, various types of bias.
Unit II: Feature Engineering Techniques
• Dummy variable conversion techniques, standardization and normalization, outlier identification and treatment techniques, skewness identification and treatment, finding null values and their treatment.
Unit III: Unsupervised Learning - Clustering
• Supervised vs. unsupervised learning, clustering/segmentation algorithms (hierarchical), distance metrics for categorical data, distance metrics for continuous data, distance metrics for mixed data, distance between clusters, k-means clustering, selection of k (elbow curve), drawbacks and comparison.
Unit IV: Unsupervised Learning - Dimensionality Reduction
• Need for dimensionality reduction, principal component analysis (PCA), applications of PCA, singular value decomposition (SVD), applications of SVD.
Unit V: Unsupervised Learning - Association Rules and Recommendation Systems
• Market basket analysis, association rules intuition, association rule applications, association rule terminology, need for recommendation systems, similarity measures, user-based recommendation systems, item-to-item collaborative filtering.
Unit VI: Text Mining - Sentiment Analysis and NLP
• Need for text mining, bag of words, terminology and preprocessing, document-term matrix (DTM) and term-document matrix (TDM), corpus-level word cloud. Introduction to NLP, data preprocessing in the NLP context, NLP terminology, feature extraction from text, topic modeling, vector representation.
Unit I: Requirement to Machine Learning
Project management methodology(CRISP-ML (Q))
• Overall, the CRISP-ML(Q) process model describes six phases:
  1. Business and Data Understanding
  2. Data Engineering (Data Preparation)
  3. Machine Learning Model Engineering
  4. Quality Assurance for Machine Learning Applications
  5. Deployment
  6. Monitoring and Maintenance
Business and Data Understanding
• Developing a machine learning application starts with identifying the scope of the ML application, defining the success criteria, and verifying data quality.
• The goal of this first phase is to ensure the feasibility of the project.
• Clear and measurable Key Performance Indicators (KPIs), such as “time savings per user and session”, must be defined.
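To make the KPI example concrete, here is a minimal Python sketch that computes “time savings per user and session” from a hypothetical session log and checks it against an assumed success criterion. The log format, the numbers, the function names, and the threshold are all illustrative assumptions, not part of CRISP-ML(Q).

```python
from statistics import mean

# Hypothetical session log: (user_id, minutes_without_ml, minutes_with_ml).
# Illustrative numbers only, not real measurements.
SESSIONS = [
    ("u1", 12.0, 9.0),
    ("u1", 10.0, 8.0),
    ("u2", 15.0, 11.0),
    ("u3", 8.0, 7.0),
]


def time_savings_per_user_session(sessions):
    """Return the average minutes saved per session for each user."""
    savings_by_user = {}
    for user, before, after in sessions:
        savings_by_user.setdefault(user, []).append(before - after)
    return {user: mean(savings) for user, savings in savings_by_user.items()}


def meets_success_criterion(kpi_by_user, min_minutes_saved):
    """Assumed success criterion: every user saves at least min_minutes_saved per session."""
    return all(saved >= min_minutes_saved for saved in kpi_by_user.values())


if __name__ == "__main__":
    kpi = time_savings_per_user_session(SESSIONS)
    print(kpi)
    print(meets_success_criterion(kpi, min_minutes_saved=1.0))
```

Writing the KPI down as code this early forces the team to agree on exactly what is measured and what threshold counts as success.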
Machine Learning Model Engineering
• The modeling phase includes model selection, model specialization, and model training tasks.
• Depending on the application, we might also use a pre-trained model, compress the model, or apply ensemble learning methods to obtain the final ML model.
• Many phases in ML development are iterative.
• Sometimes we need to revisit the business goals, KPIs, and available data from the previous steps and adjust the ML model accordingly.
• Finally, we package the ML workflow in a pipeline so that model training during the modeling phase is repeatable.
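A repeatable training pipeline can be sketched in plain Python as a fixed sequence of steps driven by a fixed random seed. This is a minimal sketch assuming a single numeric feature, a seeded subsample, standardization, and an ordinary least-squares line; the function names are hypothetical, and real projects would typically use the pipeline abstraction of an ML library instead.

```python
import random
from statistics import mean, pstdev


def standardize(xs):
    """Scale values to zero mean and unit variance; also return the fitted (mean, std)."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs], (mu, sigma)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def training_pipeline(xs, ys, seed=42, sample_frac=0.8):
    """Hypothetical pipeline: seeded subsample -> standardize -> fit.

    Because every step is deterministic given the seed, re-running the
    pipeline on the same data reproduces the same model.
    """
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(xs)), int(sample_frac * len(xs))))
    x_sub = [xs[i] for i in idx]
    y_sub = [ys[i] for i in idx]
    x_scaled, scaler = standardize(x_sub)
    intercept, slope = fit_line(x_scaled, y_sub)
    # The scaler is stored with the model so inference applies the same transform.
    return {"scaler": scaler, "intercept": intercept, "slope": slope}


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [2.0 * x + 1.0 for x in xs]
    print(training_pipeline(xs, ys, seed=7))
```

Keeping preprocessing and fitting inside one function means the scaling parameters can never drift out of sync with the trained coefficients.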
Evaluating Machine Learning Models
• Model training is followed by a model evaluation phase, also known as offline testing.
• During this phase, the performance of the trained model must be validated on a test set.
• Additionally, model robustness should be assessed using noisy or incorrect input data.
• Finally, the deployment decision should be made either automatically, based on the success criteria, or manually by domain and ML experts. As in the modeling phase, all outcomes of the evaluation phase need to be documented.
• Deployment: the process of integrating the ML model into the existing software system.
• Monitoring and Maintenance
• https://ml-ops.org/content/crisp-ml
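The offline-testing steps above (test-set validation, a robustness check with noisy inputs, and an automatic deployment decision) can be sketched as follows. The accuracy thresholds, the Gaussian noise model, the function names, and the toy threshold classifier are assumptions chosen for illustration.

```python
import random


def accuracy(model, xs, ys):
    """Fraction of examples where the model's prediction equals the label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)


def add_noise(xs, scale, seed=0):
    """Perturb each input with Gaussian noise (assumed noise model) using a fixed seed."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, scale) for x in xs]


def deployment_decision(model, x_test, y_test,
                        min_accuracy=0.9, min_noisy_accuracy=0.8, noise_scale=0.5):
    """Validate on a clean test set and on a noisy copy, then decide automatically."""
    clean_acc = accuracy(model, x_test, y_test)
    noisy_acc = accuracy(model, add_noise(x_test, noise_scale), y_test)
    return {
        "clean_accuracy": clean_acc,
        "noisy_accuracy": noisy_acc,
        "deploy": clean_acc >= min_accuracy and noisy_acc >= min_noisy_accuracy,
    }


def threshold_model(x):
    """Toy classifier: predicts 1 when x > 5, else 0."""
    return int(x > 5.0)


if __name__ == "__main__":
    x_test = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
    y_test = [0, 0, 0, 0, 1, 1, 1, 1]
    print(deployment_decision(threshold_model, x_test, y_test))
```

The returned dictionary doubles as the documented outcome of the evaluation phase; a manual decision by experts would simply replace the `deploy` rule.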
Deployment
• ML model deployment is the process of integrating the ML model into the existing software system.
• After passing the evaluation step in the ML development life cycle, the ML model is promoted to the (pre-)production environment.
• ML model deployment includes the following tasks: defining the inference hardware, evaluating the model in the production environment (online testing, e.g., A/B tests), user acceptance and usability testing, providing a fallback plan for model outages, and setting up a deployment strategy that rolls out the new model gradually (e.g., canary or blue/green deployment).
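A gradual canary rollout with a fallback plan can be sketched as a router that sends a configurable fraction of requests to the new model and serves the stable model if the new one fails. This is a hypothetical, minimal sketch (the class and parameter names are assumptions); it does not cover blue/green deployment, and production systems usually do this routing in infrastructure rather than in application code.

```python
import random


class CanaryRouter:
    """Route a fraction of requests to a new (canary) model, with the stable model as fallback."""

    def __init__(self, stable_model, canary_model, canary_fraction=0.05, seed=None):
        if not 0.0 <= canary_fraction <= 1.0:
            raise ValueError("canary_fraction must be between 0 and 1")
        self.stable_model = stable_model
        self.canary_model = canary_model
        self.canary_fraction = canary_fraction
        self.rng = random.Random(seed)
        self.counts = {"stable": 0, "canary": 0, "fallback": 0}

    def predict(self, x):
        if self.rng.random() < self.canary_fraction:
            try:
                result = self.canary_model(x)
                self.counts["canary"] += 1
                return result
            except Exception:
                # Fallback plan for a model outage: serve the stable model instead.
                self.counts["fallback"] += 1
        else:
            self.counts["stable"] += 1
        return self.stable_model(x)


if __name__ == "__main__":
    router = CanaryRouter(stable_model=lambda x: "v1", canary_model=lambda x: "v2",
                          canary_fraction=0.1, seed=0)
    for request in range(1000):
        router.predict(request)
    print(router.counts)
```

Rolling out gradually then amounts to raising `canary_fraction` step by step (for example 0.05, 0.25, 1.0) while the canary's online metrics stay healthy.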
Monitoring and Maintenance
• Once the ML model has been put into production, its performance must be monitored and the model must be maintained.
• When an ML model operates on real-world data, the main risk is “model staleness”: the model's performance drops as it starts operating on unseen data.
• Model performance is also affected by hardware performance and the existing software stack.
• Therefore, the best practice for preventing performance degradation is a monitoring task that continuously evaluates the model's performance to decide whether the model needs to be retrained.
• This is known as the Continued Model Evaluation pattern.
• The decision from the monitoring task leads to the second task: updating the ML model.
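The Continued Model Evaluation pattern can be sketched as a sliding window of recent labelled predictions whose accuracy is compared against a threshold. The window size, the accuracy threshold, and the class name are assumptions for illustration; the retraining itself is left to the caller.

```python
from collections import deque


class ContinuedModelEvaluator:
    """Keep the outcomes of the most recent labelled predictions and flag when retraining is needed."""

    def __init__(self, window_size=100, min_accuracy=0.85):
        self.outcomes = deque(maxlen=window_size)  # oldest outcomes drop out automatically
        self.min_accuracy = min_accuracy

    def record(self, prediction, actual):
        """Store whether a production prediction matched the ground truth once it is known."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Decide only on a full window, so a handful of early errors does not trigger retraining."""
        acc = self.rolling_accuracy()
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return acc is not None and window_full and acc < self.min_accuracy
```

In use, the serving code calls `record` as ground-truth labels arrive and checks `needs_retraining` periodically; a `True` result is the decision that triggers the model update task.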
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 20
Reference
• https://ml-ops.org/content/crisp-ml
