DATA SCIENCE LAB PROJECT
Master's Degree in Data Science
Authors: A. Portaluppi, L. Ravazzi & M. Spandri
Academic Year 2019-2020
INTRODUCTION
DATA SET: DEMS publications (Dipartimento di Economia, Metodi Quantitativi e Strategie di Impresa).
PURPOSE: Find the topics studied by DEMS university researchers.
TOOLS: Multidimensional Scaling techniques and Cluster Analysis.
STEP BY STEP
DATA MANAGEMENT:
1. Exploration
2. Preprocessing
3. Data Cleaning
4. NLP
MULTIDIMENSIONAL SCALING:
1. Common Multidimensional Scaling
2. Metric Scaling
3. Sammon Mapping
CLUSTER ANALYSIS:
Prototype-based: fuzzy algorithm
DATA VISUALIZATION:
RShiny application
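HOW TO MANAGE THE DATA SETS?
Publications data set:
ID | TITLE | JOURNAL | ABSTRACT | ABSTRACT_ENG | KEYWORDS | KEYWORDS_ENG
235 | … | … | … | … | … | …
Authors data set (the same ID appears on several rows):
ID | DEMS_AUTHORS
235 | …
235 | …
235 | …
…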
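One way to manage the two data sets is to collapse the authors table so that each publication carries the list of its DEMS authors. This is a minimal Python sketch, assuming both tables are already loaded as lists of dicts keyed by the column names above; the helper `attach_authors` and the sample rows are hypothetical placeholders, not the project's code or data.

```python
from collections import defaultdict


def attach_authors(publications, authors):
    """Return copies of the publication rows, each with a DEMS_AUTHORS list.

    publications: one dict per article, with at least an "ID" key.
    authors: one dict per (article, author) pair, with "ID" and "DEMS_AUTHORS".
    """
    by_id = defaultdict(list)
    for row in authors:
        by_id[row["ID"]].append(row["DEMS_AUTHORS"])

    merged = []
    for pub in publications:
        record = dict(pub)  # copy, so the input rows are left untouched
        record["DEMS_AUTHORS"] = by_id.get(pub["ID"], [])
        merged.append(record)
    return merged


# Hypothetical placeholder rows, not taken from the real data set.
pubs = [{"ID": 235, "TITLE": "Title A"}, {"ID": 236, "TITLE": "Title B"}]
auths = [
    {"ID": 235, "DEMS_AUTHORS": "Author 1"},
    {"ID": 235, "DEMS_AUTHORS": "Author 2"},
    {"ID": 236, "DEMS_AUTHORS": "Author 3"},
]
merged = attach_authors(pubs, auths)
print(merged[0]["DEMS_AUTHORS"])  # ['Author 1', 'Author 2']
```

Grouping the authors first keeps the join linear in the number of rows; an article with no matching author row simply gets an empty list.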
HOW TO CHOOSE RECORDS?
We keep only:
• journal articles;
• articles written by assistant professors etc.;
• articles in English.
Result: 30% of the documents are left.
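The three record filters can be sketched as a single list comprehension. This is an illustration only: the field names (`type`, `author_role`, `language`), their values and the set of allowed roles are hypothetical, since the slides do not show the real schema, and the toy records below are not real data.

```python
# Hypothetical field names and values; the real DEMS export may use others.
ALLOWED_ROLES = {"assistant professor"}  # add the other roles covered by "etc."


def choose_records(records, allowed_roles=ALLOWED_ROLES):
    """Keep journal articles written by an allowed role and written in English."""
    return [
        r
        for r in records
        if r["type"] == "journal article"
        and r["author_role"] in allowed_roles
        and r["language"] == "english"
    ]


def share_left(kept, records):
    """Fraction of the documents that survive the filters."""
    return len(kept) / len(records) if records else 0.0


# Toy records (not real data).
records = (
    [{"type": "journal article", "author_role": "assistant professor", "language": "english"}] * 3
    + [{"type": "book chapter", "author_role": "assistant professor", "language": "english"}] * 4
    + [{"type": "journal article", "author_role": "assistant professor", "language": "italian"}] * 3
)
kept = choose_records(records)
print(len(kept), share_left(kept, records))  # 3 0.3
```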
What does "in English" mean?
Perfect! There is a field that specifies the language.
We define the language of an article as the language of its abstract, detected with the textcat function.
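textcat (R) identifies languages by comparing character n-gram profiles. As a much cruder stand-in, the sketch below guesses the language of an abstract by counting stop-words from two tiny, hypothetical lists. It is a swapped-in heuristic meant only to show the idea of classifying an article by its abstract; it is not how textcat works.

```python
import re

# Tiny illustrative stop-word lists (an assumption, not textcat's profiles).
STOPWORDS = {
    "english": {"the", "of", "and", "to", "in", "is", "that", "for", "with", "on"},
    "italian": {"il", "di", "e", "che", "la", "per", "un", "una", "con", "del"},
}


def guess_language(text):
    """Return the language whose stop-words occur most often, or None if none occur.

    Ties go to the language listed first in STOPWORDS.
    """
    tokens = re.findall(r"[a-zàèéìòù]+", text.lower())
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("We study the effect of monetary policy on inflation."))  # english
```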
NATURAL LANGUAGE PROCESSING
1. Start with mixed texts (title, abstract, keywords and journal).
2. Build the bag of words.
3. Clean it:
• drop punctuation, stop-words and non-letter characters;
• convert everything to lower case;
• apply stemming.
4. Compute tf-idf.
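The pipeline above can be sketched in plain Python. Assumptions: the stop-word list is a tiny illustrative one, `naive_stem` is a hypothetical crude suffix stripper standing in for a real stemmer (such as Porter's), and tf-idf uses raw counts times log(N / df), one common variant among several.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a full list.
STOPWORDS = {"the", "of", "and", "a", "an", "in", "on", "for", "to", "is"}


def naive_stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lower-case, keep letter runs only, drop stop-words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOPWORDS]


def tf_idf(docs):
    """One {term: weight} dict per document: raw term count times log(N / df)."""
    bags = [Counter(tokenize(d)) for d in docs]
    n = len(bags)
    df = Counter(term for bag in bags for term in bag)
    return [{t: c * math.log(n / df[t]) for t, c in bag.items()} for bag in bags]


docs = ["Energy prices and finance", "Health statistics", "Finance of energy firms"]
weights = tf_idf(docs)
print(sorted(weights[1]))  # ['health', 'statistic']
```

A term that occurs in every document gets weight 0, which is why tf-idf down-weights words shared by all publications.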
MULTIDIMENSIONAL SCALING
WHAT? A technique that projects data from an N-dimensional space into 2 or 3 dimensions.
WHY?
• Graphical approach (clustering)
• Increased interpretability
HOW?
• Metric
• Non-metric
1. Start from a table with the texts and the count of each bag-of-words term in each text.
2. Choose a proximity measure.
3. Apply the desired technique.
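Choosing a proximity measure amounts to picking the function used to fill the pairwise distance matrix that MDS takes as input. A minimal sketch with the two measures the deck mentions (Euclidean and Manhattan), applied to a hypothetical document-term count table:

```python
def euclidean(x, y):
    """Euclidean (L2) distance between two term-count vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def manhattan(x, y):
    """Manhattan (L1) distance between two term-count vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def distance_matrix(rows, metric):
    """Full symmetric n x n matrix of pairwise distances."""
    n = len(rows)
    return [[metric(rows[i], rows[j]) for j in range(n)] for i in range(n)]


# Hypothetical document-term counts: rows are documents, columns are terms.
dtm = [[2, 0, 1], [0, 3, 1], [2, 1, 0]]
print(distance_matrix(dtm, manhattan))  # [[0, 5, 2], [5, 0, 5], [2, 5, 0]]
```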
We applied three techniques:
1. Common Multidimensional Scaling (Euclidean distance)
2. Metric Scaling
3. Sammon Mapping (Manhattan distance)
We describe only the last one, because it gave good results.
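For comparison with Sammon mapping, here is a sketch of classical (Torgerson) MDS, assuming that is what "Common Multidimensional Scaling" refers to (it is what R's `cmdscale` computes). It double-centres the squared distances and takes the top eigenvectors. To stay dependency-free it uses power iteration with deflation; the helper names are hypothetical and a linear-algebra library would be the practical choice.

```python
import math
import random


def double_center(d):
    """B = -1/2 * J D^(2) J, where D^(2) holds squared distances and J centers."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(r) / n for r in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand) for j in range(n)]
        for i in range(n)
    ]


def dominant_eigenpair(b, iters=500, seed=0):
    """Eigenpair with the largest |eigenvalue| of a symmetric matrix (power iteration)."""
    n = len(b)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def classical_mds(d, k=2):
    """Embed an n x n distance matrix in k dimensions (classical/Torgerson MDS)."""
    b = double_center(d)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        lam, v = dominant_eigenpair(b, seed=axis)
        if lam <= 0.0:
            break  # no positive eigenvalue left; remaining axes stay at 0
        for i in range(n):
            coords[i][axis] = math.sqrt(lam) * v[i]
        # Deflation: subtract the recovered component before the next axis.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
d = [[math.dist(p, q) for q in pts] for p in pts]
emb = classical_mds(d)  # same pairwise distances, up to rotation/reflection
```

With Euclidean input the recovered configuration reproduces the distances; with other proximity measures (e.g. Manhattan) classical MDS is only an approximation, one reason to try non-linear methods such as Sammon mapping.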
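SAMMON MAPPING
Minimize the Sammon stress:
$$E = \frac{1}{\sum_{i<j} d_{ij}} \sum_{i<j} \frac{\left(d_{ij} - \hat{d}_{ij}\right)^2}{d_{ij}}$$
where $d_{ij}$ is the distance between the i-th and j-th observations in the initial space, while $\hat{d}_{ij}$ is the corresponding distance in the final (low-dimensional) space.
• Works with both metric and non-metric data.
• Non-linear transformation approach (unlike PCA, which is linear).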
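A minimal 2-D Sammon mapping sketch that minimizes the stress E defined above. Assumptions: it uses plain gradient descent with step halving (backtracking) rather than the pseudo-Newton update of Sammon's original paper, starts from a random configuration, and requires every off-diagonal input distance to be positive. The function names are hypothetical.

```python
import math
import random


def _pair_sum(D):
    """Normalising constant: sum of D_ij over i < j."""
    n = len(D)
    return sum(D[i][j] for i in range(n) for j in range(i + 1, n))


def sammon_stress(D, Y):
    """E = (1 / sum_{i<j} D_ij) * sum_{i<j} (D_ij - d_ij)^2 / D_ij, d_ij = ||y_i - y_j||."""
    n = len(D)
    e = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            e += (D[i][j] - math.dist(Y[i], Y[j])) ** 2 / D[i][j]
    return e / _pair_sum(D)


def sammon_gradient(D, Y):
    """dE/dy_i = sum_{j != i} -2 (D_ij - d_ij) / (c * D_ij * d_ij) * (y_i - y_j)."""
    n, c = len(D), _pair_sum(D)
    grad = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = max(math.dist(Y[i], Y[j]), 1e-12)  # guard against coincident points
            coef = -2.0 * (D[i][j] - d) / (c * D[i][j] * d)
            grad[i][0] += coef * (Y[i][0] - Y[j][0])
            grad[i][1] += coef * (Y[i][1] - Y[j][1])
    return grad


def sammon(D, iters=500, step=0.1, seed=0):
    """2-D Sammon mapping by gradient descent with step halving (backtracking).

    Assumes D_ij > 0 for every i != j (no duplicate observations).
    """
    rng = random.Random(seed)
    Y = [[rng.random(), rng.random()] for _ in range(len(D))]
    E = sammon_stress(D, Y)
    for _ in range(iters):
        g = sammon_gradient(D, Y)
        improved = False
        while step > 1e-12:
            cand = [[y[0] - step * gy[0], y[1] - step * gy[1]] for y, gy in zip(Y, g)]
            e_new = sammon_stress(D, cand)
            if e_new < E:
                Y, E, improved = cand, e_new, True
                step *= 1.2  # try a slightly longer step next time
                break
            step *= 0.5
        if not improved:
            break  # no descent step found: treat as converged
    return Y, E
```

Dividing each squared error by $d_{ij}$ makes errors on small original distances count more, which is why Sammon mapping tends to preserve local structure better than a plain least-squares fit.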
CLUSTERING
Since an article can touch on different topics, the clustering must be fuzzy.
Manhattan distance is used to build the clusters.
Cluster labels rely on the fifteen most frequent words in the bag of words.
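The slides name a prototype-based fuzzy algorithm with Manhattan distance but do not say which one. The sketch below shows one such method, fuzzy c-medians (the L1 counterpart of fuzzy c-means). With L1 distance, the membership exponent becomes 1/(m-1) and each prototype coordinate is a membership-weighted median. The deterministic farthest-point initialisation and all names are illustrative assumptions, not the authors' implementation.

```python
def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def weighted_median(values, weights):
    """A minimiser of sum_k w_k * |values_k - v|: first value reaching half the weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]


def memberships(X, centers, m):
    """u_ik = 1 / sum_j (d_ik / d_jk)^(1/(m-1)), with d the Manhattan distance."""
    U = []
    for x in X:
        d = [manhattan(x, v) for v in centers]
        zeros = [i for i, di in enumerate(d) if di == 0]
        if zeros:  # the point sits on a prototype: split membership among those
            row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
        else:
            row = [
                1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1)) for j in range(len(d)))
                for i in range(len(d))
            ]
        U.append(row)
    return U


def fuzzy_c_medians(X, c=2, m=2.0, iters=100):
    """Alternate membership and weighted-median prototype updates until stable."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [list(X[0])]
    while len(centers) < c:
        far = max(X, key=lambda x: min(manhattan(x, v) for v in centers))
        centers.append(list(far))
    for _ in range(iters):
        U = memberships(X, centers, m)
        new_centers = []
        for i in range(c):
            w = [U[k][i] ** m for k in range(len(X))]
            new_centers.append(
                [weighted_median([x[f] for x in X], w) for f in range(len(X[0]))]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return centers, memberships(X, centers, m)


X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, U = fuzzy_c_medians(X)  # each row of U sums to 1 across clusters
```

Each document keeps a degree of membership in every cluster, which matches the slide's point that one article can touch several topics.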
LABELS
DEMS topics fall into three areas: ECONOMICS, STATISTICS and BUSINESS STRATEGY.
• Finance and Energy
• Economic Policy
• Macroeconomics
• Income Distribution
• Game Theory
• Health Statistics
• Pure Statistics
• Statistics and Finance
• Social Issues
• Industrial Economics
• Corporate Finance
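Labels such as these were chosen by looking at the fifteen most frequent words behind each cluster. A sketch of how those candidate words could be extracted from fuzzy memberships follows. Weighting each document's term counts by its membership in the cluster is an assumption (hard-assigning each document to its highest-membership cluster would be an alternative); the names and data are hypothetical.

```python
from collections import Counter


def top_words(bags, U, cluster, k=15):
    """Top-k terms for one cluster, weighting each document's counts by its membership.

    bags: one Counter of (stemmed) term counts per document.
    U: membership matrix, U[doc][cluster] in [0, 1].
    """
    total = Counter()
    for bag, row in zip(bags, U):
        for term, count in bag.items():
            total[term] += row[cluster] * count
    return [term for term, _ in total.most_common(k)]


# Hypothetical bags of words and memberships for two documents and two clusters.
bags = [Counter({"energi": 3, "price": 1}), Counter({"health": 2, "statist": 2})]
U = [[0.9, 0.1], [0.2, 0.8]]
print(top_words(bags, U, cluster=0, k=2))  # ['energi', 'price']
```

The final labels (e.g. "Finance and Energy") are still a human reading of these word lists.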
MOVE TO RSHINY!
SUMMARY: CONCLUSIONS
1. Multidimensional scaling is a powerful tool to visualize data.
2. We found the main topics studied by DEMS researchers.
3. MDS and clustering can reveal interesting patterns in data.
FUTURE DEVELOPMENTS
• Other scaling techniques, such as Self-Organizing Maps.
• Other proximity measures for MDS.
• Consider not only single words (singletons) in the bag of words, e.g. word combinations found with association analysis.
THANK YOU FOR YOUR ATTENTION
