Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets
Removing linear and non-linear attribute correlations
Antonio Canabrava Fraideinberze
Jose F Rodrigues-Jr
Robson Leonardo Ferreira Cordeiro
Databases and Images Group
University of São Paulo
São Carlos - SP - Brazil
Motivation
How to analyze terabytes of data? Parallel processing and dimensionality reduction, for sure... but how to remove linear and non-linear attribute correlations, besides irrelevant attributes? And how to reduce dimensionality without human supervision and in a task-independent way? Our answer for medium-dimensionality data: Curl-Remover.
Agenda
Fundamental Concepts
Related Work
Proposed Method
Evaluation
Conclusion
Fundamental Concepts
Fractal Theory
[figures: examples of self-similar fractals]

Embedded, Intrinsic and Fractal Correlation Dimension
The Fractal Correlation Dimension D2 closely approximates the Intrinsic Dimension. For instance, a curve embedded in 3-dimensional space has embedded dimension 3 but intrinsic dimension ≅ 1, while a surface embedded in the same space has embedded dimension 3 but intrinsic dimension ≅ 2.

Fractal Correlation Dimension - Box Counting
[figures: the space is recursively divided into grid cells of side r; plotting log(sum of squared cell counts) versus log(r) yields a line whose slope is D2]
In practice, the cell counts are obtained with a Multidimensional Quad-tree [Traina Jr. et al, 2000].
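As an illustration, the box-counting estimate of D2 can be sketched in a few lines of Python (a minimal in-memory sketch with names of our own choosing; the method itself obtains the cell counts with a distributed Multidimensional Quad-tree):

```python
import numpy as np

def correlation_fractal_dimension(points, levels=8):
    """Estimate D2 by box counting: at each resolution, partition space into
    grid cells of side r, count the points per cell, and sum the squared
    counts; D2 is the slope of log(sum of squared counts) vs. log(r)."""
    points = np.asarray(points, dtype=float)
    # normalize to the unit hypercube so cell sides are simple powers of 2
    mins, maxs = points.min(axis=0), points.max(axis=0)
    norm = (points - mins) / np.where(maxs > mins, maxs - mins, 1.0)
    log_r, log_s2 = [], []
    for level in range(1, levels + 1):
        r = 2.0 ** -level                                # cell side at this level
        cells = np.minimum((norm / r).astype(int), 2 ** level - 1)
        _, counts = np.unique(cells, axis=0, return_counts=True)
        log_r.append(np.log(r))
        log_s2.append(np.log(np.sum(counts.astype(float) ** 2)))
    # the slope of the linear fit gives the correlation fractal dimension
    slope, _ = np.polyfit(log_r, log_s2, 1)
    return slope
```

For points lying on a line embedded in 3-d space, the estimate comes out close to 1, matching the intrinsic dimension rather than the embedded one.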
Related Work
Dimensionality Reduction - Taxonomy 1
Dimensionality Reduction algorithms split into Supervised and Unsupervised ones; Principal Component Analysis, Singular Value Decomposition and Fractal Dimension Reduction are unsupervised.

Dimensionality Reduction - Taxonomy 2
Dimensionality Reduction also splits into Feature Selection (Wrapper, Filter and Embedded approaches) and Feature Extraction; Principal Component Analysis and Singular Value Decomposition perform feature extraction, while Fractal Dimension Reduction performs feature selection.
Related Work
Existing methods need supervision, miss non-linear correlations, cannot handle Big Data, or work for classification only.
General Idea
Curl-Remover removes the E - ⌈D2⌉ least relevant attributes, one at a time, in ascending order of relevance:
1. Builds partial trees for the full dataset and for its E (E-1)-dimensional projections;
2. Shuffles one pair per cell: key = TreeID + cell spatial position, value = partial count of points;
3. Sums the partial point counts and reports log(r) and log(sum2) for each tree;
4. Computes D2 for the full dataset and pD2 for each of its E (E-1)-dimensional projections;
5. Removes the least relevant attribute, i.e., the one not in the projection that minimizes |D2 - pD2|;
6. Repeats to spot the second least relevant attribute, and so on.

3 Main Issues
1° Too much data to be shuffled - one data pair per cell/tree;
2° One data pass per irrelevant attribute;
3° Not enough memory for mappers.
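The selection loop above can be sketched serially (an illustrative in-memory sketch, not the MapReduce implementation; `d2` is a compact box-counting estimator and all names are our own):

```python
import numpy as np

def d2(points, levels=6):
    """Correlation fractal dimension via box counting: slope of
    log(sum of squared cell counts) vs. log(cell side r)."""
    p = np.asarray(points, dtype=float)
    mins, maxs = p.min(axis=0), p.max(axis=0)
    p = (p - mins) / np.where(maxs > mins, maxs - mins, 1.0)
    lr, ls = [], []
    for lev in range(1, levels + 1):
        r = 2.0 ** -lev
        cells = np.minimum((p / r).astype(int), 2 ** lev - 1)
        _, counts = np.unique(cells, axis=0, return_counts=True)
        lr.append(np.log(r))
        ls.append(np.log(np.sum(counts.astype(float) ** 2)))
    return np.polyfit(lr, ls, 1)[0]

def fractal_feature_selection(data):
    """Backward elimination: repeatedly drop the attribute whose removal
    changes the fractal dimension the least, until ⌈D2⌉ attributes remain."""
    data = np.asarray(data, dtype=float)
    keep = list(range(data.shape[1]))        # surviving attribute indices
    target = int(np.ceil(d2(data)))          # keep only ⌈D2⌉ attributes
    while len(keep) > target:
        full = d2(data[:, keep])             # D2 of the current subspace
        # pD2 of each (E-1)-dimensional projection: drop one attribute at a time
        diffs = [abs(full - d2(data[:, [a for a in keep if a != attr]]))
                 for attr in keep]
        # the least relevant attribute is the one whose removal changes D2 least
        keep.pop(int(np.argmin(diffs)))
    return keep
```

On data where one attribute duplicates another, the loop drops one member of the correlated pair first, keeping ⌈D2⌉ attributes in total.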
Proposed Method: Curl-Remover
1° Issue - Too much data to be shuffled; one data pair per cell/tree.
Our solution - Two-phase dimensionality reduction:
a) Serial feature selection in a tiny data sample (one reducer), used only to speed up processing;
b) All mappers project the data into a fixed subspace.
Each mapper builds and reports only the N (2 or 3) tree levels of lowest resolution, plus the points projected into the M (2 or 3) most relevant attributes of the sample. The full trees are then built from these low-resolution cells and the projected points, so high-resolution cells are never shuffled.
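The key/value layout of the shuffle phase can be sketched as follows (a hypothetical sketch that assumes coordinates already normalized to the unit hypercube; only the coarse levels are emitted, mirroring the idea that high-resolution cells are never shuffled):

```python
from collections import Counter

def map_point(point, trees, coarse_levels=2):
    """Mapper sketch: for each tree (the full space and each projection,
    given as a tuple of attribute indices), emit one
    ((tree_id, level, cell position), partial count) pair per coarse level."""
    pairs = []
    for tree_id, attrs in enumerate(trees):
        coords = [point[a] for a in attrs]
        for level in range(1, coarse_levels + 1):
            side = 2.0 ** -level                       # cell side at this level
            cell = tuple(min(int(c / side), 2 ** level - 1) for c in coords)
            pairs.append(((tree_id, level, cell), 1))
    return pairs

def reduce_counts(pairs):
    """Reducer sketch: sum the partial point counts per (tree, level, cell)."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals
```

The composite key (TreeID plus cell spatial position) lets a single shuffle serve every tree at once, with one pair per occupied cell rather than one per point.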
2° Issue - One data pass per irrelevant attribute.
Our solution - Stores/reads the tree level of highest resolution, instead of the original data. Let Rdb be the cost to read the dataset, and TWRtree the cost to transfer, write and read the last tree level in the next reduce step. If Rdb > TWRtree, the tree's last level is written to HDFS and read back from there in the next iteration, instead of re-reading the original data. As a result, the dataset itself is read only twice.
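The persist-or-reread rule reduces to a one-line cost comparison (a sketch; the two cost estimates are assumed to be provided by the job's runtime statistics):

```python
def next_input(read_dataset_cost: float, twr_tree_cost: float) -> str:
    """Pick the cheaper input for the next iteration: re-read the original
    dataset, or read back the tree's persisted highest-resolution level."""
    if read_dataset_cost > twr_tree_cost:
        return "tree_last_level"   # Rdb > TWRtree: persisting the tree pays off
    return "dataset"
```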
3° Issue - Not enough memory for mappers.
Our solution - Each mapper sorts its local points and builds "tree slices" while monitoring memory consumption, reporting slices (with very little overlap) whenever needed.
Evaluation
Datasets
Sierpinski - Sierpinski Triangle + 1 attribute linearly correlated + 2 attributes non-linearly correlated; 5 attributes, 1.1 billion points;
Sierpinski Hybrid - Sierpinski Triangle + 1 attribute non-linearly correlated + 2 random attributes; 5 attributes, 1.1 billion points;
Yahoo! Network Flows - communication patterns between end-users in the web; 12 attributes, 562 million points;
Astro - high-resolution cosmological simulation; 6 attributes, 1 billion points;
Hepmass - physics-related dataset with particles of unknown mass; 28 attributes, 10.5 million points;
Hepmass Duplicated - Hepmass + 28 correlated attributes; 56 attributes, 10.5 million points.
Evaluation
Fractal Dimension - Hepmass and Hepmass Duplicated
[figures: fractal dimension plots]
Comparison with sPCA - Classification: Curl-Remover is 8% more accurate and 7.5% faster.
Comparison with sPCA - Percentage of Fractal Dimension after selection
[figure: percentage of the fractal dimension preserved after selection]
Conclusions
• Accuracy - eliminates both linear and non-linear attribute correlations, besides irrelevant attributes; 8% better than sPCA;
• Scalability - linear scalability on the data size (theoretical analysis); experiments with up to 1.1 billion points;
• Unsupervised - does not require the user to guess the number of attributes to be removed, nor a training set;
• Semantics - it is a feature selection method, thus maintaining the semantics of the attributes;
• Generality - it suits analytical tasks in general, not only classification.
Questions?
robson@icmc.usp.br
Dimensionality Reduction and feature extraction.pptx
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorization
 
Massive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringMassive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filtering
 
Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
2017 nov reflow sbtb
2017 nov reflow sbtb2017 nov reflow sbtb
2017 nov reflow sbtb
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 

More from Universidade de São Paulo

A gentle introduction to Deep Learning
A gentle introduction to Deep LearningA gentle introduction to Deep Learning
A gentle introduction to Deep Learning
Universidade de São Paulo
 
Computação: carreira e mercado de trabalho
Computação: carreira e mercado de trabalhoComputação: carreira e mercado de trabalho
Computação: carreira e mercado de trabalho
Universidade de São Paulo
 
Introdução às ferramentas de Business Intelligence do ecossistema Hadoop
Introdução às ferramentas de Business Intelligence do ecossistema HadoopIntrodução às ferramentas de Business Intelligence do ecossistema Hadoop
Introdução às ferramentas de Business Intelligence do ecossistema Hadoop
Universidade de São Paulo
 
Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...
Universidade de São Paulo
 
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Universidade de São Paulo
 

More from Universidade de São Paulo (11)

A gentle introduction to Deep Learning
A gentle introduction to Deep LearningA gentle introduction to Deep Learning
A gentle introduction to Deep Learning
 
Computação: carreira e mercado de trabalho
Computação: carreira e mercado de trabalhoComputação: carreira e mercado de trabalho
Computação: carreira e mercado de trabalho
 
Introdução às ferramentas de Business Intelligence do ecossistema Hadoop
Introdução às ferramentas de Business Intelligence do ecossistema HadoopIntrodução às ferramentas de Business Intelligence do ecossistema Hadoop
Introdução às ferramentas de Business Intelligence do ecossistema Hadoop
 
Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...
 
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
 
Java generics-basics
Java generics-basicsJava generics-basics
Java generics-basics
 
Java collections-basic
Java collections-basicJava collections-basic
Java collections-basic
 
Java network-sockets-etc
Java network-sockets-etcJava network-sockets-etc
Java network-sockets-etc
 
Java streams
Java streamsJava streams
Java streams
 
Infovis tutorial
Infovis tutorialInfovis tutorial
Infovis tutorial
 
Java platform
Java platformJava platform
Java platform
 

Recently uploaded

一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 

Recently uploaded (20)

一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 

Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations

  • 1. Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets Removing linear and non-linear attribute correlations Antonio Canabrava Fraideinberze Jose F Rodrigues-Jr Robson Leonardo Ferreira Cordeiro Databases and Images Group University of São Paulo São Carlos - SP - Brazil
  • 3. 3 Terabytes? Parallel processing and dimensionality reduction, for sure... … How to analyze that data?
  • 4. How to analyze that data? 4 Terabytes? , but how to remove linear and non-linear attribute correlations, besides irrelevant attributes? …
  • 5. How to analyze that data? 5 Terabytes? , and how to reduce dimensionality without human supervision and being task independent? …
  • 7. Agenda Fundamental Concepts Related Work Proposed Method Evaluation Conclusion 7
  • 8. Agenda Fundamental Concepts Related Work Proposed Method Evaluation Conclusion 8
  • 13. Fundamental Concepts Fractal Theory Embedded, Intrinsic and Fractal Correlation Dimension Fractal Correlation Dimension ≅ Intrinsic Dimension 13
  • 14. Fundamental Concepts Fractal Theory Embedded, Intrinsic and Fractal Correlation Dimension Embedded dimension ≅ 3 Intrinsic dimension ≅ 1 Embedded dimension ≅ 3 Intrinsic dimension ≅ 2 14
  • 15. Fundamental Concepts Fractal Theory Fractal Correlation Dimension - Box Counting 15
  • 16. Fundamental Concepts Fractal Theory Fractal Correlation Dimension - Box Counting 16
  • 17. Fundamental Concepts Fractal Theory Fractal Correlation Dimension - Box Counting log(r) 17
  • 18. Fundamental Concepts Fractal Theory Fractal Correlation Dimension - Box Counting log(r) 18
  • 19. Fundamental Concepts Fractal Theory Fractal Correlation Dimension - Box Counting 19 Multidimensional Quad-tree [Traina Jr. et al, 2000]
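Slides 15–19 outline the box-counting procedure: cover the data with grids of shrinking cell side r, sum the squared point counts per cell, and read D2 off the slope of log(sum2) versus log(r). A minimal serial sketch of that idea (our own NumPy simplification, not the multidimensional quad-tree implementation of Traina Jr. et al.):

```python
import numpy as np

def correlation_dimension(points, n_levels=8):
    """Box-counting estimate of the fractal correlation dimension D2.

    For each cell side r = 2^-level, count the points per grid cell and
    compute S(r) = sum of squared counts; D2 is the slope of the
    regression line of log(S(r)) on log(r).
    """
    pts = np.asarray(points, dtype=float)
    # Normalize into the unit hypercube so cells are easy to index.
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    unit = (pts - mins) / np.where(maxs > mins, maxs - mins, 1.0)
    logs_r, logs_s = [], []
    for level in range(1, n_levels + 1):
        r = 2.0 ** -level
        # Grid-cell coordinates of every point, clamped at the boundary.
        cells = np.minimum((unit / r).astype(int), 2 ** level - 1)
        _, counts = np.unique(cells, axis=0, return_counts=True)
        logs_r.append(np.log(r))
        logs_s.append(np.log(float(np.sum(counts ** 2))))
    slope, _intercept = np.polyfit(logs_r, logs_s, 1)
    return slope
```

On a 1-dimensional set embedded in 3-D (e.g. points along a line), the estimate comes out close to 1, matching the intrinsic-versus-embedded distinction of slide 14.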
  • 20. Agenda Fundamental Concepts Related Work Proposed Method Evaluation Conclusion 20
  • 21. Related Work Dimensionality Reduction - Taxonomy 1 Dimensionality Reduction Supervised Algorithms Unsupervised Algorithms Principal Component Analysis Singular Value Decomposition Fractal Dimension Reduction 21
  • 22. Related Work Dimensionality Reduction - Taxonomy 2 Dimensionality Reduction Feature Extraction Feature Selection Principal Component Analysis Singular Value Decomposition Fractal Dimension Reduction Embedded Filter Wrapper 22
  • 23. Related Work 23 Terabytes? Existing methods need supervision, miss non-linear correlations, cannot handle Big Data, or work only for classification …
  • 24. Agenda Fundamental Concepts Related Work Proposed Method Evaluation Conclusion 24
  • 25. General Idea 25 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance.
  • 26. General Idea 26 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Builds partial trees for the full dataset and for its E (E-1)-dimensional projections
  • 27. General Idea 27 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. TreeID + cell spatial position Partial count of points
  • 28. General Idea 28 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Sums partial point counts and reports log(r) and log(sum2) for each tree
  • 29. General Idea 29 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Computes D2 for the full dataset and pD2 for each of its E (E-1)-dimensional projections
  • 30. General Idea 30 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. The least relevant attribute, i.e., the one not in the projection that minimizes | D2 - pD2 |
  • 31. General Idea 31 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Spots the second least relevant attribute …
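Slides 25–31 describe the selection loop: compute D2 for the current attribute set and pD2 for each projection that drops one attribute, remove the attribute whose absence changes D2 the least, and stop once only ⌈D2⌉ attributes remain. A serial, in-memory sketch of that loop (our own toy code, not the MapReduce-based Curl-Remover; the function names are ours):

```python
import math
import numpy as np

def box_counting_d2(points, n_levels=6):
    """Compact box-counting estimate of the correlation dimension D2."""
    pts = np.asarray(points, dtype=float)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    unit = (pts - mins) / np.where(maxs > mins, maxs - mins, 1.0)
    xs, ys = [], []
    for level in range(1, n_levels + 1):
        r = 2.0 ** -level
        cells = np.minimum((unit / r).astype(int), 2 ** level - 1)
        _, counts = np.unique(cells, axis=0, return_counts=True)
        xs.append(math.log(r))
        ys.append(math.log(float(np.sum(counts ** 2))))
    return float(np.polyfit(xs, ys, 1)[0])

def fractal_feature_selection(data):
    """Drop the E - ceil(D2) least relevant attributes, one at a time.

    In each round, the least relevant attribute is the one absent from
    the projection whose fractal dimension pD2 stays closest to the
    current D2, i.e. the attribute whose removal changes D2 the least.
    """
    data = np.asarray(data, dtype=float)
    kept = list(range(data.shape[1]))
    target = math.ceil(box_counting_d2(data))   # keep ceil(D2) attributes
    while len(kept) > target:
        full_d2 = box_counting_d2(data[:, kept])
        least = min(kept, key=lambda a: abs(
            full_d2 - box_counting_d2(data[:, [c for c in kept if c != a]])))
        kept.remove(least)
    return kept
```

For example, on three attributes where one is a linear combination of the other two, the intrinsic dimension is about 2, so exactly one attribute is removed.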
  • 32. General Idea 3 Main Issues 32 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance.
  • 33. General Idea 3 Main Issues 33 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. 1° Too much data to be shuffled – one data pair per cell/tree
  • 34. General Idea 3 Main Issues 34 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. 2° One data pass per irrelevant attribute
  • 35. General Idea 3 Main Issues 35 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. 3° Not enough memory for mappers
  • 36. Proposed Method Curl-Remover 36 1° Issue - Too much data to be shuffled; one data pair per cell/tree; Our solution - Two-phase dimensionality reduction: a) Serial feature selection in a tiny data sample (one reducer). Used to speed up processing only; b) All mappers project data into a fixed subspace
  • 37. 37 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance.Builds/reports N (2 or 3) tree levels of lowest resolution… Proposed Method Curl-Remover
  • 38. 38 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. … plus the points projected into the M (2 or 3) most relevant attributes of sample Proposed Method Curl-Remover
  • 39. 39 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Builds the full trees from their low resolution level cells and the projected points Proposed Method Curl-Remover
  • 40. 40 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Proposed Method Curl-Remover High resolution cells are never shuffled
  • 41. Proposed Method Curl-Remover 41 2° Issue - One data pass per irrelevant attribute; Our solution – Stores/reads the tree level of highest resolution, instead of the original data.
  • 42. 42 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Rdb = cost to read dataset; TWRtree = cost to transfer, write and read the last tree level in next reduce step; If (Rdb > TWRtree) then writes tree; Proposed Method Curl-Remover
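The persistence rule on slide 42 can be made concrete with a toy cost model (the bandwidth parameters here are hypothetical illustrations, not values from the paper):

```python
def should_persist_tree(dataset_bytes, tree_level_bytes,
                        read_bw=1.0, transfer_bw=1.0, write_bw=1.0):
    """Toy version of the slide's heuristic: persist the finest tree
    level only when re-reading the raw dataset (Rdb) would cost more
    than transferring, writing, and reading back that level (TWRtree).
    Costs are modeled as bytes divided by a bandwidth."""
    r_db = dataset_bytes / read_bw
    twr_tree = tree_level_bytes * (1 / transfer_bw + 1 / write_bw + 1 / read_bw)
    return r_db > twr_tree
```

With equal bandwidths, the rule persists the tree whenever the finest level is less than one third the size of the raw dataset, which is the typical case once correlations shrink the occupied cells.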
  • 43. 43 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Proposed Method Curl-Remover
  • 44. 44 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Writes tree’s last level in HDFS Proposed Method Curl-Remover
  • 45. 45 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Reads tree’s last level from HDFS Proposed Method Curl-Remover
  • 46. 46 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Proposed Method Curl-Remover Reads dataset only twice
  • 47. Proposed Method Curl-Remover 47 3° Issue - Not enough memory for mappers; Our solution – Sorts data in mappers and reports “tree slices” whenever needed.
  • 48. 48 Removes the E - ⌈D2⌉ least relevant attributes, one at a time in ascending order of relevance. Sorts its local points and builds “tree slices” monitoring memory consumption Proposed Method Curl-Remover
  • 50. Proposed Method Curl-Remover 50 Reports “tree slices” with very little overlap
  • 51. Agenda Fundamental Concepts Related Work Proposed Method Evaluation Conclusion 51
  • 52. Evaluation Datasets Sierpinski - Sierpinski Triangle + 1 attribute linearly correlated + 2 attributes non-linearly correlated. 5 attributes, 1.1 billion points; Sierpinski Hybrid - Sierpinski Triangle + 1 attribute non-linearly correlated + 2 random attributes. 5 attributes, 1.1 billion points; Yahoo! Network Flows - communication patterns between end-users in the web. 12 attributes, 562 million points; Astro - high-resolution cosmological simulation. 6 attributes, 1 billion points; Hepmass - physics-related dataset with particles of unknown mass. 28 attributes, 10.5 million points; Hepmass Duplicated – Hepmass + 28 correlated attributes. 56 attributes, 10.5 million points. 52
  • 55. Evaluation Comparison with sPCA - Classification 55
  • 56. Evaluation Comparison with sPCA - Classification 56 8% more accurate, 7.5% faster
  • 57. Evaluation Comparison with sPCA Percentage of Fractal Dimension after selection 57
  • 58. Agenda Fundamental Concepts Related Work Proposed Method Evaluation Conclusion 58
  • 59. Conclusions ✓ Accuracy - eliminates both linear and non-linear attribute correlations, as well as irrelevant attributes; 8% better than sPCA; Scalability - linear scalability with the data size (theoretical analysis); experiments with up to 1.1 billion points; Unsupervised - it requires neither guessing the number of attributes to remove nor a training set; Semantics - as a feature selection method, it preserves the semantics of the attributes; Generality - it suits analytical tasks in general, not only classification; 59
  • 60. Conclusions ✓ Accuracy - eliminates both linear and non-linear attribute correlations, as well as irrelevant attributes; 8% better than sPCA; ✓ Scalability - linear scalability with the data size (theoretical analysis); experiments with up to 1.1 billion points; Unsupervised - it requires neither guessing the number of attributes to remove nor a training set; Semantics - as a feature selection method, it preserves the semantics of the attributes; Generality - it suits analytical tasks in general, not only classification; 60
  • 61. Conclusions ✓ Accuracy - eliminates both linear and non-linear attribute correlations, as well as irrelevant attributes; 8% better than sPCA; ✓ Scalability - linear scalability with the data size (theoretical analysis); experiments with up to 1.1 billion points; ✓ Unsupervised - it requires neither guessing the number of attributes to remove nor a training set; Semantics - as a feature selection method, it preserves the semantics of the attributes; Generality - it suits analytical tasks in general, not only classification; 61
  • 62. Conclusions ✓ Accuracy - eliminates both linear and non-linear attribute correlations, as well as irrelevant attributes; 8% better than sPCA; ✓ Scalability - linear scalability with the data size (theoretical analysis); experiments with up to 1.1 billion points; ✓ Unsupervised - it requires neither guessing the number of attributes to remove nor a training set; ✓ Semantics - as a feature selection method, it preserves the semantics of the attributes; Generality - it suits analytical tasks in general, not only classification; 62
  • 63. Conclusions ✓ Accuracy - eliminates both linear and non-linear attribute correlations, as well as irrelevant attributes; 8% better than sPCA; ✓ Scalability - linear scalability with the data size (theoretical analysis); experiments with up to 1.1 billion points; ✓ Unsupervised - it requires neither guessing the number of attributes to remove nor a training set; ✓ Semantics - as a feature selection method, it preserves the semantics of the attributes; ✓ Generality - it suits analytical tasks in general, not only classification; 63