SlideShare a Scribd company logo
1 of 21
Robust Unsupervised Feature Selection on
Networked Data
Reading Group: Manqing
Paper resource: SDM 2016
Introduction
Networked data: encodes pairwise relations among instances in a network.
Features:
1. Structural interactions
2. High-dimensional features
Challenges:
1. The curse of dimensionality.
2. Memory storage requirements and computational costs for data analytics.
3. The existence of irrelevant, redundant and noisy features.
Introduction
Why feature selection?
Feature selection, as a data preprocessing step has shown to be effective in
preparing high-dimensional data for many data mining tasks such as
sentiment analysis and node classification.
How feature selection?
According to the availability of labels, feature selection methods consist of
supervised methods and unsupervised methods.
Introduction
1. Supervised
Minimum redundancy feature selection from microarray gene expression data. (C. Ding et al.
2005)
Efficient and robust feature selectionvia joint 2,1-norm minimization. (F. Nie et al. 2010)
Regression shrinkage and selection via the lasso. (R. Tibshirani 1996)
1. Unsupervised
Unsupervised feature selection for multi-cluster data. (D. Cai et al. 2010)
Laplacian score for feature selection. (X. He et al. 2005)
Unsupervised feature selection using nonnegative spectral analysis. (Z. Li et al. 2012)
Introduction
As it is easy to amass substantial amounts of unlabeled data while label information is costly to
obtain, unsupervised feature selection has received increasingly attention in the past few years.
And it exploit different criteria to define the relevance of features such as: data similarity, local
discriminative information, and data reconstruction error.
Challenges for using current model to analyze network data:
1. In networks, data instances are not independent and identically distributed but inherently
interconnected with each other. Meanwhile, they are often associated with some content
features.
2. In addition to noisy features in the content space, link information is prevalent and also
contains a lot of noise.
Problem Statement A robust unsupervised
feature selection framework
NetFS:
First, to capture the inherent
interactions among
networked instances,
introduce the concept of
latent representations to
uncover some hidden
attributes encoded in the
network structure.
Second, reduce the negative
effects from noisy links:
embedding the latent
representation learning into
the feature selection phase.
Specifically, content
information can help
mitigate noisy links.
Problem Statement
denote a set of n
linked instances in the network.
represent the network
structure of
For undirected network,
For directed network,
Feature set:
Content information:
Robust Unsupervised Feature Selection for Networked Data - NetFS
Modeling link information with latent representation.
Embedding latent representation learning into feature selection.
Optimization solution.
Convergence analysis.
1. Modeling link information with latent representation
By a symmetric nonnegative matrix factorization model (SymNMF).
Mathematically, it decomposes the adjacency matrix A into a product of a
nonnegative matrix U and its transpose U′ in a low-dimensional latent space:
2. Embedding latent Representation Learning into Feature selection
As latent factors encode some hidden attributes of instances, they should be related to some
features (or attributes) of networked instances.
Therefore, we take U as a constraint to model the content information through a multivariate linear
regression model:
2. Embedding latent Representation Learning into Feature selection
3. Optimization solution
Equation (3.4) is not convex in both U and W.
Besides, is not smooth.
And based on paper: Efficient and robust feature selection via joint 2,1-norms minimization, it
adopted alternating optimization scheme.
Thus, when U is fixed, the objective funciton is convex.
Therefore, we take the derivative of with respect to W to be zero, then we have:
3. Optimization solution
4. Convergence Analysis
Goal: to prove
Experiments
Datasets:BlogCatalog Flickr Epinions
Experiments
Datasets
Experimental settings
Quality of Selected Features by Netfs
Effect of Parameters
Experimental Settings
Using NMI (normalized mutual information) and ACC (accuracy) to see the clustering performance.
[Notes: Let C and C’ denote the clustering results from ground truth class labels and the predicted
cluster labels, respectively.
The definition:
The mutual information between two clusters C and C’ is:
NMI:
ACC:
So the higher the ACC and NMI values are, the better the feature selection performance is.
Experimental settings
Effect of parameters
alpha controls the sparsity of the model while beta balances the latent
representation learning and feature selection phase.
Future Work
First, in this work, the authors use social media as a test bed to evaluate the
proposed NetFS framework, we also would like to validate the proposed
framework on other kinds of networks such as gene networks, citation
networks.
Second, real-world networks are usually not static but evolve over time such
that both the network structure and the content information may change.
Future works can include how to perform feature selection on dynamic
networks in the future.
Thank you

More Related Content

What's hot

Neural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variablesNeural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variablesJonathan D'Cruz
 
A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...IEEEFINALYEARPROJECTS
 
Predicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksPredicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksAnvardh Nanduri
 
Quantum computing and machine learning overview
Quantum computing and machine learning overviewQuantum computing and machine learning overview
Quantum computing and machine learning overviewColleen Farrelly
 
Deep Learning for Graphs
Deep Learning for GraphsDeep Learning for Graphs
Deep Learning for GraphsDeepLearningBlr
 
Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks IJECEIAES
 
A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...IEEEFINALYEARPROJECTS
 
Pattern recognition system based on support vector machines
Pattern recognition system based on support vector machinesPattern recognition system based on support vector machines
Pattern recognition system based on support vector machinesAlexander Decker
 
IRJET-Multimodal Image Classification through Band and K-Means Clustering
IRJET-Multimodal Image Classification through Band and K-Means ClusteringIRJET-Multimodal Image Classification through Band and K-Means Clustering
IRJET-Multimodal Image Classification through Band and K-Means ClusteringIRJET Journal
 
Q UANTUM C LUSTERING -B ASED F EATURE SUBSET S ELECTION FOR MAMMOGRAPHIC I...
Q UANTUM  C LUSTERING -B ASED  F EATURE SUBSET  S ELECTION FOR MAMMOGRAPHIC I...Q UANTUM  C LUSTERING -B ASED  F EATURE SUBSET  S ELECTION FOR MAMMOGRAPHIC I...
Q UANTUM C LUSTERING -B ASED F EATURE SUBSET S ELECTION FOR MAMMOGRAPHIC I...ijcsit
 
Prediction Model Using Web Usage Mining Techniques
Prediction Model Using Web Usage Mining TechniquesPrediction Model Using Web Usage Mining Techniques
Prediction Model Using Web Usage Mining TechniquesEditor IJCATR
 
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASETISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASETijmnct
 
Convolutional Neural Network for Text Classification
Convolutional Neural Network for Text ClassificationConvolutional Neural Network for Text Classification
Convolutional Neural Network for Text ClassificationAnaïs Addad
 
Probability Density Functions of the Packet Length for Computer Networks With...
Probability Density Functions of the Packet Length for Computer Networks With...Probability Density Functions of the Packet Length for Computer Networks With...
Probability Density Functions of the Packet Length for Computer Networks With...IJCNCJournal
 
16-mmap-ml-sigmod
16-mmap-ml-sigmod16-mmap-ml-sigmod
16-mmap-ml-sigmodDezhi Fang
 
Predicting electricity consumption using hidden parameters
Predicting electricity consumption using hidden parametersPredicting electricity consumption using hidden parameters
Predicting electricity consumption using hidden parametersIJLT EMAS
 
Educating Researchers Using the CHiMaD Benchmark Problems
Educating Researchers Using the CHiMaD Benchmark ProblemsEducating Researchers Using the CHiMaD Benchmark Problems
Educating Researchers Using the CHiMaD Benchmark ProblemsPFHub PFHub
 

What's hot (20)

Neural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variablesNeural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variables
 
Rohit 10103543
Rohit 10103543Rohit 10103543
Rohit 10103543
 
A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...
 
Predicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksPredicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networks
 
Quantum computing and machine learning overview
Quantum computing and machine learning overviewQuantum computing and machine learning overview
Quantum computing and machine learning overview
 
Deep Learning for Graphs
Deep Learning for GraphsDeep Learning for Graphs
Deep Learning for Graphs
 
Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks
 
A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...
 
Pattern recognition system based on support vector machines
Pattern recognition system based on support vector machinesPattern recognition system based on support vector machines
Pattern recognition system based on support vector machines
 
IRJET-Multimodal Image Classification through Band and K-Means Clustering
IRJET-Multimodal Image Classification through Band and K-Means ClusteringIRJET-Multimodal Image Classification through Band and K-Means Clustering
IRJET-Multimodal Image Classification through Band and K-Means Clustering
 
Q UANTUM C LUSTERING -B ASED F EATURE SUBSET S ELECTION FOR MAMMOGRAPHIC I...
Q UANTUM  C LUSTERING -B ASED  F EATURE SUBSET  S ELECTION FOR MAMMOGRAPHIC I...Q UANTUM  C LUSTERING -B ASED  F EATURE SUBSET  S ELECTION FOR MAMMOGRAPHIC I...
Q UANTUM C LUSTERING -B ASED F EATURE SUBSET S ELECTION FOR MAMMOGRAPHIC I...
 
Prediction Model Using Web Usage Mining Techniques
Prediction Model Using Web Usage Mining TechniquesPrediction Model Using Web Usage Mining Techniques
Prediction Model Using Web Usage Mining Techniques
 
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASETISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
 
Convolutional Neural Network for Text Classification
Convolutional Neural Network for Text ClassificationConvolutional Neural Network for Text Classification
Convolutional Neural Network for Text Classification
 
Probability Density Functions of the Packet Length for Computer Networks With...
Probability Density Functions of the Packet Length for Computer Networks With...Probability Density Functions of the Packet Length for Computer Networks With...
Probability Density Functions of the Packet Length for Computer Networks With...
 
16-mmap-ml-sigmod
16-mmap-ml-sigmod16-mmap-ml-sigmod
16-mmap-ml-sigmod
 
Predicting electricity consumption using hidden parameters
Predicting electricity consumption using hidden parametersPredicting electricity consumption using hidden parameters
Predicting electricity consumption using hidden parameters
 
Topology Note
Topology NoteTopology Note
Topology Note
 
Educating Researchers Using the CHiMaD Benchmark Problems
Educating Researchers Using the CHiMaD Benchmark ProblemsEducating Researchers Using the CHiMaD Benchmark Problems
Educating Researchers Using the CHiMaD Benchmark Problems
 
1104.0355
1104.03551104.0355
1104.0355
 

Viewers also liked

Readinggroup xiang 24112016
Readinggroup xiang 24112016Readinggroup xiang 24112016
Readinggroup xiang 24112016Xiang Zhang
 
HCI for Kids Termpaper
HCI for Kids TermpaperHCI for Kids Termpaper
HCI for Kids TermpaperFredric Mack
 
Animal husbandry department of chhattisgarh
Animal husbandry department of chhattisgarhAnimal husbandry department of chhattisgarh
Animal husbandry department of chhattisgarhravindersahur
 
Apuntes de competencias digitales
Apuntes de competencias digitalesApuntes de competencias digitales
Apuntes de competencias digitalesjorge luis garcia
 
Para una madre
Para una madrePara una madre
Para una madreCarmen
 
คู่มือการใช้งาน App : Zello
คู่มือการใช้งาน App : Zelloคู่มือการใช้งาน App : Zello
คู่มือการใช้งาน App : ZelloSuchart Sriwichai
 
Vanessa ruiz historia
Vanessa ruiz historiaVanessa ruiz historia
Vanessa ruiz historiavanerch
 
Jorge Lema-bloque acadèmico
Jorge Lema-bloque acadèmicoJorge Lema-bloque acadèmico
Jorge Lema-bloque acadèmicoespoch
 

Viewers also liked (20)

Readinggroup xiang 24112016
Readinggroup xiang 24112016Readinggroup xiang 24112016
Readinggroup xiang 24112016
 
Ppt guangyang
Ppt guangyangPpt guangyang
Ppt guangyang
 
Ppt xiaodong
Ppt xiaodongPpt xiaodong
Ppt xiaodong
 
Ppt shuai
Ppt shuaiPpt shuai
Ppt shuai
 
HCI for Kids Termpaper
HCI for Kids TermpaperHCI for Kids Termpaper
HCI for Kids Termpaper
 
粤商能源基金
粤商能源基金粤商能源基金
粤商能源基金
 
Animal husbandry department of chhattisgarh
Animal husbandry department of chhattisgarhAnimal husbandry department of chhattisgarh
Animal husbandry department of chhattisgarh
 
OJELA article
OJELA articleOJELA article
OJELA article
 
Apuntes de competencias digitales
Apuntes de competencias digitalesApuntes de competencias digitales
Apuntes de competencias digitales
 
02 e
02 e02 e
02 e
 
Para una madre
Para una madrePara una madre
Para una madre
 
คู่มือการใช้งาน App : Zello
คู่มือการใช้งาน App : Zelloคู่มือการใช้งาน App : Zello
คู่มือการใช้งาน App : Zello
 
Reunimos nos aqui
Reunimos nos aquiReunimos nos aqui
Reunimos nos aqui
 
Vanessa ruiz historia
Vanessa ruiz historiaVanessa ruiz historia
Vanessa ruiz historia
 
Simplificación del negocio
Simplificación del negocioSimplificación del negocio
Simplificación del negocio
 
Antropología
AntropologíaAntropología
Antropología
 
Le corbusier
Le corbusierLe corbusier
Le corbusier
 
Qumica
QumicaQumica
Qumica
 
Jorge Lema-bloque acadèmico
Jorge Lema-bloque acadèmicoJorge Lema-bloque acadèmico
Jorge Lema-bloque acadèmico
 
SlideShare 101
SlideShare 101SlideShare 101
SlideShare 101
 

Similar to Ppt manqing

A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSEditor IJCATR
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESVikash Kumar
 
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...IJNSA Journal
 
Machine Learning for Efficient Neighbor Selection in ...
Machine Learning for Efficient Neighbor Selection in ...Machine Learning for Efficient Neighbor Selection in ...
Machine Learning for Efficient Neighbor Selection in ...butest
 
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERIJCSEA Journal
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Improving Classifier Accuracy using Unlabeled Data..doc
Improving Classifier Accuracy using Unlabeled Data..docImproving Classifier Accuracy using Unlabeled Data..doc
Improving Classifier Accuracy using Unlabeled Data..docbutest
 
Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities inmoresmile
 
=Acs07 tania experim= copy
=Acs07 tania experim= copy=Acs07 tania experim= copy
=Acs07 tania experim= copyLaura Cruz Reyes
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Fatimakhan325
 
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...journalBEEI
 
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILESAN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILEScscpconf
 
An efficient feature selection in
An efficient feature selection inAn efficient feature selection in
An efficient feature selection incsandit
 
SECURING BGP BY HANDLING DYNAMIC NETWORK BEHAVIOR AND UNBALANCED DATASETS
SECURING BGP BY HANDLING DYNAMIC NETWORK BEHAVIOR AND UNBALANCED DATASETSSECURING BGP BY HANDLING DYNAMIC NETWORK BEHAVIOR AND UNBALANCED DATASETS
SECURING BGP BY HANDLING DYNAMIC NETWORK BEHAVIOR AND UNBALANCED DATASETSIJCNCJournal
 
Securing BGP by Handling Dynamic Network Behavior and Unbalanced Datasets
Securing BGP by Handling Dynamic Network Behavior and Unbalanced DatasetsSecuring BGP by Handling Dynamic Network Behavior and Unbalanced Datasets
Securing BGP by Handling Dynamic Network Behavior and Unbalanced DatasetsIJCNCJournal
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...ssuser4b1f48
 
SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.bhavinecindus
 
network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learningsun peiyuan
 

Similar to Ppt manqing (20)

A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
 
Machine Learning for Efficient Neighbor Selection in ...
Machine Learning for Efficient Neighbor Selection in ...Machine Learning for Efficient Neighbor Selection in ...
Machine Learning for Efficient Neighbor Selection in ...
 
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Improving Classifier Accuracy using Unlabeled Data..doc
Improving Classifier Accuracy using Unlabeled Data..docImproving Classifier Accuracy using Unlabeled Data..doc
Improving Classifier Accuracy using Unlabeled Data..doc
 
Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities in
 
=Acs07 tania experim= copy
=Acs07 tania experim= copy=Acs07 tania experim= copy
=Acs07 tania experim= copy
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)
 
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
 
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILESAN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
 
An efficient feature selection in
An efficient feature selection inAn efficient feature selection in
An efficient feature selection in
 
SECURING BGP BY HANDLING DYNAMIC NETWORK BEHAVIOR AND UNBALANCED DATASETS
SECURING BGP BY HANDLING DYNAMIC NETWORK BEHAVIOR AND UNBALANCED DATASETSSECURING BGP BY HANDLING DYNAMIC NETWORK BEHAVIOR AND UNBALANCED DATASETS
SECURING BGP BY HANDLING DYNAMIC NETWORK BEHAVIOR AND UNBALANCED DATASETS
 
Securing BGP by Handling Dynamic Network Behavior and Unbalanced Datasets
Securing BGP by Handling Dynamic Network Behavior and Unbalanced DatasetsSecuring BGP by Handling Dynamic Network Behavior and Unbalanced Datasets
Securing BGP by Handling Dynamic Network Behavior and Unbalanced Datasets
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
 
G124549
G124549G124549
G124549
 
Dy4301752755
Dy4301752755Dy4301752755
Dy4301752755
 
SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.
 
network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learning
 

Recently uploaded

Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 

Recently uploaded (20)

Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 

Ppt manqing

  • 1. Robust Unsupervised Feature Selection on Networked Data Reading Group: Manqing Paper resource: SDM 2016
  • 2. Introduction Networked data: encodes pairwise relations among instances in a network. Features: 1. Structural interactions 2. High-dimensional features Challenges: 1. The curse of dimensionality. 2. Memory storage requirements and computational costs for data analytics. 3. The existence of irrelevant, redundant and noisy features.
  • 3. Introduction Why feature selection? Feature selection, as a data preprocessing step has shown to be effective in preparing high-dimensional data for many data mining tasks such as sentiment analysis and node classification. How feature selection? According to the availability of labels, feature selection methods consist of supervised methods and unsupervised methods.
  • 4. Introduction 1. Supervised Minimum redundancy feature selection from microarray gene expression data. (C. Ding et al. 2005) Efficient and robust feature selectionvia joint 2,1-norm minimization. (F. Nie et al. 2010) Regression shrinkage and selection via the lasso. (R. Tibshirani 1996) 1. Unsupervised Unsupervised feature selection for multi-cluster data. (D. Cai et al. 2010) Laplacian score for feature selection. (X. He et al. 2005) Unsupervised feature selection using nonnegative spectral analysis. (Z. Li et al. 2012)
  • 5. Introduction As it is easy to amass substantial amounts of unlabeled data while label information is costly to obtain, unsupervised feature selection has received increasingly attention in the past few years. And it exploit different criteria to define the relevance of features such as: data similarity, local discriminative information, and data reconstruction error. Challenges for using current model to analyze network data: 1. In networks, data instances are not independent and identically distributed but inherently interconnected with each other. Meanwhile, they are often associated with some content features. 2. In addition to noisy features in the content space, link information is prevalent and also contains a lot of noise.
  • 6. Problem Statement A robust unsupervised feature selection framework NetFS: First, to capture the inherent interactions among networked instances, introduce the concept of latent representations to uncover some hidden attributes encoded in the network structure. Second, reduce the negative effects from noisy links: embedding the latent representation learning into the feature selection phase. Specifically, content information can help mitigate noisy links.
  • 7. Problem Statement denote a set of n linked instances in the network. represent the network structure of For undirected network, For directed network, Feature set: Content information:
  • 8. Robust Unsupervised Feature Selection for Networked Data - NetFS Modeling link information with latent representation. Embedding latent representation learning into feature selection. Optimization solution. Convergence analysis.
  • 9. 1. Modeling link information with latent representation By a symmetric nonnegative matrix factorization model (SymNMF). Mathematically, it decomposes the adjacency matrix A into a product of a nonnegative matrix U and its transpose U′ in a low-dimensional latent space:
  • 10. 2. Embedding latent Representation Learning into Feature selection As latent factors encode some hidden attributes of instances, they should be related to some features (or attributes) of networked instances. Therefore, we take U as a constraint to model the content information through a multivariate linear regression model:
  • 11. 2. Embedding latent Representation Learning into Feature selection
  • 12. 3. Optimization solution Equation (3.4) is not convex in both U and W. Besides, is not smooth. And based on paper: Efficient and robust feature selection via joint 2,1-norms minimization, it adopted alternating optimization scheme. Thus, when U is fixed, the objective funciton is convex. Therefore, we take the derivative of with respect to W to be zero, then we have:
  • 16. Experiments Datasets Experimental settings Quality of Selected Features by Netfs Effect of Parameters
  • 17. Experimental Settings Using NMI (normalized mutual information) and ACC (accuracy) to see the clustering performance. [Notes: Let C and C’ denote the clustering results from ground truth class labels and the predicted cluster labels, respectively. The definition: The mutual information between two clusters C and C’ is: NMI: ACC: So the higher the ACC and NMI values are, the better the feature selection performance is.
  • 19. Effect of parameters alpha controls the sparsity of the model while beta balances the latent representation learning and feature selection phase.
  • 20. Future Work First, in this work, the authors use social media as a test bed to evaluate the proposed NetFS framework, we also would like to validate the proposed framework on other kinds of networks such as gene networks, citation networks. Second, real-world networks are usually not static but evolve over time such that both the network structure and the content information may change. Future works can include how to perform feature selection on dynamic networks in the future.