Chapter 6 (Part II): Alternative Classification Technologies
Chapter 6 (II) Alternative Classification Technologies - Instance-Based Approach - Ensemble Approach - Co-training Approach - Partially Supervised Approach
Instance-Based Approach: store the training records, and use the training records to predict the class labels of unseen cases directly.
Instance-Based Method: the typical approach is the k-nearest neighbor (kNN) approach. Instances are represented as points in a Euclidean space, and the k "closest" points (nearest neighbors) are used to perform classification.
Nearest Neighbor Classifiers. Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck. Given a test record, compute its distance (similarity) to the training records and choose the k "nearest" (i.e., most similar) records.
Nearest-Neighbor Classifiers require three things: the set of stored records, a metric to compute the distance between records, and the value of k, the number of nearest neighbors to retrieve. To classify an unknown record: compute its distance to the training records, identify the k nearest neighbors, and use the class labels of those neighbors to determine the class label of the unknown record (e.g., by majority vote).
Definition of Nearest Neighbor: the k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
Keys to the kNN Approach: compute how close two points are using a similarity (closeness) measure (the smaller the distance between two points, the more similar they are), and determine the class from the nearest-neighbor list by taking the majority vote of class labels among the k nearest neighbors.
Distance-Based Similarity Measure: distances are normally used to measure the similarity between two data objects. Euclidean distance: d(i,j) = sqrt( Σ_f (x_if - x_jf)² ). Properties: d(i,j) ≥ 0, d(i,i) = 0, d(i,j) = d(j,i), and d(i,j) ≤ d(i,k) + d(k,j).
Boolean type: for binary data (state values 0 or 1), build a contingency table for objects i and j, where a = number of attributes that are 1 for both objects, b = 1 for i and 0 for j, c = 0 for i and 1 for j, and d = 0 for both. Simple matching coefficient (as a distance): d(i,j) = (b + c) / (a + b + c + d).
Distance-Based Measure for Categorical (Nominal) Data. Categorical type: e.g., red, yellow, blue, green for the nominal variable color. Method: simple matching, d(i,j) = (p - m) / p, where m is the number of matching variables and p is the total number of variables.
Distance-Based Measure for Mixed Types of Data. An object (tuple) may contain all the types mentioned above; a weighted formula can combine their effects. Given p variables of different types: d(i,j) = Σ_f δ_ij^(f) d_ij^(f) / Σ_f δ_ij^(f), where δ_ij^(f) = 0 if x_if or x_jf is missing (or x_if = x_jf = 0 for an asymmetric binary variable) and 1 otherwise. If f is boolean or categorical: d_ij^(f) = 0 if x_if = x_jf, otherwise d_ij^(f) = 1. If f is interval-valued: use the normalized distance d_ij^(f) = |x_if - x_jf| / (max_h x_hf - min_h x_hf).
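To make these formulas concrete, here is a minimal Python sketch (assuming simple list-based data; the function names such as mixed_distance are my own, not from the slides) of Euclidean distance, simple matching, and the weighted mixed-type combination:

```python
import math

def euclidean(x, y):
    """Euclidean distance for interval-scaled (numeric) attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def simple_matching_distance(x, y):
    """d(i,j) = (p - m) / p for binary or categorical attribute vectors."""
    p = len(x)
    m = sum(1 for a, b in zip(x, y) if a == b)   # number of matching attributes
    return (p - m) / p

def mixed_distance(x, y, types, ranges):
    """Weighted combination d(i,j) = sum_f delta_f * d_f / sum_f delta_f.

    types[f]  is 'binary', 'categorical', or 'interval'.
    ranges[f] is the (min, max) of attribute f over the data set, used to
              normalize interval-scaled attributes; None for other types.
    Missing values (None) get delta_f = 0; the asymmetric-binary case
    x_if = x_jf = 0 is omitted here for brevity.
    """
    num, den = 0.0, 0.0
    for f, (a, b) in enumerate(zip(x, y)):
        if a is None or b is None:               # missing value: delta_f = 0
            continue
        if types[f] == 'interval':
            lo, hi = ranges[f]
            d_f = abs(a - b) / (hi - lo) if hi > lo else 0.0
        else:                                    # binary or categorical
            d_f = 0.0 if a == b else 1.0
        num += d_f
        den += 1.0
    return num / den if den > 0 else 0.0

# Example: a categorical, a binary, and an interval attribute
x = ('red', 1, 30.0)
y = ('blue', 1, 45.0)
print(mixed_distance(x, y, ['categorical', 'binary', 'interval'],
                     [None, None, (0.0, 100.0)]))   # (1 + 0 + 0.15) / 3
```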
K-Nearest Neighbor Algorithm. Input: let k be the number of nearest neighbors and D the set of training examples. For each test example z = (x', y') do: compute d(x', x), the distance between z and every example (x, y) ∈ D; select D_z ⊆ D, the set of the k training examples closest to z; assign z the majority class label among the examples in D_z. End for.
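A minimal Python sketch of this algorithm (function and variable names are my own; Euclidean distance and an unweighted majority vote are assumed):

```python
import math
from collections import Counter

def knn_classify(test_x, training_set, k):
    """Classify test_x by the majority class among its k nearest neighbors.

    training_set is a list of (x, y) pairs, where x is a numeric feature
    vector and y is the class label.
    """
    # Compute d(x', x) between the test example and every training example
    distances = [(math.dist(test_x, x), y) for x, y in training_set]
    # Select D_z, the k closest training examples
    neighbors = sorted(distances, key=lambda d: d[0])[:k]
    # Majority vote of the class labels in D_z
    votes = Counter(y for _, y in neighbors)
    return votes.most_common(1)[0][0]

# Toy example: two-dimensional points with labels 'duck' and 'goose'
training = [((1.0, 1.0), 'duck'), ((1.2, 0.8), 'duck'),
            ((4.0, 4.0), 'goose'), ((4.2, 3.9), 'goose')]
print(knn_classify((1.1, 1.0), training, k=3))   # -> 'duck'
```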
Measure for Other Types of Data. Textual data: vector space representation. A document is represented as a vector (W1, W2, ..., Wn). Binary weighting: Wi = 1 if the corresponding term i (often a word) occurs in the document, and Wi = 0 if it does not. TF (term frequency) weighting: Wi = tf_i, where tf_i is the number of times term i occurs in the document.
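As a concrete illustration of the vector space representation, here is a minimal sketch (the whitespace tokenizer and function name are simplifying assumptions):

```python
from collections import Counter

def tf_vectors(documents):
    """Represent each document as a term-frequency vector over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted(set(term for doc in tokenized for term in doc))
    vectors = [[Counter(doc)[term] for term in vocabulary] for doc in tokenized]
    return vocabulary, vectors

docs = ["data mining finds patterns in data", "text mining mines text data"]
vocab, vecs = tf_vectors(docs)
print(vocab)    # shared vocabulary (one dimension per term)
print(vecs)     # e.g., the first vector has weight 2 for the term "data"
# A binary representation would use 1 wherever the count is positive, 0 otherwise.
```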
Similarity Measure for Textual Data. Distance-based similarity measures can be applied to document vectors, but they run into problems: high dimensionality and data sparseness.
Other Similarity Measure: the "closeness" between documents is calculated as the correlation between the vectors that represent them, using measures such as the cosine of the angle between the two vectors. Cosine measure: cos(d1, d2) = (d1 · d2) / (||d1|| ||d2||).
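A minimal sketch of the cosine measure applied to two term-weight vectors (the helper name is assumed, not from the slides):

```python
import math

def cosine_similarity(d1, d2):
    """cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||) for term-weight vectors."""
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

# Term-frequency vectors over a shared vocabulary
doc1 = [3, 2, 0, 1]
doc2 = [1, 1, 0, 0]
print(cosine_similarity(doc1, doc2))   # ~0.94: the documents are very similar
```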
Discussion on the k-NN Algorithm: k-NN classifiers are lazy learners (they "learn from your neighbors"). They do not build models explicitly, unlike eager learners such as decision tree induction, so classifying unknown records is relatively expensive. They are robust to noisy data because they average over the k nearest neighbors.
Chapter 6 (II) Alternative Classification Technologies - Instance-Based Approach - Ensemble Approach - Co-training Approach - Partially Supervised Approach
Ensemble Methods: construct a set of classifiers from the training data, then predict the class label of previously unseen records by aggregating the predictions made by the multiple classifiers.
General Idea
Examples of Ensemble Approaches How to generate an ensemble of classifiers? Bagging  Boosting
Bagging: sample the training data with replacement, and build a classifier on each bootstrap sample set.
Bagging Algorithm: let k be the number of bootstrap sample sets. For i = 1 to k do: create a bootstrap sample D_i of size N; train a (base) classifier C_i on D_i. End for.
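A minimal sketch of this bagging loop (the base learner is assumed to be any classifier with fit/predict methods; scikit-learn's DecisionTreeClassifier is used here purely for illustration):

```python
import random
from collections import Counter
from sklearn.tree import DecisionTreeClassifier   # any base learner with fit/predict works

def bagging_train(X, y, k):
    """Train k base classifiers, each on a bootstrap sample of size N."""
    N = len(X)
    classifiers = []
    for _ in range(k):
        # Sampling with replacement: bootstrap sample D_i of size N
        idx = [random.randrange(N) for _ in range(N)]
        X_i = [X[j] for j in idx]
        y_i = [y[j] for j in idx]
        classifiers.append(DecisionTreeClassifier().fit(X_i, y_i))
    return classifiers

def bagging_predict(classifiers, x):
    """Aggregate the base classifiers' predictions by majority vote."""
    votes = Counter(clf.predict([x])[0] for clf in classifiers)
    return votes.most_common(1)[0][0]

# Toy usage
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
ensemble = bagging_train(X, y, k=10)
print(bagging_predict(ensemble, [1, 0]))   # -> 1
```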
Boosting: an iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records. Initially, all N records are assigned equal weights; unlike in bagging, the weights may change at the end of each boosting round.
Boosting: records that are wrongly classified have their weights increased, and records that are classified correctly have their weights decreased. For example, if example 4 is hard to classify, its weight is increased, so it is more likely to be chosen again in subsequent rounds.
Boosting (figure): the process of generating classifiers, where training set D_1 yields classifier C_1, the reweighted data form D_2, which yields C_2, and so on up to D_m and C_m.
Boosting Problems: How to update the weights of the training examples? How to combine the predictions made by each base classifier?
AdaBoost Algorithm. Given (x_j, y_j), a set of N training examples (j = 1, ..., N), the error rate of a base classifier C_i is its weighted training error ε_i = Σ_{j=1..N} w_j I(C_i(x_j) ≠ y_j), where I(p) = 1 if p is true and 0 otherwise. The importance of classifier C_i is α_i = (1/2) ln((1 - ε_i) / ε_i).
AdaBoost Algorithm. The weight update mechanism (Equation): w_j^(i+1) = (w_j^(i) / Z_i) × exp(-α_i) if C_i(x_j) = y_j, and w_j^(i+1) = (w_j^(i) / Z_i) × exp(α_i) if C_i(x_j) ≠ y_j, where Z_i is the normalization factor ensuring that Σ_j w_j^(i+1) = 1, and w_j^(i) is the weight of example (x_j, y_j) during the i-th boosting round.
AdaBoost Algorithm. Let k be the number of boosting rounds and D the set of all N examples. Initialize the weights of all N examples to 1/N. For i = 1 to k do: create training set D_i by sampling from D according to the weights W; train a base classifier C_i on D_i; apply C_i to all examples in the original set D; update the weight of each example according to the weight-update equation. End for.
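A minimal sketch of this AdaBoost procedure for two-class labels y ∈ {-1, +1} (the decision-stump base learner is an assumption, and resampling according to W is replaced by passing the weights directly to the base learner, which is close in spirit but not identical to the algorithm above):

```python
import math
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, k):
    """AdaBoost with decision stumps; labels must be -1 or +1.

    Returns a list of (alpha_i, classifier_i) pairs.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    N = len(X)
    w = np.full(N, 1.0 / N)              # initialize equal weights
    ensemble = []
    for _ in range(k):
        # Train a base classifier on the weighted data (instead of resampling)
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # Weighted error rate eps_i and classifier importance alpha_i
        eps = float(np.sum(w * (pred != y)))
        if eps >= 0.5:                   # poor classifier: reset the weights
            w = np.full(N, 1.0 / N)
            continue
        alpha = 0.5 * math.log((1.0 - eps) / max(eps, 1e-10))
        # Weight update: increase misclassified, decrease correct, then normalize
        w = w * np.exp(-alpha * y * pred)
        w = w / np.sum(w)
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, x):
    """Combine base predictions by their importance-weighted (signed) vote."""
    score = sum(alpha * clf.predict([x])[0] for alpha, clf in ensemble)
    return 1 if score >= 0 else -1

# Toy usage
X = [[0], [1], [2], [3], [4], [5]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost_train(X, y, k=5)
print(adaboost_predict(model, [4]))      # -> 1
```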
Increasing Classifier Accuracy. Bagging and boosting are general techniques for improving classifier accuracy: a series of T learned classifiers C1, ..., CT is combined with the aim of creating an improved composite classifier C*. (Figure: the data trains C1, ..., CT; to classify a new data sample, their votes are combined into a single class prediction.)
Chapter 6 (II) Alternative Classification Technologies - Instance-Based Approach - Ensemble Approach - Co-training Approach - Partially Supervised Approach
Unlabeled Data. One of the bottlenecks of classification is the labeling of a large set of examples (data records or text documents), which is often done manually and is time consuming. Can we label only a small number of examples and make use of a large number of unlabeled examples for classification?
Co-training Approach (Blum and Mitchell, CMU, 1998). Two "independent" views: split the features into two sets and train a classifier on each view. Each classifier labels data that can be used to train the other classifier, and vice versa.
Co-Training Approach (figure): the feature set X = (X1, X2) is split into subsets X1 and X2. Classification model one is trained on subset X1 of the labeled example set L, and classification model two on subset X2. Each model classifies unlabeled data, and the newly labeled data set it produces is added to the training data of the other model.
Two views: the features can be split into two independent sets (views). The instance space is X = X1 × X2, and each example is a pair x = (x1, x2). A pair of views x1, x2 satisfies view independence just in case Pr[X1 = x1 | X2 = x2, Y = y] = Pr[X1 = x1 | Y = y] and Pr[X2 = x2 | X1 = x1, Y = y] = Pr[X2 = x2 | Y = y].
Co-training algorithm: given the labeled set L and the unlabeled set U, create a pool U' of u examples drawn from U; then, for k iterations, train classifier h1 on view 1 of L and classifier h2 on view 2 of L, let each classifier label the p positive and n negative examples from U' it is most confident about, add these newly labeled examples to L, and replenish U' from U. For instance, p = 1, n = 3, k = 30, and u = 75.
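A minimal sketch of such a co-training loop (the Gaussian naïve Bayes base learner, helper names, and confidence heuristic are assumptions; the original experiment used multinomial naïve Bayes over words, and the pool U' is omitted here for brevity):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(L1, L2, y, U1, U2, p=1, n=3, k=30):
    """Sketch of co-training with two views.

    L1, L2: labeled examples under view 1 and view 2 (lists of feature vectors).
    y:      their labels (0 = negative, 1 = positive); both classes must appear.
    U1, U2: the same unlabeled examples under the two views.
    """
    L1, L2, y = list(L1), list(L2), list(y)
    unlabeled = list(range(len(U1)))            # indices still unlabeled
    for _ in range(k):
        h1 = GaussianNB().fit(L1, y)            # classifier for view 1
        h2 = GaussianNB().fit(L2, y)            # classifier for view 2
        for h, view in ((h1, U1), (h2, U2)):
            if len(unlabeled) < p + n:
                break
            proba = h.predict_proba([view[i] for i in unlabeled])[:, 1]
            order = np.argsort(proba)           # ascending confidence in class 1
            picks = [(int(c), 1) for c in order[-p:]] + \
                    [(int(c), 0) for c in order[:n]]
            # Each classifier labels its most confident examples for the other
            for c, label in sorted(picks, reverse=True):
                j = unlabeled.pop(c)
                L1.append(U1[j]); L2.append(U2[j]); y.append(label)
    return h1, h2
```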
Co-training: Example. Ca. 1,050 web pages from 4 CS departments, manually labeled into a number of categories (e.g., "course home page"). 25% of the pages were used as test data; of the remaining 75%, the labeled data were 3 positive and 9 negative examples, and the unlabeled data were the rest (ca. 770 pages). Two views: view #1 (page-based), the words in the page; view #2 (hyperlink-based), the words in the hyperlinks. Base learner: naïve Bayes classifier.
Co-training: Experimental Results. Beginning with 12 labeled web pages (course, etc.) and ca. 1,000 additional unlabeled web pages, the average error was 11.1% for the traditional approach and 5.0% for co-training.
Chapter 6 (II) Alternative Classification Technologies - Instance-Based Approach - Ensemble Approach - Co-training Approach - Partially Supervised Approach
Learning from Positive & Unlabeled Data. Positive examples: a set P of examples of a class. Unlabeled set: a set U of unlabeled (mixed) examples containing instances from P as well as instances not from P (negative examples). Build a classifier: build a classifier to classify the examples in U and/or future (test) data. The key feature of the problem is that there is no labeled negative training data. We call this problem PU-learning.
Positive and Unlabeled
Direct Marketing: a company has a database with details of its customers (positive examples), but no information on people who are not its customers, i.e., no negative examples. It wants to find people who are similar to its customers for marketing, so it buys a database containing details of other people, some of whom may be potential customers.
Novel two-step strategy. Step 1: identify a set of reliable negative documents from the unlabeled set. Step 2: build a sequence of classifiers by iteratively applying a classification algorithm and then selecting a good classifier.
Two-Step Process
Existing two-step strategy (figure): Step 1 splits the unlabeled set U into a set of reliable negatives RN and the remainder Q = U - RN. Step 2 uses P, RN, and Q to build the final classifier iteratively, or uses only P and RN to build a classifier.
Step 1: the Spy technique. Sample a certain percentage of the positive examples and put them into the unlabeled set to act as "spies". Run a classification algorithm (e.g., naïve Bayes) assuming all unlabeled examples are negative; the "spies" reveal how the actual positive examples hidden in the unlabeled set behave, so reliable negative examples can be extracted from the unlabeled set more accurately.
Step 2: run a classification algorithm iteratively using P, RN, and Q until no document in Q can be classified as negative; RN and Q are updated in each iteration.
PU-Learning uses heuristic methods: Step 1 tries to find some initial reliable negative examples from the unlabeled set, and Step 2 tries to identify more and more negative examples iteratively. The two steps together form an iterative strategy.
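A minimal sketch of the two steps under these heuristics (the Gaussian naïve Bayes base learner, the spy ratio, and all names are assumptions; classifier selection at the end of step 2 is omitted):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def spy_step1(P, U, spy_ratio=0.15, rng=None):
    """Step 1: use spies to extract a reliable negative set RN from U."""
    rng = rng or np.random.default_rng(0)
    P, U = np.asarray(P, dtype=float), np.asarray(U, dtype=float)
    spies = rng.choice(len(P), size=max(1, int(spy_ratio * len(P))), replace=False)
    P_rest = np.delete(P, spies, axis=0)
    U_plus_spies = np.vstack([U, P[spies]])
    # Treat all of U (plus the spies) as negative and train a classifier
    X = np.vstack([P_rest, U_plus_spies])
    y = np.concatenate([np.ones(len(P_rest)), np.zeros(len(U_plus_spies))])
    clf = GaussianNB().fit(X, y)
    # Threshold: the lowest "positive" probability any spy receives
    t = clf.predict_proba(P[spies])[:, 1].min()
    # Reliable negatives: unlabeled examples scoring below every spy
    u_scores = clf.predict_proba(U)[:, 1]
    return U[u_scores < t], U[u_scores >= t]      # RN, Q

def iterative_step2(P, RN, Q, max_rounds=20):
    """Step 2: iteratively move examples of Q classified as negative into RN."""
    P = np.asarray(P, dtype=float)
    for _ in range(max_rounds):
        X = np.vstack([P, RN])
        y = np.concatenate([np.ones(len(P)), np.zeros(len(RN))])
        clf = GaussianNB().fit(X, y)
        if len(Q) == 0:
            break
        pred = clf.predict(Q)
        newly_negative = Q[pred == 0]
        if len(newly_negative) == 0:     # no document in Q classified as negative
            break
        RN = np.vstack([RN, newly_negative])
        Q = Q[pred == 1]
    return clf                            # classifier built from P and the final RN

# Usage: RN, Q = spy_step1(P, U); final_clf = iterative_step2(P, RN, Q)
```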