# Data mining using MATLAB codes

How to use MATLAB and Weka in data mining.

Published in: Technology

### Data mining using MATLAB codes

1. Data Mining Using MATLAB Codes, by Ahmad Karawash
2. Overview: network, data used, creating the graph, displaying the graph, learning the parameters, inference, conclusion
3. Network
4. Data used: the file asia10000.mat, which contains 10,000 records for the Chest Clinic example.
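5. Create the graph: an 8-node DAG in which every node is discrete with 2 states.
   - `N = 8;`
   - `dag = zeros(N,N);`
   - `A = 1; S = 2; T = 3; L = 4; B = 5; E = 6; X = 7; D = 8;`
   - `dag(A,T) = 1;`
   - `dag(S,[L B]) = 1;`
   - `dag(T,E) = 1;`
   - `dag(L,E) = 1;`
   - `dag(E,[X D]) = 1;`
   - `dag(S,B) = 1; % redundant: already set by dag(S,[L B])`
   - `dag(B,D) = 1;`
   - `discrete_nodes = 1:N;`
   - `node_sizes = [2 2 2 2 2 2 2 2];`
   - `bnet = mk_bnet(dag, node_sizes, discrete_nodes);`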
6. Display the graph
   - `names = {'VisitToAsia', 'Smoker', 'HasTuberculosis', 'HasLungCancer', 'HasBronchitis', 'TuberculosisOrCancer', 'PositiveX-Ray', 'Dyspnoea'};`
   - `carre_rond = [1 1 1 1 1 1 1 1];`
   - `draw_graph(bnet.dag, names, carre_rond);`
   - `title('medical domain');`
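7. Learning the parameters. The code below assumes that asia10000.mat defines a variable named `asia10000` with one record per row; `learn_params` expects one case per column, hence the transpose.
   - `load asia10000.mat;`
   - `nsamples = size(asia10000, 1);`
   - `bnet.CPD{E} = tabular_CPD(bnet, E);`
   - `bnet.CPD{T} = tabular_CPD(bnet, T);`
   - `bnet.CPD{L} = tabular_CPD(bnet, L);`
   - `bnet.CPD{S} = tabular_CPD(bnet, S);`
   - `bnet.CPD{A} = tabular_CPD(bnet, A);`
   - `bnet.CPD{D} = tabular_CPD(bnet, D);`
   - `bnet.CPD{B} = tabular_CPD(bnet, B);`
   - `bnet.CPD{X} = tabular_CPD(bnet, X);`
   - `bnet = learn_params(bnet, asia10000');`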
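8. Load the CPTs (the slide also shows the network graph with nodes A, S, T, L, B, E, X, D)
   - `CPT = cell(1,N);`
   - `for i = 1:N`
   - `s = struct(bnet.CPD{i}); % convert the CPD object to a struct to read its fields`
   - `CPT{i} = s.CPT;`
   - `end`
   - `celldisp(CPT)`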
9. Inference (via MATLAB code)
   - `engine = jtree_inf_engine(bnet);`
   - `evidence = cell(1,N);`
   - `evidence{T} = 1; % T = false => has no tuberculosis`
   - `evidence{L} = 2; % L = true => has lung cancer`
   - `evidence{B} = 1; % B = false => has no bronchitis`
   - `[engine, loglik] = enter_evidence(engine, evidence);`
   - `marg = marginal_nodes(engine, E); % query E, the node whose probability is printed`
   - `% Displaying the result of inference`
   - `fprintf('\nResult of the inference\n');`
   - `fprintf('P(E | T=1, L=2, B=1) = [%3.5f %3.5f]\n', marg.T);`
   - Result of the inference as reported on the slide: P(E | T=1, L=2, B=1) = [1.0000 0.0000]. Since 1 > 0, the slide concludes that E is true given this evidence, which is the expected outcome because E holds whenever T or L holds. The network can therefore be used for classification.
10. Conclusion: with the learned network we can now compute P(anything | anything), that is, the probability of any variable given evidence on any other variables.
11. Weka overview: data used, decision tree, naive Bayes classifier, k-means clustering
12. Data used: classification uses an ARFF file about diabetes; clustering uses the ARFF file bmw-training.arff.
13. Decision tree build
14. Decision tree build: classification with the decision tree gives about 84% correctly classified instances and about 15% incorrectly classified.
15. Decision tree draw
16. BNC (naive Bayes classifier) build
17. BNC build: classification with the naive Bayes classifier gives about 76% correctly classified instances and about 23% incorrectly classified.
18. Compare DT and BNC: BNC misclassifies more instances than DT.
19. K-means clustering
20. K-means clustering: the data is divided into 2 clusters with the iteration count set to 500; the interpretation of the result is to be discussed.
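
The edges defined on slide 5 determine how the joint distribution over the eight variables factorises: each node contributes one conditional probability given its parents in the DAG. Written out for this network (a direct reading of the `dag` matrix, using the node letters from the slides):

$$
P(A,S,T,L,B,E,X,D) = P(A)\,P(S)\,P(T \mid A)\,P(L \mid S)\,P(B \mid S)\,P(E \mid T,L)\,P(X \mid E)\,P(D \mid E,B)
$$

Each factor corresponds to one of the tabular CPDs created on slide 7.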
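
For fully observed data, learning a tabular CPD by maximum likelihood amounts to counting and normalising. The sketch below shows that idea in plain MATLAB for a single edge S -> L. The data matrix is a toy example invented for illustration (it is not taken from asia10000.mat), and the column layout is an assumption.

```matlab
% Toy data, made up for illustration: one row per record, columns = [S L].
% States follow the coding used on the slides: 1 = false, 2 = true.
data = [1 1; 1 1; 1 1; 1 2; 2 1; 2 2; 2 2; 2 2];

S = 1; L = 2;      % column indices in this toy matrix (hypothetical layout)
nStates = 2;       % every node is binary

% counts(s, l) = number of records with S = s and L = l
counts = accumarray(data(:, [S L]), 1, [nStates nStates]);

% Normalise each row so that cpt(s, :) = P(L | S = s)
cpt = counts ./ repmat(sum(counts, 2), 1, nStates);

disp(cpt);
```

Each row of `cpt` sums to 1, one row per state of the parent S.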
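
The clustering on slide 20 was run in Weka, but the underlying idea can be sketched in a few lines of plain MATLAB. This is a minimal version of the standard k-means loop (assign each point to its nearest centre, then recompute the centres), not Weka's implementation. The toy points and the deterministic initialisation are assumptions made for illustration, and the 500 from the slide is treated here as an upper bound on iterations.

```matlab
% Toy 2-D points, made up for illustration (not the bmw-training.arff data).
X = [1.0 1.1; 0.9 1.0; 1.2 0.8; 5.0 5.2; 5.1 4.9; 4.8 5.0];
k = 2;             % number of clusters, as on slide 20
maxIter = 500;     % treated as an iteration cap (assumption)

n = size(X, 1);
centers = X([1 n], :);     % deterministic start: first and last point
labels = zeros(n, 1);

for iter = 1:maxIter
    % Assignment step: squared Euclidean distance to every centre
    dist = zeros(n, k);
    for j = 1:k
        delta = X - repmat(centers(j, :), n, 1);
        dist(:, j) = sum(delta .^ 2, 2);
    end
    [~, newLabels] = min(dist, [], 2);

    % Stop once no point changes cluster
    if isequal(newLabels, labels)
        break;
    end
    labels = newLabels;

    % Update step: move each centre to the mean of its assigned points
    for j = 1:k
        if any(labels == j)
            centers(j, :) = mean(X(labels == j, :), 1);
        end
    end
end

disp(labels');
```

On this toy input the loop stops long before the cap, because the assignments no longer change after the first update.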