1.
Machine Learning<br />By: Vatsal J. Gajera<br />(09BCE010)<br />
2.
<ul><li>What is Machine Learning?</li></ul> It is a branch of artificial intelligence. It is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as data from sensors and databases.<br />
3.
<ul><li>Technical definition of machine learning:</li></ul> According to Tom M. Mitchell, a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.<br />
4.
<ul><li>A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (training data). Hence the learner must generalize from the given examples, so as to be able to produce a useful output in new cases.</li></ul><ul><li>Some machine learning systems attempt to eliminate the need for human interaction in data analysis, while others adopt a collaborative approach between human and machine. Human intuition cannot, however, be entirely eliminated, since the system's designer must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data.</li></ul><ul><li>Applications of machine learning:</li></ul>
10.
Robot locomotion (movement from one place to another).
11.
Etc.<ul><li>There are several algorithms for machine learning:</li></ul>Decision Tree Algorithm.<br />Bayesian Classification Algorithm.<br />Shortest Path Calculation Algorithm.<br />Neural Network Algorithm.<br />Genetic Algorithm.<br />
12.
Decision Tree Algorithm:<br /><ul><li>It is used in statistics, data mining, and machine learning. It uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value.
13.
The goal is to create a model that predicts the value of a target variable based on several input variables.</li></ul>
14.
2. Bayesian Classification:<br /><ul><li>Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.
15.
This classification is based on Bayes' theorem.</li></ul>3. Neural Network Algorithm:<br /><ul><li>An artificial neural network is a mathematical or computational model that is inspired by the structural and functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation.</li></ul>
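The Bayes-theorem idea behind such classifiers can be sketched in a few lines of Python. The priors and conditional probabilities below are hypothetical values chosen for illustration, not taken from the slides:

```python
# Sketch of Bayes' theorem as a Bayesian classifier uses it:
# P(C|X) is proportional to P(X|C) * P(C); the class with the
# highest posterior score wins.  All probabilities here are
# hypothetical illustration values.

def posterior_scores(priors, likelihoods, evidence):
    """Return unnormalized P(C|X) ~ P(X|C) * P(C) for each class C."""
    scores = {}
    for c in priors:
        p = priors[c]
        for attr, value in evidence.items():
            # Naive (conditional independence) assumption: multiply
            # per-attribute likelihoods.
            p *= likelihoods[c][attr][value]
        scores[c] = p
    return scores

priors = {"yes": 0.5, "no": 0.5}
likelihoods = {
    "yes": {"student": {"yes": 0.7, "no": 0.3}},
    "no":  {"student": {"yes": 0.2, "no": 0.8}},
}
scores = posterior_scores(priors, likelihoods, {"student": "yes"})
predicted = max(scores, key=scores.get)   # -> "yes" (0.35 vs 0.10)
```

Normalizing the two scores by their sum would give the actual class probabilities the slide mentions.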
16.
1. Decision Tree Induction:<br /><ul><li>During the late 1970s, J. Ross Quinlan, a researcher in machine learning, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser). Quinlan later presented C4.5, which became a benchmark to which newer supervised learning algorithms are often compared. In 1984, a group of statisticians published the book Classification and Regression Trees (CART), which describes the generation of binary decision trees.</li></ul><ul><li>ID3, C4.5, and CART adopt a greedy (i.e., nonbacktracking) approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer manner.</li></ul> Inputs:<br /><ul><li>Data partition D, which is a set of training tuples and their associated class labels.
17.
Attribute_list, the set of candidate attributes.
18.
Attribute_selection_method, a procedure to determine the splitting criterion that "best" partitions the data tuples into individual classes. This criterion consists of a splitting_attribute and, possibly, either a split point or a splitting subset.</li></ul> Output: A decision tree.<br /> Method:<br />1. Create a node N;<br />2. If the tuples in D are all of the same class C, then return N as a leaf node labeled with the class C;<br />3. If Attribute_list is empty, then return N as a leaf node labeled with the majority class in D;<br />4. Apply Attribute_selection_method to find the "best" splitting_criterion;<br />
19.
5. Label node N with splitting_criterion;<br />6. If splitting_attribute is discrete-valued and multiway splits are allowed, then attribute_list = attribute_list − splitting_attribute;<br />7. For each outcome j of splitting_criterion:<br />8. Let Dj be the set of data tuples in D satisfying outcome j;<br />9. If Dj is empty then<br />10. Attach a leaf labeled with the majority class in D to node N;<br />11. Else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N; endfor<br />12. Return N;<br />
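The method above can be sketched in Python. This is a simplified illustration, not the textbook pseudocode verbatim: the Attribute_selection_method is passed in as a function, only discrete multiway splits are handled, and empty partitions never arise because we split only on values observed in D:

```python
from collections import Counter

def majority_class(D):
    """Most common class label among tuples of (attributes, label)."""
    return Counter(label for _, label in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, select_attribute):
    """Recursive sketch of Generate_decision_tree.
    D: list of (attributes_dict, class_label) pairs.
    select_attribute: the Attribute_selection_method, e.g. one based
    on information gain."""
    classes = {label for _, label in D}
    if len(classes) == 1:                       # all tuples same class:
        return classes.pop()                    # leaf labeled with that class
    if not attribute_list:                      # no attributes left:
        return majority_class(D)                # leaf with majority class
    best = select_attribute(D, attribute_list)  # find "best" splitting criterion
    remaining = [a for a in attribute_list if a != best]  # discrete multiway split
    tree = {best: {}}                           # node N labeled with the criterion
    for value in {attrs[best] for attrs, _ in D}:         # each outcome j
        Dj = [(attrs, label) for attrs, label in D if attrs[best] == value]
        tree[best][value] = generate_decision_tree(Dj, remaining, select_attribute)
    return tree
```

A toy run with a single attribute and a trivial selection method (always pick the first candidate attribute) produces `{"age": {"young": ..., "middle": ...}}` with a class-label leaf per branch.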
21.
An attribute selection measure is a heuristic for selecting the splitting criterion that "best" separates a given data partition D, of class-labeled training tuples, into individual classes. If we were to split D into smaller partitions according to the outcomes of the splitting criterion, ideally each partition would be pure (i.e., all of the tuples that fall into a given partition would belong to the same class).
22.
There are three main measures:<br />Information Gain.<br />Gain Ratio.<br />Gini Index.<br />
23.
Example:<br /><table><tr><th>Age</th><th>Income</th><th>Student</th><th>Credit_Rating</th><th>Class: Buy_Computer</th></tr><tr><td>Young</td><td>high</td><td>no</td><td>fair</td><td>no</td></tr><tr><td>Young</td><td>high</td><td>no</td><td>excellent</td><td>no</td></tr><tr><td>Middle</td><td>high</td><td>no</td><td>fair</td><td>yes</td></tr><tr><td>Senior</td><td>medium</td><td>yes</td><td>fair</td><td>yes</td></tr><tr><td>Senior</td><td>low</td><td>yes</td><td>excellent</td><td>no</td></tr><tr><td>Middle</td><td>medium</td><td>no</td><td>fair</td><td>yes</td></tr><tr><td>Senior</td><td>medium</td><td>no</td><td>excellent</td><td>no</td></tr></table><br />
24.
Information Gain:<br />ID3 uses information gain as its attribute selection measure. The measure is based on pioneering work by Claude Shannon on information theory, which studied the value or "information content" of messages.<br /> Info(D) = -∑ pi log2(pi) (where i = 1 to m)<br /> Info_A(D) = ∑ ((|Dj| / |D|) * Info(Dj)) (where j = 1 to v)<br /> Gain(A) = Info(D) - Info_A(D)<br />
25.
In the example, class Buy_Computer has two distinct values {yes, no}, so m = 2.<br />Let class C1 correspond to "yes" and C2 to "no".<br />Here the total tuples with "yes" are 3 and with "no" are 4; total = 4 + 3 = 7.<br /> So Info(D) = -(3/7) log2(3/7) - (4/7) log2(4/7)<br /> = 0.9852<br /> Here young = 2, middle = 2, senior = 3. Among "young" both tuples are from the "no" class, among "middle" both are from the "yes" class, and among "senior" 1 is from the "yes" class and 2 are from the "no" class.<br /> So Info_age(D) = ((2/7) * (-(2/2) log2(2/2) - (0/2) log2(0/2))) +<br /> ((2/7) * (-(2/2) log2(2/2) - (0/2) log2(0/2))) +<br /> ((3/7) * (-(1/3) log2(1/3) - (2/3) log2(2/3)))<br /> = 0.3936 (taking 0 log2(0) = 0)<br /> So Gain(age) = 0.9852 - 0.3936<br /> = 0.5916<br /> As we calculated the gain for age, we have to calculate the gain for every attribute. After calculating the gains, the attribute with the highest gain value becomes our split node.<br />
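The arithmetic for Info(D), Info_age(D), and Gain(age) can be checked with a short script over the seven example tuples (log base 2, as ID3 uses):

```python
from math import log2
from collections import Counter

def info(labels):
    """Entropy Info(D) = -sum p_i * log2(p_i) over class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# (age, class) pairs from the example table
rows = [("young", "no"), ("young", "no"), ("middle", "yes"), ("senior", "yes"),
        ("senior", "no"), ("middle", "yes"), ("senior", "no")]

info_D = info([c for _, c in rows])            # ≈ 0.9852

info_age = 0.0
for v in ("young", "middle", "senior"):
    part = [c for a, c in rows if a == v]      # class labels in this partition
    info_age += len(part) / len(rows) * info(part)
# young and middle are pure (entropy 0); only senior contributes:
# info_age ≈ (3/7) * 0.9183 ≈ 0.3936

gain_age = info_D - info_age                   # ≈ 0.5917
```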
26.
[Diagram: decision tree rooted at Age, with branches Young, Middle, and Senior, each leading to the remaining attributes (Income, Student, Credit_rating) and the class Buy_Computer.]<br />
27.
2. Gain Ratio:<br /> The information gain measure is biased toward tests with many outcomes; that is, it prefers to select attributes having a large number of values. For example, consider an attribute that acts as a unique identifier, such as a product_ID. Splitting on it would give a large number of pure one-tuple partitions, so Info_product_ID(D) = 0 and the information gain is maximal, even though the attribute is useless for prediction.<br /> SplitInfo_A(D) = -∑ ((|Dj| / |D|) * log2(|Dj| / |D|)) (where j = 1 to v)<br /> GainRatio(A) = Gain(A) / SplitInfo_A(D)<br /> For our example there are 2 tuples for young, 2 for middle, and 3 for senior,<br /> so SplitInfo_age(D) = -(2/7) log2(2/7) - (2/7) log2(2/7) - (3/7) log2(3/7)<br /> = 1.5567<br />For age, Gain(age) = 0.5916,<br />so GainRatio(age) = 0.5916 / 1.5567 = 0.3800.<br />The attribute with the maximum gain ratio is selected as the split node.<br />
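The gain-ratio computation can likewise be checked numerically from the example data (recomputing the gain from scratch so the script is self-contained):

```python
from math import log2
from collections import Counter

def info(labels):
    """Entropy of a list of class labels (log base 2)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

rows = [("young", "no"), ("young", "no"), ("middle", "yes"), ("senior", "yes"),
        ("senior", "no"), ("middle", "yes"), ("senior", "no")]
values = ("young", "middle", "senior")
n = len(rows)

# SplitInfo_age(D): entropy of the split itself, ignoring class labels
sizes = [sum(1 for a, _ in rows if a == v) for v in values]   # [2, 2, 3]
split_info_age = -sum((s / n) * log2(s / n) for s in sizes)   # ≈ 1.5567

# Gain(age) recomputed from the data, then the gain ratio
info_D = info([c for _, c in rows])
info_age = sum(s / n * info([c for a, c in rows if a == v])
               for v, s in zip(values, sizes))
gain_ratio_age = (info_D - info_age) / split_info_age         # ≈ 0.3800
```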
28.
3. Gini Index:<br /> The Gini index is used in CART. The Gini index measures the impurity of D:<br /> Gini(D) = 1 - ∑ pi*pi (where i = 1 to m)<br /> The Gini index considers a binary split for each attribute.<br /> If an attribute has v possible values, then there are 2^v possible subsets.<br />For example, for income we have {low, medium, high}, {low, medium}, {low, high}, {medium, high}, {low}, {high}, {medium}, {}. But we consider only 2^v - 2 of them, since the full set and the empty set do not represent a split.<br /> Gini_A(D) = (|D1| / |D|) Gini(D1) + (|D2| / |D|) Gini(D2)<br /> and ΔGini(A) = Gini(D) - Gini_A(D)<br /> The attribute whose binary split gives the minimum Gini_A(D) is chosen as the split node, because it has the lowest impurity.<br />
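A short sketch of the Gini computation for one candidate binary split of the example data; the split {young} vs {middle, senior} is just an illustrative choice of subset:

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum p_i^2 over class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """Gini_A(D) for a binary split of D into two partitions."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Class labels from the example, split by age in {young} vs {middle, senior}
left  = ["no", "no"]                        # age = young (pure)
right = ["yes", "yes", "yes", "no", "no"]   # age in {middle, senior}

gini_D = gini(left + right)                 # 24/49 ≈ 0.4898
reduction = gini_D - gini_split(left, right)  # ΔGini ≈ 0.1469
```

Repeating this for every valid subset of every attribute and keeping the split with the smallest Gini_A(D) (largest reduction) gives the CART split node.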
29.
After calculating the selection measure, we split the decision tree at the split node chosen by whichever selection measure we use.<br /> The process continues until all tuples in a partition belong to the same class.<br /> This is how the decision tree algorithm is implemented.<br />