Machine Learning
By: Vatsal J. Gajera (09BCE010)
What is Machine Learning? It is a branch of artificial intelligence: a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data, such as data from sensors and databases.
Technical definition of machine learning: According to Tom M. Mitchell, a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviours given all possible inputs is too large to be covered by the set of observed examples (training data). Hence the learner must generalize from the given examples, so as to produce a useful output in new cases. Some machine learning systems attempt to eliminate the need for human interaction in data analysis, while others adopt a collaborative approach between human and machine. Human intuition cannot, however, be entirely eliminated, since the system's designer must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data.

Applications of machine learning:
Search Engines.
Medical Diagnosis.
Stock Market Analysis.
Game Playing.
Software Engineering.
Robot locomotion (movement from one place to another).
Etc.

There are several algorithms for machine learning:
Decision Tree Algorithm.
Bayesian Classification Algorithm.
Shortest Path Calculation Algorithm.
Neural Network Algorithm.
Genetic Algorithm.
1. Decision Tree Algorithm: Used in statistics, data mining, and machine learning, it employs a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value.
The goal is to create a model that predicts the value of a target variable based on several input variables.
2. Bayesian Classification: Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.
This classification is based on Bayes' theorem.

3. Neural Network Algorithm: An artificial neural network is a mathematical or computational model inspired by the structural and functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation.
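The Bayesian idea above can be sketched with a minimal naive Bayes classifier (a common Bayesian classifier that adds a conditional-independence assumption between attributes). The data below is hypothetical, not from the slides, and no smoothing is applied, so an attribute value never seen with a class zeroes out that class:

```python
from collections import Counter, defaultdict

# Naive Bayes sketch: P(class | tuple) is proportional to
# P(class) * product of P(attribute value | class), per Bayes' theorem
# plus the "naive" independence assumption between attributes.

def train(rows, labels):
    prior = Counter(labels)               # class -> count
    cond = defaultdict(Counter)           # (attribute index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            cond[(i, label)][value] += 1
    return prior, cond

def predict(row, prior, cond, total):
    best, best_score = None, 0.0
    for label, count in prior.items():
        score = count / total             # P(class)
        for i, value in enumerate(row):
            score *= cond[(i, label)][value] / count  # P(value | class)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical training tuples: (income, student) -> buys_computer
rows = [["high", "no"], ["high", "no"], ["medium", "yes"], ["low", "yes"]]
labels = ["no", "no", "yes", "yes"]
prior, cond = train(rows, labels)
print(predict(["low", "yes"], prior, cond, len(rows)))  # -> yes
```

Real implementations add Laplace smoothing to the conditional counts so unseen attribute values do not force a probability of zero.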
1. Decision Tree Induction: During the late 1970s, J. Ross Quinlan, a researcher in machine learning, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser). Quinlan later presented C4.5, which became a benchmark to which newer supervised learning algorithms are often compared. In 1984, a group of statisticians published the book Classification and Regression Trees (CART), which describes the generation of binary decision trees. ID3, C4.5, and CART adopt a greedy (i.e., non-backtracking) approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer manner.

Inputs:
Data partition D, which is a set of training tuples and their associated class labels.
Attribute_list, the set of candidate attributes.
Attribute_selection_method, a procedure to determine the splitting criterion that "best" partitions the data tuples into individual classes. This criterion consists of a splitting_attribute and, possibly, either a split point or a splitting subset.

Output: A decision tree.

Method:
1. Create a node N;
2. If tuples in D are all of the same class, C, then return N as a leaf node labeled with the class C;
3. If Attribute_list is empty then return N as a leaf node labeled with the majority class in D;
4. Apply Attribute_selection_method to find the "best" splitting_criterion;
5. Label node N with splitting_criterion;
6. If splitting_attribute is discrete-valued and multiway splits are allowed then attribute_list = attribute_list – splitting_attribute;
7. For each outcome j of splitting_criterion:
8. Let Dj be the set of data tuples in D satisfying outcome j;
9. If Dj is empty then
10. Attach a leaf labeled with the majority class in D to node N;
11. Else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
endfor
12. Return N;
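The method above can be sketched as a short recursive function. This is a simplified reading of the pseudocode, assuming categorical attributes (addressed by column index), multiway splits, and information gain as the Attribute_selection_method; empty partitions never arise here because branches are created only for values present in D:

```python
import math
from collections import Counter

def entropy(labels):
    """Info(labels): -sum of p_i * log2(p_i) over the classes present."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def generate_decision_tree(D, labels, attribute_list):
    # Step 2: all tuples of one class -> leaf labeled with that class.
    if len(set(labels)) == 1:
        return labels[0]
    # Step 3: attribute list empty -> leaf labeled with majority class.
    if not attribute_list:
        return Counter(labels).most_common(1)[0][0]

    # Step 4: pick the attribute with the highest information gain.
    def gain(a):
        info_a = 0.0
        for v in set(row[a] for row in D):
            subset = [lab for row, lab in zip(D, labels) if row[a] == v]
            info_a += len(subset) / len(D) * entropy(subset)
        return entropy(labels) - info_a

    best = max(attribute_list, key=gain)

    # Steps 5-12: one branch per outcome of the split; recurse on each Dj.
    remaining = [a for a in attribute_list if a != best]
    node = {}
    for v in set(row[best] for row in D):
        Dj = [row for row in D if row[best] == v]
        labj = [lab for row, lab in zip(D, labels) if row[best] == v]
        node[(best, v)] = generate_decision_tree(Dj, labj, remaining)
    return node
```

Internal nodes come back as dicts keyed by (attribute, value) pairs and leaves as class labels, which keeps the sketch self-contained without a node class.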
Attribute Selection Measures:
An attribute selection measure is a heuristic (experience-based) technique for selecting the splitting criterion that "best" separates a given data partition D of class-labeled training tuples into individual classes. If we were to split D into smaller partitions according to the outcomes of the splitting criterion, ideally each partition would be pure (i.e., all of the tuples that fall into a given partition would belong to the same class).
There are three main measures for it:
Information Gain.
Gain Ratio.
Gini Index.
Example:

Age     Income   Student  Credit_Rating  Class: Buy_Computer
Young   high     no       fair           no
Young   high     no       excellent      no
Middle  high     no       fair           yes
Senior  medium   yes      fair           yes
Senior  low      yes      excellent      no
Middle  medium   no       fair           yes
Senior  medium   no       excellent      no
Information Gain: ID3 uses information gain as its attribute selection measure. The measure is based on pioneering work by Claude Shannon on information theory, which studied the value or "information content" of messages.

Info(D) = -∑ pi log2(pi)                (where i = 1 to m)
InfoA(D) = ∑ (|Dj| / |D|) × Info(Dj)    (where j = 1 to v)
Gain(A) = Info(D) – InfoA(D)
In the example, the class buy_computer has two distinct values, {yes, no}, so m = 2. Let class C1 correspond to yes and C2 to no. Here the total number of tuples with "yes" is 3 and with "no" is 4; total = 4 + 3 = 7. So:

Info(D) = -(3/7) log2(3/7) – (4/7) log2(4/7) = 0.9852

Here young = 2, middle = 2, senior = 3. Among young, both are from the "no" class; among middle, both are from the "yes" class; among senior, 1 is from "yes" and 2 are from "no". So (taking 0 log2 0 = 0):

Infoage(D) = (2/7) × (-(2/2) log2(2/2) – (0/2) log2(0/2))
           + (2/7) × (-(2/2) log2(2/2) – (0/2) log2(0/2))
           + (3/7) × (-(1/3) log2(1/3) – (2/3) log2(2/3))
           = 0.3936

So Gain(age) = 0.9852 – 0.3936 = 0.5916

As we calculated the gain for age, we have to calculate the gain for every attribute. After calculating the gains, the attribute with the highest gain value becomes our split node.
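The arithmetic above can be checked with a short script (a sketch over just the Age column and class labels of the example table; base-2 logarithms assumed, as is standard for information gain):

```python
import math
from collections import Counter

# (age, class) pairs from the seven example tuples.
rows = [
    ("young", "no"), ("young", "no"), ("middle", "yes"), ("senior", "yes"),
    ("senior", "no"), ("middle", "yes"), ("senior", "no"),
]

def info(labels):
    """Info(D): expected bits needed to classify a tuple in D."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

labels = [c for _, c in rows]
info_d = info(labels)                       # about 0.985

# Info_age(D): weighted entropy of each age partition.
info_age = sum(
    len(part) / len(rows) * info(part)
    for v in {a for a, _ in rows}
    for part in [[c for a, c in rows if a == v]]
)                                           # about 0.394

gain_age = info_d - info_age                # about 0.592
print(round(info_d, 4), round(info_age, 4), round(gain_age, 4))
```

The young and middle partitions are pure, so only the senior partition contributes entropy, which is why Info_age(D) is so much smaller than Info(D).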
