Machine learning


Published in: Automotive, Technology, Education

  1. 1. Machine Learning Abdelrahman Salah & Abdelrahman Badr & Mohamed Sweelam Artificial Intelligence
  2. 2. Abstract With advances in the field of artificial intelligence, machine learning emerged as the study of constructing systems that can learn. The core of machine learning is representation and generalization. Representation concerns how data instances, and the functions evaluated on them, are described. Generalization is the property that the system performs well on unseen data. Machine Learning | Abdelrahman Salah & Abdelrahman Badr & Mohamed Sweelam Page 1 of 15
  3. 3. Introduction What is Machine Learning? To solve a problem on a computer, we need an algorithm. An algorithm is a sequence of instructions that is carried out to transform an input into an output. For example, for sorting, the input is a sequence of numbers in arbitrary order and the output is the same numbers in sorted order. For some problems, however, we do not have an algorithm. Distinguishing spam from non-spam email is one example: we know that the input is an email document, which is simply a file of characters, and that the output should be a yes/no answer indicating whether the message is spam. We do not have a fixed algorithm to transform this input into that output, because what counts as spam varies from individual to individual and changes over time. What we need is for the computer (machine) to extract the algorithm automatically. We do not need to learn an algorithm for sorting numbers, since we already have algorithms for that; but there are many applications that do not have an algorithm but do
  4. 4. have example data. So machine learning is the field of study that gives computers the ability to learn without being explicitly programmed. Machine learning focuses on prediction, based on known properties learned from previous data. Data mining, on the other hand, focuses on the discovery of previously unknown properties of the data. The two terms are often confused with each other. We may not be able to identify the process that generates the data completely, but we can construct a good approximation to it. This approximation might not explain every detail, but it can still account for part of the data. When identifying the full process is impossible, we can still detect certain patterns, and detecting patterns is central to machine learning. These patterns help us understand the process, and we can use them to make predictions. Supervised Learning Supervised learning entails learning a mapping between a set of input variables X and an output variable Y, and applying this mapping to predict the outputs for unseen data. Supervised learning is the most important methodology in machine learning, and it is also of great importance in the processing of multimedia data. Supervision: the data (observations) are labeled with pre-defined classes, as if a "teacher" gave the classes (supervision).
  5. 5. Supervised learning has two main steps. Learning: learn a model using the training data. Testing: test the model using unseen test data to assess the model's accuracy (see Figure 1). Figure 1 By learning we mean the following: given a data set D, a task T, and a performance measure M, a computer system is said to learn from D to perform the task T if, after learning, the system's performance on T improves as measured by M. In other words, the learned model helps the system perform T better than it would with no learning.
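The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data set is made up, the "model" is a single threshold on one input value, and the helper names (`split_data`, `learn_threshold`, `accuracy`) are hypothetical rather than taken from the slides.

```python
import random

def split_data(examples, test_fraction=0.25, seed=0):
    """Shuffle labeled examples and hold out a fraction as unseen test data."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predict(x, threshold):
    """The model: predict class 1 when x is above the threshold."""
    return 1 if x > threshold else 0

def accuracy(examples, threshold):
    """Performance measure M: fraction of examples classified correctly."""
    hits = sum(1 for x, y in examples if predict(x, threshold) == y)
    return hits / len(examples)

def learn_threshold(train):
    """Learning step: choose the cut point with the best training accuracy."""
    xs = sorted({x for x, _ in train})
    # Candidate cut points are the midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(train, t))

# Made-up data set D: values 0..9 belong to class 0, values 20..29 to class 1.
data = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(20, 30)]

train, test = split_data(data)
threshold = learn_threshold(train)            # learning
train_accuracy = accuracy(train, threshold)
test_accuracy = accuracy(test, threshold)     # testing on unseen data
print(f"threshold={threshold}, train={train_accuracy:.2f}, test={test_accuracy:.2f}")
```

The test examples are never shown to `learn_threshold`, so `test_accuracy` estimates how well the learned model generalizes.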
  6. 6. Classification A loan is an amount of money given to a client by a bank or another financial institution, to be paid back with interest, generally in installments. It is important for the bank to predict the risk of a loan, that is, the probability that the customer will not pay the amount back. This is both to make sure that the bank makes a profit and to avoid burdening a client with a loan beyond their financial capacity. The bank calculates the risk given the amount of the loan and information about the client, such as financial capacity, profession, age, and financial history. From data on past applications, the aim is to infer a general rule that encodes the association between a client's information and his or her risk. This is what a machine learning system does. It is an example of a classification problem with two classes: low-risk and high-risk clients. The information about the client is the input to the classifier, whose task is to assign the input to one of the two classes. After training on past data, a classification rule could be formulated as: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk, for suitable values of θ1 and θ2 (see Figure 2).
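The rule above can be written directly as code. The sketch below is illustrative only: the client records are invented, and `fit_thresholds` uses a brute-force search over observed values as a stand-in for whatever training procedure a real system would use.

```python
def classify(income, savings, theta1, theta2):
    """Apply the rule from the text to one client."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

def rule_accuracy(clients, theta1, theta2):
    """Fraction of past clients whose known label the rule reproduces."""
    hits = sum(1 for income, savings, label in clients
               if classify(income, savings, theta1, theta2) == label)
    return hits / len(clients)

def fit_thresholds(clients):
    """Try every observed income/savings value as a threshold; keep the best pair."""
    incomes = sorted({c[0] for c in clients})
    savings_values = sorted({c[1] for c in clients})
    best = (incomes[0], savings_values[0])
    best_acc = rule_accuracy(clients, *best)
    for t1 in incomes:
        for t2 in savings_values:
            acc = rule_accuracy(clients, t1, t2)
            if acc > best_acc:
                best, best_acc = (t1, t2), acc
    return best

# Hypothetical past applications: (income, savings, observed risk class).
past_clients = [
    (20, 5, "high-risk"), (35, 2, "high-risk"), (28, 40, "high-risk"),
    (70, 8, "high-risk"), (50, 30, "low-risk"), (60, 25, "low-risk"),
    (45, 20, "low-risk"), (80, 50, "low-risk"),
]

theta1, theta2 = fit_thresholds(past_clients)
print(theta1, theta2, classify(55, 22, theta1, theta2))
```

A new applicant is then classified with the fitted θ1 and θ2 without re-examining the past data.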
  7. 7. Figure 2 In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs. Regression Suppose we want a system that can predict the price of a used car. The inputs are the car attributes (brand, year, engine, and other information) that affect a car's price. The output is the price of the car. Problems such as this, where the output is a numerical value, are regression problems.
  8. 8. In statistics, regression is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, for example linear regression and simple regression. Let X denote the car attributes and Y the car's price. We can collect training data, and the machine learning program fits a function to this data to learn Y as a function of X: Y = WX + W0 for suitable values of W and W0. Figure 3: a training dataset of used cars and the function fitted. For simplicity, mileage is taken as the only input attribute.
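As a rough sketch of what "fitting" means for the linear model Y = WX + W0, the values of W and W0 can be obtained with ordinary least squares. The slides do not say which fitting method is used, so least squares is an assumption here, and the mileage and price figures are made up.

```python
def fit_line(xs, ys):
    """Least-squares estimates of W and W0 for the model Y = W*X + W0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx                 # slope
    w0 = mean_y - w * mean_x      # intercept
    return w, w0

# Made-up training data: mileage in thousands of km, price in thousands.
mileage = [10.0, 50.0, 100.0, 150.0]
price = [19.0, 15.0, 10.0, 5.0]

w, w0 = fit_line(mileage, price)
predicted = w * 80.0 + w0         # predicted price for a car with 80,000 km
print(f"W={w:.3f}, W0={w0:.3f}, predicted={predicted:.2f}")
```

With these invented points, which lie exactly on a line, the fit recovers a negative slope: price falls as mileage rises.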
  9. 9. In cases where the linear model is too restrictive, one can use, for example, a quadratic, Y = W2 X^2 + W1 X + W0, or a higher-order polynomial, or any other nonlinear function of the input, again optimizing the parameters for the best fit. Unsupervised Learning In supervised learning the aim is to learn a mapping from the input to an output whose correct values are provided by a supervisor. In unsupervised learning there is no supervisor and we only have input data. The data have no target attribute; we want to explore the data to find intrinsic structure in them. Unsupervised learning studies how systems can learn to represent input patterns in a way that reflects the statistical structure of the overall collection of input patterns. Unsupervised learning is important since it is likely to be much more common in the brain than supervised learning. For instance, there are about 10^6 photoreceptors in each eye whose activities are always changing with the visual world and which provide information about the objects in the world, how they are presented, what the lighting conditions are, and so on. Developmental and adult plasticity are critical in vision: structural and physiological properties of synapses in the neocortex are known to be substantially influenced by the patterns of activity that occur in sensory neurons. However, essentially none of the information about the contents of scenes is available during learning. This makes unsupervised methods essential and, equally, allows them to be used as computational models for synaptic adaptation.
  10. 10. Clustering Clustering is a technique for finding similarity groups in data, called clusters. It groups data instances that are similar to each other into one cluster and instances that are very different from each other into different clusters. For historical reasons, clustering is often considered synonymous with unsupervised learning. The example data set has three natural groups of data points, i.e., 3 natural clusters (Figure 4). Figure 4
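One common way to find such groups is the k-means algorithm. The slides do not name a specific clustering method, so k-means is used here purely as an illustration; the points are made up, and the farthest-first initialization is a simplifying assumption chosen to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first_centroids(points, k):
    """Start from the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids

def k_means(points, k, iterations=20):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = farthest_first_centroids(points, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Made-up 2-D data with three visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
          (1.0, 9.0), (2.0, 8.5), (1.5, 9.5)]

labels, centroids = k_means(points, k=3)
print(labels)
```

No labels are given to the algorithm; the grouping comes only from the distances between the points.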
  11. 11. What is Clustering used for? Let us see an example: given a collection of text documents, we want to organize them according to their content similarities to produce a topic hierarchy. In fact, clustering is one of the most widely used data mining techniques. It has a long history and is used in many fields, e.g., medicine, psychology, and data visualization. In recent years, due to the rapid increase in online documents, text clustering has become very important. Reinforcement Learning In some applications, the output of the system is a sequence of actions. In such a case, a single action is not important; what is important is the policy, that is, the sequence of correct actions to reach the goal. There is no such thing as the best action in any intermediate state; an action is good if it is part of a good policy. In such a case, the machine learning program should be able to assess the goodness of policies and learn from past good action sequences to be able to generate a policy. Such learning methods are called reinforcement learning algorithms. A good example is game playing, where a single move by itself is not that important; it is the sequence of right moves (actions) that is good. A move is good if it is part of a good game-playing policy. Game playing is an important research area in both artificial intelligence and machine learning.
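To make the idea of learning a policy from rewarded action sequences concrete, here is a small sketch using tabular Q-learning. The slides do not name an algorithm, so Q-learning is an assumed choice, and the five-cell corridor "game", its reward of 1 at the goal, and all parameter values are invented for illustration.

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # index 0: step left, index 1: step right

def step(state, action):
    """Move within the corridor; reward 1 only when the goal is reached."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Estimate action values Q[state][action] from randomly explored episodes."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            a = rng.randrange(2)   # explore uniformly; Q-learning is off-policy
            next_state, reward = step(state, ACTIONS[a])
            future = 0.0 if next_state == GOAL else gamma * max(q[next_state])
            q[state][a] += alpha * (reward + future - q[state][a])
            state = next_state
            if state == GOAL:
                break
    return q

def greedy_policy(q):
    """The learned policy: in each non-goal cell, take the higher-valued action."""
    return ["left" if q[s][0] > q[s][1] else "right" for s in range(GOAL)]

q_values = q_learning()
policy = greedy_policy(q_values)
print(policy)
```

Only the final step into the goal is rewarded, yet earlier steps to the right acquire value because they are part of action sequences that reach the goal.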
  12. 12. Machine Learning Applications • Abstract: You may have heard that today's tech companies are using machine learning to identify and filter email spam (Google), blacklist and penalize spam blogs so that users get good search results (also Google), recommend products specifically for you (Amazon), and fight fraud (IBM). This section is not about those uses. It is about the new, perhaps surprising ways that companies (and non-profits) are using machine learning to make smarter, faster, better products. Identify people who have a high degree of psychopathy based on Twitter usage: The aim of the competition is to determine to what degree it is possible to predict people with a sufficiently high degree of psychopathy based on Twitter usage and Linguistic Inquiry.
  13. 13. The organizers provide all interested participants with an anonymised dataset of users' self-assessed psychopathy scores together with 337 variables derived from functions of Twitter information, usage, and linguistic analysis. Psychopathy scores are based on a checklist developed by Professor Del Paulhus at the University of British Columbia. The model should aim to identify people scoring high in psychopathy, defined for the purpose of this competition as 2 standard deviations above a mean of 1.98. This accounts for roughly 3% of the entire sample, so the challenge with this dataset is developing a model that works with a highly imbalanced dataset. The best performing model(s) will be formally cited in future papers. The authors of the winning model may also be invited to attend future conferences to discuss their model. The intention of this research is to separate fact from fiction and examine just what can be predicted from social media use and how this information might be used, both for good and for bad. As an organization, the Online Privacy Foundation works to raise awareness of online privacy issues and empower people to make informed choices about what they do online. We hope you'll support our mission and take part in this competition.
  14. 14. Face detection and recognition nowadays: Face detection refers to the science of locating the faces of people in a scene. It is a critical element of focusing software in cameras, as well as the first step of facial recognition in unconstrained scenes: a face must be detected before it can be compared to known faces and identified. It can also be used by newer software to organize photos on your computer. On cameras, face detection helps establish how the focus should work in a picture. Depending on the camera, face detection technology can identify at least 10 faces in a scene. Once the faces are identified, they can be prioritized, and the focus can automatically adjust to feature the high-priority faces. The camera's face detection is often shown to the user through one or more rectangles overlaid on the scene. With the face detected, the technology can also adjust the exposure to make sure the subject is properly shown, including compensating for dark scenes or scenes with an illuminated background. For evaluating face detectors, a test set of images with human faces is available, recorded under natural conditions, i.e. varying illumination and complex backgrounds. The eye positions have been set manually (and are included in the set) for calculating the accuracy of a face detector. A formula is presented to normalize the decision of a match or mismatch. This is, to my knowledge, the first attempt to finally create a real test scenario with precise rules on how to calculate the accuracy of a face detector, open for all to compare their results in a scientific way!
  15. 15. Prostate Cancer: 97 men (observations): predict the log of prostate specific antigen (lpsa) from a number of measurements, including log cancer volume (lcavol). This is a supervised learning problem, known as a regression problem, because the outcome measurement is quantitative. • Ω is some operation on input f(x). The data for this example, displayed in the figure below, come from a study by Stamey et al. (1989) that examined the correlation between the level of prostate specific antigen (PSA) and a number of clinical measures in 97 men who were about to receive a radical prostatectomy. The goal is to predict the log of PSA (lpsa) from a number of measurements, including log cancer volume (lcavol), log prostate weight (lweight), age, log of benign prostatic hyperplasia amount (lbph), seminal vesicle invasion (svi), log of capsular penetration (lcp), Gleason score (gleason), and percent of Gleason scores 4 or 5 (pgg45). Figure 1.1 is a scatterplot matrix of the variables. Some correlations with lpsa are evident, but a good predictive model is difficult to construct by eye.