
Lebanon SoftShore Artificial Intelligence Seminar - March 28, 2014



Lebanon SoftShore organized a seminar on Artificial Intelligence at USEK on March 28, 2014. This is the presentation of Dr Hayssam Serhan

Published in: Technology, Education


  1. 1. Artificial Intelligence Presented by Dr. Hayssam Serhan
  2. 2. Outline: Overview of AI; Neural Networks; Fuzzy Logic; Expert Systems; R Language (Introduction)
  3. 3. AI Computing Caution: AI is NOT magic. AI is a unique approach to programming computers. A thinking or conscious computer is still far off on the digital horizon.
  4. 4. AI Objectives Making machines more useful by Making them SMARTER Understanding intelligence shall be our First Goal
  5. 5. Intelligent Behavior Learn from experience Apply knowledge acquired from experience Handle complex situations Solve problems when important information is missing React quickly and correctly to a new situation Be creative and imaginative Use heuristics
  6. 6. Major Branches of AI Robotics & Perceptive Systems  Mechanical and computer devices that perform tedious tasks with high precision. Games Playing  programming computers to play games. The greatest advances have occurred in the field of games playing. Natural Language Processing (NLP)  Computers understand and react to statements and commands made in a “natural” language.
  7. 7. Major Branches of AI Expert System (ES) programming computers to make decisions in real-life Neural Network  Computer system that can act like or simulate the functioning of the human brain.  Unsupervised learning.  Supervised learning.
  8. 8. Machine Learning Learning System  Machine learning is the study of computer algorithms that improve automatically through experience  Computer changes how it functions or reacts to situations based on feedback. “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E” Tom Mitchell (1998)
  9. 9. Human VS Artificial Intelligence - Pros Human Intelligence  Intuition, Common sense, Judgment, Creativity, etc.  The ability to demonstrate their intelligence by communicating effectively  Reasoning and Critical thinking Artificial Intelligence  Ability to simulate human behavior and cognitive processes  Capture and preserve human expertise  Fast Response.
  10. 10. Human VS Artificial Intelligence - Cons Human Intelligence  Humans are fallible  They have limited knowledge  Serial information processing proceeds very slowly in the brain  Humans are unable to retain large amounts of data Artificial Intelligence  No "common sense"  Cannot readily deal with "mixed" knowledge  May have high development costs  May raise legal and ethical concerns
  11. 11. Conventional Computing VS Artificial Intelligence Artificial Intelligence  AI software uses the techniques of search and pattern matching  Programmers design AI software to give the computer only the problem, not the steps necessary to solve it Conventional computing  Conventional computer software follows a logical series of steps to reach a conclusion  Computer programmers originally designed software that accomplished tasks by completing algorithms
  12. 12. Knowledge Representation & Limits The number of atomic facts that the average person knows is astronomical. Building a complete knowledge base of commonsense requires enormous amounts of engineering. Much of what people know is not represented as "facts" that they could express verbally
  13. 13. Conclusion Intelligent Agents must be able to set goals and achieve them. They need a way to visualize the future and be able to make choices. Currently, no computers exhibit full artificial intelligence. Early AI researchers developed algorithms that require enormous computational resources. The search for more efficient problem-solving algorithms is a high priority for AI research.
  14. 14. Neural Networks Traditional computers cannot work around the failure of even a single transistor. With the biological designs, the algorithms are ever changing, allowing the system to continuously adapt and work around failures to complete tasks.
  15. 15. “We’re moving from engineering computing systems to something that has many of the characteristics of biological computing” Larry Smarr, an astrophysicist who directs the California Institute for Telecommunications and Information Technology
  16. 16. “The new approach, used in both hardware and software, is being driven by the explosion of scientific knowledge about the brain. But scientists are still far from fully understanding how brains function” Kwabena Boahen, a computer scientist who leads Stanford’s Brains in Silicon research program
  17. 17. “The largest class this fall at Stanford was a graduate level machine-learning course covering both statistical and biological approaches, taught by the computer scientist Andrew Ng. More than 760 students enrolled” “Everyone knows there is something big happening, and they’re trying to find out what it is.” Terry Sejnowski, a computational neuroscientist at the Salk Institute
  18. 18. Human Brain Movie
  19. 19. Nervous Systems The human brain contains ~10^11 neurons. Each neuron is connected to ~10^4 others. Neurons are slower than logic gates:  10^-9 s for semiconductors  10^-3 s for biological neurons Energy efficiency of the brain is estimated at 10^-16 Joules / operation / sec. The best energy efficiency of computers is 10^-6 Joules / operation / sec.
  20. 20. Nervous Systems it takes on average between 100 and 200 msec to recognize a familiar face, it takes days to process much simpler tasks with conventional computers Some scientists compared the brain with a “complex, nonlinear, parallel computer”.
  21. 21. IBM Supercomputer – Compass I.B.M. announced last year that it had built a supercomputer simulation of the brain (Compass) It encompassed roughly 10 billion neurons. It ran about 1,500 times more slowly than an actual brain. Further, it required several megawatts of power, compared with just 20 watts of power used by the biological brain. “attempting to simulate a brain, at the same speed would require a flow of electricity in a conventional computer that is equivalent to what is needed to power both San Francisco and New York,” Dr. Modha said
  22. 22. Google & DeepMind Google has acquired DeepMind for $400M. DeepMind has not yet developed any commercial products. DeepMind's main asset appears to be its personnel. DeepMind claims that it combines “the best techniques from machine learning and systems neuroscience to build powerful general-purpose learning algorithms.”
  23. 23. Google & AI Google researchers were able to get a machine-learning algorithm based on neural networks to perform an identification task. The network scanned a database of 10 million images, and in doing so trained itself to recognize cats. In June, Google said it had used those neural network techniques to develop a new search service to help customers find specific photos more accurately.
  24. 24. Applications Pattern classification Object recognition Function approximation Data compression Time series analysis and forecast . . .
  25. 25. Neurons The main purpose of neurons is to receive, analyze and pass on information in the form of signals (electric pulses). When a neuron sends information, we say that the neuron “fires”.
  26. 26. Structure of a Biological Neuron
  27. 27. Artificial Neuron
  28. 28. Artificial Multilayer Neural Network
  29. 29. Artificial Neural Networks Movie
  30. 30. Multilayer Perceptron [Figure: inputs x1, x2, ..., xm enter the Input Layer, followed by a Hidden Layer and an Output Layer producing y1, y2, ..., yn]
  31. 31. Knowledge and Memory [Figure: network with inputs x1, x2, ..., xm and outputs y1, y2, ..., yn] The output behavior of a network is determined by the weights. Weights: the memory of an NN. Knowledge: distributed across the network. A large number of nodes increases the storage “capacity”, ensures that the knowledge is robust, and provides fault tolerance. New information is stored by changing weights.
  32. 32. Example: Pattern Classification [Figure: network mapping input pattern x = (x1, x2, ..., xm) to output pattern y = (y1, y2, ..., yn)] Function: x → y. The NN’s output is used to distinguish between and recognize different input patterns. Different output patterns correspond to particular classes of input patterns. Networks with hidden layers can be used for solving more complex problems than just linear pattern classification.
  33. 33. Neural Networks Learning Rules Learning Rules for Multiple-Layered Perceptron Networks
  34. 34. Supervised Learning Goals The goal of any supervised learning algorithm is to find a function that best maps a set of inputs to its correct output. An example would be a simple classification task, where the input is an image of an animal (or the characteristics of this animal), and the correct output would be the name of the animal.
  35. 35. Training Neural Network: Back-Propagation Supervised learning method, Requires a dataset of the desired output for many inputs, making up the training set, Backpropagation requires that the activation function used by the artificial neurons (or "nodes") be differentiable.
  36. 36. Motivation A multi-layered network can create internal representations and learn different features per layer. The first layer may be responsible for learning the orientations of lines using the inputs from the individual pixels in the image. The second layer may combine the features learned in the first layer and learn to identify simple shapes. Each higher layer learns more and more abstract features that can be used to classify the image. Each layer finds patterns in the layer below it, and it is this ability to create internal representations that are independent of outside input that gives multi-layered networks their power.
  37. 37. Backpropagation Learning Algo. The learning algorithm can be divided into two phases: Phase 1: Propagation  Forward propagation of a training pattern's input through the neural network in order to generate the propagation's output activations.  Backward propagation of the propagation's output activations through the neural network using the training pattern target in order to generate the deltas of all output and hidden neurons. Phase 2: Weight update  Subtract a ratio (percentage) of the gradient from the weight.  This ratio (percentage) influences the speed and quality of learning; it is called the learning rate. The greater the ratio, the faster the neuron trains; the lower the ratio, the more accurate the training is.
  38. 38. Algorithm
      initialize network weights (often small random values)
      do
        forEach training example ex
          prediction = neural-net-output(network, ex) // forward pass
          actual = teacher-output(ex)
          compute error (prediction - actual) at the output units
          compute the weight deltas for all weights from the output layer to the hidden layer // backward pass
          compute the weight deltas for all weights from the hidden layer to the input layer // backward pass continued
          update network weights
      until all examples classified correctly or another stopping criterion satisfied
      return the network
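The two phases (propagation, then weight update) and the training loop can be sketched in R for a very small network. This is an illustrative sketch only, not the presenter's code: the XOR toy data, the 2-4-1 layer sizes, the sigmoid activation, the squared-error loss, the learning rate `eta`, and every variable name are assumptions made for the example.

```r
# Minimal backpropagation sketch for a 2-4-1 network (sizes are assumptions).
set.seed(1)
sigmoid <- function(z) 1 / (1 + exp(-z))  # differentiable activation

# Toy training set (XOR): rows of X are examples, y holds the teacher outputs.
X <- matrix(c(0, 0,  0, 1,  1, 0,  1, 1), ncol = 2, byrow = TRUE)
y <- matrix(c(0, 1, 1, 0), ncol = 1)

# Initialize network weights with small random values.
W1 <- matrix(rnorm(2 * 4, sd = 0.5), nrow = 2)  # input -> hidden
b1 <- rep(0, 4)
W2 <- matrix(rnorm(4 * 1, sd = 0.5), nrow = 4)  # hidden -> output
b2 <- 0
eta <- 0.5                                      # learning rate

forward <- function(X) {
  H <- sigmoid(sweep(X %*% W1, 2, b1, "+"))  # hidden activations
  O <- sigmoid(H %*% W2 + b2)                # output activations
  list(H = H, O = O)
}

loss_before <- mean((forward(X)$O - y)^2)

for (epoch in 1:5000) {
  # Phase 1: forward pass, then backward pass to get the deltas.
  out <- forward(X)
  delta_out <- (out$O - y) * out$O * (1 - out$O)            # output neurons
  delta_hid <- (delta_out %*% t(W2)) * out$H * (1 - out$H)  # hidden neurons
  # Phase 2: subtract a fraction (eta) of the gradient from each weight.
  W2 <- W2 - eta * t(out$H) %*% delta_out
  b2 <- b2 - eta * sum(delta_out)
  W1 <- W1 - eta * t(X) %*% delta_hid
  b1 <- b1 - eta * colSums(delta_hid)
}

loss_after <- mean((forward(X)$O - y)^2)
```

Comparing `loss_before` with `loss_after` shows whether training reduced the error; the stopping criterion here is simply a fixed number of epochs, one of several possible choices.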
  39. 39. Neural Network: Simulation
  40. 40. Neuromorphic Processors These new processors consist of electronic components that can be connected by wires that mimic biological synapses. They are based on large groups of neuron-like elements and are known as neuromorphic processors. They are not “programmed.” The connections between circuits are “weighted” according to correlations in data that the processor has already “learned.” Those weights are then altered as data flows in to the chip, causing them to change their values and to “spike.” That generates a signal that travels to other components and, in reaction, changes the neural network.
  41. 41. Conclusion Neural Network technology offers more natural interaction with the real world. Neural Networks can:  learn and adapt to changes in a problem’s environment,  establish patterns in situations where rules are not known,  deal with fuzzy or incomplete information. However, they lack explanation facilities and usually act as a black box. The process of training neural networks with current technologies is still slow.
  42. 42. Motion and manipulation: Robotics The field of robotics is closely related to AI.
  43. 43. Motion and Manipulation: Robotics Intelligence is required for robots to be able to handle such tasks as object manipulation and navigation, with sub-problems of:  localization  mapping  and motion
  44. 44. Robot Quick Description Each leg consists of 7 DOFs  3 DOFs – Active for the HIP  1 DOF – Active for the KNEE  2 DOFs – Active for the ANKLE  1 DOF – Passive for the FOOT
  45. 45. Robot Control Algorithm Université de Versailles St Quentin
  46. 46. Neural Network: A More Complicated Design (Muscle Modelling) [Equation: input vector I(t) built from three samples each of the signals r, y and e]
  47. 47. Learning with plant Identification
  48. 48. Université de Versailles St Quentin – Université Libanaise [Figure labels: Extension, Plantarflexion]
  49. 49. Robot: Walking – Movies & Stability
  50. 50. Fuzzy Logic Very important technology dealing with vague, imprecise and uncertain knowledge and data
  51. 51. Fuzzy Logic Fuzzy logic or fuzzy set theory was introduced by Professor Lotfi Zadeh Human experts do not usually think in probability values, but in such terms as often, generally, sometimes, occasionally and rarely. At the heart of fuzzy logic lies the concept of a linguistic variable Linguistic variables are words rather than numbers Fuzzy logic provides the way to break through the computational bottlenecks of traditional expert systems. Eventually, fuzzy theory, ignored in the West, was taken seriously in the East – by the Japanese
  52. 52. Fuzzy Logic: Motivation Modeling of imprecise concepts:  Age, Weight, Height, … Modeling of imprecise dependencies:  If Temperature is low and Oil is cheap then crank up the heating system Origin of Information:  Modeling of Expert Knowledge  Representation of information extracted from inherently imprecise data
  53. 53. Characteristic Functions: Crisp Sets Classical Sets can be described by a characteristic function: Example: A = {x | a ≤ x ≤ b}
  54. 54. Characteristic Functions: Fuzzy Sets Fuzzy Sets are described by a membership function: Example:
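The difference between a characteristic function and a membership function can be illustrated with a short R sketch. The crisp set follows the slide's example A = {x | a ≤ x ≤ b}; the triangular shape of the fuzzy set, the function names, and the numeric values are assumptions chosen for illustration, since the slides' own membership function is not preserved in this transcript.

```r
# Crisp set A = {x | a <= x <= b}: membership is either 0 or 1.
crisp_membership <- function(x, a, b) as.numeric(x >= a & x <= b)

# Fuzzy set: triangular membership rising from `left` to 1 at `peak`,
# then falling back to 0 at `right` (shape assumed for illustration).
triangular_membership <- function(x, left, peak, right) {
  rising  <- (x - left) / (peak - left)
  falling <- (right - x) / (right - peak)
  pmax(0, pmin(rising, falling))
}

x <- c(10, 20, 25, 30, 40)
crisp <- crisp_membership(x, a = 20, b = 30)
fuzzy <- triangular_membership(x, left = 10, peak = 25, right = 40)
print(rbind(x, crisp, fuzzy))
```

The crisp function jumps between 0 and 1 at the set boundaries, while the fuzzy one assigns intermediate degrees to values near the boundaries.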
  55. 55. Linguistic Variables and Values
  56. 56. Linguistic values & Context
  57. 57. Fuzzy Rule System
  58. 58. Basic Elements of a Fuzzy Logic System 1- Fuzzification 2- Fuzzy-Inference 3- Defuzzification [Figure: the three steps connect the Numerical Level and the Linguistic Level]
  59. 59. Fuzzy Rule Systems: Example 1
  60. 60. Application of Fuzzy Logic
  61. 61. 1. Fuzzification: Linguistic Variables. Term Definitions: Distance := {far, medium, close, zero, neg_close} Angle := {pos_big, pos_small, zero, neg_small, neg_big} Power := {pos_high, pos_medium, zero, neg_medium, neg_high} Membership Function Definition: [Figure: membership functions µ over Angle from -90° to 90° (terms neg_big, neg_small, zero, pos_small, pos_big), where the input 4° yields the degrees 0.8 and 0.2; and over Distance [yards] from -10 to 30 (terms neg_close, zero, close, medium, far), where the input 12m yields the degrees 0.9 and 0.1]
  62. 62. 2. Fuzzy-Inference: “IF-THEN”-Rules. Computation of the “IF-THEN”-Rules: #1: IF Distance = medium AND Angle = pos_small THEN Power = pos_medium #2: IF Distance = medium AND Angle = zero THEN Power = zero #3: IF Distance = far AND Angle = zero THEN Power = pos_medium #4: ……. Aggregation: Computing the “IF”-Part. Composition: Computing the “THEN”-Part. The Rules of the Fuzzy Logic System Are the “Laws” It Executes!
  63. 63. 2. Fuzzy-Inference: Composition Result for the Linguistic Variable "Power": pos_high with the degree 0.0 pos_medium with the degree 0.8 ( = max{ 0.8, 0.1 } ) zero with the degree 0.2 neg_medium with the degree 0.0 neg_high with the degree 0.0 Composition Computes How Each Rule Influences the Output Variables !
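The aggregation and composition steps can be traced in a few lines of R. This sketch assumes the common min operator for AND and the max operator for combining rules with the same conclusion. The assignment of the fuzzified degrees to terms (Distance: medium 0.9, far 0.1; Angle: pos_small 0.8, zero 0.2) is also an assumption, inferred from the composition result stated on the slide.

```r
# Fuzzified inputs (assumed assignment of the degrees to the linguistic terms).
distance <- c(medium = 0.9, far = 0.1)
angle    <- c(pos_small = 0.8, zero = 0.2)

# Aggregation: the "IF" part of each rule, with AND taken as the minimum.
rule1 <- min(distance[["medium"]], angle[["pos_small"]])  # THEN Power = pos_medium
rule2 <- min(distance[["medium"]], angle[["zero"]])       # THEN Power = zero
rule3 <- min(distance[["far"]],    angle[["zero"]])       # THEN Power = pos_medium

# Composition: rules that share a conclusion are combined with the maximum.
power <- c(pos_high = 0,
           pos_medium = max(rule1, rule3),
           zero = rule2,
           neg_medium = 0,
           neg_high = 0)
print(power)
```

Under these assumptions the degrees for `pos_medium` (0.8) and `zero` (0.2) agree with the values listed on the slide.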
  64. 64. 3. Defuzzification: Finding a Compromise Using “Center-of-Maximum” [Figure: membership functions µ over Power [Kilowatts] from -30 to 30 (terms neg_high, neg_medium, zero, pos_medium, pos_high), with the defuzzified result at 6.4 KW] “Balancing” Out the Result!
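Center-of-Maximum defuzzification can be sketched as a weighted average of the positions where each output term reaches its maximum, weighted by the inferred degrees. The degrees below are the composition result of the example. The peak positions in `peak_kw` are NOT given in this transcript; they are hypothetical values, with the peak of `pos_medium` set to 8 kW because that choice is consistent with the 6.4 KW result shown on the slide.

```r
# Degrees from the composition step of the example.
degree <- c(neg_high = 0, neg_medium = 0, zero = 0.2, pos_medium = 0.8, pos_high = 0)

# ASSUMPTION: position (in kW) of the maximum of each Power term.
# These values are hypothetical; the slides' figure is not preserved.
peak_kw <- c(neg_high = -25, neg_medium = -8, zero = 0, pos_medium = 8, pos_high = 25)

# Center-of-Maximum: average of the peaks, weighted by the degrees.
center_of_maximum <- function(degree, peak) sum(degree * peak) / sum(degree)

power_kw <- center_of_maximum(degree, peak_kw[names(degree)])
print(power_kw)
```

Only terms with a non-zero degree influence the result, so the output is a compromise between `zero` and `pos_medium` that lies closer to the term with the higher degree.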
  65. 65. Fuzzy Logic: Simulation
  66. 66. Improved Computational Power Fuzzy rule-based systems perform faster than conventional expert systems Fuzzy Systems require fewer rules. A fuzzy expert system merges the rules, making them more powerful. Lotfi Zadeh believes that in a few years most expert systems will use fuzzy logic to solve highly nonlinear and computationally difficult problems.
  67. 67. Summary Although fuzzy systems allow expression of expert knowledge in a more natural way, they still depend on the rules extracted from the experts, and thus might be smart or dumb. Some experts can provide very clever fuzzy rules, but some just guess and may even get them wrong. Therefore, all rules must be tested and tuned, which can be a prolonged and tedious process. It took Hitachi engineers several years to test and tune only 54 fuzzy rules to guide the Sendai Subway System.
  68. 68. Expert Systems An expert system is a computer program that is designed to hold the accumulated knowledge of one or more domain experts ES imitate the expert’s reasoning processes to solve specific problems
  69. 69. Overview of Expert Systems Can…  Explain their reasoning or suggested decisions  Display intelligent behavior  Draw conclusions from complex relationships  Provide portable knowledge Expert system shell  A collection of software packages and tools used to develop expert systems
  70. 70. IBM & Expert Systems Watson, the artificial intelligence program created by I.B.M., has access to roughly 200 million pages of information, and is able to understand natural language queries and answer questions. The computer maker had initially planned to test the system as an expert adviser to doctors; the idea was that Watson’s encyclopedic knowledge of medical conditions could aid a human expert in diagnosing illnesses.
  71. 71. IBM & Watson In May, I.B.M. announced a general-purpose version of its software, the “I.B.M. Watson Engagement Advisor.” The idea is to make the company’s question-answering system available in a wide range of call center, technical support and telephone sales applications. The company says that as many as 61 percent of all telephone support calls currently fail because human support-center employees are unable to give people correct or complete information.
  72. 72. When to Use an Expert System Capture and preserve irreplaceable human expertise Provide expertise needed at a number of locations at the same time Provide expertise needed in a hostile environment that is dangerous to human health Provide expertise that is expensive or rare Develop a solution faster than human experts Provide a high potential payoff or significantly reduced downside risk
  73. 73. Limitations of Expert Systems Limited to relatively narrow problems May have high development costs May raise legal and ethical concerns Cannot readily deal with “mixed” knowledge Possibility of error Difficult to maintain
  74. 74. Legal and Ethical Issues Who is responsible if the advice is wrong?  The user?  The domain expert?  The knowledge engineer?  The programmer of the expert system shell?  The company selling the software?
  75. 75. Transferring Expertise Objective of an expert system  To transfer expertise from an expert to a computer system and  Then on to other humans (nonexperts) Activities  Knowledge acquisition  Knowledge representation  Knowledge inferencing  Knowledge transfer to the user Knowledge is stored in a knowledge base
  76. 76. An Expert System Example General Electric's (GE) : Top Locomotive Field Service Engineer was Nearing Retirement Traditional Solution: Apprenticeship but would like  A more effective and dependable way to disseminate expertise  To prevent valuable knowledge from retiring  To minimize extensive travel or moving the locomotives To MODEL the way a human troubleshooter works  Months of knowledge acquisition  3 years of prototyping A novice engineer or technician can perform at an expert’s level  On a personal computer  Installed at every railroad repair shop served by GE
  77. 77. Participants in Expert Systems Domain expert  The individual or group whose expertise and knowledge is captured for use in an expert system Knowledge user  The individual or group who uses and benefits from the expert system Knowledge engineer  Someone trained or experienced in the design, development, implementation, and maintenance of an expert system
  78. 78. Expert Systems Development Steps: Determining requirements → Identifying experts → Constructing expert system components → Implementing results → Maintaining and reviewing the system. Domain: the area of knowledge addressed by the expert system.
  79. 79. Expert System Components [Figure: Experts, Knowledge base acquisition facility, Knowledge base, Inference engine, Explanation facility, User interface, User]
  80. 80. Evolution of Expert Systems Software Expert system shell  Collection of software packages & tools to design, develop, implement, and maintain expert systems [Figure: ease of use, from low to high, over time: traditional programming languages (before 1980), special and 4th generation languages (1980s), expert system shells (1990s)]
  81. 81. Expert Systems Shells Software Development Packages Exsys InstantTea K-Vision KnowledgePro
  82. 82. Applications of Expert Systems PROSPECTOR: Used by geologists to identify sites for drilling or mining PUFF: Medical system for diagnosis of respiratory conditions
  83. 83. Applications of Expert Systems DESIGN ADVISOR: Gives advice to designers of processor chips MYCIN: Medical system for diagnosing blood disorders. First used in 1979
  84. 84. Applications of Expert Systems DENDRAL: Used to identify the structure of chemical compounds. First used in 1965 LITHIAN: Gives advice to archaeologists examining stone tools
  85. 85. Expert Systems Development Alternatives [Figure: development costs (low to high) against time to develop the expert system (low to high), for three options: Use existing package, Develop from shell, Develop from scratch]
  86. 86. Expert Systems Benefits Enhancement of Problem Solving and Decision Making Improved Product and Decision Quality Increased Output and Productivity Decreased Decision Making Time Capture Scarce Expertise Can Work with Incomplete or Uncertain Information Knowledge Transfer to Remote Locations
  87. 87. Problems and Limitations of Expert Systems Domain experts not always able to explain their logic and reasoning ES work well only in a narrow domain of knowledge Knowledge engineers are rare and expensive Expert system users have natural cognitive limits Lack of trust by end-users ES may not be able to arrive at valid conclusions ES may sometimes produce incorrect recommendations Lacks common sense Cannot make creative responses as human expert Cannot adapt to changing environments
  88. 88. Conclusion Classic expert systems are especially good for closed-system applications with precise inputs and logical outputs. They use expert knowledge in the form of rules and, if required, can interact with the user to establish a particular fact. A major drawback is that human experts cannot always express their knowledge in terms of rules or explain the line of their reasoning. This can prevent the expert system from accumulating the necessary knowledge, and consequently lead to its failure.
  89. 89. Summary Expert, neural and fuzzy systems have now matured and been applied to a broad range of different problems, mainly in engineering, medicine, finance, business and management. Each technology handles the uncertainty and ambiguity of human knowledge differently, and each technology has found its place in knowledge engineering. They no longer compete; rather they complement each other. A synergy of expert systems with fuzzy logic and neural computing improves adaptability, robustness, fault-tolerance and speed of knowledge-based systems. Besides, computing with words makes them more “human”.
  90. 90. R Language Statistical analysis on the fly. Mathematical functions and graphics module embedded. FREE! & Open Source!
  91. 91. R Tops Data Mining Software Poll For the past 12 years, KDNuggets has conducted an annual poll asking "What analytics/data mining software you used in the past 12 months for a real project (not just evaluation)". In this year's poll, R was the top-ranked data mining solution, selected by 30.7% of poll respondents. Microsoft Excel was second, at 29.8%. Rapidminer, which took the #1 spot over R in 2011 and 2010, ranked third. And as Bob Muenchen notes, four of the top five ranked data mining solutions in this year's poll are open-source. R was also ranked in this poll as the most popular language for implementing data mining application, beating out SQL and Java.
  92. 92. Important Problems in Data Mining Prediction Finding patterns (Apriori) Clustering Classification Regression Ranking Density Estimation
  93. 93. Prediction For most of the following algorithms (as well as linear regression), we would in practice first generate the model using training data, and then predict values for test data. To make predictions, we use the predict function. Typically, the first argument is the variable in which you saved the model, and the second argument is a matrix or data frame of test data. For instance, to predict with a linear regression model lm_model fitted on the predictors x1 and x2, where x1_test and x2_test are vectors containing test data, we can use the command
      > predicted_values <- predict(lm_model, newdata = data.frame(x1 = x1_test, x2 = x2_test))
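A self-contained version of the fit-then-predict workflow might look like the following sketch. The synthetic data, the coefficients used to generate it, and the sample sizes are assumptions for illustration; `lm` and `predict` are base R functions.

```r
set.seed(42)

# Synthetic training data (illustrative only).
x1 <- runif(50)
x2 <- runif(50)
y  <- 2 * x1 - 3 * x2 + rnorm(50, sd = 0.1)
train <- data.frame(y = y, x1 = x1, x2 = x2)

# Step 1: generate the model from the training data.
lm_model <- lm(y ~ x1 + x2, data = train)

# Step 2: predict for test data. The columns of newdata must carry the
# same names as the predictors used in the model formula.
x1_test <- runif(10)
x2_test <- runif(10)
predicted_values <- predict(lm_model,
                            newdata = data.frame(x1 = x1_test, x2 = x2_test))
print(predicted_values)
```

Naming the `newdata` columns `x1` and `x2` matters: if the names do not match the formula, `predict` cannot find the test predictors in `newdata`.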
  94. 94. Finding patterns (Apriori) In large datasets, e.g. (Diapers → Beer), use Apriori! To run the Apriori algorithm, first install the arules package and load it. Note that the dataset must be a binary incidence matrix; the column names should correspond to the “items” that make up the “transactions.” The following commands print out a summary of the results and a list of the generated rules.
      > library(arules)
      > dataset <- read.csv("C:\\Datasets\\mushroom.csv", header = TRUE)
      > mushroom_rules <- apriori(as.matrix(dataset), parameter = list(supp = 0.8, conf = 0.9))
      > summary(mushroom_rules)
      > inspect(mushroom_rules)
  95. 95. Clustering Grouping data into clusters that “belong” together: objects within a cluster are more similar to each other than to those in other clusters. Methods: Kmeans, Kmedians. Input: {x_i}, i = 1, ..., m, with x_i ∈ X ⊂ R^n. Output: f : X → {1, ..., K} (K clusters). Applications: clustering consumers for market research, clustering genes into families, image segmentation (medical imaging). If X is the data matrix and K is the number of clusters, then the command is:
      > kmeans_model <- kmeans(x = X, centers = K)
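As a small illustration of the kmeans command, the sketch below clusters synthetic two-dimensional points. The data, the number of clusters (called `K` here), and the `nstart` value are assumptions for the example; `kmeans` is part of base R's stats package.

```r
set.seed(7)

# Two well-separated synthetic groups of 20 points each in R^2.
X <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))

K <- 2  # number of clusters
# nstart repeats the random initialization and keeps the best result.
kmeans_model <- kmeans(x = X, centers = K, nstart = 10)

print(kmeans_model$cluster)  # cluster label f(x_i) for each row of X
print(kmeans_model$centers)  # coordinates of the K cluster centers
```

The `cluster` component plays the role of the output f : X → {1, ..., K} for the training points.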
  96. 96. Classification Identifying to which of a set of categories a new observation belongs, on the basis of a training set of data. Input: {(x_i, y_i)}, i = 1, ..., m (“examples,” “instances with labels,” “observations”), with x_i ∈ X and y_i ∈ {−1, 1} (“binary”). Applications: automatic handwriting recognition, speech recognition, biometrics, document classification. Let X_train and X_test be matrices of the training and test data respectively, and labels be a binary vector of class attributes for the training examples. For k equal to K, the k-nearest-neighbors command (knn, from the class package) is:
      > knn_model <- knn(train = X_train, test = X_test, cl = as.factor(labels), k = K)
      Packages:
      Decision trees: rpart, party
      Random forest: randomForest, party
      SVM: e1071, kernlab
      Neural networks: nnet, neuralnet, RSNNS
      Performance evaluation: ROCR
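The knn command can be tried on synthetic data as in the sketch below. The two Gaussian groups, the test points, and `K = 3` are assumptions for illustration; `knn` comes from the `class` package, which is normally installed together with R.

```r
library(class)  # provides knn()
set.seed(3)

# Training data: 20 points around (0, 0) labeled -1, 20 around (4, 4) labeled 1.
X_train <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
                 matrix(rnorm(40, mean = 4, sd = 0.5), ncol = 2))
labels  <- rep(c(-1, 1), each = 20)

# Two new observations to classify.
X_test <- rbind(c(0, 0), c(4, 4))

K <- 3  # number of neighbors that vote on each prediction
knn_model <- knn(train = X_train, test = X_test, cl = as.factor(labels), k = K)
print(knn_model)
```

`knn` returns a factor with one predicted class per row of `X_test`; there is no separate training step, because the training set itself is the model.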
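  97. 97. Regression Input: {(x_i, y_i)}, i = 1, ..., m, with x_i ∈ X and y_i ∈ R. Output: f : X → R. Applications: predicting an individual’s income, house prices, stock prices, test scores. For example, a generalized linear model (here a logistic regression, which expects a binary y) is fitted with the command:
      > glm_mod <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = as.data.frame(cbind(y, x1, x2)))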
  98. 98. Ranking in between classification and regression. Search engines use ranking methods
  99. 99. Density Estimation Predict conditional probabilities. Input: {(x_i, y_i)}, i = 1, ..., m, with x_i ∈ X and y_i ∈ {−1, 1}. Output: f : X → [0, 1], as “close” to P(y = 1 | x) as possible. Applications: estimate probability of failure, probability to default on a loan.
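One common way to estimate P(y = 1 | x) in R is logistic regression with `glm`; this is offered as an example technique, not as the method the slides prescribe. The synthetic data and the coefficients used to generate it are assumptions. Note that the binomial family in `glm` expects labels coded as 0/1 rather than −1/1.

```r
set.seed(11)

# Synthetic data: labels in {0, 1} drawn from a known logistic model.
x1 <- rnorm(200)
x2 <- rnorm(200)
p_true <- 1 / (1 + exp(-(1.5 * x1 - x2)))
y <- rbinom(200, size = 1, prob = p_true)

# Fit f : X -> [0, 1] by logistic regression.
glm_mod <- glm(y ~ x1 + x2, family = binomial(link = "logit"),
               data = data.frame(y, x1, x2))

# Estimated P(y = 1 | x) for two new points.
new_points <- data.frame(x1 = c(-2, 2), x2 = c(0, 0))
p_hat <- predict(glm_mod, newdata = new_points, type = "response")
print(p_hat)
```

With `type = "response"` the predictions are probabilities in [0, 1]; without it, `predict` returns values on the log-odds scale.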
  100. 100. Training and Testing for supervised learning Training: training data are input, and model f is the output. Testing: You want to predict y for a new x, where (x, y) comes from the same distribution as the training data. Compute f(x) and compare it to y. How well does f(x) match y? Measure goodness of f using a loss function. Rtest(f) is also called the true risk or the test error. We want Rtest to be small, to indicate that f(x) would be a good predictor (“estimator”) of y.
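The training and testing procedure can be sketched in R as follows. The synthetic data, the 150/50 split, the linear model, and the squared-error loss are all assumptions for illustration. The quantity computed here is the average loss on a held-out test set, which serves as an estimate of the test error.

```r
set.seed(5)

# Synthetic data from a single distribution.
n <- 200
x <- runif(n, -1, 1)
y <- 3 * x + rnorm(n, sd = 0.2)
d <- data.frame(x = x, y = y)

# Split into training and test sets.
train_idx <- sample(n, size = 150)
train <- d[train_idx, ]
test  <- d[-train_idx, ]

# Training: training data in, model f out.
f <- lm(y ~ x, data = train)

# Testing: compare f(x) with y on unseen examples using a loss function.
squared_loss <- function(y_hat, y) (y_hat - y)^2
R_test <- mean(squared_loss(predict(f, newdata = test), test$y))
print(R_test)
```

A small `R_test` indicates that f predicts well on examples it has not seen during training.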
  101. 101. Time Series Analysis with R  Time series decomposition: decomp(), decompose(), arima(), stl()  Time series forecasting: forecast  Time Series Clustering: TSclust  Dynamic Time Warping (DTW): dtw
  102. 102. Social Network Analysis with R Packages: igraph, sna Centrality measures: degree(), betweenness(), closeness(), transitivity() Clusters: clusters(), no.clusters() Cliques: cliques(), largest.cliques(), maximal.cliques(), clique.number() Community detection:
  103. 103. Scatter plot
      dataset <- read.csv('fbgood.txt', header = TRUE, sep = '\t', row.names = 1)
      x <- dataset$friends
      y <- dataset$getgoods
      plot(x, y)
  104. 104. Linear Fit
      fit <- lm(y ~ x)
      abline(fit, col = 'red', lwd = 3)
  105. 105. 2nd order polynomial fit
      plot(x, y)
      polyfit2 <- lm(y ~ poly(x, 2))
      lines(sort(x), fitted(polyfit2)[order(x)], col = 2, lwd = 3)
  106. 106. 3rd order polynomial fit
      plot(x, y)
      polyfit3 <- lm(y ~ poly(x, 3))
      lines(sort(x), fitted(polyfit3)[order(x)], col = 2, lwd = 3)
  107. 107. R and Hadoop  Packages: RHadoop, RHive  RHadoop is a collection of 3 R packages:  rmr2 - perform data analysis with R via MapReduce on a Hadoop cluster  rhdfs - connect to Hadoop Distributed File System (HDFS)  rhbase - connect to the NoSQL HBase database  You can play with it on a single PC (in standalone or pseudo-distributed mode), and code developed there will also work on a cluster of PCs (in full-distributed mode)!
  108. 108. An Example of MapReducing with R
      library(rmr2)
      # map: split each line into words and emit a (word, 1) pair per word
      map <- function(k, lines) {
        words.list <- strsplit(lines, "\\s")
        words <- unlist(words.list)
        return(keyval(words, 1))
      }
      # reduce: sum the counts collected for each word
      reduce <- function(word, counts) {
        keyval(word, sum(counts))
      }
      wordcount <- function(input, output = NULL) {
        mapreduce(input = input, output = output, input.format = "text", map = map, reduce = reduce)
      }
      ## Submit job
      out <- wordcount(in.file.path, out.file.path)
  109. 109. Thank you for your time! THE END