Machine-Learned Ranking using Distributed Parallel Genetic Programming
Machine-Learned Ranking using Distributed Parallel Genetic Programming Presentation Transcript

  • 1. Machine-Learned Ranking using Distributed Parallel Genetic Programming. Mohammad Islam. Supervisors: Dr. Santos, Dr. Ding. Ryerson University, Toronto
  • 2. Agenda
      • What is MLR?
      • Typical architecture
      • Applications of MLR
      • Why do we need MLR?
      • Large-scale data issues
      • History
      • PDGP-based MLR
      • My research
  • 3. Machine-Learned Ranking
      • What if you submit a query and the computer is intelligent enough to generate an accurate ranking function dynamically, so that your results are ranked by that new function?
      • ML algorithm + ranking = MLR
      • Machine-learned ranking (MLR) is a type of supervised or semi-supervised machine learning problem in which the goal is to automatically construct a ranking model from training data [1].
  • 4. Typical Architecture of an MLR System
  • 5. Applications of MLR
      • Web search: Google, Bing, etc.
      • Recommender systems: Netflix, Amazon, etc.
      • Question answering: IBM Watson
      • Multimedia information retrieval: image and video search (Google Image Search, YouTube)
      • Information organization: text categorization, document clustering, etc.
      • Other IR systems: machine translation, computational biology, etc. [7]
  • 6. Problems with Traditional Ranking
      • Manual parameter tuning is usually difficult, especially when there are many parameters and the evaluation measures are non-smooth.
      • Manual parameter tuning sometimes leads to overfitting.
      • It is non-trivial to combine the large number of models proposed in the literature into an even more effective model [7].
  • 7. Other IR Problems
      • Big data: the indexed Web contains at least 8.02 billion pages (Monday, 2 April 2012; source: http://worldwidewebsize.com), and the size of the data is growing exponentially.
      • Increasing data consumption: in 2008, Americans consumed information for about 1.3 trillion hours, an average of almost 12 hours per person per day. Total consumption was 10,845 trillion words and 3.6 zettabytes (1 zettabyte = 10^21 bytes), corresponding to 100,500 words and 34 gigabytes for an average person on an average day [2].
      • Unstructured data.
  • 8. History of MLR [1]
      • C. Manning and other Berkeley researchers started research in the 1990s
      • RankSVM (2000)
      • RankBoost (2003)
      • GBRank (2003), used by Yahoo
      • RankNet (2005), used in Bing
      • RankGP (2007)
      • BayesRank (2009)
      • RankDE (2011)
  • 9. Parallel Distributed Genetic Programming
      • An extension of Genetic Programming (GP) that is suitable for developing programs with a high degree of parallelism and an efficient and effective reuse of partial results [4].
      • Programs are represented as graphs.
      • Nodes = functions or terminals
      • Edges = flow of control or results
  • 10. PDGP vs GP (figure: the expression max(x*y, 3+x*y) in both representations)
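The expression max(x*y, 3+x*y) contains the subexpression x*y twice. A tree has to evaluate it once per occurrence, while a graph can compute it once and route the result along two edges. The sketch below illustrates that reuse only: the dictionary encoding, the node names (`prod`, `sum`, `best`), and the helpers `eval_tree` and `eval_graph` are assumptions made for this example, not the grid representation described in [4].

```python
import operator

OPS = {"mul": operator.mul, "add": operator.add, "max": max}

# Tree form (plain GP): the subtree x*y appears twice.
TREE = ("max", ("mul", "x", "y"), ("add", 3, ("mul", "x", "y")))

# Graph form: x*y is a single node ("prod") with two outgoing edges.
# Each entry maps a node name to (function name, inputs); an input is
# another node name, an input variable name, or a numeric constant.
GRAPH = {
    "prod": ("mul", ("x", "y")),
    "sum": ("add", (3, "prod")),
    "best": ("max", ("prod", "sum")),
}


def eval_tree(tree, env):
    """Return (value, number of function applications) for a nested tuple."""
    if isinstance(tree, tuple):
        op, *children = tree
        results = [eval_tree(child, env) for child in children]
        value = OPS[op](*(v for v, _ in results))
        return value, 1 + sum(n for _, n in results)
    # Leaf: a variable name looked up in env, or a numeric constant.
    return (env[tree] if isinstance(tree, str) else tree), 0


def eval_graph(graph, root, env):
    """Return (value, number of function applications), computing each node once."""
    cache = dict(env)  # variables are pre-filled, node results are added as computed
    calls = 0

    def visit(ref):
        nonlocal calls
        if not isinstance(ref, str):
            return ref  # numeric constant
        if ref not in cache:
            op, inputs = graph[ref]
            args = [visit(i) for i in inputs]
            cache[ref] = OPS[op](*args)
            calls += 1
        return cache[ref]

    return visit(root), calls


tree_value, tree_calls = eval_tree(TREE, {"x": 2, "y": 5})
graph_value, graph_calls = eval_graph(GRAPH, "best", {"x": 2, "y": 5})
```

Both forms return the same value for the same inputs; the graph needs one function application fewer because the product is shared.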
  • 11. PDGP vs GP: Advantages
      • Higher degree of parallelism and distribution
      • Specialized crossover and mutation operators make the search more efficient
      • The user can control the dimensions of the program
  • 12. PDGP SAAN Crossover
  • 13. PDGP Global Mutation
  • 14. Simple Algorithmic View
      • Input 1: a set of query-document pairs with their feature vectors and relevance judgments (i.e., the training set, T)
      • Input 2: GP-related parameters: G (number of generations), PSize (size of a population), Rc (crossover rate), Rm (mutation rate)
      • PDGP algorithm: evolves candidate ranking functions and outputs the best one
      • Output: a ranking function, f, which is supposed to associate a real number with a query and a document as their degree of relevance
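The inputs and output of the algorithmic view can be sketched as a small evolutionary loop. This is a minimal illustration under loud assumptions: it uses ordinary tree-based GP as a stand-in for PDGP (no graph representation, SAAN crossover, or global mutation), the fitness is a simple pairwise ordering accuracy chosen for the example, and the training set `T` below is made-up toy data. Only the parameter names G, PSize, Rc, and Rm come from the slide; every function name is hypothetical.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
MAX_DEPTH = 8  # assumed cap to keep evolved expressions small


def random_tree(n_features, depth, rng):
    """Grow a random expression: a feature index (leaf) or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    return (rng.choice(list(OPS)),
            random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))


def score(tree, features):
    """f(q, d): evaluate the expression on one query-document feature vector."""
    if isinstance(tree, int):
        return features[tree]
    op, left, right = tree
    return OPS[op](score(left, features), score(right, features))


def fitness(tree, T):
    """Fraction of same-query document pairs ordered as the relevance labels say."""
    good = total = 0
    for docs in T.values():
        for fa, ra in docs:
            for fb, rb in docs:
                if ra > rb:
                    total += 1
                    good += score(tree, fa) > score(tree, fb)
    return good / total if total else 0.0


def depth(tree):
    return 0 if isinstance(tree, int) else 1 + max(depth(tree[1]), depth(tree[2]))


def random_subtree(tree, rng):
    """Walk down from the root, stopping at a random node."""
    while isinstance(tree, tuple) and rng.random() < 0.7:
        tree = rng.choice(tree[1:])
    return tree


def replace_random(tree, donor, rng):
    """Return a copy of tree with one randomly chosen node replaced by donor."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return donor
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, replace_random(left, donor, rng), right)
    return (op, left, replace_random(right, donor, rng))


def evolve(T, n_features, G=20, PSize=30, Rc=0.8, Rm=0.1, seed=0):
    """Evolve ranking functions for G generations and return the best one found."""
    rng = random.Random(seed)
    population = [random_tree(n_features, 3, rng) for _ in range(PSize)]
    best = max(population, key=lambda t: fitness(t, T))

    def pick():  # tournament selection of size 3
        return max(rng.sample(population, 3), key=lambda t: fitness(t, T))

    for _ in range(G):
        children = [best]  # elitism: the best function so far always survives
        while len(children) < PSize:
            child = pick()
            if rng.random() < Rc:  # crossover: graft a subtree from a second parent
                child = replace_random(child, random_subtree(pick(), rng), rng)
            if rng.random() < Rm:  # mutation: graft a fresh random subtree
                child = replace_random(child, random_tree(n_features, 2, rng), rng)
            if depth(child) <= MAX_DEPTH:
                children.append(child)
        population = children
        best = max(population, key=lambda t: fitness(t, T))
    return best


# Toy training set T (made up): query id -> [(feature vector, relevance label)].
T = {
    "q1": [((0.9, 0.1), 2), ((0.5, 0.7), 1), ((0.1, 0.4), 0)],
    "q2": [((0.8, 0.3), 1), ((0.2, 0.9), 0)],
}
f = evolve(T, n_features=2)
```

Here `score(f, features)` plays the role of the output f: it maps the feature vector of a query-document pair to a real number, and documents for a query are ranked by sorting on that number.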
  • 15. My Research
      • Optimize RankGP by applying PDGP
      • Address RankGP issues
      • Implement future work
      • Implement a feature selection algorithm
      • Optimize GP parameters
      • Compare with other well-known algorithms
  • 16. References
      1. Wikipedia article, "Learning to rank", http://en.wikipedia.org/wiki/Learning_to_rank (retrieved 1 April 2012)
      2. Roger E. Bohn, James E. Short, "How Much Information? 2009 Report on American Consumers"
      3. Jen-Yuan Yeh et al., "Learning to Rank for Information Retrieval Using Genetic Programming"
      4. Riccardo Poli, "Parallel Distributed Genetic Programming" (1999)
      5. Danushka Bollegala, "RankDE: Learning a Ranking Function for Information Retrieval using Differential Evolution"
      6. Tie-Yan Liu (2009), "Learning to Rank for Information Retrieval", Foundations and Trends in Information Retrieval
      7. Luo Si, "A Machine Learning Approach for Information Retrieval Applications"
  • 17. Questions?