Machine-Learned Ranking using Distributed Parallel Genetic Programming

  1. Machine-Learned Ranking using Distributed Parallel Genetic Programming
     Mohammad Islam
     Supervisors: Dr. Santos, Dr. Ding
     Ryerson University, Toronto
  2. Agenda
     • What is MLR?
     • Typical Architecture
     • Applications of MLR
     • Why do we need MLR?
     • Large-Scale Data Issues
     • History
     • PDGP-based MLR
     • My Research
  3. Machine-Learned Ranking
     • What if you could throw a query at the computer and it were intelligent enough to dynamically generate an accurate ranking function for you, so that your results are based on that new ranking function?
     • ML Algorithm + Ranking = MLR
     • Machine-learned ranking (MLR) is a type of semi-supervised machine learning problem in which the goal is to automatically construct a ranking model from training data [1].
  4. Typical Architecture of an MLR System
     [Figure: typical architecture of an MLR system; a minimal sketch of the flow follows below]
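To make the architecture in the figure concrete, here is a minimal sketch of the usual MLR flow: labeled query-document feature vectors go into a learner, the learner produces a scoring function, and unseen documents are ranked by that score. The linear model, the pointwise least-squares update, and the tiny hand-made feature vectors are illustrative assumptions, not the system described in this deck.

```python
# Minimal sketch of a machine-learned ranking pipeline (illustrative only).
# Training data: (query_id, feature_vector, relevance_label) triples.
train = [
    ("q1", [0.9, 0.2], 2),   # features could be e.g. a BM25 score and PageRank
    ("q1", [0.4, 0.1], 0),
    ("q2", [0.7, 0.8], 1),
    ("q2", [0.2, 0.3], 0),
]

def learn_pointwise_weights(data, lr=0.1, epochs=200):
    """Fit a linear scoring function f(x) = w . x by simple least squares
    on the relevance labels (a pointwise approach, for illustration)."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for _, x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def rank(docs, w):
    """Rank documents (doc_id, feature_vector) for one query by learned score."""
    return sorted(docs, key=lambda d: -sum(wi * xi for wi, xi in zip(w, d[1])))

w = learn_pointwise_weights(train)
print(rank([("d1", [0.8, 0.3]), ("d2", [0.3, 0.9]), ("d3", [0.1, 0.1])], w))
```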
  5. Applications of MLR
     • Web Search: Google, Bing, etc.
     • Recommender Systems: Netflix, Amazon, etc.
     • Question Answering: IBM Watson
     • Multimedia Information Retrieval: Image/Video Search (Google Image Search, YouTube)
     • Information Organization: Text Categorization, Document Clustering, etc.
     • Other IR systems: Machine Translation, Computational Biology, etc. [7]
  6. Problems with Traditional Ranking
     • Manual parameter tuning is usually difficult, especially when there are many parameters and the evaluation measures are non-smooth.
     • Manual parameter tuning sometimes leads to overfitting.
     • It is non-trivial to combine the large number of models proposed in the literature into an even more effective model [7].
  7. Other IR Problems
     • Big Data: the indexed Web contained at least 8.02 billion pages as of Monday, 02 April, 2012 (source: http://worldwidewebsize.com), and the size of the data is growing exponentially.
     • Increasing data consumption: in 2008, Americans consumed information for about 1.3 trillion hours, an average of almost 12 hours per day; that is 10,845 trillion words and 3.6 zettabytes (10^21 bytes), corresponding to 100,500 words and 34 gigabytes for an average person on an average day [2].
     • Unstructured data.
  8. History of MLR
     • C. Manning and other Berkeley researchers started research in the 1990s
     • RankSVM (2000)
     • RankBoost (2003)
     • GBRank (2003), used by Yahoo
     • RankNet (2005), used in Bing
     • RankGP (2007)
     • BayesRank (2009)
     • RankDE (2011) [1]
  9. Parallel Distributed Genetic Programming
     • An extension of Genetic Programming (GP) suitable for the development of programs with a high degree of parallelism and an efficient and effective reuse of partial results [4].
     • Programs are represented as graphs:
       Nodes = functions/terminals
       Edges = flow of control/results
  10. PDGP vs GP
      [Figure: GP tree vs. PDGP graph representation of max(x*y, 3+x*y); a small sketch follows below]
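As a rough illustration of the previous two slides (and not of PDGP's actual grid encoding), the sketch below stores max(x*y, 3+x*y) as a graph whose nodes are functions/terminals and whose edges carry results. The shared x*y node is evaluated once and feeds both the addition and max, which is the reuse of partial results that a GP tree cannot express. The dictionary encoding and node names are assumptions made for the example.

```python
import operator

# Graph-encoded program for max(x*y, 3+x*y): each node is either a terminal
# or a function applied to the results of its child nodes (referenced by name).
# The "mul" node is shared, so the partial result x*y is computed only once.
graph = {
    "x":   ("terminal", "x"),
    "y":   ("terminal", "y"),
    "c3":  ("terminal", 3),
    "mul": ("function", operator.mul, ["x", "y"]),
    "add": ("function", operator.add, ["c3", "mul"]),
    "out": ("function", max,          ["mul", "add"]),   # output node
}

def evaluate(node, env, cache):
    """Evaluate a node, caching results so shared subgraphs run only once."""
    if node in cache:
        return cache[node]
    kind, *rest = graph[node]
    if kind == "terminal":
        val = env.get(rest[0], rest[0])       # variable lookup or constant
    else:
        fn, children = rest
        val = fn(*(evaluate(c, env, cache) for c in children))
    cache[node] = val
    return val

print(evaluate("out", {"x": 2, "y": 5}, {}))   # max(2*5, 3+2*5) -> 13
```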
  11. PDGP vs GP
      Advantages:
      • Higher degree of parallelism and distributiveness
      • Special crossover and mutation operators guarantee a more efficient search
      • The user has control over the dimensions of the program grid
  12. PDGP SAAN Crossover
      [Figure: PDGP SAAN (Sub-graph Active-Active Node) crossover]
  13. PDGP Global Mutation
      [Figure: PDGP global mutation]
      (A simplified sketch of the SAAN crossover and global mutation operators follows below.)
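The sketch below is a heavily simplified, hypothetical rendering of these two operators, not Poli's exact definitions. It assumes individuals are WIDTH x HEIGHT grids of (operator, input-columns) cells with terminals on the bottom row and the output read at cell (0, 0); SAAN crossover copies the active subgraph hanging from a random active node of one parent onto a random active node of the other (wrapping column indices around the grid), and global mutation is treated as crossover with a freshly generated random individual.

```python
import random

# Much-simplified PDGP-style individual: a fixed WIDTH x HEIGHT grid of cells.
# Each cell is (op, [input_columns]); inputs point at columns of the row below,
# terminals live on the bottom row, and the program output is read at (0, 0).
FUNCS = ["add", "mul", "max"]
TERMS = ["x", "y", "1", "3"]
WIDTH, HEIGHT = 4, 4

def random_cell(row):
    if row == HEIGHT - 1:                        # bottom row: terminals only
        return (random.choice(TERMS), [])
    return (random.choice(FUNCS),
            [random.randrange(WIDTH), random.randrange(WIDTH)])

def random_individual():
    return [[random_cell(r) for _ in range(WIDTH)] for r in range(HEIGHT)]

def active_nodes(ind, out=(0, 0)):
    """Cells actually reachable from the given output cell."""
    seen, stack = set(), [out]
    while stack:
        r, c = stack.pop()
        if (r, c) in seen:
            continue
        seen.add((r, c))
        for col in ind[r][c][1]:                 # children sit one row below
            stack.append((r + 1, col))
    return sorted(seen)

def saan_crossover(p1, p2):
    """Simplified SAAN crossover: the subgraph hanging from a random active
    node of p2 is copied onto a random active node of p1, keeping relative
    positions and wrapping column indices around the grid."""
    child = [row[:] for row in p1]
    r1, c1 = random.choice(active_nodes(p1))
    r2, c2 = random.choice(active_nodes(p2))
    dr, dc = r1 - r2, c1 - c2
    for r, c in active_nodes(p2, out=(r2, c2)):
        nr, nc = r + dr, (c + dc) % WIDTH
        if nr >= HEIGHT:                         # subgraph runs off the grid:
            continue                             # keep p1's cells there
        op, inputs = p2[r][c]
        if nr == HEIGHT - 1 and inputs:          # a function cannot sit on the
            child[nr][nc] = random_cell(nr)      # terminal row; use a terminal
        else:
            child[nr][nc] = (op, [(col + dc) % WIDTH for col in inputs])
    return child

def global_mutation(ind):
    """Simplified global mutation: behaves like crossover with a freshly
    generated random individual, so a random subgraph replaces part of ind."""
    return saan_crossover(ind, random_individual())

random.seed(0)
parent1, parent2 = random_individual(), random_individual()
child = global_mutation(saan_crossover(parent1, parent2))
print(child[0][0])                               # output cell of the new program
```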
  14. Simple Algorithmic View
      Input 1: a set of query-document pairs with their feature vectors and relevance judgments (i.e., the training set, T)
      Input 2: GP-related parameters: G (number of generations), PSize (population size), Rc (crossover rate), Rm (mutation rate)
      PDGP Algorithm: evolves candidate ranking functions, evaluates each one, and outputs the best
      Output: a ranking function, f, which associates a real number with a query and a document as their degree of relevance
      (A sketch of this loop follows below.)
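Tying the inputs and output of this slide together, here is a hedged sketch of the evolutionary loop. The linear weight-vector individuals, the tiny training set, the precision-at-1-style fitness, and the tournament selection are stand-ins chosen for the example; they are not the PDGP programs or the IR measure used in the actual work.

```python
import random

# Input 1: training set T of query -> [(feature_vector, relevance)] pairs.
# Input 2: GP-related parameters G, PSize, Rc, Rm from the slide.
T = {
    "q1": [([0.9, 0.1], 2), ([0.3, 0.4], 0), ([0.5, 0.2], 1)],
    "q2": [([0.2, 0.8], 1), ([0.1, 0.1], 0), ([0.6, 0.9], 2)],
}
G, PSIZE, RC, RM = 30, 20, 0.9, 0.1
N_FEATURES = 2

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def fitness(w):
    """Stand-in IR measure: fraction of queries whose most relevant document
    is ranked first by the candidate ranking function."""
    hits = 0
    for docs in T.values():
        top = max(docs, key=lambda d: score(w, d[0]))
        hits += top[1] == max(rel for _, rel in docs)
    return hits / len(T)

def crossover(a, b):
    cut = random.randrange(1, N_FEATURES + 1)
    return a[:cut] + b[cut:]

def mutate(w):
    return [wi + random.gauss(0, 0.3) for wi in w]

def evolve():
    pop = [[random.uniform(-1, 1) for _ in range(N_FEATURES)]
           for _ in range(PSIZE)]
    for _ in range(G):
        nxt = []
        while len(nxt) < PSIZE:
            p1 = max(random.sample(pop, 2), key=fitness)   # tournament selection
            p2 = max(random.sample(pop, 2), key=fitness)
            child = crossover(p1, p2) if random.random() < RC else p1[:]
            if random.random() < RM:
                child = mutate(child)
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)        # output: the best ranking function found

best = evolve()
print("best weights:", best, "fitness:", fitness(best))
```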
  15. My Research
      • Optimize RankGP by applying PDGP
      • Address RankGP issues
      • Implement the future work proposed for RankGP
      • Implement a feature selection algorithm
      • Optimize the GP parameters
      • Compare with other well-known algorithms
  16. References
      1. Wikipedia article "Learning to rank", http://en.wikipedia.org/wiki/Learning_to_rank (extracted Apr 1st, 2012)
      2. Roger E. Bohn and James E. Short, "How Much Information? 2009 Report on American Consumers"
      3. Jen-Yuan Yeh et al., "Learning to Rank for Information Retrieval Using Genetic Programming"
      4. Ricardo Poli, "Parallel Distributed Genetic Programming" (1999)
      5. Danushka Bollegala, "RankDE: Learning a Ranking Function for Information Retrieval using Differential Evolution"
      6. Tie-Yan Liu (2009), "Learning to Rank for Information Retrieval", Foundations and Trends in Information Retrieval
      7. Luo Si, "A Machine Learning Approach for Information Retrieval Applications"
  17. Questions?
