2. 2
01
Hello
• BSc, PhD
• Software Developer in the News Search Infrastructure team at Bloomberg
• Apache Lucene/Solr committer and PMC member
• Apache Software Foundation member
• spare-time beekeeper
6. 6
Happy Users?
• Bianca is a beekeeper and software engineer. She is fascinated by honeybees, bumblebees, solitary bees and bee science.
• Harry is Bianca's friend and works as a pastry chef. Harry loves honey.
7. 7
Too many search results?
https://twitter.com/DaveGoulson/status/334232443563433985
https://twitter.com/1PortionDeutsch/status/827139009277227008
8. 8
Learning-to-Rank (LTR)
"Learning to rank or machine-learned ranking (MLR) is the
application of machine learning ... in the construction of
ranking models for information retrieval systems. …"
https://en.wikipedia.org/wiki/Learning_to_rank
An illustration:
1. top 10 documents wanted
2. fetch 100 scored documents
3. calculate 100 revised scores
4. re-sort the 100 documents based on the revised scores
5. return the top 10 documents
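The five steps above can be sketched in a few lines of Python. This is only an illustration of the re-ranking idea, not how Solr implements it: the `rerank` function, the `Doc` type and the `model` callable are hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical document representation: feature name -> feature value.
Doc = Dict[str, float]


def rerank(
    scored_docs: List[Tuple[Doc, float]],
    model: Callable[[Doc], float],
    rerank_docs: int = 100,
    rows: int = 10,
) -> List[Tuple[Doc, float]]:
    """Re-score the top `rerank_docs` candidates and return the best `rows`."""
    # Step 2: fetch the top candidates by their original score.
    candidates = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    candidates = candidates[:rerank_docs]
    # Step 3: calculate a revised score for each candidate.
    revised = [(doc, model(doc)) for doc, _ in candidates]
    # Step 4: re-sort the candidates based on the revised scores.
    revised.sort(key=lambda pair: pair[1], reverse=True)
    # Step 5: return only as many documents as were wanted.
    return revised[:rows]
```

Note that a document outside the first 100 can never reach the top 10, however high its revised score would have been; the size of the re-ranked window is a trade-off between cost and recall.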
16. 16
Feature Extraction: fl=source,tweet,[fv]
"source" : "https://twitter.com/Joannechocolat/status/809365630319259652",
"tweet" : "2. To the ancient Egyptians, bees were born from the tears of the Sun god, Ra. #TenThingsAboutBees",
"[fv]" : "byVerifiedAccount:1.0
containsHashtag:1.0 containsHoney:0.0 containsMention:0.0
followersCount:52331.0 followingCount:959.0
fromDesktop:0.0 fromMobile:0.0
hashtagCount:1.0 honeyContent:0.0
logFollowersCount:4.718759 logFollowersCountForRetweet:0.0
logFollowingCount:2.9818187 logFollowingCountForRetweet:0.0
mentionCount:0.0 originalScore:0.0 tweetLength:20.897959"
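To use the extracted features outside Solr, the `[fv]` string has to be turned into name/value pairs. A minimal sketch follows, assuming the whitespace-separated `name:value` layout shown on this slide; the delimiters in an actual Solr response may differ by version and configuration, and `parse_feature_vector` is a hypothetical helper.

```python
from typing import Dict


def parse_feature_vector(fv: str) -> Dict[str, float]:
    """Parse 'name:value name:value ...' into a dict of floats."""
    features: Dict[str, float] = {}
    # Assumption: pairs are separated by whitespace, as rendered on the slide.
    for token in fv.split():
        # Split on the last colon so that the value is always the final part.
        name, _, value = token.rpartition(":")
        features[name] = float(value)
    return features


fv = (
    "byVerifiedAccount:1.0 containsHashtag:1.0 "
    "followersCount:52331.0 tweetLength:20.897959"
)
features = parse_feature_vector(fv)
print(features["followersCount"])
```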
22. 22
Features + Clicks != Trees Model
ltr-with-bees.py train-trees-model \
  --feature-names "byVerifiedAccount,containsHashtag,fromDesktop,fromMobile,hashtagCount,tweetLength" \
  --ranklib-tree=1 --ranklib-leaf=4
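Before a trees model can be trained, features and clicks have to be combined into training data. RankLib reads the LETOR-style text format, one line per document: `<label> qid:<id> <index>:<value> ... # <comment>`. The sketch below writes one such line; mapping a click to label 1 and a non-click to label 0 is an assumption made here for illustration, and `to_training_line` is a hypothetical helper rather than part of `ltr-with-bees.py`.

```python
from typing import Dict, List

FEATURE_NAMES = [
    "byVerifiedAccount", "containsHashtag", "fromDesktop",
    "fromMobile", "hashtagCount", "tweetLength",
]


def to_training_line(
    label: int,
    query_id: int,
    features: Dict[str, float],
    feature_names: List[str],
    comment: str = "",
) -> str:
    """Format one document as a LETOR-style training line."""
    # Feature indices are 1-based and follow the order of feature_names.
    pairs = " ".join(
        f"{index}:{features.get(name, 0.0)}"
        for index, name in enumerate(feature_names, start=1)
    )
    line = f"{label} qid:{query_id} {pairs}"
    return f"{line} # {comment}" if comment else line


# Feature values taken from the extraction example; the label is assumed.
features = {
    "byVerifiedAccount": 1.0, "containsHashtag": 1.0, "fromDesktop": 0.0,
    "fromMobile": 0.0, "hashtagCount": 1.0, "tweetLength": 20.897959,
}
line = to_training_line(1, 1, features, FEATURE_NAMES, comment="clicked")
print(line)
# prints: 1 qid:1 1:1.0 2:1.0 3:0.0 4:0.0 5:1.0 6:20.897959 # clicked
```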
23. 23
Features + Clicks + ??? = Trees Model
java -jar RankLib-2.8.jar
...
-ranker <type> Specify which ranking algorithm to use
0: MART (gradient boosted regression tree)
...
6: LambdaMART
...
{MART, LambdaMART}-specific parameters
[ -tree <t> ] Number of trees (default=1000)
[ -leaf <l> ] Number of leaves for each tree (default=10)
...
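The options above can be assembled into a training command. This is a sketch only: `-train`, `-ranker`, `-tree`, `-leaf` and `-save` are RankLib options, but the file names are hypothetical, and the assumption that `--ranklib-tree` and `--ranklib-leaf` on the previous slide map onto `-tree` and `-leaf` is not confirmed by the slides.

```python
from typing import List


def ranklib_train_command(
    train_file: str,
    model_file: str,
    ranker: int = 6,      # 6 = LambdaMART, 0 = MART
    trees: int = 1000,    # RankLib default
    leaves: int = 10,     # RankLib default
    jar: str = "RankLib-2.8.jar",
) -> List[str]:
    """Build the argument list for a RankLib training run."""
    return [
        "java", "-jar", jar,
        "-train", train_file,
        "-ranker", str(ranker),
        "-tree", str(trees),
        "-leaf", str(leaves),
        "-save", model_file,
    ]


# Hypothetical file names; one tree with four leaves, as on the previous slide.
cmd = ranklib_train_command("training.txt", "model.txt", trees=1, leaves=4)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the training file exists. A single small tree is handy for inspecting what the model learned; the defaults produce a much larger ensemble.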
26. 26
select?q="key points"&rq={!ltr model=wrap}
• Learning-to-Rank with Apache Solr is a journey.
• You now have a sense of the adventures ahead.
• Please don't send postcards.
Please do email solr-user@, file JIRA tickets,
talk and/or blog about your use case.