Search Intelligence @ elo7.com
Fernando Meyer, Felipe Besson
March 9, 2013
Outline
Some data about our data
Some history
Apache Solr
How Lucene Works
Examples
Terms
Inverted index
How a result is scored against a query in Lucene
Lucene conceptual Scoring formula [?]
How have we optimized our index
How to declare a solr index
Infrastructure Upgrade
version 2 - single node
version 3 - current infrastructure
Frenzy API
Example of product operation
Content recommendation
Architecture
http://elo7.com 2013 3/29
Current Scenario
Future Work
Content Tracker
BigData Analytics
About
Fernando Meyer - Undergrad in Applied Mathematics at the University of São Paulo.
He has more than 12 years of experience in R&D, deploying cool systems for
companies like Red Hat (JBoss), Globo and Locaweb. He is currently focusing his
research and interests on machine learning, information retrieval and statistics.
Felipe Besson - B.S. in Information Systems and Master's in Computer Science
from the University of São Paulo, Brazil. His research focused on automated
testing of web service compositions. Now he is expanding his horizons by working
with search, data mining, machine learning and other geek stuff.
Some data about our data
• 3000 (avg.) queries per second
• from 3500 to 4200 users on site per minute
• 15000 requests per minute on AppServer
• 160000 (avg.) bot/requests per day
• 1200000 indexed products
• 20000 active sellers
Some history
• Search v0.0 - select * from product where text like '%query%'
• Search v0.1 - Sphinx
– No delta index
– Poor index/query performance for large scale dataset
• Search v1.0 - Apache Solr
Apache Solr
Solr is written in Java and runs as a standalone full-text search server within a
servlet container such as Jetty. Solr uses the Lucene Java search library at its
core for full-text indexing and search, and has REST-like HTTP/XML and JSON
APIs that make it easy to use from virtually any programming language.
How Lucene Works
Lucene is built around an inverted full-text index. This means that it takes all
the documents, splits them into words (terms), and then records, for each term,
the documents and positions where it occurs. Since a lookup is an exact,
unordered string match on the term, it can be extremely fast.
Examples
Terms
T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"
Inverted index
"a": {(2, 2)}
"banana": {(2, 3)}
"is": {(0, 1), (0, 4), (1, 1), (2, 1)}
"it": {(0, 0), (0, 3), (1, 2), (2, 0)}
"what": {(0, 2), (1, 0)}
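The index above can be reproduced with a few lines of Python. This is only a minimal sketch of the idea, assuming plain whitespace tokenization; it is not how Lucene stores its postings internally, and the function names are made up for the illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Whitespace tokenization only (an assumption of this sketch).
        for position, term in enumerate(text.split()):
            index[term].append((doc_id, position))
    return dict(index)


def search(index, term):
    """Return the sorted ids of documents containing the exact term."""
    return sorted({doc_id for doc_id, _ in index.get(term, [])})


docs = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(docs)

print(index["is"])            # postings for "is"
print(search(index, "what"))  # documents containing "what"
```

A lookup is a single dictionary access on the exact term, which is why this structure answers term queries quickly regardless of how long each document is.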
How a result is scored against a query in Lucene
A.k.a. the answer to the million-dollar question: why isn't this product
appearing when searching for "bleh"?
Lucene conceptual Scoring formula [?]
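$$\mathrm{score}(q,d) = \text{coord-factor}(q,d) \cdot \text{query-boost}(q) \cdot \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \cdot \text{doc-len-norm}(d) \cdot \mathrm{score}(d)$$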
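The A·B term in the formula is a vector similarity between the query and the document. The sketch below computes it as a plain cosine similarity, assuming A is the query vector, B is the document vector, and both hold raw term counts. Lucene uses tf-idf weights and also applies the coord, boost and length-norm factors, all of which are left out here.

```python
import math
from collections import Counter


def cosine_similarity(query, document):
    """Cosine of the angle between two bag-of-words term-count vectors."""
    a = Counter(query.split())     # A: query vector (raw counts, an assumption)
    b = Counter(document.split())  # B: document vector (raw counts, an assumption)
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = ["it is what it is", "what is it", "it is a banana"]
scores = [cosine_similarity("banana", doc) for doc in docs]
best = max(range(len(docs)), key=lambda i: scores[i])
print(best, scores)
```

A document that shares no term with the query gets a similarity of zero, which is one common reason a product does not show up for a given search.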
How have we optimized our index
<fieldType name="text_pt_br" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="com.elo7.solr.analysis.OrengoStemmerFilterFa
            exceptionList="stemmerignore.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonym
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="com.elo7.solr.analysis.OrengoStemmerFilterFa
            exceptionList="stemmerignore.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
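To get a feel for what this chain does to a product title, here is a rough Python imitation of four of its steps: whitespace tokenization, stop-word removal, lowercasing and accent folding. It is only an approximation under stated assumptions: the stop-word set is hypothetical (the real one lives in stopwords.txt), accent folding is done with Unicode decomposition rather than Solr's ASCIIFoldingFilter, and the word-delimiter, synonym, stemmer and duplicate-removal filters are skipped.

```python
import unicodedata

# Hypothetical stop words; the real list is in stopwords.txt.
STOPWORDS = {"de", "para", "com"}


def fold_accents(token):
    """Strip diacritics, e.g. 'crochê' -> 'croche' (approximates ASCII folding)."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Apply the simplified chain in the same order as the field type."""
    tokens = text.split()                                       # whitespace tokenizer
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]  # stop filter, ignoreCase
    tokens = [t.lower() for t in tokens]                        # lowercase filter
    tokens = [fold_accents(t) for t in tokens]                  # accent folding
    return tokens


print(analyze("Abajur de Crochê"))
```

Because the same normalization runs at index time and at query time, a shopper typing "croche" without the accent still matches a product titled "Crochê".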
How to declare a solr index
<field name="id" type="int" indexed="true"
       stored="true" required="true" />
<field name="title" type="text_pt_br"
       indexed="true" stored="true"/>
<field name="description" type="text_pt_br"
       indexed="true" stored="false" />
<field name="tags" type="text_pt_br"
       indexed="true" stored="true" multiValued="true"/>
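The constraints these declarations put on a document can be sketched in Python. This is an illustration only, not Solr code: the FIELDS table mirrors the four declarations above, and validate is a hypothetical helper that checks the two rules that are easiest to trip over, namely the required id and the fact that only tags may hold several values.

```python
# Mirror of the four <field> declarations above (illustrative only).
FIELDS = {
    "id": {"required": True, "multi_valued": False},
    "title": {"required": False, "multi_valued": False},
    "description": {"required": False, "multi_valued": False},
    "tags": {"required": False, "multi_valued": True},
}


def validate(doc):
    """Return a list of problems found in a candidate document."""
    errors = []
    for name, spec in FIELDS.items():
        if spec["required"] and name not in doc:
            errors.append(f"missing required field: {name}")
    for name, value in doc.items():
        if name not in FIELDS:
            errors.append(f"unknown field: {name}")
        elif isinstance(value, list) and not FIELDS[name]["multi_valued"]:
            errors.append(f"field is not multiValued: {name}")
    return errors


product = {"id": 201209, "title": "Abajur", "tags": ["abajur", "decoracao"]}
print(validate(product))
```

Note that description is indexed but not stored, so it can be searched but will not come back in the result documents.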
Infrastructure Upgrade
version 2 - single node
• Scaling issues
• m1.xlarge => m2.2xlarge => c1.xlarge (90% CPU)
• Solr 3.6
• Full index with Ruby scripts (a full index takes 3.5 h)
version 3 - current infrastructure
• 3 m1.xlarge (20% CPU usage) behind an Amazon ELB
• 1 m1.xlarge Search API (staging, 50% of logged-in users)
• Solr Data Importer (a full index takes 15 min)
Frenzy API
Solr environment evolution
• Operations: Searching, indexing and deleting
• Resources: Products, stores, auto-complete suggestions and categories
• Recommendations
Advantages
• Removing search and indexing logic from marketplace
• Providing a search service to other applications (e.g., mobile)
Example of product operation
Searching
• input (GET): query term
– filters: city, min. price and max. price
– sort: featured, organic, oldest, newest, ...
• output (json)
– metadata (query status, response time and hits)
– list of products
– references (previous and next page urls)
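A client call for the search operation described above could look roughly like this. Everything specific here is an assumption made for the illustration: the slides do not give the endpoint path, the parameter names (q, city, min_price, max_price, sort, page) or the JSON keys, so treat them as placeholders.

```python
from urllib.parse import urlencode


def build_search_url(base_url, term, city=None, min_price=None,
                     max_price=None, sort="organic", page=1):
    """Build a GET URL for a product search (parameter names are hypothetical)."""
    params = {"q": term, "sort": sort, "page": page}
    if city is not None:
        params["city"] = city
    if min_price is not None:
        params["min_price"] = min_price
    if max_price is not None:
        params["max_price"] = max_price
    return f"{base_url}/products?{urlencode(params)}"


# Hypothetical shape of the JSON output: metadata, products, references.
EXAMPLE_RESPONSE = {
    "metadata": {"status": "ok", "response_time_ms": 12, "hits": 2},
    "products": [{"id": 201209}, {"id": 204439}],
    "references": {"previous": None, "next": "/products?q=abajur&page=2"},
}

url = build_search_url("http://api.example/frenzy", "abajur",
                       city="curitiba", min_price=10.0, max_price=15.0)
print(url)
```

Returning the previous and next page URLs in the response lets clients such as a mobile app paginate without rebuilding the query themselves.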
Content recommendation
• Collaborative filtering (user similarity)
• Based on user favorited products
Input (GET)
• frenzy/users/:id/recommendations
Output: (similar to search output)
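The slides do not say which similarity measure is used, so the following is only one plausible reading of "collaborative filtering based on favorited products": users are compared by the Jaccard overlap of their favorites, and products favorited by similar users are suggested. The user ids, product ids and function names are invented for the example.

```python
def jaccard(a, b):
    """Overlap between two sets of favorited product ids (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user_id, favorites, limit=5):
    """Rank products favorited by similar users and not yet by user_id."""
    own = favorites[user_id]
    scores = {}
    for other_id, other in favorites.items():
        if other_id == user_id:
            continue
        similarity = jaccard(own, other)
        if similarity == 0:
            continue  # users with nothing in common contribute nothing
        for product in other - own:
            scores[product] = scores.get(product, 0.0) + similarity
    # Highest score first; ties broken by product id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:limit]]


# Made-up favorites, keyed by user id.
favorites = {
    "u1": {101, 102, 103},
    "u2": {101, 102, 104},
    "u3": {103, 105},
    "u4": {200},
}
print(recommend("u1", favorites))
```

A user with no favorites yet gets an empty list from this approach, which is the cold-start problem behind the "including new users" item later in the deck.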
Architecture
Current Scenario
• Experimental stage
• Search operations are being integrated
• 50% of searches by logged-in users go through the API
• The Recommendation API is still evolving
Future Work
Content Tracker
We need to understand, track, analyse and take advantage of our users' navigation
patterns.
• Every user receives a unique ID
• This ID follows all of the user's interactions with the website
• Whenever a user interacts with a product (views it, adds it to favorites,
shares it socially, adds it to the cart or buys it), we trigger a conversion action.
SearchID  UserID  Term      pgN  Filters
A376AC    e00c59  "abajur"  1    Nil
A376AD    e00c59  "abajur"  1    "pr:[10.0,15.0]"
A376AE    e00c59  "abajur"  1    "pr:[10.0,15.0] city:curitiba"
Table 1: Search Action logger
ViewID  SearchID  PRDID   PPP
000001  A376AE    201209  1
000002  A37FED    204439  5
000003  EDA342    202234  1
000004  EFDBC1    231324  5
000005  EDA563    214512  2
000006  EFA564    264553  13
Table 2: Product View logger
ActionID  ViewID  type
000001    000001  cart
000002    000002  fav
000003    000005  cart
000004    000004  social
000005    000003  ship
000006    000006  contact
Table 3: Product Action logger
ActionID  convert
000001    true
000002    true
000003    false
000004    false
000005    false
000006    true
Table 4: Action to convert
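The loggers chain together through their IDs: an action points to a view, and a view points to a search and a product. The sketch below walks that chain for the rows of Tables 2 to 4 and lists which search and product each converting action traces back to. The in-memory dictionaries and the function name are assumptions made for the illustration; the slides do not say how the logs are stored or queried, and the PPP column is left out because the slides do not define it.

```python
# Table 2: ViewID -> (SearchID, PRDID)
views = {
    "000001": ("A376AE", "201209"),
    "000002": ("A37FED", "204439"),
    "000003": ("EDA342", "202234"),
    "000004": ("EFDBC1", "231324"),
    "000005": ("EDA563", "214512"),
    "000006": ("EFA564", "264553"),
}

# Table 3: ActionID -> (ViewID, type)
actions = {
    "000001": ("000001", "cart"),
    "000002": ("000002", "fav"),
    "000003": ("000005", "cart"),
    "000004": ("000004", "social"),
    "000005": ("000003", "ship"),
    "000006": ("000006", "contact"),
}

# Table 4: ActionID -> convert
converted = {
    "000001": True, "000002": True, "000003": False,
    "000004": False, "000005": False, "000006": True,
}


def conversions(views, actions, converted):
    """Return (search_id, product_id, action_type) for each converting action."""
    result = []
    for action_id in sorted(actions):
        if not converted.get(action_id, False):
            continue
        view_id, action_type = actions[action_id]
        search_id, product_id = views[view_id]
        result.append((search_id, product_id, action_type))
    return result


for row in conversions(views, actions, converted):
    print(row)
```

The first row ties back to search A376AE in Table 1, that is, the term "abajur" filtered by price range and city.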
BigData Analytics
• Product conversion per channel
• Consumer behaviour
• Trends
• Better recommendation (including new users)
• Better email marketing (attractiveness)
• Per product stats (Clicks/Impressions/CTR)
Questions?
fmeyer@elo7.com
felipe.besson@elo7.com
