mitula is one of the most important international vertical search engines for property, car and job classifieds.
In this presentation you will learn how mitula works. You will also learn why mitula sends more, and better-qualified, traffic to your site.
Key topics when migrating from FAST to Solr, EuroCon 2010 (Cominvent AS)
Presented during Lucene EuroCon 2010 in Prague. This presentation assumes no prior experience with FAST ESP, but some idea of what Solr/Lucene is. It gives you some hints on what to expect when migrating.
Search Quality Evaluation: a Developer Perspective (Andrea Gazzarini)
Search quality evaluation is an evergreen topic that every search engineer routinely struggles with. Improving the correctness and effectiveness of a search system requires a set of tools that help measure the direction in which the system is going.
The slides focus on how a search quality evaluation tool can be seen from a practical developer perspective, how it can be used to produce a deliverable artifact and how it can be integrated into a continuous integration infrastructure.
The More Like This (MLT) search functionality is a key feature in Apache Lucene that allows you to find documents similar to an input one (text or document). Because MLT is widely used but rarely explored, this presentation starts by introducing how it works internally. The focus of the talk is to improve the general understanding of MLT and the way you could benefit from it. Building on the introduction, the talk then turns to the BM25 text similarity function and how it has been (tentatively) included in MLT through a conspicuous refactoring and testing process, to improve the identification of the most interesting terms from the input that can drive the similarity search. The presentation will include real-world usage examples, proposed patches, pending contributions and future developments such as improved query building through positional phrase queries.
Splunk Ninjas: New Features, Pivot and Search Dojo (Splunk)
Besides seeing the newest features in Splunk Enterprise and learning the best practices for data models and pivot, we will show you how to use a handful of search commands that will solve most search needs. Learn these well and become a ninja.
OpenStack has been a part of OPNFV from the start, and the OpenStack and OPNFV communities have strong areas of overlap. We will explain OPNFV from an OpenStack and practical perspective, providing a specific example (the SFC scenario) of how we test different components of OpenStack and other communities (ODL, OVS, etc.) daily. We’ll also talk about why OPNFV is useful to OpenStack (hint: telco requirements and testing) and briefly describe several OPNFV projects which have contributed to OpenStack: NetReady, Multisite, Doctor, Cross CI, Copper, etc.
This presentation attempts to describe the constituent components and how they relate to one another.
Those who would like to receive the video version with explanations, please visit
http://www.esoc.ir/RQ_Form.html
Splunk conf2014 - Using Selenium and Splunk for Transaction Monitoring Insight (Splunk)
The Synthetic Monitoring App enables you to monitor your Web application and measure critical KPIs such as application performance and availability. This session showcases how this app can simulate user interactions around the clock and set up alerts when your application breaches its performance and availability SLAs. Elias Haddad shows how you can proactively detect application problems before your customers do. Learn how you can compare the end user performance of your application from different locations, various browsers and from a myriad of devices and isolate performance bottlenecks to prevent outages.
As ODP enters its third year we are seeing increased maturity in its capabilities as well as increased adoption by application writers. This talk highlights ODP developments since SFO15 and discusses what’s ahead for ODP in 2016 as it enters production use.
TSC Sponsored BoF: Can Linux and Automotive Functional Safety Mix? Take 2: T... (Linaro)
Session ID: SFO17-218
Session Name: TSC Sponsored BoF: Can Linux and Automotive Functional Safety Mix? Take 2: Towards an open source, industry-acceptable high-assurance OS - SFO17-218
★ Session Summary ★
All are welcome!
At the first edition of the Automotive BoF, held in Budapest, David Rusling and Robin Randhawa broached the topic of using open source software in the safety-critical parts of the automotive domain. That discussion led to some important realisations about Linux possibilities and realities. In this second edition of the Automotive BoF, David and Robin share further insights from discussions with major Tier 1 Automotive OEMs. Overall, things seem to be trending towards some concrete proposals for the role of Linaro in this space.
Join us at the BoF to learn more.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/sfo17/sfo17-218/
---------------------------------------------------
★ Event Details ★
Linaro Connect San Francisco 2017 (SFO17)
25-29 September 2017
Hyatt Regency San Francisco Airport
---------------------------------------------------
Keyword:
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
More Related Content
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf (Paige Cruz)
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to market, combined with traditionally slow, manual security checks, has left gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerabilities and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Rik Marselis's and my slides from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants looked for different ways to think about quality and testing in different parts of the DevOps infinity loop.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Expect practical tips and strategies for successful relationship building that leads to closing the deal.
Use of Solr at Trovit classified ads (marc-sturlese)
1. Usage of Solr at Trovit
A Search Engine For Classified Ads
Marc Sturlese
Trovit
marc@trovit.com
Apache Lucene Eurocon 2010, Prague, 20 May 2010
Apache Lucene EuroCon 4 May 2010
2. Agenda
● Trovit, a Solr use case
● Types of index
● Architecture overview
● Relevance tuning
● Out of the box features
● Custom features
● Sharding
● Future directions
● Questions
Apache Lucene EuroCon 05/16/10
3. What is Trovit? A Search Engine For Classified Ads
4. Types of index
There are 3 different types of index
● Organic ads index
● Sponsored ads index
● Recommended searches index
There is one index per country and per business category for every type, which means a total of 180 indexes.
Some of them are sharded. All of them have replicas.
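The slide gives only the total (180) and the three index types, which leaves 180 / 3 = 60 country and business-category combinations per type. Below is a minimal sketch of how such a per-core layout could be enumerated. The `<type>_<country>_<category>` core naming scheme is hypothetical, and so is the split of 20 countries by 3 categories, because the real breakdown is not stated:

```python
from itertools import product

# The three index types from the slide; the names used here are illustrative.
INDEX_TYPES = ["organic", "sponsored", "recommended"]

def core_names(countries, categories, index_types=INDEX_TYPES):
    """Return one (hypothetical) Solr core name per type/country/category combination."""
    return [f"{t}_{c}_{cat}" for t, c, cat in product(index_types, countries, categories)]

# Made-up split: 3 types x 20 countries x 3 categories = 180 cores.
countries = [f"country{i:02d}" for i in range(20)]
categories = ["homes", "cars", "jobs"]
cores = core_names(countries, categories)
print(len(cores))  # 180
print(cores[0])    # organic_country00_homes
```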
5. Types of index
[Screenshot showing the 3 types of index]
6. Architecture overview
[Diagram: crawling / parsing, warehouse, indexing (Solr indexer, back end), replication to Solr slaves, load balancing (load balancer, front end), user request]
7. Architecture overview
Masters - Indexing
● 4 servers. Continuously updating the indexes sequentially
● 1 server to index organic ads for all countries/categories
● 1 server to index powered ads for all countries/categories
● 1 server to index recommended searches for all countries/categories
Slaves – Serving search requests
● Indexes with high traffic have 4 replicas
● Indexes with less traffic have 3 replicas
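The replica rule above comes down to a threshold on traffic. Here is a sketch that assumes a hypothetical queries-per-second cut-off (`HIGH_TRAFFIC_QPS`), since the slides only distinguish "high" from "less" traffic:

```python
# Hypothetical cut-off: the slides do not say how "high traffic" is defined.
HIGH_TRAFFIC_QPS = 50.0

def replica_count(queries_per_second, threshold=HIGH_TRAFFIC_QPS):
    """4 replicas for high-traffic indexes, 3 for the rest (as on the slide)."""
    return 4 if queries_per_second >= threshold else 3

def plan_replicas(traffic_by_index, threshold=HIGH_TRAFFIC_QPS):
    """Map each index name to the number of slave replicas it should get."""
    return {name: replica_count(qps, threshold) for name, qps in traffic_by_index.items()}

# Illustrative traffic figures, not real Trovit numbers.
plan = plan_replicas({"organic_es_homes": 120.0, "organic_pt_cars": 8.5})
print(plan)  # {'organic_es_homes': 4, 'organic_pt_cars': 3}
```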
8. Architecture overview
● Indexes are replicated using modified collection distribution scripts to support multicore
● Snapshooter and snappuller are executed sequentially
● Snapinstaller is executed at the same time on each slave to keep exactly the same content on all slaves at all times
● Load balancing started with Perlbal, but it was producing high CPU loads
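The real snapshooter, snappuller and snapinstaller are shell scripts; the Python sketch below only models the ordering described above, with hypothetical stand-in steps that append to a log. A threading.Barrier holds every install until all slaves are ready, which approximates running snapinstaller "at the same time" on each slave.

```python
import threading


def distribute(slaves, log):
    """Model the ordering: snapshot, sequential pulls, then a simultaneous install."""
    log.append("snapshoot master")            # stand-in for snapshooter
    for slave in slaves:                      # stand-in for snappuller, one slave at a time
        log.append(f"pull {slave}")

    barrier = threading.Barrier(len(slaves))  # released only when every slave is waiting

    def install(slave):
        barrier.wait()
        log.append(f"install {slave}")        # stand-in for snapinstaller

    threads = [threading.Thread(target=install, args=(s,)) for s in slaves]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


log = []
distribute(["slave1", "slave2", "slave3"], log)
print(log[:4])  # ['snapshoot master', 'pull slave1', 'pull slave2', 'pull slave3']
```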
9. Life of a user search request
For every user search:
● A request is made to the organic and sponsored indexes
● For each result of the organic search, a request is made to the recommended searches index
● 13 Solr requests per user search! And once this is done...
The user search request is batch processed to decide whether it must be indexed in the similar user searches index
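The fan-out can be sketched as below. `search` is a hypothetical stand-in for a Solr call, and the page size of 10 is an assumption; with it the sketch issues 2 + 10 = 12 requests, while the exact breakdown behind the 13 requests quoted above is not given.

```python
def handle_user_search(query, search):
    """Fan one user search out into Solr requests; `search` is injected (hypothetical)."""
    organic = search("organic", query)
    sponsored = search("sponsored", query)
    # one extra request per organic result, against the recommended searches index
    recommended = [search("recommended", doc) for doc in organic]
    return organic, sponsored, recommended


requests_made = []


def fake_search(index, q, page_size=10):
    """Record the request and return fake docs (page size of 10 is an assumption)."""
    requests_made.append(index)
    return [f"{q}#{i}" for i in range(page_size)] if index == "organic" else []


handle_user_search("home tennessee", fake_search)
print(len(requests_made))  # 12
```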
10. Life of a user search request
11. Relevance tuning
● Basic searches use the dismax qt (query type), built on top of Lucene's DisjunctionMaxQuery
● Boosting queries to make the latest ads more relevant
● Boost some ads at document level at indexing time to make them more important than others
● Apply field-level boosts at query time so that a match in some fields counts more than in others
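A DisjunctionMaxQuery scores a document with its best-matching field plus a tie-breaker fraction of the other matching fields, and a recency boost is commonly expressed with Solr's recip function, recip(x,m,a,b) = a / (m*x + b). The sketch below reproduces both formulas in plain Python and shows the kind of dismax parameters involved; the field names (title, description, date), boosts and tie value are hypothetical, not Trovit's settings.

```python
def dismax_score(field_scores, field_boosts, tie=0.0):
    """DisjunctionMaxQuery: best boosted field score plus tie * the other field scores."""
    boosted = [field_boosts.get(field, 1.0) * s for field, s in field_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie * (sum(boosted) - best)


def recip(x, m, a, b):
    """Solr's recip function query: a / (m * x + b)."""
    return a / (m * x + b)


MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # ~3.16e10 ms, hence the usual 3.16e-11 factor

# Hypothetical request parameters for the dismax handler.
params = {
    "qt": "dismax",
    "q": "home tennessee",
    "qf": "title^3 description",               # query-time field boosts
    "tie": "0.1",
    "bf": "recip(ms(NOW,date),3.16e-11,1,1)",  # newer ads get a larger additive boost
}

score = dismax_score({"title": 2.0, "description": 1.0}, {"title": 3.0}, tie=0.1)
fresh = recip(0, 1 / MS_PER_YEAR, 1, 1)               # published just now: 1.0
year_old = recip(MS_PER_YEAR, 1 / MS_PER_YEAR, 1, 1)  # one year old: about 0.5
print(round(score, 2), fresh, round(year_old, 2))     # 6.1 1.0 0.5
```

In dismax, `bf` function values are added to the text score, so the recency boost shifts fresh ads up without replacing relevance.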
12. Relevance tuning
User search: home tennessee
● Higher quality ad
● Lower quality ad
13. Out of the box Solr features
● Synonyms for US states
● Per-country and per-business-category stopwords
● MoreLikeThis request handler
● TrieFields to index housing latitude and longitude
● Facet fields, queries and dates
● Warming queries from a specific file using an EventListener (issue SOLR-784)
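State synonyms are typically declared in a synonyms.txt file read by Solr's SynonymFilterFactory, where comma-separated terms are equivalent and "=>" declares a one-way mapping. The sketch below mimics that format for single-token entries only; the entries themselves are hypothetical, not Trovit's list.

```python
def parse_synonyms(lines):
    """Parse synonyms.txt-style lines into a token -> alternatives mapping.

    "a, b" makes the terms equivalent (as with expand=true); "a => b" replaces a with b.
    Single-token entries only, a simplification of Solr's SynonymFilterFactory.
    """
    mapping = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if "=>" in line:
            left, right = line.split("=>", 1)
            targets = {t.strip() for t in right.split(",") if t.strip()}
            for source in (t.strip() for t in left.split(",") if t.strip()):
                mapping.setdefault(source, set()).update(targets)
        else:
            group = {t.strip() for t in line.split(",") if t.strip()}
            for term in group:
                mapping.setdefault(term, set()).update(group)
    return mapping


def expand(tokens, mapping):
    """Replace each query token by its sorted alternatives (itself if none)."""
    return [sorted(mapping.get(t, {t})) for t in tokens]


# Hypothetical entries for US states
synonyms = parse_synonyms(["# US states", "tennessee, tn", "calif => california"])
print(expand(["home", "tn"], synonyms))  # [['home'], ['tennessee', 'tn']]
print(expand(["calif"], synonyms))       # [['california']]
```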
14. Out of the box Solr features: MoreLikeThis
15. Out of the box Solr features: Usage of TrieFields
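Trie fields speed up numeric range queries, which is what makes separate latitude and longitude fields usable for geographic filtering. A minimal sketch of one such use, a bounding-box filter around a point, is shown below; the field names latitude and longitude and the box approach are assumptions, since the actual queries are not shown. It ignores the poles and the 180° meridian.

```python
import math

KM_PER_DEGREE = 111.2  # approximate length of one degree of latitude


def bounding_box_fq(lat, lon, radius_km, lat_field="latitude", lon_field="longitude"):
    """Build a range filter query selecting docs inside a box around (lat, lon)."""
    dlat = radius_km / KM_PER_DEGREE
    # degrees of longitude shrink with the cosine of the latitude
    dlon = radius_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return (f"{lat_field}:[{lat - dlat:.4f} TO {lat + dlat:.4f}] AND "
            f"{lon_field}:[{lon - dlon:.4f} TO {lon + dlon:.4f}]")


print(bounding_box_fq(0.0, 10.0, 111.2))
# latitude:[-1.0000 TO 1.0000] AND longitude:[9.0000 TO 11.0000]
```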
16. Custom features
● Duplicates detection
● Coming from the same source: indexing time
● Coming from different sources: indexing and search time
● Pseudo field collapsing
● Custom ranking for sponsored ads
● Custom Data Import Handler for full indexing and updates
17. Custom features – Near duplicates detection
● Ads coming from the same source
● The last one to arrive is the one kept in the index
● Deduplication method using SignatureUpdateProcessor
● Small hack to customize the TextProfileSignature
● Ads coming from different sources
● Give the user the chance to decide which source to visit
● Based on the field collapsing issue (SOLR-236) and the SignatureUpdateProcessor used in Deduplication
● Done in 2 steps, one at index time and one at search time.
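Index-time deduplication of ads from the same source boils down to computing a signature per document and letting a later document with the same signature replace the earlier one, similar in effect to Solr's SignatureUpdateProcessorFactory with overwriteDupes enabled. The sketch below uses a deliberately simplified signature (an MD5 of the sorted, lower-cased token set) in place of TextProfileSignature, and the ad records are hypothetical.

```python
import hashlib
import re


def signature(text, min_token_len=2):
    """Simplified stand-in for TextProfileSignature: MD5 of the sorted token set."""
    tokens = {t for t in re.findall(r"\w+", text.lower()) if len(t) >= min_token_len}
    return hashlib.md5(" ".join(sorted(tokens)).encode("utf-8")).hexdigest()


def index_with_dedup(ads):
    """Keep one ad per signature; a later duplicate replaces the earlier one."""
    index = {}
    for ad in ads:
        index[signature(ad["text"])] = ad
    return list(index.values())


ads = [  # hypothetical ads from a single source, in arrival order
    {"id": "a1", "text": "Flat in Madrid, 2 bedrooms!"},
    {"id": "a2", "text": "House in Valencia"},
    {"id": "a3", "text": "flat in madrid 2 bedrooms"},
]
print([ad["id"] for ad in index_with_dedup(ads)])  # ['a3', 'a2']
```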
18. Near duplicates detection
Ads coming from different sources
19. Custom features – Near duplicates detection
Ads coming from different sources
● Why calculate them at index time?
● To avoid loading the FieldCache of a “big field” at search time, which is very memory consuming!
20. Custom features – Pseudo field collapsing
● We don't want the first results pages to be filled with ads from the same source
● “Bad” results are sent to later pages
● SOLR-236 makes a double trip, which is not so good in performance terms
● Core hack to avoid the double trip... SOLR-1311
● Does not support proper distributed search at the moment
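The idea of demoting rather than hiding repeated sources can be sketched as a single pass over an already ranked list: the first ads from each source keep their position and the rest move behind them. This is a hypothetical illustration with an assumed max_per_source parameter, not the SOLR-1311 patch, which works inside Solr's core.

```python
from collections import Counter


def demote_repeated_sources(results, max_per_source=1):
    """Keep ranking order but move ads beyond max_per_source per source to the end."""
    seen = Counter()
    head, tail = [], []
    for ad in results:
        seen[ad["source"]] += 1
        (head if seen[ad["source"]] <= max_per_source else tail).append(ad)
    return head + tail


ranked = [  # hypothetical ranked results
    {"id": "a", "source": "s1"},
    {"id": "b", "source": "s1"},
    {"id": "c", "source": "s2"},
    {"id": "d", "source": "s1"},
    {"id": "e", "source": "s3"},
]
print([ad["id"] for ad in demote_repeated_sources(ranked)])  # ['a', 'c', 'e', 'b', 'd']
```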
21. Custom features – Special ranking for Sponsored Ads
● Relevance is not the only thing that matters. External factors are important too.
● Implemented using a Solr SearchComponent
● External factors are loaded from a resource and used in a Lucene FieldComparatorSource to alter the score of the documents
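The combination of relevance and external factors can be sketched as a custom sort key, which is the role a FieldComparatorSource plays inside Lucene. Below, factors are read from a text resource and multiplied into the relevance score; the resource format, the ad IDs and the multiplicative combination are all assumptions, since the way the factors are combined is not described.

```python
import io


def load_factors(resource):
    """Read hypothetical 'ad_id<TAB>factor' lines from a text resource."""
    factors = {}
    for line in resource:
        line = line.strip()
        if line:
            ad_id, value = line.split("\t")
            factors[ad_id] = float(value)
    return factors


def rank_sponsored(hits, factors, default=1.0):
    """Sort (ad_id, relevance) hits by relevance times external factor, best first."""
    return sorted(hits, key=lambda hit: hit[1] * factors.get(hit[0], default), reverse=True)


factors = load_factors(io.StringIO("ad1\t0.5\nad2\t2.0\n"))
print(rank_sponsored([("ad1", 3.0), ("ad2", 2.0), ("ad3", 1.0)], factors))
# [('ad2', 2.0), ('ad1', 3.0), ('ad3', 1.0)]
```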
22. Custom features – Hacked DataImportHandler
● DIH is a tool to index data into Solr from different sources (XML, text files, databases...)
● Extended transformers to alter data before it is indexed
● Delta imports are not meant for updating huge amounts of rows. Doing that can end up in memory problems
● If something crashes we have to reindex, which can sometimes take a long time. We want to resume from the last indexed doc
● Hacks to allow us to use it as a distributed indexer
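Resuming from the last indexed document can be sketched as batch indexing with a checkpoint written after each batch. The function names, the JSON checkpoint file and the requirement that rows arrive ordered by ID are assumptions, not the actual DIH hacks. After a crash the unfinished batch is sent again, which is harmless in Solr because adding a document with an existing uniqueKey replaces it.

```python
import json
import os
import tempfile


def index_with_checkpoint(rows, index_doc, checkpoint_path, batch_size=2):
    """Index rows (ordered by id), saving the last indexed id after each batch."""
    last_id = None
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            last_id = json.load(f)["last_id"]
    pending = [r for r in rows if last_id is None or r["id"] > last_id]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for row in batch:
            index_doc(row)
        with open(checkpoint_path, "w") as f:  # checkpoint only after a full batch
            json.dump({"last_id": batch[-1]["id"]}, f)


rows = [{"id": i} for i in range(1, 6)]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
indexed = []


def crash_on_4(row):
    """Hypothetical indexer that fails on id 4 to simulate a crash."""
    if row["id"] == 4:
        raise RuntimeError("simulated crash")
    indexed.append(row["id"])


try:
    index_with_checkpoint(rows, crash_on_4, path)
except RuntimeError:
    pass
index_with_checkpoint(rows, lambda row: indexed.append(row["id"]), path)  # resumes after id 2
print(indexed)  # [1, 2, 3, 3, 4, 5]
```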
23. Sharding
First strategy
● No distributed IDFs at the moment, so it is better to choose randomly the shard where a doc is indexed, so that term statistics stay similar on every shard:
SolrDocUniqueField.hashCode % NumberOfShards = ShardNumber
● Once we started keeping track of near duplicates among ads from different sources, this was no longer good enough.
Why? The dups system is based on SOLR-236: duplicated documents must be indexed on the same shard to be detected!
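The routing rule amounts to taking the unique key's hash modulo the number of shards. The Python sketch below reproduces Java's String.hashCode (for characters in the Basic Multilingual Plane) so the arithmetic matches a Java indexer; the ad IDs are hypothetical. In Java itself hashCode() can be negative, so Math.floorMod(hash, numShards) is safer than the % operator there, whereas Python's % already returns a non-negative result for a positive divisor. Two near-duplicate ads with different IDs can land on different shards, which is the problem described above.

```python
def java_string_hash(s):
    """Java's String.hashCode: s[0]*31^(n-1) + ... with 32-bit signed overflow."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def shard_for(unique_id, num_shards):
    """Shard number from the unique key; Python's % is non-negative like Math.floorMod."""
    return java_string_hash(unique_id) % num_shards


print(java_string_hash("hello"))  # 99162322, same as "hello".hashCode() in Java
# Hypothetical near-duplicate ads from two sources: their IDs differ, so do their shards here.
print(shard_for("ad-1", 4), shard_for("ad-2", 4))
```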
24. Sharding
Second strategy
● The hashCode of the signature field decides the shard number
● This forces the signature field to be calculated in the warehouse, so it is already available when the indexing process starts
Third and future strategy
● Calculate duplicates in the warehouse
● There will be no need for the dups to be in the same shard anymore
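Routing by the signature instead of the unique key guarantees that documents sharing a signature always share a shard. A minimal sketch using a Python reimplementation of Java's String.hashCode follows; the signature values and ad IDs are hypothetical.

```python
def java_string_hash(s):
    """Java's String.hashCode with 32-bit signed overflow (BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def shard_for_ad(ad, num_shards):
    """Pick the shard from the signature field, not from the unique ID."""
    return java_string_hash(ad["signature"]) % num_shards


ads = [  # hypothetical ads; the first two are near duplicates from different sources
    {"id": "source1-77", "signature": "9f2c4e"},
    {"id": "source2-13", "signature": "9f2c4e"},
    {"id": "source1-78", "signature": "41aa07"},
]
shards = [shard_for_ad(ad, 4) for ad in ads]
print(shards)
```

The price is that the signature must exist before routing, which is why its calculation moves into the warehouse.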
25. Future directions
Proper distributed IDFs
● Allow absolute relevance across shards. More accurate results
● Issue SOLR-1632
● Still some bugs, especially when using boosting functions
● Allow better sharding strategies. No need to choose the shard number randomly anymore.
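The effect of local statistics can be seen with Lucene's classic IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)). In the hypothetical numbers below, the same term is worth about 5.61 on one shard and 3.3 on the other, while global statistics, as distributed IDF would use, give a single value of about 3.91 for both.

```python
import math


def idf(doc_freq, num_docs):
    """Lucene's classic DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for one term on two shards: (docFreq, numDocs)
shards = {"shard0": (9, 1000), "shard1": (99, 1000)}

local = {name: idf(df, n) for name, (df, n) in shards.items()}
global_idf = idf(sum(df for df, _ in shards.values()),
                 sum(n for _, n in shards.values()))

print({name: round(v, 2) for name, v in local.items()}, round(global_idf, 2))
# {'shard0': 5.61, 'shard1': 3.3} 3.91
```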
26. Future directions
Load balancing with ZooKeeper (SolrCloud)
● Use SolrCloud to manage sharding
● Currently being committed to trunk
● Replace the load balancer with ZooKeeper
● Let ZooKeeper handle distributed configuration
28. Thank you
for your attention
Marc Sturlese
Trovit
marc@trovit.com
Apache Lucene Eurocon 2010, Prague, 20 May 2010