A presentation on how to produce better semantic relevancy in the context of using stemmers, synonyms and raw content in a dismay field weighting configuration.
There are many examples of text-based documents (all in ‘electronic’ format…)
e-mails, corporate Web pages, customer surveys, résumés, medical records, DNA sequences, technical papers, incident reports, news stories and more…
Not enough time or patience to read
Can we extract the most vital kernels of information…
So, we wish to find a way to gain knowledge (in summarised form) from all that text, without reading or examining them fully first…!
Some others (e.g. DNA seq.) are hard to comprehend!
The HortFlora Research Spectrum (HRS), is an international-peer reviewed, open access journal that serves as a forum for the exchange and dissemination of R & D advances and innovations in all facets of Horticultural Science (Pomology, Olericulture, Floriculture, Post Harvest Technology, Plant Biotechnology, and Medicinal & Aromatic Plants etc.) and its allied branches on an international level.
HRS is officially published quarterly (March, June, September and December) every year, in English (print & online version), under the keen auspices of Biosciences & Agriculture Advancement Society (BAAS), Meerut (India).
The journal is Indexed/Abstracted in
• Index Copernicus International, Poland with ICV: 27.39 • Ministry of Science & Higher Education, Poland with 02 points • Global Impact Factor with GIF 0.364• Indian Science Abstracts • CAB Abstracts • CABI Full text • CAB direct • ICRISAT-infoSAT • Google Scholar• CiteFactor • InfoBase Index with IBI Factor: 2.8 •New Journal Impact Factor (NJIF): 2.14 • ResearchBib • AgBiotech Net • Horticultural Science Abstracts • Forestry & Agroforestry Abstracts• Agric. Engg. Abstracts • Crop Physiology Abstracts • PGRs Abstracts • ResearchGate.net • getCited.com • Reference Repository • OAJI.net • Journal Index.net• University of Washington Library • University of Ottawa Library • Swedish University of Agric. Sci. Library, Stockholm, Sweden;
Full text PDF are available at: www.hortflorajournal.com
There are many examples of text-based documents (all in ‘electronic’ format…)
e-mails, corporate Web pages, customer surveys, résumés, medical records, DNA sequences, technical papers, incident reports, news stories and more…
Not enough time or patience to read
Can we extract the most vital kernels of information…
So, we wish to find a way to gain knowledge (in summarised form) from all that text, without reading or examining them fully first…!
Some others (e.g. DNA seq.) are hard to comprehend!
The HortFlora Research Spectrum (HRS), is an international-peer reviewed, open access journal that serves as a forum for the exchange and dissemination of R & D advances and innovations in all facets of Horticultural Science (Pomology, Olericulture, Floriculture, Post Harvest Technology, Plant Biotechnology, and Medicinal & Aromatic Plants etc.) and its allied branches on an international level.
HRS is officially published quarterly (March, June, September and December) every year, in English (print & online version), under the keen auspices of Biosciences & Agriculture Advancement Society (BAAS), Meerut (India).
The journal is Indexed/Abstracted in
• Index Copernicus International, Poland with ICV: 27.39 • Ministry of Science & Higher Education, Poland with 02 points • Global Impact Factor with GIF 0.364• Indian Science Abstracts • CAB Abstracts • CABI Full text • CAB direct • ICRISAT-infoSAT • Google Scholar• CiteFactor • InfoBase Index with IBI Factor: 2.8 •New Journal Impact Factor (NJIF): 2.14 • ResearchBib • AgBiotech Net • Horticultural Science Abstracts • Forestry & Agroforestry Abstracts• Agric. Engg. Abstracts • Crop Physiology Abstracts • PGRs Abstracts • ResearchGate.net • getCited.com • Reference Repository • OAJI.net • Journal Index.net• University of Washington Library • University of Ottawa Library • Swedish University of Agric. Sci. Library, Stockholm, Sweden;
Full text PDF are available at: www.hortflorajournal.com
Suicides and suicide attempts during long term treatment with antidepressants...Daryl Chow
Abstract
Background: It is unclear whether antidepressants can pre- vent suicides or suicide attempts, particularly during long- term use. Methods: We carried out a comprehensive review of long-term studies of antidepressants (relapse prevention). Sources were obtained from 5 review articles and by search- es of MEDLINE, PubMed Central and a hand search of bibli- ographies. We meta-analyzed placebo-controlled antide- pressant RCTs of at least 3 months’ duration and calculated suicide and suicide attempt incidence rates, incidence rate ratios and Peto odds ratios (ORs). Results: Out of 807 studies screened 29 were included, covering 6,934 patients (5,529 patient-years). In total, 1.45 suicides and 2.76 suicide at- tempts per 1,000 patient-years were reported. Seven out of 8 suicides and 13 out of 14 suicide attempts occurred in an- tidepressant arms, resulting in incidence rate ratios of 5.03 (0.78–114.1; p = 0.102) for suicides and of 9.02 (1.58–193.6; p = 0.007) for suicide attempts. Peto ORs were 2.6 (0.6–11.2; nonsignificant) and 3.4 (1.1–11.0; p = 0.04), respectively. Dropouts due to unknown reasons were similar in the anti-
depressant and placebo arms (9.6 vs. 9.9%). The majority of suicides and suicide attempts originated from 1 study, ac- counting for a fifth of all patient-years in this meta-analysis. Leaving out this study resulted in a nonsignificant incidence rate ratio for suicide attempts of 3.83 (0.53–91.01). Conclu- sions: Therapists should be aware of the lack of proof from RCTs that antidepressants prevent suicides and suicide at- tempts. We cannot conclude with certainty whether antide- pressants increase the risk for suicide or suicide attempts. Researchers must report all suicides and suicide attempts in RCTs.
Food systems transformation: what is the role of pulses in the sustainability...ExternalEvents
http://www.fao.org/globalsoilpartnership/en/
This presentation was presentaed during the seminar Soils & Pulses: symbiosis for life that took place at FAO HQ on 19 Apr 2016. it was made by Massimo Iannetta & Milena Stefanova and it presents the Food systems transformation.
This is the agenda and brochure for Pharma IQ's upcoming Regulatory Information Management conference on the 2nd and 3rd of March in London.
This conference is the only event of its kind to focus solely on the management of information within pharmaceutical companies, to aid regulatory submissions. The event will bring together leading professionals to focus on the regulatory aspects of data and information management.
Digital publishing has changed. Understand the base components that allow modern publishers to more easily publish content in multiple formats across multiple platforms.
Presentation originally developed by Apex VP and Principal Consultant Bill Kasdorf for a university press in June 2016, based on presentations on this subject that he has given to many organizations over the past ten years. Learn more at www.apexcovantage.com.
Webinar: Simpler Semantic Search with SolrLucidworks
Hear from Lucidworks Senior Solutions Consultant Ted Sullivan about how you can leverage Apache Solr and Lucidworks Fusion to improve semantic awareness of your search applications.
Suicides and suicide attempts during long term treatment with antidepressants...Daryl Chow
Abstract
Background: It is unclear whether antidepressants can pre- vent suicides or suicide attempts, particularly during long- term use. Methods: We carried out a comprehensive review of long-term studies of antidepressants (relapse prevention). Sources were obtained from 5 review articles and by search- es of MEDLINE, PubMed Central and a hand search of bibli- ographies. We meta-analyzed placebo-controlled antide- pressant RCTs of at least 3 months’ duration and calculated suicide and suicide attempt incidence rates, incidence rate ratios and Peto odds ratios (ORs). Results: Out of 807 studies screened 29 were included, covering 6,934 patients (5,529 patient-years). In total, 1.45 suicides and 2.76 suicide at- tempts per 1,000 patient-years were reported. Seven out of 8 suicides and 13 out of 14 suicide attempts occurred in an- tidepressant arms, resulting in incidence rate ratios of 5.03 (0.78–114.1; p = 0.102) for suicides and of 9.02 (1.58–193.6; p = 0.007) for suicide attempts. Peto ORs were 2.6 (0.6–11.2; nonsignificant) and 3.4 (1.1–11.0; p = 0.04), respectively. Dropouts due to unknown reasons were similar in the anti-
depressant and placebo arms (9.6 vs. 9.9%). The majority of suicides and suicide attempts originated from 1 study, ac- counting for a fifth of all patient-years in this meta-analysis. Leaving out this study resulted in a nonsignificant incidence rate ratio for suicide attempts of 3.83 (0.53–91.01). Conclu- sions: Therapists should be aware of the lack of proof from RCTs that antidepressants prevent suicides and suicide at- tempts. We cannot conclude with certainty whether antide- pressants increase the risk for suicide or suicide attempts. Researchers must report all suicides and suicide attempts in RCTs.
Food systems transformation: what is the role of pulses in the sustainability...ExternalEvents
http://www.fao.org/globalsoilpartnership/en/
This presentation was presentaed during the seminar Soils & Pulses: symbiosis for life that took place at FAO HQ on 19 Apr 2016. it was made by Massimo Iannetta & Milena Stefanova and it presents the Food systems transformation.
This is the agenda and brochure for Pharma IQ's upcoming Regulatory Information Management conference on the 2nd and 3rd of March in London.
This conference is the only event of its kind to focus solely on the management of information within pharmaceutical companies, to aid regulatory submissions. The event will bring together leading professionals to focus on the regulatory aspects of data and information management.
Digital publishing has changed. Understand the base components that allow modern publishers to more easily publish content in multiple formats across multiple platforms.
Presentation originally developed by Apex VP and Principal Consultant Bill Kasdorf for a university press in June 2016, based on presentations on this subject that he has given to many organizations over the past ten years. Learn more at www.apexcovantage.com.
Webinar: Simpler Semantic Search with SolrLucidworks
Hear from Lucidworks Senior Solutions Consultant Ted Sullivan about how you can leverage Apache Solr and Lucidworks Fusion to improve semantic awareness of your search applications.
Moyez Dreamforce 2017 presentation on Large Data Volumes in SalesforceMoyez Thanawalla
As enterprises continue to push more or their data to the cloud, Salesforce has seen data volumes in its tenant orgs grow at an exponential rate. How do you manage such volumes efficiently? How do you build queries and reports that respond in a timely manner?
The need for sophistication in modern search engine implementationsBen DeMott
The need for more sophisticated search implementations is often at odds with the limited feature set available in modern out of the box open source search engines.
This presentation discusses the challenges associated with properly modeling information within a domain and why it's critically needed.
Discovering User's Topics of Interest in Recommender Systems @ Meetup Machine...Gabriel Moreira
This talk introduces the main techniques of Recommender Systems and Topic Modeling. Then, we present a case of how we've combined those techniques to build Smart Canvas, a SaaS that allows people to bring, create and curate content relevant to their organization, and also helps to tear down knowledge silos.
We give a deep dive into the design of our large-scale recommendation algorithms, giving special attention to a content-based approach that uses topic modeling techniques (like LDA and NMF) to discover people’s topics of interest from unstructured text, and social-based algorithms using a graph database connecting content, people and teams around topics.
Our typical data pipeline that includes the ingestion millions of user events (using Google PubSub and BigQuery), the batch processing of the models (with PySpark, MLib, and Scikit-learn), the online recommendations (with Google App Engine, Titan Graph Database and Elasticsearch), and the data-driven evaluation of UX and algorithms through A/B testing experimentation. We also touch topics about non-functional requirements of a software-as-a-service like scalability, performance, availability, reliability and multi-tenancy and how we addressed it in a robust architecture deployed on Google Cloud Platform.
Short-Bio: Gabriel Moreira is a scientist passionate about solving problems with data. He is Head of Machine Learning at CI&T and Doctoral student at Instituto Tecnológico de Aeronáutica - ITA. where he has also got his Masters on Science. His current research interests are recommender systems and deep learning.
https://www.meetup.com/pt-BR/machine-learning-big-data-engenharia/events/239037949/
This is an introduction to text analytics for advanced business users and IT professionals with limited programming expertise. The presentation will go through different areas of text analytics as well as provide some real work examples that help to make the subject matter a little more relatable. We will cover topics like search engine building, categorization (supervised and unsupervised), clustering, NLP, and social media analysis.
Discovering User's Topics of Interest in Recommender SystemsGabriel Moreira
This talk introduces the main techniques of Recommender Systems and Topic Modeling.
Then, we present a case of how we've combined those techniques to build Smart Canvas (www.smartcanvas.com), a service that allows people to bring, create and curate content relevant to their organization, and also helps to tear down knowledge silos.
We present some of Smart Canvas features powered by its recommender system, such as:
- Highlight relevant content, explaining to the users which of his topics of interest have generated each recommendation.
- Associate tags to users’ profiles based on topics discovered from content they have contributed. These tags become searchable, allowing users to find experts or people with specific interests.
- Recommends people with similar interests, explaining which topics brings them together.
We give a deep dive into the design of our large-scale recommendation algorithms, giving special attention to our content-based approach that uses topic modeling techniques (like LDA and NMF) to discover people’s topics of interest from unstructured text, and social-based algorithms using a graph database connecting content, people and teams around topics.
Our typical data pipeline that includes the ingestion millions of user events (using Google PubSub and BigQuery), the batch processing of the models (with PySpark, MLib, and Scikit-learn), the online recommendations (with Google App Engine, Titan Graph Database and Elasticsearch), and the data-driven evaluation of UX and algorithms through A/B testing experimentation. We also touch topics about non-functional requirements of a software-as-a-service like scalability, performance, availability, reliability and multi-tenancy and how we addressed it in a robust architecture deployed on Google Cloud Platform.
This is a high-level summary of three important ways to help people find information. The slides were presented at Vera Rhoades' information architecture class at the University of Maryland.
Mike King examines the state of the SEO industry and talks through knowing information retrieval will help improve our understanding of Google. This talk debuted at MozCon
SolidWorks World Presentation from Paul Gimbel at Razorleaf. This presentation deals with the use of Microsoft Excel and Visual Basic for Applications as a front end to driving SolidWorks geometry in a design automation implementation.
Strong Compare and Contrast Essay Examples. Compare And Contrast Essay Outline Mla. How to Write a Compare and Contrast Essay | Literacy Ideas. Stirring Compare Contrast Essay Example ~ Thatsnotus.
Similar to Relevancy and synonyms - ApacheCon NA 2013 - Portland, Oregon, USA (20)
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Relevancy and synonyms - ApacheCon NA 2013 - Portland, Oregon, USA
1.
2. About us
• Founded in 2007
• B2B Startup
• Semantic Services started in
2011
• Big Data projects in 2012
• Training, Consultancy and
Support on Solr and Hadoop
3. About me
• Leo Oliveira
• 15 years of experience with
websites and search engines
• Specialized in Relevancy and Semantics
• Graduated in Business Management & IT
Innovation
4. The Importance of Relevancy
The game changer in Search
Google became what it is due to relevancy algorithms and PageRank
Can be achieved through many different types of data
Cross-reference, log analysis, social media, market research, new ideas etc
It’s about the user
And not about what we think of the user
If you don’t understand your data, find what is relevant through
research
Research user behavior and needs and you’ll find a way of finding relevant
information or you will find a way of discovering what is relevant automatically.
If the user can’t find what they want using your search…
… they’ll leave.
5. The Importance of having a resultset
Sometimes you have little data and users get a lot of zero-result
queries.
Then it’s time to find the right words for the job
Synonym discovery, stemming
Analysis: understand what is the user searching for
Suggest other options
Sometimes a zero-result query is just typo. Use suggestions to find the right
results.
Create an interface that makes it easy for the user to try different keywords to get
results
Analyze words that are searched together by the same users to create special
suggestions or second searches that brings some results
6. Relevancy and Semantics
THE RELEVANCY RULES (the most forgotten ones)
Phrases are more relevant the single words
“Pure” words, that is, words the way they were typed, are more relevant than
stemmed words or even synonyms
In general, newest docs are more relevant than old docs, but this rule could be
reversed.
Closer venues and locations are more relevant than distant ones.
Give the right weights to the right fields (seems simple, but it’s not)
When you have a mix of relevant things…
for instance, in an e-commerce, that you must consider the freshest docs, plus the
smaller prices and the best sellers, sometimes you need to create a math formula to
cope with all these items for relevancy. And it’s tricky.
You can boost each parameter separately, but then you must test your formulas
very well not to prioritize one parameter more than the other.
7. Relevancy and Weighting
USING DISMAX (an example of relevancy configuration)
By using Dismax, you have two main options: qf and pf.
Phrase Fields (pf) should have bigger weights than all of your Query Fields (qf)
Field weights must be decided carefully. You can use a scale, such as Fibonacci
numbers, so as to not lose control of what is more relevant and what is not that relevant in
your “search formula”
EXAMPLE OF pf for a movies website:
MovieName^34 MovieActors^21 MovieDirector^13 MovieYear^8
EXAMPLE OF qf:
MovieName^21 MovieActors^13 MovieDirector^8 MovieYear^5
9. Synonym Theory Vs Synonym in Search
In search, synonyms can get complicated
You don’t need to use only “real” synonyms
You must focus on user needs rather than “thesaurus”
You can use it for other useful stuff, such as the most common typos to be
corrected or even some unrelated words that could do the job for you
Sometimes you just need to get creative.
10. 1-way transformations
SynonymFilter
E. g. Compter => computer
Very useful for common typos or misconceptions
Sometimes requires expansion.
In our setup, we’ll set a fieldtype that will do the job.
11. 2-way transformations
SynonymFilter
E. g. computer, pc, mac, notebook, server
Expansion of meanings
Sometimes you don’t use exact synonyms. Brand names and other “real world” terms can
be used.
It’s hard to configure and maintain 1-way and 2-way in a single synonyms.txt file.
12. Index Time Vs Query Time
Most synonyms should work with Query Time only
But there are exceptions. E. g.: photoalbum, photo album
When using a phrase synonym, this might require an index time synonym to work
correctly
In our fieldtype, we’ll discuss how to achieve this.
This index time synonym file is also useful for search when you need symbols in some
terms that could be droped by the tokenizer. E. g. C++ => cpp or .NET => dotnet etc
Keep in mind that this index time file will have the exceptions that don’t work well in query-
time. To figure this out, sometimes it requires testing, sometimes it is easy to identify, such
as phrase synonyms.
13. Some examples
Query time
Enginer => Engineer (1-way)
Engineer, engineering (2-way with expand=true)
Computer, pc, mac
Index time
C# => csharp
.NET => dotnet
Human resources, HR
Business intelligence, BI
CRM, Costumer Relationship Management
19. Creating a relevant formula for search
Imagine you have a field named “Product_name”
To keep search result semantically relevant, you’ll need a “Product_name” field and also a
“Product_name_pure”
Keep in mind that phrases are also more relevant than individual words
In that case, here’s how Solrconfig would be:
QF (query fields, single words)
Product_name_pure^5 Product_name^3
PF (phrase fields)
Product_name_pure^8 Product_nameˆ5
20. What will you get?
Relevant results semantically
Users will understand the results
Synonyms or stemmed tokens won’t disturb or create noise
Similar to what main search websites are doing for many different things, such as
document search, e-commerce, intranet search, document search, library search etc
Be able to find relevant results even in a scenario with millions or billions of documents.