- Solr is a search engine that indexes document content and provides fast full-text search and faceted search capabilities. It uses Lucene under the hood to index and search documents.
- The document discusses Solr's architecture and capabilities for indexing, searching, and filtering large collections of documents. It also compares Solr to traditional RDBMS systems and how they are meant to complement each other.
- The key aspects of Solr covered include its use of schemas and fields to define document structure, faceting to filter search results, and SolrCloud architecture for distributed searching across multiple servers and shards.
8. • What is Solr
• Solr vs. RDBMS
• SolrCloud Architecture
• Docs, Fields & Schema Design
• Searching
• Faceting
• Submitting Docs
• Deleting Docs
• Import From DB (a test)
9. • Solr is not meant to entirely replace your RDBMS but rather to complement it.
• One of the things that Solr does best is to answer a question such as: “What
is a list of the most relevant documents or fields that possibly match query
‘XYZ’?” In this case, ‘XYZ’ might match a number of documents or records
whose fields Solr has tokenized, stored, queried and/or ranked to produce a
list of result documents.
• Solr also lets you quickly filter results by facet fields, enriching the
search experience so that your users can narrow the list to find the right
item or set of items.
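A relevance query combined with facet filtering might be sent to Solr roughly as follows. This is a minimal sketch that only builds the request URL: the host, the collection name "products" and the filter value are assumptions for illustration, while q, fq, facet and facet.field are standard Solr request parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

def build_facet_query_url(base_url, query, facet_fields, filters=()):
    """Build a Solr select URL asking for ranked results plus facet counts."""
    params = [("q", query), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    # Each fq narrows the result list without affecting relevance scores.
    params += [("fq", fq) for fq in filters]
    return base_url + "?" + urlencode(params)

# Hypothetical host and collection; "cat" and "manu" are example field names.
url = build_facet_query_url(
    "http://localhost:8983/solr/products/select",
    "ipod",
    facet_fields=["cat", "manu"],
    filters=["cat:electronics"],
)
print(url)
print(parse_qsl(urlsplit(url).query))  # decoded parameters, for inspection
```

The response to such a request would carry both the ranked documents and the per-value counts for each facet field, which is what a UI uses to offer "narrow by category" links.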
10. • By contrast, the RDBMS in its classic implementation is meant to answer
questions such as: exact match queries, e.g., “give me all records in my users
table with a creation date after Oct 1, 2009”; or, reporting-related queries,
like “what is the average file size of images uploaded to my photo site
grouped by user and date”
• There is one other thing the RDBMS does quite well: efficiently executing a
series of inserts and updates for a transaction, rolling back if one of those
operations failed (also known as ACID properties: Atomicity, Consistency,
Isolation, Durability).
• The best way to think about Solr is that it’s a quickly searchable view of your
data. A well-designed application can use the best of both these
approaches, using Solr to help users find the most relevant documents and
then using your RDBMS to query for more precise additional information to
better present the results to the end user.
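The "Solr first, RDBMS second" pattern could look roughly like this. It is a sketch under stated assumptions: search_solr_stub is a hypothetical stand-in for a real Solr query (it just returns ids in relevance order), and the products table is invented for the example; sqlite3 plays the role of the RDBMS.

```python
import sqlite3

def search_solr_stub(query):
    """Hypothetical stand-in for a Solr query: ids in relevance order."""
    return [3, 1]

def fetch_details(conn, ids):
    """Fetch precise row data from the RDBMS, keeping Solr's ranking order."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name, price FROM products WHERE id IN ({placeholders})",
        ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # SQL does not preserve the order of the IN list, so re-apply it here.
    return [by_id[i] for i in ids if i in by_id]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "ipod nano", 149.0), (2, "usb cable", 9.5), (3, "ipod touch", 229.0)],
)
conn.commit()

results = fetch_details(conn, search_solr_stub("ipod"))
print(results)
```

Solr decides which records are relevant and in what order; the database remains the source of truth for the exact values shown to the user.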
12. [Diagram: a collection (logical) made up of shard1, shard2 and shard3, each
with a shard leader, spread over the physical nodes 192.xxx.xxx.111,
192.xxx.xxx.222 and 192.xxx.xxx.333, together with Zookeeper and the config set.]
Solr terminology and description:
Collection: A complete logical index in a SolrCloud cluster. It is associated
with a config set and is made up of one or more shards.
Config Set: A set of config files necessary for a collection to function
properly. At minimum this will consist of solrconfig.xml and schema.xml but,
depending on the contents of those two files, it may include other files.
13. Solr terminology and description:
Shard: A logical piece (or slice) of a collection. Each shard is made up of one
or more replicas. An election is held (by Zookeeper) to determine which replica
is the leader.
Replica: One copy of a shard. One of the replicas is elected to be the leader.
[Diagram: the collection, shards, shard leaders, Zookeeper, config set and
physical nodes, repeated from slide 12.]
14. Solr terminology and description:
Shard Leader: The shard replica that has won the leader election. Elections can
happen at any time, but normally they are only triggered by events like a Solr
instance going down. When documents are indexed, SolrCloud will forward them to
the leader of the shard, and the leader will distribute them to all the shard
replicas.
[Diagram: the collection, shards, shard leaders, Zookeeper, config set and
physical nodes, repeated from slide 12.]
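The leader's role during indexing can be pictured with a toy model. This is not SolrCloud code: the Replica and Shard classes are hypothetical, the "election" simply picks the first surviving replica, and choosing which shard a document belongs to is left out.

```python
class Replica:
    """One copy of a shard, holding its own copy of the documents."""
    def __init__(self, name):
        self.name = name
        self.docs = {}

class Shard:
    """Toy shard: one replica acts as leader and fans updates out."""
    def __init__(self, name, replica_names):
        self.name = name
        self.replicas = [Replica(n) for n in replica_names]
        self.leader = self.replicas[0]  # stand-in for a Zookeeper election

    def index(self, doc):
        # The update goes to the leader first ...
        self.leader.docs[doc["id"]] = doc
        # ... and the leader distributes it to every other replica.
        for replica in self.replicas:
            if replica is not self.leader:
                replica.docs[doc["id"]] = doc

    def fail_leader(self):
        # A node going down triggers a new election among the survivors.
        self.replicas.remove(self.leader)
        self.leader = self.replicas[0]

shard1 = Shard("shard1", ["192.xxx.xxx.111", "192.xxx.xxx.222"])
shard1.index({"id": "1", "name": "first doc"})
shard1.fail_leader()
shard1.index({"id": "2", "name": "second doc"})
print(shard1.leader.name, sorted(shard1.leader.docs))
```

Because every replica received the first document before the leader failed, the newly elected leader can keep serving and indexing without data loss in this simplified picture.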
15. Solr terminology and description:
Zookeeper: SolrCloud requires Zookeeper to handle leader elections. It is
recommended that Zookeeper be standalone, installed separately from Solr rather
than run inside it. A majority of servers is needed to provide service (e.g., 5
Zookeeper servers are needed to allow for the failure of up to 2 servers at a
time). Zookeeper can still run on the same hardware as Solr, and many users do
run it that way.
[Diagram: the collection, shards, shard leaders, Zookeeper, config set and
physical nodes, repeated from slide 12.]
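The majority rule for Zookeeper can be worked out with a few lines of arithmetic. A minimal sketch; the function names are made up for the example.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)

for n in (1, 3, 4, 5, 7):
    print(f"{n} servers: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

With 5 servers the quorum is 3, so 2 may fail, matching the example on the slide. A 4-server ensemble tolerates only 1 failure, the same as 3 servers, which is why odd ensemble sizes are the usual choice.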
17. This is a query response
This is a Solr doc
These are fields defined in schema.xml
18. • The content of schema.xml looks roughly like this:
<schema name="your name" version="x.x">
  <types> </types>          <!-- field types -->
  <fields> </fields>        <!-- fields -->
  <uniqueKey> </uniqueKey>  <!-- a field -->
</schema>
19. • The attribute "name" is the name of this schema and is only used for
display purposes.
• The attribute version="x.y" is Solr's version number for the schema syntax and
semantics. It should not normally be changed by applications.
• The element <uniqueKey> names the field used to determine and enforce document
uniqueness. It is not required, but if you leave it out, you had better have a
good reason.
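To make the role of uniqueKey concrete, the sketch below parses a small schema and mimics how a second document with the same key replaces the first. The schema content (the name "example", version "1.5", the fields "id" and "name") is an illustrative assumption, and the dictionary is only a toy stand-in for the index.

```python
import xml.etree.ElementTree as ET

# Illustrative schema following the skeleton above; values are assumptions.
SCHEMA_XML = """
<schema name="example" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField"/>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="name" type="string" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""

schema = ET.fromstring(SCHEMA_XML)
unique_key = schema.findtext("uniqueKey").strip()
field_names = [field.get("name") for field in schema.find("fields")]

index = {}  # toy index: unique key value -> document

def add_document(doc):
    # A document whose key already exists replaces the earlier one.
    index[doc[unique_key]] = doc

add_document({"id": "1", "name": "first version"})
add_document({"id": "1", "name": "second version"})
print(unique_key, field_names, len(index), index["1"]["name"])
```

Without a unique key there would be no way to tell that the second add is an update, so both versions would end up in the index.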
21. • General Properties:
Property: description (values)
name: The name of the fieldType. It is strongly recommended that names consist
of alphanumeric or underscore characters only and not start with a digit.
class: The class name that gets used to store and index the data for this type.
indexed: If true, then this field is searchable, sortable, and facetable.
(true/false)
stored: If true, the actual value of the field can be retrieved by queries.
(true/false)
required: Instructs Solr to reject any attempts to add a document which does
not have a value for this field. This property defaults to false. (true/false)
multiValued: If true, indicates that a single document might contain multiple
values for this field type. (true/false)
22. • General Properties:
Property: description (values)
positionIncrementGap: For multivalued fields, specifies a distance between
multiple values, which prevents spurious phrase matches. (integer)
autoGeneratePhraseQueries: For text fields. If true, Solr automatically
generates phrase queries for adjacent terms. If false, terms must be enclosed
in double quotes to be treated as phrases. (true/false)
docValues: If true, the value of the field will be put in a column-oriented
DocValues structure (this gives a performance boost in sorting and faceting).
(true/false)
sortMissingFirst, sortMissingLast: If sortMissingLast="true", then a sort on
this field will cause documents without the field to come after documents with
the field, regardless of the requested sort order (asc or desc).
sortMissingFirst="true" puts them before instead. (true/false)
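Why positionIncrementGap prevents spurious phrase matches is easiest to see with token positions. The following is a simplified model, not Lucene's implementation: tokens are split on whitespace, the gap is added between consecutive values, and a phrase matches only when its words sit at consecutive positions. The author names are made up.

```python
def assign_positions(values, gap):
    """Give each token of a multivalued field a position, leaving a gap
    between the last token of one value and the first of the next."""
    positions = []
    next_pos = 0
    for value in values:
        for token in value.lower().split():
            positions.append((token, next_pos))
            next_pos += 1
        next_pos += gap
    return positions

def phrase_matches(positions, phrase):
    """True if the (non-empty) phrase's words occur at consecutive positions."""
    words = phrase.lower().split()
    where = {}
    for token, pos in positions:
        where.setdefault(token, set()).add(pos)
    return any(
        all(start + i in where.get(word, ()) for i, word in enumerate(words))
        for start in where.get(words[0], ())
    )

authors = ["John Smith", "Mary Jones"]  # two values of one multivalued field

no_gap = assign_positions(authors, gap=0)
with_gap = assign_positions(authors, gap=100)

print(phrase_matches(no_gap, "smith mary"))    # spans two values
print(phrase_matches(with_gap, "smith mary"))
print(phrase_matches(with_gap, "mary jones"))  # stays inside one value
```

With no gap, "smith" and "mary" end up next to each other and the phrase matches across two unrelated values; with a gap of 100 only phrases inside a single value can match.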
23. <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
• This is a general text field with reasonable, generic cross-language defaults: it
tokenizes with StandardTokenizer, removes the stop words listed in "stopwords.txt"
(matched case-insensitively; the file is empty by default), and lowercases the tokens.
• At query time only, it also applies synonyms.
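The two analyzer chains above can be imitated with a toy sketch in plain Java. This is not Lucene's implementation: the regular-expression split is only a rough stand-in for StandardTokenizer, lowercasing is done before the stop-word and synonym lookups for simplicity, and the class name, the stop-word set, and the synonym map are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy imitation of the text_general chains (assumption: NOT Lucene code).
class TextGeneralSketch {

    // Index-time chain: tokenize, drop stop words (ignoring case), lowercase.
    // stopWords is expected to contain lowercase entries.
    static List<String> analyzeIndex(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        // Split on anything that is not a letter or digit (rough tokenizer stand-in).
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // a leading delimiter yields an empty first token
            }
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    // Query-time chain: same steps, then each term is expanded with its synonyms.
    static List<String> analyzeQuery(String text, Set<String> stopWords,
                                     Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        for (String term : analyzeIndex(text, stopWords)) {
            out.add(term);
            out.addAll(synonyms.getOrDefault(term, Collections.<String>emptyList()));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a"));
        Map<String, List<String>> syn = new HashMap<>();
        syn.put("tv", Arrays.asList("television"));
        System.out.println(analyzeIndex("The Quick-Brown fox", stop));
        System.out.println(analyzeQuery("a TV", stop, syn));
    }
}
```

The point of the sketch is the asymmetry: synonyms are added only on the query side, so the index stays small while a query for one term can still match its synonyms.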
24. • An example:
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_is" type="int" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true" />
<dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true"/>
• Dynamic fields allow Solr to index fields that you did not explicitly define in your
schema. This is useful if you discover you have forgotten to define one or more fields.
(Note that changing the schema in Solr is very costly!)
• A dynamic field is just like a regular field except that its name contains a wildcard.
• For example, suppose your schema includes a dynamic field named *_i.
If you index a document with a cost_i field and no explicit cost_i field is
defined in the schema, then the cost_i field gets the field type and analysis
defined for *_i.
• In practice, we put dynamic fields for all data types in schema.xml for the greatest
schema flexibility.
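The wildcard lookup described above can be sketched in plain Java. This is a simplified illustration rather than Solr's own resolution code: it handles only patterns with a leading "*", and the class name and the maps are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified dynamic-field lookup (assumption: only "*suffix" patterns are handled).
class DynamicFieldSketch {

    // Returns the type for fieldName: explicit fields win, then the first
    // dynamic pattern whose suffix matches; null if nothing matches.
    static String resolveType(String fieldName,
                              Map<String, String> explicitFields,
                              Map<String, String> dynamicFields) {
        String type = explicitFields.get(fieldName);
        if (type != null) {
            return type;
        }
        for (Map.Entry<String, String> e : dynamicFields.entrySet()) {
            String pattern = e.getKey();
            if (pattern.startsWith("*") && fieldName.endsWith(pattern.substring(1))) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> explicit = new LinkedHashMap<>();
        explicit.put("id", "string");
        Map<String, String> dynamic = new LinkedHashMap<>();
        dynamic.put("*_i", "int");
        dynamic.put("*_s", "string");
        System.out.println(resolveType("cost_i", explicit, dynamic)); // matched by *_i
        System.out.println(resolveType("price", explicit, dynamic));  // no match
    }
}
```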
25. <copyField source="cat" dest="text"/>
<copyField source="name" dest="text"/>
<copyField source="manu" dest="text"/>
<copyField source="features" dest="text"/>
• A copyField copies one field to another at the time a document is added to the index.
In the above example, the content of four fields (cat, name, manu, features) is copied
to the field text.
• When to use copyField?
1. If you want to provide a default search field that effectively searches several fields
at once whenever a query goes to that default field.
2. If you want to send the same data to two different fields at the same time.
• An example:
<!-- Create a string version of author for faceting -->
<copyField source="author" dest="author_s"/>
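What the copyField rules do to a document at indexing time can be pictured with a small plain-Java sketch. It is only an illustration under simplifying assumptions: documents are modelled as maps of field name to values, each rule is a {source, dest} pair, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of copyField (assumption: a document is a map of field -> values).
class CopyFieldSketch {

    // Returns a copy of doc in which every rule {source, dest} has appended
    // the source field's original values to the dest field.
    static Map<String, List<String>> applyCopyFields(Map<String, List<String>> doc,
                                                     String[][] rules) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            out.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (String[] rule : rules) {
            List<String> sourceValues = doc.get(rule[0]); // read from the incoming document
            if (sourceValues != null) {
                out.computeIfAbsent(rule[1], k -> new ArrayList<>()).addAll(sourceValues);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("name", Arrays.asList("iPod"));
        doc.put("manu", Arrays.asList("Apple"));
        String[][] rules = { {"cat", "text"}, {"name", "text"}, {"manu", "text"} };
        System.out.println(applyCopyFields(doc, rules));
    }
}
```

Because several sources feed the same destination, the destination field (text here) has to be multiValued in a real schema.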
26. • What is Solr
• Solr VS RDBMs
• SolrCloud Architecture
• Docs, Fields & Schema Design
• Searching
• Faceting
• Submitting Docs
• Deleting Docs
• Import From DB (a test)
29. • fq: This parameter specifies a query that restricts the superset of documents
that can be returned, without influencing the score.
• It can be very useful for speeding up complex queries, since the queries
specified with fq are cached independently of the main query. Caching
means that a later query using the same filter can reuse the cached result.
• An example:
http://localhost:8983/solr/select?
q=cars
&fq=color:black
&fq=model:Lamborghini
&fq=year:[2014 TO *]
• By default, Solr resolves all of the filters before the main query. Each filter
query is looked up individually in Solr's filterCache.
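When such a request is built in code, each fq value has to be URL-encoded (the colon, brackets, and spaces are not safe in a raw URL). A minimal sketch using only the JDK follows; the class and method names are hypothetical, and the base URL is the one from the example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Builds a select URL with one q parameter and any number of fq parameters.
class FilterQueryUrl {

    // URL-encodes a parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Each filter becomes its own fq parameter, so Solr can cache it separately.
    static String build(String baseUrl, String q, List<String> filters) {
        StringBuilder sb = new StringBuilder(baseUrl).append("?q=").append(enc(q));
        for (String fq : filters) {
            sb.append("&fq=").append(enc(fq));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/select", "cars",
                Arrays.asList("color:black", "model:Lamborghini", "year:[2014 TO *]")));
    }
}
```

Keeping the filters as separate fq parameters, rather than joining them with AND into one, lets each one be reused from the filterCache independently.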
30. • The following parameters are used for spatial search:
Parameter: Description
d: the radial distance, in kilometers
pt: the center point, in the format "lat,lon" (latitude, then longitude)
sfield: a spatially indexed field
• geofilt: For example, to find all documents within five kilometers of a given
lat/lon point, you could enter
&q=*:*&fq={!geofilt sfield=store}&pt=45.15,-93.85&d=5
31. • bbox: very similar to geofilt, except that it uses the bounding box of
the calculated circle.
• The rectangular shape is faster to compute, so it is sometimes
used as an alternative to geofilt when it is acceptable to return
points outside of the radius.
• Here's a sample query:
&q=*:*&fq={!bbox sfield=store}&pt=45.15,-93.85&d=5
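The difference between a circle filter and its bounding box can be illustrated with a plain-Java sketch. It is not Solr's implementation: it assumes a spherical Earth with a radius of 6371 km, uses the haversine formula for the circle, and ignores the poles and the date line when computing the box. The class and method names are hypothetical.

```java
// Circle filter versus bounding-box filter around a center point
// (assumptions: spherical Earth, radius 6371 km, no pole/date-line handling).
class GeoFilterSketch {

    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two lat/lon points, haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Circle test: true if the point lies within d kilometers of the center.
    static boolean inCircle(double cLat, double cLon, double d, double lat, double lon) {
        return haversineKm(cLat, cLon, lat, lon) <= d;
    }

    // Box test: true if the point lies inside the box that encloses the circle.
    static boolean inBox(double cLat, double cLon, double d, double lat, double lon) {
        double latHalf = Math.toDegrees(d / EARTH_RADIUS_KM);
        double lonHalf = Math.toDegrees(d / (EARTH_RADIUS_KM * Math.cos(Math.toRadians(cLat))));
        return Math.abs(lat - cLat) <= latHalf && Math.abs(lon - cLon) <= lonHalf;
    }

    public static void main(String[] args) {
        // A point roughly 4 km north and 4 km east of the center, near a box corner.
        double lat = 45.186, lon = -93.799;
        System.out.println("distance km: " + haversineKm(45.15, -93.85, lat, lon));
        System.out.println("in circle:   " + inCircle(45.15, -93.85, 5, lat, lon));
        System.out.println("in box:      " + inBox(45.15, -93.85, 5, lat, lon));
    }
}
```

The sample point sits in a corner of the box: it passes the cheap box test but is farther than 5 km from the center, which is exactly the kind of extra result a box filter may return.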
32. • geodist: a distance function that takes three optional
parameters: (sfield,latitude,longitude). You can use the geodist
function to sort results by distance or to return the distance as the score.
• For example, to sort your results by ascending distance, enter
&q=*:*&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=50&sort=geodist() asc
• To return the distance as the document score, enter
&q={!func}geodist()&sfield=store&pt=45.15,-93.85&sort=score+asc
34. • It’s easiest to understand what faceted search is through an example,
appropriately from CNET Reviews, the first website to use Solr even before it
had been contributed to Apache by CNET.
35. • Faceted search gives users an effective way to refine search
results, drilling down step by step until the desired items are found. The
benefits include:
1. Superior feedback: users can see at a glance a summary of the search
results and how those results break down by different criteria.
2. No surprises or dead ends: users know how many results match before
they click. Values with zero counts are normally removed to reduce visual
noise and to keep a user from accidentally selecting a
constraint that would lead to no results.
3. No selection hierarchy is imposed: users are generally free to add or
remove constraints in any order.
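The counts shown next to each facet value can be pictured with a small plain-Java sketch. It is a conceptual illustration only, not how Solr computes facets internally: the result set is modelled as a list of maps, and the class, method, and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual facet counting over an in-memory result set (assumption: not Solr code).
class FacetCountSketch {

    // Counts how many result documents carry each value of the given field.
    // Values that occur in no result never appear, so there are no zero counts.
    static Map<String, Integer> facetCounts(List<Map<String, String>> results, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : results) {
            String value = doc.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Small helper that builds a document with a single field.
    static Map<String, String> doc(String field, String value) {
        Map<String, String> d = new HashMap<>();
        d.put(field, value);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> results = new ArrayList<>();
        results.add(doc("color", "black"));
        results.add(doc("color", "red"));
        results.add(doc("color", "black"));
        System.out.println(facetCounts(results, "color"));
    }
}
```

Selecting a facet value then amounts to adding a filter on that value and counting again over the narrowed result set.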
37. • Submitting using SolrJ (an example):
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Addresses of the ZooKeeper ensemble, not of the Solr nodes
String zookeeperQuorum = "192.168.10.1,192.168.10.2,192.168.10.3";
CloudSolrServer server = new CloudSolrServer(zookeeperQuorum);
server.setDefaultCollection("yourCollection");
// Build the document field by field
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "111");
doc.addField("intfield", 100);
doc.addField("StringField", "some data");
// Connect, send the document, and make it visible to searches
server.connect();
server.add(doc);
server.commit();
• Note:
1. In this example we specify a collection, but we do not specify which shard
to submit to. Solr distributes the documents across the shards automatically
(by default by hashing the unique key), so the shards end up roughly equal in size.
2. If you like, you may specify which shard to submit to.
3. In this example we do not need to know where the Solr instances are; this
is managed by ZooKeeper.
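38. • Updating a doc using SolrJ (an example):
import java.util.HashMap;
import java.util.Map;

SolrInputDocument doc = new SolrInputDocument();
// The unique key must be specified: it identifies the document to update
doc.addField("id", id);
// "set" is fixed syntax: it tells Solr to replace the field's current value
Map<String, String> importIDupdate = new HashMap<String, String>();
importIDupdate.put("set", "A1234567");
doc.addField("importID", importIDupdate);
// server is a connected CloudSolrServer, as in the submitting example
server.add(doc);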
39. • Import CSV files using curl:
• Suppose we have a CSV file in example/exampledocs/books.csv.
An example of using HTTP POST to send the CSV data over the
network to the Solr server:
cd example/exampledocs
curl http://localhost:8983/solr/update/csv --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'
40. • Suppose the collection is properly configured. Then you can
pass a rich document (Word, PDF, PowerPoint, …) to Solr for indexing by using
the following API:
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
/* the second argument is the content type of the stream; adjust it to the file being sent */
up.addFile(new File(filePath), "application/xml; charset=UTF-8");
/* the literal.id param provides the necessary unique id for the document being indexed */
up.setParam("literal.id", solrId);
/* the uprefix=attr_ param causes all generated fields that aren't defined in the schema
 * to be prefixed with attr_ (which is a dynamic field that is stored) */
up.setParam("uprefix", "attr_");
/* commit as part of the request so the document becomes searchable right away */
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);
42. • Delete all documents in Solr:
http://host:8983/solr/core/update?stream.body=<delete><query>*:*</query></delete>
• Delete documents with id in a range:
http://host:8983/solr/update?stream.body=<delete><query>id:[200000001 TO 200000002]</query></delete>&commit=true
• If you want to delete items that must match conditions on more than one field,
combine the conditions with AND in a single query:
http://host:8983/solr/update?stream.body=<delete><query>id:298253 AND entitytype:BlogEntry</query></delete>&commit=true
• Note that several separate <query> elements inside one <delete> are each executed
on their own, so together they delete the documents matching any one of them.
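When a delete-by-query URL like these is built in code rather than typed into a browser, the XML in stream.body has to be URL-encoded. A minimal JDK-only sketch follows; the class and method names are hypothetical, and the host and path are the placeholders used in the URLs above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds a delete-by-query URL with an encoded stream.body parameter.
class DeleteByQueryUrl {

    // Escapes the characters that are special inside XML text.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps the query in a <delete> message, URL-encodes it, and asks for a commit.
    static String build(String updateUrl, String query) {
        String body = "<delete><query>" + escapeXml(query) + "</query></delete>";
        try {
            return updateUrl + "?stream.body=" + URLEncoder.encode(body, "UTF-8") + "&commit=true";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://host:8983/solr/update", "id:[200000001 TO 200000002]"));
    }
}
```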
44. • First, I set up a test collection as follows:
• Go to the official website and download solr-4.6.0.tgz
• Copy solr-4.6.0.tgz to /home/ziv
• In the terminal, go to /home/ziv and unpack solr-4.6.0.tgz:
$ tar zxvf solr-4.6.0.tgz
• Now we have /home/ziv/solr-4.6.0
45. Go to /home/ziv/solr-4.6.0/example
Create a collection for the DB import test:
[Screenshot of the example directory. example-DIH-test: created by me, copied from example-DIH. example-DIH: the official DataImportHandler example config.]
46. Look into example-DIH-test
solr.xml: This is the primary configuration file Solr looks for when starting.
This file specifies the list of "SolrCores" it should load, and the high-level
configuration options that should be used for all SolrCores.
48. Look into the db folder
conf/: This directory is mandatory and must contain your
solrconfig.xml and schema.xml.
Any other optional configuration files are also kept here.
data/: This directory is the default location where Solr keeps your index, and
is used by the replication scripts for dealing with snapshots.
You can override this location in conf/solrconfig.xml.
Solr will create this directory if it does not already exist.
49. Look into the solr/db folder
lib/: This directory is optional.
If it exists, Solr will load any JARs found in this directory and use them to
resolve any "plugins" specified in your solrconfig.xml or schema.xml (i.e.
Analyzers, Request Handlers, etc.).
Alternatively, you can use the <lib> syntax in conf/solrconfig.xml to direct
Solr to your plugins.
50. Look into the solr/db/conf folder
schema.xml: Schema definition
solrconfig.xml: Path setup, UpdateHandler setup, RequestHandler setup, …
db-data-config.xml: Specifies what is going to be imported from the DB to Solr
51. Write the following content to solrconfig.xml
<lib dir="../../../../dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="../../../../dist/" regex="postgresql-\d.*\.jar" />
Write the following content to db-data-config.xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/SolrTest"
              user="postgres"
              password="postgres"/>
  <document>
    <entity name="id"
            query="select id,features from solrtest">
    </entity>
  </document>
</dataConfig>
52. Download the JDBC driver for PostgreSQL and put it in
/home/ziv/solr-4.6.0/dist/
Now start this core:
$ java -Dsolr.solr.home="./example-DIH-test/solr/" -jar start.jar