SlideShare a Scribd company logo
1 of 38
Searching in the Cloud
Arkadiusz Masiakiewicz
Michał Warecki
About us
Arkadiusz Masiakiewicz
●
NoSQL Databases
●
Full-text search
engines
●
Neural networks
●
Scuba diving
Michał Warecki
●
GC
●
JiT Compilers
●
Concurrency
●
Non-blocking algorithms
●
Programming languages
runtime
●
Astronomy
Agenda
●
Introduction to Apache Solr
●
Data in PayU
●
About SolrCloud
●
Performance
About Apache Solr
●
Full-text search server based on Apache
Lucene
●
Platform independent
●
Buzzword compatible (sharding,
replication, clustering, cloud, scaling, big
data, ...)
Where to use Solr?
Who uses Solr?
About Apache Solr
Indexer / Scheduler
Lucene Index
Solr server
Schema Config
Documents / Update query
Client application
Query (HTTP)
Response
Web container
Platform independent
(XML|JSON|PHP...)/HTTP
HTTP-based queries
●
http://localhost:8080/solr/select?q=query
–
&start=50
–
&rows=25
–
&fq=field:ala
–
&facet=on&facet.field=category
–
&sort=dist(2, point1, point2) desc
Solr schema
●
name
●
type
●
indexed
●
stored
●
multiValued
●
required
Solr schema
●
StrField
–
String (UTF-8 encoded string or Unicode).
●
TextField
–
Text, usually multiple words or tokens.
●
TrieDateField
–
Date field accessible for Lucene TrieRange processing.
●
TrieDoubleField, TrieFloatField, TrieIntField, TrieLongField
●
UUIDField
–
Universally Unique Identifier (UUID). Pass in a value of "NEW" and Solr
will create a new UUID.
●
CurrencyField
–
Supports currencies and exchange rates.
Data import
Solr
Data feeders
XML Documents
●
XML
●
CSV
●
File system
●
Web spiders
●
RSS/Atom
●
POJO
●
E-mail
●
DB
Data configuration
●
data source
●
document
–
entity(query, delta
query)
●
field
Solr
DB
Update query
SQ
L
DIH
Plugins
●
Search components
●
Request handlers
●
Process factory
Data in PayU
●
~100 GB index size
●
~400 milions of documents to
search
●
More than 30 fields in schema
Data flow in PayU
Client application
Solr
DB
1: search params
3: DB id's
2: DB id's
4: Data
Connecting to your old
app
Hibernate criteria
JPA criteria
Custom criteria
q=field1:testvalue&wt=javabin
Traditional Solr
architecture
Solr shard1
- config
- schema
Solr shard1 replica
- config
- schema
Solr shard1
- config
- schema
Solr shard1
- config
- schema
Solr shard2 replica
- config
- schema
Solr shard2
- config
- schema
✗ manually copy config
✗ manually split index
✗ manually shard queries
✗ add replica
✗ manually copy config
✗ setup replication
✗ does not provide fail-over
✗ separate monitoring
Zookeeper cluster
SolrCloud architecture
confClusterstate
conf
conf
conf
Zookeeper
Solr instance1
Shard1
leader
Shard2
Replica
Solr instance2
Shard1
replica
Shard2
leader
confClusterstate
conf
conf
conf
Zookeeper
confClusterstate
conf
conf
conf
Zookeeper2
Zookeeper1
Zookeeper3
Add document
ZooKeeperZooKeeperZooKeeper
Assigning machines
Number of shards : 3
Shard1
ZooKeeperZooKeeperZooKeeper
Assigning machines
Number of shards : 3
Shard1 Shard2
ZooKeeperZooKeeperZooKeeper
Assigning machines
Number of shards : 3
Shard1 Shard2 Shard3
Now you can search and index
ZooKeeperZooKeeperZooKeeper
Assigning machines
Shard1 Shard2 Shard3
Replica
shard1
ZooKeeperZooKeeperZooKeeper
Assigning machines
Shard1 Shard2 Shard3
Replica
shard1
Replica
shard2
ZooKeeperZooKeeperZooKeeper
Assigning machines
Shard1 Shard2 Shard3
Replica
shard1
Replica
shard2
Replica
shard3
ZooKeeper configuration
{"collection1":{
"shards":{
"shard1":{
"range":"80000000-8443ffff",
"state":"active",
"replicas":{
"10.205.33.92:8080_solr_collection1_shard1_replica2":{
"state":"active",
"core":"collection1_shard1_replica2",
"node_name":"10.205.33.92:8080_solr",
"base_url":"http://10.205.33.92:8080/solr",
"leader":"true"},
"10.205.33.93:8080_solr_collection1_shard1_replica1":{
"state":"active",
"core":"collection1_shard1_replica1",
"node_name":"10.205.33.93:8080_solr",
"base_url":"http://10.205.33.93:8080/solr"}}},
SolrCloud
●
Central Config in Zookeeper
●
Automatic Fail-Over
●
Near-Realtime
●
Leader Election
●
Optimistic Locking
●
Durable writes
Sharding in SolrCloud
Collection – i.e. books
Shard1
Books part1 replica replica
Shard3
Books part3 replica replica
Shard2
Books part2 replica replica
Sharding in SolrCloud
“--Ile dajemy shardów? 72?
-- ee, daj 100 do pełnego :-)”
Sharding in SolrCloud
●
Implicit documents distributing
–
uniqeId.hashCode() % numServers.
●
Composite key
–
groupId!uniqeId
–
groupId.hashCode() in shard hash range
1-10 11-20
21-30 31-40
hashCode=5 hashCode=10
hashCode=35
hashCode=15
?shard.keys=xxx!
hashCode=35
SolrJ
●
HTTP Solr Server
●
Embedded Solr Server
–
Does not require HTTP
●
Cloud Solr Server
–
Pass ZooKeeper hosts
SolrJ POJO
public class Item {
@Field
String id;
@Field("cat")
String[] categories;
@Field
List<String> features;
}
//...
Item item = new Item();
item.id = "one";
item.categories = new String[] { "aaa", "bbb", "ccc" };
server.addBean(item);
Cache
●
Filter cache
●
Field value cache
●
Query result cache
●
Document cache
●
User/Generic Caches
Cache implementations
●
Least recent used (LRU)
●
Fast LRU
●
Least frequent used
Warming queries
<lst>
<str name="q">solr</str>
<str name="sort">testDate desc</str>
</lst>
FQ vs Q
●
Filter Query (FQ) doesn't
involve very complex
document scoring
q=description:Potter
fq=type:book
Questions?
Thank you!

More Related Content

What's hot

How to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterHow to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr cluster
lucenerevolution
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
thelabdude
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
JSGB
 

What's hot (20)

How SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded EnvironmentHow SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded Environment
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
 
Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
 
Solr 4: Run Solr in SolrCloud Mode on your local file system.
Solr 4: Run Solr in SolrCloud Mode on your local file system.Solr 4: Run Solr in SolrCloud Mode on your local file system.
Solr 4: Run Solr in SolrCloud Mode on your local file system.
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
 
How to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterHow to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr cluster
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networks
 
High Performance Solr
High Performance SolrHigh Performance Solr
High Performance Solr
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
 
Scaling search with Solr Cloud
Scaling search with Solr CloudScaling search with Solr Cloud
Scaling search with Solr Cloud
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big Data
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
What's New on AWS and What it Means to You
What's New on AWS and What it Means to YouWhat's New on AWS and What it Means to You
What's New on AWS and What it Means to You
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for You
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
 

Viewers also liked

Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
Tommaso Teofili
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Lucidworks (Archived)
 

Viewers also liked (17)

Hackathon - building and extending OpenJDK
Hackathon - building and extending OpenJDKHackathon - building and extending OpenJDK
Hackathon - building and extending OpenJDK
 
Gc algorithms
Gc algorithmsGc algorithms
Gc algorithms
 
Java memory model
Java memory modelJava memory model
Java memory model
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
 
Inside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupInside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene Meetup
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
 
SolrCloud and Shard Splitting
SolrCloud and Shard SplittingSolrCloud and Shard Splitting
SolrCloud and Shard Splitting
 
Numeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrNumeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and Solr
 
Why Is My Solr Slow?: Presented by Mike Drob, Cloudera
Why Is My Solr Slow?: Presented by Mike Drob, ClouderaWhy Is My Solr Slow?: Presented by Mike Drob, Cloudera
Why Is My Solr Slow?: Presented by Mike Drob, Cloudera
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Intro to Apache Solr
Intro to Apache SolrIntro to Apache Solr
Intro to Apache Solr
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Improving the Solr Update Chain
Improving the Solr Update ChainImproving the Solr Update Chain
Improving the Solr Update Chain
 

Similar to Apache SolrCloud

[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
James Chen
 

Similar to Apache SolrCloud (20)

Small wins in a small time with Apache Solr
Small wins in a small time with Apache SolrSmall wins in a small time with Apache Solr
Small wins in a small time with Apache Solr
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
20150210 solr introdution
20150210 solr introdution20150210 solr introdution
20150210 solr introdution
 
First oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoyFirst oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoy
 
Webinar: Solr & Spark for Real Time Big Data Analytics
Webinar: Solr & Spark for Real Time Big Data AnalyticsWebinar: Solr & Spark for Real Time Big Data Analytics
Webinar: Solr & Spark for Real Time Big Data Analytics
 
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
 
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
 
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
 
Falcon feedenhancement
Falcon feedenhancementFalcon feedenhancement
Falcon feedenhancement
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache Spark
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache Spark
 
Akka Cluster in Production
Akka Cluster in ProductionAkka Cluster in Production
Akka Cluster in Production
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
 
Solr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSolr As A SparkSQL DataSource
Solr As A SparkSQL DataSource
 
Solr as a Spark SQL Datasource
Solr as a Spark SQL DatasourceSolr as a Spark SQL Datasource
Solr as a Spark SQL Datasource
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmed
 
Cdcr apachecon-talk
Cdcr apachecon-talkCdcr apachecon-talk
Cdcr apachecon-talk
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Apache SolrCloud

Editor's Notes

  1. Przedstawić się, jesteśmy z PayU (Allegro group). Chcemy opowiedzieć o naszych krótkich doświadczeniach z Solr. Opowiemy o technicznych szczegółach tego fajnego serwera. Mieliśmy problem z czasami wyszukiwania, dlatego zainteresowaliśmy się tą technologią. Może być ciekawe bo zastosowaliśmy SolrCloud (stąd nazwa prezentacji). Pytania prosimy na końcu.
  2. Sklepy internetowe empik Wyszkiwarki Częściowe zastąpienie bazy danych wyszukiwanie plików po treści wyszukiwanie osób
  3. Omówić komponenty Solr Komunikacja
  4. Omówić atrybuty pól type i stored – wielkość indeksu Przykład z życia: setki milionów dokumentów I zwracamy tylko id, ktore pozniej wyciagamy po indeksie z bazy
  5. TextField – analizery (StandardTokenizer – whitespace, dots, itd.) i filtry (lowercasefilter) w czasie indeksowania i zapytania CurrencyField – zamiana wartości w trakcie zapytania
  6. Jak to się dzieję, że możemy odpytać solara o dokumenty? Dodatkowo można powiedzieć, że istnieje możliwość zaimplementowania własnego importu.
  7. Pełny import Import przyrostowy Dodatkowwo robimy full zamiast delta import, bo delta robi selecta per dokument
  8. Search component – może podkreślać wyszukiwane słowa kluczowe; zwracać dodatkowe informacje jak np. Ilość słów w polu Request handlers – podpinamy komponenty pod odpowiednią ścieżkę URL (endpoint) Process factory – przy indeksowaniu można za jego pomocą dodać nowe pola, zmieniać je itd.
  9. Dotyczy to tylko jednej tabeli bazodanowej Jest to stosunkowo duże wdrożenie.
  10. Można wówczas bardzo łatwo przełączać się pomiędzy bazą danych, a Solr
  11. Mogą być logiczne I fizyczne instancje
  12. Shardy są logicznym podziałem indeksu. W szczególności mogą być fizycznym podziałem.
  13. Jest jeszcze dostępny custom hashing
  14. Field value cache dla facetingu Query result cache trzyma posortowane id&apos;ki Document cache trzyma pelne dokumenty Lucene (enableLazyLoading – ref dla pol I potem dociaga – na podstawie fl)
  15. LRUCache - synchronized LinkedHashMap FastLRUCache - ConcurrentHashMap Omówić opcje cache&apos;a
  16. First searcher New searcher Kiedy jest otwierany nowy searcher