Activate '18 talk: Lucene and Solr 8 will be released near the end of 2018. This session will explore new features, performance improvements and breaking changes.
Cross Datacenter Replication (CDCR) has been a long-requested feature in Apache Solr. In this talk, we will discuss CDCR as released in Apache Solr 6.0 and beyond to understand its use cases, limitations, setup, and performance. We will also take a quick look at future enhancements that can further simplify and scale this feature.
- Solr 7.0 introduces new autoscaling capabilities including autoscaling policies and preferences to define the desired state of the cluster, and APIs to manage autoscaling.
- Triggers are added in Solr 7.1 to activate autoscaling when nodes join or leave the cluster to rebalance replicas according to policies.
- Collection APIs now use autoscaling policies and preferences to determine optimal replica placement. Future work will add more triggers and actions for autoscaling.
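As a rough sketch of what the policy API described above looks like in practice (rule syntax per the Solr 7.x autoscaling API; the endpoint URL and rule values here are illustrative placeholders, not taken from the talk):

```python
import json

# A cluster policy saying: no node may hold more than one replica of any
# shard; plus a preference to place new replicas on nodes with fewest cores.
payload = {
    "set-cluster-policy": [
        {"replica": "<2", "shard": "#EACH", "node": "#ANY"}
    ],
    "set-cluster-preferences": [
        {"minimize": "cores"}
    ],
}

body = json.dumps(payload)
# POST this to the autoscaling endpoint of a running cluster, e.g.:
#   curl -X POST http://localhost:8983/api/cluster/autoscaling -d @policy.json
print(body)
```

Triggers then enforce these rules automatically when nodes join or leave, rather than requiring a manual API call.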
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James... (Lucidworks)
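A minimal sketch of how such a pipeline stage might push a config set to Solr from version control (the collection name, config directory, and host are hypothetical; the Configsets UPLOAD call is a standard Solr API, but this is not the pipeline from the talk):

```python
from urllib.parse import urlencode

# Pipeline step: upload a zipped config set checked out from version control.
# The zip would be built from e.g. ./solr-configs/social/ in the repo.
upload_url = ("http://localhost:8983/solr/admin/configs?" +
              urlencode({"action": "UPLOAD", "name": "social-configs"}))
print(upload_url)
# A Jenkins shell step would then POST the zip:
#   curl -X POST --data-binary @configset.zip -H 'Content-Type: application/octet-stream' "$UPLOAD_URL"
```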
This document discusses using Jenkins to create a continuous delivery pipeline for Apache Solr. It describes packaging and deploying Solr configurations through the pipeline. Key steps include building Solr packages, deploying Solr to stage environments, and deploying Solr configurations from version control. The pipeline allows predictable, routine deployments and reduces work-in-progress through automation.
This document provides information about integrating Apache Solr and Apache Spark. It discusses using Solr as a data source and sink for Spark applications, including indexing data from Spark jobs into Solr in real-time and exposing Solr query results as Spark RDDs. The document also summarizes the Spark Streaming and RDD APIs and provides code examples for indexing tweets from Spark Streaming into Solr and reading from Solr into a DataFrame.
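A minimal sketch of the DataFrame read path using the spark-solr connector (the ZooKeeper address and collection name are placeholders; running it requires a live SolrCloud cluster and the spark-solr package, so the actual read is shown commented out):

```python
# Options understood by the spark-solr data source (format "solr").
solr_options = {
    "zkhost": "localhost:9983",   # ZooKeeper connect string (placeholder)
    "collection": "tweets",       # target Solr collection (placeholder)
    "query": "*:*",               # Solr query exposed as a DataFrame
}

# With pyspark and a live cluster:
# df = spark.read.format("solr").options(**solr_options).load()
# df.show()
# Writing goes the other way:
# df.write.format("solr").options(**solr_options).save()
print(solr_options["collection"])
```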
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal... (thelabdude)
My presentation focuses on how we implemented Solr 4 to be the cornerstone of our social marketing analytics platform. Our platform analyzes relationships, behaviors, and conversations between 30,000 brands and 100M social accounts every 15 minutes. Combined with our Hadoop cluster, we have achieved throughput rates greater than 8,000 documents per second. Our index currently contains more than 620M documents and is growing by 3 to 4 million documents per day. My presentation will include details about: 1) Designing a Solr Cloud cluster for scalability and high availability using sharding and replication with Zookeeper, 2) Operations concerns like how to handle a failed node and monitoring, 3) How we deal with indexing big data from Pig/Hadoop as an example of using the CloudSolrServer in SolrJ and managing searchers for high indexing throughput, 4) Example uses of key features like real-time gets, atomic updates, custom hashing, and distributed facets. Attendees will come away from this presentation with a real-world use case that proves Solr 4 is scalable, stable, and production-ready.
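Two of the features mentioned above, atomic updates and real-time gets, can be sketched as plain HTTP payloads (the host, collection, document id, and field names below are made up for illustration; the `inc`/`set` modifiers and the `/get` handler are standard Solr APIs):

```python
import json
from urllib.parse import urlencode

base = "http://localhost:8983/solr/social"  # placeholder collection

# Atomic update: bump a counter and set one field without resending the
# whole document, using Solr's atomic-update modifiers.
atomic = [{"id": "tweet-42", "retweets": {"inc": 1}, "lang": {"set": "en"}}]
update_body = json.dumps(atomic)
# POST update_body to f"{base}/update?commit=true"

# Real-time get: fetch the latest version of a document by id, even if
# it has not yet been committed and made visible to normal searches.
rtg_url = f"{base}/get?" + urlencode({"id": "tweet-42"})
print(rtg_url)
```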
Solr Exchange: Introduction to SolrCloud (thelabdude)
SolrCloud is a set of features in Apache Solr that enable elastic scaling of search indexes using sharding and replication. In this presentation, Tim Potter will provide an architectural overview of SolrCloud and highlight its most important features. Specifically, Tim covers topics such as: sharding, replication, ZooKeeper fundamentals, leaders/replicas, and failure/recovery scenarios. Any discussion of a complex distributed system would not be complete without a discussion of the CAP theorem. Mr. Potter will describe why Solr is considered a CP system and how that impacts the design of a search application.
SolrCloud uses Zookeeper to elect a leader node for each shard. The leader coordinates write requests to ensure consistency. When the leader dies, Zookeeper detects this and elects a new leader based on the nodes' sequence numbers registered with Zookeeper. The new leader syncs updates with replicas and can replay logs if any replicas are too far behind. This allows write requests to continue being served with high availability despite leader failures.
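The election scheme described above can be illustrated with a toy model (this simulates ZooKeeper's sequential ephemeral znodes in plain Python; it is not real ZooKeeper client code, and the znode naming is only loosely modeled on what Solr registers):

```python
# Each live replica holds an ephemeral sequential znode; the lowest
# sequence number wins the election. When the leader dies, its ephemeral
# node vanishes and re-running the election promotes the next survivor.
def elect_leader(znodes):
    """znodes: names like 'replica-n_0000000003'; returns the leader."""
    return min(znodes, key=lambda z: int(z.rsplit("_", 1)[1]))

live = ["replica-n_0000000002", "replica-n_0000000005", "replica-n_0000000007"]
assert elect_leader(live) == "replica-n_0000000002"

# Leader dies: ZooKeeper notices the ephemeral node is gone.
live.remove("replica-n_0000000002")
assert elect_leader(live) == "replica-n_0000000005"
```

In the real system the new leader then syncs with its replicas (replaying logs for any that are behind) before resuming writes.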
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM (Lucidworks)
This document discusses building and running Solr as a service in the cloud. It covers:
- The challenges of deploying Solr in cloud environments and the need for a managed service.
- The architecture of the Solr-as-a-Service, which uses Docker, Mesos, and other tools to provide multi-tenant Solr clusters.
- Key aspects of managing Solr clusters in the cloud service, including software upgrades, resizing clusters, handling replicas, and balancing clusters.
This document provides an overview of a workshop on Lucene performance given by Lucid Imagination, Inc. It discusses common Lucene performance issues, introduces Lucid Gaze for Lucene (LG4L) as a tool for monitoring Lucene performance statistics and examples of using it to analyze indexing and search performance. LG4L provides statistics on indexing, analysis, searching and storage through logs, a persistent database and an API. It can help identify causes of poor performance and was shown to have low overhead.
Apache Solr: Upgrading Your Upgrade Experience - Hrishikesh Gadre, Lucidworks
This document discusses upgrading Apache Solr. It describes the challenges of Solr upgrades and how to prepare for upgrades. Key steps include understanding release notes, having a rollback strategy, migrating configurations and data to be compatible, and identifying changes needed in custom components. The document demonstrates a Solr configuration migration tool that helps identify incompatible configurations and automatically fixes some issues to assist with the upgrade process.
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L... (Lucidworks)
This document discusses using Apache Solr with Apache Spark for faster data analytics. It provides examples of using Solr's SQL capabilities through Fusion SQL to perform queries and aggregations over data in Solr. These include count distinct queries, joins, full text search, and queries over time partitioned collections. It also discusses how Spark SQL can integrate with Solr as a data source and how to plug in custom query strategies.
This document provides an overview of SolrCloud on Hadoop. It discusses how SolrCloud allows for distributed, highly scalable search capabilities on Hadoop clusters. Key components that work with SolrCloud are also summarized, including HDFS for storage, MapReduce for processing, and ZooKeeper for coordination services. The document demonstrates how SolrCloud can index and query large datasets stored in Hadoop.
Real-time Inverted Search in the Cloud Using Lucene and Storm (lucenerevolution)
Building real-time notification systems is often limited to basic filtering and pattern matching against incoming records. Allowing users to query incoming documents using Solr's full range of capabilities is much more powerful. In our environment we needed a way to allow for tens of thousands of such query subscriptions, meaning we needed to find a way to distribute the query processing in the cloud. By creating in-memory Lucene indices from our Solr configuration, we were able to parallelize our queries across our cluster. To achieve this distribution, we wrapped the processing in a Storm topology to provide a flexible way to scale and manage our infrastructure. This presentation will describe our experiences creating this distributed, real-time inverted search notification framework.
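The core idea, running many stored queries against each incoming document instead of many documents against one query, can be sketched as a toy matcher (in the real system an in-memory Lucene index plays the role of the tokenized document, and the stored queries are full Solr queries; the subscriptions here are simplified to required-term sets):

```python
# Toy inverted search: each subscription is a set of required terms; each
# incoming document is tokenized in memory and run against all stored
# queries. Storm distributes this loop across a cluster in the real system.
subscriptions = {
    "sub-1": {"solr", "storm"},
    "sub-2": {"lucene"},
    "sub-3": {"kafka", "flume"},
}

def matching_subscriptions(doc_text):
    tokens = set(doc_text.lower().split())
    return sorted(sid for sid, terms in subscriptions.items()
                  if terms <= tokens)

assert matching_subscriptions("Scaling Solr with Storm and Lucene") == ["sub-1", "sub-2"]
```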
Solr Metrics - Andrzej Białecki, Lucidworks
Solr metrics allow monitoring of Solr system behavior and performance optimization. Key aspects include:
- Solr collects metrics using the Dropwizard Metrics framework, organizing data into registries by component like JVM, core, query handler, etc.
- Metrics include counters, timers, histograms and gauges capturing values like request rates, cache sizes, commit times.
- The /admin/metrics handler provides access to metrics from all registries through flexible selection criteria.
- Configuration in solr.xml specifies metric reporters like JMX, Ganglia, Graphite to export data to external systems.
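The selection criteria mentioned above are ordinary query parameters on the metrics handler; a minimal sketch (the host is a placeholder; `group` and `prefix` are documented parameters of `/admin/metrics`):

```python
from urllib.parse import urlencode

# Select only JVM memory metrics from the /admin/metrics handler.
params = {"group": "jvm", "prefix": "memory", "wt": "json"}
url = "http://localhost:8983/solr/admin/metrics?" + urlencode(params)
print(url)
# Against a live node: urllib.request.urlopen(url).read()
```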
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit (thelabdude)
SolrCloud is a set of features in Apache Solr that enable elastic scaling of search indexes using sharding and replication. In this presentation, Tim Potter will demonstrate how to provision, configure, and manage a SolrCloud cluster in Amazon EC2, using a Fabric/boto based solution for automating SolrCloud operations. Attendees will come away with a solid understanding of how to operate a large-scale Solr cluster, as well as tools to help them do it. Tim will also demonstrate these tools live during his presentation. Covered technologies include: Apache Solr, Apache ZooKeeper, Linux, Python, Fabric, boto, Apache Kafka, Apache JMeter.
These slides were presented at the Great Indian Developer Summit 2014 at Bangalore. See http://www.developermarch.com/developersummit/session.html?insert=ShalinMangar2
"SolrCloud" is the name given to Apache Solr's feature set for fault tolerant, highly available, and massively scalable capabilities. SolrCloud has enabled organizations to scale, impressively, into the billions of documents with sub-second search!
Building a near real time search engine & analytics for logs using Solr (lucenerevolution)
Presented by Rahul Jain, System Analyst (Software Engineer), IVY Comptech Pvt Ltd
Consolidating and indexing logs so they can be searched in real time poses an array of challenges when you have hundreds of servers producing terabytes of logs every day. Log events are small, roughly 200 bytes to a few KB each, which makes them harder to handle: the smaller the events, the more documents there are to index. In this session, we will discuss the challenges we faced and the solutions we developed to overcome them. The talk will cover:
Methods to collect logs in real time.
How Lucene was tuned to achieve an indexing rate of 1 GB in 46 seconds
Tips and techniques incorporated/used to manage distributed index generation and search on multiple shards
How choosing a layer based partition strategy helped us to bring down the search response times.
Log analysis and generation of analytics using Solr.
Design and architecture used to build the search platform.
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
The next major release of Solr is right around the corner! Join Solr Committer Cassandra Targett and Lucidworks SVP of Engineering Trey Grainger for a first look into what’s included in the upcoming release.
Scaling Through Partitioning and Shard Splitting in Solr 4 (thelabdude)
Over the past several months, Solr has reached a critical milestone of being able to elastically scale-out to handle indexes reaching into the hundreds of millions of documents. At Dachis Group, we've scaled our largest Solr 4 index to nearly 900M documents and growing. As our index grows, so does our need to manage this growth.
In practice, it's common for indexes to continue to grow as organizations acquire new data. Over time, even the best designed Solr cluster will reach a point where individual shards are too large to maintain query performance. In this Webinar, you'll learn about new features in Solr to help manage large-scale clusters. Specifically, we'll cover data partitioning and shard splitting.
Partitioning helps you organize subsets of data based on data contained in your documents, such as a date or customer ID. We'll see how to use custom hashing to route documents to specific shards during indexing. Shard splitting allows you to split a large shard into 2 smaller shards to increase parallelism during query execution.
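The two mechanisms above can be sketched briefly (the collection, shard, and id values are placeholders; the `customerId!docId` compositeId routing scheme and the Collections API `SPLITSHARD` action are standard Solr features, though this is not the exact setup from the webinar):

```python
from urllib.parse import urlencode

# Custom hashing with the compositeId router: a "customerId!docId" key
# routes all of one customer's documents to the same shard.
def route_key(customer_id, doc_id):
    return f"{customer_id}!{doc_id}"

assert route_key("acme", "1234") == "acme!1234"

# Shard splitting via the Collections API: SPLITSHARD divides one shard
# into two sub-shards to increase query parallelism.
split_url = ("http://localhost:8983/solr/admin/collections?" +
             urlencode({"action": "SPLITSHARD",
                        "collection": "social",
                        "shard": "shard1"}))
print(split_url)
```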
Attendees will come away from this presentation with a real-world use case that proves Solr 4 is elastically scalable, stable, and is production ready.
Presented by Mark Miller, Software Developer, Cloudera
Apache Lucene/Solr committer Mark Miller talks about how Solr has been integrated into the Hadoop ecosystem to provide full text search at "Big Data" scale. This talk will give an overview of how Cloudera has tackled integrating Solr into the Hadoop ecosystem and highlights some of the design decisions and future plans. Learn how Solr is getting 'cozy' with Hadoop, which contributions are going to what project, and how you can take advantage of these integrations to use Solr efficiently at "Big Data" scale. Learn how you can run Solr directly on HDFS, build indexes with Map/Reduce, load Solr via Flume in 'Near Realtime' and much more.
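A minimal sketch of the solrconfig.xml change for running a Solr index directly on HDFS, as mentioned above (the namenode address and path are placeholders, and parameter names should be checked against the Solr-on-HDFS documentation for your version):

```xml
<!-- Store the index on HDFS instead of local disk -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>
<!-- HDFS-aware index locking, inside <indexConfig> -->
<lockType>hdfs</lockType>
```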
Search-time Parallelism: Presented by Shikhar Bhushan, Etsy (Lucidworks)
This document discusses techniques for parallelizing search in Lucene/Solr, including segment-level parallelism and refactoring the Collector API to support parallel collection. The proposed changes would allow indexing and collection to be parallelized across segments within a single index, improving performance without requiring sharding. Test results show speedups for most queries when using these techniques compared to a serial search. While sharding remains important for large datasets, these changes could provide parallelism benefits without the complexity of a distributed setup.
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance (Lucidworks, archived)
The document discusses benchmarking the performance of SolrCloud clusters. It describes Timothy Potter's experience operating a large SolrCloud cluster at Dachis Group. It outlines a methodology for benchmarking indexing performance by varying the number of servers, shards, and replicas. Results show near-linear scalability as nodes are added. The document also introduces the Solr Scale Toolkit for deploying and managing SolrCloud clusters using Python and AWS. It demonstrates integrating Solr with tools like Logstash and Kibana for log aggregation and dashboards.
This document provides tips for tuning Solr for high performance. It discusses optimizing queries and facets for CPU usage, tuning memory usage such as using docValues, optimizing disk usage through merge policies and commit settings, reducing network overhead through batching and caching, and techniques like deep paging to improve performance for large result sets. The document emphasizes only indexing and retrieving necessary fields to reduce resource usage and tuning garbage collection to avoid pauses.
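The deep-paging technique mentioned above uses Solr's cursorMark mechanism; a minimal sketch of the request parameters (field names are placeholders; `cursorMark` and the uniqueKey-tiebreaker sort requirement are standard Solr API behavior):

```python
from urllib.parse import urlencode

# Deep paging: start the cursor at "*", and always include the uniqueKey
# ("id" here) as the final sort tiebreaker, which Solr requires.
def page_params(cursor, rows=100):
    return urlencode({
        "q": "*:*",
        "rows": rows,
        "sort": "score desc, id asc",
        "cursorMark": cursor,
    })

first = page_params("*")
print(first)
# In a real loop: send the request, read response["nextCursorMark"],
# pass it to the next call, and stop when the cursor stops changing.
```

Unlike `start=N` paging, the cost per page stays flat no matter how deep you go.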
Solr 4.0 dramatically improves scalability, performance, and flexibility. An overhauled Lucene underneath sports near real-time (NRT) capabilities allowing indexed documents to be rapidly visible and searchable. Lucene’s improvements also include pluggable scoring, much faster fuzzy and wildcard querying, and vastly improved memory usage. These Lucene improvements automatically make Solr much better, and Solr magnifies these advances with “SolrCloud.” SolrCloud enables highly available and fault tolerant clusters for large scale distributed indexing and searching. There are many other changes that will be surveyed as well. This talk will cover these improvements in detail, comparing and contrasting to previous versions of Solr.
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ... (Lucidworks)
The document discusses troubleshooting techniques for Solr, including using a "TreeMap" process of establishing boundaries, splitting the problem, identifying relevant parts, zooming in, and re-formulating boundaries in an iterative process until the problem is fixed. It outlines how to establish boundaries by defining the identity, location, timing, and magnitude of the problem. Examples are provided for indexing and searching processes in Solr.
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, LucidworksLucidworks
The document provides a deep dive into the lifecycle of a Solr search request, from the initial HTTP request to the generation of the response. It describes each stage of processing, including how the request is routed through the Solr core, how the query and filters are parsed and executed against the index, how various caches and plugins can be leveraged, and how the final response is generated. It uses examples of simple and more complex queries to demonstrate how each component interacts throughout the processing pipeline.
This document discusses reindexing large repositories in Alfresco. It covers the Alfresco SOLR architecture, the indexing process, scenarios that require reindexing, alternatives for deployment during reindexing to minimize downtime, monitoring and profiling tools, and future improvements planned for Search Services 2.0 to optimize indexing performance. Benchmark results are presented showing improvements that reduced reindexing time for 1.2 billion documents from 21 days to 10 days.
Solr 4.7 and 4.8 include new features such as asynchronous execution of long-running actions, cursors for deep paging, document expiration, dynamic synonyms and stopwords, SSL support in SolrCloud, and improved collections API. Future versions will focus on ZooKeeper as the single source of truth, incremental field updates, multi-valued DocValues sorting, and removing legacy field types. The speaker also discussed related open source projects from LucidWorks for deploying Solr on AWS, log processing, and data quality.
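The document-expiration feature mentioned above is configured as an update processor; a rough sketch (field names and the delete period are examples, and the factory's parameter names should be verified against the Solr reference for your version):

```xml
<!-- Documents get an expiration timestamp computed from a "ttl" field;
     a background task periodically deletes expired documents. -->
<updateRequestProcessorChain name="add-expiration" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <str name="ttlFieldName">ttl</str>
    <str name="expirationFieldName">expire_at</str>
    <int name="autoDeletePeriodSeconds">300</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```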
This document provides an overview of a workshop on Lucene performance given by Lucid Imagination, Inc. It discusses common Lucene performance issues, introduces Lucid Gaze for Lucene (LG4L) as a tool for monitoring Lucene performance statistics and examples of using it to analyze indexing and search performance. LG4L provides statistics on indexing, analysis, searching and storage through logs, a persistent database and an API. It can help identify causes of poor performance and was shown to have low overhead.
Apache Solr: Upgrading Your Upgrade Experience - Hrishikesh Gadre, LucidworksLucidworks
This document discusses upgrading Apache Solr. It describes the challenges of Solr upgrades and how to prepare for upgrades. Key steps include understanding release notes, having a rollback strategy, migrating configurations and data to be compatible, and identifying changes needed in custom components. The document demonstrates a Solr configuration migration tool that helps identify incompatible configurations and automatically fixes some issues to assist with the upgrade process.
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...Lucidworks
This document discusses using Apache Solr with Apache Spark for faster data analytics. It provides examples of using Solr's SQL capabilities through Fusion SQL to perform queries and aggregations over data in Solr. These include count distinct queries, joins, full text search, and queries over time partitioned collections. It also discusses how Spark SQL can integrate with Solr as a data source and how to plug in custom query strategies.
This document provides an overview of SolrCloud on Hadoop. It discusses how SolrCloud allows for distributed, highly scalable search capabilities on Hadoop clusters. Key components that work with SolrCloud are also summarized, including HDFS for storage, MapReduce for processing, and ZooKeeper for coordination services. The document demonstrates how SolrCloud can index and query large datasets stored in Hadoop.
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
Building real-time notification systems is often limited to basic filtering and pattern matching against incoming records. Allowing users to query incoming documents using Solr's full range of capabilities is much more powerful. In our environment we needed a way to allow for tens of thousands of such query subscriptions, meaning we needed to find a way to distribute the query processing in the cloud. By creating in-memory Lucene indices from our Solr configuration, we were able to parallelize our queries across our cluster. To achieve this distribution, we wrapped the processing in a Storm topology to provide a flexible way to scale and manage our infrastructure. This presentation will describe our experiences creating this distributed, real-time inverted search notification framework.
Solr Metrics - Andrzej Białecki, LucidworksLucidworks
Solr metrics allow monitoring of Solr system behavior and performance optimization. Key aspects include:
- Solr collects metrics using the Dropwizard Metrics framework, organizing data into registries by component like JVM, core, query handler, etc.
- Metrics include counters, timers, histograms and gauges capturing values like request rates, cache sizes, commit times.
- The /admin/metrics handler provides access to metrics from all registries through flexible selection criteria.
- Configuration in solr.xml specifies metric reporters like JMX, Ganglia, Graphite to export data to external systems.
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkitthelabdude
SolrCloud is a set of features in Apache Solr that enable elastic scaling of search indexes using sharding and replication. In this presentation, Tim Potter will demonstrate how to provision, configure, and manage a SolrCloud cluster in Amazon EC2, using a Fabric/boto based solution for automating SolrCloud operations. Attendees will come away with a solid understanding of how to operate a large-scale Solr cluster, as well as tools to help them do it. Tim will also demonstrate these tools live during his presentation. Covered technologies, include: Apache Solr, Apache ZooKeeper, Linux, Python, Fabric, boto, Apache Kafka, Apache JMeter.
These slides were presented at the Great Indian Developer Summit 2014 at Bangalore. See http://www.developermarch.com/developersummit/session.html?insert=ShalinMangar2
"SolrCloud" is the name given to Apache Solr's feature set for fault tolerant, highly available, and massively scalable capabilities. SolrCloud has enabled organizations to scale, impressively, into the billions of documents with sub-second search!
Building a near real time search engine & analytics for logs using solrlucenerevolution
Presented by Rahul Jain, System Analyst (Software Engineer), IVY Comptech Pvt Ltd
Consolidation and Indexing of logs to search them in real time poses an array of challenges when you have hundreds of servers producing terabytes of logs every day. Since the log events mostly have a small size of around 200 bytes to few KBs, makes it more difficult to handle because lesser the size of a log event, more the number of documents to index. In this session, we will discuss the challenges faced by us and solutions developed to overcome them. The list of items that will be covered in the talk are as follows.
Methods to collect logs in real time.
How Lucene was tuned to achieve an indexing rate of 1 GB in 46 seconds
Tips and techniques incorporated/used to manage distributed index generation and search on multiple shards
How choosing a layer based partition strategy helped us to bring down the search response times.
Log analysis and generation of analytics using Solr.
Design and architecture used to build the search platform.
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
The next major release of Solr is right around the corner! Join Solr Committer Cassandra Targett and Lucidworks SVP of Engineering Trey Grainger for a first look into what’s included in the upcoming release.
Scaling Through Partitioning and Shard Splitting in Solr 4thelabdude
Over the past several months, Solr has reached a critical milestone of being able to elastically scale-out to handle indexes reaching into the hundreds of millions of documents. At Dachis Group, we've scaled our largest Solr 4 index to nearly 900M documents and growing. As our index grows, so does our need to manage this growth.
In practice, it's common for indexes to continue to grow as organizations acquire new data. Over time, even the best designed Solr cluster will reach a point where individual shards are too large to maintain query performance. In this Webinar, you'll learn about new features in Solr to help manage large-scale clusters. Specifically, we'll cover data partitioning and shard splitting.
Partitioning helps you organize subsets of data based on data contained in your documents, such as a date or customer ID. We'll see how to use custom hashing to route documents to specific shards during indexing. Shard splitting allows you to split a large shard into 2 smaller shards to increase parallelism during query execution.
Attendees will come away from this presentation with a real-world use case that proves Solr 4 is elastically scalable, stable, and is production ready.
Presented by Mark Miller, Software Developer, Cloudera
Apache Lucene/Solr committer Mark Miller talks about how Solr has been integrated into the Hadoop ecosystem to provide full text search at "Big Data" scale. This talk will give an overview of how Cloudera has tackled integrating Solr into the Hadoop ecosystem and highlights some of the design decisions and future plans. Learn how Solr is getting 'cozy' with Hadoop, which contributions are going to what project, and how you can take advantage of these integrations to use Solr efficiently at "Big Data" scale. Learn how you can run Solr directly on HDFS, build indexes with Map/Reduce, load Solr via Flume in 'Near Realtime' and much more.
Search-time Parallelism: Presented by Shikhar Bhushan, EtsyLucidworks
This document discusses techniques for parallelizing search in Lucene/Solr, including segment-level parallelism and refactoring the Collector API to support parallel collection. The proposed changes would allow indexing and collection to be parallelized across segments within a single index, improving performance without requiring sharding. Test results show speedups for most queries when using these techniques compared to a serial search. While sharding remains important for large datasets, these changes could provide parallelism benefits without the complexity of a distributed setup.
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance - Lucidworks (Archived)
The document discusses benchmarking the performance of SolrCloud clusters. It describes Timothy Potter's experience operating a large SolrCloud cluster at Dachis Group. It outlines a methodology for benchmarking indexing performance by varying the number of servers, shards, and replicas. Results show near-linear scalability as nodes are added. The document also introduces the Solr Scale Toolkit for deploying and managing SolrCloud clusters using Python and AWS. It demonstrates integrating Solr with tools like Logstash and Kibana for log aggregation and dashboards.
This document provides tips for tuning Solr for high performance. It discusses optimizing queries and facets for CPU usage, tuning memory usage such as using docValues, optimizing disk usage through merge policies and commit settings, reducing network overhead through batching and caching, and techniques like deep paging to improve performance for large result sets. The document emphasizes only indexing and retrieving necessary fields to reduce resource usage and tuning garbage collection to avoid pauses.
Solr 4.0 dramatically improves scalability, performance, and flexibility. An overhauled Lucene underneath sports near real-time (NRT) capabilities allowing indexed documents to be rapidly visible and searchable. Lucene’s improvements also include pluggable scoring, much faster fuzzy and wildcard querying, and vastly improved memory usage. These Lucene improvements automatically make Solr much better, and Solr magnifies these advances with “SolrCloud.” SolrCloud enables highly available and fault tolerant clusters for large scale distributed indexing and searching. There are many other changes that will be surveyed as well. This talk will cover these improvements in detail, comparing and contrasting to previous versions of Solr.
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Lucidworks
The document discusses troubleshooting techniques for Solr, including using a "TreeMap" process of establishing boundaries, splitting the problem, identifying relevant parts, zooming in, and re-formulating boundaries in an iterative process until the problem is fixed. It outlines how to establish boundaries by defining the identity, location, timing, and magnitude of the problem. Examples are provided for indexing and searching processes in Solr.
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
The document provides a deep dive into the lifecycle of a Solr search request, from the initial HTTP request to the generation of the response. It describes each stage of processing, including how the request is routed through the Solr core, how the query and filters are parsed and executed against the index, how various caches and plugins can be leveraged, and how the final response is generated. It uses examples of simple and more complex queries to demonstrate how each component interacts throughout the processing pipeline.
This document discusses reindexing large repositories in Alfresco. It covers the Alfresco SOLR architecture, the indexing process, scenarios that require reindexing, alternatives for deployment during reindexing to minimize downtime, monitoring and profiling tools, and future improvements planned for Search Services 2.0 to optimize indexing performance. Benchmark results are presented showing improvements that reduced reindexing time for 1.2 billion documents from 21 days to 10 days.
Solr 4.7 and 4.8 include new features such as asynchronous execution of long-running actions, cursors for deep paging, document expiration, dynamic synonyms and stopwords, SSL support in SolrCloud, and improved collections API. Future versions will focus on ZooKeeper as the single source of truth, incremental field updates, multi-valued DocValues sorting, and removing legacy field types. The speaker also discussed related open source projects from LucidWorks for deploying Solr on AWS, log processing, and data quality.
This document discusses deploying and managing Apache Solr at scale. It introduces the Solr Scale Toolkit, an open source tool for deploying and managing SolrCloud clusters in cloud environments like AWS. The toolkit uses Python tools like Fabric to provision machines, deploy ZooKeeper ensembles, configure and start SolrCloud clusters. It also supports benchmark testing and system monitoring. The document demonstrates using the toolkit and discusses lessons learned around indexing and query performance at scale.
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014 - Shalin Shekhar Mangar
This document discusses scaling SolrCloud to support large numbers of document collections. It begins by introducing SolrCloud and some of its key capabilities and terminology. It then describes four problems that can arise at large scale: high cluster state load, overseer performance issues, inflexible data management, and limitations with data export. For each problem, solutions are proposed that were implemented in Apache Solr to improve scalability, such as splitting the cluster state, optimizing the overseer, enabling more flexible data splitting and migration, and allowing distributed deep paging exports. The document concludes by describing efforts to test SolrCloud at massive scale through automated tools and cloud infrastructure.
Anshum Gupta is an Apache Lucene/Solr committer and Lucidworks employee with over 9 years of experience in search and related technologies. He has been involved with Apache Lucene since 2006 and Apache Solr since 2010, focusing on contributions, releases, and communities around Solr. The document then provides an overview of the major new features and improvements in Apache Solr 4.10, including ease of use enhancements, distributed pivot faceting, core, SolrCloud, and development tool updates.
Erik Hatcher presented on Solr and Lucene. He discussed what Solr is, how it is built on Apache Lucene, and how it provides a search server with features like scalability, fast performance, and extensibility. He provided examples of starting Solr, indexing and searching documents, and the various configuration files and components used.
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Lucidworks
This document discusses scaling SolrCloud to support a large number of collections. It identifies four main problems in scaling: 1) large cluster state size, 2) overseer performance issues with thousands of collections, 3) difficulty moving data between collections, and 4) limitations in exporting full result sets. The document outlines solutions implemented to each problem, including splitting the cluster state, optimizing the overseer, improving data management between collections, and enabling distributed deep paging to export full result sets. Testing showed the ability to support 30 hosts, 120 nodes, 1000 collections, over 6 billion documents, and sustained performance targets.
This document provides an introduction to SolrCloud, which enables horizontal scaling of a Solr search index using sharding and replication. Key terminology is defined, including ZooKeeper, nodes, collections, shards, replicas, and leaders. The document outlines the high-level SolrCloud architecture and discusses features like sharding, document routing, replication, distributed indexing and querying. Challenges around consistency and availability are also covered.
This document discusses building distributed search applications using Apache Solr. It provides an agenda that covers topics such as Solr architecture, schema configuration, indexing data, querying, SolrCloud, and performance factors. It also references a demo app that will be used for hands-on examples during the presentation.
Oslo Solr MeetUp March 2012 - Solr4 alpha - Cominvent AS
Jan Høydahl presented what is new in Solr 4.0 including near real-time search capabilities, SolrCloud for distributed search across multiple cores, an improved spellchecker, smaller indexes using Flex, pluggable ranking, new sorting functions, and an updated admin GUI. Some key features being added in Solr 4.0 are support for Apache ZooKeeper, auto load balancing of queries across collections, and fault tolerant indexing.
New Persistence Features in Spring Roo 1.1 - Stefan Schmidt
Spring Roo 1.1 introduces several new persistence features including incremental database reverse engineering, integration with Apache Solr/Lucene for search, and support for alternative persistence technologies through add-ons. It allows generating JPA entity classes from live databases and automatically updating them as the schema changes. Entities can be made searchable with Solr with just one command. The Hades add-on also provides an alternative data access layer option with a repository pattern.
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
This document discusses building distributed search applications using Apache Solr. It provides an overview of Solr architecture and components like schema, indexing, querying etc. It also describes hands-on activities to index sample data from disk, database using Data Import Handler and SolrJ client. Query syntax for different types of queries and configuration of search handlers is also covered.
We're talking about serious log crunching and intelligence gathering with Elasticsearch, Logstash, and Kibana.
ELK is an end-to-end stack for gathering structured and unstructured data from servers. It delivers insights in real time using the Kibana dashboard giving unprecedented horizontal visibility. The visualization and search tools will make your day-to-day hunting a breeze.
During this brief walkthrough of the setup, configuration, and use of the toolset, we will show you how to find the trees from the forest in today's modern cloud environments and beyond.
The current trends to work in Agile and DevOps are challenging for database developers. Source control is a standard for non-database code but it’s a challenge for databases. This talk has an ambition to change that situation and help developers and DBA take over control of source code and data.
Solr 3.1 includes many new features and improvements such as range faceting on numeric fields, geospatial search enhancements, JSON document indexing, autosuggest and spellcheck components, analysis filter improvements, and distributed support for additional components. Major components include Apache Lucene 3.1.0, Apache Tika 0.8, Carrot2 3.4.2, Velocity 1.6.1 and Velocity Tools 2.0-beta3, and Apache UIMA 2.3.1-SNAPSHOT.
The document discusses benchmarking the performance of Apache Solr. It describes testing the indexing performance of SolrCloud clusters of varying sizes. The results show that indexing performance scales nearly linearly as nodes are added. It also discusses using the Solr Scale Toolkit, which is a set of tools for deploying, managing, and benchmarking SolrCloud clusters. Future work mentioned includes benchmarking mixed workloads and integrating chaos monkey tests.
4. 7.X
1. Metrics
2. Autoscaling
3. CDCR
4. Time Routed Aliases
5. Replica types
6. Streaming expressions
7. JSON facet API
8. Configset / schema
9. Text Analysis / ML
10. Collections API
11. Queries
12. Large index segment merging
13. Replication / recovery / rolling updates
14. Block-join / nested docs
15. Miscellaneous
5. 7.X: Metrics
• Continuation of 6.X work to support Autoscaling efforts
• 7.0: - Aggregated metrics collected in overseer
- solrconfig.xml <jmx> ➞ solr.xml <metrics><reporter>
• 7.1: Prometheus metrics exporter contrib
• 7.4: /admin/metrics/history API: basic long-term key metric time series aggregation
  • Fixed-width windows at several resolutions
  • Not yet in Admin UI: SOLR-12426
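The metrics endpoints above can be exercised over HTTP; a hedged sketch based on the 7.x ref guide (the filter values shown are illustrative):

```
# Per-node metrics, filtered by registry group and metric name prefix
GET /solr/admin/metrics?group=jvm&prefix=memory.heap

# Long-term metrics history collected by the overseer (7.4+)
GET /solr/admin/metrics/history?action=list
```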
6. 7.X: Autoscaling
• 7.0: - Preferences and policy DSL: flexible replica placement
[ { minimize: cores }, { maximize: freedisk } ]
{ replica: "<2", shard: "#EACH", node: "#ANY" }
- Diagnostics API: return sorted nodes, policy violations
• 7.1: - autoAddReplicas ported to autoscaling framework
- Add/remove/suspend/resume triggers and listeners
- Triggers for added and lost nodes
- ComputePlanAction / ExecutePlanAction
- /autoscaling/history API: cluster events and actions
• 7.2: - Search rate trigger
- /autoscaling/suggestions API
- UTILIZENODE collections API command
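As a concrete illustration, the preference and policy fragments shown on the 7.0 slide can be installed through the autoscaling write API; a hedged sketch of a request body POSTed to the autoscaling endpoint (mirrors the 7.x ref guide examples):

```json
{
  "set-cluster-preferences": [
    { "minimize": "cores" },
    { "maximize": "freedisk" }
  ],
  "set-cluster-policy": [
    { "replica": "<2", "shard": "#EACH", "node": "#ANY" }
  ]
}
```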
8. 7.X: Autoscaling
• 7.4: - Periodic house-keeping task: cleans up inactive shards
- Index size trigger: document count or size in bytes
• 7.5: - Policy replica attribute: #ALL, #EQUAL, percentage, range, and floating point values
       - Policy cores attribute: #EQUAL, percentage, range, and floating point values
       - Percentage in freedisk policy attribute
- Simulation framework: test scaling up to 1 billion docs
9. 7.X: Cross Data Center Replication
• 7.2: Support bi-directional syncing of CDCR clusters
Note: this is not active-active replication but active-passive: only one cluster is active (indexed into) at a time.
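On the source cluster, CDCR is configured in solrconfig.xml; a minimal hedged sketch (the ZooKeeper host and collection names are placeholders):

```xml
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <!-- the target cluster's ZooKeeper ensemble -->
    <str name="zkHost">target-zk:2181</str>
    <str name="source">records</str>
    <str name="target">records</str>
  </lst>
</requestHandler>
```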
10. 7.X: Time Routed Aliases
• 7.3: - Specialization of Solr’s collection alias feature
- Support time series data, e.g. logs / sensor data
- Maintain performance under continuous indexing
- CREATEALIAS: start, interval, retention policy
- Automatically create new collections
- Automatically delete old collections (optional)
- Route updates based on timestamp
- Search against all aliased collections*
• 7.5: Preemptively create the next collection when updates are near the latest collection’s end date (optional)
* Pending optimization: minimize queried collections (SOLR-9562)
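The routing rule ("route updates based on timestamp") can be sketched in Python. This is an illustrative model of the bucketing only, not Solr's actual implementation; the `<alias>_YYYY-MM-DD` naming pattern and day-granularity intervals are assumptions:

```python
from datetime import datetime, timedelta

def tra_collection(alias: str, start: datetime, interval_days: int, ts: datetime) -> str:
    """Return the name of the time-routed collection a document timestamp
    falls into, assuming day-granularity intervals and an
    <alias>_YYYY-MM-DD collection naming pattern."""
    if ts < start:
        raise ValueError("timestamp precedes the alias start time")
    # Round the timestamp down to the start of its interval window.
    elapsed_days = (ts - start).days
    bucket = start + timedelta(days=(elapsed_days // interval_days) * interval_days)
    return f"{alias}_{bucket:%Y-%m-%d}"

# Example: a log event from Jan 15 routes to the Jan 15 daily collection.
print(tra_collection("logs", datetime(2018, 1, 1), 1, datetime(2018, 1, 15, 12, 0)))
```

The same rounding logic generalizes to other interval units; Solr additionally handles creating the target collection on demand.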
11. 7.X: Replica types
• 7.0: New replica types:

Type          | Indexes locally | Soft commit & RTG | Pulls segments from leader | Writes to TLog | Can become shard leader | Queryable
NRT           |       ✅        |        ✅         |                            |       ✅       |           ✅            |    ✅
TLOG (leader) |       ✅        |        ✅         |                            |       ✅       |           ✅            |    ✅
TLOG          |                 |                   |             ✅             |       ✅       |           ✅            |    ✅
PULL          |                 |                   |             ✅             |                |                         |    ✅

• 7.4: Query param to prioritize replicas by type, e.g.
  shards.preference=replica.type:PULL,replica.type:TLOG
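Replica types are chosen when a collection is created; a hedged sketch of the Collections API parameters (the collection name and counts are illustrative):

```
/admin/collections?action=CREATE&name=products
    &numShards=2&nrtReplicas=1&tlogReplicas=1&pullReplicas=2
```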
12. 7.X: Streaming expressions
• Parallel computation function suite
• Some use cases: MapReduce, aggregations, parallel SQL, pub/sub messaging, graph traversal, machine learning, statistical programming
• Each 7.X release has added many new functions
• 7.5: Ref guide: Math Expressions User Guide
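For flavor, a small streaming expression that counts error log entries per host (the collection and field names are assumptions; rollup requires its input stream sorted on the grouping field, hence the /export handler and sort):

```
rollup(
  search(logs, q="level_s:ERROR", fl="host_s", sort="host_s asc", qt="/export"),
  over="host_s",
  count(*)
)
```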
13. 7.X: JSON Facet API
• 7.0: Terms facets: added optional refinement support
• 7.4: Semantic Knowledge Graph support via new relatedness() aggregate function
  • Finds ad-hoc relationships by scoring documents relative to foreground and background document sets
• 7.5: Heatmap facet support
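A hedged sketch of a relatedness() request (field names and queries are illustrative; $fore and $back reference the foreground and background document sets):

```json
{
  "query": "role_s:engineer",
  "params": { "fore": "role_s:engineer", "back": "*:*" },
  "facet": {
    "skills": {
      "type": "terms",
      "field": "skill_s",
      "sort": { "r": "desc" },
      "facet": { "r": "relatedness($fore,$back)" }
    }
  }
}
```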
15. 7.X: Text analysis / machine learning
• 7.1: Bengali normalizer and stemmer
• 7.2: Enable off-ZooKeeper storage of large (>1MB) LTR models
• 7.3: OpenNLP integration: tokenization, POS tagging, phrase chunking, lemmatization, NER, language detection
• 7.4: - ProtectedTermFilterFactory: don’t filter protected terms
       - TaggerRequestHandler (a.k.a. SolrTextTagger): NER
• 7.5: - "nori" Korean morphological text analysis: "*_txt_ko"
       - PhrasesIdentificationComponent: identify and score candidate query phrases based on index statistics
       - UIMA integration removed
16. 7.X: Collections API
• 7.3: Add collection level properties similar to cluster properties
• 7.4: Cluster-wide defaults for numShards, nrtReplicas, tlogReplicas, pullReplicas
• 7.5: - Support co-locating replicas of two or more collections together in a node via the withCollection parameter to the CREATE and MODIFYCOLLECTION commands
- SPLITSHARD: New split method using hard links: splitMethod=link
• 3-5 times faster than the original splitMethod=rewrite
• Slows down replication
• Increases disk usage on replica nodes
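The 7.5 split method is selected per request; a hedged sketch of the Collections API call (collection and shard names illustrative):

```
/admin/collections?action=SPLITSHARD&collection=logs&shard=shard1&splitMethod=link
```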
18. 7.X: Queries
• 7.2: New synonymQueryStyle field type option: enable generation of appropriate queries for hierarchical relations between overlapping terms
• as_same_term (default): SynonymQuery(bird,robin)
• pick_best: Dismax(bird,robin)
• as_distinct_terms: (bird OR robin)
• 7.4: JSON query DSL: Enable query/filter tagging, e.g. { "#colorfilt" : "color:blue" }, equivalent to local-param {!tag=colorfilt}color:blue
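synonymQueryStyle is set on the field type; a minimal hedged schema sketch (the analyzer chain and synonyms file name are illustrative, not prescribed by the slide):

```xml
<fieldType name="text_syn" class="solr.TextField" synonymQueryStyle="pick_best">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"/>
  </analyzer>
</fieldType>
```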
19. 7.X: Large index segment merging
• Problem: Overly large segments (e.g. as a result of force-merge/optimize) stop being eligible for merging, and can start accumulating >50% deleted documents, wasting space and skewing index stats.
• 7.5: - TieredMergePolicy now respects maxSegmentSizeMB by default when executing force-merge/optimize and expunge-deletes
       - TieredMergePolicy’s reclaimDeletesWeight has been replaced with a new deletesPctAllowed setting to control how aggressively deletes should be reclaimed
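The new setting would be exposed through the merge policy factory in solrconfig.xml; a hedged sketch (values are illustrative, and the property names are assumed to map onto TieredMergePolicy's setters):

```xml
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <double name="maxMergedSegmentMB">5000</double>
  <double name="deletesPctAllowed">33.0</double>
</mergePolicyFactory>
```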
20. 7.X: Replication/recovery/rolling upgrades
• 7.3: The old Leader-Initiated-Recovery (LIR) implementation is deprecated and replaced
  • To perform a rolling upgrade to Solr 8, you must be on Solr 7.3 or higher
• 7.4: - IndexFetcher now skips fetching identical files
       - Buffered updates are written to a separate TLog
       - Parallel replay of buffered TLogs
21. 7.X: Block-join / nested documents
• 7.3: Added filters and excludeTags local-params for {!parent} and {!child} query parsers, usable for multi-select faceting
• 7.5: WIP: Allow Solr to more faithfully represent deeply nested document relationships, rather than requiring reconstruction based on the flattened list of child docs returned by Solr
22. 7.X: Miscellaneous
• 7.3: add-distinct atomic updates
• 7.4: - Ignore large document URP
       - TLog: maxSize auto hard-commit setting (in addition to maxDocs & maxTime)
• 7.5: Custom cluster properties allowed with ext. prefix
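add-distinct behaves like the add atomic update but skips values already present in the multi-valued field; a hedged sketch of the update payload (document ID and field name illustrative):

```json
{ "id": "doc1", "tags_ss": { "add-distinct": ["solr", "lucene"] } }
```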
24. 8.0: Autoscaling
• Suggestions API: rebalance options even if no violations
• Suggestions API: add-replica for lost replicas
• maxOps limit for index size trigger
• Autoscaling policy framework will be the default replica placement strategy
25. 8.0: Index upgrades
• 7.0: Lucene indexes record the major Lucene version that created the index, and the minimum Lucene version that contributed to segments.
• 8.0: Version N-2 or older indexes will now fail to open, even if they have been merged into an N-1 index.
• IndexUpgrader will not upgrade 6.X or earlier indexes
• Re-indexing will be required to upgrade
26. 8.0: HTTP/2
• May 2018: Mark Miller announced his Star Burst effort:
many cleanups and performance enhancements
• July 2018: Cao Manh Dat took up the HTTP/2 aspects: SOLR-12639
• Indexing test: 33M docs, 1 shard, 2 replicas (SOLR-12642)
• Garbage: Leader: 26% less; replica: 76% less
• Indexing throughput: 54% higher
• CPU time: Leader: 39% higher; replica: 76% lower
• Ready to merge back to master, pending release of Jetty 9.4.13, containing the SPNEGO HTTP/2 implementation
27. 8.0: Miscellaneous
• Lucene: scores must be non-negative
• Function(Score)Query variants convert negative scores to zero
• TODO: remove deprecations
• Trie fields? Removal effectively blocked by:
• SOLR-12074: Add numeric equivalent to StrField
• SOLR-11127: Mechanism to migrate the .system collection (a.k.a. blob store) schema from Trie (pre-7.0) to Points (7.0+)
29. 8.X: Lucene/Solr minimum JDK
• Oracle will end free JDK 8 support in January 2019
• Both JDK 9 & 10 are already EOL, no more Oracle support
• JDK 11 will very likely be the next minimum supported JDK; no schedule yet
• Under JDK 9+, Solr’s Hadoop-related functionality has problems, including with Kerberos
• Uwe Schindler’s Jenkins server tests Lucene/Solr on Oracle 9, 10, 11, and 12 JDKs
• All have higher Solr test failure rates than on JDK 8
30. 8.X: Luke: UI framework & licensing
• Andrzej Bialecki: Initial implementation: Thinlet, GPL
• Mark Harwood: GWT
• Mark Miller: Apache Pivot
• Dmitry Kan and Tomoko Uchida took ownership on GitHub
• Tomoko Uchida: JavaFX (bundled w/JDK 8)
• LUCENE-2562: Make Luke a Lucene/Solr Module
• JavaFX/OpenJFX unbundled from Java 11 JDK, GPL+CPE
• Tomoko Uchida: Swing (7.5 release available)
31. 8.X: New Lucene features
• Index impacts, Block-Max WAND, similarity cleanups
• Some queries (especially term queries and disjunctions) are much faster when the number of hits is not required
• FeatureField: incorporate static relevance signals, e.g. PageRank
• Soft deletes
• Merge policy retains deleted docs according to policy
• Enables document history, e.g. for time-travel indexes
• RAMDirectory replaced by ByteBuffersDirectory