Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Lucene Committer, Solr Committer.
Works for Cloudera.
A lot of work on Lucene, Solr, and SolrCloud.
Solr: A distributed, fault tolerant search engine using Lucene as its core search library.
HDFS: A distributed, fault tolerant filesystem that is part of the Hadoop project.
Solr on HDFS
Wouldn’t it be nice if Solr could run on HDFS?
If you are running other things on HDFS, it simplifies operations.
If you are building indexes with MapReduce, merging them into your cluster becomes much easier.
You can do some other cool things when you are using a shared filesystem.
Most attempts in the past have not really caught on.
Solr on HDFS in the Past.
• Apache Blur is one of the more successful marriages of Lucene and HDFS.
• We borrowed some code from them to seed Solr on HDFS.
• Others have copied indexes between local filesystem and HDFS.
• Most people felt that running Lucene or Solr straight on HDFS would be too slow.
How HDFS Writes Data
When writing, an attempt is made to make a local copy and as many remote
copies as necessary to satisfy the configured replication factor.
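The placement rule above can be sketched in a few lines. This is an illustrative toy, not HDFS’s actual block placement code; the function name and node labels are made up:

```python
# Toy sketch of HDFS write placement: prefer a local copy, then add
# remote copies until the replication factor is satisfied.
def place_replicas(local_node, remote_nodes, replication_factor):
    """Return the list of nodes that will hold a copy of the block."""
    targets = [local_node]                       # try the local copy first
    # add remote copies until the replication factor is satisfied
    targets += remote_nodes[:replication_factor - 1]
    return targets

nodes = place_replicas("dn1", ["dn2", "dn3", "dn4"], replication_factor=3)
# nodes -> ["dn1", "dn2", "dn3"]
```

This is why co-locating Solr with the data nodes pays off: the first copy lands on the writer’s own node.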
Co-Located Solr and HDFS Data Nodes
We recommend that HDFS data nodes and Solr nodes are co-located
so that the default case involves fast, local data.
Non Local Data
• BlockCache is first line of defense, but it’s good to get local data again.
• Optimize is a more painful option.
• An HDFS affinity feature could be useful.
• A tool that simply wrote out a copy of the index with no merging might be interesting.
• Fairly simple and straightforward implementation.
• Full support required making the Directory interface a first-class citizen in Solr.
• Largest part was making replication work with non-local filesystem directories.
• With large enough ‘buffer’ sizes, it works reasonably well as long as the data is local.
• Really needs some kind of cache to be reasonable though.
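The ‘buffer’ point above can be made concrete with a toy model. Nothing here is Solr’s actual HdfsDirectory code; it is a hypothetical sketch of why bigger read buffers mean fewer (potentially remote) HDFS round-trips:

```python
# Sketch of why a large read 'buffer' helps: each underlying read is a
# (potentially remote) HDFS round-trip, so bigger buffers mean fewer trips.
class BufferedReader:
    def __init__(self, data, buffer_size):
        self.data = data
        self.buffer_size = buffer_size
        self.trips = 0          # counts simulated HDFS round-trips

    def read(self, offset, length):
        out = b""
        pos = offset
        while pos < offset + length:
            self.trips += 1     # one round-trip per buffer fill
            chunk = self.data[pos:pos + self.buffer_size]
            out += chunk[:offset + length - pos]
            pos += self.buffer_size
        return out

r = BufferedReader(b"x" * 1024, buffer_size=256)
r.read(0, 1024)
# r.trips -> 4 round-trips; with buffer_size=64 it would take 16
```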
“The Block Cache”
A replacement for the OS filesystem cache, especially for the case when there is no
local copy of the data.
Even with local data, making it larger will beneficially reduce HDFS traffic in many
cases.
Inside the Block Cache.
Backed by ByteBuffers, each divided into blocks of size ‘blockSize’.
Used block locations are tracked by a ‘lock’ bitset.
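The structure on this slide can be sketched as a toy in-heap model. The real cache stores blocks in off-heap ByteBuffers; the class and method names here are made up for illustration, while ‘blockSize’ and the ‘lock’ bitset follow the slides:

```python
# Toy sketch of the Block Cache layout: one slab, fixed-size blocks,
# and a bitset tracking which block slots are in use.
class BlockCache:
    def __init__(self, slab_size, block_size):
        self.block_size = block_size
        self.num_blocks = slab_size // block_size
        self.slab = bytearray(slab_size)        # stands in for an off-heap ByteBuffer
        self.used = [False] * self.num_blocks   # the 'lock' bitset from the slides
        self.index = {}                         # (file, block#) -> slot

    def store(self, key, data):
        for slot in range(self.num_blocks):
            if not self.used[slot]:
                self.used[slot] = True
                off = slot * self.block_size
                self.slab[off:off + len(data)] = data
                self.index[key] = slot
                return True
        return False  # cache full; the real cache evicts instead

    def fetch(self, key):
        slot = self.index.get(key)
        if slot is None:
            return None                          # cache miss -> read from HDFS
        off = slot * self.block_size
        return bytes(self.slab[off:off + self.block_size])

cache = BlockCache(slab_size=32, block_size=8)
cache.store(("seg_1.frq", 0), b"abcdefgh")
cache.fetch(("seg_1.frq", 0))  # -> b"abcdefgh"
```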
The Global Block Cache
The initial Block Cache implementation used a separate Block Cache for every unique
index directory used by Solr in HDFS.
There are many limitations around this strategy. It hinders capacity planning, it’s not
very efficient, and it bites you at the worst times.
The Global Block Cache is meant to be a single Block Cache used by all
SolrCores, for every directory.
This makes sizing very simple: determine how much RAM you can spare for the Block
Cache, size it that way once, and forget it.
In many average cases, performance looks really good - very comparable to local
filesystem performance, though usually somewhat slower.
In other cases, adjusting various settings for the Block Cache can help with
performance.
We have recently found some changes to improve performance.
Tuning the Block Cache
By default, each ‘slab’ is 128 MB. Raise the slab count to grow the cache in 128 MB increments.
Block Size (8 KB default)
Not originally configurable, but certain use cases appear to work better with 4 KB.
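The sizing arithmetic implied by these defaults is worth writing out. This is just back-of-envelope math using the 128 MB slab and 8 KB block figures from the slides; the 4 GB budget is a hypothetical example:

```python
# Back-of-envelope Block Cache sizing with the defaults from the slides.
SLAB_SIZE = 128 * 1024 * 1024   # 128 MB per slab
BLOCK_SIZE = 8 * 1024           # 8 KB default block size

def slabs_for_budget(ram_bytes):
    """How many whole 128 MB slabs fit in a given RAM budget."""
    return ram_bytes // SLAB_SIZE

blocks_per_slab = SLAB_SIZE // BLOCK_SIZE      # 16384 blocks per slab
slab_count = slabs_for_budget(4 * 1024**3)     # hypothetical 4 GB budget -> 32 slabs
```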
HDFS Transaction Log
We also moved the Transaction Log to HDFS.
The implementation has held up okay; some improvements are still needed, and a large
replay performance issue has been improved.
The HDFSDirectory and Block Cache have had a much larger impact.
HDFS has no truncate support, so we work around it by replaying the whole log in some
failed recovery cases where the local filesystem implementation just drops the log.
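The replay workaround can be sketched as follows. This is an illustrative simplification, not Solr’s transaction log code; the version numbers stand in for Solr’s update versions:

```python
# Sketch of the no-truncate workaround: recovery replays the whole log,
# skipping updates the index already holds (tracked here by a version number).
def replay_log(log_entries, applied_version):
    """Re-apply every log entry newer than what the index already has."""
    replayed = []
    for version, update in log_entries:
        if version > applied_version:   # skip updates already in the index
            replayed.append(update)
    return replayed

log = [(1, "add doc1"), (2, "add doc2"), (3, "delete doc1")]
replay_log(log, applied_version=1)  # -> ["add doc2", "delete doc1"]
```

With truncate support, the already-applied prefix could simply be cut off the log instead of scanned past on every replay.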
The autoAddReplicas Feature
A new feature that is currently only available when using a shared filesystem like
HDFS.
The Overseer monitors the cluster state and, when a node goes down, fires off
SolrCore create commands pointing to the existing data in HDFS.
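The autoAddReplicas flow described above can be sketched like this. The function, field names, and paths here are hypothetical, not Solr’s actual Overseer API; the point is that the create command reuses the index already sitting in HDFS:

```python
# Illustrative sketch of autoAddReplicas: the Overseer notices a down node
# and issues create commands pointing at the existing HDFS index data.
def recover_down_nodes(cluster_state, live_nodes):
    """Return create commands for cores whose host node is down."""
    commands = []
    for core, info in cluster_state.items():
        if info["node"] not in live_nodes:
            commands.append({
                "action": "CREATE",
                "core": core,
                "dataDir": info["hdfs_data_dir"],  # reuse the index already in HDFS
            })
    return commands

state = {"shard1_replica1": {"node": "node1", "hdfs_data_dir": "hdfs:///solr/shard1"}}
recover_down_nodes(state, live_nodes={"node2"})  # node1 is down -> one CREATE command
```

Because the index lives in the shared filesystem, no data copy is needed; the new core simply attaches to the existing directory.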
At Cloudera, we are building an Enterprise Data Hub.
In our vision, the more that runs on HDFS, the better.
We will continue to improve and push forward HDFS support in SolrCloud.
Block Cache Improvements
Apache Blur has a Block Cache V2.
Uses variable sized blocks.
Optionally uses Unsafe for direct memory management.
The V1 Block Cache has some performance limitations:
• Copying bytes from off heap to the IndexInput buffer.
• Concurrent access of the cache.
• Sequential reads have to pull a lot of blocks from the cache.
• Each DirectByteBuffer has some overhead, including a Cleaner object that can affect
GC and add to RAM requirements.
HDFS Only Replication When Using Replicas
Currently, if you want to use SolrCloud replicas, data is replicated both by HDFS and
by SolrCloud.
Setting the HDFS replication factor to 1 is not a very good solution.
autoAddReplicas is one possible solution.
We will be working on another solution where only the leader writes to an index in
HDFS while replicas read from it.