Lucene KV-Store

A fast Key-Value store for large datasets


  1. Lucene KV-Store: a high-performance key-value store (Mark Harwood)
  2. Benefits
     - High-speed reads and writes of key/value pairs, sustained over growing volumes of data
     - Read costs are always 0 or 1 disk seek
     - Efficient use of memory
     - Simple file structures with strong durability guarantees
  3. Why "Lucene" KV store?
     - Uses Lucene's "Directory" APIs for low-level file access
     - Based on Lucene's concepts of segment files, soft deletes, background merges, commit points etc., BUT a fundamentally different form of index
     - I'd like to offer it to the Lucene community as a "contrib" module, because they have a track record in optimizing these same concepts (and could potentially make use of it in Lucene?)
  4. Example benchmark results
     [Benchmark chart not reproduced here]
     Note: regular Lucene search indexes follow the same trajectory as the "Common KV Store" when it comes to lookups on a store with millions of keys
  5. KV-Store high-level design
     - Held in RAM: a map from key hash (int) to disk pointer (int), e.g. 23434 -> 0, 6545463 -> 10, 874382 -> 22
     - Held on disk, per hash, one record: the number of keys with this hash (VInt), then for each key/value pair: key size (VInt), key bytes, value size (VInt), value bytes
     - Example disk records: [1][3]Foo[3]Bar and [2][5]Hello[5]World[7]Bonjour[8]Le Mon..
     - Most hashes have only one associated key and value; some hashes will have key collisions, requiring extra key/value pairs in the record
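The record layout on this slide can be sketched in Python. This is a minimal illustration rather than the actual implementation: `write_vint` mirrors Lucene-style VInt encoding (7 data bits per byte, high bit set while more bytes follow), and `encode_record` is a hypothetical helper that packs all key/value pairs sharing one hash into a single record:

```python
def write_vint(buf: bytearray, n: int) -> None:
    # Lucene-style VInt: 7 bits per byte, high bit set while more bytes follow
    while n >= 0x80:
        buf.append((n & 0x7F) | 0x80)
        n >>= 7
    buf.append(n)

def encode_record(pairs) -> bytes:
    # One record per hash: [num keys] then, per pair,
    # [key size][key bytes][value size][value bytes] (sizes as VInts)
    buf = bytearray()
    write_vint(buf, len(pairs))
    for key, value in pairs:
        write_vint(buf, len(key))
        buf += key
        write_vint(buf, len(value))
        buf += value
    return bytes(buf)

# The slide's first example record: one key "Foo" with value "Bar"
record = encode_record([(b"Foo", b"Bar")])
# record == b"\x01\x03Foo\x03Bar" (1 key, 3-byte key, 3-byte value)
```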
  6. Read logic (pseudo code)

        int keyHash = hash(searchKey);
        int filePointer = ramMap.get(keyHash);
        if filePointer is null
            return null for value;
        file.seek(filePointer);
        int numKeysWithHash = file.readInt();
        for numKeysWithHash
        {
            storedKey = file.readKeyData();
            if (storedKey == searchKey)
                return file.readValueData();
            file.readValueData(); // skip the value of a non-matching key
        }

     Notes: there is a guaranteed maximum of one random disk seek for any lookup. With a good hashing function, most lookups will only need to go once around this loop.
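A runnable Python sketch of this read path, using a bytes buffer in place of the file; `hash32`, `read_vint` and `read_blob` are stand-in helper names of my own, not the store's real functions:

```python
def hash32(key: bytes) -> int:
    # Stand-in 32-bit hash; the real store would use a stronger function
    h = 0
    for b in key:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h

def read_vint(data: bytes, pos: int):
    # Decode a VInt; returns (value, new position)
    n = shift = 0
    while True:
        b = data[pos]
        pos += 1
        n |= (b & 0x7F) << shift
        if b < 0x80:
            return n, pos
        shift += 7

def read_blob(data: bytes, pos: int):
    # A length-prefixed key or value; returns (bytes, new position)
    size, pos = read_vint(data, pos)
    return bytes(data[pos:pos + size]), pos + size

def lookup(ram_map: dict, data: bytes, search_key: bytes):
    file_pointer = ram_map.get(hash32(search_key))
    if file_pointer is None:
        return None              # hash absent: zero disk seeks
    pos = file_pointer           # the single random "disk seek"
    num_keys, pos = read_vint(data, pos)
    for _ in range(num_keys):    # usually a single iteration
        stored_key, pos = read_blob(data, pos)
        value, pos = read_blob(data, pos)
        if stored_key == search_key:
            return value
    return None
```

For example, with the slide's record `b"\x01\x03Foo\x03Bar"` at pointer 0, `lookup` returns `b"Bar"` for key `b"Foo"` after exactly one simulated seek.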
  7. Write logic (pseudo code)

        int keyHash = hash(newKey);
        int oldFilePointer = ramMap.get(keyHash);
        ramMap.put(keyHash, file.length());
        if oldFilePointer is null
        {
            file.append(1); // only 1 key with this hash
            file.append(newKey);
            file.append(newValue);
        }
        else
        {
            file.seek(oldFilePointer);
            int numOldKeys = file.readInt();
            Map tmpMap = file.readNextNKeysAndValues(numOldKeys);
            tmpMap.put(newKey, newValue);
            file.append(tmpMap.size());
            file.appendKeysAndValues(tmpMap);
        }

     Notes: updates always append to the end of the file, leaving older values unreferenced. In case of any key collisions, previously stored values are copied to the new position at the end of the file along with the new content.
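The write path can be sketched the same way (a bytearray stands in for the append-only file, and the helper names are mine; the real store writes through Lucene's Directory):

```python
def hash32(key: bytes) -> int:
    # Stand-in 32-bit hash
    h = 0
    for b in key:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h

def write_vint(buf: bytearray, n: int) -> None:
    while n >= 0x80:
        buf.append((n & 0x7F) | 0x80)
        n >>= 7
    buf.append(n)

def read_vint(data, pos):
    n = shift = 0
    while True:
        b = data[pos]
        pos += 1
        n |= (b & 0x7F) << shift
        if b < 0x80:
            return n, pos
        shift += 7

def read_blob(data, pos):
    size, pos = read_vint(data, pos)
    return bytes(data[pos:pos + size]), pos + size

def put(data: bytearray, ram_map: dict, new_key: bytes, new_value: bytes):
    key_hash = hash32(new_key)
    old_pointer = ram_map.get(key_hash)
    pairs = {}
    if old_pointer is not None:
        # Collision or overwrite: copy previously stored pairs forward
        pos = old_pointer
        num_old, pos = read_vint(data, pos)
        for _ in range(num_old):
            k, pos = read_blob(data, pos)
            v, pos = read_blob(data, pos)
            pairs[k] = v
    pairs[new_key] = new_value
    ram_map[key_hash] = len(data)     # repoint the map at the end of the file
    write_vint(data, len(pairs))      # append the new record
    for k, v in pairs.items():
        write_vint(data, len(k))
        data += k
        write_vint(data, len(v))
        data += v
```

Overwriting `Foo` leaves its old 9-byte record in place but unreferenced; the RAM map now points at the fresh copy at the end of the file, which is exactly what the merge slides later clean up.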
  8. Segment generations: writes
     - One hash -> pointer map is held in RAM per segment; key and value data live in the on-disk segment stores (numbered 0, 1, 2, ... from old to new)
     - Writes append to the end of the latest-generation segment until it reaches a set size; then it is made read-only and a new segment is created
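The rollover rule above amounts to a few lines; a toy sketch (the 64-byte threshold is arbitrary for the demo, and a real store would use a much larger, configurable segment size):

```python
MAX_SEGMENT_BYTES = 64  # demo-sized threshold; real segments would be far larger

class SegmentedLog:
    """Append-only segments: the newest is writable, older ones are read-only."""

    def __init__(self):
        self.segments = [bytearray()]

    def append(self, record: bytes):
        active = self.segments[-1]
        if active and len(active) + len(record) > MAX_SEGMENT_BYTES:
            self.segments.append(bytearray())  # freeze old segment, start a new one
            active = self.segments[-1]
        pointer = len(active)
        active += record
        return len(self.segments) - 1, pointer  # (segment id, pointer within it)
```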
  9. Segment generations: reads
     - Read operations search the in-RAM maps in reverse order (newest segment first)
     - The first map found to contain the hash is expected to hold the pointer into its associated file for all the latest keys/values with this hash
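The newest-first search over per-segment maps is a few lines of Python (the segment ids and hash values below are taken from the slide's figures):

```python
def find_pointer(segment_maps, key_hash):
    # segment_maps: list of (segment_id, ram_map), ordered old -> new.
    # Search newest-first: the most recent segment that saw this hash
    # holds the pointer to the latest record for it.
    for segment_id, ram_map in reversed(segment_maps):
        pointer = ram_map.get(key_hash)
        if pointer is not None:
            return segment_id, pointer
    return None  # hash never written

segment_maps = [
    (0, {23434: 0, 65463: 10}),
    (1, {203765: 0, 37594: 10}),
    (2, {23434: 0, 65463: 10}),
]
# hash 23434 appears in segments 0 and 2; segment 2 wins because it is newest
```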
  10. Segment generations: merges
      - A background thread merges read-only segments with many outdated entries into new, more compact versions
  11. Segment generations: durability
      - Like Lucene, commit operations create a new generation of a "segments" file, the contents of which reflect the committed (i.e. fsync'ed) state of the store
      - The segments file records the completed segment IDs (e.g. 0, 4), the active segment ID (e.g. 3), and the active segment's committed length (e.g. 423423)
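One conventional way to implement such a commit point is write-temp-then-rename. This is illustrative only: the JSON format and atomic-replace approach here are my assumptions, not the store's actual on-disk layout:

```python
import json
import os

def commit(path: str, completed_ids, active_id: int, active_length: int) -> None:
    # Write the new "segments" generation to a temp file, fsync it, then
    # atomically replace the old one: readers see either the old or the
    # new commit point, never a partial write.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"completed": list(completed_ids),
                   "active": active_id,
                   "committed_length": active_length}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
```

Recovery then trusts only data at or before `committed_length` in the active segment, matching the slide's example of segments 0 and 4 completed, segment 3 active with 423423 committed bytes.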
  12. Implementation details
      - The JVM needs sufficient RAM for 2 ints for every active key (note: using "modulo N" on the hash can cap RAM at N x 2 ints, at the cost of more key collisions = more disk IO)
      - Uses the Lucene Directory for:
        - abstraction from the choice of file system
        - buffered reads/writes
        - support for VInt encoding of numbers
        - rate-limited merge operations
      - Borrows successful Lucene concepts:
        - multiple segments flushed then made read-only
        - a "segments" file used to list committed content (could potentially support multiple commit points)
        - background merges
      - Uses the LGPL "Trove" library for maps of primitives
