Project Voldemort: Big data loading

Lightning talk about loading big data into Project Voldemort read-only stores.

Published in: Technology, Education

  1. Big Data Loading: Project Voldemort
  2. Big Data Loading
       ● So you've processed your data...
       ● Now, how do you get it to people quickly?
       ● Project Voldemort's read-only stores
           ● Simple key-value store
           ● Based on Amazon Dynamo
           ● Simple Java interface and operation
           ● Immutable, read-only stores
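
Dynamo-style stores typically decide which nodes hold a key by consistent hashing. The sketch below illustrates that idea only. The class name, the CRC32 hash, and the `pointsPerNode` parameter are assumptions made for this example and are not taken from Voldemort's routing code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Illustrative consistent-hash ring; not Voldemort's actual routing strategy. */
final class HashRingSketch {
    // Ring position -> node name. Each node owns several positions on the ring.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    HashRingSketch(List<String> nodes, int pointsPerNode) {
        for (String node : nodes) {
            for (int i = 0; i < pointsPerNode; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    /** CRC32 is used here only because it ships with the JDK. */
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Walk clockwise from the key's position and collect distinct nodes. */
    List<String> nodesFor(String key, int replicas) {
        List<String> owners = new ArrayList<>();
        if (ring.isEmpty() || replicas < 1) {
            return owners;
        }
        long start = hash(key);
        collect(ring.tailMap(start, true).values(), owners, replicas);
        // Wrap around to the beginning of the ring.
        collect(ring.headMap(start, false).values(), owners, replicas);
        return owners;
    }

    private static void collect(Iterable<String> candidates, List<String> owners, int replicas) {
        for (String node : candidates) {
            if (owners.size() >= replicas) {
                return;
            }
            if (!owners.contains(node)) {
                owners.add(node);
            }
        }
    }
}
```

For example, `new HashRingSketch(List.of("node0", "node1", "node2"), 16).nodesFor("alice", 2)` returns two distinct node names, and the same key always maps to the same nodes while the node list is unchanged.
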
  3. Read-Only Stores
       ● Precompute in Hadoop or elsewhere
       ● Creates an indexed key-value store
           ● One reducer (or file) per node
           ● Replicated data for failover
       ● Atomically loads into nodes
           ● Copy from HDFS or another HTTP source
           ● Very fast, limited by network or storage I/O
           ● Can be throttled so it does not affect live services
       ● Can also roll back to previous versions
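
The atomic load and rollback described on this slide can be pictured as swapping a reference to an immutable data set. What follows is a minimal in-memory sketch of that idea, not Voldemort's implementation. The class and method names are hypothetical, and a real node would swap files on disk.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Conceptual sketch: an immutable store that is replaced in one atomic step. */
final class SwappableReadOnlyStore {
    // The version currently serving reads.
    private final AtomicReference<Map<String, String>> live =
            new AtomicReference<>(Map.of());
    // Earlier versions, most recent first, kept so a bad load can be undone.
    private final Deque<Map<String, String>> previous = new ArrayDeque<>();

    /** Readers always see one complete version, never a half-loaded one. */
    String get(String key) {
        return live.get().get(key);
    }

    /** Publish a fully built data set and remember the one it replaces. */
    synchronized void swap(Map<String, String> newVersion) {
        previous.push(live.getAndSet(Map.copyOf(newVersion)));
    }

    /** Restore the most recent earlier version; returns false if none is kept. */
    synchronized boolean rollback() {
        if (previous.isEmpty()) {
            return false;
        }
        live.set(previous.pop());
        return true;
    }
}
```

Because each version is immutable and published with a single reference swap, reads need no locking, and rolling back costs the same as loading.
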
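  4. Example Hadoop Store Builder

       // Imports assume the json-simple parser and Voldemort's Hadoop store builder module.
       import org.apache.hadoop.io.LongWritable;
       import org.apache.hadoop.io.Text;
       import org.json.simple.JSONObject;
       import org.json.simple.parser.JSONParser;
       import org.json.simple.parser.ParseException;
       import voldemort.store.readonly.mr.AbstractHadoopStoreBuilderMapper;

       public class JsonStoreBuilder
               extends AbstractHadoopStoreBuilderMapper<LongWritable, Text> {

           JSONParser parser = new JSONParser();

           // Key: the "name" field of the JSON object on this input line.
           @Override
           public Object makeKey(LongWritable lineNo, Text line) {
               try {
                   JSONObject json = (JSONObject) parser.parse(line.toString());
                   return json.get("name");
               } catch (ParseException e) {
                   throw new IllegalArgumentException("Invalid JSON at " + lineNo, e);
               }
           }

           // Value: the whole input line, stored unchanged.
           @Override
           public Object makeValue(LongWritable lineNo, Text line) {
               return line.toString();
           }
       }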
  5. Example Hadoop Job

       $VOLDEMORT_HOME/bin/hadoop-build-readonly-store.sh \
           --input hdfs/JsonFile.json \
           --output hdfs/StoreOut \
           --tmpdir hdfs/temp_dir \
           --mapper uk.co.danharvey.hadoop.JsonStoreBuilder \
           --jar hadoop-core.jar \
           --cluster config/cluster.xml \
           --storename example_store \
           --storedefinitions config/store.xml \
           --chunksize 1073741824 \
           --replication 1
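
The `--chunksize 1073741824` value in the command above is 1 GiB expressed in bytes. As a rough back-of-envelope aid, the sketch below estimates how many chunk files of that size each node would hold. The formula and the class name are assumptions made for illustration; the store builder's own calculation may differ.

```java
/** Rough estimate only; not the store builder's actual chunking logic. */
final class ChunkEstimate {
    /** Estimated chunk files per node, assuming data spreads evenly across nodes. */
    static long chunksPerNode(long totalBytes, int replication, int nodes, long chunkSizeBytes) {
        if (totalBytes < 0 || replication < 1 || nodes < 1 || chunkSizeBytes < 1) {
            throw new IllegalArgumentException("sizes and counts must be positive");
        }
        // Every replica is a full copy, so the stored volume scales with replication.
        long bytesPerNode = ceilDiv(totalBytes * replication, nodes);
        return Math.max(1, ceilDiv(bytesPerNode, chunkSizeBytes));
    }

    /** Integer division rounded up, for non-negative a and positive b. */
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }
}
```

Under these assumptions, 10 GiB of input with replication 1 on a two-node cluster gives `ChunkEstimate.chunksPerNode(10L * 1073741824L, 1, 2, 1073741824L)`, which evaluates to 5.
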
  6. Pig to JSON Index
       ● Output JSON from Pig:
           STORE bag INTO 'data.json' USING JsonStorage();
       ● JsonStoreBuilder
           ● Extends Voldemort StoreBuilder
           ● Easily index any field
       ● Code is up here: http://github.com/danharvey/pigJsonUtils
