Accismus
A Percolator implementation
using Accumulo
Keith Turner
Accismus
A form of irony where one pretends indifference
and refuses something while actually wanting it.
Google's Problem
● Use M/R to process ~10^15 bytes
● ~10^12 bytes new data arrive
● Use M/R to process 10^15 + 10^12 bytes
● High latency before new data available for query
Solution
● Percolator : incremental processing for big data
– Layer on top of BigTable
– Offers fault tolerant, cross row transactions
● Lazy recovery
– Offers snapshot isolation
● Only read committed data
– Uses BigTable data model, except timestamp
● Accismus adds visibility
– Has own API
Observers
● User defined function that executes a
transaction
● Triggered when a user defined column is
modified (called notification in paper)
● Guarantee only one transaction will execute per
notification
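The observer idea can be sketched in memory. This is only an illustration: `ToyTable`, `observe`, and `run_observers` are hypothetical names, not the Accismus API, and a real observer would run inside a transaction.

```python
from collections import defaultdict, deque


class ToyTable:
    """In-memory stand-in for a table with observed columns (hypothetical)."""

    def __init__(self):
        self.cells = {}                      # (row, column) -> value
        self.observers = defaultdict(list)   # column -> observer callables
        self.pending = deque()               # queued notifications
        self.queued = set()                  # de-duplicates notifications per cell

    def observe(self, column, fn):
        self.observers[column].append(fn)

    def set(self, row, column, value):
        self.cells[(row, column)] = value
        # Modifying an observed column queues at most one notification per cell.
        if column in self.observers and (row, column) not in self.queued:
            self.queued.add((row, column))
            self.pending.append((row, column))

    def run_observers(self):
        # Each queued notification is handed to its observers once.
        while self.pending:
            row, column = self.pending.popleft()
            self.queued.discard((row, column))
            for fn in self.observers[column]:
                fn(self, row, column)


calls = []


def index_document(table, row, column):
    calls.append(row)
    table.set(row, "index:status", "INDEXED")


table = ToyTable()
table.observe("index:check", index_document)
table.set("doc:b4bf617e", "index:check", None)
table.set("doc:b4bf617e", "index:check", None)   # same cell, notification already pending
table.run_observers()
```

Two writes to the same observed cell produce one observer run here, which mirrors the "one transaction per notification" guarantee in spirit only.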
Initialize bank
tx1.begin()
if(tx1.get('bob','balance') == null)
tx1.set('bob','balance',100)
if(tx1.get('joe','balance') == null)
tx1.set('joe','balance',100)
if(tx1.get('sue','balance') == null)
tx1.set('sue','balance',100)
tx1.commit()
What could possibly go
wrong?
Two threads transferring
Thread 2 on node B
tx3.begin()
b3 = tx3.get('joe','balance')
b4 = tx3.get('sue','balance')
tx3.set('joe','balance',b3 + 5)
tx3.set('sue','balance',b4 - 5)
tx3.commit()
Thread 1 on node A
tx2.begin()
b1 = tx2.get('joe','balance')
b2 = tx2.get('bob','balance')
tx2.set('joe','balance',b1 + 7)
tx2.set('bob','balance',b2 - 7)
tx2.commit()
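What goes wrong without transactions can be shown with one fixed interleaving of the two transfers. The sketch below uses a plain dictionary instead of a table and replays the reads and writes by hand; it is an illustration, not Accismus code.

```python
# Plain dictionary standing in for a table with no transactional protection.
balances = {"bob": 100, "joe": 100, "sue": 100}

# Both "threads" read before either writes (one possible interleaving).
b1 = balances["joe"]   # thread 1 reads joe = 100
b2 = balances["bob"]
b3 = balances["joe"]   # thread 2 also reads joe = 100
b4 = balances["sue"]

# Thread 1 moves 7 from bob to joe.
balances["joe"] = b1 + 7
balances["bob"] = b2 - 7

# Thread 2 moves 5 from sue to joe and overwrites thread 1's update to joe.
balances["joe"] = b3 + 5
balances["sue"] = b4 - 5

total = sum(balances.values())
print(balances, total)
```

The total drops from 300 to 293 because joe's +7 is lost. With cross row transactions the second committer would fail instead, since joe changed between its start and commit timestamps.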
Accismus stochastic bank test
● Bank account per row
● Initialize N bank accounts with 1000
● Run random transfer threads
● Complete scan always sums to N*1000
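The shape of such a test can be sketched as follows. A single lock stands in for transactional isolation (an assumption made to keep the sketch self-contained), and the account names, seeds, and transfer amounts are invented for illustration.

```python
import random
import threading

N = 10
accounts = {f"acct{i}": 1000 for i in range(N)}
lock = threading.Lock()   # stand-in for transactional isolation (assumption)


def transfer_worker(seed, transfers):
    rng = random.Random(seed)
    for _ in range(transfers):
        src, dst = rng.sample(sorted(accounts), 2)
        amount = rng.randint(1, 50)
        with lock:   # each transfer is atomic, like one committed transaction
            accounts[src] -= amount
            accounts[dst] += amount


def full_scan_sum():
    with lock:   # a scan sees a consistent snapshot
        return sum(accounts.values())


threads = [threading.Thread(target=transfer_worker, args=(s, 500)) for s in range(4)]
for t in threads:
    t.start()
mid_run_sum = full_scan_sum()   # scan while transfers are still running
for t in threads:
    t.join()
final_sum = full_scan_sum()
```

Any scan, during or after the run, should sum to N*1000; a different sum means a transfer was observed half applied or an update was lost.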
Phrasecount example
● Have documents + source URI
● Dedupe documents based on SHA1
● Count number of unique documents each
phrase occurs in
● Can do this with two map reduce jobs
● https://github.com/keith-turner/phrasecount
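As a batch baseline, the computation can be sketched in a few lines. The 4-word phrase window is an assumption inferred from the example phrases in the following slides; the function names are hypothetical.

```python
import hashlib
from collections import Counter


def phrases(text, n=4):
    """Return the set of n-word phrases in text (n=4 is an assumption)."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrase_doc_counts(docs):
    """docs: iterable of (uri, content). Counts unique documents per phrase."""
    unique = {}   # sha1 hex digest -> content
    for _uri, content in docs:
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        unique.setdefault(digest, content)   # same content from two URIs counts once
    counts = Counter()
    for content in unique.values():
        counts.update(phrases(content))      # a set, so each phrase counts once per document
    return counts


docs = [
    ("http://foo.com/a", "my dog is very nice"),
    ("http://foo.net/a", "my dog is very nice"),
    ("http://foo.com/c", "his dog is very nice"),
]
counts = phrase_doc_counts(docs)
```

The incremental version has to reach the same counts while documents are added and removed one transaction at a time.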
Accismus Application
● Map Reduce+Bulk Import
● Load Transactions
● Observers
● Export Transactions
Load transaction 1
document:b4bf617e
my dog is very nice
http://foo.com/a
Load transaction 2
document:b4bf617e
my dog is very nice
http://foo.com/a http://foo.net/a
Load transaction 3
document:1e111475
his dog is very nice
document:b4bf617e
my dog is very nice
http://foo.com/a http://foo.net/a http://foo.com/c
Observer transaction 1
document:1e111475
his dog is very nice
document:b4bf617e
my dog is very nice
http://foo.com/a http://foo.net/a http://foo.com/c
my dog is very : 1
dog is very nice : 1
Observer transaction 2
document:1e111475
his dog is very nice
document:b4bf617e
my dog is very nice
http://foo.com/a http://foo.net/a http://foo.com/c
my dog is very : 1 his dog is very : 1
dog is very nice : 2
Load transaction 4
document:1e111475
his dog is very nice
document:b4bf617e
my dog is very nice
http://foo.com/a http://foo.net/a http://foo.com/c
my dog is very : 1 his dog is very : 1
dog is very nice : 2
Observer transaction 3
document:b4bf617e
my dog is very nice
http://foo.com/a http://foo.net/a http://foo.com/c
my dog is very : 1
dog is very nice : 1
Phrasecount schema
Row Column Value
uri:<uri> doc:hash <hash>
doc:<hash> doc:content <document>
doc:<hash> doc:refCount <int>
doc:<hash> index:check null
doc:<hash> index:status INDEXED | null
phrase:<phrase> stat:docCount <int>
Querying phrase counts
● Query Accismus directly
– Lazy recovery may significantly delay query
– High load may delay queries
● Export transaction writes to Accumulo table
– WARNING : leaving the sane world of transactions
– Faults during export
– Concurrently exporting same item
– Out of order arrival of exported data
Export transaction strategy
● Only export committed data (Intent log)
– Don't export something a transaction is going to
commit
● Idempotent
– Export transaction can fail
– Expect repeated execution (possibly concurrent)
● Use committed sequence # to order data
– Thread could read export data, pause, then export old
data.
– Use seq # as timestamp in Accumulo export table
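Why the sequence number makes exports safe to repeat and reorder can be sketched with a toy export table. `ExportTable` is hypothetical; it models "newest timestamp wins on read" by keeping only the entry with the highest sequence number.

```python
class ExportTable:
    """Toy export target: per row, the value with the highest sequence number wins."""

    def __init__(self):
        self.rows = {}   # row -> (seq, value)

    def export(self, row, seq, value):
        current = self.rows.get(row)
        # Older or repeated exports never replace newer data.
        if current is None or seq > current[0]:
            self.rows[row] = (seq, value)

    def read(self, row):
        entry = self.rows.get(row)
        return None if entry is None else entry[1]


table = ExportTable()
row = "phrase:dog is very nice"
table.export(row, seq=2, value=2)
table.export(row, seq=1, value=1)   # late arrival of older data
table.export(row, seq=2, value=2)   # repeated execution of the same export
latest = table.read(row)
```

A paused thread that later exports old data, or an export that runs twice, leaves the visible value unchanged.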
Phrasecount export schema
Row Column Value
phrase:<phrase> export:check
phrase:<phrase> export:seq <int>
phrase:<phrase> export:sum <int>
phrase:<phrase> stat:sum <int>
Phrasecount problems
● No handling for high cardinality phrases
– Weak notifications mentioned in paper
– Multi-row tree another possibility
● Possible memory exhaustion
– Percolator uses many threads to get high
throughput
– Example loads entire document into memory. Many
threads X large documents == dead worker.
Weak notifications(Queue)
String pr = 'phrase:' + phrase;
int current = tx1.get(pr, 'stat:docCount');
if (isHighVolume(phrase)) {
  tx1.set(pr, 'stat:docCount' + rand, delta);
  tx1.weakNotify(pr); // trigger observer to collapse rand columns
} else {
  tx1.set(pr, 'stat:docCount', delta + current);
}
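The random-column idea and the collapsing observer can be sketched on a single row held as a dictionary. The shard count, the function names, and accumulating into a shard (rather than overwriting it) are assumptions for illustration.

```python
import random


def add_delta(row, delta, high_volume, rng, shards=8):
    """Apply delta to a phrase row (dict of column -> int).

    Returns True when the caller should weakly notify the collapsing observer.
    """
    if high_volume:
        # Spread writes over random columns so concurrent updates rarely collide.
        col = f"stat:docCount{rng.randrange(shards)}"
        row[col] = row.get(col, 0) + delta
        return True
    row["stat:docCount"] = row.get("stat:docCount", 0) + delta
    return False


def collapse(row):
    """Observer body: fold the random columns back into stat:docCount."""
    extra = [c for c in row if c.startswith("stat:docCount") and c != "stat:docCount"]
    row["stat:docCount"] = row.get("stat:docCount", 0) + sum(row.pop(c) for c in extra)


rng = random.Random(0)
row = {}
needs_collapse = [add_delta(row, 1, True, rng) for _ in range(100)]
collapse(row)
```

Many notifications can be served by one collapse, which is why a weak notification is enough here.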
Multi-row tree for high cardinality
phrase:<phrase>
  phrase_0:<phrase>
    phrase_00:<phrase>
    phrase_01:<phrase>
  phrase_1:<phrase>
    phrase_10:<phrase>
    phrase_11:<phrase>
● Incoming updates go to leaves
● Observers percolate to root
● Export from root
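A sketch of percolating counts from leaves to the root, assuming a two-level binary tree as drawn on the slide. Rows are keyed by (prefix, phrase); `parent` and `percolate` are hypothetical helpers, and each move would be its own observer transaction in practice.

```python
def parent(prefix):
    """phrase_01 -> phrase_0, phrase_0 -> phrase (the root)."""
    head, _, bits = prefix.partition("_")
    return head if len(bits) <= 1 else f"{head}_{bits[:-1]}"


def percolate(rows, phrase, depth=2):
    """Move counts one level at a time from the deepest rows toward the root."""
    for level in range(depth, 0, -1):
        for prefix, ph in list(rows):   # snapshot, since rows is modified below
            bits = prefix.partition("_")[2]
            if ph == phrase and len(bits) == level:
                count = rows.pop((prefix, ph))
                key = (parent(prefix), ph)
                rows[key] = rows.get(key, 0) + count


phrase = "dog is very nice"
rows = {
    ("phrase_00", phrase): 3,
    ("phrase_01", phrase): 1,
    ("phrase_10", phrase): 2,
    ("phrase_11", phrase): 4,
}
percolate(rows, phrase)
```

Writers only contend on one of four leaves, and the root still ends up with the full count for export.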
Timestamp Oracle
● Lightweight centralized service that issues timestamps
– Allocates batches of timestamps from zookeeper
– Give batches of timestamps to nodes executing
transactions
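Batch allocation can be sketched like this. `CounterStore` is a hypothetical in-memory stand-in for the durable counter kept in ZooKeeper, and the batch size is arbitrary.

```python
import threading


class CounterStore:
    """Stand-in for the durable counter (hypothetical in-memory version)."""

    def __init__(self):
        self.next_unreserved = 0
        self._lock = threading.Lock()

    def reserve(self, size):
        # Atomically hand out the range [start, start + size).
        with self._lock:
            start = self.next_unreserved
            self.next_unreserved += size
            return start


class TimestampOracle:
    """Serves timestamps from memory and touches the store once per batch."""

    def __init__(self, store, batch_size=100):
        self._store = store
        self._batch_size = batch_size
        self._lock = threading.Lock()
        self._current = 0
        self._limit = 0   # exclusive end of the reserved batch

    def next_timestamp(self):
        with self._lock:
            if self._current == self._limit:
                self._current = self._store.reserve(self._batch_size)
                self._limit = self._current + self._batch_size
            ts = self._current
            self._current += 1
            return ts


store = CounterStore()
oracle = TimestampOracle(store, batch_size=3)
stamps = [oracle.next_timestamp() for _ in range(7)]
```

If the oracle dies, a replacement reserves a fresh batch, so timestamps may skip values but never repeat or go backwards.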
Timestamp oracle
● Gives logical global ordering to events
– Transactions get timestamp at start. Only read data
committed before.
– Transactions get a timestamp when committing.
Percolator Implementation
● Two phase commit using conditional mutations
– Write lock+data to primary row/column
– Write lock+data to all other row/columns
– commit primary row/column if still locked
– commit all other row/columns
● Lock fails if change between start and commit
timestamp
● All row/columns in transaction point to primary
● In case of failure, primary is authority
● No centralized locking
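The commit protocol can be sketched in a single process. `ToyStore` is hypothetical: dictionaries replace the table, a counter replaces the timestamp oracle, and ordinary method calls replace conditional mutations, so this shows the bookkeeping only, not the distributed behaviour.

```python
class Conflict(Exception):
    """Raised when a lock cannot be taken or has been lost."""


class ToyStore:
    def __init__(self):
        self.data = {}    # cell -> {start_ts: value}
        self.lock = {}    # cell -> (start_ts, primary_cell)
        self.write = {}   # cell -> {commit_ts: start_ts}
        self.clock = 0    # stand-in for the timestamp oracle

    def next_ts(self):
        self.clock += 1
        return self.clock

    def read(self, cell, start_ts):
        if cell in self.lock and self.lock[cell][0] <= start_ts:
            raise Conflict(f"{cell} is locked")
        commits = [c for c in self.write.get(cell, {}) if c <= start_ts]
        if not commits:
            return None
        return self.data[cell][self.write[cell][max(commits)]]

    def prewrite(self, cell, value, start_ts, primary):
        # Fails if another transaction holds a lock or committed after we started.
        changed = any(c >= start_ts for c in self.write.get(cell, {}))
        if cell in self.lock or changed:
            raise Conflict(f"cannot lock {cell}")
        self.lock[cell] = (start_ts, primary)
        self.data.setdefault(cell, {})[start_ts] = value

    def rollback(self, cell, start_ts):
        if self.lock.get(cell, (None,))[0] == start_ts:
            del self.lock[cell]
            del self.data[cell][start_ts]

    def commit_cell(self, cell, start_ts, commit_ts):
        if self.lock.get(cell, (None,))[0] != start_ts:
            raise Conflict(f"lost lock on {cell}")
        del self.lock[cell]
        self.write.setdefault(cell, {})[commit_ts] = start_ts


def transfer(store, src, dst, amount):
    start_ts = store.next_ts()
    updates = {src: store.read(src, start_ts) - amount,
               dst: store.read(dst, start_ts) + amount}
    primary = src
    locked = []
    try:
        for cell in updates:                 # phase 1: lock+data, primary first
            store.prewrite(cell, updates[cell], start_ts, primary)
            locked.append(cell)
    except Conflict:
        for cell in locked:
            store.rollback(cell, start_ts)
        raise
    commit_ts = store.next_ts()              # taken after all locks are written
    for cell in updates:                     # phase 2: commit, primary first
        store.commit_cell(cell, start_ts, commit_ts)
    return start_ts, commit_ts


BOB, JOE = ("bob", "balance"), ("joe", "balance")
store = ToyStore()
for cell in (BOB, JOE):
    store.data[cell] = {0: 100}
    store.write[cell] = {1: 0}
store.clock = 2
start_ts, commit_ts = transfer(store, BOB, JOE, 7)

try:
    store.prewrite(JOE, 0, 2, JOE)   # a transaction that started at ts 2 is now too old
    stale_write_rejected = False
except Conflict:
    stale_write_rejected = True
```

A reader at the transfer's start timestamp still sees 100 for bob, while a later reader sees 93 and 107.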
Handling failures
● Transaction dies in phase 1
– Written some locks+data
– Must rollback
● Transaction dies in phase 2
– All locks+data written
– Roll-forward and write data pointers
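How a later transaction might resolve an abandoned lock can be sketched by consulting the primary. `recover` is a hypothetical helper; it assumes the owning transaction is known to be dead and omits deleting the rolled-back data.

```python
def recover(locks, writes, cell):
    """Resolve an abandoned lock on `cell` using its primary as the authority.

    locks:  cell -> (start_ts, primary_cell)
    writes: cell -> {commit_ts: start_ts}
    """
    start_ts, primary = locks[cell]
    commit_ts = next(
        (c for c, s in writes.get(primary, {}).items() if s == start_ts), None)
    if commit_ts is not None:
        # Primary committed: roll forward by writing this cell's data pointer.
        writes.setdefault(cell, {})[commit_ts] = start_ts
        del locks[cell]
        return "roll-forward"
    # Primary never committed: remove this transaction's locks.
    for c in {cell, primary}:
        if locks.get(c, (None,))[0] == start_ts:
            del locks[c]
    return "rollback"


BOB, JOE = "bob:balance", "joe:balance"

# Died after committing the primary: joe is still locked.
locks = {JOE: (3, BOB)}
writes = {BOB: {1: 0, 6: 3}, JOE: {1: 0}}
phase2 = recover(locks, writes, JOE)

# Died in phase 1: primary is locked but was never committed.
locks1 = {BOB: (3, BOB), JOE: (3, BOB)}
writes1 = {BOB: {1: 0}, JOE: {1: 0}}
phase1 = recover(locks1, writes1, JOE)
```

Because every lock points at the primary, any reader that trips over a lock can make this decision without a central lock service.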
Transfer transaction
Row  Column   Percolator Type  Time  Value
bob  balance  write            1     0
bob  balance  data             0     100
joe  balance  write            1     0
joe  balance  data             0     100
Percolator appends column type to qualifier. Accismus uses high 4 bits of timestamp.
Lock primary
Row  Column   Percolator Type  Time  Value
bob  balance  write            1     0
bob  balance  lock             3     bob:balance
bob  balance  data             3     93
bob  balance  data             0     100
joe  balance  write            1     0
joe  balance  data             0     100
Lock other
Row  Column   Percolator Type  Time  Value
bob  balance  write            1     0
bob  balance  lock             3     bob:balance
bob  balance  data             3     93
bob  balance  data             0     100
joe  balance  write            1     0
joe  balance  lock             3     bob:balance
joe  balance  data             3     107
joe  balance  data             0     100
Commit primary
Row  Column   Percolator Type  Time  Value
bob  balance  write            6     3
bob  balance  write            1     0
bob  balance  data             3     93
bob  balance  data             0     100
joe  balance  write            1     0
joe  balance  lock             3     bob:balance
joe  balance  data             3     107
joe  balance  data             0     100
What happens if tx with start time 7 reads joe and bob?
Commit timestamp obtained after all locks written, why?
Commit other
Row  Column   Percolator Type  Time  Value
bob  balance  write            6     3
bob  balance  write            1     0
bob  balance  data             3     93
bob  balance  data             0     100
joe  balance  write            6     3
joe  balance  write            1     0
joe  balance  data             3     107
joe  balance  data             0     100
Garbage collection
● Not mentioned in paper
● Use compaction iterator
● Currently keep X versions. Could determine
oldest active scan start timestamp.
● Must keep data about success/failure of
primary column
– Added extra column type to indicate when primary
can be collected. Never collected in failure case.
After GC Iterator
Row  Column   Percolator Type  Time  Value
bob  balance  write            6     3:TRUNC
bob  balance  write            1     0
bob  balance  data             3     93
bob  balance  data             0     100
joe  balance  write            6     3:TRUNC
joe  balance  write            1     0
joe  balance  data             3     107
joe  balance  data             0     100
Transaction with read time of 5 would see StaleScanException
Snapshot iterator
● Used to read data
● Analyzes percolator metadata on tserver
● Returns committed data <= start OR open locks
● Detects scan past point of GC
– Client code throws StaleScanException
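The read rule can be sketched on one column's metadata. The dictionary layout and `snapshot_read` are hypothetical; the assumption is that a truncated write newer than the read timestamp means older versions may already be gone.

```python
class StaleScanException(Exception):
    """Raised when a read needs versions that garbage collection may have removed."""


def snapshot_read(column, start_ts):
    """column: {"lock": (ts, primary) or absent,
                "write": [(commit_ts, data_ts, truncated), ...],
                "data": {data_ts: value}}"""
    lock = column.get("lock")
    if lock is not None and lock[0] <= start_ts:
        return ("lock", lock)   # caller must resolve the lock before reading
    for commit_ts, data_ts, truncated in sorted(column["write"], reverse=True):
        if commit_ts <= start_ts:
            return ("data", column["data"][data_ts])
        if truncated:
            raise StaleScanException(
                f"read at {start_ts} is older than GC point {commit_ts}")
    return ("data", None)


# bob after the GC iterator ran; joe while still locked by the transfer.
bob = {"write": [(6, 3, True), (1, 0, False)], "data": {3: 93, 0: 100}}
joe_locked = {"lock": (3, "bob:balance"), "write": [(1, 0, False)],
              "data": {3: 107, 0: 100}}

fresh = snapshot_read(bob, 7)
blocked = snapshot_read(joe_locked, 7)
try:
    snapshot_read(bob, 5)
    stale = False
except StaleScanException:
    stale = True
```

A read at time 7 gets 93 for bob and the open lock for joe, while a read at time 5 is rejected as stale, matching the tables above.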
Accismus API
● Minimal byte buffer based API
– Currently byte sequence, plan to move to byte buffer.
(could be your first patch :)
– remove all external dependencies, like Accumulo Range
● Wrap minimal API w/ convenience API that handles
nulls, encoding, and types well.
// automatically encode strings and ints into bytes using supplied encoder
tx.mutate().row("doc:"+hash).fam("doc").qual("refCount").set(5);
// no need to check if value is null and then parse as int
int rc = tx.get().row("doc:"+hash).fam("doc").qual("refCount").toInteger(0);
TODO
● test at scale
● create a cluster test suite
● weak notifications
● use YARN to run
● improve batching of reads and writes
● Initialization via M/R. Accismus file output format
● column read ahead based on past read patterns
● Improve GC
● Improve finding notifications
Collaborate
● https://github.com/keith-turner/Accismus
● Interested in building an Accismus application?
● Hope to have a feature complete Alpha within a
few months that can be stabilized

g2nightmarescribd
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 

Accumulo Summit 2014: Accismus -- Percolating with Accumulo

  • 2. Accismus A form of irony where one pretends indifference and refuses something while actually wanting it.
  • 3. Google's Problem ● Use M/R to process ~10^15 bytes ● ~10^12 bytes new data arrive ● Use M/R to process 10^15 + 10^12 bytes ● High latency before new data available for query
  • 4. Solution ● Percolator : incremental processing for big data – Layer on top of BigTable – Offers fault tolerant, cross row transactions ● Lazy recovery – Offers snapshot isolation ● Only read committed data – Uses BigTable data model, except timestamp ● Accismus adds visibility – Has own API
  • 5. Observers ● User defined function that executes a transaction ● Triggered when a user defined column is modified (called notification in paper) ● Guarantee only one transaction will execute per notification
  • 6. Initialize bank tx1.begin() if(tx1.get('bob','balance') == null) tx1.set('bob','balance',100) if(tx1.get('joe','balance') == null) tx1.set('joe','balance',100) if(tx1.get('sue','balance') == null) tx1.set('sue','balance',100) tx1.commit() What could possibly go wrong?
  • 7. Two threads transferring
      Thread 1 on node A
        tx2.begin()
        b1 = tx2.get('joe','balance')
        b2 = tx2.get('bob','balance')
        tx2.set('joe','balance',b1 + 7)
        tx2.set('bob','balance',b2 - 7)
        tx2.commit()
      Thread 2 on node B
        tx3.begin()
        b3 = tx3.get('joe','balance')
        b4 = tx3.get('sue','balance')
        tx3.set('joe','balance',b3 + 5)
        tx3.set('sue','balance',b4 - 5)
        tx3.commit()
  • 8. Accismus stochastic bank test ● Bank account per row ● Initialize N bank accounts with 1000 ● Run random transfer threads ● Complete scan always sums to N*1000
  • 9. Phrasecount example ● Have documents + source URI ● Dedupe documents based on SHA1 ● Count number of unique documents each phrase occurs in ● Can do this with two map reduce jobs ● https://github.com/keith-turner/phrasecount
  • 10. Accismus Application ● Map Reduce+Bulk Import ● Load Transactions ● Observers ● Export Transactions
  • 11. Load transaction 1 document:b4bf617e my dog is very nice http://foo.com/a
  • 12. Load transaction 2 document:b4bf617e my dog is very nice http://foo.com/a http://foo.net/a
  • 13. Load transaction 3 document:1e111475 his dog is very nice document:b4bf617e my dog is very nice http://foo.com/a http://foo.net/a http://foo.com/c
  • 14. Observer transaction 1 document:1e111475 his dog is very nice document:b4bf617e my dog is very nice http://foo.com/a http://foo.net/a http://foo.com/c my dog is very : 1 dog is very nice : 1
  • 15. Observer transaction 2 document:1e111475 his dog is very nice document:b4bf617e my dog is very nice http://foo.com/a http://foo.net/a http://foo.com/c my dog is very : 1 his dog is very : 1 dog is very nice : 2
  • 16. Load transaction 4 document:1e111475 his dog is very nice document:b4bf617e my dog is very nice http://foo.com/a http://foo.net/a http://foo.com/c my dog is very : 1 his dog is very : 1 dog is very nice : 2
  • 17. Observer transaction 3 document:b4bf617e my dog is very nice http://foo.com/a http://foo.net/a http://foo.com/c my dog is very : 1 dog is very nice : 1
  • 18. Phrasecount schema
      Row              Column         Value
      uri:<uri>        doc:hash       <hash>
      doc:<hash>       doc:content    <document>
      doc:<hash>       doc:refCount   <int>
      doc:<hash>       index:check    null
      doc:<hash>       index:status   INDEXED | null
      phrase:<phrase>  stat:docCount  <int>
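The load and observer steps in slides 11 to 18 can be sketched in plain Java. This is only an in-memory illustration, not the phrasecount code: the class `PhraseCountSketch`, its maps, and its method names are hypothetical, the four-word phrase window is inferred from the example counts on the slides, and the load and observer transactions are collapsed into ordinary method calls with no transactional guarantees.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the phrasecount schema; not the real application. */
final class PhraseCountSketch {
    final Map<String, String> uriToHash = new HashMap<>();   // uri:<uri>       doc:hash
    final Map<String, String> content = new HashMap<>();     // doc:<hash>      doc:content
    final Map<String, Integer> refCount = new HashMap<>();   // doc:<hash>      doc:refCount
    final Map<String, Integer> docCount = new HashMap<>();   // phrase:<phrase> stat:docCount

    /** Hex-encoded SHA-1 of the document text, used to dedupe documents. */
    static String sha1(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Unique four-word phrases in a document (window size is an assumption). */
    static Set<String> phrases(String text) {
        String[] w = text.trim().split("\\s+");
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 4 <= w.length; i++)
            out.add(String.join(" ", w[i], w[i + 1], w[i + 2], w[i + 3]));
        return out;
    }

    /** "Load transaction": point a URI at a document and maintain reference counts. */
    void load(String uri, String text) {
        String hash = sha1(text);
        String old = uriToHash.put(uri, hash);
        if (hash.equals(old)) return;          // same URI, same content: nothing to do
        if (old != null) addRef(old, -1);      // URI moved away from its old document
        content.put(hash, text);
        addRef(hash, +1);
    }

    private void addRef(String hash, int delta) {
        int before = refCount.getOrDefault(hash, 0);
        int after = before + delta;
        refCount.put(hash, after);
        // "Observer transaction": index on first reference, unindex on last.
        if (before == 0 && after > 0) index(hash, +1);
        if (before > 0 && after == 0) index(hash, -1);
    }

    private void index(String hash, int delta) {
        for (String p : phrases(content.get(hash))) docCount.merge(p, delta, Integer::sum);
    }
}
```

Loading the same text under two URIs leaves each phrase count at 1, because the second load only raises the reference count of an already indexed document.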
  • 19. Querying phrase counts ● Query Accismus directly – Lazy recovery may significantly delay query – High load may delay queries ● Export transactions write to an Accumulo table – WARNING : leaving the sane world of transactions – Faults during export – Concurrently exporting same item – Out of order arrival of exported data
  • 20. Export transaction strategy ● Only export committed data (Intent log) – Don't export something a transaction is going to commit ● Idempotent – Export transaction can fail – Expect repeated execution (possibly concurrent) ● Use committed sequence # to order data – Thread could read export data, pause, then export old data. – Use seq # as timestamp in Accumulo export table
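The idempotence and ordering points on slide 20 can be illustrated with a small sketch. It assumes a hypothetical `ExportTable` class standing in for the external Accumulo table; where the slide uses the sequence number as the Accumulo timestamp, this model simply keeps the entry with the highest sequence number per row.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of an export target; names are illustrative only. */
final class ExportSketch {

    /** A value tagged with the committed sequence number it was exported under. */
    static final class Versioned {
        final long seq;
        final int value;

        Versioned(long seq, int value) {
            this.seq = seq;
            this.value = value;
        }
    }

    /** Stand-in for the query table that export transactions write to. */
    static final class ExportTable {
        private final Map<String, Versioned> rows = new HashMap<>();

        /**
         * Idempotent export: replaying the same (seq, value) or delivering an
         * older seq late never replaces a newer value.
         */
        synchronized void export(String row, long seq, int value) {
            Versioned current = rows.get(row);
            if (current == null || seq > current.seq) {
                rows.put(row, new Versioned(seq, value));
            }
        }

        /** Latest exported value for the row, or null if nothing was exported. */
        synchronized Integer read(String row) {
            Versioned current = rows.get(row);
            return current == null ? null : current.value;
        }
    }
}
```

With this rule a thread that reads export data, pauses, and then writes stale data cannot overwrite a newer count, which is the failure the slide describes.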
  • 21. Phrasecount export schema
      Row              Column        Value
      phrase:<phrase>  export:check
      phrase:<phrase>  export:seq    <int>
      phrase:<phrase>  export:sum    <int>
      phrase:<phrase>  stat:sum      <int>
  • 22. Phrasecount problems ● No handling for high cardinality phrases – Weak notifications mentioned in paper – Multi-row tree another possibility ● Possible memory exhaustion – Percolator uses many threads to get high throughput – Example loads entire document into memory. Many threads X large documents == dead worker.
  • 23. Weak notifications (Queue)
      String pr = "phrase:" + phrase;
      int current = tx1.get(pr, "stat:docCount");
      if (isHighVolume(phrase)) {
        tx1.set(pr, "stat:docCount" + rand, delta);
        tx1.weakNotify(pr); // trigger observer to collapse rand columns
      } else {
        tx1.set(pr, "stat:docCount", delta + current);
      }
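The idea on slide 23, spreading updates for a hot phrase over randomly chosen columns and letting an observer fold them back together, can be sketched without any Accismus API. The class `ShardedCounterSketch` and its methods are hypothetical, a single row is modeled as a map of column names to integers, and the sketch adds to a shard column rather than overwriting it so that two deltas landing on the same shard are both kept.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Random;

/** Hypothetical single-row model of a counter split across random columns. */
final class ShardedCounterSketch {
    static final String BASE = "stat:docCount";

    private final Map<String, Integer> columns = new HashMap<>();
    private final Random random;
    private final int shards;

    ShardedCounterSketch(int shards, long seed) {
        this.shards = shards;
        this.random = new Random(seed);
    }

    /** Writer path for a high-volume phrase: add the delta to one random shard column. */
    void add(int delta) {
        String column = BASE + random.nextInt(shards);
        columns.merge(column, delta, Integer::sum);
    }

    /** Observer path: fold every shard column into the base column and delete the shards. */
    void collapse() {
        int total = columns.getOrDefault(BASE, 0);
        Iterator<Map.Entry<String, Integer>> it = columns.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            if (!e.getKey().equals(BASE)) {
                total += e.getValue();
                it.remove();
            }
        }
        columns.put(BASE, total);
    }

    /** Value of the base column; shard columns are not included until collapse() runs. */
    int docCount() {
        return columns.getOrDefault(BASE, 0);
    }

    int columnCount() {
        return columns.size();
    }
}
```

Writers touching different shard columns do not collide on a single column, at the cost of the base count lagging until the collapse runs.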
  • 24. Multi-row tree for high cardinality
      Root:    phrase:<phrase>
      Level 1: phrase_0:<phrase>  phrase_1:<phrase>
      Leaves:  phrase_00:<phrase>  phrase_01:<phrase>  phrase_10:<phrase>  phrase_11:<phrase>
      ● Incoming updates go to leaves ● Observers percolate to root ● Export from root
  • 25. Timestamp Oracle ● Lightweight centralized service that issues timestamps – Allocates batches of timestamps from ZooKeeper – Gives batches of timestamps to nodes executing transactions
  • 26. Timestamp oracle ● Gives logical global ordering to events – Transactions get a timestamp at start. Only read data committed before. – Transactions get a timestamp when committing.
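Batch allocation as described on slides 25 and 26 can be sketched as follows. This is a minimal model under stated assumptions: `DurableCounter` is a hypothetical stand-in for the counter the slides keep in ZooKeeper, `Oracle` is a hypothetical name, and no real ZooKeeper client calls are used.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of batched timestamp allocation; not the Accismus oracle. */
final class TimestampOracleSketch {

    /** Stands in for the durable counter that the slides store in ZooKeeper. */
    static final class DurableCounter {
        private final AtomicLong next = new AtomicLong(1);
        int reservations = 0; // number of round trips made to the durable store

        /** Reserves the range [first, first + size) and returns first. */
        synchronized long reserve(int size) {
            reservations++;
            return next.getAndAdd(size);
        }
    }

    /** Hands out strictly increasing timestamps, refilling from the counter in batches. */
    static final class Oracle {
        private final DurableCounter counter;
        private final int batchSize;
        private long current = 0; // next timestamp to hand out
        private long limit = 0;   // exclusive end of the reserved batch

        Oracle(DurableCounter counter, int batchSize) {
            this.counter = counter;
            this.batchSize = batchSize;
        }

        synchronized long next() {
            if (current == limit) { // batch exhausted (or never filled)
                current = counter.reserve(batchSize);
                limit = current + batchSize;
            }
            return current++;
        }
    }
}
```

Reserving a range before handing out any timestamp from it means a restarted oracle can skip unused timestamps but never reissue one, and the durable store is touched once per batch rather than once per transaction.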
  • 27. Percolator Implementation ● Two phase commit using conditional mutations – Write lock+data to primary row/column – Write lock+data to all other row/columns – commit primary row/column if still locked – commit all other row/columns ● Lock fails if change between start and commit timestamp ● All row/columns in transaction point to primary ● In case of failure, primary is authority ● No centralized locking
  • 28. Handling failures ● Transaction dies in phase 1 – Written some locks+data – Must rollback ● Transaction dies in phase 2 – All locks+data written – Roll-forward and write data pointers
  • 29. Transfer transaction
      Row  Column   Percolator Type  Time  Value
      bob  balance  write            1     0
      bob  balance  data             0     100
      joe  balance  write            1     0
      joe  balance  data             0     100
      Percolator appends column type to qualifier. Accismus uses high 4 bits of timestamp.
  • 30. Lock primary
      Row  Column   Percolator Type  Time  Value
      bob  balance  write            1     0
      bob  balance  lock             3     bob:balance
      bob  balance  data             3     93
      bob  balance  data             0     100
      joe  balance  write            1     0
      joe  balance  data             0     100
  • 31. Lock other
      Row  Column   Percolator Type  Time  Value
      bob  balance  write            1     0
      bob  balance  lock             3     bob:balance
      bob  balance  data             3     93
      bob  balance  data             0     100
      joe  balance  write            1     0
      joe  balance  lock             3     bob:balance
      joe  balance  data             3     107
      joe  balance  data             0     100
  • 32. Commit primary
      Row  Column   Percolator Type  Time  Value
      bob  balance  write            6     3
      bob  balance  write            1     0
      bob  balance  data             3     93
      bob  balance  data             0     100
      joe  balance  write            1     0
      joe  balance  lock             3     bob:balance
      joe  balance  data             3     107
      joe  balance  data             0     100
      What happens if tx with start time 7 reads joe and bob? Commit timestamp obtained after all locks written, why?
  • 33. Commit other
      Row  Column   Percolator Type  Time  Value
      bob  balance  write            6     3
      bob  balance  write            1     0
      bob  balance  data             3     93
      bob  balance  data             0     100
      joe  balance  write            6     3
      joe  balance  write            1     0
      joe  balance  data             3     107
      joe  balance  data             0     100
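The lock, data, and write columns walked through in slides 27 to 33 can be modeled in a few dozen lines. The sketch below is a hypothetical single-process model, not the Accismus implementation: `PercolatorSketch`, `Cell`, and the `"row:column"` string keys are invented for illustration, an in-memory counter replaces the timestamp oracle, and the whole commit runs inside one synchronized method instead of per-row conditional mutations. It therefore shows conflict detection between overlapping transactions but does not model a crash between the two phases.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of Percolator's lock/data/write columns. */
final class PercolatorSketch {

    /** One row/column together with its three Percolator column types. */
    static final class Cell {
        final TreeMap<Long, Integer> data = new TreeMap<>(); // start ts -> value
        final TreeMap<Long, Long> write = new TreeMap<>();   // commit ts -> start ts of the data
        Long lockTs;         // start ts of the transaction holding the lock, or null
        String lockPrimary;  // primary row/column the lock points at
    }

    private final Map<String, Cell> table = new HashMap<>();
    private long clock = 0; // stands in for the timestamp oracle

    synchronized long nextTimestamp() {
        return ++clock;
    }

    private Cell cell(String key) {
        return table.computeIfAbsent(key, k -> new Cell());
    }

    /** Snapshot read: newest value whose commit ts is <= startTs, or null if none. */
    synchronized Integer get(String key, long startTs) {
        Cell c = cell(key);
        if (c.lockTs != null && c.lockTs <= startTs) {
            throw new IllegalStateException("locked; a real reader would consult the primary");
        }
        Map.Entry<Long, Long> w = c.write.floorEntry(startTs);
        return w == null ? null : c.data.get(w.getValue());
    }

    /** Two-phase commit of buffered updates; the first key is the primary. */
    synchronized boolean commit(long startTs, LinkedHashMap<String, Integer> updates) {
        String primary = updates.keySet().iterator().next();
        List<Cell> locked = new ArrayList<>();
        // Phase 1: write lock + data, primary first.
        for (Map.Entry<String, Integer> u : updates.entrySet()) {
            Cell c = cell(u.getKey());
            boolean changedSinceStart = c.write.higherKey(startTs) != null;
            if (c.lockTs != null || changedSinceStart) {
                for (Cell l : locked) { // roll back what this transaction wrote
                    l.lockTs = null;
                    l.lockPrimary = null;
                    l.data.remove(startTs);
                }
                return false;
            }
            c.lockTs = startTs;
            c.lockPrimary = primary;
            c.data.put(startTs, u.getValue());
            locked.add(c);
        }
        // The commit timestamp is obtained only after every lock is written.
        long commitTs = nextTimestamp();
        // Phase 2: replace each lock with a write pointer, primary first.
        for (Cell c : locked) {
            c.write.put(commitTs, startTs);
            c.lockTs = null;
            c.lockPrimary = null;
        }
        return true;
    }
}
```

Replaying the two transfer threads from slide 7 against this model, both transactions read joe's balance at their own start timestamps, the first commit succeeds, and the second fails because joe's write column gained an entry newer than its start timestamp. The failed transfer would be retried with a fresh start timestamp, so the balances keep summing to the initial total.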
  • 34. Garbage collection ● Not mentioned in paper ● Use compaction iterator ● Currently keep X versions. Could determine oldest active scan start timestamp. ● Must keep data about success/failure of primary column – Added extra column type to indicate when primary can be collected. Never collected in failure case.
  • 35. After GC Iterator
      Row  Column   Percolator Type  Time  Value
      bob  balance  write            6     3:TRUNC
      bob  balance  write            1     0
      bob  balance  data             3     93
      bob  balance  data             0     100
      joe  balance  write            6     3:TRUNC
      joe  balance  write            1     0
      joe  balance  data             3     107
      joe  balance  data             0     100
      Transaction with read time of 5 would see StaleScanException
  • 36. Snapshot iterator ● Used to read data ● Analyzes percolator metadata on tserver ● Returns committed data <= start OR open locks ● Detects scan past point of GC – Client code throws StaleScanException
  • 37. Accismus API ● Minimal byte buffer based API – Currently byte sequence, plan to move to byte buffer. (could be your first patch :) – remove all external dependencies, like Accumulo Range ● Wrap minimal API w/ convenience API that handles nulls, encoding, and types well.
      //automatically encode strings and int into bytes using supplied encoder
      tx.mutate().row("doc:"+hash).fam("doc").qual("refCount").set(5);
      //no need to check if value is null and then parse as int
      int rc = tx.get().row("doc:"+hash).fam("doc").qual("refCount").toInteger(0);
  • 38. TODO ● test at scale ● create a cluster test suite ● weak notifications ● use YARN to run ● improve batching of reads and writes ● Initialization via M/R. Accismus file output format ● column read ahead based on past read patterns ● Improve GC ● Improve finding notifications
  • 39. Collaborate ● https://github.com/keith-turner/Accismus ● Interested in building an Accismus application? ● Hope to have a feature complete Alpha within a few months that can be stabilized