• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Adam Fuchs' Accumulo Talk at NoSQL Now! 2013
 

Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

on

  • 1,054 views

Adam Fuch provides an overview of Accumulo and Sqrrl Enterprise at the 2013 NoSQL Now! conference

Adam Fuch provides an overview of Accumulo and Sqrrl Enterprise at the 2013 NoSQL Now! conference

Statistics

Views

Total Views
1,054
Views on SlideShare
803
Embed Views
251

Actions

Likes
0
Downloads
55
Comments
0

2 Embeds 251

http://www.sqrrl.com 227
http://localhost 24

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Adam Fuchs' Accumulo Talk at NoSQL Now! 2013 Adam Fuchs' Accumulo Talk at NoSQL Now! 2013 Presentation Transcript

    • Securely explore your data SQRRL ENTERPRISE + APACHE ACCUMULO: A secure, scalable, real-time analysis framework Adam Fuchs, CTO Sqrrl Data, Inc. August 21, 2013
    • OUTLINE Two Halves of “Real-Time” Accumulo and Sqrrl Technology Data-Centric Security Table Designs Performance Benchmarks © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential
    • TWO HALVES OF REAL-TIME Data-Driven Real-Time  reduce event to reaction time © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Query-Driven Real-Time  reduce ingest to query latency
    • Data-Driven + Query-Driven Real-Time Ecosystem Actions 3 SPE 4 Data 1 Dashboards 2 5 NoSQL+ 6 1. 2. 3. 4. 5. 6. Interactive Analysis Tools (Discovery + Forensics) SPE queries NoSQL to enrich streaming data SPE persists results in NoSQL for future query SPE takes action automatically SPE issues data-driven alerts Sqrrl provides context for dashboards Analysis tools query use Sqrrl to search and manipulate historical data © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential
    • This talk focuses on the database. Actions 3 SPE 4 Data 1 Dashboards 2 5 NoSQL+ 6 Interactive Analysis Tools (Discovery + Forensics) 1. 2. 3. 4. 5. 6. SPE queries NoSQL to enrich streaming data SPE persists results in NoSQL for future query SPE takes action automatically SPE issues data-driven alerts Sqrrl provides context for dashboards Analysis tools query use Sqrrl to search and manipulate historical data © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 5
    • OUTLINE Two Halves of “Real-Time” Accumulo and Sqrrl Technology Data-Centric Security Table Designs Performance Benchmarks © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential
    • ACCUMULO DATA FORMAT An Accumulo key is a 5-tuple, consisting of: - Row: Controls Atomicity - Column Family: Controls Locality - Column Qualifier: Controls Uniqueness - Visibility Label: Controls Access - Timestamp: Controls Versioning Accumulo Key/Value Example © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 7
    • ACCUMULO TABLETS Well-Known Location (zookeeper) Collections of KV pairs form Tables Tables are partitioned into Tablets Metadata tablets hold info about other tablets, forming a 3-level hierarchy A Tablet is a unit of work for a Tablet Server Root Tablet -∞ to ∞ Metadata Tablet 1 Metadata Tablet 2 -∞ to “Encyclopedia:Ocelot” “Encyclopedia:Ocelot” to ∞ Table: Adam’s Table Data Tablet -∞ : thing Data Tablet thing : ∞ Table: Encyclopedia Data Tablet -∞ : Ocelot © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Data Tablet Ocelot : Yak Data Tablet Yak : ∞ Table: Foo Data Tablet -∞ to ∞ 8
    • ACCUMULO PROCESSES Zookeeper Tablet Server Zookeeper Zookeeper Tablet Read/Write Delegate Authority Assign/Balance Tablet Server Master Application Application Tablet Store/Replicate Tablet Server HDFS © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Application Tablet 9
    • TABLET DATA FLOW Tablet Writes In-Memory Map Iterator Tree Iterator Tree Reads Minor Compaction Sorted, Indexed File Write Ahead Log (For Recovery) Scan Merging / Major Compaction © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Sorted, Indexed File Iterator Tree Sorted, Indexed File 10
    • WORD COUNT: Summing Aggregating Iterator Input Corpus © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 11
    • ITERATOR FRAMEWORK Iterator Operations: - File Reads - Block Caching - Merging - Deletion - Isolation - Locality Groups - Range Selection - Column Selection - Cell-level Security - Versioning - Filtering - Aggregation - Partitioned Joins © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 12
    • ACCUMULO LATENCIES ~ms ~ms Ingesters Tablet Servers InInInMemory Memory Memory Map Map Map Batch Writer ms - min Input ~ms Scan Scan Scan Iterators Iterators Iterators Queriers Scanner/ Batch Scanner Output Compactio Compactio n Compaction n Iterators Iterators Iterators RFile RFile RFiles © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 13
    • ACCUMULO THROUGHPUT Scan: up to 1M entries/s per node Ingest: up to 500K entries/s per node ~ms ~ms Ingesters Tablet Servers InInInMemory Memory Memory Map Map Map Batch Writer ms - min Input ~ms Scan Scan Scan Iterators Iterators Iterators Queriers Scanner /Batch Scanner Output Compacti Compacti on Compaction on Iterators Iterators Iterators RFile RFile RFiles Read-Modify-Write Latency: ~ms  >1K entries/s challenging with R-M-W © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 14
    • SQRRL ENTERPRISE Bulk Processing Integration Exploratory / Operational Apps Built on Apache Accumulo Graph + Document I/O Sqrrl API over Apache Thrift RPC (JSON, Graph, Aggregation, Search, etc.) • • • • • Sqrrl proprietary Automated indexing Custom iterators Lucene integration Security extensions Sqrrl Server Accumulo RPC (Sorted Key/Value I/O) • Open source (including Sqrrl contributions) Hadoop RPC (File I/O) • Open source or commercial distributions © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 15
    • OUTLINE Two Halves of “Real-Time” Accumulo and Sqrrl Technology Data-Centric Security Table Designs Performance Benchmarks © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 16
    • DATA-CENTRIC SECURITY Definition: Data carries with it information that is required to make policy decisions on its releasability. User 1 Sqrrl/ Accumul o © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential User 2 17
    • SECURITY Example Accumulo Key/Value Pairs Accumulo is the only NoSQL database with cell-level access controls © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 18
    • DATA-CENTRIC SECURITY ECOSYSTEM Key Mgmt Audits End Users Labeler Sqrrl Enterprise Policies Data Policy Engine Apps Auth. Service User Attributes © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 19
    • OUTLINE Two Halves of “Real-Time” Accumulo and Sqrrl Technology Data-Centric Security Table Designs Performance Benchmarks © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 20
    • HIERARCHICAL DECOMPOSITION <person> Row: Column Family: Column Qualifier: Value: attribute purchases returns age discount sneakers hat <age> <rate> <cost> <cost> © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 21
    • MATERIALIZED TABLE Key/Value Pair Row: bill Column Family: george attribute purchases attribute purchases Column Qualifier: age discount sneakers age sneakers Value: 49 40% $100 27 $83 © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 22
    • FORWARD AND INVERTED INDEX Table: Forward Index Inverted Index Row: <UUID> <Term> Column Family: <Type> <UUID> Column Qualifier: <Field> <Type+Field> <Term> <Digest of Event> Value: © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 23
    • FORWARD AND INVERTED INDEX © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 24
    • CUSTOM INDEXING Table: Geo Index Latitude Longitude 10110101001 00111010010 11010110110 <GeoHash> Row: 101001110111010101011100001011100 Column Family: <Event Type> Column Qualifier: Value: Depth <UUID> <Digest of Event> © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 25
    • D4M 2.0 SCHEMA FOR TWITTER DATA Table: Tedge TedgeT Row: <UUID> <value> Column Family: “stat” “time” “user” “word ” “stat” “time” “user” “word ” Column Qualifier: <stat> <time> <user > <word > <UUID > <UUID > <UUID > <UUID > “1” “1” “1” “1” “1” “1” “1” “1” Value: © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 26
    • D4M 2.0 SCHEMA FOR TWITTER DATA Table: TedgeDegT Ttext Row: <value> <UUID> Column Family: “stat” “time” “user” “word ” “text” Column Qualifier: “degre e” “degre e” “degre e” “degre e” - Value: <count> <count> <count> © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential <count> <text> 27
    • D4M 2.0 SCHEMA FOR TWITTER DATA Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et. al., HPEC 2013 © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 28
    • OUTLINE Two Halves of “Real-Time” Accumulo and Sqrrl Technology Data-Centric Security Table Designs Performance Benchmarks © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 29
    • ACCUMULO WITH D4M 2.0 SCHEMA PERFORMANCE Maximizing throughput on an 8-node, 192-core cluster: Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et. al., HPEC 2013 © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 30
    • ACCUMULO SCALABILITY: GRAPH500 BENCHMARK source: http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 31
    • ATOMIC INCREMENT PERFORMANCE COMPARISON Read/Modify/Write (HBase) vs. Iterators/Combiners (Accumulo) © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 32
    • QUESTIONS? Adam Fuchs, CTO Sqrrl Data, Inc. © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 33