Cloudera Federal Forum 2014: The REDDISK Big Data Architecture

Paul Brown, CEO of Koverse, shares the story of Accumulo and how the project is applied to Hadoop and Big Data.

Published in: Technology, Business

  1. Red Disk
     AND SOME THOUGHTS ON BIG DATA IN THE GOVERNMENT
     PAUL BROWN, KOVERSE, INC
  2. Accumulo Origin Story (Paul’s Version)
     Thinking was:
     - We were way behind the curve
     - Data unification was the only way to survive
     - Google’s architecture is proven to scale and the design is available
     Need to prove as soon as possible:
     - Scale/Unification in real world scenarios
     - Mission Impact
     What we learned along the way:
     - Needed Secure Indexes across datasets
     - “Productization” is critical to scaling success
     - We are way ahead of the curve…
  3. Why Accumulo and Hadoop
     - Interactive Query at Scale
     - Adaptive Schemas
     - Heterogeneous Data
     - Bulk Processing
     - Multiple Versions
  4. Adoption of Big Data
     - Home Grown (pre 2008)
     - Open Source
     - GOTS
     - COTS
  5. GOTS Phase
     Goals:
     - Lower complexity
     - Mission Impact
     - Repeatability
     Diagram labels: Mission Impact, Sources and Methods, Technology, Core Principles
  6. Red Disk
     Goals:
     - Lower the complexity and time associated with operationalizing data
     - “product”, purpose repeatable, documented, general
     - Interoperability between systems
  7. Red Disk
     - RPMs
     - Key: New Apps, Existing Apps
     - Node Types: Hadoop/Accumulo, Red Disk, JBOSS, STORM
     - Hadoop and Accumulo
  8. Red Disk API -> UCD API
     - Pre-processing and data ingest: Storm
     - Bulk Analytics: MapReduce Input/Output Formats
     - CRUD and Query: REST services
  9. Red Disk
     Diagram labels:
     - Kafka
     - DPF (UCD API)
     - Mission Apps
     - Storm
     - Ingest Analytics – NLP, etc
     - UCD Ingest / Query API
     - Raw Data
     - Indexing Providers: Koverse, GAIA, etc
     - Accumulo, HDFS, etc
  10. UCD logical structure
      Diagram labels: Bob, Person, Place, Bob, Father Of, Terms, “Bob Father of Joe”, “Bobby AKA Bob”, Organization, Artifacts, Joe, Statements, UCD API, Objects
  11. Review
      Questions…
      - Red Disk
      - Accumulo
      - Anything else
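
The "UCD logical structure" slide names Objects (Person, Place, Organization, Artifacts), Terms, and Statements such as "Bob Father of Joe" and "Bobby AKA Bob". One plausible reading is that a statement links two objects through a relationship term. The sketch below illustrates that reading only; `Entity`, `Statement`, and `StatementStore` are hypothetical names invented here, and nothing in it reflects the actual UCD API, which the slides do not show.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a UCD object (Person, Place, ...)."""
    name: str
    kind: str


@dataclass(frozen=True)
class Statement:
    """Hypothetical triple: subject, relationship term, object."""
    subject: Entity
    term: str
    obj: Entity


class StatementStore:
    """In-memory index of statements keyed by subject name (illustrative only)."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, statement):
        self._by_subject[statement.subject.name].append(statement)

    def about(self, name, term=None):
        """Return statements whose subject is `name`, optionally filtered by term."""
        found = self._by_subject.get(name, [])
        return [s for s in found if term is None or s.term == term]


bob = Entity("Bob", "Person")
joe = Entity("Joe", "Person")
bobby = Entity("Bobby", "Person")

store = StatementStore()
store.add(Statement(bob, "Father Of", joe))   # "Bob Father of Joe"
store.add(Statement(bobby, "AKA", bob))       # "Bobby AKA Bob"

for s in store.about("Bob"):
    print(f"{s.subject.name} {s.term} {s.obj.name}")
```

A real deployment on Accumulo would presumably persist such statements as sorted key-value entries rather than in a Python dictionary; the in-memory index here only stands in for that lookup-by-subject pattern.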
