Cloudera Federal Forum 2014: The REDDISK Big Data Architecture
 


Paul Brown, CEO of Koverse, shares the story of Accumulo and how the project is applied to Hadoop and Big Data.


    Cloudera Federal Forum 2014: The REDDISK Big Data Architecture Presentation Transcript

    • Red Disk and Some Thoughts on Big Data in the Government. Paul Brown, Koverse, Inc.
    • Accumulo Origin Story (Paul’s Version)
      • Thinking was:
        • We were way behind the curve
        • Data unification was the only way to survive
        • Google’s architecture is proven to scale and the design is available
      • Need to prove as soon as possible:
        • Scale/unification in real-world scenarios
        • Mission impact
      • What we learned along the way:
        • Needed secure indexes across datasets
        • “Productization” is critical to scaling success
        • We are way ahead of the curve…
    • Why Accumulo and Hadoop
      • Interactive Query at Scale
      • Adaptive Schemas
      • Heterogeneous Data
      • Bulk Processing
      • Multiple Versions
    • Adoption of Big Data: Home Grown (pre-2008), Open Source, GOTS, COTS
    • GOTS Phase
      • Goals: Lower complexity, Mission Impact, Repeatability
      • Diagram labels: Mission Impact, Sources and Methods, Technology, Core Principles
    • Red Disk
      • Goals:
        • Lower the complexity and time associated with operationalizing data
        • “Product”: general purpose, repeatable, documented
        • Interoperability between systems
    • Red Disk
      • RPMs
      • Node types: Hadoop/Accumulo, JBoss, Storm
      • Diagram key: New Apps, Existing Apps, Red Disk, Hadoop and Accumulo
    • Red Disk API -> UCD API
      • Pre-processing and data ingest: Storm
      • Bulk analytics: MapReduce Input/Output Formats
      • CRUD and query: REST services
    • Red Disk (architecture diagram labels)
      • Kafka
      • DPF (UCD API)
      • Mission Apps
      • Storm Ingest
      • Analytics: NLP, etc.
      • UCD Ingest / Query API
      • Raw Data
      • Indexing Providers: Koverse, GAIA, etc.
      • Accumulo, HDFS, etc.
    • UCD logical structure (diagram labels)
      • Terms: Bob, Father Of, Joe
      • Statements: “Bob Father of Joe”, “Bobby AKA Bob”
      • Objects: Person, Place, Organization, Artifacts
      • UCD API
    • Review
      • Questions…
        • Red Disk
        • Accumulo
        • Anything else
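
The UCD logical structure slide names three kinds of things (terms, statements, and objects) and gives "Bob Father of Joe" and "Bobby AKA Bob" as examples. The slides do not show the UCD API itself, so the following is only a minimal sketch of how such a structure might be modeled in plain Python. The names `Statement`, `Entity`, `build_entities`, and `objects_of` are hypothetical, and treating "AKA" as an alias link between terms is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One assertion made of three terms: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Entity:
    """A typed object (e.g. Person) and every term known to name it."""
    kind: str
    names: set


def build_entities(statements, kind="Person"):
    """Map each term to an Entity, merging terms linked by 'AKA' statements.

    Assumption for this sketch: every subject and object term names an
    entity of the same kind, and 'AKA' means two terms name the same one.
    """
    by_name = {}

    def entity_for(term):
        if term not in by_name:
            by_name[term] = Entity(kind, {term})
        return by_name[term]

    for st in statements:
        left, right = entity_for(st.subject), entity_for(st.obj)
        if st.predicate == "AKA" and left is not right:
            # Fold the right-hand entity into the left-hand one.
            left.names |= right.names
            for name in right.names:
                by_name[name] = left
    return by_name


def objects_of(statements, by_name, term, predicate):
    """Object terms of `predicate` statements about `term` or any alias of it."""
    aliases = by_name[term].names
    return sorted(st.obj for st in statements
                  if st.predicate == predicate and st.subject in aliases)


# The two example statements from the slide.
statements = [
    Statement("Bob", "Father Of", "Joe"),
    Statement("Bobby", "AKA", "Bob"),
]
entities = build_entities(statements)
print(objects_of(statements, entities, "Bobby", "Father Of"))  # ['Joe']
```

Asking about "Bobby" returns Joe because the alias statement ties "Bobby" to the same entity as "Bob". This is the kind of cross-dataset unification the talk describes, although a real system would also need entity typing, provenance, and access controls that this sketch leaves out.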