• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Sqrrl and Accumulo
 

Sqrrl and Accumulo

on

  • 398 views

An in-depth description of sqrrl and how it applies Accumulo's technology.

An in-depth description of sqrrl and how it applies Accumulo's technology.

Statistics

Views

Total Views
398
Views on SlideShare
381
Embed Views
17

Actions

Likes
0
Downloads
0
Comments
0

6 Embeds 17

http://yottabytes.info 4
http://virtualblackswanmarketing.com 4
https://www.linkedin.com 3
http://boomers-bank.com 3
http://www.linkedin.com 2
http://pure-leverage-systems.com 1

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • Welcome everyone to the meetup.Thank Michael Walker for being such a generous and kind host.Give a brief overview of the topic: Security is now paramount
  • We’ve all seen decision trees, this one is a little different.Notice how Accumulo has positioning in almost every aspect of big data analysis.We’ll get into the greater explanation of the interactions and operations of both sqrrl and Accumulo.
  • Big data has been brewing for a while, and even though Accumulo hasn’t been widely known; it’s had a rich history.Most of you will know about Google’s Bigtable white paper, which Accumulo is based on.In 2008, the NSA began developmentIn 2011 NSA open sources the solution and Accumulo becomes an incubation project at ApacheIn 2012 Accumulo becomes a top-level projectAnd this year the first release of sqrrl was unleashed.
  • The bulk of sqrrl’s design architecture is held within Accumulo, hence why we’ll be diving so deep into its inner-workings.The majority of value from sqrrl is extracted by augmenting and adapting other technologies to provide real-time analytics, security, SQL interactions and analysis, such as graph search.
  • As the history described, development began in 2008.Sqrrl uses Accumulo as a basis for most data workflow and storage access.Cell-level security becomes the key element to differentiate Accumulo and allow for lowering of costs and implementing key security measures and compliances.Scalability is, of course, accommodated with Accumulo.Its schema/tablet/table design complements an adaptive approach.Accomplishes the cell-level security by parsing and sorting specific parts of key/value pairs.
  • This is an overview that we’ll return to in a bit.Notice the “keystone” is Accumulo, but also the diverse technologies associated.
  • Labels are used to apply security to keysPolicy driven enforcement allows for cell-level securityData is marked and labeled at ingest, requiring a well thought-out approachTablets contain tables, or data, and are controlled by those, hopefully, well thought-out security policiesKey/Value pairs contain inherent data at ingestion, using a 5-tuple system
  • I’d like to take a minute, just sit right there, and explain how cell security became the most sought-after feature for real-time analysis.And this is why sqrrl and Accumulo have, natively, risen to the challenge, and why it’s important.
  • An Accumulo key is a pretty complicated column, and is actually a set of nested columns.Explain each individually:Row, provides controlled atomicityCrucial for large data setsColumn Family, controlling locality, fast access systemsColumn Qualifier, sorts third and provides uniqueness to the keyVisibility Label, runs fourth and controls user access to individual key/value pairs. They are Boolean, and require parenthesis to determine orderingVisibility Label, runs fourth and controls user access to individual key/value pairs. They are Boolean, and require parenthesis to determine orderingTimestamp, controls which version of the key is which.Sorting of keys:Hierarchically, row first, column family second, qualifier third, etc.Lexicographically, compare the first by first, second, second, etc.
  • This is an example of how one might setup a key configuration.Row = NameCol. Fam. = Sorting Locality used as a diverse descriptorCol. Qual. = Additional descriptor for granularityVisibility = User access (the syntax is described in documentation, but parenthesis can be used to “nest” and order access restrictions)Timestamp = The calendar dateValue = Diverse result
  • Explain above.Make sure to spend some extra time on data modeling and design.
  • Mention and focus that the sqrrl enterprise white paper is much more verbose and granular when explaining the inner-workings of a Tablet server, etc.
  • Mention and focus that the sqrrl enterprise white paper is much more verbose and granular when explaining the inner-workings of a Tablet server, etc.
  • The visibility labels are a feature that is unique to Accumulo. No other database can apply access controls at this fine-­‐grained level. The labels are generated by translating an organization’s existing data security and information sharing policies into Boolean expressions over data attributes and user attributes.The process for applying these security labels to the data and connecting the labels to user’s designated authorizations is depicted in this figure. The first step is gathering an organization’s information security policies and dissecting or dissemenatingthem into data-­‐centric and user-­‐centric components. As data is ingested into Accumulo, a data labeler tags individual key/value pairs with the appropriate data-­‐centric visibility labels based on these policies. Data is then stored in Accumulo where it is available for real-­‐time queries by operational applications. End users are authenticated through these applications and authorized to access underlying data based on their defined attributes (e.g., sworn law enforcement officer, current employee, etc.). As an end user performs an operation via the app (e.g., performs a search request), the visibility label on each candidate key/value pair is checked against his or her attributes, and only the data that he or she is authorized to see is returned.
  • Real-time processing of millions of records per second, give sqrrl an edge.The workflow processes the ingestion through conversion to hierarchical JSON documents, or can, to utilize a document store’s capabilities.Egress out of Accumulo is where sqrrl shines, and passing this data to the analytics layer gives integration and real-time analytics development a lot more accessibility.Integration with existing Identity and Access Management systems is crucial to a direct, seamless, implementation.
  • Remember this graphic? There’s quite a lot more to sqrrl than just AccumuloStarting with ingest, sqrrl facilitates delineationPushed to Accumulo, processing and results are utilized, with HDFS as a storage systemThrift enables development from almost any developers favorite flavor of languageAnd finally Lucene, one of many post processing analytics platforms, takes the churned data and presents it to a number of layers.
  • Explain each usage case
  • Explain the above.I’d like to thank everyone for their time, and would love to answer any questions.

Sqrrl and Accumulo Sqrrl and Accumulo Presentation Transcript

  • sqrrl and Accumulo
  • Which NoSQL solution? There are a lot of places to fit sqrrl, and Accumulo. © sqrrl, Inc.
  • What is sqrrl?  Based on Accumulo  A proven, secure, multi tenant, data platform for building real-time applications  Scales elastically to tens of petabytes of data and enables organizations to eliminate their internal data silos  Seamless integration with Hadoop, and most of its variants  Providing a supply for a much needed security demand (groundup security)  Already deployed and utilized by defense and government industries
  • A history of sqrrl © sqrrl, Inc. - Accumulo Meetup Presentation
  • A sqrrl’s architecture © sqrrl, Inc.
  • What is Accumulo?  Development began at the NSA in 2008  Base foundation for sqrrl  Cell-level security reduces the cost of app development, circumnavigating complex, sometimes impossible, legal or policy restrictions  Provides the ability to scale to >PB levels  Highly adaptive schema and sorted key/value paradigm  Stores key/value pairs in parsed, sorted, secure controls
  • Where does Accumulo fit? © sqrrl, Inc.
  • How does Accumulo provide security?  Security Labels are applied to keys  Cell-level security is implemented to allow for security policy enforcement, using data labeler tags  These policies are applied when data is ingested  Tablets contain data, are controlled using security policies  Stores key/value pairs in parsed, sorted, secure controls, a 5-tuple key system
  • Accumulo Security (cont.) Why Cell-Level Security Is Important: Many databases insufficiently implement security through row-and columnlevel restrictions. Column-level security is only sufficient when the data schema is static, well known, and aligned with security concerns. Row-level security breaks down when a single record conveys multiple levels of information. The flexible, fine-grained cell-level security within Sqrrl Enterprise (or its root Accumulo) supports flexible schemas, new indexing patterns, and greater analytic adaptability at scale.
  • Accumulo Security (cont.) An Accumulo key is a 5-tuple key, consisting of:  Row: Controls Atomicity  Column Family: Controls Locality  Column Qualifier: Controls Uniqueness  Visibility Label: Controls Access  Timestamp: Controls Versioning (Values are byte arrays) Keys are sorted:  Hierarchically: Row first, then column family, and so on  Lexicographically: Compare first byte, then second, and so on
  • Accumulo Security (cont.) An example of column usage
  • Accumulo Architecture Accumulo servers (tablets) utilize a multitude of big data technologies, but their layout is different than Map/Reduce, HDFS, MongoDB, Cassandra, etc. used alone.  Data is stored in HDFS  Zookeeper is utilized for configuration management  SSH, password-less, node configuration  An emphasis, more of an imperative, on data model and data model design
  • Accumulo Architecture (cont.) Tablets  Partitions of tables, collections of sorted key/value pairs  Held and managed by Tablet Servers
  • Accumulo Architecture (cont.) Tablet Servers  Receive writes, responds to reads, from clients  Writes to a write-ahead log, sorting new key/value pairs in memory, while periodically flushing sorted key/value pairs to new files in HDFS  Managed by Master  Responsible for detecting and responding to Tablet Server failure, load balancing  Coordinates startup, graceful shutdown, and recovery of writeahead logs  Zookeeper  An apache project, open source  Utilized for distributed locking mechanism, with no single point of failure
  • Integration with users/access The visibility labels are a feature that is unique to Accumulo. No other database can apply access controls at such a fine-grained level. Labels are generated by translating an organization’s existing data security and information sharing policies into Boolean expressions 1. Gather an organization’s information security policies and dissecting them into data--‐centric and user--‐centric components 2. As data is ingested into Accumulo, a data labeler tags individual key/value pairs with the appropriate data--‐centric visibility labels based on these policies. 3. Data is then stored in Accumulo where it is available for real--‐time queries by operational applications. End users are authenticated through these applications and authorized to access underlying data 4. As an end user performs an operation via the app (e.g., performs a search request), the visibility label on each candidate key/value pair is checked against his or her attributes, and only the data that he or she is authorized to see is returned.
  • Making sqrrl work sqrrl’s extensibility of Accumulo allows it to process millions of records per second, as either static or streaming objects These records are converted into hierarchical JSON documents, giving document store capabilities Passing this data to the analytics layer is designed to make integration and development of real-time analytics possible, and accessible Combining access at the cell level, with Accumulo, sqrrl integrates Identity and Access Management (IAM) systems (LDAP, RADIUS, etc.)
  • Making sqrrl work Apache thrift: (cont.) Sqrrl process Apache Lucene: Enables development in diverse language choices Custom iterators, providing developers with real-time capabilities, such as full-text search, graph analysis, and statistics, for analytical applications and dashboards. HDFS: File Storage system, compatible with both open source (OSS) and commercial versions Data Ingest: JSON, or graph format Apache Accumulo: The core of transactional and online analytical data processing in sqrrl
  • Who is sqrrl for? CTOs/CIOs: Unlock the value in fractured and unstructured datasets across your organization Developers: More easily create apps on top of Big Data and distributed databases Infrastructure Managers: Simplify administration of Big Data through highly scalable and multitenant distributed systems Data Analysts: Dig deeper into your data using advanced analytical techniques, such as graph analysis Business Users: Use Big Data seamlessly via apps developed on top of sqrrl enterprise
  • sqrrl/Accumulo wrap-up Accumulo sqrrl bridges the gap for security perspectives that restrict a large swath of industries combines the best of available technologies, develops and contributes their own, and designs big apps for big data. Accumulo Setup: 1. 2. 3. Installation of HDFS and ZooKeeper must installed and configured Password-less SSH should be configured between all nodes (emphasized master <> tablet) Installation of Accumulo (from http://accumulo.apache.org/downloads/ using http://accumulo.apache.org/1.4/user_manual/Administration.html#Installation Or get started using their AMI (http://www.sqrrl.com/downloads#getting-started)