Sqrrl real time_big_data_20130411
1. sqrrl
Secure. Scale. Adapt.
Sqrrl Data, Inc. All Rights Reserved
Adam Fuchs, CTO
11 April 2013
2. Who We Are
Management:
• Mark Terenzoni, sqrrl CEO, F5
• Adam Fuchs, sqrrl CTO, NSA
• Ely Kahn, sqrrl VP BizDev, White House
Investors
20+ years of combined Apache Accumulo engineering expertise
• Founded July 2012
• Funded August 2012
• Team includes former Tech Director of Accumulo at NSA and 6 committers/contributors
3. Our Mission
Security
Adaptivity
Scalability
4. Apache Accumulo
• Sorted, Distributed Key/Value Store
• Based on Google’s Bigtable Design
• Built on Top of Apache Hadoop and Apache Zookeeper
• Augments and Integrates With the Hadoop ecosystem
• Originally developed at the National Security Agency, now an Apache Software Foundation project
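To make "sorted key/value store" concrete, here is a minimal in-memory sketch in Python. It is not Accumulo's API; the class name `SortedKV`, its methods, and the sample keys are all hypothetical. It only illustrates why keeping keys sorted turns a prefix lookup into a cheap range scan.

```python
import bisect


class SortedKV:
    """Toy sorted key/value store (hypothetical, single-process, in memory)."""

    def __init__(self):
        self._keys = []   # keys kept in lexicographic order
        self._vals = {}   # key -> value

    def put(self, key, value):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._vals[key]


store = SortedKV()
for key, value in [("user:bob", 2), ("user:alice", 1),
                   ("zone:east", 9), ("user:carol", 3)]:
    store.put(key, value)

# A prefix scan is a range scan: ';' is the character right after ':' in ASCII.
users = list(store.scan("user:", "user;"))
```

In a distributed store the sorted key space is split into contiguous ranges held by different servers, so the same range scan touches only the servers that own the requested range.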
5. Sqrrl Enterprise Architecture
Diagram layers:
• Applications
• Analytics APIs: Search, Statistics, Graph, Lucene, SQL, Custom Extensions
• Security & Access Controls: IAM, Encryption, DAM, Secure Code
• Data Integration: ETL
• Hadoop
• Accumulo
6. Big Data Lessons Learned
• Start small, but design for scalability
– One application first, then grow to hundreds
– One gigabyte first, then grow to petabytes
• Iterative schema refinement
– Initially, let the data define the schema
– Refine the schema in bulk as you better understand the data
– Middle ground between flat files and complete ontologies
• Discovery analytics as application building blocks
– Universal search: structured and unstructured data, across data sets, low latency
– Basic statistics: aggregations of query results, parallelized, low latency, to support big picture analysis
– Graphs: scalable graph analytics for analyzing how everything is connected
• Data-centric security
– Separate modeling of security and analysis
– Simplifies multi-tenancy and application accreditation
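"Let the data define the schema" can be sketched as a pass over ingested records that tallies which fields occur and with which value types. This is only an illustration under assumptions: the function `discover_schema` and the sample records are hypothetical and are not taken from Sqrrl's product.

```python
from collections import defaultdict


def discover_schema(records):
    """Return {field: {type_name: count}} as observed across the records."""
    schema = defaultdict(lambda: defaultdict(int))
    for record in records:
        for field, value in record.items():
            schema[field][type(value).__name__] += 1
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {field: dict(types) for field, types in schema.items()}


# Hypothetical records with inconsistent fields and types.
records = [
    {"name": "John Doe", "cholesterol": 183},
    {"name": "Jane Roe", "cholesterol": "n/a", "visit": "20120912"},
]
schema = discover_schema(records)
```

A field that shows up with mixed types (here `cholesterol`) is a candidate for the bulk schema refinement step once the data is better understood.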
7. Schema Discovery
8. Lightweight Apps
The future of Big Data innovation is Apps, built on:
• Universal Search
• Schema-less Statistics
• Graphs
• Intuitive Languages
• Secure, Scalable, and Adaptable platforms
9. Targeted Analysis
10. Big-Picture Analytics
11. Data-Centric Security
Definition: A form of security in which data carries with it the elements of provenance that are required to make policy decisions on its releasability.
• Separate data modeling for Security and Analysis
• Reusability of applications across security domains
• Distributed development of ingest and query applications
• Supported by Accumulo’s cell-level security
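The idea that each piece of data carries what is needed for a release decision can be sketched as a label check: every cell has a visibility expression, every reader has a set of authorizations, and a cell is returned only if the expression is satisfied. The Python below is a simplified model, not Accumulo's implementation. The function names are hypothetical, labels are restricted to letters, digits, and underscores, and the sketch gives `&` higher precedence than `|` instead of insisting on explicit parentheses.

```python
import re

# One token: a label name, or one of the operators & | ( )
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad character at {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if the authorization set satisfies the visibility label."""
    tokens = tokenize(expr)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()          # always parse, then combine
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths            # a bare label: does the reader hold it?

    if not tokens:
        return True                    # an empty label is visible to everyone
    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result


# A reader holding only PCP_JD, checked against two sample labels.
pcp_auths = {"PCP_JD"}
can_read_cholesterol = is_visible("JD|PCP_JD", pcp_auths)
can_read_mental_health = is_visible("JD|PSYCH_JD", pcp_auths)
```

Because the check depends only on the label and the reader's authorizations, the same query application can be reused across security domains without embedding policy logic in the application itself.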
12. Cell-Level Security
13. Scalable Data-Centric Security
Diagram labels: Data Labeler; Accumulo; Apps; User Attributes; Audits; Policies; HDFS, Zookeeper; End Users; Auth. Service; Policy Engine
14. Accumulo’s Strengths
• Security
– Cell-level security reduces the cost of application development in the presence of complex legal or policy restrictions on data use
– IAM and encryption ties into enterprise security standards
• Scalability
– Proven reliability and performance at the multi-petabyte scale
– High-performance parallel I/O library
• Adaptivity
– Flexible schema support to quickly ingest new data sources
– Sorted key/value paradigm supports a multitude of search and analysis applications
– Server-side programming framework (“iterator trees”) supports best-in-class aggregation, filtering, and complex query semantics
Sqrrl Data, Inc. All Rights Reserved
15
An
Accumulo
key
is
a
5-‐tuple,
consis9ng
of:
" Row:
Controls
Atomicity
" Column
Family:
Controls
Locality
" Column
Qualifier:
Controls
Uniqueness
" Visibility
Label:
Controls
Access
" Timestamp:
Controls
Versioning
Row
Col.
Fam.
Col.
Qual.
Visibility
Timestamp
Value
John
Doe
Notes
PCP
PCP_JD
20120912
PaGent
suffers
from
an
acute
…
John
Doe
Test
Results
Cholesterol
JD|PCP_JD
20120912
183
John
Doe
Test
Results
Mental
Health
JD|PSYCH_JD
20120801
Pass
John
Doe
Test
Results
X-‐Ray
JD|PHYS_JD
20120513
1010110110100…
Accumulo
Key/Value
Example
Accumulo Key Structure
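The 5-tuple can be modeled as a small record plus a sort rule. The sketch below uses the four example entries from this slide and adds one extra, clearly hypothetical older cholesterol reading to show versioning. It assumes plain string comparison as a stand-in for Accumulo's byte-wise ordering, with timestamps sorted newest first; the names `Key` and `sort_key` are illustrative, not Accumulo classes.

```python
from typing import NamedTuple


class Key(NamedTuple):
    row: str
    column_family: str
    column_qualifier: str
    visibility: str
    timestamp: int


def sort_key(key):
    # Ascending on the first four parts, newest timestamp first.
    return (key.row, key.column_family, key.column_qualifier,
            key.visibility, -key.timestamp)


entries = [
    (Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513),
     "1010110110100…"),
    (Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801),
     "Pass"),
    (Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912),
     "Patient suffers from an acute …"),
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912),
     "183"),
    # Hypothetical older version, added only to illustrate versioning.
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20110301),
     "190"),
]

ordered = sorted(entries, key=lambda entry: sort_key(entry[0]))
qualifiers = [key.column_qualifier for key, _ in ordered]

# The newest version of a cell is the first one met in sort order.
latest_cholesterol = next(value for key, value in ordered
                          if key.column_qualifier == "Cholesterol")
```

All of one row's entries sort together, entries in the same column family sit next to each other within the row, and the most recent version of a cell comes first.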
16. Accumulo Architecture
Diagram components: Application (×3); Tablet Server with Tablet (×3); Master; Zookeeper (×3); HDFS
Diagram edge labels: Read/Write; Store/Replicate; Assign/Balance; Delegate Authority (×2)
17. Tablet Data Flow
Diagram labels: Tablet; In-Memory Map; Write Ahead Log (For Recovery); Sorted, Indexed File (×3); Writes; Reads; Scan; Minor Compaction; Merging / Major Compaction; Iterator Tree (×3)
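One reading of this data flow, sketched in Python: writes go to a write-ahead log and an in-memory map, a minor compaction flushes the map to a new sorted file, a major compaction merges files, and a scan reads a merged view in which the newest value wins. The `Tablet` class here is a hypothetical toy. It omits the iterator trees, file indexes, and recovery, and it assumes the log can be dropped once the in-memory map has been flushed.

```python
import heapq


class Tablet:
    """Toy model of a tablet's write, compaction, and scan paths."""

    def __init__(self):
        self.wal = []      # write-ahead log (kept only for recovery)
        self.memory = {}   # in-memory map
        self.files = []    # sorted, immutable "files", oldest first

    def write(self, key, value):
        self.wal.append((key, value))   # log first, then apply
        self.memory[key] = value

    def minor_compaction(self):
        """Flush the in-memory map to a new sorted file."""
        if self.memory:
            self.files.append(sorted(self.memory.items()))
            self.memory = {}
            self.wal = []   # assumption: flushed data no longer needs the log

    def major_compaction(self):
        """Merge all files into one, keeping the newest value per key."""
        merged = {}
        for file_entries in self.files:   # oldest first, later files overwrite
            merged.update(file_entries)
        self.files = [sorted(merged.items())] if merged else []

    def scan(self):
        """Yield a sorted, merged view; the newest layer wins per key."""
        layers = self.files + [sorted(self.memory.items())]
        tagged = [
            [(key, -age, value) for key, value in layer]
            for age, layer in enumerate(layers)
        ]
        previous = object()   # sentinel that equals no real key
        for key, _, value in heapq.merge(*tagged):
            if key != previous:
                yield key, value
                previous = key


tablet = Tablet()
tablet.write("a", 1)
tablet.write("c", 3)
tablet.minor_compaction()
tablet.write("a", 10)
tablet.write("b", 2)
before = list(tablet.scan())   # one file plus the in-memory map

tablet.minor_compaction()
tablet.major_compaction()
after = list(tablet.scan())    # a single merged file
```

The scan result is the same before and after compaction; compaction only changes how many sorted sources have to be merged to produce it.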
19. Table Design
• No built-in secondary indices
• Sort Order ⇔ Index
• Balance between ingest and query
• Avoid introducing bottlenecks
• Preserve cell-level security and scalability

Table:          Row:    Column Family:    Column Qualifier:  Value:
Forward Index   <UUID>  <Type>            <Field>            <Term>
Inverted Index  <Term>  <Type> + <Field>  <UUID>             <Digest of Event>
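The forward and inverted index layouts above can be sketched as two lists of (row, column family, column qualifier, value) tuples built from the same event. Several details are assumptions made for illustration: the `:` separator joining type and field, a truncated SHA-256 hash standing in for the "digest of event", and the sample events themselves. The function names are hypothetical.

```python
import bisect
import hashlib
import json


def index_event(uuid, event_type, fields):
    """Return (forward_entries, inverted_entries) for one event."""
    # Assumption: a short hash of the event stands in for its digest.
    payload = json.dumps(fields, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()[:12]
    forward, inverted = [], []
    for field, term in sorted(fields.items()):
        # Forward index: row=UUID, family=type, qualifier=field, value=term
        forward.append((uuid, event_type, field, term))
        # Inverted index: row=term, family=type+field, qualifier=UUID
        inverted.append((term, f"{event_type}:{field}", uuid, digest))
    return forward, inverted


def lookup(inverted_sorted, term):
    """Find UUIDs for a term by seeking to its row in the sorted index."""
    i = bisect.bisect_left(inverted_sorted, (term,))
    uuids = []
    while i < len(inverted_sorted) and inverted_sorted[i][0] == term:
        uuids.append(inverted_sorted[i][2])
        i += 1
    return uuids


# Hypothetical events.
events = [
    ("u-001", "login", {"user": "alice", "host": "web01"}),
    ("u-002", "login", {"user": "bob", "host": "web01"}),
]
forward_index, inverted_index = [], []
for uuid, event_type, fields in events:
    fwd, inv = index_event(uuid, event_type, fields)
    forward_index.extend(fwd)
    inverted_index.extend(inv)
forward_index.sort()
inverted_index.sort()   # sort order is the index

web01_events = lookup(inverted_index, "web01")
alice_events = lookup(inverted_index, "alice")
```

With no built-in secondary indices, the application writes each event twice. The second, inverted copy is what makes a term lookup a single seek plus a short sequential read.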
20. Ecosystem Architecture
Diagram labels: Apache HDFS; Apache Accumulo; Sqrrl Enterprise; Custom Ingester; Web Server; Custom Analytic; Map/Reduce Task
Interfaces:
• Sqrrl API over Apache Thrift RPC: Hierarchical Documents + Graphs, Lucene + SQL + more
• Accumulo RPC: Sorted Key/Value I/O
• Hadoop RPC: File I/O
21. Contact
sqrrl data, inc.
275 Third St.
Cambridge, MA 02142
617-902-0784
www.sqrrl.com
@sqrrl_inc
info@sqrrl.com