Securely explore your data
WHAT'S NEXT FOR BIGTABLE?
Adam Fuchs, CTO
Sqrrl Data, Inc.
May 22, 2014
TODAY’S TALK
•  History of the World: Part 3
•  Bigtable/Accumulo Technology Overview
•  Accumulo Demonstration
•  Database Technology Survey
© 2014 Sqrrl Data, Inc. | All Rights Reserved 2
TIMELINE OF RELEVANT EVENTS
•  2006: Google’s BigTable Paper
•  2008: NSA Builds Accumulo
•  2011: NSA Open Sources Accumulo
•  2012: Sqrrl Founded
•  2013: 1st Sqrrl Release and Customers
Accumulo is a:
•  Apache Software Foundation (ASF) Open-Source Software Project
•  Clone of Google’s Bigtable
•  Secure, Sorted Key-Value Store
•  Row-level ACID (locally) Distributed NoSQL Database
Sqrrl is:
•  A commercial software company located in Cambridge, MA
•  A Search and Exploration Platform built with Apache Accumulo
•  An exciting startup with a long roadmap of challenging problems to solve
•  Hiring!
BIGTABLE & ACCUMULO TECH OVERVIEW
1.  Data Model & API
2.  Underlying Architecture
3.  Distinguishing Features
ACCUMULO DATA FORMAT
An Accumulo key is a 5-tuple, consisting of:
•  Row: Controls Atomicity
•  Column Family: Controls Locality
•  Column Qualifier: Controls Uniqueness
•  Visibility Label: Controls Access
•  Timestamp: Controls Versioning
Accumulo Key/Value Example:
Row       Col. Fam.     Col. Qual.     Visibility    Timestamp  Value
John Doe  Notes         PCP            PCP_JD        20120912   Patient suffers from an acute …
John Doe  Test Results  Cholesterol    JD|PCP_JD     20120912   183
John Doe  Test Results  Mental Health  JD|PSYCH_JD   20120801   Pass
John Doe  Test Results  X-Ray          JD|PHYS_JD    20120513   1010110110100…
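The key structure above can be illustrated with a small sketch. This is a toy in-memory model in Python, not the Accumulo API: the names `Key`, `sort_order`, `is_visible`, and `scan` are hypothetical. It assumes keys sort ascending by row, column family, column qualifier, and visibility, with newer timestamps first, and it only understands visibility labels that are a single token or tokens joined by `|` (the forms used in the example table).

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Toy stand-in for the 5-tuple key described on the slide."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_order(key):
    # Ascending on the first four fields, newest timestamp first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


def is_visible(label, authorizations):
    # Simplification: only handles "A" or "A|B|C" (OR of tokens).
    return any(token in authorizations for token in label.split("|"))


def scan(entries, authorizations):
    """Return entries in sorted order, dropping cells the reader may not see."""
    ordered = sorted(entries, key=lambda kv: sort_order(kv[0]))
    return [(k, v) for k, v in ordered if is_visible(k.visibility, authorizations)]


# Rows from the example table, deliberately listed out of order.
ENTRIES = [
    (Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513), "1010110110100…"),
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912), "183"),
    (Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912), "Patient suffers from an acute …"),
    (Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801), "Pass"),
]

if __name__ == "__main__":
    for key, value in scan(ENTRIES, {"PCP_JD"}):
        print(key.family, key.qualifier, value)
```

In this model a reader holding only the `PCP_JD` authorization sees the notes and the cholesterol result, while a reader holding `JD` sees the three test results but not the notes.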
THE ACCUMULO CLIENT API
•  Instance: new ZooKeeperInstance(...) or new MockInstance()
•  Connector: obtained via getConnector(...); gives access to TableOperations, InstanceOperations, and SecurityOperations
•  Scanner / BatchScanner: created via createScanner(...) / createBatchScanner(...); configured with Range and IteratorOption; iterator() yields Map.Entry pairs of Key and Value
•  BatchWriter: created via createBatchWriter(...); addMutation(...) takes a Mutation
ACCUMULO TABLETS
•  Collections of KV pairs form Tables
•  Tables are partitioned into Tablets
•  Metadata tablets hold info about other tablets, forming a 3-level hierarchy
•  A Tablet is a unit of work for a Tablet Server
Diagram (3-level hierarchy):
•  Well-Known Location (zookeeper): points to the Root Tablet (-∞ to ∞)
•  Root Tablet: points to Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”) and Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞)
•  Table: Adam’s Table: Data Tablets (-∞ : thing) and (thing : ∞)
•  Table: Encyclopedia: Data Tablets (-∞ : Ocelot), (Ocelot : Yak), and (Yak : ∞)
•  Table: Foo: Data Tablet (-∞ to ∞)
ACCUMULO PROCESSES
•  Components shown: Applications, Tablet Servers (each hosting Tablets), Master, Zookeeper, HDFS
•  Interactions shown: Read/Write, Store/Replicate, Assign/Balance, Delegate Authority
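TABLET DATA FLOW
•  Writes: go to the In-Memory Map and the Write Ahead Log (For Recovery)
•  Minor Compaction: In-Memory Map, through an Iterator Tree, to a Sorted, Indexed File
•  Merging / Major Compaction: Sorted, Indexed Files, through an Iterator Tree, to a Sorted, Indexed File
•  Reads: Scan through an Iterator Tree over the Tablet's In-Memory Map and Sorted, Indexed Files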
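The data flow on this slide can be sketched as a toy log-structured store. This is a simplified Python model under stated assumptions, not Accumulo's implementation: the class `Tablet`, its methods, and the `flush_threshold` parameter are hypothetical, sequence numbers stand in for timestamps, and the write-ahead log is simply cleared once the in-memory map has been flushed.

```python
import heapq


class Tablet:
    """Toy tablet: in-memory map + write-ahead log + sorted files."""

    def __init__(self, flush_threshold=2):
        self.wal = []        # write-ahead log, kept only for recovery
        self.memmap = {}     # key -> (sequence number, value)
        self.files = []      # each file is a sorted list of (key, seq, value)
        self.seq = 0
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.seq += 1
        self.wal.append((key, self.seq, value))
        self.memmap[key] = (self.seq, value)
        if len(self.memmap) >= self.flush_threshold:
            self.minor_compact()

    def minor_compact(self):
        # Flush the in-memory map to a new sorted file.
        run = sorted((k, s, v) for k, (s, v) in self.memmap.items())
        if run:
            self.files.append(run)
        self.memmap.clear()
        self.wal.clear()

    def major_compact(self):
        # Merge all sorted files into one, keeping the newest version per key.
        merged = self._merge(self.files)
        self.files = [merged] if merged else []

    @staticmethod
    def _merge(runs):
        out = []
        # Order by key, then newest sequence number first.
        for k, s, v in heapq.merge(*runs, key=lambda e: (e[0], -e[1])):
            if not out or out[-1][0] != k:
                out.append((k, s, v))
        return out

    def scan(self):
        # Reads merge the in-memory map with every sorted file.
        mem_run = sorted((k, s, v) for k, (s, v) in self.memmap.items())
        return [(k, v) for k, _, v in self._merge(self.files + [mem_run])]
```

The same merge routine serves both scans and major compactions, which mirrors the slide's point that an iterator tree sits on every path through the tablet.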
ITERATOR FRAMEWORK
Iterator Operations:
•  File Reads
•  Block Caching
•  Merging
•  Deletion
•  Isolation
•  Locality Groups
•  Range Selection
•  Column Selection
•  Cell-level Security
•  Versioning
•  Filtering
•  Aggregation
•  Partitioned Joins
WORD COUNT: SUMMING AGGREGATING ITERATOR
Input Corpus
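A word count with a summing iterator might look roughly like the following Python sketch. It is a model of the idea rather than Accumulo code: `ingest` and `summing_combiner` are hypothetical names, and the three-document corpus is made up for illustration. The assumption is that every word occurrence is written as a separate `(word, 1)` entry with no read before the write, and that entries sharing a key are summed later, when they are scanned or compacted.

```python
from itertools import groupby


def ingest(corpus):
    """Emit one (word, 1) entry per word occurrence; nothing is read back."""
    return [(word, 1) for document in corpus for word in document.split()]


def summing_combiner(sorted_entries):
    """Collapse adjacent entries with the same key by summing their values."""
    for key, group in groupby(sorted_entries, key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


# Hypothetical input corpus, for illustration only.
corpus = ["the quick brown fox", "the lazy dog", "the fox"]
entries = sorted(ingest(corpus))
counts = dict(summing_combiner(entries))

if __name__ == "__main__":
    for word, count in sorted(counts.items()):
        print(word, count)
```

Because addition is associative, partial sums produced by one compaction can be summed again by a later compaction or scan without changing the final counts.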
ACCUMULO LATENCIES
•  Ingesters: Input, Batch Writer
•  Tablet Servers: In-Memory Map, RFiles, Compaction Iterators, Scan Iterators
•  Queriers: Scanner/Batch Scanner, Output
•  Latency labels on the diagram: ~ms, ~ms, ~ms, ms-min
ACCUMULO THROUGHPUT
•  Ingesters: Input, Batch Writer
•  Tablet Servers: In-Memory Map, RFiles, Compaction Iterators, Scan Iterators
•  Queriers: Scanner/Batch Scanner, Output
•  Latency labels on the diagram: ~ms, ~ms, ~ms, ms-min
•  Scan: ~1M entries/s per node
•  Ingest: ~200K entries/s per node
•  Read-Modify-Write Latency: ~ms, so >1K entries/s is challenging with R-M-W
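One way to read the read-modify-write figure is as simple arithmetic on latency. The Python sketch below is an interpretation, not a benchmark: the function names are hypothetical, and the 1 ms round trip is an assumed value standing in for the slide's "~ms". If each read-modify-write must finish before the next one on the same data can start, the round-trip latency caps the rate.

```python
def max_serial_updates_per_second(latency_us):
    """Upper bound on back-to-back updates when each waits for the previous one."""
    return 1_000_000 // latency_us


def updates_in_flight_needed(target_per_second, latency_us):
    """Concurrent updates needed to sustain a target rate (ceiling division)."""
    return -(-target_per_second * latency_us // 1_000_000)


ASSUMED_RMW_LATENCY_US = 1_000   # assumption: 1 ms per read-modify-write
INGEST_RATE_PER_NODE = 200_000   # from the slide: ~200K entries/s per node

if __name__ == "__main__":
    print(max_serial_updates_per_second(ASSUMED_RMW_LATENCY_US))
    print(updates_in_flight_needed(INGEST_RATE_PER_NODE, ASSUMED_RMW_LATENCY_US))
```

Under these assumptions a single serialized update stream tops out near 1,000 updates per second, and matching the plain ingest rate would take on the order of 200 independent updates in flight at once, which only helps when the updates touch different keys.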
Securely explore your data
DEMO
R-M-W VS. COMPACTION-TIME AGGREGATION
Read/Modify/Write (HBase) vs. Iterators/Combiners (Accumulo)
SURVEY OF DATABASE TECHNOLOGY
•  Exercises in Center-Seeking
•  SQL vs. NoSQL
•  Ingest-time vs. Query-time Analytics
•  ACID vs. BASE
•  Normalized vs. Denormalized Data Models
•  Primary Use Cases for Sqrrl+Accumulo
SQL VS. NOSQL
NoSQL
•  Optimized for get/put operations
•  Specialized for client languages
•  High concurrency
•  More client-side control
Hybrid
•  Extend and evolve SQL
•  Standardize and incorporate NoSQL paradigms
SQL
•  Optimized for joins
•  Strong mathematical roots in set theory
•  Automatic query optimization
INGEST-TIME VS. QUERY-TIME ANALYTICS
Ingest-Time
•  Optimized for online statistics
•  Can reduce storage footprint
•  Can be indexed for low latency
•  Leverages a variety of indexes
•  Requires extensive data organization at ingest
Hybrid
•  Create partial summary at ingest (Question-focused datasets, knowledge bases, etc.)
•  Support ad-hoc queries over summaries
•  Leverage all known indexing strategies **
Query-Time
•  Can compute holistic statistics, like ranking, topN, etc.
•  Ad-hoc analytics: don’t know the query ahead of time
•  High latency and low concurrency at scale
•  Leverages block indexes, columnar layout
•  Ingest can be “stream to disk”
ACID VS. BASE
ACID
•  Atomicity: all or nothing for a group of operations
•  Consistency and Isolation: support simple reasoning for distributed, multithreaded clients
•  Durability: simple reasoning for whether data might be lost
Hybrid
•  Must make some relaxations for performance at scale (under failure modes)
•  Many options for “Lightweight” transaction support
•  Accumulo limits atomicity, consistency, and isolation to row-level operations
BASE
•  Basically Available: ensure that core operations always complete in an advertised time
•  Soft-State: relaxation of referential integrity, etc.
•  Eventual Consistency: relaxation of
NORMALIZED VS. DENORMALIZED DATA MODELS
Normalized
•  “Normal Form Relational Database”
•  Minimizes data footprint
•  Minimizes cost of data maintenance
•  Can lead to expensive joins at query time
Hybrid
•  Start with document store
•  Introduce links/edges for quick joins
•  Dynamically adapt to flexible or sparse schemas
•  Similar to property graphs
Denormalized
•  “Document Store”
•  Flexible schema lets applications adapt quickly to changing environments
•  Pre-joined to eliminate joins at query-time
•  Optimized for “append-only” data
•  Can inflate data sizes and slow data ingest
KNOWLEDGE-BASE USE CASE
Data sources: HR, Netflow, Proxy Logs, Email, Social Media
Example log record:
2014-04-14 06:36:09 429 73.105.179.202 username@msn.com 500 POST application/json HTTPS "wikipedia.org:443/grouchinesses/?215=felled&297=wading&768=shimmies..." "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.43 Safari/537.31" 208.80.152.201
STREAM PROCESSING USE CASE
Diagram labels: DATA, SPE, Dashboards, Actions, Interactive Analysis Tools (Discovery + Forensics)
1.  The stream processing engine (SPE) queries Sqrrl to enrich streaming data
2.  SPE persists results in Sqrrl for future query
3.  SPE takes action automatically
4.  SPE issues data-driven alerts
5.  Sqrrl provides context for dashboards
6.  Analysis tools use Sqrrl to search and manipulate historical data
SQRRL OPERATIONALIZES ACCUMULO WITH...
•  Data-Centric Security
•  Petabyte Scale and Operational Speeds
•  Document and Graph Data Models
•  SqrrlQL, including Aggregates, Secure Full-Text Search, and Secure Graph Search
•  Analytics, including Real-Time Statistics and Hadoop Integrations
MODERNIZING VISUALIZATION
Sqrrl is building the next generation of operational analytics visualizations
UPCOMING EVENTS
Accumulo Summit 2014
•  June 12 in College Park, MD
•  http://accumulosummit.com
•  Multiple tracks of talks from the leaders of the Accumulo community
IEEE HPEC Conference 2014
•  September 9-11 in Waltham, MA
•  http://www.ieee-hpec.org/
•  Accumulo Users Group Meeting as a Special Event
•  Accumulo tutorial
Watch for more meetup opportunities coming soon!
© 2014 Sqrrl Data, Inc. | All Rights Reserved 28

More Related Content

What's hot

Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Spark Summit
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
DataWorks Summit
 
Sqrrl June Webinar: An Accumulo Love Story
Sqrrl June Webinar: An Accumulo Love StorySqrrl June Webinar: An Accumulo Love Story
Sqrrl June Webinar: An Accumulo Love Story
Sqrrl
 
Splunking configfiles 20211208_daniel_wilson
Splunking configfiles 20211208_daniel_wilsonSplunking configfiles 20211208_daniel_wilson
Splunking configfiles 20211208_daniel_wilson
Becky Burwell
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
DataWorks Summit
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeDataWorks Summit
 
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg SchadSmack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Spark Summit
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Spark Summit
 
Druid in Spot Instances
Druid in Spot InstancesDruid in Spot Instances
Druid in Spot Instances
Imply
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Spark Summit
 
Druid @ branch
Druid @ branch Druid @ branch
Druid @ branch
Biswajit Das
 
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary DatabaseRedis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Labs
 
ARCHITECTING INFLUXENTERPRISE FOR SUCCESS
ARCHITECTING INFLUXENTERPRISE FOR SUCCESSARCHITECTING INFLUXENTERPRISE FOR SUCCESS
ARCHITECTING INFLUXENTERPRISE FOR SUCCESS
InfluxData
 
Real time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystemReal time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystemChris Huang
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
 
Big data today and tomorrow
Big data today and tomorrowBig data today and tomorrow
Big data today and tomorrowmagda3695
 
Performing Network & Security Analytics with Hadoop
Performing Network & Security Analytics with HadoopPerforming Network & Security Analytics with Hadoop
Performing Network & Security Analytics with HadoopDataWorks Summit
 
Scaling big-data-mining-infra2
Scaling big-data-mining-infra2Scaling big-data-mining-infra2
Scaling big-data-mining-infra2Chris Huang
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
Jen Aman
 
Lightning Talk: What You Need to Know Before You Shard in 20 Minutes
Lightning Talk: What You Need to Know Before You Shard in 20 MinutesLightning Talk: What You Need to Know Before You Shard in 20 Minutes
Lightning Talk: What You Need to Know Before You Shard in 20 Minutes
MongoDB
 

What's hot (20)

Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
 
Sqrrl June Webinar: An Accumulo Love Story
Sqrrl June Webinar: An Accumulo Love StorySqrrl June Webinar: An Accumulo Love Story
Sqrrl June Webinar: An Accumulo Love Story
 
Splunking configfiles 20211208_daniel_wilson
Splunking configfiles 20211208_daniel_wilsonSplunking configfiles 20211208_daniel_wilson
Splunking configfiles 20211208_daniel_wilson
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-time
 
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg SchadSmack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
 
Druid in Spot Instances
Druid in Spot InstancesDruid in Spot Instances
Druid in Spot Instances
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
 
Druid @ branch
Druid @ branch Druid @ branch
Druid @ branch
 
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary DatabaseRedis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
 
ARCHITECTING INFLUXENTERPRISE FOR SUCCESS
ARCHITECTING INFLUXENTERPRISE FOR SUCCESSARCHITECTING INFLUXENTERPRISE FOR SUCCESS
ARCHITECTING INFLUXENTERPRISE FOR SUCCESS
 
Real time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystemReal time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystem
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Big data today and tomorrow
Big data today and tomorrowBig data today and tomorrow
Big data today and tomorrow
 
Performing Network & Security Analytics with Hadoop
Performing Network & Security Analytics with HadoopPerforming Network & Security Analytics with Hadoop
Performing Network & Security Analytics with Hadoop
 
Scaling big-data-mining-infra2
Scaling big-data-mining-infra2Scaling big-data-mining-infra2
Scaling big-data-mining-infra2
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
 
Lightning Talk: What You Need to Know Before You Shard in 20 Minutes
Lightning Talk: What You Need to Know Before You Shard in 20 MinutesLightning Talk: What You Need to Know Before You Shard in 20 Minutes
Lightning Talk: What You Need to Know Before You Shard in 20 Minutes
 

Similar to What's Next for Google's BigTable

Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera

Cloudera, Inc.
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Mike Percy
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Community
 
Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster
Fran Navarro
 
TDC2016SP - Trilha NoSQL
TDC2016SP - Trilha NoSQLTDC2016SP - Trilha NoSQL
TDC2016SP - Trilha NoSQL
tdc-globalcode
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and Architecture
Riccardo Romani
 
Data core overview - haluk-final
Data core overview - haluk-finalData core overview - haluk-final
Data core overview - haluk-final
Haluk Ulubay
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Stefan Lipp
 
Developing for Real-time_Art Anderson.pdf
Developing for Real-time_Art Anderson.pdfDeveloping for Real-time_Art Anderson.pdf
Developing for Real-time_Art Anderson.pdf
Aerospike, Inc.
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshopFang Mac
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
Jason Hubbard
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
Hortonworks
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
Alluxio, Inc.
 
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Hadoop / Spark Conference Japan
 
A5 oracle exadata-the game changer for online transaction processing data w...
A5   oracle exadata-the game changer for online transaction processing data w...A5   oracle exadata-the game changer for online transaction processing data w...
A5 oracle exadata-the game changer for online transaction processing data w...Dr. Wilfred Lin (Ph.D.)
 
Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2
Connor McDonald
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld
 
Oracle Storage a ochrana dat
Oracle Storage a ochrana datOracle Storage a ochrana dat
Oracle Storage a ochrana dat
MarketingArrowECS_CZ
 

Similar to What's Next for Google's BigTable (20)

Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera

 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
 
Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster
 
TDC2016SP - Trilha NoSQL
TDC2016SP - Trilha NoSQLTDC2016SP - Trilha NoSQL
TDC2016SP - Trilha NoSQL
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and Architecture
 
Data core overview - haluk-final
Data core overview - haluk-finalData core overview - haluk-final
Data core overview - haluk-final
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
Developing for Real-time_Art Anderson.pdf
Developing for Real-time_Art Anderson.pdfDeveloping for Real-time_Art Anderson.pdf
Developing for Real-time_Art Anderson.pdf
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshop
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
 
A5 oracle exadata-the game changer for online transaction processing data w...
A5   oracle exadata-the game changer for online transaction processing data w...A5   oracle exadata-the game changer for online transaction processing data w...
A5 oracle exadata-the game changer for online transaction processing data w...
 
Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Oracle Storage a ochrana dat
Oracle Storage a ochrana datOracle Storage a ochrana dat
Oracle Storage a ochrana dat
 

More from Sqrrl

Transitioning Government Technology
Transitioning Government TechnologyTransitioning Government Technology
Transitioning Government Technology
Sqrrl
 
Leveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsLeveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your Hunts
Sqrrl
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your Network
Sqrrl
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting Started
Sqrrl
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)
Sqrrl
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
Sqrrl
 
Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)
Sqrrl
 
Sqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar UsersSqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl
 
Threat Hunting for Command and Control Activity
Threat Hunting for Command and Control ActivityThreat Hunting for Command and Control Activity
Threat Hunting for Command and Control Activity
Sqrrl
 
Modernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingModernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led Training
Sqrrl
 
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Sqrrl
 
The Art and Science of Alert Triage
The Art and Science of Alert TriageThe Art and Science of Alert Triage
The Art and Science of Alert Triage
Sqrrl
 
Reducing Mean Time to Know
Reducing Mean Time to KnowReducing Mean Time to Know
Reducing Mean Time to Know
Sqrrl
 
Sqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl
 
The Linked Data Advantage
The Linked Data AdvantageThe Linked Data Advantage
The Linked Data Advantage
Sqrrl
 
Sqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, AnalyzeSqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl
 
Sqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber HuntingSqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber Hunting
Sqrrl
 
Benchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value StoreBenchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value Store
Sqrrl
 
Scalable Graph Clustering with Pregel
Scalable Graph Clustering with PregelScalable Graph Clustering with Pregel
Scalable Graph Clustering with Pregel
Sqrrl
 
April 2015 Webinar: Cyber Hunting with Sqrrl
April 2015 Webinar: Cyber Hunting with SqrrlApril 2015 Webinar: Cyber Hunting with Sqrrl
April 2015 Webinar: Cyber Hunting with Sqrrl
Sqrrl
 

What's Next for Google's BigTable

  • 1. Securely explore your data WHAT'S NEXT FOR BIGTABLE? Adam Fuchs, CTO Sqrrl Data, Inc. May 22, 2014
  • 2. TODAY’S TALK •  History of the World: Part 3 •  Bigtable/Accumulo Technology Overview •  Accumulo Demonstration •  Database Technology Survey © 2014 Sqrrl Data, Inc. | All Rights Reserved 2
  • 3. TIMELINE OF RELEVANT EVENTS 2006: Google’s BigTable Paper • 2008: NSA Builds Accumulo • 2011: NSA Open Sources Accumulo • 2012: Sqrrl Founded • 2013: 1st Sqrrl Release and Customers
  • 4. Accumulo is a: • Apache Software Foundation (ASF) Open-Source Software Project • Clone of Google’s Bigtable • Secure, Sorted Key-Value Store • Row-level ACID (locally) Distributed NoSQL Database
  • 5. Sqrrl is: • A commercial software company located in Cambridge, MA • A search and Exploration Platform built with Apache Accumulo • An exciting startup with a long roadmap of challenging problems to solve • Hiring!
  • 7. BIGTABLE & ACCUMULO TECH OVERVIEW 1. Data Model & API 2. Underlying Architecture 3. Distinguishing Features
  • 8. ACCUMULO DATA FORMAT An Accumulo key is a 5-tuple, consisting of: • Row: Controls Atomicity • Column Family: Controls Locality • Column Qualifier: Controls Uniqueness • Visibility Label: Controls Access • Timestamp: Controls Versioning
    Accumulo Key/Value Example:
    Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
    John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute …
    John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
    John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
    John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100 …
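The key layout on this slide can be illustrated with a small sketch. This is a plain-Python model, not the Accumulo client API: the names `Key`, `sort_key`, `visible`, and `scan` are made up for illustration, the sort order (ascending by row, family, qualifier, and visibility, newest timestamp first) is stated as an assumption, and the label check handles only the `|` operator that appears in the example table.

```python
from collections import namedtuple

# Plain-Python model of the 5-part key; this is not the Accumulo client API.
Key = namedtuple("Key", "row family qualifier visibility timestamp")

def sort_key(k):
    # Assumed ordering: ascending by row, family, qualifier, visibility,
    # then newest timestamp first.
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)

def visible(label, authorizations):
    # Simplified label check: handles only '|' (any one token suffices).
    # Real visibility expressions also allow '&' and parentheses.
    return any(token in authorizations for token in label.split("|"))

# The four example entries from the slide (elided values kept elided).
entries = [
    (Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513), "1010110110100 …"),
    (Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912), "Patient suffers from an acute …"),
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912), "183"),
    (Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801), "Pass"),
]

def scan(entries, authorizations):
    # Return entries in key order, dropping those the reader may not see.
    ordered = sorted(entries, key=lambda kv: sort_key(kv[0]))
    return [(k, v) for k, v in ordered if visible(k.visibility, authorizations)]
```

With this model, a reader holding only the `PCP_JD` authorization sees the Notes and Cholesterol entries, while a reader holding only `JD` sees the three Test Results entries.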
  • 9. THE ACCUMULO CLIENT API Diagram labels: Instance (new ZooKeeperInstance(...), new MockInstance()) • Connector (getConnector(...)) • TableOperations, InstanceOperations, SecurityOperations • Scanner, BatchScanner (createScanner(...), createBatchScanner(...); Range, IteratorOption; iterator() over Map.Entry of Key, Value) • BatchWriter (createBatchWriter(...); Mutation, addMutation(...))
  • 10. ACCUMULO TABLETS • Collections of KV pairs form Tables • Tables are partitioned into Tablets • Metadata tablets hold info about other tablets, forming a 3-level hierarchy • A Tablet is a unit of work for a Tablet Server
    Diagram labels: Well-Known Location (ZooKeeper) • Root Tablet (-∞ to ∞) • Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”) • Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞)
    Table: Adam’s Table: Data Tablets (-∞ : thing), (thing : ∞)
    Table: Encyclopedia: Data Tablets (-∞ : Ocelot), (Ocelot : Yak), (Yak : ∞)
    Table: Foo: Data Tablet (-∞ to ∞)
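How a row maps to a tablet can be sketched with a binary search over a table's split points. This is a minimal illustration, not Accumulo code: `tablet_ranges` and `locate` are hypothetical helpers, `None` stands in for -∞ and ∞, and the sketch assumes a tablet's end row is inclusive while its start row is exclusive.

```python
from bisect import bisect_left

def tablet_ranges(split_points):
    # n sorted split points define n + 1 tablets; None stands for -inf / +inf.
    bounds = [None] + sorted(split_points) + [None]
    return list(zip(bounds[:-1], bounds[1:]))

def locate(split_points, row):
    # Assumption: a tablet's end row is inclusive and its start is exclusive,
    # so a row equal to a split point lands in the tablet that ends there.
    splits = sorted(split_points)
    return tablet_ranges(splits)[bisect_left(splits, row)]

# Split points of the "Encyclopedia" table from the slide.
encyclopedia = ["Ocelot", "Yak"]
```

For example, `locate(encyclopedia, "Panda")` returns `("Ocelot", "Yak")`, and a table with no split points (like "Foo") has a single tablet `(None, None)`.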
  • 11. ACCUMULO PROCESSES Diagram labels: Application (×3) • Tablet Server (×3), each hosting a Tablet • Master • Zookeeper (×3) • HDFS • Edge labels: Read/Write, Store/Replicate, Assign/Balance, Delegate Authority (×2)
  • 12. TABLET DATA FLOW Diagram labels (within a Tablet): Writes • In-Memory Map • Write Ahead Log (For Recovery) • Minor Compaction • Sorted, Indexed File (×3) • Merging / Major Compaction • Reads • Scan • Iterator Tree (×3)
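The data flow named on this slide can be approximated with a toy model. The sketch below is illustrative only: `ToyTablet` and its methods are not Accumulo classes, the write-ahead log and iterator trees are omitted, and a simple sequence number stands in for the timestamp. It shows writes landing in an in-memory map, a minor compaction flushing that map to a sorted file, scans merging all sorted sources, and a major compaction merging the files into one.

```python
import heapq

class ToyTablet:
    """Toy model of the data flow; names are illustrative, not Accumulo classes."""

    def __init__(self):
        self.memory = {}   # in-memory map: key -> (seq, value)
        self.files = []    # each file: list of (key, seq, value) sorted by key
        self.seq = 0       # write sequence number, stands in for a timestamp

    def write(self, key, value):
        self.seq += 1
        self.memory[key] = (self.seq, value)

    def minor_compact(self):
        # Flush the in-memory map to a new sorted, immutable file.
        if self.memory:
            self.files.append(sorted((k, s, v) for k, (s, v) in self.memory.items()))
            self.memory = {}

    def _merged(self):
        # Merge all sorted sources by key, newest write first within a key.
        sources = self.files + [sorted((k, s, v) for k, (s, v) in self.memory.items())]
        return heapq.merge(*sources, key=lambda e: (e[0], -e[1]))

    def scan(self):
        result, last = [], object()
        for key, _, value in self._merged():
            if key != last:          # keep only the newest version per key
                result.append((key, value))
                last = key
        return result

    def major_compact(self):
        # Rewrite everything on disk as one file, dropping old versions.
        self.minor_compact()
        newest = {}
        for key, seq, value in heapq.merge(*self.files, key=lambda e: (e[0], -e[1])):
            newest.setdefault(key, (key, seq, value))
        self.files = [list(newest.values())]
```

The point of the design is that writes never touch existing files: they only append to memory, and ordering and version resolution are deferred to merge time.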
  • 13. ITERATOR FRAMEWORK Iterator Operations: • File Reads • Block Caching • Merging • Deletion • Isolation • Locality Groups • Range Selection • Column Selection • Cell-level Security • Versioning • Filtering • Aggregation • Partitioned Joins
  • 14. WORD COUNT: SUMMING AGGREGATING ITERATOR Input Corpus
  • 15. ACCUMULO LATENCIES Diagram labels: Ingesters (Input, Batch Writer) • Tablet Servers (In-Memory Map, RFiles, Compaction Iterators, Scan Iterators) • Queriers (Scanner/Batch Scanner, Output) • Latencies: ~ms, ~ms, ~ms, ms-min
  • 16. ACCUMULO THROUGHPUT Diagram labels: Ingesters (Input, Batch Writer) • Tablet Servers (In-Memory Map, RFiles, Compaction Iterators, Scan Iterators) • Queriers (Scanner/Batch Scanner, Output) • Latencies: ~ms, ~ms, ~ms, ms-min • Scan: ~1M entries/s per node • Ingest: ~200K entries/s per node • Read-Modify-Write Latency: ~ms → >1K entries/s challenging with R-M-W
  • 18. R-M-W VS. COMPACTION-TIME AGGREGATION Read/Modify/Write (HBase) vs. Iterators/Combiners (Accumulo)
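The contrast on this slide can be sketched with the word-count example from slide 14. This is a plain-Python illustration, not HBase or Accumulo code, and the function names are hypothetical. The read/modify/write path needs a read before every update, while the combiner path writes partial counts blindly and folds them later, for example at scan or compaction time.

```python
from collections import defaultdict

def rmw_increment(table, word):
    # Read/modify/write: each update needs a read before the write.
    current = table.get(word, 0)   # read
    table[word] = current + 1      # modify + write

def blind_write(log, word):
    # Combiner style: append a partial count without reading anything.
    log.append((word, 1))

def summing_combiner(log):
    # Applied lazily, e.g. at scan or compaction time, to fold partial counts.
    totals = defaultdict(int)
    for word, count in log:
        totals[word] += count
    return sorted(totals.items())

# A made-up input corpus for illustration.
corpus = "the quick brown fox jumps over the lazy dog the end".split()
```

Because summation is associative, the combiner can run over already-combined output and give the same totals, which is what lets aggregation happen incrementally across compactions.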
  • 19. SURVEY OF DATABASE TECHNOLOGY • Exercises in Center-Seeking • SQL vs. NoSQL • Ingest-time vs. Query-time Analytics • ACID vs. BASE • Normalized vs. Denormalized Data Models • Primary Use Cases for Sqrrl+Accumulo
  • 20. SQL VS. NOSQL NoSQL: • Optimized for get/put operations • Specialized for client languages • High concurrency • More client-side control Hybrid: • Extend and evolve SQL • Standardize and incorporate NoSQL paradigms SQL: • Optimized for joins • Strong mathematical roots in set theory • Automatic query optimization
  • 21. INGEST-TIME VS. QUERY-TIME ANALYTICS Ingest-Time: • Optimized for online statistics • Can reduce storage footprint • Can be indexed for low latency • Leverages a variety of indexes • Requires extensive data organization at ingest Hybrid: • Create partial summary at ingest (Question-focused datasets, knowledge bases, etc.) • Support ad-hoc queries over summaries • Leverage all known indexing strategies ** Query-Time: • Can compute holistic statistics, like ranking, topN, etc. • Ad-hoc analytics: don’t know the query ahead of time • High latency and low concurrency at scale • Leverages block indexes, columnar layout • Ingest can be “stream to disk”
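A minimal sketch of the hybrid approach, assuming a made-up event shape of (user, url): a partial summary is maintained at ingest, ranking is computed at query time over that summary, and questions nobody planned for fall back to a scan of the raw data. `HybridStore` and its methods are hypothetical names, not part of Sqrrl or Accumulo.

```python
from collections import Counter

class HybridStore:
    """Illustrative only: partial summary at ingest, ad-hoc query later."""

    def __init__(self):
        self.raw = []                 # "stream to disk" copy of every event
        self.by_user = Counter()      # ingest-time summary: events per user

    def ingest(self, user, url):
        self.raw.append((user, url))
        self.by_user[user] += 1       # online statistic, updated per event

    def count_for(self, user):
        # Low-latency lookup answered from the summary alone.
        return self.by_user[user]

    def top_n_users(self, n):
        # Holistic statistic (ranking) computed at query time over the summary.
        return self.by_user.most_common(n)

    def urls_for(self, user):
        # Ad-hoc query with no precomputed index: full scan of the raw data.
        return [url for u, url in self.raw if u == user]
```

The trade-off mirrors the slide: `count_for` is cheap because the work was done at ingest, while `urls_for` stays flexible at the cost of scanning everything.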
  • 22. ACID VS. BASE ACID: • Atomicity: all or nothing for a group of operations • Consistency and Isolation: support simple reasoning for distributed, multithreaded clients • Durability: simple reasoning for whether data might be lost Hybrid: • Must make some relaxations for performance at scale (under failure modes) • Many options for “Lightweight” transaction support • Accumulo limits atomicity, consistency, and isolation to row-level operations BASE: • Basically Available: ensure that core operations always complete in an advertised time • Soft-State: relaxation of referential integrity, etc. • Eventual Consistency: relaxation of
  • 23. NORMALIZED VS. DENORMALIZED DATA MODELS Normalized: • “Normal Form Relational Database” • Minimizes data footprint • Minimizes cost of data maintenance • Can lead to expensive joins at query time Hybrid: • Start with document store • Introduce links/edges for quick joins • Dynamically adapt to flexible or sparse schemas • Similar to property graphs Denormalized: • “Document Store” • Flexible schema lets applications adapt quickly to changing environments • Pre-joined to eliminate joins at query-time • Optimized for “append-only” data • Can inflate data sizes and slow data ingest
  • 24. KNOWLEDGE-BASE USE CASE Data sources: HR • Netflow • Proxy Logs • Email • Social Media
    Example log record: 2014-04-14 06:36:09 429 73.105.179.202 username@msn.com 500 POST application/json HTTPS "wikipedia.org:443/grouchinesses/?215=felled&297=wading&768=shimmies..." "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.43 Safari/537.31" 208.80.152.201
  • 25. STREAM PROCESSING USE CASE 1. SPE queries Sqrrl to enrich streaming data 2. SPE persists results in Sqrrl for future query 3. SPE takes action automatically 4. SPE issues data-driven alerts 5. Sqrrl provides context for dashboards 6. Analysis tools use Sqrrl to search and manipulate historical data
    Diagram labels: DATA • SPE • Dashboards • Actions • Interactive Analysis Tools (Discovery + Forensics)
  • 26. SQRRL OPERATIONALIZES ACCUMULO WITH... • Data-Centric Security • Petabyte Scale and Operational Speeds • Document and Graph Data Models • SqrrlQL, including Aggregates, Secure Full-Text Search, and Secure Graph Search • Analytics, including Real-Time Statistics and Hadoop Integrations
  • 27. MODERNIZING VISUALIZATION Sqrrl is building the next generation of operational analytics visualizations
  • 28. UPCOMING EVENTS Accumulo Summit 2014 • June 12 in College Park, MD • http://accumulosummit.com • Multiple tracks of talks from the leaders of the Accumulo community IEEE HPEC Conference 2014 • September 9-11 in Waltham, MA • http://www.ieee-hpec.org/ • Accumulo Users Group Meeting as a Special Event • Accumulo tutorial Watch for more meetup opportunities coming soon!