HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving

HBaseCon
Warcbase: Scaling ‘Out’, ‘In’, and
‘Down’ HBase for Web Archiving
Jimmy Lin
University of Maryland
@lintool
Ian Milligan
University of Waterloo
@ianmilligan1
HBaseCon2015
Thursday, May 7, 2015
Source: Wikipedia (All Souls College, Oxford)
From the Ivory Tower…
Source: Wikipedia (Factory)
… to building sh*t that works
Gupta et al. WTF: The Who to Follow Service at Twitter. WWW 2013.
Lin and Kolcz. Large-Scale Machine Learning at Twitter. SIGMOD 2012.
Busch et al. Earlybird: Real-Time Search at Twitter. ICDE 2012.
Mishne et al. Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture. SIGMOD 2013.
Leibert et al. Automatic Management of Partitioned, Replicated Search Services. SoCC 2011.
I worked on…
– data products to surface relevant content to users
– analytics infrastructure to support data science
Source: https://www.flickr.com/photos/bongtongol/3491316758/
circa ~2010
~150 people total
~60 Hadoop nodes
~6 people use analytics stack daily
circa ~2012
~1400 people total
10s of Ks of Hadoop nodes, multiple DCs
10s of PBs total Hadoop DW capacity
~100 TB ingest daily
dozens of teams use Hadoop daily
10s of Ks of Hadoop jobs daily
Source: Wikipedia (All Souls College, Oxford)
And back!
“That sucks.”
Big Data for Good
Source: http://www.flickr.com/photos/yoononn/3300080563/
“The best minds of my generation are thinking about how
to make people click ads.”
– Jeff Hammerbacher
Source: http://images3.nick.com/nick-assets/shows/images/house-of-anubis/flipbooks/hidden-room/hidden-room-04.jpg
Big Data for Good
Cultural Heritage Preservation
Source: Wikipedia (Opposition to United States involvement in the Vietnam War)
When does an event become “history”?
~20-30 years later
The history of the 1960s was written in the 1980s!
“This scandal was brought to you by the digital revolution.
That meant we could access all the information we wanted,
when we wanted it, anytime, anywhere, and when the story
broke in January 1998, it broke online. It was the first time
the traditional news was usurped by the Internet for a major
news story, a click that reverberated around the world…
… it was before social media, but people could still comment
online, email stories, and, of course, email cruel jokes. News
sources plastered photos of me all over to sell newspapers,
banner ads online, and to keep people tuned to the TV.”
Monica Lewinsky – The Price of Shame
TED Talk, March 2015
So, where are those web pages now?
When does an event become “history”?
~20-30 years later
Historians are getting ready to write about the 90s…
Right about now!
Can you write a history of the 1990s without the web?
So, where are those web pages now?
Oct 25, 2001
… to look back over any
of the 10 billion web pages
archived… since 1996 …
Internet Archive’s Archive-It Service: WAaaS
And beyond…
70+ web archiving efforts worldwide!
(fortunately or unfortunately)
… don’t really care
What would a web archiving platform built on
modern big data infrastructure look like?
Wayback machine: monolithic Tomcat application
Desiderata
Scalable storage of archived data
Efficient random access
Scalable processing and analytics
Scalable storage and access of derived data
Some work by the Internet Archive,
Common Crawl, and others…
Petabox by Internet Archive;
NAS, SAN, etc. by others
Ad hoc storage in
flat text WAT files
Desiderata
Scalable storage of archived data: HDFS
Efficient random access: HBase
Scalable processing and analytics: Hadoop
Scalable storage and access of derived data: HBase
Warcbase
An open-source platform for managing web archives built on Hadoop and HBase
http://warcbase.org/
Source: Wikipedia (Archive)
WARC = Web ARChive (file format)
Bigtable use case: storing web crawls!
Row key: domain-reversed URL
Column family: contents
Raw data
Value: raw HTML
Column qualifier:
Column family: anchor
Derived data
Value: anchor text
Column qualifier: source URL
Warcbase: HBase data model
Row key: domain-reversed URL
Column family: c
Value: raw source
Column qualifier: MIME type
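A domain-reversed row key keeps all pages from one site, and all sites under one domain, next to each other in HBase's sorted key space. The sketch below is one plausible way to build such a key. The function name `url_to_row_key` and the exact key layout (reversed host followed by path and query) are assumptions for illustration, not Warcbase's actual code.

```python
from urllib.parse import urlsplit


def url_to_row_key(url: str) -> str:
    """Build a domain-reversed key, e.g. www.house.gov/x -> gov.house.www/x.

    Hypothetical helper: the real Warcbase key format may differ.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    reversed_host = ".".join(reversed(host.split(".")))
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return reversed_host + path


urls = [
    "http://www.house.gov/index.html",
    "http://www.senate.gov/",
    "http://clerk.house.gov/members/",
]

# Sorting the keys mimics HBase's lexicographic row ordering:
# both house.gov hosts end up adjacent.
keys = sorted(url_to_row_key(u) for u in urls)
for key in keys:
    print(key)
```

With keys in this form, a scan over the prefix `gov.house.` touches every captured host under house.gov without visiting unrelated rows.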
Why this?
“Fat Row”
Timestamp: crawl date
Saves a query…
But you need to do a scan…
In most cases, you need to do
a short scan anyway….
WARC data
Ingestion
Applications
and Services
Processing  Analytics
text analysis, link analysis, …
Scalability?
We got 99 problems but scalability ain’t one…
– Jay-Z
Scalability of Warcbase limited by Hadoop/HBase
Applications are lightweight clients
WARC data
Ingestion
Applications
and Services
Processing  Analytics
text analysis, link analysis, …
Sample dataset: crawl of the 108th U.S. Congress
Monthly snapshots, January 2003 to January 2005
1.15 TB gzipped ARC files
29 million captures of 7.8 million unique URLs
23.8 million captures are HTML files
Hadoop/HBase cluster:
16 nodes, dual quad-core processors, 3 × 2TB disks each
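A quick back-of-the-envelope check puts these figures in proportion. The snippet only rearranges the numbers quoted above; the raw disk total ignores replication and filesystem overhead, which the slides do not specify.

```python
# Figures quoted on the slide
captures = 29_000_000
unique_urls = 7_800_000
html_captures = 23_800_000
nodes, disks_per_node, tb_per_disk = 16, 3, 2

captures_per_url = captures / unique_urls
html_share = html_captures / captures
raw_disk_tb = nodes * disks_per_node * tb_per_disk  # before any replication

print(f"{captures_per_url:.1f} captures per unique URL")
print(f"{html_share:.0%} of captures are HTML")
print(f"{raw_disk_tb} TB raw disk across the cluster")
```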
Temporal Browsing
View a particular version of a captured web page
Follow links to contemporaneous pages
Internet Archive’s Wayback machine
OpenWayback
OpenWayback + Warcbase Integration
Implementation:
OpenWayback frontend for rendering
Offloads storage management to HBase
Transparently scales out with HBase
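Temporal browsing comes down to picking, for a given URL, the capture whose crawl date is closest to the date the user asked for. The sketch below models one row's versions as an in-memory dictionary keyed by crawl timestamp. The `captures` data and the `closest_capture` helper are hypothetical stand-ins, not the OpenWayback or HBase API.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical stand-in for the versions of one row:
# crawl timestamp -> (MIME type, raw content)
captures = {
    datetime(2003, 1, 15): ("text/html", "<html>January 2003</html>"),
    datetime(2004, 1, 15): ("text/html", "<html>January 2004</html>"),
    datetime(2005, 1, 15): ("text/html", "<html>January 2005</html>"),
}


def closest_capture(versions, requested):
    """Return (timestamp, cell) for the capture nearest to the requested date."""
    if not versions:
        raise LookupError("no captures stored for this URL")
    stamps = sorted(versions)
    i = bisect_left(stamps, requested)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = stamps[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda ts: abs(ts - requested))
    return best, versions[best]


when, (mime, body) = closest_capture(captures, datetime(2003, 12, 1))
print(when.date(), mime, body)
```

Following a link to a "contemporaneous" page would repeat the same lookup for the target URL with the timestamp of the page being viewed.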
Topic Model Explorer
Implementation
LDA on each temporal slice
Adaptation of Termite visualization
[Figure: Termite-style visualization of 20 topics (Topic 0 to Topic 19) against terms including forces, development, community, local, state, school, children, international, foreign, nations, grants, training, schools, students, college, services, president, funding, states, year, united, federal, government, grant, program, support, programs, education, child, security, national]
Webgraph Explorer
Implementation
Link extraction with Hadoop, site-level aggregation
Computation of standard graph statistics
d3 interactive visualization
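Link extraction followed by site-level aggregation can be illustrated on a single machine with the standard library. This is only a sketch of the idea: the deck describes running it with Hadoop, and the class and function names here (`LinkExtractor`, `site_level_edges`) are invented for the example.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def site_of(url):
    """Treat the lowercased host name as the 'site' (an assumption)."""
    return (urlsplit(url).hostname or "").lower()


def site_level_edges(pages):
    """Aggregate page-level links into weighted site-to-site edges."""
    edges = Counter()
    for url, html in pages:
        parser = LinkExtractor(url)
        parser.feed(html)
        src = site_of(url)
        for link in parser.links:
            dst = site_of(link)
            if dst and dst != src:  # skip internal and non-HTTP links
                edges[(src, dst)] += 1
    return edges


pages = [
    ("http://www.house.gov/index.html",
     '<a href="http://www.senate.gov/">Senate</a> <a href="/about">About</a>'),
    ("http://www.house.gov/news.html",
     '<a href="http://www.senate.gov/bills">Bills</a>'),
]
edges = site_level_edges(pages)
print(edges)
```

The resulting weighted edge list is the kind of input from which graph statistics or a d3 visualization could be computed.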
Scaling Options
Scale “out”: use lots of commodity machines
Scale “up”: use a beefier machine
Scale “in”: use a single commodity machine
Warcbase: here.
Mac Pro (Late 2013)
3 GHz 8-Core Intel Xeon E5
64 GB RAM, 1TB SSD
OS X 10.10.2
Scale “In”: Experiments
Wide Web scrape (2011) – Internet Archive wide0002 crawl
Sample of 100 WARC files (~100 GB)
Warcbase running on the Mac Pro:
Ingestion in ~1 hour
Extraction of webgraph using Pig in ~55 minutes
Result: 17m links, .ca subset of 1.7m links
Visualization in Gephi…
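Selecting a national subset such as .ca from the extracted webgraph is a simple filter over the edge list. The sketch below assumes an edge is kept only when both endpoints fall under the top-level domain; the slides do not say whether the 1.7m-link subset was defined that way, and the sample edges are made up.

```python
def in_tld(site, tld):
    """True if the host name is the TLD itself or ends with '.<tld>'."""
    return site == tld or site.endswith("." + tld)


def tld_subgraph(edges, tld):
    """Keep edges whose source and target are both under the given TLD."""
    return {
        (src, dst): weight
        for (src, dst), weight in edges.items()
        if in_tld(src, tld) and in_tld(dst, tld)
    }


# Made-up site-level edges: (source site, target site) -> link count
edges = {
    ("uwaterloo.ca", "cbc.ca"): 4,
    ("uwaterloo.ca", "umd.edu"): 2,
    ("nytimes.com", "cbc.ca"): 1,
    ("mocha.cafe", "cbc.ca"): 3,  # ends in "ca" but is not under .ca
}
ca_edges = tld_subgraph(edges, "ca")
print(ca_edges)
```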
Internet Archive Wide Web Scrape from 2011
What’s the big deal?
Historians probably can’t afford Hadoop clusters…
But they can probably afford a Mac Pro
How will this change historical scholarship?
Visual graph analysis on longitudinal data, select subsets for further textual analysis
Drill down to examine individual pages
… all on your desktop!
Scaling Options
Scale “out”: use lots of commodity machines
Scale “up”: use a beefier machine
Scale “in”: use a single commodity machine
Scale “down”: use a wimpy machine
Warcbase: here.
Raspberry Pi B+ (2012)
Broadcom BCM2835 SoC
700 MHz ARM11 processor
512 MB RAM
up to 128 GB storage (microSD)
Raspbian Linux
Scale “Down”: Experiments
Columbia University’s Human Rights Web Archive
43 GB of crawls from 2008 (1.68 million records)
Warcbase running on the Raspberry Pi (standalone mode)
Ingestion in ~27 hours (17.3 records/second)
Random browsing (avg over 100 pages): 2.1s page load
Same pages in Internet Archive: 2.9s page load
Draws 2.4 Watts
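The ingestion figures are mutually consistent, as a short calculation shows. The energy estimate assumes the Pi draws a constant 2.4 W for the whole ingest, which the slide does not state explicitly.

```python
# Figures quoted on the slide
records = 1_680_000
records_per_second = 17.3
power_watts = 2.4

ingest_seconds = records / records_per_second
ingest_hours = ingest_seconds / 3600

# Assumption: constant 2.4 W draw for the full ingest
energy_wh = power_watts * ingest_hours

print(f"ingest time: {ingest_hours:.1f} hours")
print(f"energy used: {energy_wh:.0f} Wh")
```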
Jimmy Lin. Scaling Down Distributed Infrastructure on Wimpy Machines for Personal Web Archiving. Temporal Web Analytics Workshop 2015.
What’s the big deal?
Store every page you’ve ever visited in your pocket!
Throw in search, lightweight analytics, …
When was the last time you searched to refind?
What will you do with the web in your pocket?
How will this change how you interact with the web?
Source: http://images3.nick.com/nick-assets/shows/images/house-of-anubis/flipbooks/hidden-room/hidden-room-04.jpg
In conclusion…
Interesting big data problem
Underexplored
“Warm and fuzzy”
Questions?
1 of 54

Recommended

Presto & differences between popular SQL engines (Spark, Redshift, and Hive) by
Presto & differences between popular SQL engines (Spark, Redshift, and Hive)Presto & differences between popular SQL engines (Spark, Redshift, and Hive)
Presto & differences between popular SQL engines (Spark, Redshift, and Hive)Holden Ackerman
3.2K views23 slides
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop by
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopHadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopDataWorks Summit
3.9K views37 slides
Hadoop summit 2010, HONU by
Hadoop summit 2010, HONUHadoop summit 2010, HONU
Hadoop summit 2010, HONUJerome Boulon
1.1K views22 slides
Apache Kylin @ Big Data Europe 2015 by
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Seshu Adunuthula
2.6K views50 slides
Tailored for Spark by
Tailored for SparkTailored for Spark
Tailored for SparkDataWorks Summit/Hadoop Summit
588 views17 slides
Hadoop at Ebay by
Hadoop at EbayHadoop at Ebay
Hadoop at EbayAroop Maliakkal
2.5K views21 slides

More Related Content

What's hot

HBaseCon 2015: Industrial Internet Case Study using HBase and TSDB by
HBaseCon 2015: Industrial Internet Case Study using HBase and TSDBHBaseCon 2015: Industrial Internet Case Study using HBase and TSDB
HBaseCon 2015: Industrial Internet Case Study using HBase and TSDBHBaseCon
5.6K views23 slides
Graph databases: Tinkerpop and Titan DB by
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBMohamed Taher Alrefaie
2K views27 slides
Introduction to TitanDB by
Introduction to TitanDB Introduction to TitanDB
Introduction to TitanDB Knoldus Inc.
5.9K views14 slides
Stream Processing and Real-Time Data Pipelines by
Stream Processing and Real-Time Data PipelinesStream Processing and Real-Time Data Pipelines
Stream Processing and Real-Time Data PipelinesVladimír Schreiner
247 views37 slides
Building a Virtual Data Lake with Apache Arrow by
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowDremio Corporation
8.1K views20 slides
Data infrastructure architecture for medium size organization: tips for colle... by
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...DataWorks Summit/Hadoop Summit
1.7K views30 slides

What's hot(20)

HBaseCon 2015: Industrial Internet Case Study using HBase and TSDB by HBaseCon
HBaseCon 2015: Industrial Internet Case Study using HBase and TSDBHBaseCon 2015: Industrial Internet Case Study using HBase and TSDB
HBaseCon 2015: Industrial Internet Case Study using HBase and TSDB
HBaseCon5.6K views
Introduction to TitanDB by Knoldus Inc.
Introduction to TitanDB Introduction to TitanDB
Introduction to TitanDB
Knoldus Inc.5.9K views
Building a Virtual Data Lake with Apache Arrow by Dremio Corporation
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache Arrow
Dremio Corporation8.1K views
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B... by Chris Huang
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
Chris Huang1.4K views
Apache Kylin Extreme OLAP Engine for Big Data by Luke Han
Apache Kylin Extreme OLAP Engine for Big DataApache Kylin Extreme OLAP Engine for Big Data
Apache Kylin Extreme OLAP Engine for Big Data
Luke Han2.9K views
Qubole @ AWS Meetup Bangalore - July 2015 by Joydeep Sen Sarma
Qubole @ AWS Meetup Bangalore - July 2015Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015
Joydeep Sen Sarma1.7K views
Apache Kylin 1.5 Updates by Yang Li
Apache Kylin 1.5 UpdatesApache Kylin 1.5 Updates
Apache Kylin 1.5 Updates
Yang Li1.1K views
Hadoop at Yahoo! -- Hadoop World NY 2009 by yhadoop
Hadoop at Yahoo! -- Hadoop World NY 2009Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009
yhadoop1.5K views
Apache Kylin’s Performance Boost from Apache HBase by HBaseCon
Apache Kylin’s Performance Boost from Apache HBaseApache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBase
HBaseCon3.5K views
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture by Skillspeed
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Skillspeed11.2K views
Apache Kylin on HBase: Extreme OLAP engine for big data by Shi Shao Feng
Apache Kylin on HBase: Extreme OLAP engine for big dataApache Kylin on HBase: Extreme OLAP engine for big data
Apache Kylin on HBase: Extreme OLAP engine for big data
Shi Shao Feng1.6K views
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ... by Luke Han
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
Luke Han3.1K views
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop by HBaseCon
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon3.3K views
Exploring BigData with Google BigQuery by Dharmesh Vaya
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
Dharmesh Vaya12.4K views
Large Scale Graph Analytics with JanusGraph by P. Taylor Goetz
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
P. Taylor Goetz19.1K views

Viewers also liked

HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase by
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBaseHBaseCon
3.2K views30 slides
Real-time HBase: Lessons from the Cloud by
Real-time HBase: Lessons from the CloudReal-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudHBaseCon
4.5K views35 slides
HBaseCon 2015: HBase Operations in a Flurry by
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon
4.1K views22 slides
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S... by
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.
4.7K views11 slides
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B... by
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...HBaseCon
4.1K views54 slides
Rolling Out Apache HBase for Mobile Offerings at Visa by
Rolling Out Apache HBase for Mobile Offerings at Visa Rolling Out Apache HBase for Mobile Offerings at Visa
Rolling Out Apache HBase for Mobile Offerings at Visa HBaseCon
2.6K views39 slides

Viewers also liked(20)

HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase by HBaseCon
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon3.2K views
Real-time HBase: Lessons from the Cloud by HBaseCon
Real-time HBase: Lessons from the CloudReal-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the Cloud
HBaseCon4.5K views
HBaseCon 2015: HBase Operations in a Flurry by HBaseCon
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon4.1K views
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S... by Cloudera, Inc.
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
Cloudera, Inc.4.7K views
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B... by HBaseCon
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
HBaseCon4.1K views
Rolling Out Apache HBase for Mobile Offerings at Visa by HBaseCon
Rolling Out Apache HBase for Mobile Offerings at Visa Rolling Out Apache HBase for Mobile Offerings at Visa
Rolling Out Apache HBase for Mobile Offerings at Visa
HBaseCon2.6K views
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace by HBaseCon
HBaseCon 2015: Solving HBase Performance Problems with Apache HTraceHBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon4.5K views
Update on OpenTSDB and AsyncHBase by HBaseCon
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
HBaseCon2.6K views
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems by Cloudera, Inc.
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
Cloudera, Inc.6.1K views
Digital Library Collection Management using HBase by HBaseCon
Digital Library Collection Management using HBaseDigital Library Collection Management using HBase
Digital Library Collection Management using HBase
HBaseCon3.1K views
HBase Data Modeling and Access Patterns with Kite SDK by HBaseCon
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
HBaseCon4.7K views
HBase at Bloomberg: High Availability Needs for the Financial Industry by HBaseCon
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBaseCon6.7K views
Content Identification using HBase by HBaseCon
Content Identification using HBaseContent Identification using HBase
Content Identification using HBase
HBaseCon3.8K views
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS by HBaseCon
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWSHBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS
HBaseCon4K views
Apache HBase in the Enterprise Data Hub at Cerner by HBaseCon
Apache HBase in the Enterprise Data Hub at CernerApache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at Cerner
HBaseCon2.1K views
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster by Cloudera, Inc.
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
Cloudera, Inc.7.5K views
Apache HBase Improvements and Practices at Xiaomi by HBaseCon
Apache HBase Improvements and Practices at XiaomiApache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at Xiaomi
HBaseCon4.8K views
HBaseCon 2015: HBase @ CyberAgent by HBaseCon
HBaseCon 2015: HBase @ CyberAgentHBaseCon 2015: HBase @ CyberAgent
HBaseCon 2015: HBase @ CyberAgent
HBaseCon5.5K views
HBaseCon 2013: Full-Text Indexing for Apache HBase by Cloudera, Inc.
HBaseCon 2013: Full-Text Indexing for Apache HBaseHBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBase
Cloudera, Inc.7.3K views
HBaseCon 2015: Analyzing HBase Data with Apache Hive by HBaseCon
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon7.9K views

Similar to HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving

Addressing dm-cloud by
Addressing dm-cloudAddressing dm-cloud
Addressing dm-cloudGenoveva Vargas-Solar
536 views54 slides
Sticking between: mashup in libraries by
Sticking between: mashup in librariesSticking between: mashup in libraries
Sticking between: mashup in librariesBonaria Biancu
2K views30 slides
Internet Mashups by
Internet MashupsInternet Mashups
Internet MashupsCesare Pautasso
524 views59 slides
Episode 3(3): Birth & explosion of the World Wide Web - Meetup session11 by
Episode 3(3): Birth & explosion of the World Wide Web - Meetup session11Episode 3(3): Birth & explosion of the World Wide Web - Meetup session11
Episode 3(3): Birth & explosion of the World Wide Web - Meetup session11William Hall
258 views21 slides
Lodlam presentation v1.0 final al20151104 by
Lodlam presentation v1.0 final al20151104Lodlam presentation v1.0 final al20151104
Lodlam presentation v1.0 final al20151104Asa Letourneau
804 views25 slides
Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017 by
 Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017 Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017
Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017BigchainDB
1.7K views85 slides

Similar to HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving(20)

Sticking between: mashup in libraries by Bonaria Biancu
Sticking between: mashup in librariesSticking between: mashup in libraries
Sticking between: mashup in libraries
Bonaria Biancu2K views
Episode 3(3): Birth & explosion of the World Wide Web - Meetup session11 by William Hall
Episode 3(3): Birth & explosion of the World Wide Web - Meetup session11Episode 3(3): Birth & explosion of the World Wide Web - Meetup session11
Episode 3(3): Birth & explosion of the World Wide Web - Meetup session11
William Hall258 views
Lodlam presentation v1.0 final al20151104 by Asa Letourneau
Lodlam presentation v1.0 final al20151104Lodlam presentation v1.0 final al20151104
Lodlam presentation v1.0 final al20151104
Asa Letourneau804 views
Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017 by BigchainDB
 Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017 Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017
Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017
BigchainDB1.7K views
Scale as a Competitive Advantage by David Chou
Scale as a Competitive AdvantageScale as a Competitive Advantage
Scale as a Competitive Advantage
David Chou1.6K views
Big data and APIs for PHP developers - SXSW 2011 by Eli White
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011
Eli White13.4K views
Reflections on 10 years of the Institutional Web by lisbk
Reflections on 10 years of the Institutional WebReflections on 10 years of the Institutional Web
Reflections on 10 years of the Institutional Web
lisbk4.6K views
ContentMine: Open Data and Social Machines by petermurrayrust
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machines
petermurrayrust2.6K views
The Elephant in the Library - Integrating Hadoop by cneudecker
The Elephant in the Library - Integrating HadoopThe Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating Hadoop
cneudecker388 views
The Digital Library from Information Superhighway to the Semiotic Web by Martin Kalfatovic
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
Martin Kalfatovic369 views
NAG2007 by daveyp
NAG2007NAG2007
NAG2007
daveyp1.1K views
Another history of the Web from its architecture by Alexandre Monnin
Another history of the Web from its architectureAnother history of the Web from its architecture
Another history of the Web from its architecture
Alexandre Monnin4.6K views
Semantic Web: In Quest for the Next Generation Killer Apps by Jie Bao
Semantic Web: In Quest for the Next Generation Killer AppsSemantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer Apps
Jie Bao2.6K views

More from HBaseCon

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes by
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on KubernetesHBaseCon
3.9K views36 slides
hbaseconasia2017: HBase on Beam by
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on BeamHBaseCon
1.3K views26 slides
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei by
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at HuaweiHBaseCon
1.4K views21 slides
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest by
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in PinterestHBaseCon
936 views42 slides
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程 by
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程HBaseCon
1.1K views21 slides
hbaseconasia2017: Apache HBase at Netease by
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at NeteaseHBaseCon
754 views27 slides

More from HBaseCon(20)

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes by HBaseCon
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
HBaseCon3.9K views
hbaseconasia2017: HBase on Beam by HBaseCon
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
HBaseCon1.3K views
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei by HBaseCon
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
HBaseCon1.4K views
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest by HBaseCon
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon936 views
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程 by HBaseCon
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
HBaseCon1.1K views
hbaseconasia2017: Apache HBase at Netease by HBaseCon
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
HBaseCon754 views
hbaseconasia2017: HBase在Hulu的使用和实践 by HBaseCon
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
HBaseCon878 views