SlideShare a Scribd company logo
1 of 23
Not your Dad’s Old HBase
Gilad Moscovitch - Senior Consultant UXC PS
@moscovig
Yaniv Rodenski - Principal Consultant UXC PS
@YRodenski
Agenda
Our use cases
Introduction to Apache
Phoenix
The first use case -
retrospective
Managing a large scale
Graph with TitanDB
The second use case -
retrospective
The Cable Company
Our story starts with a
cable company that
grew:
Over a decade ago,
bought an ISP
Bought a mobile
network
Started new ventures
such as VOD and VoIP
Our Dataset
Billions of records (PB scale)
Countless number of formats:
Multiple systems
Network equipment
Devices
Dynamic data model
New devices are introduced frequently
(on average every two weeks)
New demands are introduced even
more frequently
The Cable Guys:
Gilad Moscovitch
Engineering Manager
Yaniv Rodenski
Architect in the CTO team
Our Starting Point:
Devices
Systems of
Records
ETL via
ODI
Oracle Exadata
Challenges
The Oracle Data Warehouse and ODI could
not handle the load
ETL devs could not handle the load, the ETL
team became a bottleneck
Not all data types arrive at the warehouse
We had to prioritise due to lack of ETL devs
Incompatibility with the existing data model
Changes to the data model would take an
average of a month
Even when data was loaded, analysts were
not aware of the new tables, and we ended up
with an unusable schema
More Challenges
New data models that are not a
good fit for SQL databases:
Sparse data
Geospatial data
Full text
Graph
Need to ask harder questions
that require heavy processing:
Machine learning
Breaking Out
The new data platform
was Hadoop based
Using CDH (at that
time the most
advanced option)
Trying to reuse existing
components of the
platform as much as
possible
Challenge #1: Early Data
Access
Giving analysts, BI
developers and
business access to
raw data
For this use case we
reviewed a few tools,
including Apache
Phoenix
Apache Phoenix - SQL on
HBase
Apache Phoenix is a relational database layer over HBase with a
difference:
Table metadata is stored in an HBase table and versioned,
snapshot queries over prior versions will automatically use the
correct schema
Secondary indexes
Dynamic columns with schema on read
Views
Indexed
Updatable
Demo - Apache Phoenix
Challenge no 1: Results
In addition to Phoenix we also looked at Hive and Impala
Spark SQL, Presto and Drill were not considered due to immaturity
Impala was chosen
Schema on read was important
Hive on CDH doesn’t support Tez
Apache Phoenix was overkill and better suited to be a database rather than a
warehouse
Challenge no 2: Family
Time
Clients are never represented
by a single entity:
Households
Business
Clients have multiple devices
generating data:
Home and mobile phones
IP adresses for devices
DVRs
Titan - A Distributed Graph
Titan is a scalable graph database
Optimized for storing and querying graphs
Runs on top of:
Cassandra
HBase
DynamoDB
BerkeleyDB
Support for geo, numeric range, and full-text search via:
ElasticSearch
SolR
Supports Gremlin - a graph querying DSL via
Tinkerpop Gremlin over HTTP
Demo - Clash of the Titan
Challenge #2: Testing Stage
Hbase vs Cassandra benchmark + sanity check
Simulation for 1 billion Vertices
Sanity check- OK
Not much difference in loading time and querying time on both stores
HBase chosen because of the existing infrastructure
Retrospective: 1 billion Vertices on an empty graph didn’t really simulate anythin
Challenge #2: POC Stage
Initializing an untuned Hbase Cluster on all 24 nodes of the existing cluster
Hosted side by side with Map Reduce and Impala
Developing initial ontology for the largest data source together with a developer
from the client application team
Developing Map Reduce for loading hundreds of GB a day according to the
ontology
POC Performance
Input Data was stored in hourly directories so at first we scheduled the Map
Reduce for each hour.
An hour took about 40 minutes to process and load.
Later on - scheduled the Map-Reduce for a whole day at a time. The whole
day loading took about half a day.
ap-Reduce jobs create new challenges - Hold lots of reducers for a long time, not fun to re
Performance Tuning
HBase didn't handle the load, the symptoms included
HBase write-blocking compactions
Retired region servers
Tuning performed:
Region split size - split after 11 GB
Memstore flush size tuning
GC Tuning
Java Heap size decreasing from 32 to 16
Daily major compaction for the graph table
Retrospective: We had to statically partition to two different clusters:
One for HBase, and one for everything else
Today
The main graph ingests:
~1.7 billion edges
~1.7 billion vertices
The main graph size is 20TB
20 region servers
Rebuilding the graph on average every 3 months for new ontology
New data sources are added within a day by one (awesome) developer
Using a web based UI tool for graph exploration
Retrospective: Titan on HBase works pretty well for those sizes
Summary
HBase is a versatile datastore
Apache Phoenix modernises HBase with semi-relational
SQL layer
Titan provides powerful graph capabilities
Never be naive about Big Data tools, they will bite you,
badly
Next month:
Karel Alfonso
Apache Flink Ned Shawa
Apache NiFi

More Related Content

What's hot

Introduction to REST - API
Introduction to REST - APIIntroduction to REST - API
Introduction to REST - APIChetan Gadodia
 
Building Ext JS Using HATEOAS - Jeff Stano
Building Ext JS Using HATEOAS - Jeff StanoBuilding Ext JS Using HATEOAS - Jeff Stano
Building Ext JS Using HATEOAS - Jeff StanoSencha
 
Give a REST to your LDAP directory services
Give a REST to your LDAP directory servicesGive a REST to your LDAP directory services
Give a REST to your LDAP directory servicesLDAPCon
 
Birds Eye View on API Development - v1.0
Birds Eye View on API Development - v1.0Birds Eye View on API Development - v1.0
Birds Eye View on API Development - v1.0API Talent
 
Webservices Overview : XML RPC, SOAP and REST
Webservices Overview : XML RPC, SOAP and RESTWebservices Overview : XML RPC, SOAP and REST
Webservices Overview : XML RPC, SOAP and RESTPradeep Kumar
 
RESTful Web Service using Swagger
RESTful Web Service using SwaggerRESTful Web Service using Swagger
RESTful Web Service using SwaggerHong-Jhih Lin
 
Building Killer RESTful APIs with NodeJs
Building Killer RESTful APIs with NodeJsBuilding Killer RESTful APIs with NodeJs
Building Killer RESTful APIs with NodeJsSrdjan Strbanovic
 
Survey of restful web services frameworks
Survey of restful web services frameworksSurvey of restful web services frameworks
Survey of restful web services frameworksVijay Prasad Gupta
 
ASP.NET Web API and HTTP Fundamentals
ASP.NET Web API and HTTP FundamentalsASP.NET Web API and HTTP Fundamentals
ASP.NET Web API and HTTP FundamentalsIdo Flatow
 
Single page application
Single page applicationSingle page application
Single page applicationJeremy Lee
 
introduction about REST API
introduction about REST APIintroduction about REST API
introduction about REST APIAmilaSilva13
 
An Overview of Web Services: SOAP and REST
An Overview of Web Services: SOAP and REST An Overview of Web Services: SOAP and REST
An Overview of Web Services: SOAP and REST Ram Awadh Prasad, PMP
 
The Power of Drupal and Alfresco Together
The Power of Drupal and Alfresco TogetherThe Power of Drupal and Alfresco Together
The Power of Drupal and Alfresco TogetherJeff Potts
 
Alfresco As SharePoint Alternative - Architecture Overview
Alfresco As SharePoint Alternative - Architecture OverviewAlfresco As SharePoint Alternative - Architecture Overview
Alfresco As SharePoint Alternative - Architecture OverviewAlfresco Software
 
Role of Rest vs. Web Services and EI
Role of Rest vs. Web Services and EIRole of Rest vs. Web Services and EI
Role of Rest vs. Web Services and EIWSO2
 

What's hot (20)

Modern Web Applications
Modern Web ApplicationsModern Web Applications
Modern Web Applications
 
Introduction to REST - API
Introduction to REST - APIIntroduction to REST - API
Introduction to REST - API
 
Building Ext JS Using HATEOAS - Jeff Stano
Building Ext JS Using HATEOAS - Jeff StanoBuilding Ext JS Using HATEOAS - Jeff Stano
Building Ext JS Using HATEOAS - Jeff Stano
 
Give a REST to your LDAP directory services
Give a REST to your LDAP directory servicesGive a REST to your LDAP directory services
Give a REST to your LDAP directory services
 
Birds Eye View on API Development - v1.0
Birds Eye View on API Development - v1.0Birds Eye View on API Development - v1.0
Birds Eye View on API Development - v1.0
 
Webservices Overview : XML RPC, SOAP and REST
Webservices Overview : XML RPC, SOAP and RESTWebservices Overview : XML RPC, SOAP and REST
Webservices Overview : XML RPC, SOAP and REST
 
RESTful Web Service using Swagger
RESTful Web Service using SwaggerRESTful Web Service using Swagger
RESTful Web Service using Swagger
 
Building REST and Hypermedia APIs with PHP
Building REST and Hypermedia APIs with PHPBuilding REST and Hypermedia APIs with PHP
Building REST and Hypermedia APIs with PHP
 
Building Killer RESTful APIs with NodeJs
Building Killer RESTful APIs with NodeJsBuilding Killer RESTful APIs with NodeJs
Building Killer RESTful APIs with NodeJs
 
Survey of restful web services frameworks
Survey of restful web services frameworksSurvey of restful web services frameworks
Survey of restful web services frameworks
 
Single page application
Single page applicationSingle page application
Single page application
 
ASP.NET Web API and HTTP Fundamentals
ASP.NET Web API and HTTP FundamentalsASP.NET Web API and HTTP Fundamentals
ASP.NET Web API and HTTP Fundamentals
 
Single page application
Single page applicationSingle page application
Single page application
 
Web API Basics
Web API BasicsWeb API Basics
Web API Basics
 
introduction about REST API
introduction about REST APIintroduction about REST API
introduction about REST API
 
An Overview of Web Services: SOAP and REST
An Overview of Web Services: SOAP and REST An Overview of Web Services: SOAP and REST
An Overview of Web Services: SOAP and REST
 
The Power of Drupal and Alfresco Together
The Power of Drupal and Alfresco TogetherThe Power of Drupal and Alfresco Together
The Power of Drupal and Alfresco Together
 
Alfresco As SharePoint Alternative - Architecture Overview
Alfresco As SharePoint Alternative - Architecture OverviewAlfresco As SharePoint Alternative - Architecture Overview
Alfresco As SharePoint Alternative - Architecture Overview
 
Excellent rest using asp.net web api
Excellent rest using asp.net web apiExcellent rest using asp.net web api
Excellent rest using asp.net web api
 
Role of Rest vs. Web Services and EI
Role of Rest vs. Web Services and EIRole of Rest vs. Web Services and EI
Role of Rest vs. Web Services and EI
 

Viewers also liked

Elasticsearch na prática
Elasticsearch na práticaElasticsearch na prática
Elasticsearch na práticaBreno Oliveira
 
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...Uri Cohen
 
JavaScript TDD
JavaScript TDDJavaScript TDD
JavaScript TDDUri Lavi
 
Scala does the Catwalk
Scala does the CatwalkScala does the Catwalk
Scala does the CatwalkAriel Kogan
 
What's the Magic in LinkedIn?
What's the Magic in LinkedIn?What's the Magic in LinkedIn?
What's the Magic in LinkedIn?Efrat Fenigson
 
Scrum. software engineering seminar
Scrum. software engineering seminarScrum. software engineering seminar
Scrum. software engineering seminarAlexandr Gavrishev
 
טלפונים חכמים ואתם
טלפונים חכמים ואתםטלפונים חכמים ואתם
טלפונים חכמים ואתםIdan ofek
 
1953 and all that. A tale of two sciences (Kitcher, 1984)
1953 and all that. A tale of two sciences (Kitcher, 1984)1953 and all that. A tale of two sciences (Kitcher, 1984)
1953 and all that. A tale of two sciences (Kitcher, 1984)Yoav Francis
 
Guice - dependency injection framework
Guice - dependency injection frameworkGuice - dependency injection framework
Guice - dependency injection frameworkEvgeny Barabanov
 
How does the Internet Work?
How does the Internet Work?How does the Internet Work?
How does the Internet Work?Dina Goldshtein
 
מכתב המלצה - לירן פרידמן
מכתב המלצה - לירן פרידמןמכתב המלצה - לירן פרידמן
מכתב המלצה - לירן פרידמןLiran Fridman
 
Lessons Learned with Unity and WebGL
Lessons Learned with Unity and WebGLLessons Learned with Unity and WebGL
Lessons Learned with Unity and WebGLLior Tal
 
How fast ist it really? Benchmarking in practice
How fast ist it really? Benchmarking in practiceHow fast ist it really? Benchmarking in practice
How fast ist it really? Benchmarking in practiceTobias Pfeiffer
 
Continuous Deployment into the Unknown with Artifactory, Bintray, Docker and ...
Continuous Deployment into the Unknown with Artifactory, Bintray, Docker and ...Continuous Deployment into the Unknown with Artifactory, Bintray, Docker and ...
Continuous Deployment into the Unknown with Artifactory, Bintray, Docker and ...Gilad Garon
 
Optimizing DevOps strategy in a large enterprise
Optimizing DevOps strategy in a large enterpriseOptimizing DevOps strategy in a large enterprise
Optimizing DevOps strategy in a large enterpriseEyal Edri
 
Responsive Web Design
Responsive Web DesignResponsive Web Design
Responsive Web DesignNir Elbaz
 

Viewers also liked (20)

Scale up your thinking
Scale up your thinkingScale up your thinking
Scale up your thinking
 
Elasticsearch na prática
Elasticsearch na práticaElasticsearch na prática
Elasticsearch na prática
 
HagayOnn_EnglishCV_ 2016
HagayOnn_EnglishCV_ 2016HagayOnn_EnglishCV_ 2016
HagayOnn_EnglishCV_ 2016
 
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
 
JavaScript TDD
JavaScript TDDJavaScript TDD
JavaScript TDD
 
Scala does the Catwalk
Scala does the CatwalkScala does the Catwalk
Scala does the Catwalk
 
What's the Magic in LinkedIn?
What's the Magic in LinkedIn?What's the Magic in LinkedIn?
What's the Magic in LinkedIn?
 
Scrum. software engineering seminar
Scrum. software engineering seminarScrum. software engineering seminar
Scrum. software engineering seminar
 
Storm at Forter
Storm at ForterStorm at Forter
Storm at Forter
 
טלפונים חכמים ואתם
טלפונים חכמים ואתםטלפונים חכמים ואתם
טלפונים חכמים ואתם
 
Joy of scala
Joy of scalaJoy of scala
Joy of scala
 
1953 and all that. A tale of two sciences (Kitcher, 1984)
1953 and all that. A tale of two sciences (Kitcher, 1984)1953 and all that. A tale of two sciences (Kitcher, 1984)
1953 and all that. A tale of two sciences (Kitcher, 1984)
 
Guice - dependency injection framework
Guice - dependency injection frameworkGuice - dependency injection framework
Guice - dependency injection framework
 
How does the Internet Work?
How does the Internet Work?How does the Internet Work?
How does the Internet Work?
 
מכתב המלצה - לירן פרידמן
מכתב המלצה - לירן פרידמןמכתב המלצה - לירן פרידמן
מכתב המלצה - לירן פרידמן
 
Lessons Learned with Unity and WebGL
Lessons Learned with Unity and WebGLLessons Learned with Unity and WebGL
Lessons Learned with Unity and WebGL
 
How fast ist it really? Benchmarking in practice
How fast ist it really? Benchmarking in practiceHow fast ist it really? Benchmarking in practice
How fast ist it really? Benchmarking in practice
 
Continuous Deployment into the Unknown with Artifactory, Bintray, Docker and ...
Continuous Deployment into the Unknown with Artifactory, Bintray, Docker and ...Continuous Deployment into the Unknown with Artifactory, Bintray, Docker and ...
Continuous Deployment into the Unknown with Artifactory, Bintray, Docker and ...
 
Optimizing DevOps strategy in a large enterprise
Optimizing DevOps strategy in a large enterpriseOptimizing DevOps strategy in a large enterprise
Optimizing DevOps strategy in a large enterprise
 
Responsive Web Design
Responsive Web DesignResponsive Web Design
Responsive Web Design
 

Similar to Not your dad's h base new

Data infrastructure at Facebook
Data infrastructure at Facebook Data infrastructure at Facebook
Data infrastructure at Facebook AhmedDoukh
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Vinoth Chandar
 
Bigdata & Hadoop
Bigdata & HadoopBigdata & Hadoop
Bigdata & HadoopPinto Das
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...Amazon Web Services
 
Bringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopBringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopDataWorks Summit
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
Crossing the Chasm
Crossing the ChasmCrossing the Chasm
Crossing the ChasmHortonworks
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And HiveCloudera, Inc.
 
Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData FlyData Inc.
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Sid Anand
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Alluxio, Inc.
 
DWH & big data architecture approaches
DWH & big data architecture approachesDWH & big data architecture approaches
DWH & big data architecture approachesLuxoft
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Anna Shymchenko
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 

Similar to Not your dad's h base new (20)

Data infrastructure at Facebook
Data infrastructure at Facebook Data infrastructure at Facebook
Data infrastructure at Facebook
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
 
Bigdata & Hadoop
Bigdata & HadoopBigdata & Hadoop
Bigdata & Hadoop
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
 
Bringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopBringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on Hadoop
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
Crossing the Chasm
Crossing the ChasmCrossing the Chasm
Crossing the Chasm
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
 
Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
DWH & big data architecture approaches
DWH & big data architecture approachesDWH & big data architecture approaches
DWH & big data architecture approaches
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 

Recently uploaded

Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 

Recently uploaded (20)

Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 

Not your dad's h base new

  • 1. Not your Dad’s Old HBase Gilad Moscovitch - Senior Consultant UXC PS @moscovig Yaniv Rodenski - Principal Consultant UXC PS @YRodenski
  • 2. Agenda Our use cases Introduction to Apache Phoenix The first use case - retrospective Managing a large scale Graph with TitanDB The second use case - retrospective
  • 3. The Cable Company Our story starts with a cable company that grew: Over a decade ago, bought an ISP Bought a mobile network Started new ventures such as VOD and VoIP
  • 4. Our Dataset Billions of records (PB scale) Countless number of formats: Multiple systems Network equipment Devices Dynamic data model New devices are introduced frequently (on average every two weeks) New demands are introduced even more frequently
  • 5. The Cable Guys: Gilad Moscovitch Engineering Manager Yaniv Rodenski Architect in the CTO team
  • 6. Our Starting Point: Devices Systems of Records ETL via ODI Oracle Exadata
  • 7. Challenges The Oracle Data Warehouse and ODI could not handle the load ETL devs could not handle the load, the ETL team became a bottleneck Not all data types arrive at the warehouse We had to prioritise due to lack of ETL devs Incompatibility with the existing data model Changes to the data model would take an average of a month Even when data was loaded, analysts were not aware of the new tables, and we ended up with an unusable schema
  • 8. More Challenges New data models that are not a good fit for SQL databases: Sparse data Geospatial data Full text Graph Need to ask harder questions that require heavy processing: Machine learning
  • 9. Breaking Out The new data platform was Hadoop based Using CDH (at that time the most advanced option) Trying to reuse existing components of the platform as much as possible
  • 10. Challenge #1: Early Data Access Giving analysts, BI developers and business access to raw data For this use case we reviewed a few tools, including Apache Phoenix
  • 11. Apache Phoenix - SQL on HBase Apache Phoenix is a relational database layer over HBase with a difference: Table metadata is stored in an HBase table and versioned, snapshot queries over prior versions will automatically use the correct schema Secondary indexes Dynamic columns with schema on read Views Indexed Updatable
  • 12. Demo - Apache Phoenix
  • 13. Challenge no 1: Results In addition to Phoenix we also looked at Hive and Impala Spark SQL, Presto and Drill were not considered due to immaturity Impala was chosen Schema on read was important Hive on CDH doesn’t support Tez Apache Phoenix was overkill and better suited to be a database rather than a warehouse
  • 14. Challenge no 2: Family Time Clients are never represented by a single entity: Households Business Clients have multiple devices generating data: Home and mobile phones IP adresses for devices DVRs
  • 15. Titan - A Distributed Graph Titan is a scalable graph database Optimized for storing and querying graphs Runs on top of: Cassandra HBase DynamoDB BerkeleyDB Support for geo, numeric range, and full-text search via: ElasticSearch SolR Supports Gremlin - a graph querying DSL via Tinkerpop Gremlin over HTTP
  • 16. Demo - Clash of the Titan
  • 17. Challenge #2: Testing Stage Hbase vs Cassandra benchmark + sanity check Simulation for 1 billion Vertices Sanity check- OK Not much difference in loading time and querying time on both stores HBase chosen because of the existing infrastructure Retrospective: 1 billion Vertices on an empty graph didn’t really simulate anythin
  • 18. Challenge #2: POC Stage Initializing an untuned Hbase Cluster on all 24 nodes of the existing cluster Hosted side by side with Map Reduce and Impala Developing initial ontology for the largest data source together with a developer from the client application team Developing Map Reduce for loading hundreds of GB a day according to the ontology
  • 19. POC Performance Input Data was stored in hourly directories so at first we scheduled the Map Reduce for each hour. An hour took about 40 minutes to process and load. Later on - scheduled the Map-Reduce for a whole day at a time. The whole day loading took about half a day. ap-Reduce jobs create new challenges - Hold lots of reducers for a long time, not fun to re
  • 20. Performance Tuning HBase didn't handle the load, the symptoms included HBase write-blocking compactions Retired region servers Tuning performed: Region split size - split after 11 GB Memstore flush size tuning GC Tuning Java Heap size decreasing from 32 to 16 Daily major compaction for the graph table Retrospective: We had to statically partition to two different clusters: One for HBase, and one for everything else
  • 21. Today The main graph ingests: ~1.7 billion edges ~1.7 billion vertices The main graph size is 20TB 20 region servers Rebuilding the graph on average every 3 months for new ontology New data sources are added within a day by one (awesome) developer Using a web based UI tool for graph exploration Retrospective: Titan on HBase works pretty well for those sizes
  • 22. Summary HBase is a versatile datastore Apache Phoenix modernises HBase with semi-relational SQL layer Titan provides powerful graph capabilities Never be naive about Big Data tools, they will bite you, badly
  • 23. Next month: Karel Alfonso Apache Flink Ned Shawa Apache NiFi