© 2017 Bloomberg Finance L.P. All rights reserved.
HBase and Oozie
Multitenancy at Bloomberg
DataWorks Summit
June 14th, 2017
Clay Baenziger
Hadoop Infrastructure
hadoop@bloomberg.net
What we will cover today
About Bloomberg
Technology Introduction
HBase Multitenancy
●Workloads, Resources, Access
Oozie Self-Service
●Deployments
HBase and Oozie
●Scheduled Compactions, Exporting HBase Snapshots
About Bloomberg
Bloomberg quickly and accurately delivers business and financial information, news and insight around the world.
A Sense of Scale:
●550 exchange feeds and over 100 billion market data messages a day
●400 million emails and 17 million IMs daily across the Bloomberg Professional Service
●Over 2,700 journalists and analysts in over 120 countries
●Producing more than 5,000 stories a day
●Reaching over 360 million homes worldwide
About Bloomberg
JIRAs by project (reporter or assignee from our Foundational Services group and affiliated projects):
Project      JIRAs
Phoenix      24
HBase        20
Spark        9
ZooKeeper    8
HDFS         6
Bigtop       3
Oozie        4
Storm        2
Hive         2
Hadoop       2
YARN         2
Kafka        2
Flume        1
HAWQ         1
Total        86
Apache Solr: 3 core committers (one PMC member) – commits in every release since 4.6
Technology Intro – Apache HBase
What?
●Distributed database designed to host very large tables – billions of rows by millions of columns
●Block cache, Bloom filters and timeline consistency for highly available, real-time queries
●Sharded, versioned, non-relational database modeled after Google's Bigtable – compacting log-structured merge-tree design
●Supports exports and backups – performed by global administrators
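The compacting log-structured merge-tree design can be illustrated with a deliberately tiny model: writes land in a sorted in-memory memstore, a full memstore is flushed to an immutable sorted "file", and a compaction merges files so reads touch fewer of them. This is a sketch under simplifying assumptions (no deletes, versions, timestamps, WAL or HDFS); the class name ToyLsm and the flush threshold are hypothetical, and this is not HBase's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy LSM-tree (hypothetical, not HBase code): ignores deletes, versions, WAL and HDFS.
public class ToyLsm {
    private final int flushThreshold; // entries per memstore before a flush
    private TreeMap<String, String> memstore = new TreeMap<>();
    private final List<SortedMap<String, String>> files = new ArrayList<>(); // oldest first

    public ToyLsm(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public void put(String key, String value) {
        memstore.put(key, value);
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    // Flush: freeze the sorted memstore as a new immutable "file" and start a fresh one.
    public void flush() {
        if (memstore.isEmpty()) {
            return;
        }
        files.add(Collections.unmodifiableSortedMap(memstore));
        memstore = new TreeMap<>();
    }

    // Reads check the memstore, then files from newest to oldest, so newer writes win.
    public String get(String key) {
        if (memstore.containsKey(key)) {
            return memstore.get(key);
        }
        for (int i = files.size() - 1; i >= 0; i--) {
            if (files.get(i).containsKey(key)) {
                return files.get(i).get(key);
            }
        }
        return null;
    }

    // Major-compaction analogue: merge every file into one, applying oldest to newest.
    public void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (SortedMap<String, String> file : files) {
            merged.putAll(file);
        }
        files.clear();
        if (!merged.isEmpty()) {
            files.add(Collections.unmodifiableSortedMap(merged));
        }
    }

    public int fileCount() {
        return files.size();
    }

    public static void main(String[] args) {
        ToyLsm lsm = new ToyLsm(2);
        lsm.put("row1", "a");
        lsm.put("row2", "b"); // memstore full: flushed to file 1
        lsm.put("row1", "c");
        lsm.put("row3", "d"); // flushed to file 2
        System.out.println(lsm.fileCount() + " " + lsm.get("row1")); // 2 c
        lsm.compact();
        System.out.println(lsm.fileCount() + " " + lsm.get("row1")); // 1 c
    }
}
```

A read may have to consult every file before finding a key, which is why compactions that reduce the file count help read latency while costing I/O as they run.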
Technology Intro – Apache Oozie
What?
•Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
•Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
•Oozie workflows can be templated with properties.
•Oozie coordinator jobs are recurring Oozie workflow jobs triggered by time and data availability.
•Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs and security tokens out of the box, as well as system-specific jobs.
•Oozie is a scalable, highly available and extensible system.
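Because workflow jobs are DAGs of actions, a scheduler needs one valid execution order and must reject cycles. The sketch below applies Kahn's topological sort to a hypothetical map from each action name to its successor actions. Oozie itself expresses the graph in workflow XML (with fork/join nodes), so this illustrates the DAG property rather than Oozie's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WorkflowDag {
    // edges maps each action to the actions that may start once it succeeds.
    public static List<String> executionOrder(Map<String, List<String>> edges) {
        // Count incoming edges per action (TreeMap keeps the result deterministic).
        Map<String, Integer> inDegree = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String next : e.getValue()) {
                inDegree.merge(next, 1, Integer::sum);
            }
        }
        // Actions with no unfinished predecessors are ready to run.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String action = ready.poll();
            order.add(action);
            for (String next : edges.getOrDefault(action, List.of())) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        // Actions left unprocessed are on, or downstream of, a cycle: not a DAG.
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("workflow graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> workflow = new HashMap<>();
        workflow.put("snapshot", List.of("export"));
        workflow.put("export", List.of("notify"));
        System.out.println(executionOrder(workflow)); // [snapshot, export, notify]
    }
}
```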
HBase Multitenancy
Why?
Larger capacity reserve:
●Can handle request spikes
●Can lose a rack
●Higher per-machine usage
●Multi-cluster Hadoop support is evolving
Why Not?
●Isolation
●Easier to understand
HBase Workloads
Write Heavy
●Memstore Heavy
●Compactions Optimized for HFile Size
●Flush Size Tuning
Read Heavy
●Cache Heavy
●Compactions Optimized for Minimal HFiles
●Read Replicas
Mixed Read/Write
●SSDs
●Read Replicas
HBase Contested Resources
Availability:
●Data Bugs
●Thread Death/Starvation
●User Code Bugs
Storage:
●Memstore
●HDFS
●Cache
Input/Output:
●Latency
●Queues
●Ingest
HBase Resources – Availability
Data Bugs:
●Can isolate tenants with Region Server Groups – HBASE-6721
Thread Death/Starvation:
●Master becomes a zombie if its filesystem object closes – HBASE-17287
●“Region Server Too Busy” – Request Quotas
●Garbage Collection “Bombs” – HBASE-18023 – “Log multi-* requests for more than threshold number of rows”
User Code Bugs (Coprocessors):
●“Coprocessors - Uses, Abuses, Solutions” – Esther Kundin and Clay Baenziger, HBaseCon East, September 26th, 2016
●Can run only approved coprocessors – HBASE-16700 – “Allow for Coprocessor Whitelisting”
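Request quotas keep one tenant's burst from making a shared region server "too busy" for everyone else. A minimal sketch of the idea, assuming a fixed-window request count per tenant: the class and its methods are hypothetical and are not HBase's quota code, which is configured through HBase's own quota settings. The caller passes the clock in so the behavior is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

public class TenantRequestQuota {
    private final long windowMillis;
    private final Map<String, Integer> limits = new HashMap<>();
    private final Map<String, Long> windowStart = new HashMap<>();
    private final Map<String, Integer> used = new HashMap<>();

    public TenantRequestQuota(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    public synchronized void setLimit(String tenant, int requestsPerWindow) {
        limits.put(tenant, requestsPerWindow);
    }

    // Returns true if the request is admitted, false if the tenant has used up its window.
    public synchronized boolean tryAcquire(String tenant, long nowMillis) {
        Integer limit = limits.get(tenant);
        if (limit == null) {
            return true; // no quota configured for this tenant
        }
        Long start = windowStart.get(tenant);
        if (start == null || nowMillis - start >= windowMillis) {
            windowStart.put(tenant, nowMillis); // begin a new fixed window
            used.put(tenant, 0);
        }
        int count = used.get(tenant);
        if (count >= limit) {
            return false; // throttle this tenant instead of letting it starve others
        }
        used.put(tenant, count + 1);
        return true;
    }

    public static void main(String[] args) {
        TenantRequestQuota quota = new TenantRequestQuota(1000);
        quota.setLimit("tenantA", 2);
        System.out.println(quota.tryAcquire("tenantA", 0));    // true
        System.out.println(quota.tryAcquire("tenantA", 10));   // true
        System.out.println(quota.tryAcquire("tenantA", 20));   // false
        System.out.println(quota.tryAcquire("tenantB", 20));   // true (no quota set)
        System.out.println(quota.tryAcquire("tenantA", 1000)); // true (new window)
    }
}
```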
HBase Resources – Storage
Memstore (Flushes):
●Affects HFile Quantity
●Can Block Writes
●Compacting Memstore
HDFS:
●Denial-of-Service – HBASE-16961 – “FileSystem Quotas”
Cache:
●Multiple workloads thrashing the cache
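Cache thrashing between tenants can be reproduced with a plain LRU cache. In this toy model (hypothetical block names, capacity counted in blocks rather than bytes, and simple LRU eviction rather than HBase's block cache), a one-pass scan by tenant B pushes tenant A's hot blocks out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SharedLruCacheDemo {
    // LRU cache: access-ordered LinkedHashMap that evicts the eldest entry past capacity.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true);
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    // Tenant A warms the cache with hot blocks, then tenant B runs a one-pass scan.
    // Returns how many of A's hot blocks are still cached afterwards.
    static int hotBlocksSurvivingScan(int capacityBlocks, int hotBlocks, int scanBlocks) {
        LruCache<String, Boolean> cache = new LruCache<>(capacityBlocks);
        for (int i = 0; i < hotBlocks; i++) {
            cache.put("A-hot-" + i, Boolean.TRUE);
        }
        for (int i = 0; i < scanBlocks; i++) {
            cache.put("B-scan-" + i, Boolean.TRUE);
        }
        int surviving = 0;
        for (int i = 0; i < hotBlocks; i++) {
            if (cache.containsKey("A-hot-" + i)) {
                surviving++;
            }
        }
        return surviving;
    }

    public static void main(String[] args) {
        System.out.println(hotBlocksSurvivingScan(10, 5, 3));  // 5: the scan fits beside the hot set
        System.out.println(hotBlocksSurvivingScan(10, 5, 20)); // 0: the scan evicts the whole hot set
    }
}
```

One common mitigation on the HBase side is to have large scans bypass the block cache with `Scan#setCacheBlocks(false)`.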
HBase Resources – Input/Output
Latency:
●Caching – Off-Heaping
●De-Prioritizing Scanners
Queues – Monitoring!
●Replication Queue
●Handler Queue
Ingest:
●Splits
●Compactions
●Bulk-Load
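De-prioritizing scanners means a long-running scan should not hold up short gets queued behind it. A sketch of one way to do this, assuming a single call queue ordered by a "deadline" of arrival time plus a penalty proportional to how many batches the scanner has already fetched. The penalty formula and the class are hypothetical; HBase's own RPC scheduler differs in detail.

```java
import java.util.PriorityQueue;

public class DeadlineCallQueue {
    private static final class Call {
        final String name;
        final long deadline;

        Call(String name, long deadline) {
            this.name = name;
            this.deadline = deadline;
        }
    }

    private final long penaltyPerBatchMillis;
    private final PriorityQueue<Call> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a.deadline, b.deadline));

    public DeadlineCallQueue(long penaltyPerBatchMillis) {
        this.penaltyPerBatchMillis = penaltyPerBatchMillis;
    }

    // batchesSoFar is 0 for gets and newly opened scanners, and grows with each scanner next().
    public void submit(String name, long arrivalMillis, int batchesSoFar) {
        queue.add(new Call(name, arrivalMillis + penaltyPerBatchMillis * batchesSoFar));
    }

    // Returns the call a handler thread would serve next, or null if the queue is empty.
    public String next() {
        Call call = queue.poll();
        return call == null ? null : call.name;
    }

    public static void main(String[] args) {
        DeadlineCallQueue q = new DeadlineCallQueue(10);
        q.submit("long-scan-next", 0, 50); // deadline 500
        q.submit("get-1", 5, 0);           // deadline 5
        q.submit("new-scan-open", 8, 0);   // deadline 8
        System.out.println(q.next()); // get-1
        System.out.println(q.next()); // new-scan-open
        System.out.println(q.next()); // long-scan-next
    }
}
```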
HBase Access
Kerberos – need a way to get identity:
●LDAP Group Traversal – HADOOP-12291
●Namespaces
●Grants
Multiple Clusters:
●Spark-HBase Connector (https://github.com/hortonworks-spark/shc PR#120)
●Oozie Delegation Token Acquisition – OOZIE-1646 – “HBase Table Copy between two HBase servers doesn't work with Kerberos”
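LDAP group traversal matters because namespace grants are often made to groups, and a user may belong to the granted group only indirectly through nested groups. The sketch resolves nested membership breadth-first up to a depth limit. The directory data is hypothetical and held in a map instead of queried from LDAP; this is not Hadoop's LdapGroupsMapping code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class NestedGroups {
    // memberOf maps a user or group to the groups it directly belongs to.
    // maxDepth = 1 returns only direct groups; each extra level follows one more hop.
    public static Set<String> resolve(String principal, Map<String, Set<String>> memberOf, int maxDepth) {
        Set<String> result = new LinkedHashSet<>(memberOf.getOrDefault(principal, Set.of()));
        Deque<String> frontier = new ArrayDeque<>(result);
        for (int depth = 1; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> nextLevel = new ArrayDeque<>();
            for (String group : frontier) {
                for (String parent : memberOf.getOrDefault(group, Set.of())) {
                    if (result.add(parent)) { // skip groups already seen, which also breaks cycles
                        nextLevel.add(parent);
                    }
                }
            }
            frontier = nextLevel;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> memberOf = new HashMap<>();
        memberOf.put("alice", Set.of("team-x"));
        memberOf.put("team-x", Set.of("dept-y")); // team-x is nested inside dept-y
        memberOf.put("dept-y", Set.of("org-z"));
        System.out.println(resolve("alice", memberOf, 1)); // [team-x]
        System.out.println(resolve("alice", memberOf, 3)); // [team-x, dept-y, org-z]
    }
}
```

A grant to `org-z` on a namespace would then apply to `alice` only if the traversal depth reaches that level, which is the trade-off a depth limit controls.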
Oozie and Self-Service
Self-Service:
“The serving of oneself with goods or services” – Merriam-Webster.com
•Automation – pipelines (workflows) and recurrence (coordinators)
•Job Status – callback URLs to notify of job progress
•Job IDs – MapReduce or YARN IDs for each action's internal sub-jobs
•Authentication – delegation tokens (no keytabs!)
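Oozie's workflow notification setting (the `oozie.wf.workflow.notification.url` job property, whose URL may contain `$jobId` and `$status` placeholders) is one way to provide the job-status callback. The sketch shows both ends of such a callback under assumptions: the `status.example.com` endpoint, the query parameter names and the job ID are hypothetical, and the receiver is a plain parser rather than a web service.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class OozieCallbackSketch {
    // Notifying side: substitute the placeholders into the user's URL template.
    // Assumes the job ID and status are URL-safe, so no encoding is applied here.
    static String expand(String template, String jobId, String status) {
        return template.replace("$jobId", jobId).replace("$status", status);
    }

    // Receiving side: turn the callback's query string into key/value pairs.
    static Map<String, String> parseQuery(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String template = "http://status.example.com/notify?job=$jobId&status=$status";
        String url = expand(template, "0000001-170614-oozie-W", "SUCCEEDED");
        System.out.println(parseQuery(url)); // {job=0000001-170614-oozie-W, status=SUCCEEDED}
    }
}
```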
Oozie Deployments
Separates Continuous Integration from Continuous Deployment
Leverages:
●Git – OOZIE-2877
●Maven – OOZIE-2878
Axioms – write a deployment workflow for your job (and its workflow)
Must follow the principles of:
●Idempotency – re-runs result in the same state as the first run
●Cleanliness – removes old artifacts (state)
●Separation of Configuration from Code – deploy the same workflow to development and production with only workflow property differences
●See also the slides for “Cluster Continuous Delivery with Oozie” – ApacheCon North America – Big Data, May 18th, 2017
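The three axioms can be illustrated with a small deployment step: the workflow definition is identical everywhere and only `job.properties` varies per environment, a re-run with unchanged inputs changes nothing, and stale files are deleted. This is a hypothetical sketch against a flat local directory (a real deployment would write to HDFS), not the Git/Maven tooling from OOZIE-2877 and OOZIE-2878.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical deployment step; assumes the target is a flat directory of files.
public class IdempotentDeploy {
    // Returns how many files were created, rewritten or deleted (0 on an idempotent re-run).
    public static int deploy(Path target, String workflowXml, Map<String, String> envProps)
            throws IOException {
        // Separation of configuration from code: the workflow text is the same in every
        // environment; only job.properties is rendered from the environment's settings.
        Map<String, String> desired = new TreeMap<>();
        desired.put("workflow.xml", workflowXml);
        desired.put("job.properties", envProps.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .sorted()
                .collect(Collectors.joining("\n", "", "\n")));

        Files.createDirectories(target);
        int changes = 0;

        // Cleanliness: delete files left over from earlier deployments.
        List<Path> existing;
        try (Stream<Path> listing = Files.list(target)) {
            existing = listing.collect(Collectors.toList());
        }
        for (Path p : existing) {
            if (!desired.containsKey(p.getFileName().toString())) {
                Files.delete(p);
                changes++;
            }
        }

        // Idempotency: write a file only when its content would change.
        for (Map.Entry<String, String> e : desired.entrySet()) {
            Path p = target.resolve(e.getKey());
            byte[] bytes = e.getValue().getBytes(StandardCharsets.UTF_8);
            if (!Files.exists(p) || !Arrays.equals(Files.readAllBytes(p), bytes)) {
                Files.write(p, bytes);
                changes++;
            }
        }
        return changes;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("deploy");
        Files.write(dir.resolve("old-workflow.xml"), new byte[0]); // stale artifact
        System.out.println(deploy(dir, "<workflow-app/>", Map.of("env", "dev")));  // 3
        System.out.println(deploy(dir, "<workflow-app/>", Map.of("env", "dev")));  // 0
        System.out.println(deploy(dir, "<workflow-app/>", Map.of("env", "prod"))); // 1
    }
}
```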
HBase and Oozie
Why?
Self-Service – No DBA!
Regularly schedulable
Runs as the project user with no keytab
Logs and reporting
●Can record HBase errors with logs organized by each Oozie job
●Can report success of job with proactive callbacks
●Can verify performance of job with SLA subsystem
●Can provide infrastructure insight into what’s running
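The SLA subsystem compares when a job actually started and ended against expectations relative to its nominal time. A sketch of that comparison, loosely modeled on Oozie's should-start, should-end and max-duration SLA settings; the method and the MET/MISS labels here are illustrative only, since Oozie computes and publishes SLA events itself.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class SlaCheck {
    // Compares actual start/end times against expectations relative to the nominal time.
    static List<String> evaluate(Instant nominal, Duration shouldStart, Duration shouldEnd,
                                 Duration maxDuration, Instant actualStart, Instant actualEnd) {
        List<String> events = new ArrayList<>();
        events.add(actualStart.isAfter(nominal.plus(shouldStart)) ? "START_MISS" : "START_MET");
        events.add(actualEnd.isAfter(nominal.plus(shouldEnd)) ? "END_MISS" : "END_MET");
        Duration took = Duration.between(actualStart, actualEnd);
        events.add(took.compareTo(maxDuration) > 0 ? "DURATION_MISS" : "DURATION_MET");
        return events;
    }

    public static void main(String[] args) {
        Instant nominal = Instant.parse("2017-06-14T02:00:00Z");
        List<String> events = evaluate(nominal,
                Duration.ofMinutes(10), Duration.ofMinutes(60), Duration.ofMinutes(30),
                nominal.plus(Duration.ofMinutes(5)), nominal.plus(Duration.ofMinutes(50)));
        System.out.println(events); // [START_MET, END_MET, DURATION_MISS]
    }
}
```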
HBase Scheduled Compactions
Compactions are I/O- and processing-heavy and can:
●Be detrimental to read or write performance
●Lead to split storms or rebalancing
Plan for the impact – schedule them:
●Can also go region-by-region to lessen impact
●Can poll to know when a compaction is complete
●A simple Java action: https://tinyurl.com/oozie-hbase-compaction
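Going region-by-region and polling can be sketched as a loop that requests a major compaction for one region, waits until that region stops compacting, and only then moves on. `RegionAdmin` is a hypothetical interface so the loop runs without a cluster; in a real Java action it would wrap the HBase Admin client's per-region compaction request and compaction-state calls. `FakeAdmin` exists only to exercise the loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RollingCompaction {
    // Hypothetical stand-in for the HBase Admin client.
    interface RegionAdmin {
        List<String> regionsOf(String table);
        void requestMajorCompaction(String region);
        boolean isCompacting(String region);
    }

    // Compacts one region at a time, polling until each finishes before starting the next.
    static List<String> compactRegionByRegion(RegionAdmin admin, String table, long pollMillis)
            throws InterruptedException {
        List<String> done = new ArrayList<>();
        for (String region : admin.regionsOf(table)) {
            admin.requestMajorCompaction(region);
            while (admin.isCompacting(region)) {
                Thread.sleep(pollMillis);
            }
            done.add(region);
        }
        return done;
    }

    // Fake admin: each compaction reports "still compacting" for two polls, and it rejects
    // a new request while another region is still compacting.
    static final class FakeAdmin implements RegionAdmin {
        final Map<String, Integer> pollsLeft = new HashMap<>();
        final List<String> requested = new ArrayList<>();

        public List<String> regionsOf(String table) {
            return List.of(table + ",r1", table + ",r2");
        }

        public void requestMajorCompaction(String region) {
            for (int left : pollsLeft.values()) {
                if (left > 0) {
                    throw new IllegalStateException("overlapping compactions");
                }
            }
            requested.add(region);
            pollsLeft.put(region, 2);
        }

        public boolean isCompacting(String region) {
            int left = pollsLeft.getOrDefault(region, 0);
            if (left == 0) {
                return false;
            }
            pollsLeft.put(region, left - 1);
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FakeAdmin admin = new FakeAdmin();
        System.out.println(compactRegionByRegion(admin, "ns:table", 1)); // [ns:table,r1, ns:table,r2]
    }
}
```

On a real cluster the compaction request is asynchronous, so a production loop would likely also confirm that the compaction actually started and apply a timeout; the sketch omits both.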
HBase Scheduled Compactions
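Key Oozie-isms:
●The Java action needs to know how to get its configuration:
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   // Oozie points the oozie.action.conf.xml system property at the action's configuration file
   Configuration conf = new Configuration();
   conf.addResource(new Path("file:///",
       System.getProperty("oozie.action.conf.xml")));
●Need to pass delegation tokens from Oozie:
   // Hand the tokens Oozie obtained to any jobs this action launches
   String tokenFile = System.getenv("HADOOP_TOKEN_FILE_LOCATION");
   if (tokenFile != null) {
       conf.set("mapreduce.job.credentials.binary", tokenFile);
   }
●Need to use Oozie's Credentials for action authentication
●Must pass in properties manually – no job-xml support – OOZIE-2947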
HBase Snapshot Export
Backup Requirements:
●Live backups (cannot disable the table or take HBase offline)
●Self-service (a non-HBase user can back up and restore their own data)
●Automatable procedure (Oozie)
●Works on a secure cluster (hdfs:///hbase is not world-readable)
●Backup location may not be running HBase
●Does not require adding significant architectural “baggage” to HBase
[Diagram: HBase Snapshot Export. Components: Namespace Admin user; “dropbox” directory created in HDFS; Oozie workflow submitted (as namespace admin); Oozie Export Snapshot action; permission check; table snapshot initiated; HBase Export Snapshot.]
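The permission check in this flow can be sketched under stated assumptions: namespace admins are given as a map from namespace to users, and the export target must be a "dropbox" directory owned by the requesting user. Both rules and all names are hypothetical illustrations of the diagram, not HBase or Oozie APIs. The one real convention used is HBase's `namespace:qualifier` table naming, where an unqualified table lives in the `default` namespace.

```java
import java.util.Map;
import java.util.Set;

public class SnapshotExportCheck {
    // HBase table names take the form namespace:qualifier; no prefix means "default".
    static String namespaceOf(String tableName) {
        int colon = tableName.indexOf(':');
        return colon < 0 ? "default" : tableName.substring(0, colon);
    }

    // Hypothetical rule: only an admin of the table's namespace may export it, and only
    // into a dropbox directory that the same user owns.
    static boolean mayExport(String user, String tableName, String dropboxOwner,
                             Map<String, Set<String>> namespaceAdmins) {
        boolean isNamespaceAdmin = namespaceAdmins
                .getOrDefault(namespaceOf(tableName), Set.of())
                .contains(user);
        return isNamespaceAdmin && user.equals(dropboxOwner);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> admins = Map.of("projx", Set.of("alice"));
        System.out.println(mayExport("alice", "projx:prices", "alice", admins)); // true
        System.out.println(mayExport("bob", "projx:prices", "bob", admins));     // false
        System.out.println(mayExport("alice", "prices", "alice", admins));       // false
    }
}
```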
HBase and Oozie
Multitenancy at Bloomberg
Clay Baenziger
Hadoop Infrastructure
https://github.com/bloomberg
hadoop@bloomberg.net
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 

Recently uploaded

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Multitenancy At Bloomberg - HBase and Oozie

  • 1. © 2017 Bloomberg Finance L.P. All rights reserved.
      HBase and Oozie
      Multitenancy at Bloomberg
      DataWorks Summit, June 14th, 2017
      Clay Baenziger, Hadoop Infrastructure
      hadoop@bloomberg.net
  • 2. What we will cover today
      About Bloomberg
      Technology Introduction
      HBase Multitenancy
      ● Workloads, Resources, Access
      Oozie Self-Service
      ● Deployments
      HBase and Oozie
      ● Scheduled Compactions, Exporting HBase Snapshots
  • 3. About Bloomberg
      Bloomberg quickly and accurately delivers business and financial information, news and insight around the world.
      A Sense of Scale:
      ● 550 exchange feeds and over 100 billion market data messages a day
      ● 400 million emails and 17 million IMs daily across the Bloomberg Professional Service
      ● Over 2,700 journalists and analysts in over 120 countries
      ● Producing more than 5,000 stories a day
      ● Reaching over 360 million homes worldwide
  • 4. About Bloomberg
      Project      JIRAs   Project      JIRAs   Project      JIRAs
      Phoenix         24   HBase           20   Spark            9
      ZooKeeper        8   HDFS             6   Bigtop           3
      Oozie            4   Storm            2   Hive             2
      Hadoop           2   YARN             2   Kafka            2
      Flume            1   HAWQ             1   Total           86
      Apache Solr: 3 core committers (one PMC member), with commits in every release since 4.6
      (JIRAs with a reporter or assignee from our Foundational Services group and affiliated projects)
  • 6. Technology Intro – Apache HBase
      What?
      ● Distributed database designed to host very large tables: billions of rows by millions of columns
      ● Block cache, bloom filters and timeline consistency for highly available, real-time queries
      ● Sharded, versioned, non-relational database modeled after Google's Bigtable, using a log-structured merge-tree design with compactions
      ● Supports exports and backups – by global administrators
  • 7. Technology Intro – Apache Oozie
      What?
      ● Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
      ● Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
      ● Oozie workflows can be templated with properties.
      ● Oozie coordinator jobs are recurring Oozie workflow jobs triggered by time and data availability.
      ● Oozie is integrated with the rest of the Hadoop stack: it supports several types of Hadoop jobs and security tokens, and provides system-specific jobs out of the box.
      ● Oozie is a scalable, highly available and extensible system.
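The DAG property can be made concrete with a small sketch. Real Oozie workflows are declared in XML (hPDL) with explicit ok/error transitions between actions, and the Java below does not parse that format. It assumes a hypothetical map from each action name to the actions it depends on, derives one valid execution order with Kahn's topological sort, and rejects a cycle, which is exactly what "acyclic" rules out.

```java
import java.util.*;

/** Illustrative only: orders hypothetical workflow actions by dependency (Kahn's algorithm). */
final class WorkflowDag {
    private WorkflowDag() {}

    /**
     * @param deps maps each action to the actions it must wait for
     * @return one valid execution order
     * @throws IllegalArgumentException if the dependencies contain a cycle
     */
    static List<String> executionOrder(Map<String, List<String>> deps) {
        Map<String, Integer> inDegree = new TreeMap<>();          // sorted for deterministic output
        Map<String, List<String>> dependents = new HashMap<>();   // prerequisite -> actions waiting on it
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String pre : e.getValue()) {
                inDegree.putIfAbsent(pre, 0);
                inDegree.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(pre, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());          // no unmet prerequisites
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String action = ready.poll();
            order.add(action);
            for (String next : dependents.getOrDefault(action, List.of())) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("workflow graph contains a cycle");
        }
        return order;
    }
}
```

Kahn's algorithm is a convenient choice here because the cycle check falls out for free: if some action never reaches in-degree zero, the graph is not a DAG.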
  • 9. HBase Multitenancy
      Why?
      Larger capacity reserve:
      ● Can handle request spikes
      ● Can lose a rack
      ● Higher per-machine usage
      ● Multi-cluster Hadoop support is evolving
      Why not?
      ● Isolation
      ● Easier to understand
  • 10. HBase Workloads
      Write Heavy
      ● Memstore Heavy
      ● Compactions Optimized for HFile Size
      ● Flush Size Tuning
      Read Heavy
      ● Cache Heavy
      ● Compactions Optimized for Minimal HFiles
      ● Read Replicas
      Mixed Read/Write
      ● SSDs
      ● Read Replicas
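One knob behind the memstore-heavy versus cache-heavy split is how much region server heap goes to memstores (`hbase.regionserver.global.memstore.size`) versus the on-heap block cache (`hfile.block.cache.size`), both expressed as fractions of the heap. HBase rejects configurations in which the two together exceed 0.8 of the heap. The sketch below encodes that rule for two workload profiles; the specific fractions are illustrative assumptions, not tuning advice from the talk.

```java
import java.util.Locale;

/** Illustrative heap split per workload; the fractions used with it are assumptions. */
final class HeapSplit {
    /** HBase refuses memstore + on-heap block cache fractions above 80% of the heap. */
    static final double MAX_COMBINED_FRACTION = 0.8;

    final String workload;
    final double memstoreFraction;   // maps to hbase.regionserver.global.memstore.size
    final double blockCacheFraction; // maps to hfile.block.cache.size

    HeapSplit(String workload, double memstoreFraction, double blockCacheFraction) {
        this.workload = workload;
        this.memstoreFraction = memstoreFraction;
        this.blockCacheFraction = blockCacheFraction;
    }

    /** True if the split leaves at least 20% of the heap for everything else. */
    boolean isValid() {
        return memstoreFraction + blockCacheFraction <= MAX_COMBINED_FRACTION + 1e-9; // tolerate FP rounding
    }

    /** Renders the split as hbase-site style key=value lines. */
    String toProperties() {
        return String.format(Locale.ROOT,
            "hbase.regionserver.global.memstore.size=%.2f%nhfile.block.cache.size=%.2f",
            memstoreFraction, blockCacheFraction);
    }
}
```

For example, a write-heavy tenant might lean toward `new HeapSplit("write-heavy", 0.5, 0.25)` and a read-heavy one toward `new HeapSplit("read-heavy", 0.2, 0.55)`; in a shared cluster, both tenants land on the same region servers, which is why this trade-off becomes a multitenancy question.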
  • 11. HBase Contested Resources
      Availability:
      ● Data Bugs
      ● Thread Death/Starvation
      ● User Code Bugs
      Storage:
      ● Memstore
      ● HDFS
      ● Cache
      Input/Output:
      ● Latency
      ● Queues
      ● Ingest
  • 12. HBase Resources – Availability
      ● Data Bugs:
        ● Can isolate tenants with Region Server Groups – HBASE-6721
      ● Thread Death/Starvation:
        ● Master becomes a zombie if its filesystem object closes – HBASE-17287
        ● “Region Server Too Busy” – Request Quotas
        ● Garbage Collection “Bombs” – HBASE-18023 – “Log multi-* requests for more than threshold number of rows”
      ● User Code Bugs (Coprocessors):
        ● “Coprocessors - Uses, Abuses, Solutions”, Esther Kundin and Clay Baenziger, HBaseCon East, September 26th, 2016
        ● Can run only approved coprocessors – HBASE-16700 – “Allow for Coprocessor Whitelisting”
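Request quotas turn one overloaded tenant into fast, per-tenant rejections instead of a "Region Server Too Busy" condition for everyone. In HBase they are enabled with `hbase.quota.enabled` and set from the shell, for example `set_quota TYPE => THROTTLE, USER => 'u1', LIMIT => '10req/sec'`. The Java below is not HBase's limiter; it is a minimal fixed-window sketch of the same idea, with a caller-supplied clock so it can be tested deterministically. Tenant names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative per-tenant throttle using a fixed one-second window.
 * A sketch of the request-quota idea, not HBase's implementation.
 */
final class TenantThrottle {
    private final int requestsPerSecond;
    // tenant -> {windowStartMillis, requestsAdmittedInWindow}
    private final Map<String, long[]> windows = new HashMap<>();

    TenantThrottle(int requestsPerSecond) {
        this.requestsPerSecond = requestsPerSecond;
    }

    /** Returns true if the request is admitted, false if the tenant is over its quota. */
    synchronized boolean tryAcquire(String tenant, long nowMillis) {
        long[] w = windows.computeIfAbsent(tenant, t -> new long[] {nowMillis, 0});
        if (nowMillis - w[0] >= 1000) { // window expired: start a fresh one
            w[0] = nowMillis;
            w[1] = 0;
        }
        if (w[1] >= requestsPerSecond) {
            return false;               // reject this tenant only; others are unaffected
        }
        w[1]++;
        return true;
    }
}
```

A fixed window is the simplest shape to reason about; it allows short bursts at window boundaries, which a smoothed limiter would avoid at the cost of more state.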
  • 13. HBase Resources – Storage
      ● Memstore (Flushes):
        ● Affects HFile Quantity
        ● Can Block Writes
        ● Compacting Memstore
      ● HDFS:
        ● Denial-of-Service – HBASE-16961 – “FileSystem Quotas”
      ● Cache:
        ● Multiple workloads thrashing the cache
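The "Can Block Writes" point follows from two per-region thresholds: a flush is requested once a region's memstore exceeds `hbase.hregion.memstore.flush.size`, and updates to that region are blocked once it exceeds that size multiplied by `hbase.hregion.memstore.block.multiplier` (commonly 128 MB and 4 by default). The sketch below models only those two per-region checks; it ignores region-server-wide memstore limits, and the sample values are assumptions.

```java
/**
 * Illustrative model of per-region memstore pressure: a flush is requested above the
 * flush size, and writes are blocked above flush size x block multiplier.
 * Thresholds are passed in rather than read from hbase-site.xml.
 */
final class MemstorePressure {
    enum State { OK, FLUSH_REQUESTED, WRITES_BLOCKED }

    static State stateFor(long memstoreBytes, long flushSizeBytes, int blockMultiplier) {
        long blockingBytes = flushSizeBytes * blockMultiplier;
        if (memstoreBytes > blockingBytes) {
            return State.WRITES_BLOCKED;   // writers are pushed back until flushes catch up
        }
        if (memstoreBytes > flushSizeBytes) {
            return State.FLUSH_REQUESTED;  // a flush writes a new HFile, raising HFile count
        }
        return State.OK;
    }
}
```

In a multitenant cluster this is why one tenant's write burst matters to others: frequent flushes mean more HFiles and more compaction work on shared region servers.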
  • 14. HBase Resources – Input/Output
      Latency:
      ● Caching – Off-Heaping
      ● De-Prioritizing Scanners
      Queues – Monitoring!
      ● Replication Queue
      ● Handler Queue
      Ingest:
      ● Splits
      ● Compactions
      ● Bulk-Load
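De-prioritizing scanners means a long-running scan should not sit ahead of short gets in the handler queue. As far as I know, HBase implements this with a deadline-based call queue; the sketch below does not reproduce its formula. It assumes a simple linear penalty: each call's effective deadline is its arrival time plus a fixed number of milliseconds per `next()` call its scanner has already made, so fresh requests overtake scanners that have been running for a while.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/**
 * Illustrative call queue that de-prioritizes long-running scanners.
 * The linear per-next() penalty is an assumption of this sketch.
 */
final class ScanAwareQueue {
    static final class Call {
        final String id;
        final long arrivalMillis;
        final long scannerNextCalls; // 0 for gets/puts or a freshly opened scanner

        Call(String id, long arrivalMillis, long scannerNextCalls) {
            this.id = id;
            this.arrivalMillis = arrivalMillis;
            this.scannerNextCalls = scannerNextCalls;
        }
    }

    private final PriorityQueue<Call> queue;

    ScanAwareQueue(long penaltyMillisPerNext) {
        // Earliest effective deadline is served first.
        this.queue = new PriorityQueue<Call>(Comparator.comparingLong(
            (Call c) -> c.arrivalMillis + penaltyMillisPerNext * c.scannerNextCalls));
    }

    void offer(Call call) {
        queue.add(call);
    }

    /** Returns the next call to run, or null if the queue is empty. */
    Call poll() {
        return queue.poll();
    }
}
```

With a 5 ms penalty, a scanner that has already made 100 `next()` calls is effectively pushed half a second back, so a get that arrives later is still served first.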
  • 15. HBase Access
      ● Kerberos – need a way to get identity:
        ● LDAP Group Traversal – HADOOP-12291
      ● Namespaces
      ● Grants
      ● Multiple Clusters:
        ● Spark-HBase Connector (https://github.com/hortonworks-spark/shc PR#120)
        ● Oozie Delegation Token Acquisition – OOZIE-1646 – “HBase Table Copy between two HBase servers doesn't work with Kerberos”
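LDAP group traversal matters because HBase grants are often made to groups, while users may belong to those groups only indirectly, through nested groups. HADOOP-12291 adds nested-group resolution to Hadoop's LDAP group mapping. The sketch below shows the idea as a breadth-first walk with a depth limit; the in-memory directory, group names and `maxLevels` parameter are hypothetical stand-ins for real LDAP queries and configuration.

```java
import java.util.*;

/** Illustrative nested-group resolution with a depth limit over an in-memory directory. */
final class NestedGroups {
    /**
     * @param parents   maps a user or group to the groups it is a direct member of
     * @param maxLevels how many levels of parent groups to follow beyond direct membership
     * @return all groups found, direct ones first
     */
    static Set<String> groupsFor(String user, Map<String, Set<String>> parents, int maxLevels) {
        Set<String> result = new LinkedHashSet<>(parents.getOrDefault(user, Set.of()));
        Set<String> frontier = new LinkedHashSet<>(result);
        for (int level = 0; level < maxLevels && !frontier.isEmpty(); level++) {
            Set<String> next = new LinkedHashSet<>();
            for (String group : frontier) {
                for (String parent : parents.getOrDefault(group, Set.of())) {
                    if (result.add(parent)) {
                        next.add(parent); // only expand groups not seen before (guards against cycles)
                    }
                }
            }
            frontier = next;
        }
        return result;
    }
}
```

The depth limit bounds the number of directory lookups per user, which matters when every RPC authorization may need a group lookup.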
  • 17. Oozie and Self-Service
      Self-Service: “The serving of oneself with goods or services” – Merriam-Webster.com
      ● Automation – pipelines (workflows); recurrence (coordinators)
      ● Job Status – callback URLs to notify of job progress
      ● Job IDs – MapReduce or YARN IDs for an action's internal sub-jobs
      ● Authentication – delegation tokens (no keytabs!)
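For job-status callbacks, the Oozie documentation describes a workflow property, `oozie.wf.workflow.notification.url`, whose `$jobId` and `$status` tokens Oozie replaces before calling the URL. The sketch below only illustrates that token expansion on the client side of the design, for example to test what a tracking service will receive; the tracker URL is hypothetical, and URL-encoding the values is a choice of this sketch rather than documented Oozie behaviour.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative expansion of Oozie-style $jobId/$status tokens in a callback URL template. */
final class OozieCallback {
    static String expand(String template, String jobId, String status) {
        // String.replace is literal (not regex), so '$' needs no escaping.
        return template
            .replace("$jobId", URLEncoder.encode(jobId, StandardCharsets.UTF_8))
            .replace("$status", URLEncoder.encode(status, StandardCharsets.UTF_8));
    }
}
```

Because the callback carries the job ID, a tenant's own service can correlate notifications with the jobs it submitted without polling Oozie.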
  • 18. Oozie Deployments
      Separates Continuous Integration from Continuous Deployment
      Leverages:
      ● Git – OOZIE-2877
      ● Maven – OOZIE-2878
      Axioms – write a deployment workflow for your job (and its workflow)
      Must follow the principles of:
      ● Idempotency – on re-runs, results in the same state as the first run
      ● Cleanliness – removes old artifacts (state)
      ● Separation of Configuration from Code – deploy the same workflow to development and production with only workflow property differences
      ● See also the slides for “Cluster Continuous Delivery with Oozie” – ApacheCon North America - Big Data, May 18th, 2017
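The idempotency and cleanliness axioms can be checked mechanically: running a deployment twice must leave exactly the same state as running it once, and nothing from an earlier deployment may linger. The sketch below shows that shape against a local, flat directory; a real deployment workflow would write to HDFS through Hadoop's FileSystem API, and the file names used with it are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative idempotent deployment step for a flat directory of workflow artifacts. */
final class IdempotentDeploy {
    /**
     * Makes {@code target} contain exactly the files in {@code desired} (name -> contents).
     * Running it again with the same input leaves the directory unchanged.
     * Assumes {@code target} holds only regular files (no subdirectories).
     */
    static void deploy(Path target, Map<String, String> desired) throws IOException {
        Files.createDirectories(target);
        List<Path> existing;
        try (Stream<Path> listing = Files.list(target)) {
            existing = listing.collect(Collectors.toList()); // snapshot before deleting
        }
        for (Path p : existing) {
            if (!desired.containsKey(p.getFileName().toString())) {
                Files.delete(p); // cleanliness: drop artifacts from earlier deployments
            }
        }
        for (Map.Entry<String, String> e : desired.entrySet()) {
            // Overwrites in place, so a re-run converges on the same contents.
            Files.writeString(target.resolve(e.getKey()), e.getValue(), StandardCharsets.UTF_8);
        }
    }
}
```

Keeping environment differences in a separate properties file (here, a hypothetical `job.properties`) is what lets the same `workflow.xml` go to development and production unchanged.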
  • 20. HBase and Oozie
      Why?
      ● Self-Service – No DBA!
      ● Regularly schedulable
      ● Runs as the project user with no keytab
      Logs and reporting:
      ● Can record HBase errors with logs organized by each Oozie job
      ● Can report success of a job with proactive callbacks
      ● Can verify performance of a job with the SLA subsystem
      ● Can provide infrastructure insight into what's running
  • 21. HBase Scheduled Compactions
      Compactions are I/O- and processing-heavy; they can:
      ● Be detrimental to read or write performance
      ● Lead to split storms or rebalancing
      It is good to plan for their impact – schedule them:
      ● Can also go region-by-region to lessen impact
      ● Can poll to know when compaction is complete
      ● A simple Java action: https://tinyurl.com/oozie-hbase-compaction
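The region-by-region, poll-until-done approach can be sketched independently of any HBase version. HBase's `Admin` interface offers region-level compaction requests and compaction-state queries, but their exact signatures vary across releases, so `RegionCompactor` below is a hypothetical stand-in, not the HBase API; only one region is compacted at a time, which caps the extra I/O to one region's worth.

```java
import java.util.List;

/** Hypothetical stand-in for an admin client; not the HBase Admin API. */
interface RegionCompactor {
    List<String> regionsOf(String table);
    void requestMajorCompaction(String region);
    boolean isCompacting(String region);
}

/** Illustrative rolling compaction: one region at a time, polling until each finishes. */
final class RollingCompaction {
    /** Returns the number of regions compacted. */
    static int compactTable(RegionCompactor admin, String table, long pollMillis)
            throws InterruptedException {
        int done = 0;
        for (String region : admin.regionsOf(table)) {
            admin.requestMajorCompaction(region);
            while (admin.isCompacting(region)) {
                Thread.sleep(pollMillis); // wait before starting the next region
            }
            done++;
        }
        return done;
    }
}
```

Compaction requests are asynchronous, so a production job should also allow for a compaction that is queued but not yet reported as running (for example, by waiting for the state to appear before polling for completion), and should cap the total wait.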
  • 22. HBase Scheduled Compactions
      Key Oozie-isms:
      ● The Java action needs to know how to get its configuration:
          conf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));
      ● Need to pass delegation tokens from Oozie:
          if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
              conf.set("mapreduce.job.credentials.binary",
                       System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
          }
        (conf is an org.apache.hadoop.conf.Configuration; Path is org.apache.hadoop.fs.Path)
      ● Need to use Oozie's Credentials (Action Authentication)
      ● Must pass in properties manually – no job-xml support – OOZIE-2947
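The two Oozie-isms above can be seen without Hadoop on the classpath. The file named by `oozie.action.conf.xml` is a Hadoop-style configuration XML (`<configuration><property><name>…</name><value>…</value></property></configuration>`). The sketch below parses it with the JDK's DOM parser instead of Hadoop's `Configuration`, and adds the token-file setting the same way the slide's snippet does; it is an illustration, not a replacement for `Configuration.addResource`.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative loader for an Oozie action configuration file using only the JDK. */
final class ActionConf {
    /**
     * @param actionConfXml path from System.getProperty("oozie.action.conf.xml"), may be null
     * @param tokenFile     value of HADOOP_TOKEN_FILE_LOCATION, may be null
     */
    static Map<String, String> load(String actionConfXml, String tokenFile) throws Exception {
        Map<String, String> conf = new LinkedHashMap<>();
        if (actionConfXml != null) {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(actionConfXml));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String name = text(p, "name");
                if (name != null) conf.put(name, text(p, "value"));
            }
        }
        if (tokenFile != null) {
            // Same setting the slide applies so that jobs reuse Oozie's delegation tokens.
            conf.put("mapreduce.job.credentials.binary", tokenFile);
        }
        return conf;
    }

    private static String text(Element parent, String tag) {
        NodeList n = parent.getElementsByTagName(tag);
        return n.getLength() == 0 ? null : n.item(0).getTextContent().trim();
    }
}
```

Inside an Oozie Java action this would be called as `ActionConf.load(System.getProperty("oozie.action.conf.xml"), System.getenv("HADOOP_TOKEN_FILE_LOCATION"))`; both values are null when run outside Oozie, which gives an empty map rather than a failure.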
  • 23. HBase Snapshot Export
      Backup Requirements:
      ● Live backups (cannot disable the table or take HBase offline)
      ● Self-service (a non-HBase user can back up/restore their own data)
      ● Automatable procedure (Oozie)
      ● Works on a secure cluster (hdfs:///hbase is not world-readable)
      ● Backup location may not be running HBase
      ● Does not add significant architectural “baggage” to HBase
  • 24. HBase Snapshot Export (diagram)
      Diagram labels: Table Snapshot; Snapshot initiated; Oozie workflow submitted (as namespace admin); HBase Export Snapshot; Perm. check; Oozie Export Snapshot Action; Namespace Admin User; HDFS; Create “dropbox” directory
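The "Perm. check" in the diagram gates a self-service export on the submitter being an administrator of the table's namespace. HBase table names take the form `namespace:qualifier`, with tables lacking a prefix living in the `default` namespace. The sketch below shows only that gate; the in-memory admin map is a hypothetical stand-in for HBase's ACLs, and the actual check used in the talk is not shown on the slide.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical pre-export check: only an admin of the table's namespace may export it. */
final class ExportPermission {
    /** Extracts the namespace from "namespace:qualifier"; no prefix means "default". */
    static String namespaceOf(String tableName) {
        int colon = tableName.indexOf(':');
        return colon < 0 ? "default" : tableName.substring(0, colon);
    }

    /** @param namespaceAdmins hypothetical map of namespace -> users holding admin rights */
    static boolean mayExport(String user, String tableName, Map<String, Set<String>> namespaceAdmins) {
        return namespaceAdmins.getOrDefault(namespaceOf(tableName), Set.of()).contains(user);
    }
}
```

Checking before the export starts keeps a non-admin from copying a tenant's data into a "dropbox" directory that the export process can write to but the tenant's users cannot otherwise reach.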
  • 25. HBase and Oozie
      Multitenancy at Bloomberg
      Clay Baenziger, Hadoop Infrastructure
      https://github.com/bloomberg
      hadoop@bloomberg.net