Hug Hbase Presentation.

Jack Levin

HBase Best Practices, or Taming the Beast

2. 10,000 Concurrent requests per second

3. Super fast!

4. Super huge datastore – 2 billion rows.

5. Backend is scalable

6. Does not lose data

7. Why? – HBASE is used for 99% of the backend

9. ImageShack: 25 million monthly unique visitors

10. Yfrog: 33 million monthly unique visitors

11. 4 HBase clusters of various sizes (50 TB to 1 PB)

12. Storing and serving 250 million photos (500 KB average per file) on 60 servers

13. Yfrog is powered by the smaller 50 TB cluster, with 2 billion rows on 20 servers

14. Using HBase versions 0.89.x and 0.90.x
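
As a rough sanity check on the figures above, the following sketch estimates the raw and replicated footprint of the photo store. The replication factor of 3 is the HDFS default and is an assumption here, since the slides do not state the value actually used. Decimal units (1 KB = 1,000 bytes) are also assumed.

```python
# Rough capacity estimate for the photo cluster described on the slides.
# Assumptions (not from the slides): decimal units, HDFS replication factor 3.

KB = 1_000
TB = 1_000 ** 4


def stored_terabytes(num_files: int, avg_file_bytes: int, replication: int = 1) -> float:
    """Return the on-disk footprint in TB for a given replication factor."""
    return num_files * avg_file_bytes * replication / TB


photos = 250_000_000   # 250 million photos (from the slides)
avg_size = 500 * KB    # 500 KB average per file (from the slides)

raw_tb = stored_terabytes(photos, avg_size)
replicated_tb = stored_terabytes(photos, avg_size, replication=3)
print(f"raw: {raw_tb:.0f} TB, with 3x replication: {replicated_tb:.0f} TB")
```

Under these assumptions the replicated footprint is a few hundred TB, which sits inside the 50 TB to 1 PB range quoted for the clusters.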

15.

16. Lots of RAM is good, but only up to a point; just avoid swap.

17. We use sub-$1k desktop-grade servers, and they work great!

18. Check your network hardware for packet drops. We had outifDiscards interrupting ZooKeeper messages, and region servers would suicide during packet loss. Use ping -f to test for packet loss between core nodes.

19. JVM GC takes lots of CPU when misconfigured – e.g. a small NewSize.

20. Single Namenode? No problem: build two clusters and have your app tier handle query logging, replication, and replays when needed.

21. Inexpensive 2 TB Hitachi disks (~$100) work great; you get more units for your money.
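
The packet-loss check from the list above can be automated by parsing the summary line that ping prints. This is a minimal sketch: the function name and the sample line are illustrative, and the summary format is assumed to be the one printed by Linux ping. Running ping -f itself usually needs root privileges, so the sketch only parses text that has already been captured.

```python
import re


def packet_loss_percent(ping_output: str) -> float:
    """Extract the packet-loss percentage from a ping summary line."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


# Illustrative summary line, not captured from a real cluster.
SAMPLE_SUMMARY = "1000 packets transmitted, 997 received, 0.3% packet loss, time 12ms"

loss = packet_loss_percent(SAMPLE_SUMMARY)
if loss > 0:
    print(f"packet loss detected: {loss}%")
```

Any non-zero loss between core nodes is worth investigating, given that the slides report region servers shutting themselves down during packet loss.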

22.

23. 2. Set up HDFS to work flawlessly (pay attention to ulimits, thread limits, hardware stats, graphs, iowait, etc.)

24. 3. Adjust JVM GC NewSize to be at least 100MB (if young-generation GC is too slow at 100MB, you need faster CPUs).

25. 4. For metadata rows (small rows), set your HBase block size to 4 or 8 KB; you will see less I/O, and more blocks will fit into RAM.
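
To illustrate the block-size advice, this sketch counts how many blocks fit into a cache of a given size. The 1 GiB cache size is an arbitrary assumption for illustration; 64 KB is the HBase default block size, and 4 KB and 8 KB are the values suggested above.

```python
# How many HBase blocks fit into a block cache of a given size.
# Assumption (not from the slides): a 1 GiB cache, chosen only for illustration.

KIB = 1024
GIB = 1024 ** 3


def blocks_in_cache(cache_bytes: int, block_bytes: int) -> int:
    """Number of whole blocks that fit in the cache."""
    return cache_bytes // block_bytes


cache = 1 * GIB
for block_kib in (64, 8, 4):
    count = blocks_in_cache(cache, block_kib * KIB)
    print(f"{block_kib:>2} KiB blocks: {count} fit in cache")
```

With small rows, a random read still pulls in a whole block, so a smaller block means less data read per lookup and more distinct blocks held in the same amount of RAM.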

26.

27.

28. Memstore size graph should be fairly flat with even flushes over time.

29. Iowait graphs should not go over 70-80% during major compactions, or 20% during minor compactions. Otherwise, add more disks and/or nodes.

30. Monitor and graph Thrift threads (via ps -eLf | grep PID). If your threads end up over 25,000, you may run out of RAM. We have dedicated Thrift boxes so that we don't accidentally kill RS nodes.

31. We use Nagios to monitor and alert for DN, RS, ZK, NN, etc. on their web TCP ports – very helpful.

32. Run hbck to check for consistency of meta structures.
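
The Thrift thread check above can be scripted by counting rows per PID in ps -eLf output, where each row is one thread and the second column is the PID. This is a sketch under assumptions: the function names and the sample output are illustrative, and only the 25,000 threshold comes from the slides.

```python
THREAD_LIMIT = 25_000  # threshold quoted on the slides


def count_threads(ps_output: str, pid: int) -> int:
    """Count rows in `ps -eLf` output whose PID column matches `pid`."""
    target = str(pid)
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        # Column 2 is the PID; the header row holds "PID" and never matches.
        if len(fields) > 1 and fields[1] == target:
            count += 1
    return count


def over_thread_limit(ps_output: str, pid: int, limit: int = THREAD_LIMIT) -> bool:
    """True when the process has more threads than the limit."""
    return count_threads(ps_output, pid) > limit


# Illustrative output, not captured from a real Thrift server.
SAMPLE_PS = """\
UID     PID  PPID   LWP  C NLWP STIME TTY      TIME     CMD
hbase  4242     1  4242  0    3 10:00 ?        00:00:01 java ThriftServer
hbase  4242     1  4243  0    3 10:00 ?        00:00:00 java ThriftServer
hbase  4242     1  4244  0    3 10:00 ?        00:00:00 java ThriftServer
root   4300     1  4300  0    1 10:00 ?        00:00:00 sshd
"""

print(count_threads(SAMPLE_PS, 4242))
```

Feeding the count into a graphing or alerting system at a regular interval gives the trend line the slide recommends watching.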

33.

34. Mixed RAM brands – boxes crashed for no apparent reason.

35. Glibc in FC13 had a race condition bug that would lock up nodes and crash JVM processes under high load. Solution: yum -y update glibc (invalid binfree)

36. When running in a mixed hardware environment, some boxes were slow enough to affect HDFS for the whole cluster. Looking at "runnable threads" and "fsreadlatency" in Ganglia always pointed to which boxes were slow.

37. Running Cloudera HDFS under the user 'hadoop', which was restricted to 1024 threads by default, would crash datanodes, but only during compactions. Setting the soft and hard nproc limits for hadoop to 32,000 in limits.conf resolved it.

38. GC sometimes autotuned NewSize to 20MB, which caused 20 to 30 GC runs per second, flatlining the CPU at 100% and killing the RS. Manually setting NewSize to 128MB resolved this issue.
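
The effect of the NewSize change can be approximated with simple arithmetic. The sketch below assumes that young-generation collections happen roughly once per NewSize worth of allocation and that the allocation rate stays the same after the change. Both are simplifications (survivor spaces and promotion are ignored), not something the slides state.

```python
# Back-of-the-envelope young-GC frequency for a given NewSize.
# Assumption: one young collection per NewSize bytes allocated.

MB = 1024 * 1024


def young_gc_per_second(alloc_bytes_per_sec: float, new_size_bytes: int) -> float:
    """Approximate young-generation collections per second."""
    return alloc_bytes_per_sec / new_size_bytes


# A 20MB NewSize with 20 to 30 collections per second (from the slides)
# implies this allocation-rate range under the assumption above.
low_alloc = 20 * MB * 20
high_alloc = 20 * MB * 30

low_rate = young_gc_per_second(low_alloc, 128 * MB)
high_rate = young_gc_per_second(high_alloc, 128 * MB)
print(f"with 128MB NewSize: about {low_rate:.1f} to {high_rate:.1f} young GCs per second")
```

Under this model the same workload drops from 20 to 30 collections per second to roughly 3 to 5, which is consistent with the CPU no longer being pinned.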

39.

40. No strange crashes

41. No OOME

42. Fast – 0.5 ms puts, 2-3 ms reads, 10 ms disk reads.

43. Recovers quickly when nodes are taken down

44. On-call team can finally relax

45.

46. Load test HBase with YCSB – just leave it running for a week; if nothing crashes, you are good. Best not to test with live user traffic :)

47. Do not worry about Namenode redundancy; just back up the /name dir frequently. Set up a secondary HBase cluster with the money you save by not buying 'server' grade nodes.

48. Burn in your disks, even if they are new.

49. Put Memcached between your app tier and HBase. App bugs will hit Memcached first, keeping HBase safe from an assault that could otherwise drive up your utilization.
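
The Memcached recommendation describes a cache placed in front of the datastore. The sketch below shows the general pattern with an in-memory dict standing in for Memcached and a plain function standing in for an HBase read. The class name, the stand-ins, and the sample key are all hypothetical and only illustrate the idea; they are not the setup used in the talk.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Serve repeated reads from a cache so they never reach the datastore."""

    def __init__(self, fetch_from_store: Callable[[str], Optional[bytes]]) -> None:
        self._fetch = fetch_from_store      # stand-in for an HBase read
        self._cache: Dict[str, bytes] = {}  # stand-in for Memcached
        self.store_reads = 0                # how many reads reached the store

    def get(self, key: str) -> Optional[bytes]:
        if key in self._cache:
            return self._cache[key]
        self.store_reads += 1
        value = self._fetch(key)
        if value is not None:
            self._cache[key] = value
        return value


def fake_store_read(key: str) -> Optional[bytes]:
    """Hypothetical datastore lookup used only for this example."""
    return b"photo-metadata" if key == "img:1" else None


cache = ReadThroughCache(fake_store_read)
for _ in range(1000):   # e.g. a buggy client hammering the same key
    cache.get("img:1")
print(cache.store_reads)
```

A runaway loop in the app tier hits the cache on every iteration after the first, so the datastore sees a single read instead of a thousand.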

50.

51. JD Cryans

52. Michael Stack

53. And everyone else on the hbase user list who helped us out during the rough times.
