Building	  applica2ons	  with	  HBase        Headline	  Goes	  Here        Amandeep	  Khurana	  |	  Solu7ons	  Architect  ...
About	  me     • Solu7ons	  Architect,	  Cloudera	  Inc     • Amazon	  Web	  Services     • Interested	  in	  large	  scal...
I’m	  a	  wanna-­‐be	  (author)                                          www.hbaseinac7on.com                 Nick Dimiduk...
About	  the	  workshop     • Install	  HBase     • Basic	  interac7ons	  with	  the	  shell     • Use	  of	  HBase	  Java	...
Get	  HBase	  and	  tutorial	  instruc7ons     • Get	  it	  here	  -­‐>	  hGp://hbase.apache.org/     • Downloads	  -­‐>	 ...
What’s	  Big	  Data?         ...	  except	  another	  marke7ng	  term 6Monday, April 1, 13
Data	  management 7Monday, April 1, 13
Data	  has	  changed                                             END-USER                                            APPLI...
“Big	  Data” 9Monday, April 1, 13
Hadoop	  -­‐	  Big	  Data	  poster	  child                                                        HDFS, MAPREDUCE 10Monday...
Hadoop	  -­‐	  Big	  Data	  poster	  child                        File System Mount                 UI Framework          ...
What	  is	  HBase?      • Scalable,	  distributed	  data	  store      • Sorted	  map	  of	  maps      • Open	  source	  av...
So,	  what’s	  the	  big	  deal?      • Give	  up	  system	  complexity	  and	  sophis7cated	  features	  for	  ease	     ...
Use	  cases      •   Canonical	  use	  case:	  storing	  crawl	  data	  and	  indices	  for	  search                      ...
Use	  cases	  (con7nued)      • Capturing	  incremental	  data      • Content	  serving      • Informa7on	  exchange 14Mon...
Ok,	  enough	  of	  high	  level	  stuff      • Big	  data	  is	  a	  buzzword	  but	  there	  is	  more	  to	  it         ...
HBase	  -­‐	  GeEng	  Started         Installa7on	  and	  basic	  interac7on 16Monday, April 1, 13
Installing	  HBase      • Get	  it	  here	  -­‐>	  hGp://hbase.apache.org/      • Downloads	  -­‐>	  Choose	  a	  mirror	 ...
Basics      • Modes	  of	  opera2on           • Standalone           • Pseudo-­‐distributed           • Fully	  distribute...
Shell      • Basic	  interac7on	  tool      • WriGen	  in	  JRuby      • Uses	  the	  Java	  client	  library      • Good	...
Shell	  (con7nued)      • bin/hbase	  shell      • Create	  table           • create	  ‘users’	  ,	  ‘info’      • List	  ...
Shell	  (con7nued)      •   Put	  a	  row               •put	  ‘users’	  ,	  ‘u1’,	  ‘info:name’	  ,	  ‘Mr.	  Rogers’     ...
Java	  API         ...	  can’t	  really	  build	  an	  applica7on	  just	  with	  the	  shell 22Monday, April 1, 13
Important	  terms      •   Table               •   Consists	  of	  rows	  and	  columns      •   Row               • Has	 ...
Introducing	  TwitBase      •   TwiSer	  clone	  built	  on	  HBase      •   Designed	  to	  scale      •   It’s	  an	  ex...
Connec7ng	  to	  HBase      •   Using default confs:          HTableInterface usersTable = new HTable("users");      •   P...
Inser7ng	  data      •   Put	  call	  on	  HTable          Put p = new Put(Bytes.toBytes("TheRealMT"));            p.add(B...
Data	  coordinates      • Row	  is	  addressed	  using	  rowkey      • Cell	  is	  addressed	  using	          	  	  	  	 ...
Upda7ng	  data      • Update	  ==	  Write        Put p = new Put(Bytes.toBytes("TheRealMT"));        p.add(Bytes.toBytes("...
Write	  path                                 WAL                                            Write goes to the write-ahead ...
Reading	  data      •   Get	  the	  en2re	  row          Get g = new Get(Bytes.toBytes("TheRealMT"));          Result r = ...
Reading	  data	  (con7nued)      •   Scan          Scan s = new Scan(startRow, stopRow);          ResultsScanner rs = twit...
What	  happens	  when	  you	  read	  data?                                          BlockCache                            ...
Dele7ng	  data      • Another	  simple	  API	  call      • Delete	  en7re	  row          Delete d = new Delete(Bytes.toByt...
Delete	  internals      • Tombstone	  markers      • Dele7on	  only	  happens	  during	  major	  compac7ons      • Compac7...
Compac7ons                      Memstore   Memstore                       HFile                 Two or more HFiles get com...
Hands-­‐on:	  Java	  client	  to	  HBase      •   Create	  a	  table	  from	  the	  HBase	  shell            • create	  ‘u...
DAO      • Like	  any	  other	  applica7on      • Makes	  data	  access	  management	  easy           • Cleaner	  code    ...
Admin     HBaseAdmin                                             Table	  Opera7ons     •    Administra7ve	  API	  to	  the...
Data	  Model         It’s	  not	  a	  rela7onal	  database	  system 39Monday, April 1, 13
HBase	  is	  ...      • Column	  family	  oriented	  database          • Column	  family	  oriented          • Tables	  co...
HBase	  is	  not	  ...      •   A	  rela7onal	  database               • No	  SQL	  query	  language               • No	  ...
Tabular	  representa7on                                                                                                   ...
Key-­‐Value	  store                                                             Keys    Values                       [TheR...
Key-­‐Value	  store                                  [TheRealMT, info, password, 1329088818321]                       abc1...
Sorted	  map	  of	  maps                                                         Rowkey                                   ...
HFiles	  and	  physical	  data	  model      •   HFiles	  are            • Immutable            • Sorted	  on	  rowkey	  +	...
HBase	  in	  distributed	  mode         ...	  what	  really	  goes	  on	  under	  the	  hood 47Monday, April 1, 13
Distributed	  mode      •   Table	  is	  split	  into	  Regions                                                           ...
Distributed	  mode	  (con7nued)      • Regions	  are	  served	  by	     DataNode     RegionServer                         ...
Finding	  your	  region      •   Two	  special	  tables:	  -­‐ROOT-­‐	  and	  .META.                                      ...
Regions	  distributed	  across	  the	  cluster                                                    T1-R1                   ...
What	  the	  client	  does                                               MyTable                                      .MET...
What	  the	  client	  does                                               MyTable                                      .MET...
What	  the	  client	  does                                                     MyTable                                    ...
What	  the	  client	  does                                                                   MyTable                      ...
What	  the	  client	  does                                                                   MyTable                      ...
HBase	  and	  Hadoop         ...	  they	  play	  well	  together 53Monday, April 1, 13
HBase	  and	  Hadoop      • HBase	  has	  7ght	  integra7on	  with	  Hadoop      • Two	  separate	  concerns          • Ac...
Parallel	  processing 55Monday, April 1, 13
MapReduce 56Monday, April 1, 13
HBase	  and	  MapReduce      • HBase	  is	  7ghtly	  integrated	  with	  the	  Hadoop	  stack          • MapReduce        ...
HBase	  as	  MR	  source      •   One	  mapper	  per	  region                                                          Map...
HBase	  as	  MR	  source	  (con7nued)      •   Mapper          protected void map(ImmutableBytesWritable	  rowkey,	  Resul...
HBase	  as	  MR	  sink      •   Write	  to	  regions	  via	  MR	  job                                                     ...
HBase	  as	  MR	  sink	  (con7nued)      •   Reducer          protected void reduce(ImmutableBytesWritable	  rowkey,      ...
HBase	  as	  a	  shared	  resource      •   Lookups	  from	  MR	  jobs      •   Map-­‐side	  joins      •   Simple	  Get/S...
Integra7on	  with	  HDFS      • HDFS	  is	  a	  filesystem      • Underlying	  storage	  fabric	  for	  HBase      • HBase	...
Availability	  and	  fault	  tolerance                                                 Disaster strikes on                ...
Durability      • WAL	  keeps	  writes	  intact	  even	  if	  RS	  fails      • HDFS	  is	  reliable	  and	  doesn’t	  los...
Summary      • Big	  data	  =	  new	  products,	  applica7ons      • Hadoop	  ecosystem	  -­‐	  poster	  child      • HBas...
67Monday, April 1, 13
Schema	  design         ...	  it’s	  a	  database	  aper-­‐all 68Monday, April 1, 13
But	  isn’t	  HBase	  schema-­‐less?      • Number	  of	  tables      • Rowkey	  design	        • Number	  of	  column	  f...
Rowkeys      • Rowkey	  design	  is	  the	  single	  most	  important	  aspect	  of	  HBase	          table	  designs     ...
TwitBase	  rela7onships      •   Users	  follow	  users      •   Rela2onships	  need	  to	  be	  persisted	  for	  usage	 ...
Start	  simple      •   Adjacency	  list                                                              Column Family : foll...
Op7mizing	  the	  adjacency	  list      •   We	  need	  a	  count           • Where	  does	  a	  new	  followed	  user	  g...
Adding	  a	  new	  user                      Row that needs to be updated                                                 ...
Transac7ons	  ==	  not	  good      • HBase	  doesn’t	  have	  na7ve	  support	  (think	  scale)      • Don’t	  want	  to	 ...
Revisit	  the	  ques7ons      • Read	  paGern          • Who	  all	  does	  A	  follow?          • Who	  all	  follows	  A...
Revisit	  the	  ques7ons 76Monday, April 1, 13
Denormaliza7on      • Second	  table	  for	  reverse	  rela7onship      • Otherwise	  scan	  across	  en7re	  table	  and	...
More	  op7miza7ons      • Convert	  into	  tall-­‐narrow	  table      • Leverage	  rowkey	  indexing	  beGer      • Gets	 ...
Tall-­‐narrow	  table	  example      •   Denormaliza7on	  is	  the	  way	  to	  go                                        ...
Uniform	  rowkey	  length      • MD5	  the	  userids	  -­‐>	  16	  bytes	  +	  16	  bytes	  rowkeys      • BeGer	  distrib...
Uniform	  rowkey	  length	  (con7nued)                                                                    f               ...
Tall	  v/s	  Wide	  tables	  storage	  footprint                               Logical representation of an HBase table.  ...
Rowkey	  design      • Single	  most	  important	  aspect	  of	  designing	  tables      • Depends	  on	  expected	  acces...
Write	  op7mized      • Distribute	  writes	  across	  the	  cluster          • Issue	  most	  pronounced	  with	  7me	  s...
Read	  op7mized      •   Data	  to	  be	  accessed	  together	  should	  be	  stored	  together            • eg:	  twit	  ...
Rela7onal	  to	  Non	  rela7onal      • Rela7onal	  concepts          • En77es          • AGributes          • Rela7onship...
Rela7onal	  to	  Non	  rela7onal	  (con7nued)      •   AGributes            • Iden7fying                  • Primary	  keys...
Nested	  En77es      •   Column	  Qualifiers	  can	  contain	  data	  instead	  of	  just	  a	  column	            name    ...
Schema	  design	  summary      •   Schema	  can	  make	  or	  break	  the	  performance	  you	  get      •   Rowkey	  is	 ...
Deployments	  and	  Opera2ons         ...	  for	  some	  real	  stuff,	  beyond	  your	  laptop 90Monday, April 1, 13
Components      • HDFS	  (DataNodes)          • Storage      • ZooKeeper          • Membership	  management      • RegionS...
Hardware      •   DataNodes	  and	  RegionServers	  collocated            • Plenty	  of	  RAM	  and	  Disk            • 8-...
Hardware	  (con7nued)      •   HBase	  Master	  and	  ZooKeeper	  collocated            • Don’t	  necessarily	  need	  to	...
Types	  of	  deployments      •   Standalone	  for	  dev	  purpose      •   Prototype              • <	  10	  nodes       ...
Deploying	  sopware	  and	  configura7on      • Automated	  sopware	  bits	  deployment	  highly	  recommended          • C...
Configura7ons      •   $HBASE_HOME/conf            • hbase-­‐env.sh                   •   Environment	  configura2ons       ...
Monitoring      • Metrics	  context	  from	  Hadoop          • Ganglia          • File	  context      • JMX          • Cac...
Monitoring	  (con7nued) 98Monday, April 1, 13
Performance	  tuning      • Performance	  tes2ng           • Use	  real	  (or	  synthe2c)	  workloads           • YCSB    ...
100Monday, April 1, 13
Upcoming SlideShare
Loading in …5
×

Building apps with HBase - Data Days Texas March 2013

2,027 views

Published on

  • Be the first to comment

Building apps with HBase - Data Days Texas March 2013

  1. 1. Building  applica2ons  with  HBase Headline  Goes  Here Amandeep  Khurana  |  Solu7ons  Architect Speaker  Name  or  Subhead  Goes  Here Data  Days  Texas,  Aus2n,  March  2013 1Monday, April 1, 13
  2. 2. About  me • Solu7ons  Architect,  Cloudera  Inc • Amazon  Web  Services • Interested  in  large  scale  distributed  systems • Co-­‐author,  HBase  In  Ac7on • TwiGer:  amansk 2Monday, April 1, 13
  3. 3. I’m  a  wanna-­‐be  (author) www.hbaseinac7on.com Nick Dimiduk Amandeep Khurana MANNING 3Monday, April 1, 13
  4. 4. About  the  workshop • Install  HBase • Basic  interac7ons  with  the  shell • Use  of  HBase  Java  API • Data  model • HBase  and  Hadoop  integra7on 4Monday, April 1, 13
  5. 5. Get  HBase  and  tutorial  instruc7ons • Get  it  here  -­‐>  hGp://hbase.apache.org/ • Downloads  -­‐>  Choose  a  mirror  -­‐>  0.94.1 • Download  the  file  hbase-­‐0.94.1.tar.gz • Get  the  tutorial  handout  -­‐>   hGp://people.apache.org/~ak/ddtx13 5Monday, April 1, 13
  6. 6. What’s  Big  Data? ...  except  another  marke7ng  term 6Monday, April 1, 13
  7. 7. Data  management 7Monday, April 1, 13
  8. 8. Data  has  changed END-USER APPLICATIONS THE INTERNET DATA GROWTH MOBILE DEVICES SOPHISTICATED MACHINES UNSTRUCTURED DATA – 90% STRUCTURED DATA – 10% 1980 2012 8Monday, April 1, 13
  9. 9. “Big  Data” 9Monday, April 1, 13
  10. 10. Hadoop  -­‐  Big  Data  poster  child HDFS, MAPREDUCE 10Monday, April 1, 13
  11. 11. Hadoop  -­‐  Big  Data  poster  child File System Mount UI Framework Machine Learning FUSE-DFS HUE APACHE MAHOUT Workflow Scheduling Metadata APACHE OOZIE APACHE OOZIE APACHE HIVE Languages / Compilers APACHE PIG, APACHE HIVE, APACHE MAHOUT Fast Data Integration Read/Write Access APACHE FLUME, APACHE SQOOP APACHE HBASE HDFS, MAPREDUCE APACHE ZOOKEEPER Coordination 10Monday, April 1, 13
  12. 12. What  is  HBase? • Scalable,  distributed  data  store • Sorted  map  of  maps • Open  source  avatar  of  Google’s  Bigtable • Sparse • Mul7  dimensional • Tightly  integrated  with  Hadoop • Not  a  RDBMS 11Monday, April 1, 13
  13. 13. So,  what’s  the  big  deal? • Give  up  system  complexity  and  sophis7cated  features  for  ease   of  scale  and  manageability • Simple  system  !=  easy  to  build  applica7ons  on  top • Goal:  predictable  read  and  write  performance  at  scale • Effec7ve  API  usage • Op7mal  schema  design • Understand  internals  and  leverage  at  scale 12Monday, April 1, 13
  14. 14. Use  cases • Canonical  use  case:  storing  crawl  data  and  indices  for  search Web Search powered by Bigtable Crawlers Internets Crawlers Bigtable Crawlers Crawlers 1 2 MapReduce Indexing the Internet 1 Crawlers constantly scour the Internet for new pages. Those pages are stored as individual records in Bigtable. 4 3 2 A MapReduce job runs over the entire table, generating Search search indexes for the Web Search application. Search Index Search Web Search Index Index 5 You Searching the Internet 3 The user initiates a Web Search request. 4 The Web Search application queries the Search Indexes and retries matching documents directly from Bigtable. 5 Search results are presented to the user. 13Monday, April 1, 13
  15. 15. Use  cases  (con7nued) • Capturing  incremental  data • Content  serving • Informa7on  exchange 14Monday, April 1, 13
  16. 16. Ok,  enough  of  high  level  stuff • Big  data  is  a  buzzword  but  there  is  more  to  it • New  products,  applica7ons,  types  of  data • Hadoop  ecosystem  is  the  poster  child  of  big  data  architectures • Consists  of  several  components • HBase  is  a  part  of  the  ecosystem • Scalable  database • Lots  of  produc7on  deployments 15Monday, April 1, 13
  17. 17. HBase  -­‐  GeEng  Started Installa7on  and  basic  interac7on 16Monday, April 1, 13
  18. 18. Installing  HBase • Get  it  here  -­‐>  hGp://hbase.apache.org/ • Downloads  -­‐>  Choose  a  mirror  -­‐>  0.94.1 • Download  the  file  hbase-­‐0.94.1.tar.gz • Untar  it • mkdir  ~/ddtx • cp  hbase-­‐0.94.1.tar.gz  ~/ddtx • cd  ~/ddtx • tar  xvf  hbase-­‐0.94.1.tar.gz 17Monday, April 1, 13
  19. 19. Basics • Modes  of  opera2on • Standalone • Pseudo-­‐distributed • Fully  distributed • Start  up  in  standalone  mode • cd  hbase-­‐0.94.1 • bin/start-­‐hbase.sh • Logs  are  in  ./logs/ • Check  out  hSp://localhost:60010  in  your  browser 18Monday, April 1, 13
  20. 20. Shell • Basic  interac7on  tool • WriGen  in  JRuby • Uses  the  Java  client  library • Good  place  to  start  poking  around 19Monday, April 1, 13
  21. 21. Shell  (con7nued) • bin/hbase  shell • Create  table • create  ‘users’  ,  ‘info’ • List  tables • list • Describe  table • describe  ‘users’ 20Monday, April 1, 13
  22. 22. Shell  (con7nued) • Put  a  row •put  ‘users’  ,  ‘u1’,  ‘info:name’  ,  ‘Mr.  Rogers’ • Get  a  row • get  ‘users’  ,  ‘u1’ • Put  more • put  ‘users’  ,  ‘u1’  ,  ‘info:email’  ,  ‘mail@rogers.com’ • put  ‘users’  ,  ‘u1’  ,  ‘info:password’  ,  ‘foobar’ • Get  a  row •get  ‘users’  ,  ‘u1’ • Scan  table • scan  ‘users’ 21Monday, April 1, 13
  23. 23. Java  API ...  can’t  really  build  an  applica7on  just  with  the  shell 22Monday, April 1, 13
  24. 24. Important  terms • Table • Consists  of  rows  and  columns • Row • Has  a  bunch  of  columns. • Iden2fied  by  a  rowkey  (primary  key) • Column  Qualifier • Dynamic  column  name • Column  Family • Column  groups  -­‐  logical  and  physical  (Similar  access  paSern) • Cell • The  actual  element  that  contains  the  data  for  a  row-­‐column  intersec2on • Version • Every  cell  has  mul2ple  versions. 23Monday, April 1, 13
  25. 25. Introducing  TwitBase • TwiSer  clone  built  on  HBase • Designed  to  scale • It’s  an  example,  not  a  produc2on  app • Start  with  users  and  twits  for  now • Users • Have  profiles • Post  twits • Twits • Have  text  posted  by  users • Get  sample  code  at:  hSps://github.com/hbaseinac2on/twitbase 24Monday, April 1, 13
  26. 26. Connec7ng  to  HBase • Using default confs: HTableInterface usersTable = new HTable("users"); • Passing custom conf: Configuration myConf = HBaseConfiguration.create(); myConf.set("parameter_name", "parameter_value"); HTableInterface usersTable = new HTable(myConf, "users"); • Connection pooling HTablePool pool = new HTablePool(); HTableInterface usersTable = pool.getTable("users"); // work with the table usersTable.close(); 25Monday, April 1, 13
  27. 27. Inser7ng  data • Put  call  on  HTable Put p = new Put(Bytes.toBytes("TheRealMT")); p.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Mark Twain")); p.add(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes("samuel@clemens.org")); p.add(Bytes.toBytes("info"), Bytes.toBytes("password"), Bytes.toBytes("Langhorne")); usersTable.put(p); 26Monday, April 1, 13
  28. 28. Data  coordinates • Row  is  addressed  using  rowkey • Cell  is  addressed  using              [rowkey  +  family  +  qualifier] 27Monday, April 1, 13
  29. 29. Upda7ng  data • Update  ==  Write Put p = new Put(Bytes.toBytes("TheRealMT")); p.add(Bytes.toBytes("info"), Bytes.toBytes("password"), Bytes.toBytes("abc123")); usersTable.put(p); 28Monday, April 1, 13
  30. 30. Write  path WAL Write goes to the write-ahead log (WAL) Memstore as well as the in memory write buffer called the Memstore. Client HFile Clients dont interact directly with the underlying HFiles during writes. HFile Memstore Memstore gets flushed to disk to New HFile form a new HFile when filled up with writes. HFile HFile 29Monday, April 1, 13
  31. 31. Reading  data • Get  the  en2re  row Get g = new Get(Bytes.toBytes("TheRealMT")); Result r = usersTable.get(g); • Get  selected  columns Get g = new Get(Bytes.toBytes("TheRealMT")); g.addColumn(Bytes.toBytes("info"), Bytes.toBytes("password")); Result r = usersTable.get(g); • Extract  the  value  out  of  the  Result  object byte[] b = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("email")); String email = Bytes.toString(b); 30Monday, April 1, 13
  32. 32. Reading  data  (con7nued) • Scan Scan s = new Scan(startRow, stopRow); ResultsScanner rs = twits.getScanner(s); for(Result r : rs) { // iterate over scanner results } • Single threaded iteration over defined set of rows. Could be the entire table too. 31Monday, April 1, 13
  33. 33. What  happens  when  you  read  data? BlockCache Memstore Client HFile HFile 32Monday, April 1, 13
  34. 34. Dele7ng  data • Another  simple  API  call • Delete  en7re  row Delete d = new Delete(Bytes.toBytes("TheRealMT")); usersTable.delete(d); • Delete  selected  columns Delete d = new Delete(Bytes.toBytes("TheRealMT")); d.deleteColumns(Bytes.toBytes("info"), Bytes.toBytes("email")); usersTable.delete(d); 33Monday, April 1, 13
  35. 35. Delete  internals • Tombstone  markers • Dele7on  only  happens  during  major  compac7ons • Compac7ons • Minor • Major 34Monday, April 1, 13
  36. 36. Compac7ons Memstore Memstore HFile Two or more HFiles get combined Compacted to form a single HFile during HFile a compaction. HFile HFile HFile 35Monday, April 1, 13
  37. 37. Hands-­‐on:  Java  client  to  HBase • Create  a  table  from  the  HBase  shell • create  ‘users’  ,  ‘info’ • Write  a  program  to  do  the  following • Add  10  users  to  the  ‘users’  table. • userid  as  rowkey • user’s  name  in  the  column  ‘info:name’ • userid  could  just  be  user  +  serial  number.  eg:  user1,  user2,  user3 • Get  a  user  from  the  ‘users’  table  and  display  on  screen • Scan  the  en2re  ‘users’  table  and  display  on  screen 36Monday, April 1, 13
  38. 38. DAO • Like  any  other  applica7on • Makes  data  access  management  easy • Cleaner  code • Table  info  at  a  single  place • Op7miza7ons  like  connec7on  pooling  etc • Easier  management  of  database  configs 37Monday, April 1, 13
  39. 39. Admin HBaseAdmin Table  Opera7ons • Administra7ve  API  to  the  cluster. • createTable() • Provides  an  interface  to  manage   • deleteTable() HBase  database  tables  metadata   • addColumn() and  general  administra7ve   func7ons. • modifyColumn() • deleteColumn() • listTables()Monday, April 1, 13
  40. 40. Data  Model It’s  not  a  rela7onal  database  system 39Monday, April 1, 13
  41. 41. HBase  is  ... • Column  family  oriented  database • Column  family  oriented • Tables  consis2ng  of  rows  and  columns • Persisted  Map • Sparse • Mul2  dimensional • Sorted • Indexed  by  rowkey,  column  and  2mestamp • Key  Value  store • [rowkey,  col  family,  col  qualifier,  2mestamp]  -­‐>  cell  value 40Monday, April 1, 13
  42. 42. HBase  is  not  ... • A  rela7onal  database • No  SQL  query  language • No  joins • No  secondary  indexing • No  transac7ons 41Monday, April 1, 13
  43. 43. Tabular  representa7on 2 Column Family - Info 1 3 Rowkey name email password GrandpaD Mark Twain samuel@clemens.org abc123 HMS_Surprise Patrick OBrien aubrey@sea.com abc123 The table is lexicographically sorted on the rowkeys SirDoyle Fyodor Dostoyevsky fyodor@brothers.net abc123 TheRealMT Sir Arthur Conan Doyle art@TheQueensMen.co.uk Langhorne abc123 4 Cells ts1=1329088321289 ts2=1329088818321 Each cell has multiple The coordinates used to identify data in an HBase table are: versions, (1) rowkey, (2) column family, (3) column qualifier, (4) version typically represented by the timestamp of when they were inserted into the table (ts2>ts1) 42Monday, April 1, 13
  44. 44. Key-­‐Value  store Keys Values [TheRealMT, info, password, 1329088818321] abc123 [TheRealMT, info, password, 1329088321289] Langhorne A single KeyValue instance 43Monday, April 1, 13
  45. 45. Key-­‐Value  store [TheRealMT, info, password, 1329088818321] abc123 1 Start with coordinates of full precision { 1329088818321 : "abc123", [TheRealMT, info, password] 1329088321289 : "Langhorne" } 2 Drop version and youre left with a map of version to values Keys { "email" : { 1329088321289 : "samuel@clemens.org" }, "name" : { [TheRealMT, info] 1329088321289 : "Mark Twain" }, Values "password" : { 1329088818321 : "abc123", 1329088321289 : "Langhorne" } } 3 Omit qualifier and you have a map of qualifiers to the previous maps { "info" : { "email" : { 1329088321289 : "samuel@clemens.org" }, "name" : { 1329088321289 : "Mark Twain" [TheRealMT] }, "password" : { 1329088818321 : "abc123", 1329088321289 : "Langhorne" } } } 4 Finally, drop the column family and you have a row, a map of maps 44Monday, April 1, 13
  46. 46. Sorted  map  of  maps Rowkey { "TheRealMT" : { Column family "info" : { "email" : { 1329088321289 : "samuel@clemens.org" }, "name" : { 1329088321289 : "Mark Twain" Column qualifiers }, "password" : { Values 1329088818321 : "abc123", 1329088321289 : "Langhorne" } } } Versions } 45Monday, April 1, 13
  47. 47. HFiles  and  physical  data  model • HFiles  are • Immutable • Sorted  on  rowkey  +  qualifier  +  7mestamp • In  the  context  of  a  column  family  per  region "TheRealMT" , "info" , "email" , 1329088321289, "samuel@clemens.org" "TheRealMT" , "info" , "name" , 1329088321289 , "Mark Twain" "TheRealMT" , "info" , "password" , 1329088818321 , "abc123", "TheRealMT" , "info" , "password" , 1329088321289 , "Langhorne" HFile for the info column family in the users table 46Monday, April 1, 13
  48. 48. HBase  in  distributed  mode ...  what  really  goes  on  under  the  hood 47Monday, April 1, 13
  49. 49. Distributed  mode • Table  is  split  into  Regions T1R1 00001 John 415-111-1234 00002 Paul 408-432-9922 00001 John 415-111-1234 00003 Ron 415-993-2124 00001 John 415-111-1234 00002 Paul 408-432-9922 00004 Bob 818-243-9988 00002 Paul 408-432-9922 00003 Ron 415-993-2124 00003 Ron 415-993-2124 00004 Bob 818-243-9988 00004 Bob 818-243-9988 T1R2 00005 Carly 206-221-9123 00005 Carly 206-221-9123 00005 Carly 206-221-9123 00006 Scott 818-231-2566 00006 Scott 818-231-2566 00006 Scott 818-231-2566 00007 Simon 425-112-9877 00007 Simon 425-112-9877 00007 Simon 425-112-9877 00008 Lucas 415-992-4432 00008 Lucas 415-992-4432 00008 Lucas 415-992-4432 00009 Steve 530-288-9832 00010 Kelly 916-992-1234 00009 Steve 530-288-9832 T1R3 00011 Betty 650-241-1192 00010 Kelly 916-992-1234 00012 Anne 206-294-1298 00011 Betty 650-241-1192 00009 Steve 530-288-9832 00012 Anne 206-294-1298 00010 Kelly 916-992-1234 00011 Betty 650-241-1192 00012 Anne 206-294-1298 Full table T1 Table T1 split into 3 regions - R1, R2 and R3 48Monday, April 1, 13
  50. 50. Distributed  mode  (con7nued) • Regions  are  served  by   DataNode RegionServer T1R1 00001 00002 John Paul 415-111-1234 408-432-9922 RegionServers 00003 00004 Ron Bob 415-993-2124 818-243-9988 • RegionServers  are  Java   Host1 T1R2 processes,  typically   00005 Carly 206-221-9123 00006 Scott 818-231-2566 DataNode RegionServer 00007 Simon 425-112-9877 00008 Lucas 415-992-4432 collocated  with  the   T1R3 00009 00010 Steve Kelly 530-288-9832 916-992-1234 DataNodes Host2 00011 00012 Betty Anne 650-241-1192 206-294-1298 DataNode RegionServer Host3 49Monday, April 1, 13
  51. 51. Finding  your  region • Two  special  tables:  -­‐ROOT-­‐  and  .META. -ROOT- table, consisting of only 1 region -ROOT- hosted on region server RS1 RS1 .META. table, consisting of 3 regions, M1 M2 M3 hosted on RS1, RS2 and RS3 RS2 RS3 RS1 User tables T1 and T2, consisting of T1R1 T1R2 T2R1 T1R3 T2R2 T2R3 T2R4 4 and 3 regions respectively, distributed amongst RS1, RS2 and RS3 RS3 RS1 RS2 RS3 RS2 RS1 RS1 50Monday, April 1, 13
  52. 52. Regions  distributed  across  the  cluster T1-R1 00001 John 415-111-1234 DataNode RegionServer 00002 Paul 408-432-9922 00003 Ron 415-993-2124 RegionServer 1 (RS1), hosting regions R1 of 00004 Bob 818-243-9988 the user table T1 and M2 of .META. .META. - M2 T1:00009-T1:00012 R3 RS2 Host1 T1-R2 00005 Carly 206-221-9123 00006 Scott 818-231-2566 00007 Simon 425-112-9877 DataNode RegionServer 00008 Lucas 415-992-4432 T1-R3 RegionServer 2 (RS2), hosting regions R2, 00009 Steve 530-288-9832 R3 of the user table and M1 of .META. 00010 Kelly 916-992-1234 00011 Betty 650-241-1192 00012 Anne 206-294-1298 Host2 .META. - M1 T1:00001-T1:00004 R1 RS1 T1:00005-T1:00008 R2 RS2 DataNode RegionServer -ROOT- M:T100001-M:T100009 M1 RS2 RegionServer 3 (RS3), hosting just -ROOT- M:T100010-M:T100012 M2 RS1 Host3 51Monday, April 1, 13
  53. 53. What  the  client  does MyTable .META. TheRealMT -ROOT- ZooKeeper 52Monday, April 1, 13
  54. 54. What  the  client  does MyTable .META. TheRealMT -ROOT- ZooKeeper 52Monday, April 1, 13
  55. 55. What  the  client  does MyTable .META. TheRealMT -ROOT- ZooKeeper Row per META region 52Monday, April 1, 13
  56. 56. What  the  client  does MyTable .META. TheRealMT -ROOT- ZooKeeper Row per META region Row per table region 52Monday, April 1, 13
  57. 57. What  the  client  does MyTable .META. TheRealMT -ROOT- ZooKeeper Row per META region Row per table region 52Monday, April 1, 13
  58. 58. HBase  and  Hadoop ...  they  play  well  together 53Monday, April 1, 13
  59. 59. HBase  and  Hadoop • HBase  has  7ght  integra7on  with  Hadoop • Two  separate  concerns • Access  via  MapReduce • Storage  on  HDFS 54Monday, April 1, 13
  60. 60. Parallel  processing 55Monday, April 1, 13
  61. 61. MapReduce 56Monday, April 1, 13
  62. 62. HBase  and  MapReduce • HBase  is  7ghtly  integrated  with  the  Hadoop  stack • MapReduce • HDFS • Source  for  MapReduce  jobs • Read  from  an  HBase  Table  with  a  Map-­‐Reduce  Job  (Export,  Row  Count,  ...) • Sink  for  MapReduce  jobs • Wri7ng  to  an  HBase  Table  from  a  Map-­‐Reduce  Job • Lookups  during  MapReduce  jobs • Join  between  hbase  and/or  another  source 57Monday, April 1, 13
  63. 63. HBase  as  MR  source • One  mapper  per  region Map Tasks - one per region. The tasks take the key range of the region as their input split and scan over it Regions being Region Region …. Region served by the Region Server Server Server Server 58Monday, April 1, 13
  64. 64. HBase  as  MR  source  (con7nued) • Mapper protected void map(ImmutableBytesWritable  rowkey,  Result  result, Context context) { // mapper logic goes here } • Job  setup Scan scan = new Scan(); scan.addColumn(Bytes.toBytes("twits"), Bytes.toBytes("twit")); TableMapReduceUtil.initTableMapperJob("twits", scan, Map.class, ImmutableBytesWritable.class, Result.class, job); 59Monday, April 1, 13
  65. 65. HBase  as  MR  sink • Write  to  regions  via  MR  job Reduce tasks writing to HBase regions. Reduce tasks dont necessarily write to a region on the same physical host. They will write to whichever region contains the key range that they are writing into. This could potentially mean that all reduce tasks talk to all regions on the cluster Region Region …. Region Regions being served by the Server Server Server Region Server 60Monday, April 1, 13
  66. 66. HBase  as  MR  sink  (con7nued) • Reducer protected void reduce(ImmutableBytesWritable  rowkey, Iterable<Put> values, Context context) { // reduce logic goes here } • Job  setup TableMapReduceUtil.initTableReducerJob("users", Reducer.class, job); 61Monday, April 1, 13
  67. 67. HBase  as  a  shared  resource • Lookups  from  MR  jobs • Map-­‐side  joins • Simple  Get/Scan  calls HBase Regions being served by the Region Server Region Region Region Server Server Server Map Tasks reading files from HDFS as their source and doing random gets or short scans Datanode Datanode …. Datanode from HBase 62Monday, April 1, 13
  68. 68. Integra7on  with  HDFS • HDFS  is  a  filesystem • Underlying  storage  fabric  for  HBase • HBase  7ghtly  integrated  with  HDFS  and  leverages  the  seman7cs • LSM  trees  fit  well  into  a  write  once  filesystem • Availability  (fault  tolerance)  and  reliability  (durability)  at  scale 63Monday, April 1, 13
  69. 69. Availability  and  fault  tolerance Disaster strikes on the host which was serving region R1 Physical host Physical host Physical host Physical host Region Server Region Server Region Server Region Server Serving region R1 HFile1 HFile1 HFile1 Datanode Datanode Datanode Datanode HDFS Another server picks up the region and starts serving it from the persisted HFile from HDFS. The lost copy of the HFile will get replicated to another Datanode Physical host Physical host Physical host Physical host Region Server Region Server Region Server Region Server Serving region R1 HFile1 HFile1 HFile1 Datanode Datanode Datanode Datanode HDFS 64Monday, April 1, 13
  70. 70. Durability • WAL  keeps  writes  intact  even  if  RS  fails • HDFS  is  reliable  and  doesn’t  lose  data  if  individual  nodes  fail 65Monday, April 1, 13
  71. 71. Summary • Big  data  =  new  products,  applica7ons • Hadoop  ecosystem  -­‐  poster  child • HBase  serves  many  use  cases • Allows  to  solve  aspects  of  big  data  problems • Simple  Java  API • Tight  Hadoop  integra7on • Low  latency  access  to  large  amounts  of  data • Large  scale  computa7on  using  MR 66Monday, April 1, 13
  72. 72. 67Monday, April 1, 13
  73. 73. Schema  design ...  it’s  a  database  aper-­‐all 68Monday, April 1, 13
  74. 74. But  isn’t  HBase  schema-­‐less? • Number  of  tables • Rowkey  design   • Number  of  column  families  per  table.  What  goes  into  what   column  family • Column  qualifier  names • What  goes  into  the  cells • Number  of  versions 69Monday, April 1, 13
  75. 75. Rowkeys • Rowkey  design  is  the  single  most  important  aspect  of  HBase   table  designs • The  only  way  to  address  rows  in  HBase 70Monday, April 1, 13
  76. 76. TwitBase  rela7onships • Users  follow  users • Rela2onships  need  to  be  persisted  for  usage  later  on • Model  tables  for  the  expected  access  paSerns • Read  paSern • Who  does  A  follow? • Who  follows  A? • Does  A  follow  B? • Write  paSern • A  follows  B • A  unfollows  B 71Monday, April 1, 13
  77. 77. Start  simple • Adjacency  list Column Family : follows row key: userid column qualifier: followed user number cell value: followed userid Cell value Col Qualifier follows TheFakeMT 1:TheRealMT 2:MTFanBoy 3:Olivia 4:HRogers TheRealMT 1:HRogers 2:Olivia 72Monday, April 1, 13
  78. 78. Op7mizing  the  adjacency  list • We  need  a  count • Where  does  a  new  followed  user  go? follows TheFakeMT 1:TheRealMT 2:MTFanBoy 3:Olivia 4:HRogers count:4 TheRealMT 1:HRogers 2:Olivia count:2 73Monday, April 1, 13
  79. 79. Adding  a  new  user Row that needs to be updated follows TheFakeMT 1:TheRealMT 2:MTFanBoy 3:Olivia 4:HRogers count:4 TheRealMT 1:HRogers 2:Olivia count:2 1 TheFakeMT : follows: {count -> 4} 2 increment count Client code: TheFakeMT : follows: {count -> 5} Step 1: Get current count Step 2: Update count Step 3: Add new entry 3 add new entry Step 4: Write the new data to HBase TheFakeMT : follows: {5 -> MTFanBoy2, count -> 5} 4 follows TheFakeMT 1:TheRealMT 2:MTFanBoy 3:Olivia 4:HRogers 5:MTFanBoy2 count:5 TheRealMT 1:HRogers 2:Olivia count:2 74Monday, April 1, 13
  80. 80. Transac7ons  ==  not  good • HBase  doesn’t  have  na7ve  support  (think  scale) • Don’t  want  to  complicate  client  side  logic • Only  solu7on  -­‐>  simplify  schema follows TheFakeMT TheRealMT:1 MTFanBoy:1 Olivia:1 HRogers:1 TheRealMT HRogers:1 Olivia:1 75Monday, April 1, 13
  81. 81. Revisit  the  ques7ons • Read  paGern • Who  all  does  A  follow? • Who  all  follows  A? • Does  A  follow  B? • Write  paGern • A  follows  B • A  unfollows  B 76Monday, April 1, 13
  82. 82. Revisit  the  ques7ons 76Monday, April 1, 13
  83. 83. Denormaliza7on • Second  table  for  reverse  rela7onship • Otherwise  scan  across  en7re  table  and  affect  read  performance Normalization Dreamland Write performance Poor design Denormalization Read performance 77Monday, April 1, 13
  84. 84. More  op7miza7ons • Convert  into  tall-­‐narrow  table • Leverage  rowkey  indexing  beGer • Gets  -­‐>  short  Scans Keeping the column family and column qualifier names short reduces the data transferred over the network back to the client. The KeyValue objects become smaller. CF : f The + in the row key refers to concatenating row key: CQ: followed users name the two values. You could delimitate follower + followed using any character you like. eg: A-B or A,B cell value: 1 78Monday, April 1, 13
  85. 85. Tall-­‐narrow  table  example • Denormaliza7on  is  the  way  to  go f Putting the user name in the column qualifier saves you from looking up TheFakeMT+TheRealMT Mark Twain:1 the users table for the name of the user given an id. You can simply TheFakeMT+MTFanBoy Amandeep Khurana:1 list out names or ids while looking TheFakeMT+Olivia Olivia Clemens:1 at relationships just from this table. The downside of this is that you need TheFakeMT+HRogers Henry Rogers:1 to update the name in all the cells TheRealMT+Olivia Olivia Clemens:1 if the user updates their name in their profile. TheRealMT+HRogers Henry Rogers:1 This is classic Denormalization. 79Monday, April 1, 13
  86. 86. Uniform  rowkey  length • MD5  the  userids  -­‐>  16  bytes  +  16  bytes  rowkeys • BeGer  distribu7on  of  load CF : f Using MD5 of the user ids gives you row key: CQ: followed userid fixed lengths instead of variable md5(follower)md5(followed) length user ids. You dont need concatenation logic anymore. cell value: followed users name 80Monday, April 1, 13
  87. 87. Uniform  rowkey  length  (con7nued) f MD5(TheFakeMT) MD5(TheRealMT) TheRealMT:Mark Twain MD5(TheFakeMT) MD5(MTFanBoy) MTFanBoy:Amandeep Khurana MD5(TheFakeMT) MD5(Olivia) Olivia:Olivia Clemens MD5(TheFakeMT) MD5(HRogers) HRogers:Henry Rogers MD5(TheRealMT) MD5(Olivia) Olivia:Olivia Clemens MD5(TheRealMT) MD5(HRogers) HRogers:Henry Rogers 81Monday, April 1, 13
  88. 88. Tall  v/s  Wide  tables  storage  footprint Logical representation of an HBase table. Actual physical storage of the table Well look at what it means to Get() row r5 from this table. CF1 CF2 HFile for CF1 HFile for CF2 r1 c1:v1 c1:v9 c6:v2 r1:CF1:c1:t1:v1 r2 c1:v2 c3:v6 r2:CF1:c1:t2:v2 r1:CF2:c1:t1:v9 r2:CF1:c3:t3:v6 r1:CF2:c6:t4:v2 r3 c2:v3 c5:v6 r3:CF1:c2:t1:v3 r3:CF2:c5:t4:v6 r4:CF1:c2:t1:v4 r5:CF2:c7:t3:v8 r4 c2:v4 r5:CF1:c1:t2:v1 r5:CF1:c3:t3:v5 r5 c1:v1 c3:v5 c7:v8 Result object returned for a Get() on row r5 r5:CF1:c1:t2:v1 r5:CF1:c3:t3:v5 KeyValue objects r5:cf2:c7:t3:v8 Key Value Row Col Col Time Cell Key Fam Qual Stamp Value Structure of a KeyValue object 82Monday, April 1, 13
  89. 89. Rowkey  design • Single  most  important  aspect  of  designing  tables • Depends  on  expected  access  paGerns • HFiles  are  sorted  on  Key  part  of  KeyValue  objects "TheRealMT" , "info" , "email" , 1329088321289, "samuel@clemens.org" "TheRealMT" , "info" , "name" , 1329088321289 , "Mark Twain" "TheRealMT" , "info" , "password" , 1329088818321 , "abc123", "TheRealMT" , "info" , "password" , 1329088321289 , "Langhorne" HFile for the info column family in the users table 83Monday, April 1, 13
  90. 90. Write  op7mized • Distribute  writes  across  the  cluster • Issue  most  pronounced  with  7me  series  data • Hashing hash("TheRealMT") -> random byte[] • Sal7ng int salt = new Integer(new Long(timestamp).hashCode()).shortValue() % <number of region servers>; byte[] rowkey = Bytes.add(Bytes.toBytes(salt) + Bytes.toBytes("|") + Bytes.toBytes(timestamp)); 84Monday, April 1, 13
  91. 91. Read  op7mized • Data  to  be  accessed  together  should  be  stored  together • eg:  twit  streams  -­‐  last  10  twits  by  the  users  I  follow Olivia1 1Olivia Olivia2 1TheRealMT Olivia5 2Olivia Olivia7 2TheFakeMT Olivia9 2TheRealMT TheFakeMT2 3TheFakeMT TheFakeMT3 4TheFakeMT TheFakeMT4 5Olivia TheFakeMT5 5TheFakeMT TheFakeMT6 5TheRealMT TheRealMT1 6TheFakeMT TheRealMT2 7Olivia TheRealMT5 8TheRealMT TheRealMT8 9Olivia 85Monday, April 1, 13
  92. 92. Rela7onal  to  Non  rela7onal • Rela7onal  concepts • En77es • AGributes • Rela7onships • En77es • Table  is  a  table.  Not  much  going  on  there • Users  table  contains...  users.  Those  are  en77es • Good  place  to  start.  Denormaliza7on  bends  that  rule  a  bit 86Monday, April 1, 13
  93. 93. Rela7onal  to  Non  rela7onal  (con7nued) • AGributes • Iden7fying • Primary  keys.  Compound  keys • Maps  to  rowkeys • Non-­‐iden7fying • Other  columns • Maps  to  column  qualifiers  and  cells • Rela7onships 87Monday, April 1, 13
  94. 94. Nested  En77es • Column  Qualifiers  can  contain  data  instead  of  just  a  column   name hbase table row key column family fixed qualifier → timestamp → value Nested entities repeating entity variable qualifier → timestamp → value 88Monday, April 1, 13
  95. 95. Schema  design  summary • Schema  can  make  or  break  the  performance  you  get • Rowkey  is  the  single  most  important  thing • Use  tricks  like  hashing  and  sal2ng • Denormalize  to  your  advantage • There  are  no  joins • Isolate  access  paSerns • Separate  CFs  or  even  separate  tables • Shorter  names  -­‐>  lower  storage  footprint • Column  qualifiers  can  be  used  to  store  data  and  not  just  column  names • Big  difference  as  compared  to  RDBMS 89Monday, April 1, 13
  96. 96. Deployments  and  Opera2ons ...  for  some  real  stuff,  beyond  your  laptop 90Monday, April 1, 13
  97. 97. Components • HDFS  (DataNodes) • Storage • ZooKeeper • Membership  management • RegionServers • Serve  the  regions • HBase  Masters • Janitorial  work 91Monday, April 1, 13
  98. 98. Hardware • DataNodes  and  RegionServers  collocated • Plenty  of  RAM  and  Disk • 8-­‐12  cores • 48  GB  RAM • 12  x  1  TB  disks 92Monday, April 1, 13
  99. 99. Hardware  (con7nued) • HBase  Master  and  ZooKeeper  collocated • Don’t  necessarily  need  to  be  though • Keep  odd  number  of  ZKs • ~3  is  good  enough  for  upto  50-­‐70  node  cluster  typically • Give  ZK  independent  spindle  for  log  wri7ng • I/O  latency  sensi7ve • 4-­‐6  cores • 24  GB  RAM 93Monday, April 1, 13
  100. 100. Types  of  deployments • Standalone  for  dev  purpose • Prototype • <  10  nodes • No  real  SLAs • 1  ZK  +  Master,  rest  are  RS • Small  produc2on  cluster • 10-­‐20  nodes • 1  ZK  +  Master,  maybe  NNHA,  rest  are  RS • Medium  produc2on  cluster • 20-­‐50  nodes • 3  ZK  +  Master,  NNHA,  rest  are  RS • Large  produc2on  cluster • 50+  nodes • 3  or  5  ZK  +  Master,  NNHA,  rest  are  RS • Hardware  choices  remain  the  same  mostly 94Monday, April 1, 13
  101. 101. Deploying  sopware  and  configura7on • Automated  sopware  bits  deployment  highly  recommended • Chef • Puppet • Cloudera  Manager • Centralized  configura7on  management  highly  recommended • Consistency • Ease  of  management 95Monday, April 1, 13
  102. 102. Configura7ons • $HBASE_HOME/conf • hbase-­‐env.sh • Environment  configura2ons • hbase-­‐site.xml • HBase  system  configura2ons • Linux  configura2ons • Number  of  open  files • swappiness • HDFS  configura2ons • $HADOOP_HOME/hdfs-­‐site.xml 96Monday, April 1, 13
  103. 103. Monitoring • Metrics  context  from  Hadoop • Ganglia • File  context • JMX • Cac7 • Several  metrics  available • Write  path  specific • Read  path  specific 97Monday, April 1, 13
  104. 104. Monitoring  (con7nued) 98Monday, April 1, 13
  105. 105. Performance  tuning • Performance  tes2ng • Use  real  (or  synthe2c)  workloads • YCSB • Tuning • Write  op2mized • Memstore • Random-­‐read  op2mized • Block  cache • Smaller  block  size • Sequen2al-­‐read  op2mized • Block  cache • Higher  block  size • GC  tuning 99Monday, April 1, 13
  106. 106. 100Monday, April 1, 13

×