HBase In Action - Chapter 04: HBase table design

Learning HBase, Real-time Access to Your Big Data, Data Manipulation at Scale, Big Data, Text Mining, HBase, Deploying HBase

Published in: Education

  1. CHAPTER 04: HBASE TABLE DESIGN. HBase in Action, by Nick Dimiduk et al.
  2. Overview: HBase table design
     - HBase schema design concepts
     - Mapping relational modeling knowledge to the HBase world
     - Advanced table definition parameters
     - HBase Filters to optimize read performance
  3. 4.1 How to approach schema design
     When we say schema, we include the following considerations:
     - How many column families should the table have?
     - What data goes into what column family?
     - How many columns should be in each column family?
     - What should the column names be?
     - What information should go into the cells?
     - How many versions should be stored for each cell?
     - What should the rowkey structure be, and what should it contain?
  4. HBase Course
     - Data Manipulation at Scale: Systems and Algorithms
     - Using HBase for Real-time Access to Your Big Data
  5. 4.1.1 Modeling for the questions
     A table that stores the users a particular user follows must support two operations:
     - Reading the entire list of followed users
     - Querying for the presence of a specific user in that list
  6. 4.1.1 Modeling for the questions (cont.)
  7. 4.1.1 Modeling for the questions (cont.)
     Thinking further along those lines, you can come up with the following questions:
     1. Whom does TheFakeMT follow?
     2. Does TheFakeMT follow TheRealMT?
     3. Who follows TheFakeMT?
     4. Does TheRealMT follow TheFakeMT?
  8. 4.1.2 Defining requirements: more work up front always pays
     From the perspective of TwitBase, you expect data to be written to HBase when the following things happen:
     - A user follows someone
     - A user unfollows someone they were following
  9. 4.1.2 Defining requirements: more work up front always pays (cont.)
  10. 4.1.2 Defining requirements: more work up front always pays (cont.)
      - How does designing tables in HBase differ from designing tables in relational systems?
  11. 4.1.3 Modeling for even distribution of data and load
  12. 4.1.3 Modeling for even distribution of data and load (cont.)
  13. 4.1.4 Targeted data access
      - Only the keys are indexed in HBase tables.
      - There are two ways to retrieve data from a table: Get and Scan.
      - HBase tables are flexible, and you can store anything in the form of byte[].
      - Store everything with similar access patterns in the same column family.
      - Indexing is done on the Key portion of the KeyValue objects, consisting of the rowkey, qualifier, and timestamp in that order.
      - Tall tables can potentially allow you to move toward O(1) operations, but you trade atomicity.
  14. 4.1.4 Targeted data access (cont.)
      - De-normalizing is the way to go when designing HBase schemas.
      - Think how you can accomplish your access patterns in single API calls rather than multiple API calls.
      - Hashing allows for fixed-length keys and better distribution but takes away ordering.
      - Column qualifiers can be used to store data, just like cells.
      - The length of column qualifiers impacts the storage footprint because you can put data in them.
      - The length of the column family name impacts the size of data sent over the wire to the client (in KeyValue objects).
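The hashing point on the slide above can be illustrated with a short Python sketch. It is not taken from the book: the helper name `follows_rowkey` and the choice of MD5 are assumptions made for illustration. It concatenates the MD5 digests of the follower and followed user IDs, so every rowkey is exactly 32 bytes no matter how long the IDs are.

```python
import hashlib


def follows_rowkey(follower: str, followed: str) -> bytes:
    """Hypothetical helper: build a fixed-length rowkey from two user IDs."""
    follower_hash = hashlib.md5(follower.encode("utf-8")).digest()  # 16 bytes
    followed_hash = hashlib.md5(followed.encode("utf-8")).digest()  # 16 bytes
    return follower_hash + followed_hash


key = follows_rowkey("TheFakeMT", "TheRealMT")
print(len(key))  # 32

# All rows for one follower share the same 16-byte prefix, so a prefix scan
# still finds them, but the rows are no longer ordered by the original IDs.
prefix = hashlib.md5("TheFakeMT".encode("utf-8")).digest()
print(key.startswith(prefix))  # True
```

The trade-off named on the slide shows up directly: keys are uniform in length and spread evenly, but the original user IDs cannot be recovered from the key or scanned in their natural order.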
  15. 4.2 De-normalization is the word in HBase land
      - One of the key concepts when designing HBase tables is de-normalization.
  16. 4.3 Heterogeneous data in the same table
      - HBase schemas are flexible, and you’ll use that flexibility now to avoid doing scans every time you want a list of followers for a given user.
      - Isolate different access patterns as much as possible.
      - The way to improve the load distribution in this case is to have separate tables for the two types of relationships you want to store.
  17. 4.4 Rowkey design strategies
      - In designing HBase tables, the rowkey is the single most important thing.
      - Your rowkeys determine the performance you get while interacting with HBase tables.
      - Unlike relational databases, where you can index on multiple columns, HBase indexes only on the key.
  18. 4.5 I/O considerations
      - The sorted nature of HBase tables can turn out to be a great thing for your application, or not.
      - Optimized for writes
        - HASHING
        - SALTING
      - Optimized for reads
      - Cardinality and rowkey structure
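Salting, listed above under write optimization, can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's code: `NUM_BUCKETS`, `salted_key`, and `scan_prefixes` are hypothetical names, and CRC32 is just one possible way to derive a deterministic salt. The idea is to prefix each rowkey with a small bucket number so that sequential keys (timestamps, for example) do not all land in the same region.

```python
import zlib

NUM_BUCKETS = 4  # assumed number of salt buckets; must be <= 256 here


def salted_key(rowkey: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Hypothetical helper: prepend a one-byte salt derived from the key."""
    salt = zlib.crc32(rowkey) % buckets
    return bytes([salt]) + rowkey


def scan_prefixes(buckets: int = NUM_BUCKETS) -> list:
    """Prefixes a reader must scan to cover every salt bucket."""
    return [bytes([b]) for b in range(buckets)]


for ts in (b"1329088818321", b"1329088818322", b"1329088818323"):
    print(salted_key(ts))

print(scan_prefixes())
```

Because the salt is computed from the key itself, a Get for a known key can recompute it. A range read, however, has to issue one scan per bucket and merge the results, which is the read-side cost of this write-side optimization.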
  19. 4.6 From relational to non-relational
      - There is no simple way to map your relational database knowledge to HBase. It’s a different paradigm of thinking.
      - Things don’t necessarily map 1:1, and these concepts are evolving and being defined as the adoption of NoSQL systems increases.
  20. 4.6.1 Some basic concepts
      - ENTITIES: These map to tables.
        - In both relational databases and HBase, the default container for an entity is a table, and each row in the table should represent one instance of that entity.
      - ATTRIBUTES: These map to columns.
        - Identifying attribute: This is the attribute that uniquely identifies exactly one instance of an entity (that is, one row).
        - Non-identifying attribute: Non-identifying attributes are easier to map.
  21. 4.6.1 Some basic concepts (cont.)
      - RELATIONSHIPS: These map to foreign-key relationships.
        - There is no direct mapping of these in HBase, and often it comes down to denormalizing the data.
        - HBase, not having any built-in joins or constraints, has little use for explicit relationships.
  22. 4.6.2 Nested entities
      - In HBase, the columns (also known as column qualifiers) aren’t predefined at design time.
  23. 4.6.2 Nested entities (cont.)
      - A parent entity and its nested entities can be modeled in HBase as a single row.
      - There are some limitations to this:
        - The technique only works one level deep: your nested entities can’t themselves have nested entities.
        - It’s not as efficient to access an individual value stored as a nested column qualifier inside a row.
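The single-row modeling of nested entities can be pictured with plain Python dicts. This is only a stand-in for one HBase row, not client API code: the layout `{family: {qualifier: value}}`, the family names `info` and `follows`, the helper names, and the sample values are all assumptions for illustration. Each followed user becomes one column qualifier in the parent user's row.

```python
# A plain dict stands in for one HBase row: {family: {qualifier: value}}.
# Family names and values below are illustrative assumptions.
row = {
    "info": {"name": "example display name"},
    "follows": {},  # one qualifier per nested entity (followed user)
}


def add_followed(row: dict, user_id: str, value: str) -> None:
    """Store a nested entity as a column qualifier in the parent row."""
    row["follows"][user_id] = value


def does_follow(row: dict, user_id: str) -> bool:
    """Check for a nested entity by looking up its qualifier."""
    return user_id in row["follows"]


add_followed(row, "TheRealMT", "followed user details")
print(does_follow(row, "TheRealMT"))   # True
print(does_follow(row, "SomeoneElse")) # False
print(sorted(row["follows"]))          # ['TheRealMT']
```

The one-level limit is visible in the shape of the structure: a qualifier maps to a single value, so a nested entity has nowhere to hold nested entities of its own unless they are serialized into that value.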
  24. 4.6.3 Some things don’t map
      - COLUMN FAMILIES
      - (LACK OF) INDEXES
      - VERSIONING
  25. 4.7 Advanced column family configurations
      HBase has a few advanced features that you can use when designing your tables.
      - Configurable block size
        hbase(main):002:0> create 'mytable', {NAME => 'colfam1', BLOCKSIZE => '65536'}
      - Block cache
        hbase(main):002:0> create 'mytable', {NAME => 'colfam1', BLOCKCACHE => 'false'}
      - Aggressive caching
        hbase(main):002:0> create 'mytable', {NAME => 'colfam1', IN_MEMORY => 'true'}
  26. 4.7 Advanced column family configurations (cont.)
      - Bloom filters
        hbase(main):007:0> create 'mytable', {NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}
        The default value for the BLOOMFILTER parameter is NONE.
      - TTL
        hbase(main):002:0> create 'mytable', {NAME => 'colfam1', TTL => '18000'}
      - Compression
        hbase(main):002:0> create 'mytable', {NAME => 'colfam1', COMPRESSION => 'SNAPPY'}
      - Cell versioning
        hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 1}
  27. 4.8 Filtering data
      - Filters are a powerful feature that can come in handy when you need to optimize read performance.
      - HBase provides an API you can use to implement custom filters.
  28. 4.8.1 Implementing a filter
      - Implement a custom filter by extending the FilterBase abstract class.
      - The filtering logic goes in the filterKeyValue(..) method.
      - To install custom filters, compile them into a JAR and put it on the HBase classpath so the RegionServers pick it up at startup time.
      - To compile the JAR, run the following in the top-level directory of the project:
        mvn install
        cp target/twitbase-1.0.0.jar /my/folder/
  29. 4.8.2 Prebundled filters
      - ROWFILTER
      - PREFIXFILTER
      - QUALIFIERFILTER
      - VALUEFILTER
      - TIMESTAMPFILTER
      - FILTERLIST
  31. 4.9 Summary
      - It’s about the questions, not the relationships.
      - Design is never finished.
      - Scale is a first-class entity.
      - Every dimension is an opportunity.
