HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimball, WibiData

  1. Living Data: Applying Adaptable Schemas to HBase. Aaron Kimball, CTO, WibiData, Inc.
  2. HBase is a nexus for your data
  3. HBase: Schema free (unfortunately)
     • Cells only hold byte arrays
     • Column names implicitly defined by apps
     • Each app must (de)serialize values correctly
     • Changing a schema requires rewriting a column and updating every reader/writer
  4. Datatypes can get rooted in place
  5. Avro: Flexible schemas
  6. Avro decouples schemas
  7. Every cell stores its schema (hash)
  8. Layout table stores common schemas
     <column>
       <name>info:email</name>
       <description>User email address</description>
       <schema>"string"</schema>
     </column>
     • Data dictionary provides reference to engineers on different projects
     • Common schemas used by tools that want to enforce a “default” schema for a column (e.g., Sqoop-based exports)
  9. Conclusions
     • Avro allows decoupled applications to:
       – Share the same data store
       – Change individual applications without downtime
       – Avoid structurally modifying existing data
     • Layout management allows:
       – Developers to communicate about data without using code
       – Data-agnostic applications to manipulate structured information
  10. / @wibidata Aaron Kimball –
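
The idea on slides 6 and 7 (each cell carries a hash of the schema it was written with, so readers and writers can evolve independently) can be sketched roughly as follows. This is an illustration under stated assumptions, not WibiData's implementation: `SCHEMA_TABLE`, `encode_cell`, `decode_cell`, and `read_as` are hypothetical names, JSON stands in for Avro's binary encoding, an MD5 digest of sorted-key JSON stands in for a real Avro schema fingerprint, and an in-memory dict stands in for an HBase-backed schema table. The `read_as` function mimics only one Avro resolution rule for records: fields the writer did not supply take the reader's default, and fields the reader does not declare are dropped.

```python
import hashlib
import json

# Hypothetical in-memory stand-in for a schema table: hash -> schema.
SCHEMA_TABLE = {}

HASH_LEN = 16  # length of an MD5 digest in bytes


def schema_hash(schema):
    """Return a 16-byte MD5 digest of the schema's sorted-key JSON form.

    Assumption: this is a simplification, not Avro's canonical fingerprint.
    """
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).digest()


def encode_cell(schema, value):
    """Register the writer schema and prefix the payload with its hash."""
    digest = schema_hash(schema)
    SCHEMA_TABLE[digest] = schema
    payload = json.dumps(value).encode("utf-8")  # stand-in for Avro binary
    return digest + payload


def decode_cell(cell):
    """Split a cell into (writer schema, value) using the hash prefix."""
    digest, payload = cell[:HASH_LEN], cell[HASH_LEN:]
    return SCHEMA_TABLE[digest], json.loads(payload.decode("utf-8"))


def read_as(cell, reader_schema):
    """Project a stored record onto the reader's fields, filling defaults."""
    _writer_schema, value = decode_cell(cell)
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in value:
            result[name] = value[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return result


# An older application writes with version 1 of the record schema.
USER_V1 = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "email", "type": "string"}],
}
# A newer application reads with version 2, which adds a defaulted field.
USER_V2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "email", "type": "string"},
        {"name": "verified", "type": "boolean", "default": False},
    ],
}

cell = encode_cell(USER_V1, {"email": "a@example.com"})
print(read_as(cell, USER_V2))  # {'email': 'a@example.com', 'verified': False}
```

Because the stored bytes are never rewritten, the newer reader can be deployed while the older writer keeps running, which is the "change individual applications without downtime" point from the conclusions slide.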