Open Sourcing Tasmo


Tasmo is an open source, high-performance, easy-to-use system for storing and retrieving data objects in HBase. It lets developers model application data using a simple system of Events and Materialized Views, freeing them from having to handle complex join and filter logic. It is highly optimized for read performance: each Materialized View is served from a single HBase row.

Tasmo attempts to combine the scale, speed and fault-tolerance of a Big Data architecture with the developer productivity of a traditional database.

Jive is the world's #1 social business platform. We help employees, customers and partners connect, collaborate and communicate to achieve breakthrough results in sales, marketing, customer service and workforce productivity.


  2. Tasmo: Materialized Views of Event Streams using HBase
     Presenters: Pete Matern, Jonathan Colt
  3. What’s the problem
     • Joining to death at read time
     • With our operational constraints of a single point of failure (single db instance)
     • Can only scale up - not out
     • Read load far exceeds write load
     • Read every field of an object every time any field changed to support indexing
     • Read every field of an object to update one
     © Jive confidential
  4. What we needed
     • Joins performed at write time (materialized views)
     • Horizontally scalable
     • No single point of failure
     • Incremental updates
     • Notification of changes
     • Idempotency
     • Tolerance of duplicate and out of order input
     • Front end developers work against their object model rather than HBase specific constructs.
  5. What we built: Tasmo
     Stateless HA service which
     • Maintains materialized views of data
     • Consumes our model (declaration of input and output types)
     • Notifies consumers when views change
     • Replaces all our relational db usage
  6. How we consume and render our model
     • Every reader of our model defines views for Tasmo to maintain
     • Views contain joined/filtered data specific to point of use
     • Readers of these views render output or further process the data
     [Diagram labels: events, Tasmo, HBase, Readers, View definition, Views; arrows: read / write, read views]
  7. How we declare our input and output (Model)
     Event Declarations
     Type: Content
       ● Subject: String
       ● Body: String
       ● Container: Reference
       ● Author: Reference
     Type: User
       ● Username: String
       ● First Name: String
       ● Last Name: String
       ● Creation Date: Long
     View Declaration
     Type: Content
       ● Subject
       ● Container (Type: Folder)
         ○ Name
         ○ ModDate
       ● Author (Type: User)
         ○ Username
         ○ CreationDate
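The declarations on this slide can be pictured as plain data. The following is a minimal sketch only: the slides do not show Tasmo's actual model format, so `EVENT_TYPES`, `ViewNode`, `CONTENT_VIEW` and `paths` are hypothetical names, and the camel-case field spellings are an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical representation of the event declarations on the slide.
EVENT_TYPES = {
    "Content": {"subject": "String", "body": "String",
                "container": "Reference", "author": "Reference"},
    "User": {"username": "String", "firstName": "String",
             "lastName": "String", "creationDate": "Long"},
}


@dataclass
class ViewNode:
    """One level of a view: the fields to select and the references to follow."""
    type_name: str
    fields: list
    refs: dict = field(default_factory=dict)  # reference field -> child ViewNode


# The view declaration from the slide: Content joined to its Folder and User.
CONTENT_VIEW = ViewNode(
    "Content", ["subject"],
    {"container": ViewNode("Folder", ["name", "modDate"]),
     "author": ViewNode("User", ["username", "creationDate"])},
)


def paths(node, prefix=""):
    """List every field path the view selects, following references in order."""
    out = [f"{prefix}{name}" for name in node.fields]
    for ref, child in node.refs.items():
        out += paths(child, f"{prefix}{ref}.")
    return out
```

Calling `paths(CONTENT_VIEW)` yields the five joined field paths that a single view instance would carry, which is the join a reader would otherwise have to perform at read time.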
  8. Event > Model > View > Web Page
     [Diagram labels: Comment Event (body = “When can we try it?”), Tasmo, HBase, Model (Container, Content, Author, Comment), View]
  9. Web Page backed by View Instance
  10. How we notify consumers
     • Consumers register for notifications on a type of view
     • Applying an event to the model in Tasmo results in the set of affected view instances.
     • We push the modified view instances to registered consumers
     [Diagram labels: events, Tasmo, notify, Search, Binary storage, Activity, Analysis]
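The register-then-push flow described on this slide might look roughly like the sketch below. It is an illustration under assumptions, not Tasmo's API: `ViewNotifier`, `register` and `publish` are hypothetical names, and consumers are modeled as plain callbacks.

```python
from collections import defaultdict


class ViewNotifier:
    """Hypothetical registry that maps a view type to its consumers."""

    def __init__(self):
        self._consumers = defaultdict(list)

    def register(self, view_type, callback):
        """A consumer subscribes to every instance of one view type."""
        self._consumers[view_type].append(callback)

    def publish(self, affected):
        """Push each affected (view_type, view_id) pair to its consumers.

        `affected` stands for the set of view instances that applying
        one event produced. Returns the number of deliveries made.
        """
        delivered = 0
        for view_type, view_id in affected:
            for callback in self._consumers.get(view_type, []):
                callback(view_type, view_id)
                delivered += 1
        return delivered
```

A search indexer, for example, would register once for its index view type and then receive only the instances that a given event actually changed.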
  11. How we maintain search indices
     • Define views of data which correspond to the index schemas
     • Indexing engine registers for notifications of these view types
     • Tasmo fires notifications for affected view instances per event
     • Indexing engine reads the modified views, which represent complete and up to date documents for indexing.
     [Diagram labels: events, Tasmo, HBase, Search, index; arrows: notify, read / write, read views]
  12. 10,000 feet: how it works
     Consumes events, consults configuration describing joins and selects, applies all relevant changes in event to update data views
     [Diagram labels: Write events, Tasmo, Values, Existence, Relationships, Views, Traverse, Join / Select, writes, scans, updates / removes, concurrency consistency retry (multiversion concurrency)]
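To make the write-time join concrete, here is a small in-memory sketch of the idea: store the event's values, record relationships, then rebuild every view instance the event touches. Everything in it is an assumption for illustration. `TinyTasmo` and its dictionaries stand in for HBase tables, only one Content-to-author join is modeled, and relationship removal is left out.

```python
class TinyTasmo:
    """In-memory stand-in for the event -> values/relationships -> views flow."""

    def __init__(self):
        self.values = {}    # object id -> {field: value}
        self.authored = {}  # user id -> set of content ids (back-references)
        self.views = {}     # content id -> materialized view "row"

    def apply(self, event):
        """Apply one event and return the ids of the affected view instances."""
        obj_id, fields = event["id"], event["fields"]
        self.values.setdefault(obj_id, {}).update(fields)
        if event["type"] == "Content":
            author = fields.get("author")
            if author is not None:
                self.authored.setdefault(author, set()).add(obj_id)
            affected = {obj_id}
        else:
            # A User event touches every view that joins through that user.
            affected = set(self.authored.get(obj_id, ()))
        for content_id in affected:
            self._rebuild(content_id)
        return affected

    def _rebuild(self, content_id):
        """Perform the join now, so that a read is a single lookup."""
        content = self.values[content_id]
        author = self.values.get(content.get("author"), {})
        self.views[content_id] = {
            "subject": content.get("subject"),
            "author.username": author.get("username"),
        }
```

The set returned by `apply` corresponds to the "affected view instances" that the notification slides refer to.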
  13. Taking over time
     • Snowflake id for every event - makes them unique and time orderable
     • Event time is based on when the system receives an event
     • Event time is used as the HBase cell timestamp - logically stale writes are no-ops
     • Event time has the room to disambiguate add vs remove:
       o Snowflake ids are even numbers.
       o The Snowflake id is used directly for adds
       o Snowflake id - 1 is used for removes
       o For a given event - adds trump removes
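The timestamp scheme on this slide can be sketched without HBase. The code below assumes, as the slide states, that event ids are even; `VersionedCell`, `add_timestamp` and `remove_timestamp` are hypothetical names, and the cell only mimics "highest timestamp wins" rather than calling any HBase API.

```python
def add_timestamp(event_id):
    """Adds use the (even) event id directly as the cell timestamp."""
    if event_id % 2 != 0:
        raise ValueError("event ids are assumed to be even")
    return event_id


def remove_timestamp(event_id):
    """Removes use event id - 1, so an add from the same event outranks them."""
    if event_id % 2 != 0:
        raise ValueError("event ids are assumed to be even")
    return event_id - 1


class VersionedCell:
    """Mimics one cell in which the write with the highest timestamp wins."""

    def __init__(self):
        self.timestamp = -1
        self.present = False

    def write(self, timestamp, present):
        """Apply a write; return False if it is logically stale (a no-op)."""
        if timestamp < self.timestamp:
            return False
        self.timestamp = timestamp
        self.present = present
        return True
```

Because adds land on even timestamps and removes on odd ones, the two never collide, and replaying a duplicate or late event leaves the cell unchanged.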
  14. Concurrency Issues
     • Problem: As different events add/remove relationships in parallel, we can fail to add/remove elements of views.
     • Solution: Per relationship high water marks maintained in an HBase table. We test the per relationship times we saw during a path traversal against the high water mark. If we detect we are stale, we retry the operation.
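A rough sketch of the stale-check-and-retry loop follows. The slide does not say at which point the check runs relative to the view write, so checking before writing is an assumption here, as are the names `update_view_with_retry`, `read_relationship`, `high_water` and `write_view`.

```python
def update_view_with_retry(path, read_relationship, high_water, write_view,
                           max_attempts=5):
    """Traverse `path`, then write the view only if no relationship was stale.

    read_relationship(rel) -> (target, timestamp seen during the traversal)
    high_water: dict of relationship -> newest timestamp known for it
    Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        seen = {}
        targets = []
        for rel in path:
            target, timestamp = read_relationship(rel)
            seen[rel] = timestamp
            targets.append(target)
        # Stale if any high water mark is newer than what the traversal saw.
        stale = any(high_water.get(rel, ts) > ts for rel, ts in seen.items())
        if not stale:
            write_view(targets)
            return attempt
    raise RuntimeError("traversal was still stale after retries")
```

In this sketch a traversal that raced with a concurrent relationship change is simply repeated until what it read is at least as new as the recorded high water marks.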
  15. Why HBase?
     • Timestamp control
     • Row level atomicity of changes
     • Performance and proven scalability
  16. Roadmap
     • Production later this year. Currently heavily used by developers at Jive.
     • Looking at what work could be moved into coprocessors.
     • Considering double writes into two HBase clusters for higher availability if MTTR is too high in our environment.
  17. Questions and Answers
     Open source
     Please Help!