C* Summit 2013: No Whistling Required: Cabs, Cassandra, and Hailo by Dave Gardner

Hailo has leveraged Cassandra to build one of the most successful startups in European history. This presentation looks at how Hailo grew from a simple MySQL-backed infrastructure to a resilient Cassandra-backed system running in three data centers globally. Topics covered include: the process of migration, experience running multi-DC on AWS, common data modeling patterns and security implications for achieving PCI compliance.

Published in: Technology, Business

  1. Cassandra at Hailo
     David Gardner | Architect @ Hailo
     #CASSANDRA13 | Cassandra Summit 2013
  3. What is this talk about?
  6. 0.6 to 1.2
     •  1,352 changed files with 235,413 additions and 47,487 deletions
     •  7,429 commits
     •  1,653 tickets completed
     https://github.com/apache/cassandra/compare/cassandra-0.6.0...cassandra-1.2
     https://github.com/apache/cassandra/blob/trunk/CHANGES.txt
  7. What this talk is about
     Cassandra adoption at Hailo from three perspectives:
     1.  Development
     2.  Operational
     3.  Management
  8. What is Hailo?
     Hailo is the taxi app. Use Hailo to get a taxi wherever you are, whenever you want.
  10. What is Hailo?
      •  The world’s highest-rated taxi app, with over 7,000 five-star reviews
      •  Over 300,000 registered passengers
      •  A Hailo hail is accepted around the world every 5 seconds
      •  Hailo is growing (30%+) every month
      •  Became the largest taxi network in all of Ireland within two months of launch
  11. The history
      The story behind Cassandra adoption at Hailo
  12. Hailo launched in London in November 2011
      •  Launched on AWS
      •  Two PHP/MySQL web apps plus a Java backend
      •  Mostly built by a team of 3 or 4 backend engineers
      •  MySQL multi-master for single AZ resilience
  13. Why Cassandra?
      •  A desire for greater resilience – “become a utility”
         Cassandra is designed for high availability
      •  Plans for international expansion around a single consumer app
         Cassandra is good at global replication
      •  Expected growth
         Cassandra scales linearly for both reads and writes
      •  Prior experience
         I had experience with Cassandra and could recommend it
  14. The path to adoption
      •  Largely unilateral decision by developers – a result of a startup culture
      •  Replacement of key consumer app functionality, splitting up the PHP/MySQL web app into a mixture of global PHP/Java services backed by a Cassandra data store
      •  Launched into production in XYZ – originally just powering North American expansion, before gradually switching over Dublin and London
  15. Development perspective
  16. “Cassandra just works”
      Dom Wong, Senior Engineer
  17. Use cases
      1.  Entity storage
      2.  Time series data
  18. CF = customers
      126007613634425612:
        createdTimestamp: 1370465412
        email: dave@cruft.co
        givenName: Dave
        familyName: Gardner
        locale: en_GB
        phone: +447911111111
  19. Considerations for entity storage
      •  Do not read the entire entity, update one property and then write back a mutation containing every column
      •  Only mutate columns that have been set
      •  This avoids read-before-write race conditions
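
The entity storage advice on slide 19 can be illustrated with a minimal sketch. It uses a plain Python dictionary as a hypothetical stand-in for the customers column family, so `new_store`, `mutate` and the replacement values are assumptions for illustration, not part of any Cassandra client. Two writers read the same row; one changes the email and the other changes the locale.

```python
KEY = "126007613634425612"


def new_store():
    """Hypothetical in-memory stand-in for the customers column family."""
    return {KEY: {"email": "dave@cruft.co", "givenName": "Dave", "locale": "en_GB"}}


def mutate(store, row_key, columns):
    """Apply a mutation: only the named columns are written."""
    store.setdefault(row_key, {}).update(columns)


def full_write_back():
    """Both writers read the whole entity, then write every column back."""
    store = new_store()
    seen_by_a = dict(store[KEY])  # writer A reads the full entity
    seen_by_b = dict(store[KEY])  # writer B reads the same state
    seen_by_a["email"] = "dave@example.com"
    seen_by_b["locale"] = "en_US"
    mutate(store, KEY, seen_by_a)
    mutate(store, KEY, seen_by_b)  # also rewrites B's stale copy of the email
    return store[KEY]


def partial_mutation():
    """Each writer mutates only the column it actually changed."""
    store = new_store()
    mutate(store, KEY, {"email": "dave@example.com"})
    mutate(store, KEY, {"locale": "en_US"})
    return store[KEY]
```

With the full write-back, writer A's email change is silently overwritten by writer B's stale copy. With partial mutations, both changes survive because neither writer touches a column it did not set.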
  22. CF = comms
      2013-06-01:
        55374fa0-ce2b-11e2-8b8b-0800200c9a66: {"to":"dave@c…
        a48bd800-ce2b-11e2-8b8b-0800200c9a66: {"to":"foo@ex…
        b0e15850-ce2b-11e2-8b8b-0800200c9a66: {"to":"bar@ho…
        bfac6c80-ce2b-11e2-8b8b-0800200c9a66: {"to":"baz@fo…
  23. CF = comms
      dave@cruft.co:
        13b247f0-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…
        20f70a40-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…
        2b44d3b0-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…
        338a22f0-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…
  26. Considerations for time series storage
      •  Choose row key carefully, since this partitions the records
      •  Think about how many records you want in a single row
      •  Denormalise on write into many indexes
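
The layout on slides 22 and 23 (one row per day and one row per recipient, with time-based UUIDs as column names) can be sketched as follows. This is an in-memory illustration only: the `comms` dictionary, the helper names and the `subject` field are assumptions, and a real client would issue the two writes as a batch mutation instead.

```python
import json
import uuid
from collections import defaultdict
from datetime import datetime, timezone

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

# Hypothetical stand-in for the comms column family:
# row key -> {time-based UUID column name: JSON value}
comms = defaultdict(dict)


def uuid_day(column):
    """Derive the YYYY-MM-DD day bucket from a version 1 UUID's timestamp."""
    unix_seconds = (column.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).strftime("%Y-%m-%d")


def record_comm(to_address, subject):
    """Denormalise on write: store the same record under both row keys."""
    column = uuid.uuid1()
    value = json.dumps({"to": to_address, "subject": subject})
    for row_key in (uuid_day(column), to_address):
        comms[row_key][column] = value
    return column


def read_row(row_key):
    """Return a row's values ordered by the time embedded in each column name."""
    row = comms.get(row_key, {})
    return [row[c] for c in sorted(row, key=lambda c: c.time)]
```

The day bucket bounds how many records land in one row, and the per-recipient row answers "everything sent to this address" with a single row read. Both are paid for with an extra write per record.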
  27. Client libraries
      •  Astyanax (Java)
      •  phpcassa (PHP)
      •  github.com/carloscm/gossie (Go)
  28. Analytics
      •  With Cassandra we lost the ability to carry out analytics, e.g. COUNT, SUM, AVG, GROUP BY
      •  We use Acunu Analytics to give us this ability in real time, for pre-planned query templates
      •  It is backed by Cassandra and therefore highly available, resilient and globally distributed
      •  Integration is straightforward
  29. AQL
      SELECT
        SUM(accepted),
        SUM(ignored),
        SUM(declined),
        SUM(withdrawn)
      FROM Allocations
      WHERE timestamp BETWEEN '1 week ago' AND 'now'
        AND driver='LON123456789'
      GROUP BY timestamp(day)
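
To make the idea of a pre-planned query template concrete, here is a rough sketch of what the AQL query on slide 29 computes, with totals maintained at write time. It is not a description of how Acunu Analytics is implemented. The `totals` structure and function names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import timedelta

OUTCOMES = ("accepted", "ignored", "declined", "withdrawn")

# Hypothetical pre-aggregated store: (driver, day) -> per-outcome counts.
totals = defaultdict(Counter)


def record_allocation(driver, when, outcome):
    """Increment the counter for this driver and day as the event is written."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    totals[(driver, when.date())][outcome] += 1


def weekly_summary(driver, now):
    """Per-day sums for one driver from one week ago up to now."""
    summary = {}
    for days_back in range(7, -1, -1):
        day = (now - timedelta(days=days_back)).date()
        counts = totals.get((driver, day))
        if counts:
            summary[day] = {outcome: counts[outcome] for outcome in OUTCOMES}
    return summary
```

Because the grouping keys (driver and day) are fixed in advance, answering the query only requires reading a handful of counters rather than scanning raw events.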
  32. Challenges
  33. [Chart: average years' experience per team member, MySQL vs Cassandra; axis up to 10]
  35. Lessons learned
  36. Have an advocate
      •  Get someone who will sell the vision internally
      •  Make an effort to get everyone on board
  37. Learn the theory
      •  Teach each team member the fundamentals
      •  CQL can encourage an SQL mindset, but it’s important to understand the underlying data model
      •  Make a real effort to share knowledge – keep in mind the gulf in experience for most team members between their old world and the new world (SQL vs NoSQL)
      •  Peer review data models
  38. Operational perspective
  39. “Allows a team of 2 to achieve things they wouldn’t have considered before Cassandra existed”
      Chris Hoolihan, Operations Engineer
  41. 2 clusters, 6 machines per region, 3 regions (stats cluster pending addition of third DC)
      [Diagram: Operational Cluster in ap-southeast-1, us-east-1 and eu-west-1; Stats Cluster in us-east-1 and eu-west-1]
  42. AWS VPCs with OpenVPN links
      3 AZs per region
      m1.large machines
      Provisioned IOPS EBS
      [Diagram: Operational Cluster, Stats Cluster; ~600GB/node, ~100GB/node]
  43. Backups
      •  SSTable snapshot
      •  Used to upload to S3, but this was taking >6 hours and consuming all our network bandwidth
      •  Now take EBS snapshot of the SSTable snapshots
  44. Encryption
      •  Requirement for NYC launch
      •  We use dmcrypt to encrypt the entire EBS volume
      •  Chose dmcrypt because it is uncomplicated
      •  Our tests show a 1% hit in disk performance, which concurs with what Amazon suggest
  45. DataStax OpsCenter
      •  We run the free version
      •  Offers up easily accessible “one screen” overviews of the activity of the entire cluster
      •  Big fans – an easy win
  47. Multi DC
      •  Something that Cassandra makes trivial
      •  Would have been very difficult to accomplish active-active inter-DC replication with a team of 2 without Cassandra
      •  Rolling repair needed to make it safe (we use LOCAL_QUORUM)
      •  We schedule “narrow repairs” on different nodes in our cluster each night
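
A short sketch of the arithmetic behind LOCAL_QUORUM may help explain why repair matters here. A quorum is a majority of replicas, and LOCAL_QUORUM counts only replicas in the coordinator's own data centre, so replicas in other data centres are not part of the acknowledgement. The replication factor of 3 per data centre and the node names below are assumptions (the slides do not state them), and the round-robin rotation is only one possible reading of "different nodes each night".

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def local_quorum(rf_by_dc, local_dc):
    """Replicas that must respond in the coordinator's own data centre."""
    return quorum(rf_by_dc[local_dc])


def global_quorum(rf_by_dc):
    """Replicas that must respond when counting every data centre together."""
    return quorum(sum(rf_by_dc.values()))


def node_for_night(nodes, night_index):
    """Hypothetical rotation: repair one node per night, in round-robin order."""
    return nodes[night_index % len(nodes)]


# Assumed replication factor of 3 in each region named on slide 41.
RF_BY_DC = {"ap-southeast-1": 3, "us-east-1": 3, "eu-west-1": 3}
```

Under these assumptions a LOCAL_QUORUM request waits for 2 replicas in the local region, whereas a quorum across all three regions would wait for 5 and pay cross-region latency on every request.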
  48. Compression
      •  Our stats cluster was running at ~1.5TB per node
      •  We didn’t want to add more nodes
      •  With compression, we are now back to ~600GB
      •  Easy to accomplish
      •  `nodetool upgradesstables` on a rolling schedule
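
As a quick worked calculation of the saving quoted on slide 48, assuming 1TB is treated as 1,000GB and that both figures are per node:

```python
def compression_summary(before_gb, after_gb):
    """Summarise the effect of compression on per-node disk usage."""
    saved_gb = before_gb - after_gb
    return {
        "saved_gb": saved_gb,
        "ratio": after_gb / before_gb,  # compressed size / original size
        "percent_saved": 100 * saved_gb / before_gb,
    }


# ~1.5TB per node before compression, ~600GB per node after.
summary = compression_summary(1500, 600)
```

On these figures each node stores roughly 900GB less, which is about a 60% reduction.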
  49. Lessons learned
  51. Management perspective
  52. “The days of the quick and dirty are over”
      Simon Veingard, EVP Operations
  53. Technically, everything is fine…
      •  Our COO feels that C* is “technically good and beautiful”, a “perfectly good option”
      •  Our EVPO says that C* reminds him of a time series database in use at Goldman Sachs that had “very good performance”
      …but there are concerns
  54. [Diagram: people who can attempt to query MySQL vs people who can attempt to query Cassandra]
  56. Lessons learned
  57. Keep the business informed
      •  Pre-launch, we were tasked with increasing resiliency
      •  Cassandra addressed immediate business needs, but the trade-offs involved should have been communicated more clearly
  58. Sing from the same hymn sheet
      •  A senior founding engineer had doubts about the adoption of Cassandra until very recently
      •  In the presence of business doubt, this lack of consistency amongst developers exacerbated the concerns
      •  We should have made more effort to make bilateral decisions on adoption – I don’t think this would have been hard to achieve
  60. Provide solutions
      •  There are many options for ad-hoc querying of Cassandra
      •  We underestimated the impact of not having a good solution for this from the very beginning
  61. The future
  62. Cassandra at Hailo
      •  We will continue to invest in Cassandra as we expand globally
      •  We will hire people with experience running Cassandra
      •  We will focus on expanding our reporting facilities
  64. Thank you
