Cabs, Cassandra, and Hailo (at Cassandra EU)

My talk from #CassandraEU covering Hailo's use of Cassandra including insight from developers, operations and management, plus lessons learned.

  1. Cabs, Cassandra, and Hailo • David Gardner, Architect at Hailo • #CASSANDRAEU CASSANDRASUMMITEU
  4. 0.6 to 1.2 • 1,352 changed files with 235,413 additions and 47,487 deletions • 7,429 commits • 1,653 tickets completed • https://github.com/apache/cassandra/compare/cassandra-0.6.0...cassandra-1.2 • https://github.com/apache/cassandra/blob/trunk/CHANGES.txt
  5. What this talk is about: Cassandra adoption at Hailo from three perspectives: 1. Development 2. Operational 3. Management
  6. What is Hailo? Hailo is The Taxi Magnet. Use Hailo to get a cab wherever you are, whenever you want.
  10. What is Hailo? • The world’s highest-rated taxi app – over 11,000 five-star reviews • Over 500,000 registered passengers • A Hailo hail is accepted around the world every 4 seconds • Hailo operates in 15 cities on 3 continents from Tokyo to Toronto in nearly 2 years of operation
  11. Hailo is growing • Hailo is a marketplace that facilitates over $100M in run-rate transactions and is making the world a better place for passengers and drivers • Hailo has raised over $50M in financing from the world's best investors including Union Square Ventures, Accel, the founder of Skype (via Atomico), Wellington Partners (Spotify), Sir Richard Branson, and our CEO's mother, Janice
  12. The history: the story behind Cassandra adoption at Hailo
  13. Hailo launched in London in November 2011 • Launched on AWS • Two PHP/MySQL web apps plus a Java backend • Mostly built by a team of 3 or 4 backend engineers • MySQL multi-master for single AZ resilience
  14. Why Cassandra? • A desire for greater resilience (“become a utility”): Cassandra is designed for high availability • Plans for international expansion around a single consumer app: Cassandra is good at global replication • Expected growth: Cassandra scales linearly for both reads and writes • Prior experience: I had experience with Cassandra and could recommend it
  15. The path to adoption • Largely unilateral decision by developers – a result of a startup culture • Replacement of key consumer app functionality, splitting up the PHP/MySQL web app into a mixture of global PHP/Java services backed by a Cassandra data store • Launched into production in September 2012 – originally just powering North American expansion, before gradually switching over Dublin and London
  16. One year on... • Further breakdown of functionality into Go/Java SOA • Migrating all online databases to Cassandra
  17. Development perspective
  18. “Cassandra just works” Dom W, Senior Engineer
  19. Use cases: 1. Entity storage 2. Time series data
  20. CF = customers • Row key 126007613634425612 • createdTimestamp: 1370465412 • email: dave@cruft.co • givenName: Dave • familyName: Gardner • locale: en_GB • phone: +447911111111
  21. Considerations for entity storage • Do not read the entire entity, update one property and then write back a mutation containing every column • Only mutate columns that have been set • This avoids read-before-write race conditions
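The entity-storage advice on slide 21 can be illustrated with a small sketch. This is not Hailo's code: the `Entity` class, its method names and the `en_US` value are hypothetical, and a plain dict stands in for the mutation that a real client library would send. The point is only that the write contains the columns that were set and nothing else.

```python
class Entity:
    """Tracks which columns were set so a write mutates only those columns."""

    def __init__(self, row_key, columns=None):
        self.row_key = row_key
        self._columns = dict(columns or {})  # values as loaded from the store
        self._dirty = set()                  # names of columns set since load

    def get(self, name):
        return self._columns.get(name)

    def set(self, name, value):
        self._columns[name] = value
        self._dirty.add(name)

    def mutation(self):
        """Return only the columns that were set; untouched columns are omitted."""
        return {name: self._columns[name] for name in sorted(self._dirty)}


# Hypothetical usage, reusing the example row from slide 20.
customer = Entity("126007613634425612", {
    "email": "dave@cruft.co",
    "givenName": "Dave",
    "familyName": "Gardner",
    "locale": "en_GB",
})
customer.set("locale", "en_US")
pending = customer.mutation()  # contains the locale column only
```

Because the mutation omits `email`, `givenName` and `familyName`, a concurrent writer that changed one of those columns is not overwritten with a stale value.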
  23. CF = stats_db • Row key 2013-06-01 • 55374fa0-ce2b-11e2-8b8b-0800200c9a66: {"action":"… • a48bd800-ce2b-11e2-8b8b-0800200c9a66: {"action":"… • b0e15850-ce2b-11e2-8b8b-0800200c9a66: {"action":"… • bfac6c80-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
  24. CF = stats_db • Row key LON123456 • 13b247f0-ce2c-11e2-8b8b-0800200c9a66: {"action":"… • 20f70a40-ce2c-11e2-8b8b-0800200c9a66: {"action":"… • 2b44d3b0-ce2c-11e2-8b8b-0800200c9a66: {"action":"… • 338a22f0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
  26. Considerations for time series storage • Choose row key carefully, since this partitions the records • Think about how many records you want in a single row • Denormalise on write into many indexes
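A rough sketch of "denormalise on write", assuming the two index rows shown on slides 23 and 24 (one row per day, one row per driver). The function names, the in-memory `store` and the event fields are hypothetical; a nested dict stands in for the column family, and Python's `uuid.uuid1()` stands in for a TimeUUID column name (it is stamped with the time of the call, not the event time).

```python
import json
import uuid
from collections import defaultdict
from datetime import datetime, timezone


def index_row_keys(event, occurred_at):
    """Row keys the event is filed under: a day bucket and the driver."""
    return [occurred_at.strftime("%Y-%m-%d"), event["driver"]]


def write_event(store, event, occurred_at):
    """Write the same column into every index row (denormalise on write)."""
    column_name = str(uuid.uuid1())  # time-based UUID as the column name
    payload = json.dumps(event, sort_keys=True)
    for row_key in index_row_keys(event, occurred_at):
        store[row_key][column_name] = payload
    return column_name


# Hypothetical usage: row key -> {column name -> JSON value}
store = defaultdict(dict)
event = {"action": "accepted", "driver": "LON123456"}
column = write_event(store, event, datetime(2013, 6, 1, 12, 0, tzinfo=timezone.utc))
```

The choice of bucket decides how many records land in one row: a per-day row key bounds a row to one day of events, while the per-driver row grows for as long as the driver is active.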
  27. Client libraries • Gossie (Go) • Astyanax (Java) • phpcassa (PHP)
  28. Analytics • With Cassandra we lost the ability to carry out analytics, e.g. COUNT, SUM, AVG, GROUP BY • We use Acunu Analytics to give us this ability in real time, for preplanned query templates • It is backed by Cassandra and therefore highly available, resilient and globally distributed • Integration is straightforward
  29. Diagram: events, NSQ, Acunu, C*
  30. AQL • SELECT SUM(accepted), SUM(ignored), SUM(declined), SUM(withdrawn) FROM Allocations WHERE timestamp BETWEEN '1 week ago' AND 'now' AND driver='LON123456789' GROUP BY timestamp(day)
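For readers unfamiliar with this style of query, the sketch below spells out what the AQL statement on slide 30 asks for: per-day sums of four counters for one driver over a time window. It only illustrates the result shape and says nothing about how Acunu Analytics evaluates the query. The `daily_sums` function, the sample rows and the second driver ID are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

FIELDS = ("accepted", "ignored", "declined", "withdrawn")


def daily_sums(rows, driver, start, end):
    """Sum each counter per calendar day for one driver within [start, end]."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for row in rows:
        if row["driver"] != driver:
            continue
        if not (start <= row["timestamp"] <= end):
            continue
        day = row["timestamp"].date().isoformat()
        for field in FIELDS:
            totals[day][field] += row.get(field, 0)
    return dict(totals)


# Hypothetical allocation events
rows = [
    {"driver": "LON123456789", "timestamp": datetime(2013, 6, 1, 9, 0), "accepted": 1},
    {"driver": "LON123456789", "timestamp": datetime(2013, 6, 1, 17, 30), "declined": 1},
    {"driver": "LON123456789", "timestamp": datetime(2013, 6, 2, 8, 15), "accepted": 1},
    {"driver": "LON000000001", "timestamp": datetime(2013, 6, 1, 9, 5), "ignored": 1},
]
result = daily_sums(rows, "LON123456789", datetime(2013, 5, 27), datetime(2013, 6, 3))
```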
  32. Operational perspective
  33. “Allows a team of 2 to achieve things they wouldn’t have considered before Cassandra existed” Chris H, Operations Engineer
  35. Diagram: Operational Cluster and Stats Cluster across us-east-1, eu-west-1 and ap-southeast-1 • 3 regions • 6 machines per region • (stats cluster is a long story)
  36. Diagram: eu-west-1, us-east-1 and ap-southeast-1, each with nodes in AZ1, AZ2 and AZ3
  37. AWS VPCs with OpenVPN links • 3 AZs per region • m1.large machines • Provisioned IOPS EBS • Operational Cluster: ~200GB/node • Stats Cluster: ~1TB/node
  38. Backups • SSTable snapshot • Used to upload to S3, but this was taking >6 hours and consuming all our network bandwidth • Now take EBS snapshot of the data volumes
  39. Encryption • Requirement for NYC launch • We use dmcrypt to encrypt the entire EBS volume • Chose dmcrypt because it is uncomplicated • Our tests show a 1% hit in disk performance, which concurs with what Amazon suggest
  40. DataStax OpsCenter is a quick win
  41. Multi DC • Something that Cassandra makes trivial • Would have been very difficult to accomplish active-active inter-DC replication with a team of 2 without Cassandra • Rolling repair needed to make it safe (we use LOCAL_QUORUM) • We schedule “narrow repairs” on different nodes in our cluster each night
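The LOCAL_QUORUM remark on slide 41 rests on simple arithmetic: a quorum is a majority of the replicas, and LOCAL_QUORUM counts only replicas in the coordinator's own data centre. The sketch below assumes a replication factor of 3 per data centre, which the slides do not state; the function names are hypothetical.

```python
def quorum(replication_factor):
    """Replicas that must respond for a quorum: a strict majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be down in one data centre while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


rf_per_dc = 3  # assumption for illustration, not taken from the talk
needed = quorum(rf_per_dc)
spare = tolerated_failures(rf_per_dc)

# Reads and writes at quorum within one data centre always share a replica
# when the two quorums together exceed the replication factor.
overlaps = needed + needed > rf_per_dc
```

With these numbers a request waits for 2 of the 3 local replicas and tolerates one local replica being down. Replicas in the other data centres are not waited for, so a write may be missing there until it is delivered or repaired, which is consistent with the slide's point that regular repair is needed to make the setup safe.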
  42. Compression • Our stats cluster was running at ~1.5TB per node • We didn’t want to add more nodes • With compression, we are now back to ~600GB • Easy to accomplish • `nodetool upgradesstables` on a rolling schedule
  43. Management perspective
  44. “The days of the quick and dirty are over” Simon V, EVP Operations
  45. Technically, everything is fine… • Our COO feels that C* is “technically good and beautiful”, a “perfectly good option” • Our EVPO says that C* reminds him of a time series database in use at Goldman Sachs that had “very good performance” • …but there are concerns
  46. Diagram: people who can attempt to query MySQL vs. people who can attempt to query Cassandra
  48. Lessons learned
  49. There might be a gulf in experience
  50. Chart: average years of experience per team member, MySQL vs. Cassandra (labelled value: 10)
  51. Lessons learned • Have an advocate - get someone who will sell the vision internally • Learn the theory - teach each team member the fundamentals • Make an effort to get everyone on board
  52. Things can drift into failure
  58. Lessons learned • Be proactive with Cassandra, even if it seems to be running smoothly • Peer-review data models, take time to think about them • Big rows are bad - use cfstats to look for them • Mixed workloads can cause problems - use cfhistograms and look out for signs of data modeling problems • Think about the compaction strategy for each CF
  59. EBS is terrible
  60. Lessons learned • EBS is nearly always the cause of Amazon outages • EBS is a single point of failure (it will fail everywhere in your cluster) • EBS is slow • EBS is expensive • EBS is unnecessary!
  61. Management need to know the trade-offs
  62. Lessons learned • Keep the business informed – explain the trade-offs in simple terms • Sing from the same hymn sheet • Make sure there are solutions in place for every use case from the beginning
  63. Diagram: people who can attempt to query MySQL vs. people who can attempt to query Cassandra
  64. Conclusions
  65. We like Cassandra • Solid design • HA characteristics • Easy multi-DC setup • Simplicity of operation
  66. Lessons for successful adoption • Have an advocate, sell the dream • Learn the fundamentals, get the best out of Cassandra • Invest in tools to make life easier • Keep management in the loop, explain the trade-offs
  67. The future • We will continue to invest in Cassandra as we expand globally • We will hire people with experience running Cassandra • We will focus on expanding our reporting facilities • We aspire to extend our network (1M consumer installs, wallet) beyond cabs • We will continue to hire the best engineers in London, NYC and Asia
  68. Questions?