C* Summit EU 2013: No Whistling Required: Cabs, Cassandra, and Hailo
Speaker: Dave Gardner, Architect at Hailo
Video: http://www.youtube.com/watch?v=6cUuE7sTdU0&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=16
Hailo has leveraged Cassandra to build one of the most successful startups in European history. This presentation looks at how Hailo grew from a simple MySQL-backed infrastructure to a resilient Cassandra-backed system running in three data centres globally. Topics covered include: the process of migration, experience running multi-DC on AWS, common data modeling patterns and security implications for achieving PCI compliance.

Transcript of "C* Summit EU 2013: No Whistling Required: Cabs, Cassandra, and Hailo "

  1. Cabs, Cassandra, and Hailo
     David Gardner, Architect at Hailo
     #CASSANDRAEU CASSANDRASUMMITEU
  4. 0.6 to 1.2
     •  1,352 changed files with 235,413 additions and 47,487 deletions
     •  7,429 commits
     •  1,653 tickets completed
     https://github.com/apache/cassandra/compare/cassandra-0.6.0...cassandra-1.2
     https://github.com/apache/cassandra/blob/trunk/CHANGES.txt
  5. What this talk is about
     Cassandra adoption at Hailo from three perspectives:
     1.  Development
     2.  Operational
     3.  Management
  6. What is Hailo?
     Hailo is The Taxi Magnet. Use Hailo to get a cab wherever you are, whenever you want.
  10. What is Hailo?
      •  The world’s highest-rated taxi app – over 11,000 five-star reviews
      •  Over 500,000 registered passengers
      •  A Hailo hail is accepted around the world every 4 seconds
      •  Hailo operates in 15 cities on 3 continents from Tokyo to Toronto in nearly 2 years of operation
  11. Hailo is growing
      •  Hailo is a marketplace that facilitates over $100M in run-rate transactions and is making the world a better place for passengers and drivers
      •  Hailo has raised over $50M in financing from the world's best investors including Union Square Ventures, Accel, the founder of Skype (via Atomico), Wellington Partners (Spotify), Sir Richard Branson, and our CEO's mother, Janice
  12. The history
      The story behind Cassandra adoption at Hailo
  13. Hailo launched in London in November 2011
      •  Launched on AWS
      •  Two PHP/MySQL web apps plus a Java backend
      •  Mostly built by a team of 3 or 4 backend engineers
      •  MySQL multi-master for single AZ resilience
  14. Why Cassandra?
      •  A desire for greater resilience – “become a utility”
         Cassandra is designed for high availability
      •  Plans for international expansion around a single consumer app
         Cassandra is good at global replication
      •  Expected growth
         Cassandra scales linearly for both reads and writes
      •  Prior experience
         I had experience with Cassandra and could recommend it
  15. The path to adoption
      •  Largely unilateral decision by developers – a result of a startup culture
      •  Replacement of key consumer app functionality, splitting up the PHP/MySQL web app into a mixture of global PHP/Java services backed by a Cassandra data store
      •  Launched into production in September 2012 – originally just powering North American expansion, before gradually switching over Dublin and London
  16. One year on...
      •  Further breakdown of functionality into Go/Java SOA
      •  Migrating all online databases to Cassandra
  17. Development perspective
  18. “Cassandra just works”
      Dom W, Senior Engineer
  19. Use cases
      1.  Entity storage
      2.  Time series data
  20. CF = customers
      126007613634425612:
          createdTimestamp: 1370465412
          email: dave@cruft.co
          givenName: Dave
          familyName: Gardner
          locale: en_GB
          phone: +447911111111
  21. Considerations for entity storage
      •  Do not read the entire entity, update one property and then write back a mutation containing every column
      •  Only mutate columns that have been set
      •  This avoids read-before-write race conditions
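
The "only mutate columns that have been set" rule can be sketched in plain Python, with no Cassandra client involved. The `Entity` class and its `set` and `pending_mutation` methods are hypothetical names made up for this illustration; the assumption is that a real client (Gossie, Astyanax, phpcassa) would send the resulting dict as the column mutation.

```python
class Entity:
    """Tracks which columns were explicitly set since the entity was loaded."""

    def __init__(self, row_key, columns=None):
        self.row_key = row_key
        self._columns = dict(columns or {})
        self._dirty = set()  # names of columns changed since load

    def set(self, name, value):
        self._columns[name] = value
        self._dirty.add(name)

    def pending_mutation(self):
        """Return only the columns that were set, not the whole entity."""
        return {name: self._columns[name] for name in sorted(self._dirty)}


# Hypothetical usage with the customer row from the slides.
customer = Entity("126007613634425612", {
    "email": "dave@cruft.co",
    "givenName": "Dave",
    "familyName": "Gardner",
    "locale": "en_GB",
})
customer.set("locale", "en_US")
mutation = customer.pending_mutation()
print(mutation)  # {'locale': 'en_US'}
```

Because the write touches only `locale`, a concurrent writer that changed `email` between the read and the write is not overwritten with a stale value.
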
  23. CF = stats_db
      2013-06-01:
          55374fa0-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
          a48bd800-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
          b0e15850-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
          bfac6c80-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
  24. CF = stats_db
      LON123456:
          13b247f0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
          20f70a40-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
          2b44d3b0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
          338a22f0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
  26. Considerations for time series storage
      •  Choose row key carefully, since this partitions the records
      •  Think about how many records you want in a single row
      •  Denormalise on write into many indexes
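
The two `stats_db` rows shown earlier (one keyed by day, one keyed by driver) suggest how "denormalise on write" might look. The sketch below is an in-memory stand-in, not Hailo's code: the `stats_db` dict, `day_bucket` and `record_event` are hypothetical names, and the payload is a made-up placeholder.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone

# In-memory stand-in for a column family: row key -> {column name -> value}.
stats_db = defaultdict(dict)


def day_bucket(ts):
    """Row key for the per-day index, e.g. '2013-06-01'."""
    return ts.strftime("%Y-%m-%d")


def record_event(ts, driver, payload):
    """Denormalise on write: store the same event under every index row."""
    # uuid1() is time-based on the current clock; a real system would
    # more likely derive the column name from the event time itself.
    column = uuid.uuid1()
    for row_key in (day_bucket(ts), driver):
        stats_db[row_key][column] = payload
    return column


ts = datetime(2013, 6, 1, 12, 0, tzinfo=timezone.utc)
col = record_event(ts, "LON123456", '{"action": "example"}')
print(sorted(stats_db.keys()))  # ['2013-06-01', 'LON123456']
```

Bucketing the row key by day bounds how many columns land in one row, which is the "how many records in a single row" question from the slide; a busier feed might need a finer bucket such as an hour.
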
  27. Client libraries
      •  Gossie (Go)
      •  Astyanax (Java)
      •  phpcassa (PHP)
  28. Analytics
      •  With Cassandra we lost the ability to carry out analytics, e.g. COUNT, SUM, AVG, GROUP BY
      •  We use Acunu Analytics to give us this ability in real time, for preplanned query templates
      •  It is backed by Cassandra and therefore highly available, resilient and globally distributed
      •  Integration is straightforward
  29. [Diagram: events, NSQ, Acunu, C*]
  30. AQL
      SELECT
        SUM(accepted),
        SUM(ignored),
        SUM(declined),
        SUM(withdrawn)
      FROM Allocations
      WHERE timestamp BETWEEN '1 week ago' AND 'now'
        AND driver='LON123456789'
      GROUP BY timestamp(day)
  32. Operational perspective
  33. “Allows a team of 2 to achieve things they wouldn’t have considered before Cassandra existed”
      Chris H, Operations Engineer
  35. [Diagram] Operational Cluster: 6 machines per region. Stats Cluster: 3 clusters, 3 regions (stats cluster is a long story). Regions shown: us-east-1, eu-west-1, ap-southeast-1
  36. [Diagram: regions eu-west-1, us-east-1 and ap-southeast-1, with nodes labelled AZ1, AZ2 and AZ3]
  37. 3 AZs per region
      m1.large machines
      AWS VPCs with OpenVPN links
      Provisioned IOPS EBS
      Stats Cluster: ~1TB/node
      Operational Cluster: ~200GB/node
  38. Backups
      •  SSTable snapshot
      •  Used to upload to S3, but this was taking >6 hours and consuming all our network bandwidth
      •  Now take EBS snapshot of the data volumes
  39. Encryption
      •  Requirement for NYC launch
      •  We use dmcrypt to encrypt the entire EBS volume
      •  Chose dmcrypt because it is uncomplicated
      •  Our tests show a 1% performance hit in disk performance, which concurs with what Amazon suggest
  40. DataStax OpsCenter is a quick win
  41. Multi DC
      •  Something that Cassandra makes trivial
      •  Would have been very difficult to accomplish active-active inter-DC replication with a team of 2 without Cassandra
      •  Rolling repair needed to make it safe (we use LOCAL_QUORUM)
      •  We schedule “narrow repairs” on different nodes in our cluster each night
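
LOCAL_QUORUM only waits for a majority of the replicas in the local data centre, which is why regular repair matters for the remote ones. The sketch below shows the quorum arithmetic and one possible way to rotate a nightly repair across nodes. The replication factor of 3 per data centre, the node names and the `node_for_night` helper are assumptions for illustration; the slides do not give the replication factor or the exact repair command used.

```python
def local_quorum(replication_factor_in_dc):
    """Replicas in the local DC that must respond: floor(RF / 2) + 1."""
    return replication_factor_in_dc // 2 + 1


def node_for_night(nodes, night_index):
    """Rotate through the nodes, picking one to repair each night."""
    return nodes[night_index % len(nodes)]


# Hypothetical node names; 6 machines per region as on the cluster slide.
nodes = ["node-%d" % i for i in range(1, 7)]
schedule = [node_for_night(nodes, night) for night in range(8)]

print(local_quorum(3))  # 2 (assumed RF of 3 per data centre)
print(schedule)         # node-1 .. node-6, then node-1, node-2 again
```

With six nodes and one repair per night, every node is visited within six nights, which would need to fit inside the cluster's tombstone grace period for the rotation to be safe.
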
  42. Compression
      •  Our stats cluster was running at ~1.5TB per node
      •  We didn’t want to add more nodes
      •  With compression, we are now back to ~600GB
      •  Easy to accomplish
      •  `nodetool upgradesstables` on a rolling schedule
  43. Management perspective
  44. “The days of the quick and dirty are over”
      Simon V, EVP Operations
  45. Technically, everything is fine…
      •  Our COO feels that C* is “technically good and beautiful”, a “perfectly good option”
      •  Our EVPO says that C* reminds him of a time series database in use at Goldman Sachs that had “very good performance”
      …but there are concerns
  46. [Diagram: people who can attempt to query MySQL; people who can attempt to query Cassandra]
  48. Lessons learned
  49. There might be a gulf in experience
  50. [Chart: average years experience per team member, MySQL vs Cassandra; axis label 10]
  51. Lesson learned
      •  Have an advocate - get someone who will sell the vision internally
      •  Learn the theory - teach each team member the fundamentals
      •  Make an effort to get everyone on board
  52. Things can drift into failure
  58. Lesson learned
      •  Be pro-active with Cassandra, even if it seems to be running smoothly
      •  Peer-review data models, take time to think about them
      •  Big rows are bad - use cfstats to look for them
      •  Mixed workloads can cause problems - use cfhistograms and look out for signs of data modeling problems
      •  Think about the compaction strategy for each CF
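
Looking for big rows with cfstats could be automated along these lines. This is a rough sketch under stated assumptions: the sample text is illustrative only (not captured from a real cluster), it assumes the `Column Family:` and `Compacted row maximum size:` labels printed by `nodetool cfstats` in Cassandra 1.x (later versions use different labels), and `big_rows` is a hypothetical helper.

```python
import re

# Illustrative sample only; not output captured from a real cluster.
SAMPLE = """\
    Column Family: customers
    Compacted row maximum size: 1331
    Column Family: stats_db
    Compacted row maximum size: 557074610
"""


def big_rows(cfstats_text, threshold_bytes):
    """Return {column family: max row size} for CFs above the threshold."""
    found = {}
    current_cf = None
    for line in cfstats_text.splitlines():
        line = line.strip()
        cf = re.match(r"Column Family: (\S+)", line)
        if cf:
            current_cf = cf.group(1)
            continue
        size = re.match(r"Compacted row maximum size: (\d+)", line)
        if size and current_cf and int(size.group(1)) > threshold_bytes:
            found[current_cf] = int(size.group(1))
    return found


# Flag any column family whose largest compacted row exceeds 100 MiB.
print(big_rows(SAMPLE, 100 * 1024 * 1024))  # {'stats_db': 557074610}
```

The 100 MiB threshold is arbitrary; the point is to check regularly rather than wait for a wide row to cause trouble.
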
  59. EBS is terrible
  60. Lessons learned
      •  EBS is nearly always the cause of Amazon outages
      •  EBS is a single point of failure (it will fail everywhere in your cluster)
      •  EBS is slow
      •  EBS is expensive
      •  EBS is unnecessary!
  61. Management need to know the trade offs
  62. Lessons learned
      •  Keep the business informed – explain the tradeoffs in simple terms
      •  Sing from the same hymn sheet
      •  Make sure there are solutions in place for every use case from the beginning
  63. [Diagram: people who can attempt to query MySQL; people who can attempt to query Cassandra]
  64. Conclusions
  65. We like Cassandra
      •  Solid design
      •  HA characteristics
      •  Easy multi-DC setup
      •  Simplicity of operation
  66. Lessons for successful adoption
      •  Have an advocate, sell the dream
      •  Learn the fundamentals, get the best out of Cassandra
      •  Invest in tools to make life easier
      •  Keep management in the loop, explain the trade offs
  67. The future
      •  We will continue to invest in Cassandra as we expand globally
      •  We will hire people with experience running Cassandra
      •  We will focus on expanding our reporting facilities
      •  We aspire to extend our network (1M consumer installs, wallet) beyond cabs
      •  We will continue to hire the best engineers in London, NYC and Asia
  68. Questions?