Hailo and NoSQL

David Gardner, Architect at Hailo
JAXLONDON2013
What this talk is about
1.  Why choose NoSQL
2.  A whistle-stop tour of Cassandra
3.  Adoption of Cassandra at Hailo

JAXL...
What is Hailo?
Hailo is The Taxi Magnet. Use Hailo to get a cab wherever you are, whenever you want.

Facts and figures
•  The world’s highest-rated taxi app – over 11,000 five-star reviews
•  Over 500,000 registered passenger...
Hailo is growing
•  Hailo is a marketplace that facilitates over $100M in run-rate
transactions and is making the world a ...
Why choose NoSQL?

“NoSQL DBs trade off traditional features to better
support new and emerging use cases”
Andy Gross, Riak

http://www.slide...
What are we trading off?
•  More widely used, tested and documented software
•  Ad-hoc querying
•  Talent pool with direct...
What do we get back in return?
•  High availability
•  Scalability
•  Operational simplicity

Cassandra 101

Amazon Dynamo

+

Consistent hashing
Vector clocks *
Gossip protocol
Hinted handoff
Read repair
http://www.allthingsdistri...
tokens are integers
from 0 to 2^127

three replicas (RF=3)
coordinator node
Client	
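
The ring on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation: the node names, the evenly spaced tokens and the use of MD5 to place keys are assumptions made for the example. It shows how a row key maps to a token and how RF=3 replicas are picked by walking clockwise around the ring.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space from the slide: integers from 0 to 2^127


def token_for(key):
    """Hash a row key onto the ring (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(key, ring, rf=3):
    """Walk clockwise from the key's token and collect rf distinct nodes."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping past the end.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    owners = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in owners:
            owners.append(node)
        if len(owners) == rf:
            break
    return owners


# Six hypothetical nodes with evenly spaced tokens.
NODES = ["node-a", "node-b", "node-c", "node-d", "node-e", "node-f"]
RING = sorted((i * RING_SIZE // len(NODES), name) for i, name in enumerate(NODES))
```

Calling replicas_for("126007613634425612", RING) returns three distinct node names. Any node can act as coordinator for a client request and forward it to those replicas.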


Consistency level (CL)
How many replicas must respond to declare success?
Level

Description

ONE

1st Response

QUORUM

N...
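
As a rough sketch of the arithmetic behind the consistency levels, the function below counts how many replica responses each level needs. The level names and the N/2 + 1 rule come from the slide; the function name, the per-data-centre replication factors and the data centre labels are assumptions for illustration.

```python
def quorum(replication_factor):
    """N/2 + 1, using integer division."""
    return replication_factor // 2 + 1


def replicas_required(level, rf_by_dc, local_dc):
    """Number of replicas that must respond before the request succeeds."""
    total = sum(rf_by_dc.values())
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(total)
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])
    if level == "EACH_QUORUM":
        return sum(quorum(rf) for rf in rf_by_dc.values())
    if level == "ALL":
        return total
    raise ValueError("unknown consistency level: " + level)


# Hypothetical layout: RF=3 in each of two data centres.
RF_BY_DC = {"eu-west-1": 3, "us-east-1": 3}
```

With this layout LOCAL_QUORUM needs only two responses from the local data centre, so a request does not wait on an inter-region round trip.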
Big Table
•  Sparse column based data model
•  SSTable disk storage
•  Append-only commit log
•  Memtable (buffer a...
Name

Plus timestamp,
used for Last Write Wins
(LWW) conflict resolution

Value
Column
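
Last Write Wins can be sketched as follows. This is a simplified model, assuming integer timestamps and ignoring tombstones; the class and function names are made up for the example, and the tie-break (keep the first version seen) is a simplification rather than Cassandra's rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # supplied by the writer


def resolve(a, b):
    """Return whichever version of a column carries the later timestamp."""
    return a if a.timestamp >= b.timestamp else b


def merge_rows(*versions):
    """Merge several replicas' views of a row, column by column."""
    merged = {}
    for row in versions:
        for name, column in row.items():
            merged[name] = resolve(merged[name], column) if name in merged else column
    return merged
```

Because resolution happens per column, two replicas that each hold the newest value for a different column merge into a row that has both.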

we can have
millions of columns

[Diagram: three columns side by side, each with a Name and a Value]

Row

[Diagram: a Row Key followed by three columns, each with a Name and a Value]

Column Family

[Diagram: three rows, each a Row Key followed by three columns]
we can h...
buffers writes and
sorts data
Memtable

Write

flush on time or
size trigger

Memory
Disk

Commit
Log

SSTable

SSTable

i...
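
The write path in this diagram can be modelled with a toy class. It is a sketch under stated assumptions: everything lives in memory, the flush trigger is a simple row count (the slide mentions time or size), and the class and attribute names are invented for the example.

```python
class ColumnFamilyStore:
    """Toy model of the commit log, memtable and SSTable write path."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []  # append-only record of every write
        self.memtable = {}    # buffers recent writes in memory
        self.sstables = []    # immutable, sorted snapshots of past memtables
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # logged first, for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort the buffered data and freeze it; a tuple is never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for stored_key, value in table:
                if stored_key == key:
                    return value
        return None
```

Since SSTables are never rewritten in place, an update to an existing key simply lands in the memtable and shadows the older value on read. Compaction, which merges old SSTables, is left out of this sketch.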
Cassandra at Hailo

Hailo launched in London in November 2011
•  Launched on AWS
•  Two PHP/MySQL web apps plus a Java backend
•  Mostly built...
Why Cassandra?
•  A desire for greater resilience – “become a utility”
Cassandra is designed for high availability
•  Plan...
The path to adoption
•  Largely unilateral decision by developers – a result of a startup
culture
•  Replacement of key co...
One year on...
•  Further breakdown of functionality into Go/Java SOA
•  Migrating all online databases to Cassandra

JAXL...
Development perspective

“Cassandra just works”
Dom W, Senior Engineer

Use cases
1.  Entity storage
2.  Time series data

CF = customers
126007613634425612:
createdTimestamp:
email:
givenName:
familyName:
locale:
phone:

1370465412
dave@cruft.c...
Considerations for entity storage
•  Do not read the entire entity, update one property and then write
back a mutation con...
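
The race that these considerations describe can be shown with a small simulation. It is an illustrative sketch: the in-memory dict stands in for a row, timestamps are plain integers, and all function names are hypothetical. Two clients read the same entity, then one changes the email and the other changes the phone number.

```python
def apply_mutation(stored, mutation):
    """Apply columns using Last Write Wins on each column's timestamp."""
    for name, (value, timestamp) in mutation.items():
        if name not in stored or timestamp >= stored[name][1]:
            stored[name] = (value, timestamp)


def full_entity_mutation(snapshot, changes, timestamp):
    """Anti-pattern: write back every column that was read."""
    values = {name: value for name, (value, _) in snapshot.items()}
    values.update(changes)
    return {name: (value, timestamp) for name, value in values.items()}


def changed_columns_mutation(snapshot, changes, timestamp):
    """Preferred: mutate only the columns that were set (snapshot unused)."""
    return {name: (value, timestamp) for name, value in changes.items()}


def simulate(build_mutation):
    stored = {"email": ("dave@cruft.co", 1), "phone": ("+447911111111", 1)}
    snapshot_a = dict(stored)  # client A reads the entity
    snapshot_b = dict(stored)  # client B reads it before A writes
    apply_mutation(stored, build_mutation(snapshot_a, {"email": "new@example.com"}, 2))
    apply_mutation(stored, build_mutation(snapshot_b, {"phone": "+15550100"}, 3))
    return {name: value for name, (value, _) in stored.items()}
```

With full_entity_mutation, client B's later write carries the stale email and silently undoes client A's change. With changed_columns_mutation both updates survive.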
CF = stats_db
2013-06-01:
55374fa0-ce2b-11e2-8b8b-0800200c9a66:
a48bd800-ce2b-11e2-8b8b-0800200c9a66:
b0e15850-ce2b-11e2-8...
CF = stats_db
LON123456:
13b247f0-ce2c-11e2-8b8b-0800200c9a66:
20f70a40-ce2c-11e2-8b8b-0800200c9a66:
2b44d3b0-ce2c-11e2-8b...
Considerations for time series storage
•  Choose row key carefully, since this partitions the records
•  Think about how m...
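
One way to picture the time series layout from the stats_db slides is the sketch below. It assumes a nested dict in place of a column family, a (timestamp, random id) tuple in place of a TimeUUID column name, and hypothetical function names. Each event is denormalised on write into two rows: one keyed by day and one keyed by driver.

```python
from collections import defaultdict
from datetime import datetime
import uuid


def record_event(store, when, driver_id, payload):
    """Write the same event into a per-day row and a per-driver row."""
    # Stand-in for a TimeUUID: sorts by time, stays unique per event.
    column_name = (when.isoformat(), uuid.uuid4().hex)
    for row_key in (when.strftime("%Y-%m-%d"), driver_id):
        store[row_key][column_name] = payload


def read_row(store, row_key):
    """Return a row's payloads in column (time) order."""
    return [payload for _, payload in sorted(store.get(row_key, {}).items())]


def new_store():
    return defaultdict(dict)
```

The choice of row key decides how many records end up in one row: a per-day key caps the row at one day of events, while a per-driver key grows for as long as the driver is active.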
Analytics
•  With Cassandra we lost the ability to carry out analytics
e.g. COUNT, SUM, AVG, GROUP BY
•  We use Acunu Analy...
events

NSQ

Acunu

C*

AQL
SELECT
SUM(accepted),
SUM(ignored),
SUM(declined),
SUM(withdrawn)
FROM Allocations
WHERE timestamp BETWEEN '1 week ago...
Get a picture of driver supply
SELECT COUNT DISTINCT(driverId)
FROM driverLocs
WHERE timestamp BETWEEN '1 day ago' AND 'no...
Operational perspective

“Allows a team of 2 to achieve things they wouldn’t
have considered before Cassandra existed”
Chris H, Operations Engineer...
6

machines per region

3

regions

us-east-1

eu-west-1

us-east-1

eu-west-1

Operational
Cluster

clusters

Stats
Clust...
[Diagram: regions eu-west-1, us-east-1 and ap-southeast-1, with nodes labelled AZ1, AZ2 and AZ3]
...
Stats
Cluster

AWS VPCs with OpenVPN links
3 AZs per region
m1.large machines

~ 1TB/node

Provisioned IOPS EBS
Operationa...
Backups
•  SSTable snapshot
•  Used to upload to S3, but this was taking >6 hours and consuming
all our network bandwidth
...
Encryption
•  Requirement for NYC launch
•  We use dm-crypt to encrypt the entire EBS volume
•  Chose dm-crypt because it is...
DataStax OpsCenter is a quick win

Multi DC
•  Something that Cassandra makes trivial
•  Would have been very difficult to accomplish active-active inter-DC
r...
Compression
•  Our stats cluster was running at ~1.5TB per node
•  We didn’t want to add more nodes
•  With compression, w...
Management perspective

“The days of the quick and dirty are over”
Simon V, EVP Operations

Technically, everything is fine…
•  Our COO feels that C* is “technically good and beautiful”, a
“perfectly good option”
• ...
People who can
attempt to query
MySQL
People who can
attempt to
query Cassandra

Lessons learned

There might be a gulf in experience

10

Average years experience
per team member

MySQL

Cassandra
Lesson learned
•  Have an advocate - get someone who will sell the vision internally
•  Learn the theory - teach each team...
Things can drift into failure

Lesson learned
•  Be proactive with Cassandra, even if it seems to be running
smoothly
•  Peer-review data models, take t...
EBS is terrible

Lessons learned
•  EBS is nearly always the cause of Amazon outages
•  EBS is a single point of failure (it will fail ever...
Management need to know the trade offs

Lessons learned
•  Keep the business informed – explain the tradeoffs in simple terms
•  Sing from the same hymn sheet
•  ...
People who can
attempt to query
MySQL

People who can
attempt to
query Cassandra

Conclusions

We like Cassandra
•  Solid design
•  HA characteristics
•  Easy multi-DC setup
•  Simplicity of operation

Lessons for successful adoption
•  Have an advocate, sell the dream
•  Learn the fundamentals, get the best out of Cassand...
The future
•  We will continue to invest in Cassandra as we expand globally
•  We will hire people with experience running...
Questions?


How Hailo fuels its growth using NoSQL storage and analytics - Dave Gardner (Hailo)

Presented at JAX London 2013

Hailo, the taxi app, has served more than 5 million passengers in 15 cities and has taken fares of $100 million this year. I'm going to talk about how that rapid growth has been powered by a platform based on Cassandra and operational analytics and insights powered by Acunu Analytics. I'll cover some challenges and lessons learned from scaling fast!

Published in: Technology

  1. 1. Hailo and NoSQL David Gardner, Architect at Hailo JAXLONDON2013
  3. 3. What this talk is about 1.  Why choose NoSQL 2.  A whistle-stop tour of Cassandra 3.  Adoption of Cassandra at Hailo JAXLONDON2013
  4. 4. What is Hailo? Hailo is The Taxi Magnet. Use Hailo to get a cab wherever you are, whenever you want. JAXLONDON2013
  8. 8. Facts and figures •  The world’s highest-rated taxi app – over 11,000 five-star reviews •  Over 500,000 registered passengers •  A Hailo hail is accepted around the world every 4 seconds •  Hailo operates in 15 cities on 3 continents from Tokyo to Toronto in nearly 2 years of operation JAXLONDON2013
  9. 9. Hailo is growing •  Hailo is a marketplace that facilitates over $100M in run-rate transactions and is making the world a better place for passengers and drivers •  Hailo has raised over $50M in financing from the world's best investors including Union Square Ventures, Accel, the founder of Skype (via Atomico), Wellington Partners (Spotify), Sir Richard Branson, and our CEO's mother, Janice JAXLONDON2013
  10. 10. Why choose NoSQL? JAXLONDON2013
  11. 11. “NoSQL DBs trade off traditional features to better support new and emerging use cases” Andy Gross, Riak http://www.slideshare.net/argv0/riak-use-cases-dissecting-the-solutions-to-hard-problems JAXLONDON2013
  12. 12. What are we trading off? •  More widely used, tested and documented software •  Ad-hoc querying •  Talent pool with direct experience JAXLONDON2013
  13. 13. What do we get back in return? •  High availability •  Scalability •  Operational simplicity JAXLONDON2013
  14. 14. Cassandra 101 JAXLONDON2013
  15. 15. Amazon Dynamo: consistent hashing, vector clocks *, gossip protocol, hinted handoff, read repair (http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) + Google Big Table: columnar, SSTable storage, append-only, memtable, compaction (http://labs.google.com/papers/bigtableosdi06.pdf) JAXLONDON2013
  16. 16. Tokens are integers from 0 to 2^127; three replicas (RF=3); coordinator node; client JAXLONDON2013
  17. 17. Consistency level (CL): how many replicas must respond to declare success? ONE = 1st response; QUORUM = N/2 + 1 replicas; LOCAL_QUORUM = N/2 + 1 replicas in local data centre; EACH_QUORUM = N/2 + 1 replicas in each data centre; ALL = all replicas JAXLONDON2013
  18. 18. Big Table •  Sparse column based data model •  SSTable disk storage •  Append-only commit log •  Memtable (buffer and sort) •  Immutable SSTable files •  Compaction http://research.google.com/archive/bigtable-osdi06.pdf http://www.slideshare.net/geminimobile/bigtable-4820829 JAXLONDON2013
  19. 19. Name Plus timestamp, used for Last Write Wins (LWW) conflict resolution Value Column JAXLONDON2013
  20. 20. we can have millions of columns Name Name Name Value Value Value Column Column Column JAXLONDON2013
  21. 21. Row Name Name Name Value Value Value Column Column Column Row Key JAXLONDON2013
  22. 22. Column Family Row Key Column Column Column Row Key Column Column Column Row Key Column Column Column we can have billions of rows JAXLONDON2013
  23. 23. buffers writes and sorts data Memtable Write flush on time or size trigger Memory Disk Commit Log SSTable SSTable immutable JAXLONDON2013
  24. 24. Cassandra at Hailo JAXLONDON2013
  25. 25. Hailo launched in London in November 2011 •  Launched on AWS •  Two PHP/MySQL web apps plus a Java backend •  Mostly built by a team of 3 or 4 backend engineers •  MySQL multi-master for single AZ resilience JAXLONDON2013
  26. 26. Why Cassandra? •  A desire for greater resilience – “become a utility” Cassandra is designed for high availability •  Plans for international expansion around a single consumer app Cassandra is good at global replication •  Expected growth Cassandra scales linearly for both reads and writes •  Prior experience I had experience with Cassandra and could recommend it JAXLONDON2013
  27. 27. The path to adoption •  Largely unilateral decision by developers – a result of a startup culture •  Replacement of key consumer app functionality, splitting up the PHP/MySQL web app into a mixture of global PHP/Java services backed by a Cassandra data store •  Launched into production in September 2012 – originally just powering North American expansion, before gradually switching over Dublin and London JAXLONDON2013
  28. 28. One year on... •  Further breakdown of functionality into Go/Java SOA •  Migrating all online databases to Cassandra JAXLONDON2013
  29. 29. Development perspective JAXLONDON2013
  30. 30. “Cassandra just works” Dom W, Senior Engineer JAXLONDON2013
  31. 31. Use cases 1.  Entity storage 2.  Time series data JAXLONDON2013
  32. 32. CF = customers 126007613634425612: createdTimestamp: 1370465412, email: dave@cruft.co, givenName: Dave, familyName: Gardner, locale: en_GB, phone: +447911111111 JAXLONDON2013
  33. 33. Considerations for entity storage •  Do not read the entire entity, update one property and then write back a mutation containing every column •  Only mutate columns that have been set •  This avoids read-before-write race conditions JAXLONDON2013
  35. 35. CF = stats_db 2013-06-01: 55374fa0-ce2b-11e2-8b8b-0800200c9a66: {“action”:”…, a48bd800-ce2b-11e2-8b8b-0800200c9a66: {“action”:”…, b0e15850-ce2b-11e2-8b8b-0800200c9a66: {“action”:”…, bfac6c80-ce2b-11e2-8b8b-0800200c9a66: {“action”:”… JAXLONDON2013
  36. 36. CF = stats_db LON123456: 13b247f0-ce2c-11e2-8b8b-0800200c9a66: {“action”:”…, 20f70a40-ce2c-11e2-8b8b-0800200c9a66: {“action”:”…, 2b44d3b0-ce2c-11e2-8b8b-0800200c9a66: {“action”:”…, 338a22f0-ce2c-11e2-8b8b-0800200c9a66: {“action”:”… JAXLONDON2013
  38. 38. Considerations for time series storage •  Choose row key carefully, since this partitions the records •  Think about how many records you want in a single row •  Denormalise on write into many indexes JAXLONDON2013
  39. 39. Analytics •  With Cassandra we lost the ability to carry out analytics, e.g. COUNT, SUM, AVG, GROUP BY •  We use Acunu Analytics to give us this ability in real time, for preplanned query templates •  It is backed by Cassandra and therefore highly available, resilient and globally distributed •  Integration is straightforward (HTTP POST) JAXLONDON2013
  40. 40. events NSQ Acunu C* JAXLONDON2013
  41. 41. AQL SELECT SUM(accepted), SUM(ignored), SUM(declined), SUM(withdrawn) FROM Allocations WHERE timestamp BETWEEN '1 week ago' AND 'now' AND driver='LON123456789' GROUP BY timestamp(day) JAXLONDON2013
  42. 42. Get a picture of driver supply SELECT COUNT DISTINCT(driverId) FROM driverLocs WHERE timestamp BETWEEN '1 day ago' AND 'now' GROUP BY timestamp(hour) SELECT COUNT FROM driverLocs WHERE timestamp BETWEEN '1 day ago' AND 'now' GROUP BY latitude(0.01), longitude(0.01) JAXLONDON2013
  44. 44. Operational perspective JAXLONDON2013
  45. 45. “Allows a team of 2 to achieve things they wouldn’t have considered before Cassandra existed” Chris H, Operations Engineer JAXLONDON2013
  47. 47. [Diagram] 6 machines per region; 3 regions (us-east-1, eu-west-1, ap-southeast-1); 3 clusters (stats cluster is a long story); Operational Cluster; Stats Cluster JAXLONDON2013
  48. 48. [Diagram: regions eu-west-1, us-east-1 and ap-southeast-1, with nodes labelled AZ1, AZ2 and AZ3 (six of each)] JAXLONDON2013
  49. 49. AWS VPCs with OpenVPN links; 3 AZs per region; m1.large machines; Provisioned IOPS EBS. Stats Cluster: ~1TB/node. Operational Cluster: ~200GB/node JAXLONDON2013
  50. 50. Backups •  SSTable snapshot •  Used to upload to S3, but this was taking >6 hours and consuming all our network bandwidth •  Now take EBS snapshot of the data volumes JAXLONDON2013
  51. 51. Encryption •  Requirement for NYC launch •  We use dm-crypt to encrypt the entire EBS volume •  Chose dm-crypt because it is uncomplicated •  Our tests show a 1% hit in disk performance, which concurs with what Amazon suggest JAXLONDON2013
  52. 52. DataStax OpsCenter is a quick win JAXLONDON2013
  53. 53. Multi DC •  Something that Cassandra makes trivial •  Would have been very difficult to accomplish active-active inter-DC replication with a team of 2 without Cassandra •  Rolling repair needed to make it safe (we use LOCAL_QUORUM) •  We schedule “narrow repairs” on different nodes in our cluster each night JAXLONDON2013
  54. 54. Compression •  Our stats cluster was running at ~1.5TB per node •  We didn’t want to add more nodes •  With compression, we are now back to ~600GB •  Easy to accomplish •  `nodetool upgradesstables` on a rolling schedule JAXLONDON2013
  55. 55. Management perspective JAXLONDON2013
  56. 56. “The days of the quick and dirty are over” Simon V, EVP Operations JAXLONDON2013
  57. 57. Technically, everything is fine… •  Our COO feels that C* is “technically good and beautiful”, a “perfectly good option” •  Our EVPO says that C* reminds him of a time series database in use at Goldman Sachs that had “very good performance” …but there are concerns JAXLONDON2013
  58. 58. People who can attempt to query MySQL People who can attempt to query Cassandra JAXLONDON2013
  60. 60. Lessons learned JAXLONDON2013
  61. 61. There might be a gulf in experience JAXLONDON2013
  62. 62. 10 Average years experience per team member MySQL Cassandra JAXLONDON2013
  63. 63. Lesson learned •  Have an advocate - get someone who will sell the vision internally •  Learn the theory - teach each team member the fundamentals •  Make an effort to get everyone on board JAXLONDON2013
  64. 64. Things can drift into failure JAXLONDON2013
  70. 70. Lesson learned •  Be proactive with Cassandra, even if it seems to be running smoothly •  Peer-review data models, take time to think about them •  Big rows are bad - use cfstats to look for them •  Mixed workloads can cause problems - use cfhistograms and look out for signs of data modeling problems •  Think about the compaction strategy for each CF JAXLONDON2013
  71. 71. EBS is terrible JAXLONDON2013
  72. 72. Lessons learned •  EBS is nearly always the cause of Amazon outages •  EBS is a single point of failure (it will fail everywhere in your cluster) •  EBS is slow •  EBS is expensive •  EBS is unnecessary! JAXLONDON2013
  73. 73. Management need to know the trade offs JAXLONDON2013
  74. 74. Lessons learned •  Keep the business informed – explain the tradeoffs in simple terms •  Sing from the same hymn sheet •  Make sure there are solutions in place for every use case from the beginning JAXLONDON2013
  75. 75. People who can attempt to query MySQL People who can attempt to query Cassandra JAXLONDON2013
  76. 76. Conclusions JAXLONDON2013
  77. 77. We like Cassandra •  Solid design •  HA characteristics •  Easy multi-DC setup •  Simplicity of operation JAXLONDON2013
  78. 78. Lessons for successful adoption •  Have an advocate, sell the dream •  Learn the fundamentals, get the best out of Cassandra •  Invest in tools to make life easier •  Keep management in the loop, explain the trade offs JAXLONDON2013
  79. 79. The future •  We will continue to invest in Cassandra as we expand globally •  We will hire people with experience running Cassandra •  We will focus on expanding our reporting facilities •  We aspire to extend our network (1M consumer installs, wallet) beyond cabs •  We will continue to hire the best engineers in London, NYC and Asia JAXLONDON2013
  80. 80. Questions? JAXLONDON2013