C* Summit EU 2013: No Whistling Required: Cabs, Cassandra, and Hailo DataStax Academy
Speaker: Dave Gardner, Architect at Hailo
Video: http://www.youtube.com/watch?v=6cUuE7sTdU0&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=16
Hailo has leveraged Cassandra to build one of the most successful startups in European history. This presentation looks at how Hailo grew from a simple MySQL-backed infrastructure to a resilient Cassandra-backed system running in three data centres globally. Topics covered include: the process of migration, experience running multi-DC on AWS, common data modeling patterns and security implications for achieving PCI compliance.
C* Summit 2013: No Whistling Required: Cabs, Cassandra, and Hailo by Dave Gar...DataStax Academy
Hailo has leveraged Cassandra to build one of the most successful startups in European history. This presentation looks at how Hailo grew from a simple MySQL-backed infrastructure to a resilient Cassandra-backed system running in three data centers globally. Topics covered include: the process of migration, experience running multi-DC on AWS, common data modeling patterns and security implications for achieving PCI compliance.
Dynamic Scaling at Pinterest. Large Scale Production Engineering meetup, Feb...ArenSand
The document discusses Pinterest's use of dynamic scaling to adjust the number of machines provisioned based on traffic patterns. It scales machines up and down throughout the day and night based on a scheduled scale factor. This allows using cheaper spot instances when possible. Dynamic scaling is automated using tools that configure, monitor, and adjust machines. While it helps handle traffic fluctuations, there are also risks like unknown machines, failures in auto-discovery services, and upstream impacts if over-provisioning.
Case Study: Troubleshooting Cassandra performance issues as a developerCarlos Alonso Pérez
This talk will be a step by step walkthrough of a developer troubleshooting a real performance issue we had at MyDrive, from the very first steps diagnosing the symptoms, through looking at metric charts down to CQL queries, the Ruby CQL driver, and Ruby code profiling.
These are the slides from the intensive Cassandra Workshop I held in Madrid as a Meetup: http://www.meetup.com/Madrid-Cassandra-Users/events/225944063/ They cover all the Cassandra core concepts, plus the basic data modelling ones needed to get up and running with Cassandra.
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyDataStax Academy
This document discusses strategies for avoiding read-modify-write operations in Cassandra databases. It presents several Cassandra features that allow updating data without explicit read-modify-writes, such as overwriting rows, using collections, and lightweight transactions. It also covers data modeling techniques like journaling, content-addressable storage, and modeling time-series data. The document concludes that Cassandra is well-suited for write-heavy workloads and provides tools to safely perform read-modify-writes when necessary.
Cassandra Day SV 2014: Beyond Read-Modify-Write with Apache CassandraDataStax Academy
This document discusses strategies for updating data in Apache Cassandra beyond using read-modify-write operations. It describes how eventual consistency allows safe updates without locking by propagating changes asynchronously. It also covers Cassandra features like collections, lightweight transactions, and content-addressable storage that provide flexible data models for modern web-scale applications while avoiding the need for read-modify-write in many cases.
C* Summit 2013: Hardware Agnostic - Cassandra on Raspberry Pi by Andy CobleyDataStax Academy
The Raspberry Pi is a credit-card sized $25 ARM-based Linux box designed to teach children the basics of programming. The machine comes with a 700MHz ARM CPU and 512MB of memory and boots off an SD card, not much power for running the likes of a Cassandra cluster. This presentation will discuss the problems of getting Cassandra up and running on the Pi and will answer the all-important question: Why on Earth would you want to do this!?
The document discusses the pros and cons of using public cloud computing services versus hosting infrastructure internally for a new startup. Some advantages mentioned include flexibility, avoiding large upfront capital expenditures, and the ability to scale resources up and down as needed. Disadvantages include public cloud services becoming more expensive than internal hosting at large scale, inefficient resource ratios for some workloads, and high costs for intensive disk and SSD usage. The document aims to provide considerations for a startup evaluating whether to use public cloud services.
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...DataStax
Instaclustr provides managed Apache Cassandra and DataStax Enterprise clusters in the cloud. They initially ran Cassandra on custom Ubuntu images but moved to CoreOS for its immutable and self-updating capabilities. Using Docker and CoreOS together allows Cassandra to run in immutable Docker containers while CoreOS handles OS-level updates. Integrating Cassandra containers with the CoreOS and systemd init system provides reliable automatic restarts and the ability to notify when Cassandra is ready using dbus inter-process communication. This architecture provides a robust solution for running and updating Cassandra in production clusters.
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag JambhekarDataStax Academy
We have seen rapid adoption of C* at eBay in the past two years. We have made tremendous efforts to integrate C* into existing database platforms, including Oracle, MySQL, Postgres, MongoDB, XMP etc. We have also scaled C* to meet business requirements and encountered technical challenges you only see at eBay scale: 100TB of data on hundreds of nodes. We will share our experience of deployment automation, management, monitoring and reporting for both Apache Cassandra and DataStax Enterprise.
Orchestrating Cassandra with Kubernetes: Challenges and OpportunitiesRaghavendra Prabhu
This is a talk about orchestration of Cassandra with the Cassandra operator, Kubernetes and Yelp PaaSTA (https://github.com/Yelp/paasta).
The talk was presented at Computer Laboratory, University of Cambridge as part of the Engineering, Science and Technology Event (https://www.careers.cam.ac.uk/recruiting/event2Tech.asp) in November 2019.
This document provides an overview and comparison of Cassandra and Redis. Cassandra is an open-source NoSQL database that is optimized for high throughput and availability. It is commonly used by companies like Netflix, Apple, and Facebook. Redis is an open-source in-memory key-value store written in C. It is optimized for low latency and is commonly used for caching, sessions, queues, and analytics. Both databases are battle tested and have strengths in different areas - Cassandra favors availability over consistency while Redis operates entirely in memory for faster performance but with a single thread. The document discusses various features, use cases, and best practices for operating each database.
- Micro-batching involves grouping statements into small batches to improve throughput and reduce network overhead when writing to Cassandra.
- A benchmark was conducted to compare individual statements, regular batches, and partition-aware batches when inserting 1 million rows into Cassandra.
- The results showed that partition-aware batches had shorter runtime and lower client and cluster CPU usage, and were more performant overall compared to individual statements and regular batches. However, batching may add latency, which makes it better suited for bulk data processing than for real-time workloads.
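The partition-aware batching above can be sketched in a few lines. This is an illustrative Python sketch, not the benchmark's code; the row shape, the key_fn parameter and the batch size cap are assumptions. The idea is to group statements by partition key so every batch targets a single partition, then cap the batch size to keep batches small.

```python
from collections import defaultdict

def partition_aware_batches(rows, key_fn, max_batch_size=20):
    """Group rows by partition key, then split each group into
    micro-batches so every batch targets exactly one partition."""
    by_partition = defaultdict(list)
    for row in rows:
        by_partition[key_fn(row)].append(row)
    batches = []
    for rows_for_key in by_partition.values():
        for i in range(0, len(rows_for_key), max_batch_size):
            batches.append(rows_for_key[i:i + max_batch_size])
    return batches

# Hypothetical rows keyed by sensor_id; each batch hits one partition.
rows = [{"sensor_id": i % 3, "value": i} for i in range(10)]
batches = partition_aware_batches(rows, key_fn=lambda r: r["sensor_id"],
                                  max_batch_size=2)
```

Each resulting batch could then be sent as one `BatchStatement`, avoiding the coordinator fan-out that makes multi-partition batches expensive.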
1) Ben Bromhead is the CTO of Instaclustr, which provides Cassandra-as-a-Service. When adding capacity to an existing Cassandra cluster, joining nodes normally bootstrap by streaming data from existing nodes, adding load.
2) "Bootstrap from backups" is proposed as a solution where joining nodes stream data directly from backups stored in object storage rather than existing cluster nodes, reducing load on the cluster.
3) This allows more reactive scaling with fewer side effects than typical predictive capacity planning approaches, and makes clusters more cost effective to run. The technique is currently in beta testing.
PowerPoint file (incl. animations!): http://db.tt/oQiXb9lq
These are the slides of the presentation "WordPress Optimization", presented at WordCamp 2013.
How to improve your WordPress performance and make your website more than 700% faster!
The document discusses Apache Cassandra, a distributed database management system designed to handle large amounts of data across many commodity servers. It was developed at Facebook and modeled after Google's Bigtable. The summary discusses key concepts like its use of consistent hashing to distribute data, support for tunable consistency levels, and focus on scalability and availability over traditional SQL features. It also provides an overview of how Cassandra differs from relational databases by not supporting joins, having an optional schema, and using a prematerialized and transaction-less model.
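The consistent hashing mentioned above can be illustrated with a toy ring. This Python sketch is a deliberate simplification (Cassandra itself uses the Murmur3 partitioner with virtual nodes; the node names, MD5 hash and vnode count here are purely illustrative): a key is owned by the first node token at or after the key's hash, wrapping around the ring.

```python
import bisect
import hashlib

def token(key: str) -> int:
    # Stable hash of a key; real Cassandra uses the Murmur3 partitioner.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring: a key belongs to the first node token
    at or after the key's hash, wrapping at the end of the ring."""

    def __init__(self, nodes, vnodes=8):
        # vnodes spread each node over several tokens for smoother balance
        self.ring = sorted((token(f"{n}#{v}"), n)
                           for n in nodes for v in range(vnodes))
        self.tokens = [t for t, _ in self.ring]

    def node_for(self, key: str) -> str:
        i = bisect.bisect(self.tokens, token(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("ride:12345")  # deterministic: same key, same node
```

Because only the tokens adjacent to a joining or leaving node change owners, adding capacity moves a small fraction of the data rather than rehashing everything.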
Ben Bromhead is the co-founder and CTO of Instaclustr, which provides Cassandra-as-a-Service. Instaclustr manages 50+ Cassandra nodes for customers. Early on, Instaclustr encountered issues like a Cassandra bug causing assertion errors for large column names and had to perform an emergency migration for a customer whose self-managed cluster was down for 48 hours. Migrations and real-world usage revealed new challenges compared to initial perfect test scenarios.
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...DataStax
Successfully running Apache Cassandra in production often means knowing what configuration settings to change and which ones to leave as default. Over the years the cassandra.yaml file has grown to provide a number of settings that can improve stability and performance. While the file contains plenty of helpful comments, there is more to be said about the settings and when to change them.
In this talk Edward Capriolo, Consultant at The Last Pickle, will break down the parameters in the configuration files: those that are essential to getting started, those that impact performance, those that improve availability, the exotic ones, and the ones that should not be played with. This talk is ideal for anyone from someone setting up Cassandra for the first time to people with deployments in production wondering what the more exotic configuration options do.
About the Speaker
Edward Capriolo Consultant, The Last Pickle
Long time Apache Cassandra user, big data enthusiast.
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Johnny Miller
DataStax is a company that drives development of the Apache Cassandra database. It has over 400 customers including 24 Fortune 100 companies. DataStax Enterprise provides a highly available, scalable and secure database platform using Cassandra for mission critical applications. It supports analytics, search and multi-datacenter deployments across hybrid cloud environments.
C* Summit 2013: Practice Makes Perfect: Extreme Cassandra Optimization by Alb...DataStax Academy
Ooyala has been using Apache Cassandra since version 0.4. Our data ingest volume has exploded since 0.4 and Cassandra has scaled along with us. Al will cover many topics from an operational perspective on how to manage, tune, and scale Cassandra in a production environment.
Co-Founder and CTO of Instaclustr, Ben Bromhead's presentation at the Cassandra Summit 2016, in San Jose.
This presentation will show how to create truly elastic Cassandra deployments on AWS, allowing you to scale and shrink your large Cassandra deployments multiple times a day. Leveraging a combination of EBS-backed disks, JBOD, token pinning and our previous work on bootstrapping from backups, you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
Unique ID generation in distributed systemsDave Gardner
The document discusses different strategies for generating unique IDs in a distributed system. It covers using auto-incrementing numeric IDs in MySQL, which are not resilient, and various solutions like UUIDs, Twitter Snowflake IDs, and Flickr ticket servers that generate IDs in a distributed and ordered way without coordination between data centers. It also provides code examples of generating Twitter Snowflake-like IDs in PHP without coordination using ZeroMQ.
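The Snowflake scheme mentioned above can be sketched briefly. This is an illustrative Python sketch rather than the talk's PHP/ZeroMQ code; the bit layout and epoch constant follow Twitter's published design, but all names here are assumptions. Each ID packs a millisecond timestamp, a worker id and a per-millisecond sequence, so workers mint roughly time-ordered IDs with no coordination.

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's published custom epoch; illustrative

class Snowflake:
    """Snowflake-style 64-bit IDs: 41 bits of milliseconds since a custom
    epoch, 10 bits of worker id, 12 bits of per-millisecond sequence."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:        # clock went backwards: don't regress
                now = self.last_ms
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:    # 4096 ids this millisecond: wait
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence

gen = Snowflake(worker_id=1)
a, b = gen.next_id(), gen.next_id()  # b > a: ids are time-ordered
```

Unlike MySQL auto-increment, this needs no single point of failure, and unlike random UUIDs the IDs sort roughly by creation time.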
The document discusses planning for failure when building software systems. It notes that as software projects grow larger with more engineers, complexity and the potential for failures increases. The author discusses how the taxi app Hailo has grown significantly and now uses a service-oriented architecture across multiple data centers to improve reliability. Key technologies discussed include Zookeeper, Elasticsearch, NSQ, and Cruftflake which provide distributed and resilient capabilities. The importance of testing failures through simulation is emphasized to improve reliability.
Cassandra, Modeling and Availability at AMUGMatthew Dennis
A brief high-level comparison of modeling between relational databases and Cassandra, followed by a brief description of how Cassandra achieves global availability.
Slides from my Planning to Fail talk given at PHP North East conference 2013. This is a slightly longer version of the same talk given at the PHP UK conference. The talk was on how you can build resilient systems by embracing failure.
The document discusses data modeling goals and examples for Cassandra. It provides guidance on keeping related data together on disk, avoiding normalization, and modeling time series data. Examples covered include mapping time series data points to Cassandra rows and columns, querying time slices, bucketing data, and eventually consistent transaction logging to provide atomicity. The document aims to help with common Cassandra modeling questions and patterns.
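The bucketing pattern mentioned above can be sketched as a partition-key helper. A minimal Python illustration, assuming a hypothetical sensor_id and daily/hourly buckets (the names and formats are assumptions, not from the talk): the partition key combines the entity id with a coarse time bucket, so a partition holds one bucket's worth of points and never grows unbounded.

```python
from datetime import datetime, timezone

def bucket_key(sensor_id: str, ts: datetime, bucket: str = "day") -> str:
    """Compose partition key = entity id + coarse time bucket, a common
    Cassandra time-series pattern for bounding partition size."""
    fmt = {"day": "%Y%m%d", "hour": "%Y%m%d%H"}[bucket]
    return f"{sensor_id}:{ts.strftime(fmt)}"

# Two readings on the same day land in the same partition ...
k1 = bucket_key("sensor-42", datetime(2013, 6, 11, 9, 30, tzinfo=timezone.utc))
k2 = bucket_key("sensor-42", datetime(2013, 6, 11, 17, 0, tzinfo=timezone.utc))
# ... so a "one day of data" query is a single-partition slice.
```

Querying a longer time range then means reading one partition per bucket, which keeps related data together on disk as the document recommends.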
Talk from CassandraSF 2012 showing the importance of real durability. Examples of use for row level isolation in Cassandra and the implementation of a transaction log pattern. The example used is a banking system on top of Cassandra with support crediting/debiting an account, viewing an account balance and transferring money between accounts.
- In Cassandra, data is modeled differently than in relational databases, with an emphasis on denormalizing data and organizing it to support common queries with minimal disk seeks
- Cassandra uses keyspaces, column families, rows, columns and timestamps to organize data, with columns ordered to enable efficient querying of ranges
- To effectively model data in Cassandra, you should think about common queries and design schemas to co-locate frequently accessed data on disk to minimize I/O during queries
This document summarizes several Cassandra anti-patterns including:
- Using a non-Oracle JVM which is not recommended.
- Putting the commit log and data directories on the same disk which can impact performance.
- Using EBS volumes on EC2 which can have unpredictable performance and throughput issues.
- Configuring overly large JVM heaps over 16GB which can cause garbage collection issues.
- Performing large batch mutations in a single operation which risks timeouts if not broken into smaller batches.
A high level overview of common Cassandra use cases, adoption reasons, BigData trends, DataStax Enterprise and the future of BigData given at the 7th Advanced Computing Conference in Seoul, South Korea
The document summarizes a workshop on Cassandra data modeling. It discusses four use cases: (1) modeling clickstream data by storing sessions and clicks in separate column families, (2) modeling a rolling time window of data points by storing each point in a column with a TTL, (3) modeling rolling counters by storing counts in columns indexed by time bucket, and (4) using transaction logs to achieve eventual consistency when modeling many-to-many relationships by serializing transactions and deleting logs after commit. The document provides recommendations and alternatives for each use case.
strangeloop 2012 apache cassandra anti patternsMatthew Dennis
A random list of Apache Cassandra anti-patterns. There is a lot of info on what to use Cassandra for and how, but not a lot of information on what not to do. This presentation works towards filling that gap.
Cassandra's Sweet Spot - an introduction to Apache CassandraDave Gardner
Slides from my NoSQL Exchange 2011 talk introducing Apache Cassandra. This talk explained the fundamental concepts of Cassandra and then demonstrated how to build a simple ad-targeting application using PHP, with a focus on data modeling.
Video of talk: http://skillsmatter.com/podcast/home/cassandra/js-2880
Cassandra concepts, patterns and anti-patternsDave Gardner
The document discusses Cassandra concepts, patterns, and anti-patterns. It begins with an agenda that covers choosing NoSQL, Cassandra concepts based on Dynamo and Bigtable, and patterns and anti-patterns of use. It then delves into Cassandra concepts such as consistent hashing, vector clocks, gossip protocol, hinted handoff, read repair, and consistency levels. It also discusses Bigtable concepts like sparse column-based data model, SSTables, commit log, and memtables. Finally, it outlines several patterns and anti-patterns of Cassandra use.
Cassandra's data model is more flexible than typically assumed.
Cassandra allows tuning of consistency levels to balance availability and consistency. It can provide strong consistency when the read and write replica counts overlap, i.e. when certain replication conditions are met.
Cassandra uses a row-oriented model where rows, uniquely identified by keys, group columns and super columns. Super column families allow grouping columns under a common name and are often used for denormalizing data.
Cassandra's data model is query-based rather than domain-based. It focuses on answering questions through flexible querying rather than storing predefined objects. Design patterns like materialized views and composite keys can help support different types of queries.
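The replication condition behind tunable consistency is usually stated as R + W > N: if the number of replicas consulted on read plus the number acknowledged on write exceeds the replication factor, every read overlaps at least one up-to-date replica. A minimal sketch of that arithmetic:

```python
def is_strongly_consistent(replication_factor: int,
                           read_cl: int, write_cl: int) -> bool:
    """R + W > N: every read quorum overlaps every write quorum on at
    least one replica, so reads see the latest acknowledged write."""
    return read_cl + write_cl > replication_factor

quorum = 3 // 2 + 1  # QUORUM at RF=3 touches 2 replicas
assert is_strongly_consistent(3, quorum, quorum)  # QUORUM + QUORUM: strong
assert not is_strongly_consistent(3, 1, 1)        # ONE + ONE: eventual only
```

This is why QUORUM reads paired with QUORUM writes are the common "consistent" setting at RF=3, while ONE/ONE trades that guarantee for availability and latency.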
This document outlines Netflix's culture of freedom and responsibility. Some key points:
- Netflix focuses on attracting and retaining "stunning colleagues" through a high-performance culture rather than perks. Managers use a "Keeper Test" to determine which employees they would fight to keep.
- The culture emphasizes values over rules. Netflix aims to minimize complexity as it grows by increasing talent density rather than imposing processes. This allows the company to maintain flexibility.
- Employees are given significant responsibility and freedom in their roles, such as having no vacation tracking or expense policies beyond acting in the company's best interests. The goal is to avoid chaos through self-discipline rather than controls.
- Providing
C* Summit EU 2013: Mixing Batch and Real-Time: Cassandra with Shark DataStax Academy
Speaker: Richard Low, Analytics Tech Lead at SwiftKey
Video: http://www.youtube.com/watch?v=QTb4HTwVMq0&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=2
Everything Cassandra does is designed for a real-time workload of high volume inserts and frequent small queries. Cassandra has Hadoop and Hive integration, but performing long running ad-hoc queries with these tools is difficult without impacting real-time performance or requires duplicate clusters. This talk will explain how I'm integrating Cassandra with Shark, a drop-in Hive replacement developed by Berkeley's AmpLab. It's designed to give fine grained control over all resource usage so you can safely run arbitrary ad-hoc queries on your existing cluster with controlled and predictable impact.
Mixing Batch and Real-time: Cassandra with Shark (Cassandra Europe 2013)Richard Low
The document discusses running batch analytics queries on Cassandra databases by using Spark and Shark to directly access the SSTables. Current solutions like running Hive on Cassandra have performance issues. The author's solution uses Spark workers running on Cassandra nodes to read SSTables directly, avoiding the filesystem cache and CQL interface. Performance tests show this approach is 2.5x faster than using the CQL interface and has lower and more predictable query latency, even under write load. The author calls for further development and contributions to the technique.
C* Summit EU 2013: Keynote by Jonathan Ellis — Cassandra 2.0 & 2.1DataStax Academy
Speaker: Jonathan Ellis, Apache Cassandra Chair & CTO/Co-Founder at DataStax
Keynote presentation on Apache Cassandra 2.0 & 2.1 at Cassandra Summit EU 2013
The document discusses Cassandra 2.1, including:
- New features like user defined types, collection indexing, and more efficient HyperLogLog filters and repair processes.
- Past and ongoing improvements to Cassandra's performance, scalability, reliability and ease of use over its 5 year history and multiple releases.
- Details on Cassandra's architecture like its read path, compaction strategies, and use of on- and off-heap memory.
ContextSpace is working to develop and support an open source implementation of the Camunda core engine that persists all of its data to Cassandra. This development addresses issues of ACID semantics as well as approaches to lock management. ContextSpace plans to integrate this implementation with its own product offering in order to expose data and events generated from its identity, security, roles, messaging and contextual user activities to be managed by Camunda-driven business processes.
C* Summit EU 2013: Hardware Agnostic: Cassandra on Raspberry Pi DataStax Academy
Speaker: Andy Cobley, Lecturer at University of Dundee
Video: http://www.youtube.com/watch?v=0U4iOSMnRdk&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=1
Abstract: The Raspberry Pi is a credit-card-sized $25 ARM-based Linux box designed to teach children the basics of programming. The machine comes with a 700MHz ARM and 512MB of memory and boots off an SD card; not much power for running the likes of a Cassandra cluster. This presentation will discuss the problems of getting Cassandra up and running on the Pi and will answer the all-important question: Why on Earth would you want to do this!?
What is Apache Cassandra? | Apache Cassandra Tutorial | Apache Cassandra Intr...Edureka!
** Apache Cassandra Certification Training: https://www.edureka.co/cassandra **
This Edureka tutorial on "What is Apache Cassandra" will give you a detailed introduction to the NoSQL database Apache Cassandra and its various features. Learn why Cassandra is preferred over other databases. You will also learn about the various elements of the Cassandra database with an interactive industry-based use case.
Speaker: Aaron Morton, Apache Cassandra Committer & Co-Founder/Principal Consultant at The Last Pickle Inc.
Video: http://www.youtube.com/watch?v=efI5fL8eEfo&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=23
From the microsecond your request hits an Apache Cassandra node there are many code paths, threads and machines involved in storing or fetching your data. This talk will step through the common operations and highlight the code responsible. Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault tolerant database. Cluster wide operations track node membership, direct requests and implement consistency guarantees. At the node level, the Log Structured storage engine provides high performance reads and writes. All of this is implemented in a Java code base that has greatly matured over the past few years. This talk will step through read and write requests, automatic processes and manual maintenance tasks. I'll discuss the general approach to solving the problem and drill down to the code responsible for implementation. Existing Cassandra users, those wanting to contribute to the project and people interested in Dynamo based systems will all benefit from this tour of the code base.
C* Summit EU 2013: Effective Cassandra Development with AchillesDataStax Academy
This document discusses Achilles, an open source persistence manager for Cassandra that provides features like entity mapping, common CRUD operations, query DSL, and integration with Spring. It highlights that Achilles was created by developers for developers and aims to support all CQL3 features and upcoming Cassandra features. The presentation encourages effective Cassandra development using Achilles and provides an overview of its capabilities and roadmap.
Effective cassandra development with achillesDuyhai Doan
This document discusses Achilles, an open source persistence manager for Cassandra that provides features like entity mapping, common CRUD operations, query DSL, and integration with Spring. It highlights that Achilles was created by developers for developers to make Cassandra development more effective. The roadmap includes future support for secondary indexes, bean validation, DAO templates, and new Cassandra 2.0 features.
** Apache Cassandra Certification Training: https://www.edureka.co/cassandra **
In this PPT, you will get a detailed introduction to the NoSQL and Apache Cassandra questions and answers required to crack any interview. Brush up your knowledge of Cassandra, its various database elements and how to configure the database.
Pollfish is a survey platform which provides access to millions of targeted users. Pollfish allows easy distribution and targeting of surveys through existing mobile apps (https://www.pollfish.com/). At Pollfish we use Cassandra for different use cases, e.g. as an application data store to maximize write throughput when appropriate, and for our analytics project to find insights in application-generated data. As a medium to accomplish our success so far, we use DataStax's DSE 4.6 environment, which integrates Apache Cassandra, Spark and a Hadoop-compatible file system (CFS). We will discuss how we started, how the journey went and the impressions gained so far, along with some tips learned the hard way. This is the result of joint work by an excellent team here at Pollfish.
Cassandra Summit 2014: Apache Cassandra Best Practices at EbayDataStax Academy
Presenter: Feng Qu, Principal DBA at eBay
Cassandra has been adopted widely at eBay in recent years and used by many end-user facing applications. I will introduce best practices we have built over the time around system design, capacity planning, deployment automation, monitoring integration, performance analysis and troubleshooting. I will also share our experience working with DataStax support to provide a highly available, highly scalable data store fitting into eBay infrastructure.
We run multiple DataStax Enterprise clusters in Azure each holding 300 TB+ data to deeply understand Office 365 users. In this talk, we will deep dive into some of the key challenges and takeaways faced in running these clusters reliably over a year. To name a few: process crashes, ephemeral SSDs contributing to data loss, slow streaming between nodes, mutation drops, compaction strategy choices, schema updates when nodes are down and backup/restore. We will briefly talk about our contributions back to Cassandra, and our path forward using network attached disks offered via Azure premium storage.
About the Speaker
Anubhav Kale Sr. Software Engineer, Microsoft
Anubhav is a senior software engineer at Microsoft. His team is responsible for building a big data platform using Cassandra, Spark and Azure to generate per-user insights for Office 365 users.
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsAnant Corporation
In this lunch, Johnny will show us how easy it is to start monitoring your Cassandra cluster in minutes. He will explain the various aspects and features of Cassandra that need to be monitored, how to do it, and most importantly why! Approaches for backups and Cassandra repairs will be discussed and explored in detail.
Learn how AxonOps significantly reduces the complexity and overhead when looking after Cassandra and ensures your Cassandra cluster is reliable and resilient.
Experienced developer, DevOps, architect, and AxonOps co-founder, Johnny Miller, has worked with a wide variety of companies – from small start-ups to large enterprises. He has been working with Cassandra for many years and has a deep understanding of the challenges facing modern companies looking to adopt Apache Cassandra.
Cassandra is a distributed database designed to handle large amounts of structured data across commodity servers. It provides linear scalability, fault tolerance, and high availability. Cassandra's architecture is masterless with all nodes equal, allowing it to scale out easily. Data is replicated across multiple nodes according to the replication strategy and factor for redundancy. Cassandra supports flexible and dynamic data modeling and tunable consistency levels. It is commonly used for applications requiring high throughput and availability, such as social media, IoT, and retail.
C* Summit 2013: Time-Series Metrics with Cassandra by Mike HeffnerDataStax Academy
This document discusses using Cassandra to store time-series metrics data. It describes how the schema was matched to storage by using a measurement column family with rows organized by metric ID and time. It also covers optimizing data expiration through techniques like TTL expiration, synchronized compactions, and leveraging immutable sstable modification times. Effective monitoring is emphasized as well, including dashboards to track the ring and using Cassandra log volumes to identify issues.
Horizontal scaling in the cloud is the way to adapt resources to the load on systems. The cloud allows users to scale virtually indefinitely, or at least enough for their needs.
This way the number of servers follows the trend of requests, and the TCO (Total Cost of Ownership) of IT infrastructure can be reduced. Companies can also avoid dealing with capacity planning and pre-provisioning issues.
This talk will show how to use Python and Rackspace/OpenStack API and SDK to implement an event-based scaling solution (software released under the open-source Apache License: stay tuned).
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...Amazon Web Services
Startups around the world use AWS services to access the power of the cloud to grow faster and more cost effectively. In this session, Smartsheet talks about how they were able to cost-effectively build their prototype for scale and avoid replatforming at different points in the adoption curve, and Quantcast discusses how they are running a high-performance analytics solution on AWS. They provide several tips and tricks for S3, and show how they removed a traditional MySQL data store from a distributed-image hosting application so that the only required data store is S3. They also show how to avoid common, cumbersome database practices by working with the eventually consistent nature of S3 objects and the fact that objects and directories share the same namespace.
Similar to Cabs, Cassandra, and Hailo (at Cassandra EU) (20)
Intro slides from Cassandra London July 2011Dave Gardner
The document compares Cassandra and MongoDB, two NoSQL databases. It provides information on their data models, conflict resolution approaches, distribution methods, and differences. A commenter responds that Cassandra and MongoDB have almost nothing in common and that claims of Cassandra dying off are incorrect.
This document provides a summary of various resources about Apache Cassandra, including blog posts on migrating Netflix to Cassandra, indexing in Cassandra, and Cassandra at Twitter. It also lists a book on Cassandra and highlights the key components of the Acunu data platform, which includes Cassandra, management tools, and an easily installed package.
An introduction to DataStax's Brisk (a distribution of Cassandra, Hadoop and Hive). Includes a back story of my own experience with Cassandra plus a demo of Brisk built around a very simple ad-network-type application.
Introduction to Cassandra at London Web MeetupDave Gardner
A 15 minute introduction to the Cassandra distributed data store from the February 2011 London Web meetup.
This covers the basics of who is using it, why you might want to use it (due to the large amount of data being collected by Web Apps today) and, most importantly, _what_ it is!
What are the challenges of running Apache Cassandra on Amazon EC2? Is it a good idea?
In this presentation, we explore reasons for and against running the distributed database Cassandra on EC2. We look at the I/O performance of EC2 and
4. 0.6 to 1.2
• 1,352 changed files with 235,413 additions and 47,487 deletions
• 7,429 commits
• 1,653 tickets completed
https://github.com/apache/cassandra/compare/cassandra-0.6.0...cassandra-1.2
https://github.com/apache/cassandra/blob/trunk/CHANGES.txt
#CASSANDRAEU
CASSANDRASUMMITEU
5. What this talk is about
Cassandra adoption at Hailo from three perspectives:
1. Development
2. Operational
3. Management
6. What is Hailo?
Hailo is The Taxi Magnet. Use Hailo to get a cab wherever you are, whenever you want.
10. What is Hailo?
• The world’s highest-rated taxi app – over 11,000 five-star reviews
• Over 500,000 registered passengers
• A Hailo hail is accepted around the world every 4 seconds
• Hailo operates in 15 cities on 3 continents from Tokyo to Toronto in
nearly 2 years of operation
11. Hailo is growing
• Hailo is a marketplace that facilitates over $100M in run-rate
transactions and is making the world a better place for passengers
and drivers
• Hailo has raised over $50M in financing from the world's best
investors including Union Square Ventures, Accel, the founder of
Skype (via Atomico), Wellington Partners (Spotify), Sir Richard
Branson, and our CEO's mother, Janice
12. The history
The story behind Cassandra adoption at Hailo
13. Hailo launched in London in November 2011
• Launched on AWS
• Two PHP/MySQL web apps plus a Java backend
• Mostly built by a team of 3 or 4 backend engineers
• MySQL multi-master for single AZ resilience
14. Why Cassandra?
• A desire for greater resilience – “become a utility”
Cassandra is designed for high availability
• Plans for international expansion around a single consumer app
Cassandra is good at global replication
• Expected growth
Cassandra scales linearly for both reads and writes
• Prior experience
I had experience with Cassandra and could recommend it
15. The path to adoption
• Largely unilateral decision by developers – a result of a startup
culture
• Replacement of key consumer app functionality, splitting up the
PHP/MySQL web app into a mixture of global PHP/Java services
backed by a Cassandra data store
• Launched into production in September 2012 – originally just
powering North American expansion, before gradually switching
over Dublin and London
16. One year on...
• Further breakdown of functionality into Go/Java SOA
• Migrating all online databases to Cassandra
21. Considerations for entity storage
• Do not read the entire entity, update one property and then write
back a mutation containing every column
• Only mutate columns that have been set
• This avoids read-before-write race conditions
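The guidance above — mutate only the columns that were actually set, rather than read-modify-write the whole entity — can be sketched as a small statement builder. This is a hypothetical helper for illustration, not Hailo's actual code:

```python
def build_update(table: str, entity_id: str, changed: dict):
    """Build a CQL UPDATE touching only the columns that were set,
    avoiding the read-before-write race the slide warns about."""
    assignments = ", ".join(f"{col} = %s" for col in changed)
    cql = f"UPDATE {table} SET {assignments} WHERE id = %s"
    return cql, list(changed.values()) + [entity_id]

# Only the phone number changed, so only that column is written:
cql, params = build_update("customers", "c123", {"phone": "555-0199"})
```

Because Cassandra writes are last-write-wins per column, two concurrent updates to different columns of the same entity merge cleanly instead of one clobbering the other.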
26. Considerations for time series storage
• Choose row key carefully, since this partitions the records
• Think about how many records you want in a single row
• Denormalise on write into many indexes
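One common way to act on the row-key advice above is time bucketing: partition each series by a fixed time window so no single row grows unbounded. A minimal sketch (the bucket width and key format are assumed examples, not Hailo's schema):

```python
from datetime import datetime, timezone

def row_key(series: str, ts: datetime, bucket_hours: int = 24) -> str:
    """Partition a time series into fixed-width time buckets so each
    row holds a bounded number of columns."""
    epoch_hours = int(ts.timestamp()) // 3600
    bucket = epoch_hours - epoch_hours % bucket_hours
    return f"{series}:{bucket}"

# All points from the same UTC day land in the same row:
k = row_key("driver-locations", datetime(2013, 10, 17, 15, 30, tzinfo=timezone.utc))
```

Choosing the bucket width is the "how many records per row" decision from the slide: narrower buckets mean more, smaller rows and better distribution across the ring.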
28. Analytics
• With Cassandra we lost the ability to carry out analytics, e.g. COUNT, SUM, AVG, GROUP BY
• We use Acunu Analytics to give us this ability in real time, for pre-planned query templates
• It is backed by Cassandra and therefore highly available, resilient and globally distributed
• Integration is straightforward
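The trick behind real-time analytics on pre-planned query templates is to aggregate on write, so answering SUM/COUNT/AVG is a single read instead of a scan. The idea in miniature (an in-memory stand-in for illustration, not the Acunu API):

```python
from collections import defaultdict

class RollingAggregates:
    """Pre-aggregate on write so pre-planned SUM/COUNT/AVG queries
    are O(1) reads instead of full scans."""
    def __init__(self):
        self._cells = defaultdict(lambda: [0, 0.0])  # (count, sum) per key

    def record(self, template_key: str, value: float) -> None:
        cell = self._cells[template_key]
        cell[0] += 1
        cell[1] += value

    def avg(self, template_key: str) -> float:
        count, total = self._cells[template_key]
        return total / count if count else 0.0

agg = RollingAggregates()
agg.record("fares:london:2013-10-17", 12.0)
agg.record("fares:london:2013-10-17", 8.0)
# agg.avg("fares:london:2013-10-17") == 10.0
```

The key (here a hypothetical "metric:city:day" string) is the query template decided up front, which is exactly why ad-hoc GROUP BY remains out of reach.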
33. “Allows a team of 2 to achieve things they
wouldn’t have considered before Cassandra
existed”
Chris H, Operations Engineer
37. Stats
• AWS VPCs with OpenVPN links
• 3 AZs per region
• m1.large machines
• Provisioned IOPS EBS
• Stats cluster: ~ 1TB/node
• Operational cluster: ~ 200GB/node
38. Backups
• SSTable snapshot
• We used to upload to S3, but this was taking >6 hours and consuming all our network bandwidth
• Now we take EBS snapshots of the data volumes
39. Encryption
• Requirement for NYC launch
• We use dmcrypt to encrypt the entire EBS volume
• Chose dmcrypt because it is uncomplicated
• Our tests show a ~1% hit in disk performance, which concurs with what Amazon suggests
41. Multi DC
• Something that Cassandra makes trivial
• Would have been very difficult to accomplish active-active inter-DC
replication with a team of 2 without Cassandra
• Rolling repair needed to make it safe (we use LOCAL_QUORUM)
• We schedule “narrow repairs” on different nodes in our cluster
each night
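The nightly "narrow repair" rotation above can be sketched as a tiny scheduler that picks one node per night on which to run `nodetool repair -pr`. The node names and rotation rule here are illustrative assumptions, not Hailo's tooling:

```python
import datetime

NODES = ["cass-1", "cass-2", "cass-3"]  # illustrative node names

def node_to_repair(today: datetime.date, nodes=NODES) -> str:
    """Rotate a primary-range repair across the ring, one node per
    night, so every node is repaired regularly without the whole
    cluster paying the repair cost at once."""
    return nodes[today.toordinal() % len(nodes)]
```

Repairing the primary range of a different node each night spreads the load, while LOCAL_QUORUM keeps latency local to each DC; repair is what reconciles the DCs behind the scenes.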
42. Compression
• Our stats cluster was running at ~1.5TB per node
• We didn’t want to add more nodes
• With compression, we are now back to ~600GB
• Easy to accomplish
• `nodetool upgradesstables` on a rolling schedule
44. “The days of the quick and dirty are over”
Simon V, EVP Operations
45. Technically, everything is fine…
• Our COO feels that C* is “technically good and beautiful”, a
“perfectly good option”
• Our EVPO says that C* reminds him of a time series database in
use at Goldman Sachs that had “very good performance”
…but there are concerns
46. [Diagram: the number of people who can attempt to query MySQL vs. the number who can attempt to query Cassandra]
51. Lesson learned
• Have an advocate - get someone who will sell the vision internally
• Learn the theory - teach each team member the fundamentals
• Make an effort to get everyone on board
58. Lesson learned
• Be pro-active with Cassandra, even if it seems to be running
smoothly
• Peer-review data models, take time to think about them
• Big rows are bad - use cfstats to look for them
• Mixed workloads can cause problems - use cfhistograms and look
out for signs of data modeling problems
• Think about the compaction strategy for each CF
60. Lessons learned
• EBS is nearly always the cause of Amazon outages
• EBS is a single point of failure (it will fail everywhere in your
cluster)
• EBS is slow
• EBS is expensive
• EBS is unnecessary!
62. Lessons learned
• Keep the business informed – explain the tradeoffs in simple terms
• Sing from the same hymn sheet
• Make sure there are solutions in place for every use case from the beginning
63. People who can attempt to query MySQL
People who can attempt to query Cassandra
65. We like Cassandra
• Solid design
• HA characteristics
• Easy multi-DC setup
• Simplicity of operation
66. Lessons for successful adoption
• Have an advocate, sell the dream
• Learn the fundamentals, get the best out of Cassandra
• Invest in tools to make life easier
• Keep management in the loop, explain the trade-offs
67. The future
• We will continue to invest in Cassandra as we expand globally
• We will hire people with experience running Cassandra
• We will focus on expanding our reporting facilities
• We aspire to extend our network (1M consumer installs, wallet) beyond cabs
• We will continue to hire the best engineers in London, NYC and Asia
I started using Cassandra in 2010, back in version 0.6. Back then it was quite hard work.
I founded the London meetup group in 2010 and have been flying the C* flag over London ever since. My motivation was to connect with others who were using Cassandra. Back then “swapping war stories” was a common theme. Cassandra was not easy to use.
Fast forward to 2013. 7,429 commits later. Cassandra “just works”. Kudos to the team of committers and contributors who have made this happen.
4:30
Whilst “it just works” is quite compelling, there are still challenges to successful adoption of C* in an organisation. I am going to talk about our experiences at Hailo, from three perspectives: dev, ops and management.
On iOS and Android, live in London, New York, Chicago, Toronto, Boston, Dublin, Madrid
Founded by 3 taxi drivers and 3 seasoned entrepreneurs.
Built by a small team, in one room, on a boat on the Thames, but with global ambitions. Cloud native from day 1 – run solely on AWS.
My recommendation was based on the solid design principles behind C*, something I’ve talked about in the past.
13:00
Row key = entity ID, in this instance a 64-bit integer à la Snowflake.
Column name = property name.
Value = property value.
A key point when using this pattern is to only mutate columns that you change.
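The “only mutate changed columns” point can be sketched as a diff step before the write. A minimal illustration (property names are invented; deletes of removed properties are ignored for brevity):

```python
def changed_columns(old, new):
    """Return only the properties whose values actually changed.

    With one row per entity (row key = 64-bit Snowflake-style ID) and one
    column per property, writing just this delta means concurrent updates
    to *other* properties of the same entity are never clobbered.
    """
    return {k: v for k, v in new.items() if old.get(k) != v}

# Usage sketch with invented property names:
before = {"status": "ACTIVE", "driver": "123", "city": "LON"}
after = {"status": "COMPLETED", "driver": "123", "city": "LON"}
delta = changed_columns(before, after)  # only the status column is written
```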
Read heavy, demand-driven. Writes consistent.
Time series for storing records of all actions in Hailo. In this instance bucketed by a daily row key, for all messages. The column name is a type 1 UUID.
We also denormalise for other indexes, eg: here we store every message sent to a given address under a single row.
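The daily-bucketed time series described above can be sketched like this (the key format and prefix are illustrative; the important parts from the slides are the one-row-per-day bucket and the type 1, i.e. time-based, UUID column name):

```python
import uuid
from datetime import datetime

def daily_bucket_key(prefix, when):
    """Row key bucketing a time series by day, e.g. 'msg:20131017'."""
    return "%s:%s" % (prefix, when.strftime("%Y%m%d"))

def new_column_name():
    # Type 1 UUIDs embed a timestamp; under Cassandra's TimeUUID
    # comparator they sort chronologically within the row.
    return uuid.uuid1()
```

Bounding each row at one day keeps rows from growing without limit, which matters given the big-row problems discussed later.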
Stats service – insert rate at 5k/sec. Responsible for storing business events from all areas of our system.
Row key = entity ID, in this instance a 64-bit integer à la Snowflake.
Column name = property name.
Value = property value.
A key point when using this pattern is to only mutate columns that you change.
We are not using CQL.
We can execute AQL
Some screenshot
27:00
London, NYC, Tokyo, Osaka, Dublin, Toronto, Boston, Chicago, Madrid, Barcelona, Washington, Montreal
Our rings, plus key stats (m1.large, 18 nodes in cluster A, 12 nodes in cluster B, 100GB per node in cluster A, ~ 600GB in cluster B)
EC2 snitch
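The EC2 snitch is a one-line cassandra.yaml setting: it derives the data centre from the AWS region and the rack from the availability zone. A minimal fragment (for a cluster spanning regions, the multi-region variant is the usual choice):

```yaml
# cassandra.yaml
endpoint_snitch: Ec2Snitch
# For clusters spanning AWS regions:
# endpoint_snitch: Ec2MultiRegionSnitch
```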
Our rings, plus key stats (m1.large, 18 nodes in cluster A, 12 nodes in cluster B, 100GB per node in cluster A, ~ 600GB in cluster B)
I interviewed key people from our management team to gauge their reaction to our C* deployment.
There is a perception that we have made it much harder to get at our data. In the early days at Hailo, when we all worked in one room, developers could execute ad-hoc queries on the fly for management. Nowadays we can’t. The reasons behind this are twofold: firstly, it is true that it is harder to run ad-hoc queries against C*. But that’s not the whole picture. Much of our data is still in MySQL, and the queries we used to run against that data do not run smoothly either. The perception, however, is that the “new database” is the cause of the problems.
It’s easy to cause yourself a “Big Data” problem. Developers collect and store data because they can, without being clear about the business implications.
1. Most people have N years of SQL experience where N >= 5
Sometimes C* works too well. Clearly this cluster needs some attention, but our application is still working fine. We are probably at the point where we need a dedicated C* expert.
2. It’s possible to shoot yourself in the foot – but this is true of SQL (eg: joins that work with low data volumes)
Big rows are bad – they expose a data modeling problem
With the right tools, we could change the picture completely.