How Hailo fuels its growth using NoSQL Storage and Analytics
David Gardner, Architect @ Hailo

Hailo is building the world's best taxi app -- we're already in 9 cities worldwide, have 300,000 registered passengers, and are growing (30%+) every month. Of course, that presents a serious infrastructure challenge.
I'll explain how we've built our service around tools that have three key NoSQL characteristics -- they're all distributed, resilient and operationally simple. The particular goals we set ourselves were around making it easy to replicate our architecture as we launch in new cities, to scale as we grow in each city, while all the time being able to coordinate that setup in a straightforward way.

Published in: Technology
  • Intro – about me
  • I founded the London meetup group in 2010 and have been flying the C* flag over London ever since. My motivation was to connect with others who were using Cassandra. Back then “swapping war stories” was a common theme. Cassandra was not easy to use.
  • Hailo is The Taxi Magnet. Use Hailo to get a cab wherever you are, whenever you want. On iOS and Android; live in London, New York, Chicago, Toronto, Boston, Dublin, Madrid
  • Trade offs
  • My recommendation was based on the solid design principles behind C*, something I’ve talked about in the past.
  • 27:00
  • Row key = entity ID (in this instance a 64-bit integer à la Snowflake). Column name = property name. Value = property value. A key point when using this pattern is to only mutate columns that you change.
  • Time series for storing records of emails sent. In this instance bucketed by a daily row key, for all messages. The column name is a type 1 UUID.
  • We also denormalise for other indexes, eg: here we store every message sent to a given address under a single row.
  • We are not using CQL.
  • London, NYC, Tokyo, Osaka, Dublin, Toronto, Boston, Chicago, Madrid, Barcelona, Washington, Montreal
  • Our rings, plus key stats (m1.large, 18 nodes in cluster A, 12 nodes in cluster B, 100GB per node in cluster A, ~ 600GB in cluster B)
  • We can execute AQL
  • Some screenshot
  • 43:00
  • 1. Most people have N years of SQL experience where N >= 5
  • There is a perception that we have made it much harder to get at our data. In the early days at Hailo, when we all worked in one room, developers could execute ad-hoc queries on the fly for management. Nowadays we can’t. The reasons behind this are two-fold. Firstly, it is true that ad-hoc queries are harder to execute against C*. But that’s not the whole picture: much of our data is still in MySQL, and the queries we used to run against this data do not run smoothly either. The perception, however, is that the “new database” is the cause of the problems.
  • It’s easy to cause yourself a “Big Data” problem. Developers collect and store data because they can, without being clear about the business implications.
  • With the right tools, we could change the picture completely.
  • 43:00
  • Transcript of "How Hailo fuels its growth using NoSQL Storage and Analytics"

    1. How Hailo fuels its growth using NoSQL Storage and Analytics
        David Gardner, Architect @ Hailo
        #NoSQLNow
    2. (no slide text)
    3. (no slide text)
    4. (no slide text)
    5. What is Hailo?
        • The world’s highest-rated taxi app – over 10,000 five-star reviews
        • Over 500,000 registered passengers
        • A Hailo e-hail is accepted by a driver every four seconds around the world
        • Hailo operates in ten cities from Tokyo to Toronto in just over eighteen months of operation
    6. Hailo is growing
        • Hailo is a marketplace that facilitates over $100M in run-rate transactions and is making the world a better place for passengers and drivers
        • Hailo has raised over $50M in financing from the world's best investors including Union Square Ventures, Accel, the founder of Skype (via Atomico), Wellington Partners (Spotify), Sir Richard Branson, and our CEO's mother, Janice
    7. What this talk is about
        • Why Hailo are using NoSQL
        • How we use Cassandra
        • How we use Acunu Analytics
        • Challenges of NoSQL
    8. Why choose NoSQL?
    9. “NoSQL DBs trade off traditional features to better support new and emerging use cases”
        Andy Gross, Riak
        http://www.slideshare.net/argv0/riak-use-cases-dissecting-the-solutions-to-hard-problems
    10. What are we trading off?
        • More widely used, tested and documented software
        • Ad-hoc querying
        • Talent pool with direct experience
    11. What do we get back in return?
        • High availability
        • Scalability
        • Operational simplicity
    12. The path to adoption at Hailo
    13. Hailo launched in London in November 2011
        • Launched on AWS
        • Two PHP/MySQL web apps plus a Java backend
        • Mostly built by a team of 3 or 4 backend engineers
        • MySQL multi-master for single AZ resilience
    14. Why Cassandra?
        • A desire for greater resilience – “become a utility”
          Cassandra is designed for high availability
        • Plans for international expansion around a single consumer app
          Cassandra is good at global replication
        • Expected growth
          Cassandra scales linearly for both reads and writes
        • Prior experience
          I had experience with Cassandra and could recommend it
    15. The path to adoption
        • Largely unilateral decision by developers – a result of a startup culture
        • Replacement of key consumer app functionality, splitting up the PHP/MySQL web app into a mixture of global PHP/Java services backed by a Cassandra data store
        • Launched into production in September 2012 – originally just powering North American expansion, before gradually switching over Dublin and London
    16. Cassandra at Hailo
    17. “Cassandra just works”
        Dom W, Senior Engineer
    18. Use cases
        1. Entity storage
        2. Time series data
    19. CF = customers
        126007613634425612:
          createdTimestamp: 1370465412
          email: dave@cruft.co
          givenName: Dave
          familyName: Gardner
          locale: en_GB
          phone: +447911111111
    20. Considerations for entity storage
        • Do not read the entire entity, update one property and then write back a mutation containing every column
        • Only mutate columns that have been set
        • This avoids read-before-write race conditions
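
The race that slide 20 warns about can be shown with a small in-memory sketch. This is not Hailo's code and does not use a Cassandra client: `ColumnFamily` is a hypothetical stand-in that keeps a timestamp per column and lets the latest write win per column, and the replacement email and phone values are made up for illustration.

```python
from itertools import count


class ColumnFamily:
    """Toy stand-in for a column family: row key -> {column: (value, timestamp)}."""

    def __init__(self):
        self.rows = {}
        self._clock = count(1)  # increasing write timestamps

    def mutate(self, row_key, columns):
        """Write only the given columns; the latest timestamp wins per column."""
        ts = next(self._clock)
        row = self.rows.setdefault(row_key, {})
        for name, value in columns.items():
            current = row.get(name)
            if current is None or ts >= current[1]:
                row[name] = (value, ts)

    def read(self, row_key):
        return {name: value for name, (value, _) in self.rows.get(row_key, {}).items()}


def run(write_whole_entity):
    cf = ColumnFamily()
    key = 126007613634425612
    cf.mutate(key, {"email": "dave@cruft.co", "phone": "+447911111111"})

    # Two writers read the same snapshot, then each changes one property.
    snapshot_a = cf.read(key)
    snapshot_b = cf.read(key)
    change_a = {"email": "new@example.com"}   # hypothetical new value
    change_b = {"phone": "+447922222222"}     # hypothetical new value

    if write_whole_entity:
        cf.mutate(key, {**snapshot_a, **change_a})
        cf.mutate(key, {**snapshot_b, **change_b})  # carries the stale email back
    else:
        cf.mutate(key, change_a)
        cf.mutate(key, change_b)
    return cf.read(key)


whole = run(write_whole_entity=True)
partial = run(write_whole_entity=False)
print(whole)    # the email change from writer A has been lost
print(partial)  # both changes survive
```

Writing back the whole entity gives every column a fresh timestamp, so a stale copy of an unrelated column silently overwrites a concurrent change. Mutating only the changed column leaves the other columns alone.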
    21. CF = comms
        2013-06-01:
          55374fa0-ce2b-11e2-8b8b-0800200c9a66: {"to":"dave@c…
          a48bd800-ce2b-11e2-8b8b-0800200c9a66: {"to":"foo@ex…
          b0e15850-ce2b-11e2-8b8b-0800200c9a66: {"to":"bar@ho…
          bfac6c80-ce2b-11e2-8b8b-0800200c9a66: {"to":"baz@fo…
    22. CF = comms
        dave@cruft.co:
          13b247f0-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…
          20f70a40-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…
          2b44d3b0-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…
          338a22f0-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…
    23. Considerations for time series storage
        • Choose row key carefully, since this partitions the records
        • Think about how many records you want in a single row
        • Denormalise on write into many indexes
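
The layout on slides 21 to 23 (a daily bucket row plus a per-recipient row, with type 1 UUIDs as column names) can be sketched as follows. This is an in-memory illustration only, not Hailo's implementation: the `Comms` class, its method names and the message fields other than `to` are assumptions, and a Python dict stands in for the column family.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns units.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid1_to_datetime(u):
    """Recover the creation time embedded in a type 1 UUID."""
    return datetime.fromtimestamp((u.time - UUID_EPOCH_OFFSET) / 1e7, tz=timezone.utc)


class Comms:
    """Toy 'comms' column family: row key -> {type 1 UUID: message}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def record(self, message):
        """Denormalise on write: store the message under every index row."""
        col = uuid.uuid1()                                # column name, time ordered
        day = uuid1_to_datetime(col).strftime("%Y-%m-%d")
        self.rows[day][col] = message                     # daily bucket, all messages
        self.rows[message["to"]][col] = message           # per-recipient index
        return col

    def slice(self, row_key):
        """Return a row's messages ordered by the UUID timestamp."""
        row = self.rows[row_key]
        return [row[c] for c in sorted(row, key=lambda u: u.time)]


comms = Comms()
first = comms.record({"to": "dave@cruft.co", "subject": "one"})
second = comms.record({"to": "foo@example.com", "subject": "two"})
print(comms.slice("dave@cruft.co"))
```

The row key decides how records are partitioned and how wide a row grows: a daily bucket grows with total traffic for that day, while a per-recipient row grows with that recipient's history.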
    24. Client libraries
        • Astyanax (Java)
        • phpcassa (PHP)
        • github.com/carloscm/gossie (Go)
    25. (no slide text)
    26. 2 clusters
        6 machines per region
        3 regions (stats cluster pending addition of third DC)
        Operational Cluster: ap-southeast-1, us-east-1, eu-west-1
        Stats Cluster: us-east-1, eu-west-1
    27. AWS VPCs with OpenVPN links
        3 AZs per region
        m1.large machines
        Provisioned IOPS EBS
        Operational Cluster
        Stats Cluster
        ~600GB/node
        ~100GB/node
    28. Multi DC
        • Something that Cassandra makes trivial
        • Would have been very difficult to accomplish active-active inter-DC replication with a team of 2 without Cassandra
        • Rolling repair needed to make it safe (we use LOCAL_QUORUM)
        • We schedule “narrow repairs” on different nodes in our cluster each night
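
The arithmetic behind LOCAL_QUORUM on slide 28 can be sketched as below. A quorum is a majority of replicas, and LOCAL_QUORUM counts only the replicas in the coordinator's own data centre. The replication factor of 3 per data centre is an assumption for illustration (the slides do not state it); the region names are those on slide 26.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# Assumed replication factor per data centre; not stated in the talk.
replication = {"eu-west-1": 3, "us-east-1": 3, "ap-southeast-1": 3}


def local_quorum(dc):
    """Replicas in the local data centre that must acknowledge a request."""
    return quorum(replication[dc])


def tolerated_local_failures(dc):
    """Local replicas that can be down while LOCAL_QUORUM still succeeds."""
    return replication[dc] - local_quorum(dc)


def local_reads_see_local_writes(dc):
    """Read and write sets overlap within one data centre when R + W > RF."""
    r = w = local_quorum(dc)
    return r + w > replication[dc]


for dc in replication:
    print(dc, local_quorum(dc), tolerated_local_failures(dc), local_reads_see_local_writes(dc))
```

The overlap guarantee only holds inside one data centre. Replicas in the other regions receive writes asynchronously, which is why the slide pairs LOCAL_QUORUM with regular repair.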
    29. (no slide text)
    30. Acunu Analytics at Hailo
    31. Analytics
        • With Cassandra we lost the ability to carry out analytics, eg: COUNT, SUM, AVG, GROUP BY
        • We use Acunu Analytics to give us this ability in real time, for pre-planned query templates
        • It is backed by Cassandra and therefore highly available, resilient and globally distributed
        • Integration is straightforward
    32. NSQ Acunu C*
        events
    33. AQL
        SELECT
          SUM(accepted),
          SUM(ignored),
          SUM(declined),
          SUM(withdrawn)
        FROM Allocations
        WHERE timestamp BETWEEN '1 week ago' AND 'now'
          AND driver='LON123456789'
        GROUP BY timestamp(day)
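
One way a pre-planned query template like the one on slide 33 could be answered in real time is to keep running totals as events arrive, so the query becomes a lookup. The sketch below illustrates that idea only; it is not Acunu's implementation or API. The `AllocationRollup` class, the event field layout (one counter field per outcome), the second driver ID and the fixed date are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

OUTCOMES = ("accepted", "ignored", "declined", "withdrawn")


class AllocationRollup:
    """Running totals per (driver, day), updated on every ingested event."""

    def __init__(self):
        self.totals = defaultdict(lambda: dict.fromkeys(OUTCOMES, 0))

    def ingest(self, event):
        """Add an event's counters to the bucket for its driver and day."""
        bucket = self.totals[(event["driver"], event["timestamp"].date())]
        for outcome in OUTCOMES:
            bucket[outcome] += event.get(outcome, 0)

    def per_day(self, driver, start, end):
        """SUM of each outcome for one driver, grouped by day (day granularity)."""
        days = sorted(
            day for (drv, day) in self.totals
            if drv == driver and start.date() <= day <= end.date()
        )
        return {day: dict(self.totals[(driver, day)]) for day in days}


now = datetime(2013, 8, 21, 12, 0, tzinfo=timezone.utc)  # hypothetical "now"
rollup = AllocationRollup()
rollup.ingest({"timestamp": now - timedelta(days=1), "driver": "LON123456789", "accepted": 1})
rollup.ingest({"timestamp": now - timedelta(days=1), "driver": "LON123456789", "declined": 1})
rollup.ingest({"timestamp": now, "driver": "LON123456789", "accepted": 1})
rollup.ingest({"timestamp": now, "driver": "LON000000000", "ignored": 1})

result = rollup.per_day("LON123456789", now - timedelta(weeks=1), now)
for day, sums in result.items():
    print(day, sums)
```

The cost of this design is that the grouping and filter dimensions have to be chosen before the data arrives, which matches the "pre-planned query templates" wording on slide 31.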
    34. (no slide text)
    35. (no slide text)
    36. Challenges
    37. [Chart: Average years experience per team member, MySQL and Cassandra; axis label 10]
    38. [Diagram: “People who can attempt to query MySQL” and “People who can attempt to query Cassandra”]
    39. (no slide text)
    40. Lessons learned
        • Have an advocate: get someone who will sell the vision internally
        • Teach team members the fundamentals of how the solution works
        • Don’t cause yourself a “big data” problem unnecessarily
        • Explain trade-offs in choosing NoSQL to all parts of the business
        • Provide solutions!
    41. [Diagram: “People who can attempt to query MySQL” and “People who can attempt to query Cassandra”]
    42. Conclusion
    43. We like Cassandra
        • Solid design
        • HA characteristics
        • Easy multi-DC setup
        • Simplicity of operation
    44. The future
        • We will continue to invest in Cassandra as we expand globally
        • We will hire people with experience running Cassandra
        • We will focus on expanding our reporting facilities
        • We aspire to extend our network (1M consumer installs, wallet) beyond cabs
        • We will continue to hire the best engineers in London, NYC and Asia
    45. Thank you
        Come and work with NoSQL full time: jobs.hailocab.com