BlaBlaCar Elastic Search Feedback


How ElasticSearch was deployed at BlaBlaCar

Published in: Travel, Technology, Education

  1. ElasticSearch feedback
  2. Introduction
  3. Nicolas Blanc - BlaBlArchitect, @thewhitegeek
      Timeline: Sinfomic (1999), (2001), (2005), (2008), (2012)
  4. What is BlaBlaCar?
  5. 3,000,000 members in Europe
  6. 10 countries (9 before Germany was added)
      ● France
      ● Spain
      ● Italy
      ● UK
      ● Poland
      ● Portugal
      ● Netherlands
      ● Belgium
      ● Luxembourg
      ● NEW: Germany
  7. Growth (chart, January 2008 to January 2013; axis labels: 25 millions, 50 millions)
  8. Infrastructure
      ● 2 front web servers
      ● 2 MySQL masters (+ 4 slaves on SSD)
      ● 1 private cloud (KVM + Open vSwitch): Redis, Memcache, RabbitMQ/workers
      ● 1 ElasticSearch cluster
  9. Changing the Search Engine
  10. What exists? Why change?
      MySQL database
      ● Relational DB (lots of joins needed)
      ● Plain SQL queries
      ● Home-made geographical search
      Recent problems
      ● New features mean more complex queries
      ● Scalability: performance depends on DB load
  11. Initial requirements
      Scalability
      ● A trip search must complete in less than 200 ms
      ● The system part of the solution must be easy to maintain
      ● It must be possible to cluster it (also to avoid a SPOF)
      Low code impact on the existing application
      ● Same features as today (geographical search)
      ● Minimize the developers' work
      ● Add one missing feature: facets
  12. Initial competitors: SenseiDB
  13. Why ElasticSearch
      ✔ Easiest clustering
      ✔ Good indexing performance
      ✔ Little code to write to use it
      ✔ Schemaless
      ✔ Based on Lucene
      ✔ Written in Java (we need to code a grouping feature)
  14. ElasticSearch has won, now migrate our search!
  15. Changing our mindset
      Object in a relational database
      ● Can be spread across multiple tables
      ● Lots of information reachable through JOINs
      Object in a document-oriented database
      ● Only one big index for these objects
      ● All information needs to be in the object itself, not spread over multiple tables
  17. Defining our objects well
      We need to know what we want to search
      ● Searching trips (front office usage)
      ● Searching members (back office usage)
      ● Searching the FAQ (front office usage)
      Think of all the needed fields
      ● The ones used for queries
      ● The ones used for filters
      ● The ones used for facets
  18. Defining the index well
      System point of view
      ● Number of nodes in the cluster
      ● Number of shards
      ● Number of replicas
      Application point of view
      ● Define the type and attributes of every field (mapping)
      ● Use parent/child or nested documents to improve indexing
      ● How to push documents from the DB?
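The system-level choices (shards, replicas) and the application-level choice (mapping) listed above all end up in the body sent when an index is created. A minimal sketch in Python of building that body; the shard and replica counts and the single `price` field are illustrative values, not BlaBlaCar's actual settings:

```python
import json


def index_creation_body(shards, replicas, mappings):
    """Build the JSON body of an index-creation request."""
    if shards < 1:
        raise ValueError("an index needs at least one shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": mappings,
    }


# Illustrative values only.
creation_body = index_creation_body(
    shards=1,
    replicas=1,
    mappings={
        "trip": {
            "properties": {
                "price": {"type": "float", "include_in_all": False},
            }
        }
    },
)
print(json.dumps(creation_body, indent=2))
```

The shard count is the setting to get right up front, since it is fixed once the index exists, whereas the replica count can be changed on a live index.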
  19. Indexing: using a river or not?
      River advantages
      ● Plugs directly into our source backend
      ● An ElasticSearch API exists to code a new one
      River problems
      ● Not easy to add business logic on some fields
      ● Really hard when your DB is unconventional
      ● Requires a full reindex of all the documents
  20. Indexing: our manual way
      We wrote an asynchronous indexer
      ● Written in Java
      ● Applies business logic when fetching from the DB
      ● Fetches from multiple DBs/sources
      ● Uses the Java ES library
      ● Easy interface: send {"trip":1234567} and the server answers {"OK"}
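The idea of an asynchronous indexer can be sketched as a queue drained by a background worker: the caller gets its acknowledgement immediately, and the document is built and sent afterwards. This is an illustration in Python, not the Java indexer from the slides; `fetch_trip`, `AsyncIndexer` and the `indexed` list standing in for the search engine are all hypothetical names.

```python
import json
import queue
import threading


def fetch_trip(trip_id):
    """Hypothetical stand-in for the SQL queries and business logic."""
    return {"id": trip_id, "seats_left": 2}


class AsyncIndexer:
    """Acknowledge a message at once, build and send the document later."""

    def __init__(self, build_document, send_document):
        self._queue = queue.Queue()
        self._build = build_document
        self._send = send_document
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, message):
        """Accept a message such as '{"trip": 1234567}' and return the ack."""
        trip_id = json.loads(message)["trip"]
        self._queue.put(trip_id)
        return "OK"

    def _run(self):
        while True:
            trip_id = self._queue.get()
            if trip_id is None:  # sentinel posted by close()
                return
            self._send(self._build(trip_id))

    def close(self):
        """Process everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()


indexed = []  # stands in for the search engine
indexer = AsyncIndexer(fetch_trip, indexed.append)
ack = indexer.submit('{"trip": 1234567}')
indexer.close()
print(ack, indexed)
```

Because the caller only hands over an id, the worker is free to run several queries against several sources before it produces the final document.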
  21. One index sample: Trip
  22. Defining our Trip object well
      Think of all the needed fields
      ● The ones used for queries: trip date of departure, from where, to where, user id
      ● The ones used for filters: user ratings, price, vehicle, seats left, is user blocked (a blocked user is a user who performed a forbidden action on the website)
      ● The ones used for facets: user ratings, price, vehicle
  23. Defining our Trip index well
      Think of all the system requirements
      ● The cluster has 2 nodes
      ● We keep the default configuration for shards/replicas
      Think of the object mapping
      ● For each field:
          ● Define the type (string, long, geo_point, date, float, boolean)
          ● Define the scope (include_in_all)
          ● Define the analyzer (for type string)
  24. Trip Mapping
      "trip": {
        "properties": {
          "is_user_blocked": {"type": "boolean", "include_in_all": false},
          "user_ratings": {"type": "long", "include_in_all": false},
          "from": {"type": "geo_point", "include_in_all": false},
          "price": {"include_in_all": false, "type": "float"},
          "price_euro": {"type": "float", "include_in_all": false},
          "seats_left": {"include_in_all": false, "type": "long"},
          "seats_offered": {"include_in_all": false, "type": "long"},
          "to": {"include_in_all": false, "type": "geo_point"},
          "trip_date": {"format": "dateOptionalTime", "include_in_all": false, "type": "date"},
          "vehicle": {"include_in_all": false, "type": "string"},
          "userid": {"include_in_all": false, "index": "not_analyzed", "type": "string"}
        }
      }
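Every field in the Trip mapping repeats `"include_in_all": false`. When a mapping is generated from code, a small helper can supply that default so only the per-field differences are written out. A sketch in Python; the `field` helper is a hypothetical name, and the field list is copied from the slide:

```python
import json


def field(field_type, **options):
    """Return a field mapping that is excluded from _all by default."""
    mapping = {"type": field_type, "include_in_all": False}
    mapping.update(options)
    return mapping


trip_mapping = {
    "trip": {
        "properties": {
            "is_user_blocked": field("boolean"),
            "user_ratings": field("long"),
            "from": field("geo_point"),
            "price": field("float"),
            "price_euro": field("float"),
            "seats_left": field("long"),
            "seats_offered": field("long"),
            "to": field("geo_point"),
            "trip_date": field("date", format="dateOptionalTime"),
            "vehicle": field("string"),
            "userid": field("string", index="not_analyzed"),
        }
    }
}
print(json.dumps(trip_mapping, indent=2))
```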
  25. Indexing events well
      Which modifications send a change event?
      ● All trip creations/deletions/modifications
      ● Member modifications (blocked or not)
      ● New ratings from other members
      ● A seat has been reserved
      ● A member changes their vehicle
      A change event is a call to the internal indexer
      ● Send {"trip":123456} to the indexer (create/update)
      ● Send {"tripd":123456} to the indexer (delete)
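The two message shapes above are enough to drive the indexer: the key names the action and the value is the trip id. A minimal sketch in Python of decoding them; the function name, the action labels and the error handling are assumptions made for the example, not part of the original indexer:

```python
import json

# Message key -> action to perform, as described on the slide.
ACTIONS = {"trip": "index", "tripd": "delete"}


def parse_event(message):
    """Turn '{"trip": 123456}' into ('index', 123456)."""
    payload = json.loads(message)
    if not isinstance(payload, dict) or len(payload) != 1:
        raise ValueError("expected a JSON object with exactly one key")
    key, doc_id = next(iter(payload.items()))
    if key not in ACTIONS:
        raise ValueError("unknown event key: %r" % key)
    return ACTIONS[key], int(doc_id)


print(parse_event('{"trip": 123456}'))
print(parse_event('{"tripd": 123456}'))
```

Create and update share one message because the indexer rebuilds the whole document from the database either way.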
  26. Sample trip index query
      {
        "query": {
          "filtered": {
            "query": {"match_all": {}},
            "filter": {
              "and": [
                {"geo_distance": {"distance": "40.14937866995km", "from": {"lat": 48.856614, "lon": 2.3522219}}},
                {"geo_distance": {"distance": "40.14937866995km", "to": {"lat": 45.764043, "lon": 4.835659}}},
                {"range": {"price": {"from": 0, "include_lower": false}}}
              ]
            }
          }
        },
        "sort": [{"trip_date": {"order": "asc"}}],
        "filter": {"term": {"is_user_blocked": false}},
        "from": 0,
        "size": 10
      }
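The sample query has a fixed shape in which only the two points, the radius and the paging change, so it lends itself to a small builder. A sketch in Python that reproduces the structure of the slide's query; the function and parameter names are hypothetical. The `filtered` query, the `and` filter and the top-level `filter` follow the API of the ElasticSearch version used in the slides; later releases express the same search with a `bool` query and `post_filter`.

```python
import json


def trip_search(origin, destination, radius_km, page=0, size=10):
    """Build a trip search body: trips leaving near origin, arriving near destination."""

    def near(field_name, point):
        lat, lon = point
        return {
            "geo_distance": {
                "distance": "%skm" % radius_km,
                field_name: {"lat": lat, "lon": lon},
            }
        }

    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "and": [
                        near("from", origin),
                        near("to", destination),
                        # Price strictly greater than 0.
                        {"range": {"price": {"from": 0, "include_lower": False}}},
                    ]
                },
            }
        },
        "sort": [{"trip_date": {"order": "asc"}}],
        "filter": {"term": {"is_user_blocked": False}},
        "from": page * size,
        "size": size,
    }


# Coordinates taken from the sample query on the slide.
search_body = trip_search((48.856614, 2.3522219), (45.764043, 4.835659), 40)
print(json.dumps(search_body, indent=2))
```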
  27. The Real World
      A trip now has more than 30 fields
      ● (FAQ is around 25 fields)
      ● (members even more...)
      To build a trip document we need 3 different SQL queries
      ● (FAQ: 2 different SQL queries)
      ● (Member: 10 different SQL queries)
      The Trip index has only 1 shard (grouping)
  28. And now the caveats
  29. Preloaded scripts
      We use MVEL scripts to improve scoring
      ● They are not clustered
      ● Each node needs to have the scripts
      ● A node restart is needed to add or modify them
      Solution: Chef (tool from Opscode)
      All node configurations are centralized in the Chef repository
  30. Grouping documents
      Home-made patches to ElasticSearch (based on work by Martijn van Groningen in ElasticSearch) (I hope so much)
  31. Mapping modification
      On a running index:
      ● Changing a type is not allowed
      ● Changing an analyzer is not allowed
      Solution: index alias
      1) Mapping change → create a new index
      2) When the new index is up to date → switch the alias
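Step 2 maps onto the `_aliases` endpoint, which accepts a list of actions and applies them together, so searches never see a moment without an index behind the alias. A minimal sketch in Python of the request body; the alias and index names (`trip`, `trip_v1`, `trip_v2`) are hypothetical:

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body of a POST /_aliases request that repoints an alias."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: the application always searches the "trip" alias.
swap_body = alias_swap_actions("trip", "trip_v1", "trip_v2")
print(json.dumps(swap_body, indent=2))
```

Since the application only ever refers to the alias, no application code changes when the underlying index is replaced.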
  32. IO limits
      We have only 2 nodes
      ● The Trip index is around 2 GB
      ● But there is only 1 shard for the Trip index
      ● Can index 100 trips per second on a busy evening
      Solution: we installed Intel SSDs (while waiting for a distributed grouping feature)
  33. Choosing the analyzer
      Some fields must not be analyzed
      ● For example, if you use ISO codes for countries (IT for Italy or DE for Germany are ignored in some cases)
      A global analyzer has limits
      ● Accented characters from countries like France, Germany or Spain are not always parsed correctly
      ● One analyzer per country is difficult to implement in some cases
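Both problems can be illustrated with a short sketch. A likely cause of the ISO-code issue is stop-word removal: once lowercased, `it` is an English stop word, so declaring the field `not_analyzed` keeps the code intact. For accents, one common approach (not necessarily the one BlaBlaCar chose) is a custom analyzer using the `asciifolding` token filter. The field names `country` and `city`, the analyzer name `folded_text` and the `fold_accents` function are hypothetical; the function only approximates what the filter does.

```python
import unicodedata

# Assumed analysis settings: a custom analyzer that lowercases and folds accents.
analysis_settings = {
    "analysis": {
        "analyzer": {
            "folded_text": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding"],
            }
        }
    }
}

# Hypothetical fields: the ISO code is stored verbatim, the city name is folded.
field_mappings = {
    "country": {"type": "string", "index": "not_analyzed", "include_in_all": False},
    "city": {"type": "string", "analyzer": "folded_text"},
}


def fold_accents(text):
    """Approximate accent folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("Orléans"), fold_accents("Málaga"), fold_accents("Köln"))
```

With folding applied at both index and search time, a user typing a city name without its accent still matches the accented form.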
  34. OK, sweet. What's next?
  35. Using ElasticSearch to ease log analysis
  36. By the way… We're hiring!!!
      Dev, HTML Ninja, leader, …
      Come & see me right now… or send me your friends
      (And we have beer, baby foot and an arcade cabinet)
  37. Thank you! Follow us! @covoiturage
      Apply now