Wanderu – Lessons from Building a Travel Site with Neo4j - Eddy Wong @ GraphConnect NY 2013



Wanderu is a consumer-focused search engine for buses and trains. Eddy will recount the architectural, modeling, and other technical “lessons learned” and “lessons unlearned” from implementing Wanderu’s geospatial and search features with Neo4j as part of a polyglot NoSQL solution.

  1. Wanderu: Lessons Learned and Unlearned from Building a Travel Site with Graphs and Neo4j
     Eddy Wong, CTO, Wanderu.com (@eddywongch)
  2. About Wanderu.com: a search engine for (intercity) buses and trains
  3. Demo
  4. From Point A to Point B
     • A shortest-path problem as a function of departure, arrival, price, duration, and date/time
     • Diagram: stations NYC (A), Philly, and DC (B); sample trips: MEG, $9, 11/07/2013; MEG, $4, 11/07/2013; BOLT, $13, 11/07/2013
     • Nomenclature: Stations, Trips
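Slide 4 frames route search as a shortest-path problem over stations and trips. The sketch below is a minimal illustration using Dijkstra's algorithm, assuming price is the only edge weight (departure/arrival times and duration are ignored); the function name `cheapest_path`, the tuple layout, the carrier names, and the fares are hypothetical, not Wanderu's code or data.

```python
import heapq
from collections import defaultdict


def cheapest_path(trips, origin, destination):
    """Dijkstra over stations (nodes) and trips (edges), weighted by price only.

    `trips` holds hypothetical (depart, arrive, carrier, price) tuples.
    Returns (total_price, [stations...]) or None if no route exists.
    """
    graph = defaultdict(list)
    for depart, arrive, carrier, price in trips:
        graph[depart].append((arrive, carrier, price))

    queue = [(0, origin, [origin])]  # (cost so far, station, path taken)
    settled = set()
    while queue:
        cost, station, path = heapq.heappop(queue)
        if station == destination:
            return cost, path
        if station in settled:
            continue  # already reached more cheaply
        settled.add(station)
        for nxt, _carrier, price in graph[station]:
            if nxt not in settled:
                heapq.heappush(queue, (cost + price, nxt, path + [nxt]))
    return None


# Illustrative fares only (not taken from the slide).
trips = [
    ("NYC", "PHL", "CarrierA", 10),
    ("PHL", "DC", "CarrierB", 12),
    ("NYC", "DC", "CarrierC", 25),
]
print(cheapest_path(trips, "NYC", "DC"))  # (22, ['NYC', 'PHL', 'DC'])
```

A real search would weigh several criteria (price, duration, time of day) at once, which turns this into a multi-criteria problem rather than a single-weight shortest path.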
  5. Lessons: Architectural, Modeling, Geo
     (each marked as Learned, Unlearned, or Idea)
  6. Our Story
     • Two-year-old startup; tech work started a little over a year ago
     • Beta in March 2013, launch in August 2013
     • Knew nothing about Neo4j when we started (June 2012)
     • Did not like the relational model: wanted schema-less data and no self-joins
     • Wanted a graph model
  7. Workflow (diagram): scraping bus websites → non-uniform JSON data → server → store of uniform data
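Slide 7's workflow turns non-uniform JSON scraped from many bus sites into uniform records before they are stored. A minimal sketch of that normalization step, assuming a hand-written field map per site; the site names, field names, and the `normalize` helper are hypothetical.

```python
import json

# Hypothetical per-site field names: each scraped site labels the same facts differently.
FIELD_MAPS = {
    "site_a": {"from": "depart", "to": "arrive", "cost": "price"},
    "site_b": {"origin": "depart", "destination": "arrive", "fare": "price"},
}


def normalize(site, raw_json):
    """Map one scraped JSON record from `site` onto a single uniform schema."""
    record = json.loads(raw_json)
    mapping = FIELD_MAPS[site]
    uniform = {target: record[source] for source, target in mapping.items()}
    # Prices arrive as numbers or strings such as "$9"; store them as floats.
    uniform["price"] = float(str(uniform["price"]).lstrip("$"))
    uniform["source"] = site
    return uniform


print(normalize("site_a", '{"from": "NYC", "to": "DC", "cost": "$9"}'))
print(normalize("site_b", '{"origin": "BOS", "destination": "NYC", "fare": 13}'))
```

Keeping the per-site differences in data (the field maps) rather than in code means adding a new scraped site only requires a new mapping entry.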
  8. Architectural Lessons (art: M. C. Escher)
  9. Our Situation
     • Data is written in only one direction
     • Users search for paths, then segments
     • Searches are done by date
     • Needed online capability
     • Trip info (price/availability) could change on some
  10. Solution (diagram): scraping bus websites → non-uniform JSON data → uniform data in MongoDB → replica mechanism → Mongo Connector → nodes & edges in Neo4j
  11. MongoConnector
      • MongoDB Lab project: open source, unsupported
      • Uses the replica mechanism: the oplog
      • Eventually consistent (not real time)
      • Written in Python
      • Main methods: upserts and deletes, each passed the document
      • We implemented DocMgr -> Neo4jDocMgr -> py2neo
      • We can add new properties easily on the fly
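Slide 11 describes plugging a Neo4j doc manager into MongoConnector, with upserts and deletes that each receive the changed document. The class below is a hypothetical sketch of that shape only: it does not subclass mongo-connector's real DocManager base class or call py2neo. Instead it hands Cypher text and parameters to an injected `run` callable, and it assumes each MongoDB document describes one depart/arrive station pair. The `$param` placeholder syntax is modern Cypher; 2013-era Neo4j used `{param}`.

```python
class Neo4jDocManager:
    """Hypothetical doc manager: mirrors MongoDB route documents into a graph.

    `run` is any callable taking (cypher, params), e.g. a wrapper around a
    Neo4j driver session. Only keys go into the graph, per the "dehydrate" idea.
    """

    def __init__(self, run):
        self.run = run

    @staticmethod
    def _route_id(doc):
        # Human-readable pair id, depart then arrive (e.g. "BOSNYC").
        return doc["depart"] + doc["arrive"]

    def upsert(self, doc):
        # MERGE is idempotent, so replaying an oplog entry does not duplicate data.
        self.run(
            "MERGE (a:Station {id: $depart}) "
            "MERGE (b:Station {id: $arrive}) "
            "MERGE (a)-[:ROUTE {id: $route_id}]->(b)",
            {"depart": doc["depart"], "arrive": doc["arrive"],
             "route_id": self._route_id(doc)},
        )

    def remove(self, doc):
        self.run(
            "MATCH ()-[r:ROUTE {id: $route_id}]->() DELETE r",
            {"route_id": self._route_id(doc)},
        )


# Usage with a stand-in runner that just records statements.
issued = []
manager = Neo4jDocManager(lambda cypher, params: issued.append((cypher, params)))
manager.upsert({"depart": "BOS", "arrive": "NYC"})
manager.remove({"depart": "BOS", "arrive": "NYC"})
```

Because the connector tails the oplog, the graph trails MongoDB slightly, matching the "eventually consistent" bullet; idempotent writes make that catch-up safe to retry.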
  12. Polyglot Architecture (diagram): scraping bus websites → non-uniform JSON data → MongoDB → replica mechanism → Mongo Connector → nodes & edges in Neo4j, plus a REST server; sample station pairs shown: BOS-NYC, BOS-PHL, NYC-DC, NYC-PHL
  13. Modeling Lessons (art: M. C. Escher)
  14. Our Story
      • We tried to “dump” all data into Neo4j
      • Edges had dates -> too many edges -> the “super node problem”
      • Query performance was terrible (over a minute) and got worse as the number of edges increased
      • Tried Gremlin -> no improvement
      • Needed range queries on edges
  15. “Dehydrate”
      • Don’t store everything in Neo4j, only metadata
      • Use Neo4j as a “connection index”
      • Don’t store entities in nodes, only keys
      • Don’t store heavy properties in edges
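Slides 14 and 15 contrast storing one dated edge per trip with a “dehydrated” graph that keeps only station keys. A rough sketch of the difference in edge counts, assuming (hypothetically) three routes with one trip per route per day for a year; the routes, the dates, and the `build_connection_index` helper are illustrative, not Wanderu's data.

```python
from datetime import date, timedelta


def build_connection_index(trips):
    """Return the set of (depart, arrive) station-key pairs to store as edges.

    Dated trip details (price, times, availability) are NOT copied into the
    graph; they stay in the document store and are looked up per date later.
    """
    return {(trip["depart"], trip["arrive"]) for trip in trips}


# Illustrative data only: three routes, one trip per route per day for a year.
routes = [("BOS", "NYC"), ("NYC", "PHL"), ("NYC", "DC")]
start = date(2013, 1, 1)
trips = [
    {"depart": d, "arrive": a, "date": start + timedelta(days=i)}
    for d, a in routes
    for i in range(365)
]

print(len(trips))                          # 1095 dated edges if everything is dumped
print(len(build_connection_index(trips)))  # 3 edges in the dehydrated model
```

The dated-edge count grows with every scraped day, while the dehydrated edge count grows only when a new station pair appears, which is what keeps high-degree station nodes manageable.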
  16. Neo4j Model (source: Wes Freeman, Tobias Lindaaker)
  17. Our Solution
      • Serve paths from Neo4j
      • Serve segments from MongoDB (with date constraints)
      • Back to “joins”
      • “Join” across Neo4j + MongoDB, whose generated ids do not match: a Neo4j node id such as 1 != a MongoDB ObjectId such as 525d9031e6c9236072114387
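Slide 17 serves paths from Neo4j and dated segments from MongoDB, then joins them in the application. A minimal in-memory sketch of that join, assuming paths arrive as lists of station ids and segments as dicts keyed by a depart+arrive route id (the convention on slide 18); the function, field names, and sample data are hypothetical stand-ins for the two databases' query results.

```python
from datetime import date


def join_paths_with_segments(paths, segments, travel_date):
    """Application-side join of graph paths with document-store segments.

    paths: station-id lists from the graph, e.g. ["NYC", "PHL", "DC"].
    segments: dicts with "route_id", "date", "price" from the document store.
    Returns (path, legs) pairs, where legs[i] lists the options for hop i;
    paths with any hop lacking a segment on travel_date are dropped.
    """
    # Keep only segments on the requested date, grouped by route id.
    by_route = {}
    for seg in segments:
        if seg["date"] == travel_date:
            by_route.setdefault(seg["route_id"], []).append(seg)

    results = []
    for path in paths:
        legs = [by_route.get(a + b, []) for a, b in zip(path, path[1:])]
        if legs and all(legs):
            results.append((path, legs))
    return results


# Hypothetical query results from the graph and the document store.
paths = [["NYC", "PHL", "DC"], ["NYC", "DC"]]
segments = [
    {"route_id": "NYCPHL", "date": date(2013, 11, 7), "price": 10},
    {"route_id": "PHLDC", "date": date(2013, 11, 7), "price": 12},
    {"route_id": "NYCDC", "date": date(2013, 11, 8), "price": 25},
]
matches = join_paths_with_segments(paths, segments, date(2013, 11, 7))
```

Here only the NYC-PHL-DC path survives, because the direct NYC-DC segment runs on a different date. In production the date filter would be pushed into the MongoDB query instead of being applied in Python.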
  18. Joins across DBs
      • MongoDB Stations ↔ Neo4j Nodes: BOS, NYC, DC, ...
      • Forget sequential ids generated by the databases
      • Use a human-created “UUID” string as the id
      • MongoDB Trips ↔ Neo4j Edges: BOS-NYC, BOS-DC, NYC-DC, ...
      • Convert the station pair into an id: depart-arrive
      • For example: BOSNYC
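Slide 18's id scheme can be captured in two small helpers. This is a sketch under stated assumptions: station ids are uppercase, human-assigned codes, and route ids concatenate depart then arrive as in the slide's BOSNYC example; the helper names are hypothetical.

```python
def station_id(code):
    """Normalize a human-assigned station code (e.g. " bos " -> "BOS")."""
    return code.strip().upper()


def route_id(depart, arrive):
    """Route key from the station pair, depart first: ("BOS", "NYC") -> "BOSNYC".

    Plain concatenation is only unambiguous if codes share a fixed length;
    otherwise "AB" + "CDE" and "ABC" + "DE" would collide.
    """
    return station_id(depart) + station_id(arrive)
```

Because both stores derive the same string from the same station pair, a MongoDB trip document and a Neo4j edge can be matched without relying on either database's generated ids. Direction matters: `route_id("NYC", "BOS")` is `"NYCBOS"`, a different key.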
  19. Geo Lessons (art: M. C. Escher)
  20. Hybrid Solution
      • Google Autocomplete
      • Google Maps
      • MongoDB station geo lookup
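Slide 20 pairs Google's autocomplete and maps front end with a MongoDB station lookup. In MongoDB such a lookup would typically be a `$near` query against a `2dsphere`-indexed GeoJSON field, e.g. `{"loc": {"$near": {"$geometry": {"type": "Point", "coordinates": [lng, lat]}}}}` (GeoJSON puts longitude first); the field name `loc` is an assumption. The in-memory sketch below shows the same nearest-station idea with a haversine distance, using approximate city-center coordinates as hypothetical stand-ins for real station locations.

```python
import math

# Approximate city-center coordinates (lat, lng), standing in for station locations.
STATIONS = {
    "BOS": (42.36, -71.06),
    "NYC": (40.71, -74.01),
    "PHL": (39.95, -75.17),
    "DC": (38.91, -77.04),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_station(point):
    """Return the id of the station closest to a (lat, lng) point, e.g. a geocoded address."""
    return min(STATIONS, key=lambda sid: haversine_km(point, STATIONS[sid]))
```

A linear scan is fine for a handful of stations; with thousands, the geospatial index in MongoDB avoids computing every distance.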
  21. Lessons of Lessons
      • Really understand the Neo4j runtime model
      • Pick universal, human-generated ids
      • Joining across DBs beats the RDBMS approach: tens of paths × hundreds of segments vs. 500k × 500k
      • Glad we picked Neo4j: now doing content generation and more geo features
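The comparison on slide 21 can be made concrete as a count of candidate pairs, taking “tens” as fewer than $10^2$ and “hundreds” as fewer than $10^3$, and reading 500k × 500k as a self-join over roughly 500,000 rows on each side (an interpretation, not stated on the slide):

```latex
N_{\text{paths}} \cdot N_{\text{segments}} < 10^{2} \cdot 10^{3} = 10^{5}
\qquad\text{vs.}\qquad
\left(5 \times 10^{5}\right)^{2} = 2.5 \times 10^{11}
```

Under those assumptions, the cross-database join considers at least $2.5 \times 10^{11} / 10^{5} = 2.5 \times 10^{6}$ times fewer candidate pairs.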
  22. Useful Links
      • Neo4j Internals: slideshare.net/thobe/an-overview-of-neo4j-internals
      • Aseem’s Lessons Learned with Neo4j: http://aseemk.com/talks/neo4j-lessons-learned#/14
      • Wes Freeman, Neo4j Internals: http://wes.skeweredrook.com/graphdb-meetup-may-2013.pdf
      • MongoConnector: blog.mongodb.org/post/29127828146/introducing-mongo-connector