Wanderu - Lessons from Building a Travel Site with Neo4j
 


Wanderu is a consumer-focused search engine for buses and trains. In this webinar, we will recount the architectural, modeling and other technical "lessons learned" and "lessons unlearned" in implementing our geospatial and search features using Neo4j in the context of a NoSQL polyglot solution.

Speaker: Eddy Wong, CTO, Wanderu
A technologist, innovator and entrepreneur who has architected products and web sites for companies like Hasbro, Maark, Allurent, Macromedia, Allaire, Open Sesame, Philips and AT&T. He was the Chief Architect at Open Sesame where he built one of the first attribute-based personalization engines. Eddy has over 15 years of experience as a software architect and is a Boston tech-community leader in the areas of NoSQL, Big Data and Personalization. He is also the organizer of the Boston GraphDB Meetup.

  • It seems like OrientDB would have been a better fit: graphdb and document store all in one.
    Presentation Transcript

    • Wanderu: Lessons Learned Lessons Learned and Unlearned from Building a Travel Site with Graphs and Neo4j Eddy Wong CTO, Wanderu.com @eddywongch
    • About Wanderu.com Search Engine for (Intercity) Buses and Trains
    • Demo
    • From pt A to pt B • A: Boston • B: DC • NYC • Nomenclature: Stations, Trips • Amtrak, $101, 09/26/2013 • Bolt, $25, 09/26/2013 • Mega, $24, 09/26/2013
    • From pt A to pt B • A: Cambridge, MA • B: Brooklyn, NY • South Station, Boston • 31st & 9th Ave, NYC • 28th & 7th Ave, NYC • 34th & 8th Ave, NYC
    • Our Story • Tech Started about 1+ yr ago • Beta in Mar, Launch in Aug • Knew nothing about Neo4j when we started (Jun 2012) • Did not like the relational model: wanted schema-less and no self-joins • Wanted a graph model
    • Relational vs. Graph
    • Lessons Learned / Unlearned • Idea • Architectural • Modeling • Geo
    • Architectural Lessons Art: MC Escher
    • Our Story • Started with MongoDB as a general store: easy to manipulate and organize data • Wanted a db that could preserve the Graph Model • Debated: Document vs. Graph • Could not find one single db that could do both: general store + graph
    • Workflow Store Scraping JSON Bus Websites Non-uniform Data Uniform Data Server
    • NoSQL • You need to make a choice of one NoSQL database • You need ONE (centralized) database • The word “database” is a loaded term • Lots of (very different) NoSQL db options
    • Our Situation • Data is written only in one direction • Users search for paths, then segments • Searches are done by date • Needed online capability • Trip info (price/avail) could change on some
    • Our Solution • Use Both: MongoDB + Neo4j • “Docugraph” = Document + Graph • Syncing two kinds of databases • Eventual consistency
    • Pipeline Scraping JSON Bus Websites Non-uniform Data Uniform Data MongoDB Neo4j Mongo Conn Nodes & Edges Replica Mechanism
    • MongoConnector • MongoDB Lab project, open source, unsupported • Uses Replica Mechanism: Oplog • Eventually Consistent (not real time) • Written in Python • Main methods: Upserts and Deletes, passes doc • Implement DocMgr->Neo4jDocMgr->py2neo • Other impls: MongoDocMgr, SolrDocMgr, ESDocMgr
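The upsert/delete flow on this slide can be sketched roughly as follows. This is a minimal illustration only: `InMemoryGraph` and `GraphDocManager` are hypothetical names, not the actual mongo-connector or py2neo API, and the graph is an in-memory stand-in so the example runs on its own. Document field names follow the station and trip examples later in the deck.

```python
class InMemoryGraph:
    """Hypothetical stand-in for a graph database client."""

    def __init__(self):
        self.nodes = {}  # station key -> properties
        self.edges = {}  # (depart key, arrive key) -> properties


class GraphDocManager:
    """Hypothetical doc manager: receives documents replayed from the oplog."""

    def __init__(self, graph):
        self.graph = graph

    @staticmethod
    def _is_trip(doc):
        # Assumption: trip documents carry both endpoint ids.
        return "depart_id" in doc and "arrive_id" in doc

    def upsert(self, doc):
        # Trips become edges, stations become nodes.
        if self._is_trip(doc):
            key = (doc["depart_id"], doc["arrive_id"])
            self.graph.edges[key] = {"carrier": doc.get("carrier")}
        else:
            self.graph.nodes[doc["id"]] = {}

    def remove(self, doc):
        if self._is_trip(doc):
            self.graph.edges.pop((doc["depart_id"], doc["arrive_id"]), None)
        else:
            self.graph.nodes.pop(doc["id"], None)
```

Because documents arrive some time after they are written to the source store, the graph lags behind it, which matches the "eventually consistent (not real time)" note on the slide.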
    • Populating Neo4j (2) • Created our own way of creating Edges • Auto Node creation when Edge is created: Could add Stations (nodes) on the fly • py2neo requires 2 “node ref”s to create an edge, ie. might need two round trips to Neo4j
    • Edge Creator P-code
        hashtable allStations = load_stations
        w_create_edge(station_id a, station_id b, otherdata)
            look_up a in allStations
            If found -> ref_a = allStations.get(a)
            If not found -> ref_a = py2neo.create_node(a); add a to allStations
            ...
            py2neo.create_edge(ref_a, ref_b, ...)
    • Pipeline Scraping JSON Bus Websites Non-uniform Data MongoDB Neo4j Mongo Conn Nodes & Edges Replica Mechanism REST Server BOS, NYC BOS, PHL NYC, DC NYC, PHL
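A runnable sketch of the edge-creator idea, under stated assumptions: `FakeGraphClient` is a hypothetical stand-in that only counts round trips, and its `create_node` / `create_edge` methods are not the real py2neo calls. The point is the cache of node references, which lets an edge be created without first looking up both endpoints in the database.

```python
class FakeGraphClient:
    """Hypothetical graph client; every call counts as one round trip."""

    def __init__(self):
        self.round_trips = 0
        self.nodes = []
        self.edges = []

    def create_node(self, station_id):
        self.round_trips += 1
        self.nodes.append(station_id)
        return len(self.nodes) - 1  # node reference

    def create_edge(self, ref_a, ref_b, **props):
        self.round_trips += 1
        self.edges.append((ref_a, ref_b, props))


class EdgeCreator:
    def __init__(self, client, all_stations=None):
        self.client = client
        # station_id -> node reference, preloaded if known
        self.all_stations = dict(all_stations or {})

    def _ref(self, station_id):
        ref = self.all_stations.get(station_id)
        if ref is None:
            # Unknown station: create the node on the fly and remember it.
            ref = self.client.create_node(station_id)
            self.all_stations[station_id] = ref
        return ref

    def create_edge(self, a, b, **other_data):
        ref_a = self._ref(a)
        ref_b = self._ref(b)
        self.client.create_edge(ref_a, ref_b, **other_data)
```

With an empty cache, the first edge costs three calls (two nodes plus the edge); once both stations are cached, each further edge between them costs one.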
    • Modeling Lessons Art: MC Escher
    • Our Story • We tried to “dump” all data into Neo4j • Stations -> Nodes, Trips -> Edges • Problem: Edges had dates -> too many Edges -> “Super Node” • Query perf was terrible (1+ mins) and worse as # edges increased
    • Our Story (2) • Went from Cypher to Gremlin, thinking that would improve performance • Needed range queries on Edges
    • Our Solution • Don’t store everything in Neo4j, only metadata • Use Neo4j as an index • Don’t store entities in Nodes, only keys • Don’t store heavy properties in Edges
    • Neo4j Model source: Tobias Lindaaker, Wes Freeman
    • Neo4j Runtime Model • Relationships are in a linked list • Properties are in a linked list • Therefore: There is NO random access for Relationships or Properties • A range query of relationships requires a full scan
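To illustrate the last point, here is a deliberately simplified model of relationships chained in a linked list. It is not Neo4j's actual storage format, and `Rel`, `build_chain`, and `range_scan` are names made up for this sketch. Without random access, a range query has to visit every relationship on the node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rel:
    depart_ts: int                # e.g. departure time as an integer
    next: Optional["Rel"] = None  # next relationship in the chain


def build_chain(timestamps):
    """Link relationships in the given (unsorted) order."""
    head = None
    for ts in reversed(timestamps):
        head = Rel(ts, head)
    return head


def range_scan(head, lo, hi):
    """Return matching timestamps and how many relationships were visited."""
    matches, visited = [], 0
    rel = head
    while rel is not None:
        visited += 1
        if lo <= rel.depart_ts <= hi:
            matches.append(rel.depart_ts)
        rel = rel.next
    return matches, visited
```

The visited count always equals the chain length, however narrow the range, which is why a station with one edge per dated trip becomes expensive to query.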
    • Our Solution (2) • Needed ability to do range queries on Edges • Serve paths from Neo4j, segments from MongoDB • The one thing we tried to avoid we ended up doing: Joins • Came up with “Docugraph” approach
    • Docugraph • MongoDB Collections for Nodes and Edges • Neo4j: Only keys for nodes • Neo4j: Only Properties relevant for queries
    • Nodes & Edges • Collection for Stations (nodes) {id: “BOS”, name: “Boston South Station”, address: “Summer St”, ...} • Collection for Trips (edges) {depart_id: “BOS”, arrive_id: “NYC”, carrier: “Megabus”, price: 24.0, ...}
    • Modeling • Storing info in two or more dbs • Doing a “join” across multiple dbs
    • Joins across DBs • MongoDB: Stations / Neo4j: Nodes: BOS, NYC, DC, ... • MongoDB: Trips / Neo4j: Edges: BOS-NYC, BOS-DC, NYC-DC, ... • Forget seq id generated by dbs • Use a human-created long string for id • Convert pair into id: depart-arrive • For example: BOS-NYC
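The shared-key convention can be sketched in a few lines. The helper names are hypothetical, and the sketch assumes station ids never contain a hyphen, so the pair can be recovered from the combined id.

```python
def edge_id(depart_id, arrive_id):
    """Build the shared edge key from the two station keys."""
    return f"{depart_id}-{arrive_id}"


def split_edge_id(eid):
    """Recover the station keys (assumes ids contain no hyphen)."""
    depart_id, arrive_id = eid.split("-", 1)
    return depart_id, arrive_id


# The same keys identify a record on both sides, so the "join" is a lookup.
stations = {"BOS": {"name": "Boston South Station"}}  # document-store side
graph_node_keys = {"BOS", "NYC", "DC"}                # graph side
joined = {k: stations[k] for k in graph_node_keys if k in stations}
```

Because neither database's generated sequence id is involved, a record can be re-created on either side without breaking the link between the two.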
    • Indexing Technique • Index Trips by {origin-dest, datetime}
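An in-memory analogue of indexing trips by {origin-dest, datetime}, as a sketch rather than the deck's actual implementation. `TripIndex` and the `departure` field are assumptions for illustration. Keeping each route's trips sorted by departure turns a date-range query into two binary searches instead of a full scan.

```python
import bisect
from collections import defaultdict
from datetime import datetime


class TripIndex:
    def __init__(self):
        # route id -> sorted list of (departure, sequence number, trip)
        self._by_route = defaultdict(list)
        self._seq = 0

    def add(self, trip):
        route = f"{trip['depart_id']}-{trip['arrive_id']}"
        self._seq += 1  # tie-breaker so trip dicts are never compared
        bisect.insort(self._by_route[route], (trip["departure"], self._seq, trip))

    def range(self, route, start, end):
        """Trips on `route` departing in the half-open interval [start, end)."""
        entries = self._by_route.get(route, [])
        lo = bisect.bisect_left(entries, (start,))
        hi = bisect.bisect_left(entries, (end,))
        return [trip for _, _, trip in entries[lo:hi]]


index = TripIndex()
index.add({"depart_id": "BOS", "arrive_id": "NYC", "carrier": "Bolt",
           "departure": datetime(2013, 9, 26, 14, 0)})
index.add({"depart_id": "BOS", "arrive_id": "NYC", "carrier": "Mega",
           "departure": datetime(2013, 9, 26, 8, 0)})
index.add({"depart_id": "BOS", "arrive_id": "NYC", "carrier": "Amtrak",
           "departure": datetime(2013, 9, 27, 8, 0)})
same_day = index.range("BOS-NYC", datetime(2013, 9, 26), datetime(2013, 9, 27))
```

Here `same_day` holds the two trips on 09/26/2013 in departure order; the 09/27 trip falls outside the half-open range.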
    • Querying • REST API in node.js • Assemble results from two sources • Paths from Neo4j • Segments from MongoDB • Sort by price, duration
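The assembly step might look roughly like this. The slide's API is written in node.js; Python is used here only to stay consistent with the other sketches. `assemble`, `duration_min`, and the sample data are assumptions, and the sketch simply picks the cheapest trip per leg without checking connection times.

```python
def assemble(paths, trips_by_route):
    """Combine paths (from the graph) with segments (from the document store).

    paths: list of paths, each a list of route ids such as "BOS-NYC".
    trips_by_route: route id -> list of trip dicts with price and duration_min.
    """
    results = []
    for path in paths:
        legs = []
        for route_id in path:
            candidates = trips_by_route.get(route_id, [])
            if not candidates:
                break  # no segment for this leg: drop the whole path
            legs.append(min(candidates, key=lambda t: (t["price"], t["duration_min"])))
        else:
            results.append({
                "path": path,
                "price": sum(t["price"] for t in legs),
                "duration_min": sum(t["duration_min"] for t in legs),
                "legs": legs,
            })
    # Sort by price, then duration, as on the slide.
    return sorted(results, key=lambda r: (r["price"], r["duration_min"]))


paths = [["BOS-DC"], ["BOS-NYC", "NYC-DC"], ["BOS-PHL", "PHL-DC"]]
trips_by_route = {
    "BOS-DC": [{"price": 60.0, "duration_min": 480}],
    "BOS-NYC": [{"price": 25.0, "duration_min": 255},
                {"price": 24.0, "duration_min": 270}],
    "NYC-DC": [{"price": 20.0, "duration_min": 270}],
}
results = assemble(paths, trips_by_route)
```

The graph supplies only a handful of candidate paths, so the per-leg segment lookups stay small.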
    • Geo Lessons Art: MC Escher
    • Our Story • Wanted to mix public transport data with intercity data • Did not want to host all public transport data • Created a hybrid solution
    • Our Solution • Hybrid: • Google Autocomplete • Google Maps • In house station geo lookup
    • Geo • Neo4j geo func was not out of the box • Requires jar install • Run a Java program to index • Needed better doc • Ended up using MongoDB geo instead • Make geo func out of the box
    • Conclusions • Even with a join across dbs -> solution better than relational • 10s paths x 100s segments vs. 500k x 500k • Glad to have picked Neo4j: doing content gen and more geo features now • Graph model will be useful for future analytics->Big Data
    • Useful Links • Neo4j Internals slideshare.net/thobe/an-overview-of-neo4j-internals • Aseem’s Lessons Learned with Neo4j http://aseemk.com/talks/neo4j-lessons-learned#/14 • Wes Freeman, Neo4j Internals http://wes.skeweredrook.com/graphdb-meetup-may-2013.pdf • MongoConnector blog.mongodb.org/post/29127828146/introducing-mongo-connector