What Is a Graph Database?
A graph database is an online (“real-time”) database management system with CRUD methods that expose a graph data model.
• Two important properties:
  • Native graph processing, including index-free adjacency[1] to facilitate traversals
  • Native graph storage engine, i.e. written from the ground up to be optimized for managing graph data
[1] See Rodriguez, M.A., Neubauer, P., “The Graph Traversal Pattern,” 2010 (http://arxiv.org/abs/1004.1001)
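Index-free adjacency means that each node holds direct references to its neighbours. A traversal therefore hops from node to node without consulting a global index. The sketch below is a minimal in-memory illustration of that idea in Python. It is not how Neo4j's storage engine is implemented, and the `Node` class and `reachable` helper are hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can live in sets
class Node:
    """Hypothetical in-memory node that stores direct references to its neighbours."""
    name: str
    out: list[Node] = field(default_factory=list)

    def connect(self, other: Node) -> None:
        # The neighbour object itself is stored: following this edge later is a
        # pointer dereference, with no lookup in a global index.
        self.out.append(other)


def reachable(start: Node, max_hops: int) -> set[str]:
    """Names of all nodes within max_hops hops, found by breadth-first traversal."""
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbour in node.out:  # only this node's own adjacency list is read
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return {n.name for n in seen}


# Usage: a small chain a -> b -> c -> d
a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.connect(b)
b.connect(c)
c.connect(d)
print(sorted(reachable(a, 2)))  # ['a', 'b', 'c']
```

Each hop costs time proportional to the number of neighbours of the current node, not to the total size of the graph. That is the property that makes deep traversals cheap.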
Overview of Popular Graph Data Models
• Property Graph
  • Description: A “directed, labeled, attributed, multi-graph”[1] that exposes three building blocks: nodes, typed relationships, and key-value properties on both nodes and relationships
  • Vendors: Neo4j, OrientDB, InfiniteGraph, Dex
• RDF Triples
  • Description: URI-centered subject-predicate-object triples, as pioneered by the semantic web movement[2]
  • Vendors: AllegroGraph, Sesame
• HyperGraph
  • Description: A generalized graph in which a relationship can connect an arbitrary number of nodes (compared to the more common binary graph models)[3]
  • Vendors: HyperGraphDB, TrinityDB
[1] Rodriguez, M.A., Neubauer, P., “Constructions from Dots and Lines,” 2010, http://arxiv.org/abs/1006.2361
[2] W3C, “The Resource Description Framework (RDF),” 2004, http://www.w3.org/RDF/
[3] Wikipedia, http://en.wikipedia.org/wiki/Hypergraph
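To contrast the three models, here is a small Python sketch of how similar facts might be shaped in each one. It uses plain data structures and hypothetical example data (Alice, Bob, Carol and the `example.org` namespace). It illustrates the shapes only and is not any vendor's API.

```python
from dataclasses import dataclass, field

# --- Property graph: nodes, typed directed relationships, properties on both ---
@dataclass
class PGNode:
    node_id: int
    props: dict = field(default_factory=dict)


@dataclass
class PGRel:
    start: int        # relationships are directed: start -> end
    rel_type: str     # each relationship carries one type
    end: int
    props: dict = field(default_factory=dict)


nodes = {
    1: PGNode(1, {"name": "Alice"}),
    2: PGNode(2, {"name": "Bob"}),
    3: PGNode(3, {"name": "Carol"}),
}
# Multi-graph: more than one relationship may join the same pair of nodes.
rels = [
    PGRel(1, "KNOWS", 2, {"since": 2010}),
    PGRel(1, "WORKS_WITH", 2),
]

# --- RDF: every fact is a (subject, predicate, object) triple keyed by URIs ---
EX = "http://example.org/"  # hypothetical namespace
triples = {
    (EX + "alice", EX + "name", "Alice"),
    (EX + "bob", EX + "name", "Bob"),
    (EX + "alice", EX + "knows", EX + "bob"),
}

# --- Hypergraph: one relationship (hyperedge) may connect any number of nodes ---
hyperedges = [("ATTENDED_MEETING", frozenset({1, 2, 3}))]
```

The `since` property on the KNOWS relationship has no direct place in a plain triple. RDF needs additional triples, for example via reification, to attach data to a statement. The property graph stores it on the relationship itself.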
Graph Compute Engine
Processing platforms that enable global graph computational algorithms to be run against large data sets
[Diagram: System(s) of Record → Data extraction, transformation, and load → Graph Compute Engine (In-Memory Processing); Graph Mining Engine (Working Storage)]
Neo Technology, Inc Confidential
Global Graph Queries
What is the max/min/avg. number of connections per node?
(summary statistics of the “degree distribution”)
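A degree-distribution query is "global" because it must touch every node rather than traverse outward from one starting point. The minimal Python sketch below computes the min, max and average degree plus the full histogram. It assumes an undirected edge list held in memory, and `degree_stats` is a hypothetical helper name.

```python
from collections import Counter


def degree_stats(edges):
    """Min, max, and mean degree plus a degree histogram for an undirected graph.

    `edges` is a non-empty iterable of (u, v) pairs; each edge adds 1 to the
    degree of both endpoints. Nodes that appear in no edge are not counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    values = list(degree.values())
    histogram = Counter(values)  # degree -> number of nodes with that degree
    return min(values), max(values), sum(values) / len(values), histogram


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
lo, hi, mean, hist = degree_stats(edges)
print(lo, hi, mean)  # 1 3 2.0
```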
What Can You Do with a Graph Database?
Example: Facebook Graph Search
For the Facebook Graph Question:
What sushi restaurants in NYC do my friends like?
What the Graph Looks Like:
What sushi restaurants in NYC do my friends like?
What the Cypher Query Looks Like:
What sushi restaurants in NYC do my friends like?
START me=node:person(name = "Philip"),
      location=node:location(location = "New York"),
      cuisine=node:cuisine(cuisine = "Sushi")
MATCH (me)-[:IS_FRIEND_OF]->(friend)-[:LIKES]->(restaurant)-[:LOCATED_IN]->(location),
      (restaurant)-[:SERVES]->(cuisine)
RETURN DISTINCT restaurant  // several friends may like the same restaurant
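To show what the MATCH pattern asks the database to do, here is a hand-written Python equivalent over a tiny in-memory graph. The people, restaurants and cities are hypothetical toy data, and `friends_restaurants` is an illustrative name. This only mirrors the shape of the traversal and is not how Cypher executes a query.

```python
from collections import defaultdict

# Hypothetical toy data: (start node, relationship type, end node).
EDGES = [
    ("Philip", "IS_FRIEND_OF", "Ann"),
    ("Philip", "IS_FRIEND_OF", "Raj"),
    ("Ann", "LIKES", "Sushi Zen"),
    ("Ann", "LIKES", "Tokyo Bar"),
    ("Raj", "LIKES", "Sushi Zen"),
    ("Raj", "LIKES", "Taco Place"),
    ("Sushi Zen", "LOCATED_IN", "New York"),
    ("Tokyo Bar", "LOCATED_IN", "Boston"),
    ("Taco Place", "LOCATED_IN", "New York"),
    ("Sushi Zen", "SERVES", "Sushi"),
    ("Tokyo Bar", "SERVES", "Sushi"),
    ("Taco Place", "SERVES", "Mexican"),
]

# Group relationships by (node, type) so each hop reads one small list
# instead of scanning every edge.
ADJ = defaultdict(list)
for start, rel_type, end in EDGES:
    ADJ[(start, rel_type)].append(end)


def out(node, rel_type):
    return ADJ.get((node, rel_type), [])


def friends_restaurants(me, location, cuisine):
    """Restaurants liked by `me`'s friends, located in `location`, serving `cuisine`."""
    found = set()  # a set removes duplicates, like Cypher's DISTINCT
    for friend in out(me, "IS_FRIEND_OF"):
        for restaurant in out(friend, "LIKES"):
            if location in out(restaurant, "LOCATED_IN") and cuisine in out(restaurant, "SERVES"):
                found.add(restaurant)
    return found


print(friends_restaurants("Philip", "New York", "Sushi"))  # {'Sushi Zen'}
```

Both friends like "Sushi Zen", but the set returns it once. The other liked restaurants fail either the location check or the cuisine check.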
What the Search Looks Like:
What sushi restaurants in NYC do my friends like?
What Other Graph Searches Look Like
What drugs will bind to protein X and not interact with drug Y?
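This question combines a positive pattern (the drug binds protein X) with a negative one (the drug has no interaction with drug Y). The Python sketch below shows that combination over hypothetical toy relationships. The drug and protein names and the `safe_binders` helper are placeholders, not real pharmacological data.

```python
# Hypothetical toy relationships; all names are placeholders.
BINDS_TO = {
    "drugA": {"proteinX"},
    "drugB": {"proteinX", "proteinZ"},
    "drugC": {"proteinZ"},
    "drugD": {"proteinX"},
}
# Drug-drug interactions are symmetric, so each pair is stored as a frozenset.
INTERACTS = {frozenset({"drugB", "drugY"}), frozenset({"drugD", "drugY"})}


def safe_binders(protein, other_drug):
    """Drugs that bind `protein` and have no recorded interaction with `other_drug`."""
    return {
        drug
        for drug, targets in BINDS_TO.items()
        if protein in targets                                 # positive pattern
        and frozenset({drug, other_drug}) not in INTERACTS    # negative pattern
    }


print(safe_binders("proteinX", "drugY"))  # {'drugA'}
```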
Background
• World’s largest provider of IT infrastructure, software & services
• HP’s Unified Correlation Analyzer (UCA) application is a key application inside HP’s OSS Assurance portfolio
• Carrier-class resource & service management, problem determination, root cause & service impact analysis
• Helps communications operators manage large, complex, and fast-changing networks
Business problem
• Use network topology information to identify the root causes of problems on the network
• Simplify alarm handling by human operators
• Automate handling of certain types of alarms
• Help operators respond rapidly to network issues
• Filter/group/eliminate redundant Network Management System alarms by event correlation
Solution & Benefits
• Accelerated product development time
• Extremely fast querying of network topology
• Graph representation is a perfect domain fit
• 24x7 carrier-grade reliability with Neo4j HA clustering
• Met objective in under 6 months
Industry: Web/ISV, Communications
Use case: Network Management
Global (U.S., France)
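One way topology can support event correlation is to treat an alarm as a probable root cause when nothing it depends on is also alarming. Every other alarm is then grouped under the root cause it sits downstream of. The Python sketch below shows that rule over a hypothetical dependency map. It is an illustrative assumption and not the actual correlation logic of HP's UCA; `DEPENDS_ON`, `upstream` and `correlate` are made-up names.

```python
# Hypothetical topology: element -> elements it directly depends on (upstream).
DEPENDS_ON = {
    "router1": [],
    "switchA": ["router1"],
    "switchB": ["router1"],
    "server1": ["switchA"],
    "server2": ["switchB"],
    "server3": ["switchB"],
}


def upstream(element):
    """All elements that `element` transitively depends on (depth-first walk)."""
    seen, stack = set(), list(DEPENDS_ON.get(element, []))
    while stack:
        e = stack.pop()
        if e not in seen:
            seen.add(e)
            stack.extend(DEPENDS_ON.get(e, []))
    return seen


def correlate(alarms):
    """Map each probable root-cause alarm to the downstream alarms it explains."""
    alarms = set(alarms)
    # A root cause has no alarming element anywhere upstream of it.
    roots = {a for a in alarms if not (upstream(a) & alarms)}
    return {r: {a for a in alarms - roots if r in upstream(a)} for r in roots}


result = correlate({"switchB", "server2", "server3"})
print({root: sorted(sym) for root, sym in result.items()})  # {'switchB': ['server2', 'server3']}
```

Under this rule, an operator would see one actionable alarm (switchB) instead of three.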
Background
• One of the world’s largest logistics carriers
• Projected to outgrow the capacity of its old system
• New parcel routing system
• Single source of truth for the entire network
• B2C & B2B parcel tracking
• Real-time routing: up to 5M parcels per day
Business problem
• 24x7 availability, year round
• Peak loads of 2500+ parcels per second
• Complex and diverse software stack
• Need for predictable performance & linear scalability
• Daily changes to the logistics network: route from any point to any point
Solution & Benefits
• Neo4j provides the ideal domain fit:
  • a logistics network is a graph
• Extreme availability & performance with Neo4j clustering
• Hugely simplified queries vs. relational for complex routing
• Flexible data model can reflect real-world data variance much better than relational
• “Whiteboard friendly” model that is easy to understand
Industry: Logistics
Use case: Parcel Routing
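"Route from any point to any point" is a shortest-path problem on the network graph. The Python sketch below uses Dijkstra's algorithm with `heapq`. That algorithm is a standard choice made here for illustration and is not necessarily what the carrier uses. The hubs and transit hours are hypothetical.

```python
import heapq

# Hypothetical network: hub -> {neighbouring hub: transit hours}.
NETWORK = {
    "Depot": {"HubA": 2, "HubB": 5},
    "HubA": {"HubB": 1, "HubC": 6},
    "HubB": {"HubC": 2},
    "HubC": {},
}


def route(network, source, target):
    """Dijkstra's shortest path; returns (total_hours, path) or None if unreachable."""
    queue = [(0, source, [source])]
    best = {}  # cheapest settled cost per hub
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already reached this hub more cheaply
        best[node] = cost
        for nxt, hours in network.get(node, {}).items():
            heapq.heappush(queue, (cost + hours, nxt, path + [nxt]))
    return None


print(route(NETWORK, "Depot", "HubC"))  # (5, ['Depot', 'HubA', 'HubB', 'HubC'])
# A daily network change (here: a slower HubA -> HubB link) is just an edge update.
NETWORK["HubA"]["HubB"] = 10
print(route(NETWORK, "Depot", "HubC"))  # (7, ['Depot', 'HubB', 'HubC'])
```

Because routes are computed on the current graph, a change to the network takes effect on the next query without any schema migration.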
Industry: Online Job Search
Use case: Social / Recommendations
Sausalito, CA
Background
• Online jobs and career community, providing anonymized inside information to job seekers
Business problem
• Wanted to leverage the known fact that most jobs are found through personal & professional connections
• Needed to rely on an existing source of social network data; Facebook was the ideal choice
• End users needed instant gratification
• Aiming to have the best job search service in a very competitive market
Solution & Benefits
• First to market with a product that let users find jobs through their network of Facebook friends
• Job recommendations served in real time from Neo4j
• Individual Facebook graphs imported into Neo4j in real time
• Glassdoor now stores > 50% of the entire Facebook social graph
• Neo4j cluster has grown seamlessly, with new instances brought online as graph size and load have increased
[Diagram: Person nodes connected by KNOWS relationships; Person nodes linked to Company nodes by WORKS_AT relationships]
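Using the Person–KNOWS–Person and Person–WORKS_AT–Company shape, one simple recommendation ranks companies by how many of a user's contacts work there. The Python sketch below does this within a bounded number of KNOWS hops. It is an assumption for illustration, not Glassdoor's actual ranking. The people, companies and `company_leads` helper are hypothetical.

```python
from collections import Counter

# Hypothetical sample graph using the slide's relationship types.
KNOWS = {
    "you": {"ana", "ben"},
    "ana": {"you", "cho"},
    "ben": {"you", "cho"},
    "cho": {"ana", "ben"},
}
WORKS_AT = {"ana": "Acme", "ben": "Globex", "cho": "Acme"}


def company_leads(person, max_hops=2):
    """Rank companies by how many contacts within `max_hops` KNOWS hops work there."""
    seen = {person}
    frontier = {person}
    for _ in range(max_hops):
        # Expand one hop, skipping people already visited.
        frontier = {f for p in frontier for f in KNOWS.get(p, ())} - seen
        seen |= frontier
    contacts = seen - {person}
    return Counter(WORKS_AT[c] for c in contacts if c in WORKS_AT).most_common()


print(company_leads("you"))  # [('Acme', 2), ('Globex', 1)]
```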
Industry: Communications
Use case: Recommendations
Cisco.com, San Jose, CA
Background
• Cisco.com serves its customers, including business customers, with Support Services
• Needed real-time recommendations to encourage use of the online knowledge base
• Cisco had been successfully using Neo4j for its internal master data management solution
• Identified a strong fit for online recommendations
Business problem
• Call center volumes needed to be lowered by improving the efficacy of online self-service
• Leverage large amounts of knowledge stored in service cases, solutions, articles, forums, etc.
• Problem resolution times, as well as support costs, needed to be lowered
Solution & Benefits
• Cases, solutions, articles, etc. continuously scraped for cross-reference links and represented in Neo4j
• Real-time reading recommendations via Neo4j
• Neo4j Enterprise with HA cluster
• The result: customers obtain help faster, with decreased reliance on customer support
[Diagram: SupportCase, Solution, KnowledgeBaseArticle, and Message nodes]
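Once cross-reference links are stored as a graph, a simple "also read" recommendation can count how often other documents are referenced by the same sources as the current one. The Python sketch below shows that co-reference count over hypothetical document IDs. This co-citation heuristic is an assumption for illustration and not Cisco's actual recommendation logic.

```python
from collections import Counter

# Hypothetical cross-reference links: source document -> documents it references.
LINKS = {
    "case-1": {"kb-10", "kb-11", "sol-5"},
    "case-2": {"kb-10", "kb-12"},
    "case-3": {"kb-10", "kb-11"},
    "msg-7": {"kb-12"},
}


def also_read(doc, top=3):
    """Documents most often referenced by the same sources that reference `doc`."""
    scores = Counter()
    for refs in LINKS.values():
        if doc in refs:
            scores.update(refs - {doc})  # every co-referenced document gets +1
    return [d for d, _ in scores.most_common(top)]


print(also_read("kb-10")[0])  # kb-11
```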
Interactive Television Programming
Industry: Communications
Use case: Social gaming
Frankfurt, Germany
Background
• Europe’s largest communications company
• Provider of mobile & landline telephone service to consumers and businesses, as well as internet services, television, and other services
Business problem
• The Fanorakel application gives fans an interactive experience while watching sports
• Fans can vote on referee decisions and interact with other fans watching the game
• Highly connected dataset with real-time updates
• Queries need to be served in real time on rapidly changing data
• One technical challenge is handling the very high spikes of activity during popular games
Solution & Benefits
• Interactive, social offering gives fans a way to experience the game more closely
• Increased customer stickiness for Deutsche Telekom
• A completely new channel for reaching customers with information, promotions, and ads
• Clear competitive advantage
Reasons for Choosing a Graph Database
1. Order-of-magnitude improvements in query performance for complex, connected data
2. Drastically accelerated application development cycles
3. Maintainability and extensibility of the data model
4. Maturity and reliability of the product