SAN FRANCISCO | 10.22.2014 THE BUSINESS GRAPH
The Business Graph
(Why we chose Neo4j to rebuild CrunchBase)
Kurt Freytag
Head of Product, CrunchBase
kurt@crunchbase.com
415.891.7761
@kfreytag
5’10”, 155lbs.
Coding since 1977
Who Am I?
• Concise History of CrunchBase
• Our Vision
• Why Neo4j?
• Building w/ Neo4j & The Web
• Q&A
What am I Talking About?
• Started in 2007 by Michael Arrington
• Zero dedicated staff from 2007-2013
• Organically became source of truth for Startup Ecosystem
• Millions of Monthly Users
• Ran on two crappy AWS servers
History of CrunchBase - In One Slide
MySQL 5.0, Rails 2.0
• The Complete Graph of the Connected Business World
• Entities: people, products, companies
• Activities: fundings, acquisitions, job changes
• Connections: how everything relates
• Time: the lifecycle of every element
• World’s Most Powerful Startup Community
• Open to all
The Vision of CrunchBase
Emil Eifrem
Founder
• A natural way of modeling data
Why Neo4j?
Neo Technology
Company
Neo4j Enterprise Edition
Product
Seed Round
Funding
Sunstone Capital
Investor
Conor Venture Partners
Investor
Lars Nordwall
COO
Philip Rathle
VP of Products
GraphConnect 2014
Event
Kurt Freytag
Speaker
• A natural way of modeling data
• Adapts easily to changing requirements
Why Neo4j?
Neo Technology
Company
Seed Round
Funding
Sunstone Capital
Investor
Conor Venture Partners
Investor
Investment
Investment
John Smith
Lead Investor
John Smith
Lead Investor
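The change pictured on this slide (an Investment node inserted between the funding round and each investor, so a lead investor can be recorded) can be sketched with a tiny in-memory graph. This is only an illustration under stated assumptions: `Graph`, `Node`, `Edge`, and the relationship names `has_investor`, `has_investment`, `made_by`, and `led_by` are hypothetical and are not Neo4j's API or CrunchBase's actual schema.

```ruby
# A tiny in-memory property graph; illustrative only, not Neo4j's API.
Node = Struct.new(:label, :name)
Edge = Struct.new(:from, :type, :to)

class Graph
  attr_reader :nodes, :edges

  def initialize
    @nodes = []
    @edges = []
  end

  def add_node(label, name)
    node = Node.new(label, name)
    @nodes << node
    node
  end

  def connect(from, type, to)
    @edges << Edge.new(from, type, to)
  end

  # Nodes reachable from `node` over outgoing edges of the given type.
  # Identity (equal?) is used so two nodes with the same name stay distinct.
  def neighbors(node, type)
    @edges.select { |e| e.from.equal?(node) && e.type == type }.map(&:to)
  end
end

graph    = Graph.new
round    = graph.add_node(:funding, "Seed Round")
sunstone = graph.add_node(:investor, "Sunstone Capital")

# First version of the model: the round points straight at its investor.
graph.connect(round, :has_investor, sunstone)

# New requirement: record who led each investment. An Investment node is
# added between the round and the investor; existing nodes are untouched.
investment = graph.add_node(:investment, "Investment")
lead       = graph.add_node(:person, "John Smith")
graph.connect(round, :has_investment, investment)
graph.connect(investment, :made_by, sunstone)
graph.connect(investment, :led_by, lead)

lead_names = graph.neighbors(round, :has_investment)
                  .flat_map { |inv| graph.neighbors(inv, :led_by) }
                  .map(&:name)
```

The point of the sketch is that the new requirement only adds nodes and edges. Nothing resembling a table migration is needed, and the original `has_investor` edge keeps working for code that still relies on it.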
• A natural way to model data
• Adapts easily to changing requirements
• Built-In Business Intelligence
• Very specific or very general questions
• We don’t know the questions in advance
Why Neo4j?
select
if (tg.described_count > 1, 'complex', 'basic') dup
o.normalized_name,
concat('=hyperlink("http://www.crunchbase.com', o.p
ifnull(o.domain, '') domain,
ifnull(o.homepage_url, '') homepage_url,
if(o.status = 'unknown', '', o.status) status,
o.permalink,
ifnull(o.investment_rounds, '') investment_rounds,
ifnull(o.funding_rounds, '') funding_rounds,
ifnull(o.relationships, '') relationships,
ifnull(o.milestones, '') milestones,
if( o.logo_url is null, '', 'Yes') has_logo,
length(ifnull(o.overview, '')) overview_length,
ifnull(o.created_by, '') created_by,
date_format(o.created_at, '%Y-%m-%d %H:%i:%s') crea
UNIX_TIMESTAMP(o.created_at) ts,
( ifnull(o.investment_rounds, 0)*20 +
ifnull(o.funding_rounds, 0)*20 +
ifnull(o.relationships, 0)*10 +
ifnull(o.milestones, 0) +
length(ifnull(o.overview, '')) +
if( o.logo_url is null, 0, 50)) entity_rank,
o.entity_type,
o.entity_id
from cb_objects o
join t_duplicate_objects td on td.object_id = o.id
join t_duplicate_groups tg on tg.id = td.duplicate_
EXPLAIN PLAN
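For readers who find the query above hard to follow, its `entity_rank` expression can be restated in Ruby. This is a sketch of the arithmetic only: the hash keys mirror the column names in the SQL, `nil` stands in for SQL NULL, and the sample row is made up for illustration.

```ruby
# Ruby restatement of the entity_rank expression from the SQL above.
# Keys mirror the SQL column names; nil plays the role of NULL.
def entity_rank(row)
  (row[:investment_rounds] || 0) * 20 +
    (row[:funding_rounds] || 0) * 20 +
    (row[:relationships] || 0) * 10 +
    (row[:milestones] || 0) +
    # MySQL length() counts bytes; String#length counts characters.
    # The two agree for plain ASCII text such as the sample below.
    (row[:overview] || "").length +
    (row[:logo_url].nil? ? 0 : 50)
end

# Hypothetical row, not real CrunchBase data.
sample = {
  investment_rounds: 1,
  funding_rounds: 2,
  relationships: 3,
  milestones: 4,
  overview: "Graph database",
  logo_url: nil
}

rank = entity_rank(sample) # 20 + 40 + 30 + 4 + 14 + 0 = 108
```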
• A natural way of modeling data
• Adapts easily to changing requirements
• Built-In Business Intelligence
• Very specific or very general questions
• We don’t know the questions in advance
• Directly maps to our OO thinking
Why Neo4j?
class Organization < BaseEntity
relationship :has_funding_round,
relationship :has_customer,
relationship :sponsors_event,
...
end
Neo Technology
Company
class FundingRound < BaseActivity
attribute :announced_on,
attribute :closed_on,
attribute :funding_type,
attribute :series,
attribute :money_raised,
attribute :post_money_valuation,
...
end
Seed Round
Funding
class HasFundingRound < BaseRelationship
relationship :has_funding_round,
relationship :has_customer,
relationship :sponsors_event,
...
end
has_funding_round
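The `attribute` and `relationship` declarations on this slide read like class-level macros. A minimal sketch of how such macros could work in plain Ruby follows. `SketchBase` is a hypothetical stand-in for the `BaseEntity` and `BaseActivity` classes named above, whose real implementation the slides do not show, and the extra arguments that are cut off on the slide are simply left out.

```ruby
# Hypothetical base class; the real BaseEntity / BaseActivity are not shown
# in the talk, so this is only one way such macros might be built.
class SketchBase
  class << self
    # Property names declared on this class.
    def attributes
      @attributes ||= []
    end

    # Relationship types declared on this class.
    def relationships
      @relationships ||= []
    end

    # Declares a property and generates a reader/writer pair for it.
    def attribute(name)
      attributes << name
      attr_accessor name
    end

    # Declares an outgoing relationship type and generates a reader that
    # returns the (mutable) list of objects connected through it.
    def relationship(name)
      relationships << name
      define_method(name) do
        @related ||= Hash.new { |hash, key| hash[key] = [] }
        @related[name]
      end
    end
  end
end

class FundingRound < SketchBase
  attribute :announced_on
  attribute :funding_type
  attribute :money_raised
end

class Organization < SketchBase
  attribute :name
  relationship :has_funding_round
end

seed = FundingRound.new
seed.funding_type = "seed"

org = Organization.new
org.name = "Example Org" # made-up name
org.has_funding_round << seed
```

Each class ends up carrying its own list of declared properties and relationship types, which is the information a mapping layer would need to translate objects into nodes and edges.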
• A natural way of modeling data
• Adapts easily to changing requirements
• Built-In Business Intelligence
• Very specific or very general questions
• We don’t know the questions in advance
• Directly maps to our OO thinking
• We move faster
• Just launched CrunchBase Events @ TC Disrupt London
• Design, development, QA, and release took 2 weeks
Why Neo4j?
Okay, if Neo’s so awesome, why doesn’t everybody use it?
• CGI
• design a data model
• roll-your-own database connection
• manually write all your queries
• ORM (Hibernate, Doctrine)
• design a data model
• build the objects
• map ‘em through configuration
Databases & the Web - A Brief History
• Today’s languages use datastores as dumb repos
• Generate schemas from code
• Isolate developer from writing queries
• Focus on business logic, not data
• Couple of Problems
• The DBA role existed for a reason
• Data modeling is the foundation of a scalable architecture
• Generated queries can easily be 1,000x less efficient
• Quick development can lead to slow applications
Database as a Commodity
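One common way generated queries end up far less efficient is the "N+1" pattern, where a mapping layer issues one query per object instead of one query for the whole set. The talk does not name this pattern, so treat it as an illustrative assumption. The sketch below uses a fake datastore (`CountingStore`, hypothetical) that only counts round trips; no real database is involved.

```ruby
# A fake datastore that only counts round trips; no real database involved.
class CountingStore
  attr_reader :query_count

  def initialize(rounds_by_org)
    @rounds_by_org = rounds_by_org
    @query_count = 0
  end

  def org_ids
    @query_count += 1
    @rounds_by_org.keys
  end

  # One round trip per organization: what naive lazy loading tends to do.
  def rounds_for(org_id)
    @query_count += 1
    @rounds_by_org.fetch(org_id)
  end

  # One round trip for all organizations: what a hand-written query can do.
  def rounds_for_all(org_ids)
    @query_count += 1
    org_ids.map { |id| [id, @rounds_by_org.fetch(id)] }.to_h
  end
end

# 1,000 made-up organizations with one funding round each.
data = (1..1000).map { |id| [id, ["round-#{id}"]] }.to_h

naive = CountingStore.new(data)
naive.org_ids.each { |id| naive.rounds_for(id) }

batched = CountingStore.new(data)
batched.rounds_for_all(batched.org_ids)
```

Here the naive loop makes 1,001 round trips and the batched version makes 2. Both return the same data, so the difference is invisible in the application code, which is how quick development can quietly turn into a slow application.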
• Neo4j is tough to adopt
• Languages don’t support it out-of-the-box
• The tools / drivers that exist are immature
• Neo4j is not plug-n-play
• However…
• Neo4j is ideal for Object-Oriented development
• Graphs are a natural fit for many use cases
• We need to make Neo4j as easy to choose as MySQL
Means that…
• ActiveRecord for Neo4j
• Implements a lot of ActiveModel
• Validations
• Serialization
• Callbacks
• Handles all Marshalling / UnMarshalling
• “Feels” like ActiveRecord
• Makes Neo4j plug-n-play for Rails
• We Will Open Source It
“Deja”
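Deja's API is not shown in the talk, so the following is not Deja. It is a plain-Ruby sketch of the three ideas the slide lists (validations, callbacks, and marshalling to and from a node-shaped structure), with every class and method name (`SketchModel`, `validates_presence_of`, `before_save`, `to_node`, `from_node`) chosen here for illustration.

```ruby
# Hypothetical mini-model; not Deja's API, which the talk does not show.
class SketchModel
  class << self
    def validations
      @validations ||= []
    end

    def callbacks
      @callbacks ||= []
    end

    # Registers a presence check for a property.
    def validates_presence_of(name)
      validations << name
    end

    # Registers a block to run just before the object is marshalled.
    def before_save(&block)
      callbacks << block
    end

    # "Unmarshals" a node-shaped hash back into a model instance.
    def from_node(node)
      new(node[:properties].dup)
    end
  end

  attr_reader :properties

  def initialize(properties = {})
    @properties = properties
  end

  # True when every validated property is present and non-empty.
  def valid?
    self.class.validations.all? { |name| !@properties[name].to_s.empty? }
  end

  # "Marshals" the object into a label plus a property hash, the shape a
  # graph node would need. Returns nil when validation fails.
  def to_node
    return nil unless valid?

    self.class.callbacks.each { |callback| instance_exec(&callback) }
    { label: self.class.name, properties: @properties.dup }
  end
end

class Person < SketchModel
  validates_presence_of :name
  before_save { @properties[:name] = @properties[:name].strip }
end

node = Person.new(name: "  Kurt Freytag ").to_node
copy = Person.from_node(node)
```

The calling code never touches a query or a driver. It declares rules on the class and asks for a node, which is roughly the "feels like ActiveRecord" experience the slide describes.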
Thanks. Enjoy.

GraphConnect 2014 SF: The Business Graph
