Commercial Graph at Intuit
Gokuldas Pillai
Engineer, Data Services, Intuit
@gokool
Improving the lives of 60M people
…creates a unique and compelling set of data
1 in 3 Tax Returns
1 in 12 Americans Pay
$2.6T in Transactions
25 Million Questions Answered
From 1 to 50 Apps
7 Million Mobile Customers
45M Customers Using Connected Services
Is it time to hire?
Small Business Hiring Trends
My revenue increased 5%...is that good?
Revenue Comparisons
Am I spending more than my friends?
Spending Profiles
Auto $750
Rent $1,200
Groceries $400
Intuit Payment Graph
• Discover the latent network from multiple product data-stores
– Uniquely identify entities and their connections
– Connections scored by volume of trade
• Empower Business Unit (BU) teams to leverage the Intuit Payment Graph to build applications.
– Graph to be available for real-time access
The Graph Server provides rich profiles
Consumer Profile Facets
• Identity: Name, Address, Phone, Email, Mint Id, etc.
• Social: Facebook, Yelp, Twitter, etc.
• Demographics: Age, Gender, etc.
Business Profile Facets
• Identity: Name, Address, Phone, Email, QBO Id, etc.
• Social: Facebook, Yelp, Twitter, etc.
• Firmographics: Category, Revenue, Employees, etc.
And the buyer-seller relationships
May 2011: 3 purchases, $650.25
May 2011: 1 purchase, $25.95
Consumer
Business Business
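
The edge annotations on this slide (a month, a purchase count, a dollar total) can be produced by rolling raw purchases up per buyer-seller pair. A minimal sketch, assuming a hypothetical list of purchase tuples; the entity names and individual amounts below are made up for illustration and only the totals mirror the slide:

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical raw purchases: (buyer, seller, month, amount). Not real data.
purchases = [
    ("consumer-1", "business-A", "2011-05", Decimal("200.00")),
    ("consumer-1", "business-A", "2011-05", Decimal("150.25")),
    ("consumer-1", "business-A", "2011-05", Decimal("300.00")),
    ("business-A", "business-B", "2011-05", Decimal("25.95")),
]


def summarize_edges(rows):
    """Roll raw purchases up into one summary per (buyer, seller, month)."""
    edges = defaultdict(lambda: {"purchases": 0, "total": Decimal("0")})
    for buyer, seller, month, amount in rows:
        edge = edges[(buyer, seller, month)]
        edge["purchases"] += 1
        edge["total"] += amount
    return dict(edges)


edges = summarize_edges(purchases)
for (buyer, seller, month), summary in sorted(edges.items()):
    print(buyer, "->", seller, month, summary["purchases"], summary["total"])
```

Decimal is used instead of float so that currency totals add up exactly.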
Design
Fuzzy matching & de-duplicating entities
Company ABC
name: The Windsor Press, Inc.
address: PO Box 465 6 North Third Street
city: Hamburg
state: PA
zip: 19526
phone: (610) 562-2267
Company PQR
name: The Windsor Press
address: P.O. Box 465 6 North 3rd St.
city: Hamburg
state: PA
zip: 19526-0465
phone: (610) 562-2267
Both of the above vendor records map to external reference data:
Dun & Bradstreet
ID: 002114902
Name: The Windsor-Press Inc
Street: 6 N 3rd St
City: Hamburg
State: PA
Zip: 19526-1502
Phone: (610)-562-2267
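
A rough illustration of attribute-based scoring for records like these. This is a sketch under stated assumptions, not the scoring algorithm used at Intuit: the field weights (0.6 / 0.2 / 0.2), the normalization, and the use of Python's difflib for string similarity are all hypothetical choices. Candidate generation (a Lucene index lookup in the talk) is skipped, and the candidates are passed in directly.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """String similarity in [0, 1] on normalized text."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def digits(text):
    return re.sub(r"\D", "", text)


def match_score(record, reference):
    """Return a score in [0, 1], or None when no match is attempted."""
    if not record.get("state") and not record.get("zip"):
        return None  # no state and no zip code: do not attempt a match
    if record.get("state") and record["state"] != reference["state"]:
        return 0.0  # state restriction
    name = similarity(record["name"], reference["name"])
    address = similarity(record.get("address", ""), reference["address"])
    phone = digits(record.get("phone", ""))
    phone_match = 1.0 if phone and phone == digits(reference["phone"]) else 0.0
    # Hypothetical weights, chosen only for illustration.
    return 0.6 * name + 0.2 * address + 0.2 * phone_match


reference = {"name": "The Windsor-Press Inc", "address": "6 N 3rd St",
             "state": "PA", "zip": "19526-1502", "phone": "(610)-562-2267"}
company_abc = {"name": "The Windsor Press, Inc.",
               "address": "PO Box 465 6 North Third Street",
               "state": "PA", "zip": "19526", "phone": "(610) 562-2267"}
company_pqr = {"name": "The Windsor Press",
               "address": "P.O. Box 465 6 North 3rd St.",
               "state": "PA", "zip": "19526-0465", "phone": "(610) 562-2267"}

for record in (company_abc, company_pqr):
    print(record["name"], round(match_score(record, reference), 2))
```

Normalizing before comparison is what lets "The Windsor-Press Inc" and "The Windsor Press, Inc." compare as identical names; the addresses still differ substantially, which is why a single field is not enough on its own.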
Commercial Graph Architecture
Business names, address, phone, industry code
Real-time Applications
Request
Response
De-duped Nodes
Transactions
Invoices, bills, payments, vendors, customers
Categorization
Matching/De-duping
Offline analytics
Data Model
Company
Name: Acme Inc
Zip: 95134
…
Company
Name: Veva LLC
Zip: 94040
…
Product
Name: QuickBooks
…
Product
Name: Payroll
…
Relationship
:CUSTOMER
Txn Count: 125
No. of years: 1
Relationship
:LICENSED
No. of years: 8
Company
Name: Beta LLC
Location: 94043
…
Relationship
:CUSTOMER
Txn Count: 467
No. of years: 3
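
The model above is a property graph: typed nodes and typed relationships, each carrying key-value properties. A minimal in-memory sketch in Python follows. The node and relationship properties come from the slide, but the slide text does not say which nodes each relationship connects, so the endpoints chosen here are assumptions, as are the class and helper names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    properties: dict


@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


acme = Node("Company", {"name": "Acme Inc", "zip": "95134"})
veva = Node("Company", {"name": "Veva LLC", "zip": "94040"})
beta = Node("Company", {"name": "Beta LLC", "location": "94043"})
quickbooks = Node("Product", {"name": "QuickBooks"})
payroll = Node("Product", {"name": "Payroll"})

# Endpoints are assumed for illustration; only the properties are from the slide.
relationships = [
    Relationship(veva, "CUSTOMER", acme, {"txn_count": 125, "years": 1}),
    Relationship(beta, "CUSTOMER", acme, {"txn_count": 467, "years": 3}),
    Relationship(acme, "LICENSED", quickbooks, {"years": 8}),
]


def incoming(node, rel_type):
    """Relationships of the given type that end at this node."""
    return [r for r in relationships if r.end is node and r.rel_type == rel_type]


def outgoing(node, rel_type):
    """Relationships of the given type that start at this node."""
    return [r for r in relationships if r.start is node and r.rel_type == rel_type]


for rel in incoming(acme, "CUSTOMER"):
    print(rel.start.properties["name"], rel.properties["txn_count"])
```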
Data-model Demo
Scale
• Size of the graph
– 29 Mn Unique Nodes
– 315 Mn Properties
– 48 Mn Relationships
Referrals & recommendations
Connecting consumers with small businesses
Small business micro-communities
Big Data for the Little Guy
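Use case - Vendor Recommendation
// Older Cypher syntax (START clause, has(), trailing ?), as shown in the talk.
// Find the vendors of the vendors connected to node 23539.
START n=node(23539)
MATCH n-[:PAYS]-v-[:PAYS]-vov
// Keep only vendors-of-vendors whose industry description starts with "Legal".
WHERE has(vov.IC4_DESC)
  AND vov.IC4_DESC =~ 'Legal.*'
  AND not (ID(vov) = ID(v))
// The trailing ? yields null when the property is missing.
RETURN ID(vov), vov.ENTITY_TYPE, vov.CITY?, vov.IC4_DESC?
ORDER BY vov.loyalty;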
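
The same vendor-of-vendor idea can be sketched outside the database. The Python below is an illustration only: the node ids, property values, and PAYS pairs are hypothetical, and it approximates the query rather than reproducing Neo4j's semantics (it de-duplicates results and always skips the start node).

```python
import re

# Hypothetical node properties keyed by node id.
nodes = {
    1: {"ENTITY_TYPE": "business", "IC4_DESC": "Restaurants"},
    2: {"ENTITY_TYPE": "business", "IC4_DESC": "Food Wholesale"},
    3: {"ENTITY_TYPE": "business", "IC4_DESC": "Legal Services", "loyalty": 2},
    4: {"ENTITY_TYPE": "business", "IC4_DESC": "Legal Counsel", "loyalty": 1},
    5: {"ENTITY_TYPE": "business"},  # no IC4_DESC, so it is filtered out
}

# Hypothetical PAYS relationships, traversed in either direction.
pays = [(1, 2), (2, 3), (2, 4), (2, 5)]


def neighbors(node_id):
    """Nodes connected to node_id by a PAYS relationship, ignoring direction."""
    found = set()
    for a, b in pays:
        if a == node_id:
            found.add(b)
        elif b == node_id:
            found.add(a)
    return found


def recommend(start, pattern="Legal.*"):
    """Ids of vendors-of-vendors whose IC4_DESC fully matches the pattern."""
    found = set()
    for v in neighbors(start):
        for vov in neighbors(v):
            if vov in (start, v):
                continue
            desc = nodes[vov].get("IC4_DESC")
            if desc is not None and re.fullmatch(pattern, desc):
                found.add(vov)
    # Ascending loyalty, as in ORDER BY; a missing loyalty sorts as 0 here.
    return sorted(found, key=lambda node_id: nodes[node_id].get("loyalty", 0))


print(recommend(1))
```

re.fullmatch is used because the Cypher =~ operator matches the whole string rather than a substring.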
Why Neo4j
• Java – matched in-house skills
• Flexible/Supports quick exploration
• Easy admin functionality – set-up, adding data
• Built-in access points over HTTP (REST/JSON)
• SQL-like query language (Cypher is awesome!)
• Active mailing list
• Good documentation
• Vendor support
Neo4j for real-time graph applications
Cypher Query Language
START biz = node(100) MATCH biz-[TRANSACTS]-x RETURN x
Great for…
• Real time
• Cypher
• Built-in Algos
• Lucene search
Opportunity Areas…
• Horizontal scaling
• Access control
• Indexing
Experiment. Measure. Pivot. Persevere.
Privacy matters…a lot.
Build the right team.
Team
• 2 Engineers (100%)
• 2 Data Scientists (50%)
• 1 Product Manager
• We are hiring Data Engineers!
– http://careers.intuit.com/professional
Thank you.

Editor's Notes

  • #4 To give a sense of the kind of data and the rate of the data: 1 in 3 tax returns filed electronically goes through Intuit. Our payroll product enables paying salaries to employees of small businesses.
  • #5 Some of the work that the data team at Intuit drives. We create a rich profile of the spending of small businesses and consumers. For example, in Mint.com you can see how much of your budget was spent on dining this month versus fuel for your car. Small businesses often ask, "OK, so I made a 5% increase in revenue; how well did I do in the context of others, locally, across the country, and so on?" They want to know if it is the right time to hire.
  • #6 Partner with BUs and enable the apps. We build the platform; BUs focus on building the applications. Applications may be new features leveraging the graph. Connections may be first level or second level. Edges are annotated with transactions.
  • #10 Match based on attributes; output a score between 0 and 1.
    Step 1: Generate multiple candidate mappings by lookup in a local Lucene index.
    Step 2: Identify the best candidate mapping from Step 1 by applying a custom scoring algorithm. Name, address, and email use string-similarity approaches. Address has a state restriction. If there is no state and no zip code, no match is attempted.
    The same business exists in multiple product databases, in different forms:
    – Red Rock Café Inc, Mountain View, CA
    – Red Rok, Castro St, CA 94043
    – RR Cafe, Mountain View
    Identifying the common businesses promotes a more connected network.
  • #13 Show the Neo4j admin console. Pull up a node and show the properties. Also show the visualization using the style for NAME.
  • #14 Size: In just QBO and QBDT, we have about 5 Mn customers. However, the graph has about 30 Mn nodes because we have access to the businesses at the second level and map them. Growth initially was in the number of nodes and properties. Now that we have loaded a few datasets, growth will be primarily in the number of edges and properties.
  • #15 R5, especially for related industries: wedding planner, photographer, florist, musicians, catering.
    A small business needs a janitorial service. Some janitorial services specialize in the medical industry, so simply looking up "janitorial" in the yellow pages might not find the right business. Can you find a service provider that is specialized for your industry?
    Peppermint experiment: email campaign to matchmake between (oil & gas, medical, churches, trucking, general contractors).
    Referrals a la DemandForce: vision, dentistry, automotive.
    Linking our ecosystems: find a gardener. Live experiment with solar panel installation.
    Micro-communities:
    – Connect SMBs for peer advice, business referral, and collaboration
    – SMBs request and receive advice from peer SMBs
    – Share business leads with other SMBs
    – Find SMB partners to form buying cooperatives
  • #17 This is a vendor recommendation use case, implemented as Vendor of Vendor. Demo the Graphite Apps page.
  • #19 Cypher Query Language example: identifying immediate neighbors of a business using relationships.
    Advantages
    – The graph is persisted to disk, so you can compute graph metrics, persist them on the graph itself, and iterate; for example, computing derived attributes like centrality
    – Can be used standalone as well as an embedded JAR in your application
    – Search powered by Lucene on all key-value properties
    – Common graph algorithms implemented
    – Friend-of-a-friend queries are straightforward
    – Does not require an additional platform like Hadoop (as in the case of Giraph)
    Weaknesses
    – No scale-out option; only vertically scalable
    – No horizontal scaling, so no sharding of the DB
    – Unlike a regular RDBMS, does not implement any access control out of the box
    – Smaller community compared to other open-source projects like Hadoop (but active nevertheless)
  • #20 It's not our data; it's our customers' data. We build data-driven innovation on a strong foundation of trust.
    A great data scientist isn't just a coder or a great statistician. She's curious, with deep customer empathy and a passion for business problems.
    From decisions by "politics and PowerPoint" to "enable ideas to prove themselves": Minimum Viable Product and a build-measure-learn rapid experimentation loop.