3. …creates a unique and compelling set of data
• 1 in 3 Tax Returns
• 1 in 12 Americans: Pay
• $2.6T in Transactions
• 25 Million Questions Answered
• From 1 to 50 Apps
• 7 Million Mobile Customers
• 45M Customers Using Connected Services
4. Is it time to hire?
Small Business Hiring Trends
My revenue increased 5%...is that good?
Revenue Comparisons
Am I spending more than my friends?
Spending Profiles
Auto $750
Rent $1,200
Groceries $400
5. Intuit Payment Graph
• Discover the latent network from multiple product data-stores
– Uniquely identify entities and their connections
– Connections scored by volume of trade
• Empower Business Unit (BU) teams to leverage the Intuit Payment Graph to build applications.
– Graph to be available for real time access
6. The Graph Server provides rich profiles
Consumer Profile Facets
• Identity: Name, Address, Phone, Email, Mint Id, etc.
• Social: Facebook, Yelp, Twitter, etc.
• Demographics: Age, Gender, etc.
Business Profile Facets
• Identity: Name, Address, Phone, Email, QBO Id, etc.
• Social: Facebook, Yelp, Twitter, etc.
• Firmographics: Category, Revenue, Employees, etc.
7. And the buyer-seller relationships
Nodes: Consumer, Business, Business
Edge: May 2011, 3 purchases, $650.25
Edge: May 2011, 1 purchase, $25.95
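The edge annotations on this slide (month, purchase count, total amount) can be derived by rolling up raw transactions per buyer-seller pair. The following is a minimal sketch, not the actual pipeline: the node ids, dates, and individual amounts are hypothetical, chosen only so that the totals match the figures on the slide.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical raw transactions: (buyer, seller, ISO date, amount).
transactions = [
    ("consumer-1", "business-A", "2011-05-03", "200.00"),
    ("consumer-1", "business-A", "2011-05-14", "150.25"),
    ("consumer-1", "business-A", "2011-05-29", "300.00"),
    ("consumer-1", "business-B", "2011-05-20", "25.95"),
]

def monthly_edges(txns):
    """Roll transactions up into one edge per (buyer, seller, month)."""
    edges = defaultdict(lambda: {"purchases": 0, "total": Decimal("0")})
    for buyer, seller, date, amount in txns:
        key = (buyer, seller, date[:7])  # "YYYY-MM"
        edges[key]["purchases"] += 1
        edges[key]["total"] += Decimal(amount)
    return dict(edges)

edges = monthly_edges(transactions)
for (buyer, seller, month), stats in sorted(edges.items()):
    print(buyer, "->", seller, month, stats["purchases"], stats["total"])
```

`Decimal` is used rather than `float` so that currency totals add up exactly.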
9. Fuzzy matching & de-duplicating entities
Company ABC
name: The Windsor Press, Inc.
address: PO Box 465 6 North Third Street
city: Hamburg
state: PA
zip: 19526
phone: (610) 562-2267
Company PQR
name: The Windsor Press
address: P.O. Box 465 6 North 3rd St.
city: Hamburg
state: PA
zip: 19526-0465
phone: (610) 562-2267
Both of the above vendor records map to external reference data:
Dun & Bradstreet
ID: 002114902
Name: The Windsor-Press Inc
Street: 6 N 3rd St
City: Hamburg
State: PA
Zip: 19526-1502
Phone: (610)-562-2267
10. Commercial Graph Architecture
Business: names, address, phone, industry code
Real-time Applications
Request
Response
De-duped Nodes
Transactions: invoices, bills, payments, vendors, customers
Categorization
Matching/De-duping
Offline analytics
11. Data Model
Company
Name: Acme Inc
Zip: 95134
…
Company
Name: Veva LLC
Zip: 94040
…
Product
Name: QuickBooks
…
Product
Name: Payroll
…
Relationship
:CUSTOMER
Txn Count: 125
No. of years:1
Relationship
:LICENSED
No. of years:8
Company
Name: Beta LLC
Location: 94043
…
Relationship
:CUSTOMER
Txn Count: 467
No. of years:3
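The data model on this slide is a property graph: typed nodes (Company, Product) and typed relationships (CUSTOMER, LICENSED), each carrying key-value properties. A minimal in-memory sketch follows. The slide text does not say which nodes each relationship connects, so the endpoint wiring below is an assumption, as are the class names and the property keys `txn_count` and `years`.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str   # "Company" or "Product"
    props: dict

@dataclass
class Relationship:
    rel_type: str  # "CUSTOMER" or "LICENSED"
    start: Node
    end: Node
    props: dict = field(default_factory=dict)

acme = Node("Company", {"Name": "Acme Inc", "Zip": "95134"})
veva = Node("Company", {"Name": "Veva LLC", "Zip": "94040"})
beta = Node("Company", {"Name": "Beta LLC", "Location": "94043"})
quickbooks = Node("Product", {"Name": "QuickBooks"})

# Assumed wiring: which company is whose customer is not recoverable from the slide.
relationships = [
    Relationship("CUSTOMER", veva, acme, {"txn_count": 125, "years": 1}),
    Relationship("CUSTOMER", beta, acme, {"txn_count": 467, "years": 3}),
    Relationship("LICENSED", acme, quickbooks, {"years": 8}),
]

def neighbors(node, rel_type):
    """Return nodes connected to `node` by `rel_type`, ignoring direction."""
    found = []
    for rel in relationships:
        if rel.rel_type != rel_type:
            continue
        if rel.start is node:
            found.append(rel.end)
        elif rel.end is node:
            found.append(rel.start)
    return found

customer_names = [n.props["Name"] for n in neighbors(acme, "CUSTOMER")]
```

Keeping properties on the relationship itself (transaction count, number of years) is what lets connections be scored without a separate lookup.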
16. Use case - Vendor Recommendation
START n=node(23539)
MATCH
n-[:PAYS]-v-[:PAYS]-vov
WHERE
has(vov.IC4_DESC)
AND vov.IC4_DESC =~ 'Legal.*'
AND not (ID(vov) = ID(v))
RETURN
ID(vov),vov.ENTITY_TYPE,vov.CITY?,vov.IC4_DESC?
ORDER BY vov.loyalty;
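For readers less familiar with Cypher, the same vendor-of-vendor traversal can be sketched in plain Python. This is an illustration under stated assumptions, not the production code: the toy nodes, their names, and the `loyalty` values are hypothetical, and only the property names `IC4_DESC` and `loyalty` and the `PAYS` relationship come from the query. The sketch also drops the starting node from the results, which is an added simplification.

```python
import re

# Hypothetical toy graph: node id -> properties.
nodes = {
    1: {"NAME": "Bakery"},
    2: {"NAME": "Flour Supplier"},
    3: {"NAME": "Law Office A", "IC4_DESC": "Legal Services", "loyalty": 0.9},
    4: {"NAME": "Law Office B", "IC4_DESC": "Legal Aid", "loyalty": 0.4},
    5: {"NAME": "Trucking Co", "IC4_DESC": "Freight Trucking", "loyalty": 0.7},
}
pays = [(1, 2), (2, 3), (2, 4), (2, 5)]  # PAYS relationships as (from, to)

def pays_neighbors(node_id):
    """Nodes linked to node_id by PAYS in either direction, as in the query."""
    linked = set()
    for a, b in pays:
        if a == node_id:
            linked.add(b)
        elif b == node_id:
            linked.add(a)
    return linked

def recommend_vendors(start_id, pattern=r"Legal.*"):
    """Two PAYS hops from start_id, filtered on IC4_DESC, ordered by loyalty."""
    found = set()
    for v in pays_neighbors(start_id):
        for vov in pays_neighbors(v):
            if vov in (v, start_id):
                continue
            desc = nodes[vov].get("IC4_DESC")
            # Cypher's =~ matches the whole string, hence fullmatch.
            if desc is not None and re.fullmatch(pattern, desc):
                found.add(vov)
    return sorted(found, key=lambda node_id: nodes[node_id].get("loyalty", 0.0))

recommended = recommend_vendors(1)
```

The result is in ascending `loyalty` order, matching the default direction of `ORDER BY`.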
17. Why Neo4j
• Java – matched in-house skills
• Flexible/Supports quick exploration
• Easy admin functionality – set-up, adding data
• Built in access points over HTTP (REST/JSON)
• SQL-like Query language (Cypher is awesome!)
• Active mailing list
• Good documentation
• Vendor support
18. Neo4j for real-time graph applications
Cypher Query Language
START biz = node(100) MATCH biz-[:TRANSACTS]-x RETURN x
Great for…
• Real time
• Cypher
• Built-in Algos
• Lucene search
Opportunity Areas…
• Horizontal scaling
• Access control
• Indexing
Just to give you a sense of the kind of data and the rate of the data: 1 in 3 tax returns filed electronically goes through Intuit. Our payroll product enables paying salaries to employees of small businesses.
Some of the work that the data team at Intuit drives: we create a rich profile of the spending of small businesses and consumers. For example, in Mint.com you can see how much of your budget was spent on dining this month vs. fuel for your car. Small businesses often ask, “OK, so I made a 5% increase in revenue; how well did I do in the context of others, locally, across the country, and so on?” They want to know if it is the right time to hire.
Partner with BUs and enable the apps. We build the platform; BUs focus on building the applications. Applications may be new features leveraging the graph. Connections may be first level or second level. Edges are annotated with transactions.
Match based on attributes; output a score between 0 and 1.
Step 1: Generate multiple candidate mappings by lookup in a local Lucene index.
Step 2: Identify the best candidate mapping from Step 1 by applying a custom scoring algorithm.
– Name, Address, and Email use string-similarity approaches.
– Address has a state restriction.
– If there is no state and no zip code, no match is attempted.
The same business exists in multiple product databases, in different forms:
– Red Rock Café Inc, Mountain View, CA
– Red Rok, Castro St, CA 94043
– RR Cafe, Mountain View
Identifying the common businesses promotes a more connected network.
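The two matching steps in these notes can be sketched roughly as follows. This is not the custom scoring algorithm itself, which the notes do not spell out: a shared-name-token filter stands in for the Lucene lookup, `difflib.SequenceMatcher` stands in for the string-similarity measure, and the 0.6/0.4 weights are invented for illustration. The first reference record is taken from the Windsor Press slide; the second is hypothetical.

```python
from difflib import SequenceMatcher

NAME_WEIGHT, ADDRESS_WEIGHT = 0.6, 0.4  # assumed weights, not from the source

def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())

def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def candidates(record, reference):
    """Step 1 stand-in for the Lucene lookup: share at least one name token."""
    tokens = set(normalize(record["name"]).split())
    return [ref for ref in reference
            if tokens & set(normalize(ref["name"]).split())]

def score(record, cand):
    """Step 2: score in [0, 1]; a state mismatch rules the candidate out."""
    if record.get("state") and cand.get("state") and record["state"] != cand["state"]:
        return 0.0
    return (NAME_WEIGHT * similarity(record["name"], cand["name"])
            + ADDRESS_WEIGHT * similarity(record.get("address", ""),
                                          cand.get("address", "")))

def best_match(record, reference):
    """Return (score, candidate), or None when no match is attempted or found."""
    if not record.get("state") and not record.get("zip"):
        return None  # no state and no zip code: no match attempted
    scored = [(score(record, c), c) for c in candidates(record, reference)]
    return max(scored, key=lambda pair: pair[0], default=None)

reference = [
    {"id": "002114902", "name": "The Windsor-Press Inc",
     "address": "6 N 3rd St", "state": "PA", "zip": "19526-1502"},
    {"id": "hypothetical-1", "name": "Windsor Bakery",  # made-up record
     "address": "12 Main St", "state": "NJ", "zip": "07001"},
]
vendor = {"name": "The Windsor Press, Inc.",
          "address": "PO Box 465 6 North Third Street",
          "state": "PA", "zip": "19526"}
best = best_match(vendor, reference)
```

A real candidate generator would need to tolerate misspellings such as "Red Rok", which exact token overlap does not.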
Show the Neo4j Admin console. Pull up a node and show the properties. Also show the visualization using the style for NAME.
Size: In just QBO and QBDT, we have about 5 Mn customers. However, the graph has about 30 Mn nodes, because we also have access to the businesses at the second level and map them. Growth initially was in the number of nodes and properties. Now that we have loaded a few datasets, growth will be primarily in the number of edges and properties.
R5, especially for related industries: wedding planner, photographer, florist, musicians, catering.
A small business needs a janitorial service. Some janitorial services specialize in the medical industry, so simply looking up "janitorial" in the yellow pages might not find the right business. Can you find a service provider that is specialized for your industry?
Peppermint experiment: email campaign to matchmake between (oil & gas, medical, churches, trucking, general contractors).
Referrals a la DemandForce: vision, dentistry, automotive.
Linking our ecosystems: find a gardener. Live experiment with solar panel installation.
Micro-communities
– Connect SMBs for peer advice, business referral, and collaboration
– SMBs request and receive advice from peer SMBs
– Share business leads with other SMBs
– Find SMB partners to form buying cooperatives
This is a vendor recommendation use case, implemented as a Vendor of Vendor query. Demo the Graphite Apps page.
Cypher Query Language example: identifying immediate neighbors of a business using relationships.
Advantages
– Graph is persisted to disk, so you can compute graph metrics, persist them on the graph itself, and iterate, e.g. computing derived attributes like centrality.
– Can be used standalone as well as an embedded JAR in your application.
– Search powered by Lucene on all key-value properties.
– Common graph algorithms are implemented.
– Friend-of-a-friend queries are straightforward.
– Does not require an additional platform like Hadoop (as Giraph does).
Weaknesses
– No scale-out option; only vertically scalable, so no sharding of the DB.
– Unlike a regular RDBMS, does not implement any access control out of the box.
– Smaller community compared to other open-source projects like Hadoop (but active nevertheless).
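The point in these notes about computing a derived attribute such as centrality and persisting it on the graph can be illustrated with a small sketch. It uses plain Python dictionaries rather than the Neo4j API, computes degree centrality only (the notes do not say which centrality measure was used), and the property name `degree_centrality` and the toy graph are assumptions.

```python
def add_degree_centrality(nodes, edges):
    """Store each node's degree centrality, degree / (n - 1), as a node property."""
    degree = {node_id: 0 for node_id in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    denominator = max(len(nodes) - 1, 1)  # avoid dividing by zero for one node
    for node_id, props in nodes.items():
        props["degree_centrality"] = degree[node_id] / denominator
    return nodes

# Hypothetical toy graph: node 1 transacts with every other node.
graph_nodes = {1: {}, 2: {}, 3: {}, 4: {}}
graph_edges = [(1, 2), (1, 3), (1, 4)]
add_degree_centrality(graph_nodes, graph_edges)
```

Because the value is written back as a property, later queries can filter or sort on it without recomputing the metric.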
It’s not our data – it’s our customers’ data. We build data-driven innovation on a strong foundation of trust. A great data scientist isn’t just a coder or a great statistician. She’s curious, with deep customer empathy and a passion for business problems. From: decisions by “politics and PowerPoint”. To: “enable ideas to prove themselves”. Minimal Viable Product. Build-measure-learn rapid experimentation loop.