Neo4j gokuldaspillai-121018170144-phpapp01


  • Just to give you a sense of the kind of data and the rate of the data: 1 in 3 tax returns filed electronically goes through Intuit. Our payroll product pays the salaries of small-business employees.
  • Some of the work that the data team at Intuit drives: we create a rich profile of the spending of small businesses and consumers. For example, you can see how much of your budget was spent on dining this month versus fuel for your car. Small businesses often ask, "OK, so I made a 5% increase in revenue. How well did I do in the context of others, locally, across the country, and so on?" They want to know if it is the right time to hire.
  • Partner with the BUs and enable the apps. We build the platform; the BUs focus on building the applications. Applications may be new features leveraging the graph. Connections may be first level or second level. Edges are annotated with transactions.
  • Match based on attributes; output a score between 0 and 1. Step 1: Generate multiple candidate mappings by lookup in a local Lucene index. Step 2: Identify the best candidate mapping from Step 1 by applying a custom scoring algorithm. Name, address, and email use string-similarity approaches. Address has a state restriction. If there is no state and no zip code, no match is attempted. The same business exists in multiple product databases in different forms: "Red Rock Café Inc, Mountain View, CA"; "Red Rok, Castro St, CA 94043"; "RR Cafe, Mountain View". Identifying the common businesses promotes a more connected network.
  • Show the Neo4j admin console. Pull up a node and show its properties. Also show the visualization using the style for NAME.
  • Size: In just QBO and QBDT, we have about 5 Mn customers. However, the graph has about 30 Mn nodes because we have access to the businesses at the second level and map them. Growth was initially in the number of nodes and properties. Now that we have loaded a few datasets, growth will be primarily in the number of edges and properties.
  • R5, especially for related industries: wedding planner, photographer, florist, musicians, catering. A small business needs a janitorial service. Some janitorial services specialize in the medical industry, so simply looking up "janitorial" in the yellow pages might not find the right business. Can you find a service provider that is specialized for your industry? Peppermint experiment: an email campaign to matchmake between oil & gas, medical, churches, trucking, and general contractors. Referrals à la DemandForce: vision, dentistry, automotive. Linking our ecosystems: find a gardener. Live experiment with solar panel installation. Micro-communities: connect SMBs for peer advice, business referral, and collaboration. SMBs request and receive advice from peer SMBs, share business leads with other SMBs, and find SMB partners to form buying cooperatives.
  • This is a vendor recommendation use case, implemented as a vendor-of-vendor query. Demo the Graphite Apps page.
  • Cypher query language example: identifying the immediate neighbors of a business using relationships. Advantages: The graph is persisted to disk, so you can compute graph metrics, persist them on the graph itself, and iterate, e.g. computing derived attributes like centrality. It can be used standalone or as an embedded JAR in your application. Search is powered by Lucene on all key-value properties. Common graph algorithms are implemented. Friend-of-a-friend queries are straightforward. It does not require an additional platform like Hadoop (as Giraph does). Weaknesses: There is no scale-out option; it is only vertically scalable, so there is no sharding of the DB. Unlike a regular RDBMS, it does not implement any access control out of the box. The community is smaller than that of other open-source projects like Hadoop (but active nevertheless).
  • It’s not our data; it’s our customers’ data. We build data-driven innovation on a strong foundation of trust. A great data scientist isn’t just a coder or a great statistician. She’s curious, with deep customer empathy and a passion for business problems. From decisions by “politics and PowerPoint” to “enable ideas to prove themselves”. Minimum viable product. Build-measure-learn rapid experimentation loop.
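The two-step matching described in the notes (generate candidates, then score the best one between 0 and 1) can be sketched roughly as follows. This is a minimal illustration, not Intuit's implementation: the record fields, the token-overlap candidate lookup (a stand-in for the Lucene index), the `difflib` similarity measure, and the 0.7/0.3 weights are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical reference records; field names and values are assumptions.
REFERENCE = [
    {"id": "r1", "name": "Red Rock Cafe Inc",
     "address": "Castro St, Mountain View", "state": "CA", "zip": "94043"},
    {"id": "r2", "name": "Blue Rock Diner",
     "address": "Main St, Reno", "state": "NV", "zip": "89501"},
]


def normalize(text):
    """Lowercase and keep only letters, digits and single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """String similarity in [0, 1] (difflib ratio; an assumed measure)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidates(record, reference):
    """Step 1: cheap candidate generation (stand-in for a Lucene lookup)."""
    tokens = set(normalize(record["name"]).split())
    found = []
    for ref in reference:
        # State restriction: when both states are known they must agree.
        if record.get("state") and ref.get("state") \
                and record["state"] != ref["state"]:
            continue
        # Keep references that share at least one name token.
        if tokens & set(normalize(ref["name"]).split()):
            found.append(ref)
    return found


def best_match(record, reference, weights=(0.7, 0.3)):
    """Step 2: score each candidate; return (best candidate, score)."""
    if not record.get("state") and not record.get("zip"):
        return None, 0.0  # no state and no zip code: no match attempted
    best, best_score = None, 0.0
    for ref in candidates(record, reference):
        score = (weights[0] * similarity(record["name"], ref["name"])
                 + weights[1] * similarity(record.get("address", ""),
                                           ref.get("address", "")))
        if score > best_score:
            best, best_score = ref, score
    return best, best_score
```

For instance, `best_match({"name": "Red Rok", "address": "Castro St", "state": "CA", "zip": "94043"}, REFERENCE)` selects the `r1` record, while a record with neither state nor zip code is never matched.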

    1. Commercial Graph at Intuit. Gokuldas Pillai, Engineer, Data Services, Intuit (@gokool)
    2. Improving the lives of 60M people
    3. …creates a unique and compelling set of data: 1 in 3 Tax Returns; 1 in 12 Americans Pay; $2.6T in Transactions; 25 Million Questions Answered; From 1 to 50 Apps; 7 Million Mobile Customers; 45M Customers Using Connected Services
    4. Is it time to hire? (Small Business Hiring Trends) My that good? (Revenue Comparisons) Am I spending more than my friends? (Spending Profiles: Auto $750, Rent $1,200, Groceries $400)
    5. Intuit Payment Graph
        • Discover the latent network from multiple product data-stores
            – Uniquely identify entities and their connections
            – Connections scored by volume of trade
        • Empower Business Unit (BU) teams to leverage the Intuit Payment Graph to build applications
            – Graph to be available for real-time access
    6. The Graph Server provides rich profiles
        • Consumer Profile Facets
            – Identity: Name, Address, Phone, Email, Mint Id, etc.
            – Social: Facebook, Yelp, Twitter, etc.
            – Demographics: Age, Gender, etc.
        • Business Profile Facets
            – Identity: Name, Address, Phone, Email, QBO Id, etc.
            – Social: Facebook, Yelp, Twitter, etc.
            – Firmographics: Category, Revenue, Employees, etc.
    7. And the buyer-seller relationships. Diagram labels: Consumer, Business, Business; May 2011, 3 purchases, $650.25; May 2011, 1 purchase, $25.95
    8. Design
    9. Fuzzy matching & de-duplicating entities. Both of the vendor records below map to external reference data:
        • Dun & Bradstreet: ID: 002114902; Name: The Windsor-Press Inc; Street: 6 N 3rd St; City: Hamburg; State: PA; Zip: 19526-1502; Phone: (610)-562-2267
        • Company ABC: name: The Windsor Press, Inc.; address: PO Box 465 6 North Third Street; city: Hamburg; state: PA; zip: 19526; phone: (610) 562-2267
        • Company PQR: name: The Windsor Press; address: P.O. Box 465 6 North 3rd Hamburg; state: PA; zip: 19526-0465; phone: (610) 562-2267
    10. Commercial Graph Architecture. Diagram labels: Business (names, address, phone, industry code); Transactions (invoices, bills, payments, vendors, customers); Categorization; Matching/De-duping; De-duped Nodes; Real-time Applications (Request/Response); Offline analytics
    11. Data Model
        • Company: Name: Acme Inc, Zip: 95134, …
        • Company: Name: Veva LLC, Zip: 94040, …
        • Company: Name: Beta LLC, Location: 94043, …
        • Product: Name: Quickbooks, …
        • Product: Name: Payroll, …
        • Relationship: CUSTOMER, Txn Count: 125, No. of years: 1
        • Relationship: CUSTOMER, Txn Count: 467, No. of years: 3
        • Relationship: LICENSED, No. of years: 8
    12. Data-model Demo
    13. Scale
        • Size of the graph
            – 29 Mn Unique Nodes
            – 315 Mn Properties
            – 48 Mn Relationships
    14. Referrals & recommendations; Connecting consumers with small businesses; Small business micro-communities
    15. Big Data for the Little Guy
    16. Use case: Vendor Recommendation
        START n=node(23539)
        MATCH n-[:PAYS]-v-[:PAYS]-vov
        WHERE has(vov.IC4_DESC)
          AND vov.IC4_DESC =~ "Legal.*"
          AND not (ID(vov) = ID(v))
        RETURN ID(vov), vov.ENTITY_TYPE, vov.CITY?, vov.IC4_DESC?
        ORDER BY vov.loyalty;
    17. Why Neo4j
        • Java: matched in-house skills
        • Flexible; supports quick exploration
        • Easy admin functionality: set-up, adding data
        • Built-in access points over HTTP (REST/JSON)
        • SQL-like query language (Cypher is awesome!)
        • Active mailing list
        • Good documentation
        • Vendor support
    18. Neo4j for real-time graph applications
        • Cypher Query Language: START biz = node(100) MATCH biz-[TRANSACTS]-x RETURN x
        • Great for: Real time, Cypher, Built-in Algos, Lucene search
        • Opportunity areas: Horizontal scaling, Access control, Indexing
    19. Experiment. Measure. Pivot. Persevere. Privacy matters…a lot. Build the right team.
    20. Team
        • 2 Engineers (100%)
        • 2 Data Scientists (50%)
        • 1 Product Manager
        • We are hiring Data Engineers!
    21. Thank you.
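The vendor-of-vendor query on the "Vendor Recommendation" slide walks two PAYS hops out from a starting business and keeps second-level vendors whose industry description matches a pattern. The sketch below mirrors that traversal over a small in-memory graph in plain Python; it does not use any Neo4j API. The node ids, property values, and the choices to deduplicate results, skip the start node, and treat a missing `loyalty` as 0.0 are assumptions made for illustration.

```python
import re

# Hypothetical toy graph: node id -> properties. All values are made up.
NODES = {
    1: {"ENTITY_TYPE": "BUSINESS", "CITY": "Mountain View", "IC4_DESC": "Restaurants"},
    2: {"ENTITY_TYPE": "BUSINESS", "CITY": "Palo Alto", "IC4_DESC": "Food Wholesale"},
    3: {"ENTITY_TYPE": "BUSINESS", "CITY": "San Jose",
        "IC4_DESC": "Legal Services", "loyalty": 0.9},
    4: {"ENTITY_TYPE": "BUSINESS", "IC4_DESC": "Legal Counsel", "loyalty": 0.4},
    5: {"ENTITY_TYPE": "BUSINESS", "CITY": "Sunnyvale"},
}

# PAYS relationships as (payer, payee) pairs.
PAYS = [(1, 2), (2, 3), (2, 4), (2, 5)]


def neighbors(node_id):
    """Nodes connected to node_id by a PAYS edge, ignoring direction."""
    out = set()
    for a, b in PAYS:
        if a == node_id:
            out.add(b)
        elif b == node_id:
            out.add(a)
    return out


def vendor_of_vendor(start, pattern=r"Legal.*"):
    """Second-level PAYS neighbors whose IC4_DESC fully matches pattern.

    Returns (id, ENTITY_TYPE, CITY or None, IC4_DESC) tuples ordered by
    ascending loyalty.
    """
    rows = {}
    for v in neighbors(start):
        for vov in neighbors(v):
            if vov in (start, v):
                continue  # skip the start node and the intermediate vendor
            props = NODES[vov]
            desc = props.get("IC4_DESC")
            # Mirrors has(vov.IC4_DESC) plus a whole-string regex match.
            if desc is None or not re.fullmatch(pattern, desc):
                continue
            rows[vov] = (vov, props.get("ENTITY_TYPE"), props.get("CITY"), desc)
    # Assumption: a missing loyalty value sorts as 0.0.
    return sorted(rows.values(), key=lambda r: NODES[r[0]].get("loyalty", 0.0))
```

With this toy data, `vendor_of_vendor(1)` returns the two legal vendors reached through business 2, with the lower-loyalty node first, because the slide's `ORDER BY` has no `DESC` and so sorts ascending.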