
GraphTour Boston - LPL Financial


Presentation from GraphTour Boston - A Data Guy's Graph Journey - Mayank Gupta, LPL Financial



  1. A Data Guy’s Graph Journey
     Miles still to go…
  2. Outline for Today’s Talk
     - What I do (did)
     - Early graph work (though I did not know it)
     - Re-introduced to graphs, and introduced to a graph database (there is the one!)
     - Applying Neo4j to an integrated data platform
     - A diversion into RDF, semantics, and Linked Data
     - Applying Neo4j to fraud, data quality, lead capture, and audit trails
     - Upcoming: privacy (metadata and instance connections)
     - Experimenting with: knowledge graphs (corpus) for search/chat and call routing
  3. Me
     - Currently at LPL Financial: Technology Lead for Enterprise Data & Information Services and the Advisor Commissions/Compensation Platform
     - Product Manager for Enterprise Data @ Bloomberg
     - Data Design Authority @ UBS
     - Data Distribution @ UBS
     - Data Distribution @ Morgan Stanley
     - Time-Series Databases @ Morgan Stanley
  4. Data Distribution @ Morgan Stanley
     - Joined MS to work on their proprietary time-series platforms (one for periodic series, another for aperiodic series)
     - Built a simple data distribution system for market data that counted ‘hits’ so we could satisfy our data vendors/partners
     - The system evolved to cover all enterprise data domains (~70 unique data systems) and was heavily used for accessing data across the enterprise
     - Our superpower was the speed at which we were able to add new data sources and, more importantly, new data access flows
     - Consolidating access patterns and workloads made our infra and DBA teams very happy
  5. Early Inklings of Graph-iness
     - The G-Star behind our superpower was that we were configuration-based
       - The more of the logic (queries, processing instructions, shaping) we kept in config, the better
       - We used trees (XML) to model metadata and flows
       - We used truth tables to ‘route’ to a flow
     - The problem domain at MS was getting more and more complex
       - At ~70 sources and several thousand tables, loading at different times
       - A desire to do more work closer to the data (in the access system) and unburden applications/systems
     - Without realizing it, we came up with an algorithm (HnF) that moved us from pre-configured to dynamic flows
       - Flows are paths
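The "flows are paths" idea can be sketched as a graph search: model sources and processing steps as nodes, and a delivery flow becomes a path discovered at runtime instead of a pre-configured route. A minimal sketch (node names are hypothetical; this is not the actual HnF algorithm):

```python
from collections import deque

# Hypothetical data-flow graph: an edge points from a source/step to the
# systems that can consume or transform its output.
FLOW_GRAPH = {
    "market_data": ["normalizer"],
    "normalizer": ["entitlements", "cache"],
    "entitlements": ["api_gateway"],
    "cache": ["api_gateway"],
    "api_gateway": [],
}

def find_flow(graph, source, target):
    """Discover a delivery flow dynamically, as a path in the graph (BFS)."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no flow exists between these systems

print(find_flow(FLOW_GRAPH, "market_data", "api_gateway"))
# ['market_data', 'normalizer', 'entitlements', 'api_gateway']
```

Adding a new source or access flow then means adding a node or edge to the config, not writing a new pipeline.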
  6. NY QCon
     - Deep focus on data access at UBS
       - Move beyond push to pull
       - Isolate read workloads from write workloads (typically at SORs or ODSs)
       - Support customizable capability/capacity for consuming applications
     - We imagined a data model to house knowledge and configuration, but it was a struggle to:
       - Explain it and bring people on board
       - Build a data model (even pragmatically) that could keep up with the complexity
     - Then I went to a session by Jim Webber at QCon New York and the pieces started to fall into place
  7. Integrated Data Platform @ UBS
     - Moving from abstract data modelling concepts to instances (nodes and relationships) made the approach apparent (even to senior management)
       - The bundled query interface (and simple visualization) helped greatly
     - The team was able to work in the same ‘brain’ yet easily isolate their use cases from each other (this was very tough to do with trees and ER data models)
       - A small central team collected proven use cases and iterated, independently, on the ‘industrial’ graph model that supported them all
     - A side benefit was that the graph was extensible to emerging needs around BCBS 239 (data lineage) and GDPR (privacy), though I left just as these got going
  8. Linked Data (and REST) @ Bloomberg
     - A slight change in career: moved into the business team (for a data company)
     - Focus on reducing the distance (time/cost/effort) to value extraction from data
     - Two key areas of research and definition:
       - REST (representational state transfer): for a company dealing with thousands of distinct consumers, a deep focus on being true to the HTTP protocol
       - Linked Data: JSON-LD to transform and shape data “over the wire” from other constructs to mine
         - Physical shape of the payload
         - Naming and format of attributes/entities
         - Semantic linking
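The "over the wire" reshaping above is essentially what a JSON-LD context does: it maps a producer's attribute names onto your own vocabulary. A minimal plain-Python sketch of the idea (field names are hypothetical, and real JSON-LD uses IRIs and an `@context` document rather than a bare dict):

```python
# Hypothetical context: producer's wire names -> our local vocabulary.
context = {
    "tkr": "ticker",
    "px": "last_price",
    "ccy": "currency",
}

def compact(payload, ctx):
    """Rename a flat payload's attributes into the local vocabulary,
    leaving unmapped attributes untouched."""
    return {ctx.get(k, k): v for k, v in payload.items()}

wire = {"tkr": "IBM", "px": 142.5, "ccy": "USD"}
print(compact(wire, context))
# {'ticker': 'IBM', 'last_price': 142.5, 'currency': 'USD'}
```

The payoff is that consumers code against one vocabulary while each producer keeps its own, with the mapping carried as data rather than code.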
  9. Back to Tech
     - Joined LPL: I like research, but I love building things
     - Broad focus, as lead for data technology, on strategic plays and critical problems
     - The main challenges in my area were (are):
       - Disparate data systems
       - Same-yet-different data (values)
       - No agreed definitions for data concepts
       - New systems/efforts don’t know what data to trust, so they build their own, leading to yet another disparate data system
  10. Initial Use of Graph
     - An agile strategy approach: instead of broad-brush, massive projects, introduce data capabilities incrementally
     - The initial Neo4j use case was to ‘household’ (group, really) persons and accounts to consolidate fraud alerts
       - Fraud analysts could look at alerts for a person or household collectively
       - Optimized the advisor and end-investor experience by reducing the number of phone calls/mails needed to follow up on an alert
       - Gave analysts a tool to look at connections between accounts and holders/persons visually, and a more natural way to explore the data
     - We loaded accounts data into nodes for:
       - Accounts
       - Account persons: holder, secondary, beneficiary, advisor
       - Geography
     - To apply algorithms for the grouping(s) desired by Fraud Investigations
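The householding idea is a connectivity problem: accounts that share a person or an address belong to the same group. The production system did this in Neo4j over the node types above; this pure-Python union-find (with hypothetical sample data) just illustrates the grouping:

```python
# Minimal sketch of 'householding': group accounts that share a person
# or address into one household, via union-find with path compression.
def household(accounts):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link each account to its person and address nodes.
    for acct, persons, addr in accounts:
        for p in persons:
            union(acct, p)
        union(acct, addr)

    groups = {}
    for acct, _, _ in accounts:
        groups.setdefault(find(acct), []).append(acct)
    return sorted(sorted(g) for g in groups.values())

accounts = [
    ("A1", ["kim"], "12 Main St"),
    ("A2", ["kim", "lee"], "9 Oak Ave"),
    ("A3", ["pat"], "9 Oak Ave"),
    ("A4", ["sam"], "3 Elm Rd"),
]
print(household(accounts))
# [['A1', 'A2', 'A3'], ['A4']]
```

A1 and A2 share kim, and A2 and A3 share an address, so the three accounts form one household; A4 stands alone. In the graph database the same grouping falls out of connected-component traversal over the relationships.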
  11. Pleasant surprise – easily determine data quality issues and more
     - In loading the data, we were able to observe nodes with massive numbers of relationships coming off of them
       - These were indications of data quality issues
       - We operationalized this type of analysis and fed ‘data quality alerts’ back to the accounts data team
     - To account for the time it takes to do investigations, we also kept state changes
       - Account transfers and movements between accounts
       - This gives our business users (especially service/ops) a way to visualize changes and explain (paths, again) how something got to be the way it is; this is turning out to be a bigger use case
     - Having both advisors and investors, we were able to ‘profile’ their properties and connections and use this to recommend advisors to leads
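The "massive relationships" signal is just node degree: a person attached to an implausible number of accounts is usually a placeholder or a bad join, not a real customer. A hedged sketch of the check (sample data and threshold are made up; in production this was observed in the Neo4j graph):

```python
from collections import Counter

# Hypothetical (person -> account) relationships. The 'UNKNOWN' person is a
# typical data quality culprit: a default value used as if it were a holder.
rels = [("kim", "A1"), ("kim", "A2"), ("lee", "A2"), ("pat", "A3")]
rels += [("UNKNOWN", f"A{i}") for i in range(100, 160)]

DEGREE_THRESHOLD = 50  # tune per domain; assumption for illustration

# Degree of each person node = number of account relationships it carries.
degree = Counter(person for person, _ in rels)
alerts = [p for p, d in degree.items() if d > DEGREE_THRESHOLD]
print(alerts)  # ['UNKNOWN']
```

Each flagged node becomes a 'data quality alert' routed back to the owning data team, which is how the analysis was operationalized.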
  12. It comes full-circle – California Consumer Privacy Act
     - Comprehensive data privacy (at a consumer, not regulatory, level) is passed
     - We need to know who (persons) we deal with
     - We need to know what (data) we have on them
     - We need to be able to take action on the above
     - We already have a graph of persons (most of them)
     - Add in metadata and you get the privacy brain
       - Navigate it at the role level (where do we have data on Advisors?)
       - Navigate it at the instance level (where do we have data on Kim?)
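The two navigation modes above can be sketched as two edge sets over the same graph: metadata edges say which systems hold data for a role, instance edges say which systems hold records for a specific person. A toy illustration (all system, role, and person names are hypothetical):

```python
# Toy 'privacy brain': the same question asked at two levels of the graph.
stores_role = {            # metadata level: system -> roles it holds data for
    "crm": {"Advisor", "Investor"},
    "billing": {"Investor"},
    "hr": {"Employee"},
}
holds_person = {           # instance level: system -> persons with records
    "crm": {"Kim", "Lee"},
    "billing": {"Kim"},
    "hr": {"Pat"},
}

def systems_for_role(role):
    """Where do we have data on a role (e.g. Advisors)?"""
    return sorted(s for s, roles in stores_role.items() if role in roles)

def systems_for_person(person):
    """Where do we have data on a specific person (e.g. Kim)?"""
    return sorted(s for s, people in holds_person.items() if person in people)

print(systems_for_role("Investor"))  # ['billing', 'crm']
print(systems_for_person("Kim"))     # ['billing', 'crm']
```

The instance-level answer is what drives CCPA actions (access, deletion) for one consumer; the role-level answer scopes the effort for a whole class of persons.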
  13. The road goes ever on…
     - At a graph meetup in CLT, I met GraphAware (and Dr. Negro)
     - From the BBG/REST days, an idea around knowledge graphs was percolating
     - Experimenting with a corpus (knowledge graph) to start to drive:
       - Better search
       - Chat bots
     - Adding in employee roles and skills
       - Use these and the corpus to route calls to the right person (skill and availability)
     - These use cases present a need to move to real-time