
GraphTour Boston - LPL Financial


Presentation from GraphTour Boston - A Data Guy's Graph Journey - Mayank Gupta, LPL Financial



  1. A Data Guy’s Graph Journey
     Miles still to go…
  2. Outline for Today’s Talk
     - What I do (did)
     - Early graph work (though I did not know it)
     - Re-introduced to graphs, and introduced to a graph database (there is the one!)
     - Applying Neo4j to an integrated data platform
     - A diversion into RDF, semantics, and Linked Data
     - Applying Neo4j to fraud, data quality, lead capture, and audit trails
     - Upcoming: privacy (metadata and instance connections)
     - Experimenting with: knowledge graphs (corpus) for search/chat and call routing
  3. Me
     - Currently at LPL Financial: Technology Lead for Enterprise Data & Information Services and the Advisor Commissions/Compensation Platform
     - Product Manager for Enterprise Data @ Bloomberg
     - Data Design Authority @ UBS
     - Data Distribution @ UBS
     - Data Distribution @ Morgan Stanley
     - Time-Series Databases @ Morgan Stanley
  4. Data Distribution @ Morgan Stanley
     - Joined MS to work on their proprietary time-series platforms (one for periodic series, another for aperiodic series)
     - Built a simple data distribution system for market data that counted ‘hits’ so we could satisfy our data vendors/partners
     - The system evolved to cover all enterprise data domains (~70 unique data systems) and was heavily used for accessing data across the enterprise
     - Our superpower was the speed at which we were able to add new data sources and, more importantly, new data access flows
     - Consolidating access patterns and workloads made our infra and DBA teams very happy
  5. Early Inklings of Graph-iness
     - The G-Star behind our superpower was that we were configuration-based
       - The more of the logic (queries, processing instructions, shaping) we kept in config, the better
       - We used trees (XML) to model metadata and flows
       - We used truth tables to ‘route’ to a flow
     - The problem domain at MS was getting more and more complex
       - At ~70 sources and several thousand tables, loading at different times
       - A desire to do more work closer to the data (in the access system) and unburden applications/systems
     - Without realizing it, we came up with an algorithm (HnF) that moved us from pre-configured to dynamic flows
       - Flows are paths
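The "flows are paths" idea can be sketched as a graph search: model sources and processing steps as nodes, and a delivery flow becomes a path discovered at runtime instead of a pre-configured route. A minimal sketch (node names are hypothetical; this is not the actual HnF algorithm):

```python
from collections import deque

# Hypothetical data-flow graph: an edge points from a source/step to the
# systems that can consume or transform its output.
FLOW_GRAPH = {
    "market_data": ["normalizer"],
    "normalizer": ["entitlements", "cache"],
    "entitlements": ["api_gateway"],
    "cache": ["api_gateway"],
    "api_gateway": [],
}

def find_flow(graph, source, target):
    """Discover a delivery flow dynamically, as a path in the graph (BFS)."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no flow exists between these systems

print(find_flow(FLOW_GRAPH, "market_data", "api_gateway"))
# ['market_data', 'normalizer', 'entitlements', 'api_gateway']
```

Adding a new source or access flow then means adding a node or edge to the config, not writing a new pipeline.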
  6. NY QCon
     - Deep focus on data access at UBS
       - Move beyond push to pull
       - Isolate read workloads from write workloads (typically at SORs or ODSs)
       - Support customizable capability/capacity for consuming applications
     - We imagined a data model to house knowledge and configuration, but it was a struggle to:
       - Explain it and bring people on board
       - Build a data model (even pragmatically) that could keep up with the complexity
     - Then I went to a session by Jim Webber at QCon New York and the pieces started to fall into place
  7. Integrated Data Platform @ UBS
     - Moving from abstract data modelling concepts to instances (nodes and relationships) made the approach apparent (even to senior management)
       - The bundled query interface (and simple visualization) helped greatly
     - The team was able to work in the same ‘brain’ yet easily isolate their use cases from each other (this was very tough to do with trees and ER data models)
       - A small central team collected proven use cases and iterated, independently, on the ‘industrial’ graph model that supported them all
     - A side benefit was that the graph was extensible to emerging needs around BCBS 239 (data lineage) and GDPR (privacy), though I left just as these got going
  8. Linked Data (and REST) @ Bloomberg
     - A slight change in career: moved into the business team (for a data company)
     - Focus on reducing the distance (time/cost/effort) to value extraction from data
     - Two key areas of research and definition:
       - REST (representational state transfer): for a company dealing with thousands of distinct consumers, a deep focus on being true to the HTTP protocol
       - Linked Data: JSON-LD to transform and shape data “over the wire” from other constructs to mine
         - Physical shape of the payload
         - Naming and format of attributes/entities
         - Semantic linking
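The "over the wire" reshaping above is essentially what a JSON-LD context does: it maps a producer's attribute names onto your own vocabulary. A minimal plain-Python sketch of the idea (field names are hypothetical, and real JSON-LD uses IRIs and an `@context` document rather than a bare dict):

```python
# Hypothetical context: producer's wire names -> our local vocabulary.
context = {
    "tkr": "ticker",
    "px": "last_price",
    "ccy": "currency",
}

def compact(payload, ctx):
    """Rename a flat payload's attributes into the local vocabulary,
    leaving unmapped attributes untouched."""
    return {ctx.get(k, k): v for k, v in payload.items()}

wire = {"tkr": "IBM", "px": 142.5, "ccy": "USD"}
print(compact(wire, context))
# {'ticker': 'IBM', 'last_price': 142.5, 'currency': 'USD'}
```

The payoff is that consumers code against one vocabulary while each producer keeps its own, with the mapping carried as data rather than code.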
  9. Back to Tech
     - Joined LPL: I like research, but I love building things
     - Broad focus, as lead for data technology, on strategic plays and critical problems
     - The main challenges in my area were (are):
       - Disparate data systems
       - Same-yet-different data (values)
       - No agreed definitions for data concepts
       - New systems/efforts don’t know what data to trust, so they build their own, leading to yet another disparate data system
  10. Initial Use of Graph
     - An agile strategy approach: instead of broad-brush, massive projects, introduce data capabilities incrementally
     - The initial Neo4j use case was to ‘household’ (group, really) persons and accounts to consolidate fraud alerts
       - Fraud analysts could look at alerts for a person or household collectively
       - Optimized the advisor and end-investor experience by reducing the number of phone calls/mails needed to follow up on an alert
       - Gave analysts a tool to look at connections between accounts and holders/persons visually, and a more natural way to explore the data
     - We loaded accounts data into nodes for:
       - Accounts
       - Account persons: holder, secondary, beneficiary, advisor
       - Geography
     - To apply algorithms for the grouping(s) desired by Fraud Investigations
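The householding idea is a connectivity problem: accounts that share a person or an address belong to the same group. The production system did this in Neo4j over the node types above; this pure-Python union-find (with hypothetical sample data) just illustrates the grouping:

```python
# Minimal sketch of 'householding': group accounts that share a person
# or address into one household, via union-find with path compression.
def household(accounts):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link each account to its person and address nodes.
    for acct, persons, addr in accounts:
        for p in persons:
            union(acct, p)
        union(acct, addr)

    groups = {}
    for acct, _, _ in accounts:
        groups.setdefault(find(acct), []).append(acct)
    return sorted(sorted(g) for g in groups.values())

accounts = [
    ("A1", ["kim"], "12 Main St"),
    ("A2", ["kim", "lee"], "9 Oak Ave"),
    ("A3", ["pat"], "9 Oak Ave"),
    ("A4", ["sam"], "3 Elm Rd"),
]
print(household(accounts))
# [['A1', 'A2', 'A3'], ['A4']]
```

A1 and A2 share kim, and A2 and A3 share an address, so the three accounts form one household; A4 stands alone. In the graph database the same grouping falls out of connected-component traversal over the relationships.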
  11. Pleasant surprise – easily determine data quality issues and more
     - In loading the data, we were able to observe nodes with massive numbers of relationships coming off of them
       - These were indications of data quality issues
       - We operationalized this type of analysis and fed ‘data quality alerts’ back to the accounts data team
     - To account for the time it takes to do investigations, we also kept state changes
       - Account transfers and movements between accounts
       - This gives our business users (especially service/ops) a way to visualize changes and explain (paths, again) how something got to be the way it is; this is turning out to be a bigger use case
     - Having both advisors and investors, we were able to ‘profile’ their properties and connections and use this to recommend advisors to leads
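The "massive relationships" signal is just node degree: a person attached to an implausible number of accounts is usually a placeholder or a bad join, not a real customer. A hedged sketch of the check (sample data and threshold are made up; in production this was observed in the Neo4j graph):

```python
from collections import Counter

# Hypothetical (person -> account) relationships. The 'UNKNOWN' person is a
# typical data quality culprit: a default value used as if it were a holder.
rels = [("kim", "A1"), ("kim", "A2"), ("lee", "A2"), ("pat", "A3")]
rels += [("UNKNOWN", f"A{i}") for i in range(100, 160)]

DEGREE_THRESHOLD = 50  # tune per domain; assumption for illustration

# Degree of each person node = number of account relationships it carries.
degree = Counter(person for person, _ in rels)
alerts = [p for p, d in degree.items() if d > DEGREE_THRESHOLD]
print(alerts)  # ['UNKNOWN']
```

Each flagged node becomes a 'data quality alert' routed back to the owning data team, which is how the analysis was operationalized.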
  12. It comes full-circle – California Consumer Privacy Act
     - Comprehensive data privacy (at a consumer, not regulatory, level) is passed
     - We need to know who (persons) we deal with
     - We need to know what (data) we have on them
     - We need to be able to take action on the above
     - We already have a graph of persons (most of them)
     - Add in metadata and you get the privacy brain
       - Navigate it at the role level (where do we have data on Advisors?)
       - Navigate it at the instance level (where do we have data on Kim?)
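The two navigation modes above can be sketched as two edge sets over the same graph: metadata edges say which systems hold data for a role, instance edges say which systems hold records for a specific person. A toy illustration (all system, role, and person names are hypothetical):

```python
# Toy 'privacy brain': the same question asked at two levels of the graph.
stores_role = {            # metadata level: system -> roles it holds data for
    "crm": {"Advisor", "Investor"},
    "billing": {"Investor"},
    "hr": {"Employee"},
}
holds_person = {           # instance level: system -> persons with records
    "crm": {"Kim", "Lee"},
    "billing": {"Kim"},
    "hr": {"Pat"},
}

def systems_for_role(role):
    """Where do we have data on a role (e.g. Advisors)?"""
    return sorted(s for s, roles in stores_role.items() if role in roles)

def systems_for_person(person):
    """Where do we have data on a specific person (e.g. Kim)?"""
    return sorted(s for s, people in holds_person.items() if person in people)

print(systems_for_role("Investor"))  # ['billing', 'crm']
print(systems_for_person("Kim"))     # ['billing', 'crm']
```

The instance-level answer is what drives CCPA actions (access, deletion) for one consumer; the role-level answer scopes the effort for a whole class of persons.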
  13. The road goes ever on…
     - At a graph meetup in CLT, I met GraphAware (and Dr. Negro)
     - From the BBG/REST days, an idea around knowledge graphs was percolating
     - Experimenting with a corpus (knowledge graph) to start to drive:
       - Better search
       - Chat bots
     - Adding in employee roles and skills
       - Use these and the corpus to route calls to the right person (skill and availability)
     - These use cases present a need to move to real-time