BDW Meetup: Big MDM Part 2
Using a Graph Database for MDM & Relationship Management
Agenda
6:30 Networking
Grab some food and drink... make some friends.
6:45 Joe Caserta, President, Caserta Concepts
Welcome + Intro to Big MDM: about the meetup, and why MDM needs graph now.
7:00 Elliott Cordo, Chief Architect, Caserta Concepts
Intro to Graph Databases and Cypher: a deep dive into graph technology and how to work with it.
7:20 David Fauth, Senior Engineering Consultant, Neo Technology
Neo4j and Use Cases: real-world solutions and a demo of Neo4j for relationship management.
7:50 Aaron Wallace, Principal Product Manager, Pitney Bowes
Spectrum MDM Hub: model, manage and govern data with a graph database.
8:20 Q&A
Ask questions, share your experience.
About the BDW Meetup #BDWmeetup
• Big Data is a complex, rapidly changing landscape
• We want to share our stories and hear about yours
• Great networking opportunity for like-minded data nerds
• Founded by Caserta Concepts on November 10, 2012
• Next BDW Meetup:
  • April 7
  • Topic: Predictive Analytics on Hadoop (with Zementis)
  • Location: NWC
@CasertaConcepts @neo4j @PitneyBowes
Caserta Timeline
• 1986: Began consulting in database programming and data modeling (25+ years of hands-on experience building database solutions)
• 1996: Dedicated to Data Warehousing and Business Intelligence since 1996
• 2001: Founded Caserta Concepts in NYC
• 2004: Co-author, with Ralph Kimball, of The Data Warehouse ETL Toolkit (Wiley)
• 2012: Launched the Big Data Warehousing (BDW) Meetup-NYC (~1500 members)
• Other milestones, 2004 through 2014:
  • Web log analytics solution published in Intelligent Enterprise
  • Formalized alliances and partnerships with system integrators
  • Partnered with Big Data vendors: Cloudera, Hortonworks, IBM, Cisco, Datameer, Basho, and more
  • Launched Training practice, teaching data concepts world-wide
  • Launched Big Data practice
  • Laser focus on extending Data Warehouses with Big Data solutions
  • Established best practices for big data ecosystem implementation: Healthcare, Finance, Insurance
  • Top 20 Big Data Consulting (CIO Review)
  • Top 20 Most Powerful Big Data consulting firms
  • Dedicated to Data Governance Techniques on Big Data (Innovation)
About Caserta Concepts
• Award-winning technology innovation consulting with expertise in:
  • Big Data Solutions
  • Data Warehousing
  • Business Intelligence
• Core focus in the following industries:
  • eCommerce / Retail / Marketing
  • Financial Services / Insurance
  • Healthcare / Ad Tech / Higher Ed
• Established in 2001:
  • Increased growth year-over-year
  • Industry-recognized workforce
• Strategy, Implementation
• Writing, Education, Mentoring
• Data Science & Analytics
• Cloud Computing
• Data Interaction & Visualization
Help Wanted
Does this word cloud excite you? Spark, Big Data Architect, NoSQL, EC2, EMR, Redshift
Speak with us about our open positions: leslie@casertaconcepts.com
Master Data Management Components
Components: Data, User Interface, Services, Workflow, Rules, Security
Master data entities: Members, Providers, Agents, Plans, Policies
• Consistent policy enforcement and security
• Integration with the existing ecosystem
• Data governance through workflow management
• Data quality enforcement through metadata-driven rules
• Time-variant hierarchies and attributes
• High-performance, flexible, scalable database: think graph!
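The "time-variant hierarchies and attributes" component is easiest to picture when the effective dates live on the hierarchy links themselves, which is the kind of property a graph relationship can carry. The sketch below is a minimal in-memory illustration in plain Python, assuming a hypothetical agent → agency → region hierarchy; `HierarchyEdge`, `parent_as_of` and `path_to_root` are illustrative names, not part of any MDM product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class HierarchyEdge:
    """Hypothetical child -> parent link that carries its own effective dates."""
    child: str
    parent: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the link is still current

def parent_as_of(edges: List[HierarchyEdge], child: str, as_of: date) -> Optional[str]:
    """Return the parent that was in effect for `child` on `as_of`, if any."""
    for e in edges:
        if e.child == child and e.valid_from <= as_of and (e.valid_to is None or as_of <= e.valid_to):
            return e.parent
    return None

def path_to_root(edges: List[HierarchyEdge], node: str, as_of: date) -> List[str]:
    """Walk up the hierarchy as it stood on `as_of` (assumes no cycles)."""
    path = [node]
    parent = parent_as_of(edges, node, as_of)
    while parent is not None:
        path.append(parent)
        parent = parent_as_of(edges, parent, as_of)
    return path

# Hypothetical data: an agent moves from one agency to another in mid-2013.
edges = [
    HierarchyEdge("Agent A1", "Agency North", date(2012, 1, 1), date(2013, 6, 30)),
    HierarchyEdge("Agent A1", "Agency East", date(2013, 7, 1)),
    HierarchyEdge("Agency North", "Region 1", date(2010, 1, 1)),
    HierarchyEdge("Agency East", "Region 2", date(2010, 1, 1)),
]

print(path_to_root(edges, "Agent A1", date(2013, 1, 1)))  # ['Agent A1', 'Agency North', 'Region 1']
print(path_to_root(edges, "Agent A1", date(2014, 1, 1)))  # ['Agent A1', 'Agency East', 'Region 2']
```

Because the dates sit on the links rather than on the nodes, a reorganization adds or closes edges and never rewrites history.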
How MDM Works
Standardization
Matching
Survivorship
Validation
Publication
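The Validation step pairs naturally with the "metadata-driven rules" component: the rules are data, and the code only knows how to interpret each kind of check. A minimal sketch, assuming hypothetical rule rows with `field`, `check`, `arg` and `severity` columns; a real hub would load these from its metadata store rather than from a Python list.

```python
import re
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical rule metadata. In a metadata-driven design these rows would live
# in a table or config file maintained by data stewards, not in code.
RULES: List[Dict[str, Any]] = [
    {"field": "ssn", "check": "pattern", "arg": r"\d{3}-\d{2}-\d{4}", "severity": "error"},
    {"field": "birth_date", "check": "required", "arg": None, "severity": "error"},
    {"field": "address", "check": "required", "arg": None, "severity": "warning"},
]

# The code only implements each *kind* of check.
CHECKS: Dict[str, Callable[[Optional[str], Any], bool]] = {
    "required": lambda value, arg: value not in (None, ""),
    # A missing value passes here; "required" rules are responsible for missing values.
    "pattern": lambda value, arg: value is None or re.fullmatch(arg, value) is not None,
}

def validate(record: Dict[str, Optional[str]], rules=RULES) -> List[Tuple[str, str, str]]:
    """Return (severity, field, check) for every rule the record fails."""
    return [
        (rule["severity"], rule["field"], rule["check"])
        for rule in rules
        if not CHECKS[rule["check"]](record.get(rule["field"]), rule["arg"])
    ]

sys_a = {"ssn": "123-45-6789", "birth_date": "8/20/1959", "address": "123 Main St"}
sys_c = {"ssn": None, "birth_date": "8/20/1959", "address": None}
print(validate(sys_a))  # []
print(validate(sys_c))  # [('warning', 'address', 'required')]
```

Adding a new rule is then a new row of metadata; only a new *kind* of check needs a code change.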
Mastering Data
Staging Library:
Source | ID | Name | Home Address | Birth Date | SSN
SYS A | 123 | Jim Stagnitto | 123 Main St | 8/20/1959 | 123-45-6789
SYS B | ABC | J. Stagnitto | 132 Main Street | 8/20/1959 | 123-45-6789
SYS C | XYZ | James Stag | NULL | 8/20/1959 | NULL

Standardization and Matching produce the Consolidated Library:
Source | ID | Name | Home Address | Birth Date | SSN | Std Name | Std Addr | MDM ID
SYS A | 123 | Jim Stagnitto | 123 Main St | 8/20/1959 | 123-45-6789 | James Stagnitto | 123 Main Street | 1
SYS B | ABC | J. Stagnitto | 132 Main Street | 8/20/1959 | 123-45-6789 | James Stagnitto | 132 Main Street | 1
SYS C | XYZ | James Stag | NULL | 8/20/1959 | NULL | James Stag | NULL | 1

Survivorship produces the Integrated Library, followed by Validation:
MDM ID | Name | Home Address | Birth Date | SSN
1 | James Stagnitto | 123 Main Street | 8/20/1959 | 123-45-6789
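The standardization, matching and survivorship steps applied to the sample records above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not how any particular MDM hub implements them: the name and street-suffix lookup tables, the matching rule (same SSN, or same birth date and first name with one surname a prefix of the other) and the source-priority survivorship policy are all hypothetical, chosen so the output reproduces the golden record shown.

```python
from typing import Dict, List, Optional

# Hypothetical lookup tables standing in for a real standardization engine.
NAME_VARIANTS = {"Jim": "James", "J.": "James"}
STREET_SUFFIXES = {"St": "Street"}
# Hypothetical survivorship policy: trust sources in this order.
SOURCE_PRIORITY = ["SYS A", "SYS B", "SYS C"]

staging = [
    {"source": "SYS A", "id": "123", "name": "Jim Stagnitto", "address": "123 Main St",
     "birth_date": "8/20/1959", "ssn": "123-45-6789"},
    {"source": "SYS B", "id": "ABC", "name": "J. Stagnitto", "address": "132 Main Street",
     "birth_date": "8/20/1959", "ssn": "123-45-6789"},
    {"source": "SYS C", "id": "XYZ", "name": "James Stag", "address": None,
     "birth_date": "8/20/1959", "ssn": None},
]

def standardize_name(name: str) -> str:
    first, *rest = name.split()
    return " ".join([NAME_VARIANTS.get(first, first), *rest])

def standardize_address(address: Optional[str]) -> Optional[str]:
    if address is None:
        return None
    *head, suffix = address.split()
    return " ".join([*head, STREET_SUFFIXES.get(suffix, suffix)])

def is_match(a: Dict, b: Dict) -> bool:
    """Hypothetical rule: same SSN, or same birth date and first name
    with one surname a prefix of the other ("Stag" vs "Stagnitto")."""
    if a["ssn"] is not None and a["ssn"] == b["ssn"]:
        return True
    a_first, a_last = a["std_name"].split(maxsplit=1)
    b_first, b_last = b["std_name"].split(maxsplit=1)
    return (a["birth_date"] == b["birth_date"] and a_first == b_first
            and (a_last.startswith(b_last) or b_last.startswith(a_last)))

def consolidate(records: List[Dict]) -> List[Dict]:
    """Standardize each record, then assign an MDM ID by matching."""
    consolidated: List[Dict] = []
    next_id = 1
    for rec in records:
        rec = {**rec, "std_name": standardize_name(rec["name"]),
               "std_addr": standardize_address(rec["address"])}
        hit = next((c for c in consolidated if is_match(rec, c)), None)
        if hit is not None:
            rec["mdm_id"] = hit["mdm_id"]
        else:
            rec["mdm_id"] = next_id
            next_id += 1
        consolidated.append(rec)
    return consolidated

def survive(consolidated: List[Dict]) -> List[Dict]:
    """One golden record per MDM ID: first non-null value in source-priority order."""
    golden = []
    for mdm_id in sorted({r["mdm_id"] for r in consolidated}):
        members = sorted((r for r in consolidated if r["mdm_id"] == mdm_id),
                         key=lambda r: SOURCE_PRIORITY.index(r["source"]))
        def pick(field):
            return next((r[field] for r in members if r[field] is not None), None)
        golden.append({"mdm_id": mdm_id, "name": pick("std_name"), "address": pick("std_addr"),
                       "birth_date": pick("birth_date"), "ssn": pick("ssn")})
    return golden

for row in survive(consolidate(staging)):
    print(row)
# {'mdm_id': 1, 'name': 'James Stagnitto', 'address': '123 Main Street', 'birth_date': '8/20/1959', 'ssn': '123-45-6789'}
```

The greedy "first matching cluster" assignment keeps the sketch short; real matchers usually score candidate pairs and resolve transitive matches explicitly.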
MDM Information Ecosystem
Operational Master Data, Informational Master Data, Holistic Master Data
Service, Leads, Policies, Claims, Enrolls, Sales, Finance
DW Dimensions & Cross-References
Marketing Insights
The Reality of Mastering Data
Graph Databases (NoSQL) to the Rescue
• Hierarchical relationships are never rigid
• Relational models with tables and columns are not flexible enough
• Neo4j is the leading graph database
• Many MDM systems are going graph:
  • Pitney Bowes - Spectrum MDM
  • Reltio - Worry-Free Data for Life Sciences
Graph Databases: Who are the players?
Based on popularity: http://db-engines.com/en/ranking/graph+dbms
Our favorite: Neo4j
• Open source graph database, implemented in Java
• 1.0 released in 2010, so it is mature
• Popular, with a large community
• Commercially supported
• Easy to set up and use
GRAPH DATABASES FOR MDM
Elliott Cordo
Chief Architect, Caserta Concepts
Graph DB: a special kind of NoSQL
• A NoSQL database that is all about relationships
• Relationships are first-class citizens, not just a “constraint”
Graph Use Cases
• Social Networks
• Network Asset Management
• Portfolio Management
• Risk Analysis
• Master Data Management
…and many more
Caserta Projects
• Relationship Science Workspace (Financial)
• Alumni/Corporate Donor Network (Higher Education)
What is wrong with the traditional approach to MDM?
• Conceptual problems with the “enterprise” approach
  • Long, complex implementations, leading to low ROI
  • Complex data model
  • Too much human interaction
  • Unclear deliverable
• Challenges with big data
  • Data volumes
  • Evolving data sources
  • Need to remove humans further from the process
MDM data persistence
• Fundamental challenges with data storage:
  • Sparse data
  • Evolving schema
  • Relationships
• How do we handle these in an RDBMS?
  • Custom relations
  • Extreme normalization
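To make "custom relations" and "extreme normalization" concrete, here is one common relational workaround, sketched with Python's built-in `sqlite3` module: a single generic relationship table, a name/value attribute table for sparse data, and a recursive common table expression for multi-hop traversal. Table, column and party names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE party (party_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- "Custom relations": one generic table for every relationship type.
CREATE TABLE party_relationship (
    from_id  INTEGER NOT NULL REFERENCES party(party_id),
    to_id    INTEGER NOT NULL REFERENCES party(party_id),
    rel_type TEXT    NOT NULL
);

-- "Extreme normalization": sparse attributes become name/value rows.
CREATE TABLE party_attribute (
    party_id   INTEGER NOT NULL REFERENCES party(party_id),
    attr_name  TEXT    NOT NULL,
    attr_value TEXT
);

INSERT INTO party VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Carl'), (4, 'Dana');
INSERT INTO party_relationship VALUES (1, 2, 'KNOWS'), (2, 3, 'KNOWS'), (3, 4, 'WORKS_WITH');
INSERT INTO party_attribute VALUES (1, 'employer', 'Acme'), (3, 'alma_mater', 'State U');
""")

# Every party reachable from Ann in 1..3 hops: a recursive self-join.
reachable = conn.execute("""
WITH RECURSIVE walk(party_id, depth) AS (
    SELECT ?, 0
    UNION
    SELECT r.to_id, walk.depth + 1
    FROM party_relationship AS r
    JOIN walk ON r.from_id = walk.party_id
    WHERE walk.depth < 3
)
SELECT p.name, MIN(walk.depth) AS hops
FROM walk JOIN party AS p ON p.party_id = walk.party_id
WHERE walk.depth > 0
GROUP BY p.name
ORDER BY hops, p.name
""", (1,)).fetchall()
print(reachable)  # [('Bob', 1), ('Carl', 2), ('Dana', 3)]

# Reassembling one party's sparse attributes takes another query (or a pivot).
print(conn.execute(
    "SELECT attr_name, attr_value FROM party_attribute WHERE party_id = ?", (1,)
).fetchall())  # [('employer', 'Acme')]
```

It works, but every additional hop is another self-join over the generic table, and the schema no longer says anything about which relationships or attributes actually exist.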
How does a Graph DB help MDM?
• Data is stored in its natural form, so there is no mismatch between requirements and data model
• Both nodes and relationships can have properties, which supports sparse and evolving data
• MDM for analytics: your MDM solution now delivers new capabilities, not just a back-office system
• Relationship science
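A minimal in-memory property graph illustrates the two bullets about properties: nodes carry only the properties their source supplied, and relationships are objects with a type, a direction and properties of their own. This is a plain-Python sketch, not Neo4j's storage model or API; the class names, the `SAME_AS` and `WORKS_AT` relationship types, the employer and the match `score` are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Dict, List, Optional, Set

_next_id = count(1)

@dataclass(eq=False)
class Node:
    labels: Set[str]
    props: Dict[str, Any] = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_next_id))

@dataclass(eq=False)
class Relationship:
    """A first-class relationship: it has a type, a direction and its own properties."""
    start: Node
    type: str
    end: Node
    props: Dict[str, Any] = field(default_factory=dict)

class PropertyGraph:
    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self.rels: List[Relationship] = []

    def add_node(self, *labels: str, **props: Any) -> Node:
        node = Node(set(labels), dict(props))
        self.nodes.append(node)
        return node

    def relate(self, start: Node, rel_type: str, end: Node, **props: Any) -> Relationship:
        rel = Relationship(start, rel_type, end, dict(props))
        self.rels.append(rel)
        return rel

    def outgoing(self, node: Node, rel_type: Optional[str] = None) -> List[Relationship]:
        return [r for r in self.rels
                if r.start is node and (rel_type is None or r.type == rel_type)]

g = PropertyGraph()
# Sparse data: each node carries only the properties its source actually had.
master = g.add_node("Person", name="James Stagnitto", ssn="123-45-6789")
sys_c = g.add_node("Person", "SourceRecord", name="James Stag", source="SYS C")
acme = g.add_node("Company", name="Acme")

# Relationship properties hold match metadata; no schema change is needed to add them.
g.relate(sys_c, "SAME_AS", master, rule="birth_date+name", score=0.9)
# Evolving data: a new relationship type can appear at any time.
g.relate(master, "WORKS_AT", acme, since=2010)

for rel in g.outgoing(master):
    print(rel.type, rel.end.props["name"], rel.props)  # WORKS_AT Acme {'since': 2010}
```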
So how do you work with a graph?
• Gremlin – traditional, supported by most graph databases
• Cypher – high-level, user-friendly
Cypher – ASCII art
Cypher – “select *”
Match a pattern, return results
Relationship directions
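Cypher describes a pattern in ASCII art, for example `MATCH (a:Person)-[:WORKS_AT]->(c:Company) RETURN a.name, c.name`, and the arrow sets the direction: `-->` outgoing, `<--` incoming, `--` either way. The sketch below imitates only a single-hop pattern over in-memory tuples to show what the direction does to the result; it is not a Cypher engine, and `match_one_hop` and the sample data are hypothetical.

```python
from typing import List, NamedTuple, Tuple

class Node(NamedTuple):
    id: str
    label: str
    name: str

class Rel(NamedTuple):
    start: str  # id of the start node
    type: str
    end: str    # id of the end node

def match_one_hop(nodes: List[Node], rels: List[Rel], a_label: str, rel_type: str,
                  b_label: str, direction: str = "out") -> List[Tuple[str, str]]:
    """Return (a.name, b.name) for each match of a one-hop pattern.

    direction "out"  ~ (a:A)-[:T]->(b:B)
    direction "in"   ~ (a:A)<-[:T]-(b:B)
    direction "both" ~ (a:A)-[:T]-(b:B), which matches each relationship both ways
    """
    by_id = {n.id: n for n in nodes}
    results = []
    for r in rels:
        if r.type != rel_type:
            continue
        orientations = []
        if direction in ("out", "both"):
            orientations.append((r.start, r.end))
        if direction in ("in", "both"):
            orientations.append((r.end, r.start))
        for a_id, b_id in orientations:
            a, b = by_id[a_id], by_id[b_id]
            if a.label == a_label and b.label == b_label:
                results.append((a.name, b.name))
    return results

nodes = [Node("p1", "Person", "Ann"), Node("p2", "Person", "Bob"), Node("c1", "Company", "Acme")]
rels = [Rel("p1", "WORKS_AT", "c1"), Rel("p1", "KNOWS", "p2")]

print(match_one_hop(nodes, rels, "Person", "WORKS_AT", "Company"))        # [('Ann', 'Acme')]
print(match_one_hop(nodes, rels, "Company", "WORKS_AT", "Person", "in"))  # [('Acme', 'Ann')]
print(match_one_hop(nodes, rels, "Person", "KNOWS", "Person", "both"))    # [('Ann', 'Bob'), ('Bob', 'Ann')]
```

Note that the undirected form returns each `KNOWS` relationship twice, once per orientation, which is also how an undirected Cypher pattern behaves.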
Easy queries where graphs shine
2nd or nth level connections
Shortest path
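In Cypher these two queries are a variable-length pattern such as `(a)-[:KNOWS*1..2]-(b)` and the `shortestPath()` function. Underneath, both are graph traversals, and for an unweighted graph breadth-first search is the textbook approach. A plain-Python sketch, assuming an undirected, hypothetical donor network; it illustrates the idea rather than how Neo4j executes these queries.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets: each edge can be followed in both directions."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def within_n_hops(adj: Dict[str, Set[str]], start: str, n: int) -> Dict[str, int]:
    """Breadth-first search that stops expanding at depth n; returns node -> hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == n:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[start]
    return dist

def shortest_path(adj: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Unweighted shortest path by BFS; returns the node list, or None if unreachable."""
    prev: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in sorted(adj.get(node, ())):  # sorted only to make ties deterministic
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None

# Hypothetical network: alumni, a corporate donor and board members.
adj = build_adjacency([("Ann", "Acme"), ("Acme", "Carl"), ("Carl", "Bea"),
                       ("Ann", "Dave"), ("Dave", "Bea")])
print(sorted(within_n_hops(adj, "Ann", 2).items()))  # [('Acme', 1), ('Bea', 2), ('Carl', 2), ('Dave', 1)]
print(shortest_path(adj, "Ann", "Bea"))               # ['Ann', 'Dave', 'Bea']
```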
Getting data in
• Cypher shell
• APIs: REST, modules/libraries for most languages
• CSV Loader
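The slide does not say which CSV loader it means; one option in Neo4j is the `LOAD CSV` Cypher clause, usually paired with `MERGE` so that re-running a load does not create duplicate nodes. The sketch below mimics that merge-on-key behavior with Python's `csv` module and dictionaries; the `person,company,since` file layout and `load_works_at` are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple

NodeKey = Tuple[str, str]              # (label, name)
RelKey = Tuple[NodeKey, str, NodeKey]  # (start, type, end)

def load_works_at(csv_text: str, nodes: Dict[NodeKey, dict], rels: Dict[RelKey, dict]) -> None:
    """Load person,company,since rows, merging on keys so re-runs do not duplicate."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        person: NodeKey = ("Person", row["person"])
        company: NodeKey = ("Company", row["company"])
        nodes.setdefault(person, {"name": row["person"]})
        nodes.setdefault(company, {"name": row["company"]})
        # Like MERGE followed by SET: one relationship per (start, type, end), properties updated.
        rels.setdefault((person, "WORKS_AT", company), {})["since"] = int(row["since"])

CSV_TEXT = """person,company,since
Ann,Acme,2010
Bob,Acme,2012
Ann,Initech,2015
"""

nodes: Dict[NodeKey, dict] = {}
rels: Dict[RelKey, dict] = {}
load_works_at(CSV_TEXT, nodes, rels)
load_works_at(CSV_TEXT, nodes, rels)  # second load changes nothing thanks to merge-on-key
print(len(nodes), len(rels))  # 4 3
```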
Tools and Data Viz
Cypher is cool and all, but are there BI tools?
• Little support by mainstream tools
• Healthy ecosystem of graph-specific exploration and data visualization tools
Open source and commercial: Gephi, Tom Sawyer, linkurio.us
elliott@casertaconcepts.com

Editor's Notes

  • Slide 8: Workflow: OpenSymphony; Rules: Drools; Database: Neo4j; Interface: Cytoscape