This document provides the agenda and an overview for the "Big MDM Part 2" meetup, which covers using graph databases for master data management (MDM) and relationship management. Speakers from Caserta Concepts, Neo Technology, and Pitney Bowes discuss graph databases, MDM use cases, and modeling and managing data with graph databases. The meetup is sponsored by Caserta Concepts and hosted by Neo Technology. It includes networking, four presentations on graph databases and MDM topics, and a Q&A session.
Big MDM Part 2: Using a Graph Database for MDM and Relationship Management
1. BDW Meetup:
Big MDM Part 2
Using a Graph Database for MDM
& Relationship Management
Sponsored by: Caserta Concepts. Hosted by: Neo Technology.
2. Agenda
6:30 Networking
Grab some food and drink... Make some friends.
6:45 Joe Caserta, President, Caserta Concepts
Welcome + Intro to Big MDM
About the Meetup. Why MDM needs Graph now.
7:00 Elliott Cordo, Chief Architect, Caserta Concepts
Intro to Graph Databases and Cypher
Deep dive into graph technology and how to work with it.
7:20 David Fauth, Senior Engineering Consultant, Neo Technology
Neo4j and Use Cases
Real-world solutions and a demo of Neo4j for relationship management.
7:50 Aaron Wallace, Principal Product Manager, Pitney Bowes
Spectrum MDM Hub
Model, manage and govern data with a graph database.
8:20 Q&A
Ask questions, share your experience.
3. About the BDW Meetup
• Big Data is a complex, rapidly changing landscape
• We want to share our stories and hear about yours
• Great networking opportunity for like-minded data nerds
• Founded by Caserta Concepts
• November 10, 2012
• Next BDW Meetup:
• April 7
• Topic: Predictive Analytics on Hadoop (with Zementis)
• Location: NWC
#BDWmeetup @CasertaConcepts @neo4j @PitneyBowes
4. Caserta Timeline
Timeline years: 1986, 1996, 2001, 2004, 2009, 2010, 2012, 2013, 2014
25+ years hands-on experience building database solutions
Top 20 Big Data Consulting - CIO Review
Launched Big Data practice
Co-author, with Ralph Kimball, The Data Warehouse ETL Toolkit (Wiley)
Dedicated to Data Warehousing, Business Intelligence since 1996
Began consulting in database programming and data modeling
Founded Caserta Concepts in NYC
Web log analytics solution published in Intelligent Enterprise
Formalized Alliances / Partnerships – System Integrators
Partnered with Big Data vendors Cloudera, Hortonworks, IBM, Cisco, Datameer, Basho, more…
Launched Training practice, teaching data concepts world-wide
Laser focus on extending Data Warehouses with Big Data solutions
Launched Big Data Warehousing (BDW) Meetup-NYC, ~1500 Members
Established best practices for big data ecosystem implementation – Healthcare, Finance, Insurance
Top 20 Most Powerful Big Data consulting firms
Dedicated to Data Governance Techniques on Big Data (Innovation)
5. About Caserta Concepts
• Award-winning technology innovation consulting with
expertise in:
• Big Data Solutions
• Data Warehousing
• Business Intelligence
• Core focus in the following industries:
• eCommerce / Retail / Marketing
• Financial Services / Insurance
• Healthcare / Ad Tech / Higher Ed
• Established in 2001:
• Increased growth year-over-year
• Industry-recognized workforce
• Strategy, Implementation
• Writing, Education, Mentoring
• Data Science & Analytics
• Cloud Computing
• Data Interaction & Visualization
6. Help Wanted
Does this word cloud excite you?
Word cloud: Spark, Big Data Architect, NoSQL, EC2, EMR, Redshift
Speak with us about our open positions: leslie@casertaconcepts.com
9. Mastering Data
Libraries: Staging Library, Consolidated Library, Integrated Library
Processes: Validation, Standardization, Matching, Survivorship
Source records:
Source | ID | Name | Home Address | Birth Date | SSN
SYS A | 123 | Jim Stagnitto | 123 Main St | 8/20/1959 | 123-45-6789
SYS B | ABC | J. Stagnitto | 132 Main Street | 8/20/1959 | 123-45-6789
SYS C | XYZ | James Stag | NULL | 8/20/1959 | NULL
After standardization and matching:
Source | ID | Name | Home Address | Birth Date | SSN | Std Name | Std Addr | MDM ID
SYS A | 123 | Jim Stagnitto | 123 Main St | 8/20/1959 | 123-45-6789 | James Stagnitto | 123 Main Street | 1
SYS B | ABC | J. Stagnitto | 132 Main Street | 8/20/1959 | 123-45-6789 | James Stagnitto | 132 Main Street | 1
SYS C | XYZ | James Stag | NULL | 8/20/1959 | NULL | James Stag | NULL | 1
After survivorship:
MDM ID | Name | Home Address | Birth Date | SSN
1 | James Stagnitto | 123 Main Street | 8/20/1959 | 123-45-6789
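The standardization, matching, and survivorship steps on the Mastering Data slide can be sketched in Python using the three source records shown there. This is only an illustration: the lookup tables, the matching rule (same SSN, or same birth date plus same standardized first name), and the survivorship rule (most frequent non-null value, earliest source wins ties) are assumptions made for this example, not the rules of any particular MDM product.

```python
from collections import Counter

# Source records copied from the slide; None stands in for NULL.
RECORDS = [
    {"source": "SYS A", "id": "123", "name": "Jim Stagnitto",
     "address": "123 Main St", "birth_date": "8/20/1959", "ssn": "123-45-6789"},
    {"source": "SYS B", "id": "ABC", "name": "J. Stagnitto",
     "address": "132 Main Street", "birth_date": "8/20/1959", "ssn": "123-45-6789"},
    {"source": "SYS C", "id": "XYZ", "name": "James Stag",
     "address": None, "birth_date": "8/20/1959", "ssn": None},
]

# Hypothetical toy lookup tables; real standardization uses far richer reference data.
NICKNAMES = {"Jim": "James", "J.": "James"}
STREET_WORDS = {"St": "Street"}


def standardize(rec):
    """Return a copy of rec with std_name and std_addr added."""
    out = dict(rec)
    first, _, rest = rec["name"].partition(" ")
    out["std_name"] = f"{NICKNAMES.get(first, first)} {rest}".strip()
    if rec["address"] is None:
        out["std_addr"] = None
    else:
        words = rec["address"].split()
        out["std_addr"] = " ".join(STREET_WORDS.get(w, w) for w in words)
    return out


def is_match(a, b):
    """Assumed rule: compare SSNs when both exist, else birth date + first name."""
    if a["ssn"] and b["ssn"]:
        return a["ssn"] == b["ssn"]
    same_first = a["std_name"].split()[0] == b["std_name"].split()[0]
    return a["birth_date"] == b["birth_date"] and same_first


def assign_mdm_ids(records):
    """Group matching records into clusters and stamp each with an mdm_id."""
    clusters = []
    for rec in records:
        for mdm_id, cluster in enumerate(clusters, start=1):
            if any(is_match(rec, other) for other in cluster):
                rec["mdm_id"] = mdm_id
                cluster.append(rec)
                break
        else:
            rec["mdm_id"] = len(clusters) + 1
            clusters.append([rec])
    return clusters


def survive(cluster):
    """Build one golden record: most common non-null value per attribute."""
    def pick(field):
        values = [r[field] for r in cluster if r[field] is not None]
        return Counter(values).most_common(1)[0][0] if values else None

    return {"mdm_id": cluster[0]["mdm_id"], "name": pick("std_name"),
            "address": pick("std_addr"), "birth_date": pick("birth_date"),
            "ssn": pick("ssn")}


standardized = [standardize(r) for r in RECORDS]
clusters = assign_mdm_ids(standardized)
golden = [survive(c) for c in clusters]
```

With these assumed rules all three records fall into one cluster, and the surviving record matches the single mastered row on the slide.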
10. MDM Information Ecosystem
Informational Master Data
Operational Master Data
Holistic Master Data
Diagram labels: Service, Leads, Policies, Claims, Enrolls, Sales, Finance, DW Dimensions & Cross-References, Marketing, Insights
12. Graph Databases (NoSQL) to the Rescue
Hierarchical relationships are never rigid
Relational models with tables and columns are not flexible enough
Neo4j is the leading graph database
Many MDM systems are going graph:
Pitney Bowes - Spectrum MDM
Reltio - Worry-Free Data for Life Sciences
13. Graph Databases - Who are the players?
Based on popularity: http://db-engines.com/en/ranking/graph+dbms
14. Our favorite – Neo4j
• Open source graph database, implemented in Java
• 1.0 released in 2010; mature
• Popular, with a large community
• Commercially supported
• Easy to set up and use
16. Graph DB, a special kind of NoSQL
• A NoSQL database that is all about relationships
• Relationships are first-class citizens, not just a “constraint”
17. Graph Use Cases
• Social Networks
• Network Asset Management
• Portfolio Management
• Risk Analysis
• Master Data Management
…and many more
19. What is wrong with the traditional approach to MDM?
• Conceptual problems with the “enterprise” approach
• Long, complex implementations with low ROI
• Complex data model
• Too much human interaction
• Deliverable???
• Challenges with big data
• Data volumes
• Evolving data sources
• Need to further remove humans from the process
20. MDM data persistence
• Fundamental challenges with data storage
• Sparse data
• Evolving schema
• Relationships
• How do we handle this in an RDBMS?
• Custom relations
• Extreme normalization
21. How does a Graph DB help MDM?
• Data is stored in its natural form: no mismatch between requirements and data model
• Both Nodes and Relationships can have properties: supports sparse and evolving data
• MDM for analytics: your MDM solution now delivers new enablement, not just a back-office system
• Relationship science
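The point that both nodes and relationships can carry properties, and that this suits sparse and evolving data, can be illustrated with a tiny in-memory property graph in Python. This is a toy sketch, not Neo4j's API: the `PropertyGraph` class, the `LIVES_AT` relationship type, and the node ids are all hypothetical names chosen for the example.

```python
class PropertyGraph:
    """Minimal in-memory property graph: nodes and relationships both hold properties."""

    def __init__(self):
        self.nodes = {}   # node_id -> {"label": str, "props": dict}
        self.rels = []    # list of {"start", "type", "end", "props"}

    def add_node(self, node_id, label, **props):
        # No fixed schema: each node stores only the properties it actually has.
        self.nodes[node_id] = {"label": label, "props": props}

    def add_rel(self, start, rel_type, end, **props):
        self.rels.append({"start": start, "type": rel_type, "end": end, "props": props})

    def neighbors(self, node_id, rel_type):
        """Ids of nodes reached from node_id over relationships of rel_type."""
        return [r["end"] for r in self.rels
                if r["start"] == node_id and r["type"] == rel_type]


g = PropertyGraph()
# Sparse data: the second person simply has fewer properties, with no NULL columns.
g.add_node("p1", "Person", name="James Stagnitto", birth_date="8/20/1959")
g.add_node("p2", "Person", name="James Stag")
g.add_node("a1", "Address", street="123 Main Street")
# The relationship itself records which source system asserted it.
g.add_rel("p1", "LIVES_AT", "a1", source="SYS A")
# Evolving data: a new attribute appears later without any schema migration.
g.nodes["p2"]["props"]["birth_date"] = "8/20/1959"
```

Here `g.neighbors("p1", "LIVES_AT")` returns the address node id, and the `source` property lives on the relationship rather than in a separate cross-reference table.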
22. So how do you work with a graph?
• Gremlin – traditional, supported by most Graph databases
• Cypher – high level, user friendly
26. Easy queries where graphs shine
2nd or nth level connections
Shortest path
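Both query types named on this slide reduce to breadth-first traversal, which a graph database runs natively over stored relationships. The Python sketch below shows the idea on a plain adjacency dictionary; the `FRIENDS` data and the function names are made up for illustration, and a real graph database would express these as queries rather than application code.

```python
from collections import deque

# Hypothetical undirected "knows" graph used only for this example.
FRIENDS = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cat"],
    "Cat": ["Bob", "Dan"],
    "Dan": ["Cat"],
}


def connections_at_depth(graph, start, depth):
    """Nodes whose shortest distance from start is exactly `depth` hops."""
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return frontier


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list of a shortest path, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


second_level = connections_at_depth(FRIENDS, "Ann", 2)
path = shortest_path(FRIENDS, "Ann", "Dan")
```

In a relational model each extra level typically means another self-join, whereas here the depth is just a loop bound.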
27. Getting data in
Cypher shell
APIs: REST, modules/libraries for most languages
CSV Loader
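As a rough picture of what a CSV load involves, the Python sketch below turns rows of a CSV extract into node and relationship records. It does not use any Neo4j loader or driver: the column names, the `SourceRecord` and `Master` labels, and the `RESOLVES_TO` relationship type are assumptions made for this example, with the row values taken from the Mastering Data slide.

```python
import csv
import io

# Hypothetical CSV extract; in practice this would be a file on disk.
CSV_TEXT = """source,id,name,mdm_id
SYS A,123,Jim Stagnitto,1
SYS B,ABC,J. Stagnitto,1
SYS C,XYZ,James Stag,1
"""


def load_graph(csv_text):
    """Create one node per source record, one per master id, and link them."""
    nodes = {}   # (label, key) -> properties
    rels = []    # (start_key, relationship_type, end_key)
    for row in csv.DictReader(io.StringIO(csv_text)):
        master_key = ("Master", row["mdm_id"])
        source_key = ("SourceRecord", f'{row["source"]}:{row["id"]}')
        nodes.setdefault(master_key, {})          # create the master node once
        nodes[source_key] = {"name": row["name"]}
        rels.append((source_key, "RESOLVES_TO", master_key))
    return nodes, rels


nodes, rels = load_graph(CSV_TEXT)
```

The three source rows produce three `SourceRecord` nodes that all point at a single `Master` node.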
28. Tools and Data Viz
Cypher is cool and all, but are there BI tools?
• Little support from mainstream tools
• Healthy ecosystem of graph-specific exploration and data visualization tools