10/4/2017 GraphDB as MetaStore
file:///C:/Users/haris_khan/Documents/Python/Graph_DB_MetaStore/GraphDB+as+MetaStore.html 1/6
Using Neo4j as a Data Catalogue for Data Lake
Oracle gave us a convenient way to store metadata at the attribute level using comments. This worked well while the number of attributes was small,
but as attributes multiplied in the Data Warehouse we started facing challenges such as stale metadata and different business terms for the same
attribute. This created the need for Data Warehouse metadata management systems that provide a unified, canonical data
dictionary/classification to users, e.g. IBM InfoSphere Business Glossary. These tools link business metadata to technical metadata and help a
business user find the specific attribute of interest in the DW. Hence they help with information discovery in these huge DW systems. Business users
then consume the information from the RDBMS via SQL scripts or visualization packages. Then came the real challenge: NoSQL data stores. Now the
data is not stored in tabular format, and we can't use SQL queries (a simple tool for all) to fetch the data of interest. Data could be stored in numerous
formats such as XML, key-value pairs, JSON etc. This created a need to:
1. Help business users to discover the information stored in a NOSQL DB
2. Abstract the way information is stored and help users to view the information in simple tabular format
3. Apply granular access policies
To solve this challenge, we must store the metadata in a highly connected format so that the data can be organized in a more usable structure, say
Subject Area -> Tables/Documents -> Attributes. These hierarchies could go much deeper for a semi-structured data store format like XML. Hence I
thought of using a graph database to store this metadata. We selected Neo4j (a property graph) over a triple store for a few reasons:
1. Data discovery: with a graph store, users can search for the attribute they are interested in and find its relations, such as
which subject area it belongs to or which table/document it is stored in, or vice versa.
2. We can use node properties to store backend NoSQL DB information, such as the "Key" for an attribute in a key-value store, the table/document
name, the column name etc.
3. We can restrict who can preview what data based on the relationships of attributes with roles. Roles can also be defined as nodes in the graph
store.
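The three reasons above can be illustrated with a small in-memory stand-in for the metadata graph. This is only a sketch and not Neo4j itself: the node names and the Has relationship mirror the demo that follows, while the "Analyst" role node, the CAN_PREVIEW relationship and the helper functions are hypothetical names introduced for illustration.

```python
# Illustrative in-memory stand-in for the metadata graph; not Neo4j itself.
# The "Analyst" role and the CAN_PREVIEW relationship are hypothetical.
nodes = {
    "Customer": {"Type": "Subject_Area"},
    "Address": {"Type": "Table", "Backend_Table": "cust_addr"},
    "Add_Home": {"Type": "Entity", "Backend_Where": "ADDR_TYPE = 'HOME'"},
    "Analyst": {"Type": "Role"},
}

# Directed edges stored as (source, relationship, target) triples.
edges = [
    ("Customer", "Has", "Address"),
    ("Address", "Has", "Add_Home"),
    ("Analyst", "CAN_PREVIEW", "Add_Home"),
]


def parents(name):
    """Return the nodes that point at `name` through a Has relationship."""
    return [src for src, rel, dst in edges if rel == "Has" and dst == name]


def lineage(name):
    """Walk Has relationships upward from an attribute to its subject area."""
    path = [name]
    while parents(path[-1]):
        path.append(parents(path[-1])[0])
    return path


def can_preview(role, attribute):
    """True when the role node is linked to the attribute node."""
    return (role, "CAN_PREVIEW", attribute) in edges


print(lineage("Add_Home"))
print(can_preview("Analyst", "Add_Home"))
print(can_preview("Analyst", "Address"))
```

Here discovery is a walk along relationships, backend details live as node properties, and access control is just another relationship, which is the shape a property graph gives you natively.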
This is only a small demonstration of this POV. I am using Oracle as my data store, where I store a customer's address information in a table
organized in key-value format.
In [2]: %load_ext cypher
        import cx_Oracle
        import pandas as pd
        import pandas.io.sql as sql
        con = cx_Oracle.connect('haris/Passxxxx2017@127.0.0.1/XE')
Data Stored in Oracle in KVP format
Here we store multiple address types as key-value pairs. The key is "CUSTOMER_ID + ADDR_TYPE + ADDR_VAL_TYPE", and ADDR_VAL is the value
for that key.
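To make the key layout concrete, here is a minimal sketch of how such key-value rows could be pivoted into one record per customer and address type. It uses the four sample rows of the CUST_ADDR table from this demo (dates omitted); the function name pivot_addresses is hypothetical, and plain Python is used instead of pandas to keep the sketch self-contained.

```python
from collections import defaultdict

# Sample rows from the CUST_ADDR table in this demo:
# (CUSTOMER_ID, ADDR_TYPE, ADDR_VAL_TYPE, ADDR_VAL)
rows = [
    (1, "HOME", "POSTBOX", "1002036"),
    (1, "OFFICE", "POSTBOX", "1002037"),
    (1, "OFFICE", "STREET", "102, La Trobe Street, Melbourne"),
    (1, "HOME", "STREET", "102, La Trobe Street, Melbourne"),
]


def pivot_addresses(kv_rows):
    """Turn key-value rows into one record per (customer, address type)."""
    table = defaultdict(dict)
    for customer_id, addr_type, val_type, value in kv_rows:
        # ADDR_VAL_TYPE becomes the column name, ADDR_VAL the cell value.
        table[(customer_id, addr_type)][val_type] = value
    return dict(table)


wide = pivot_addresses(rows)
for (customer_id, addr_type), columns in sorted(wide.items()):
    print(customer_id, addr_type, columns)
```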
In [6]: sql.read_sql_query("select * from CUST_ADDR", con)
Out[6]:
   CUSTOMER_ID  ADDR_TYPE  ADDR_VAL_TYPE  ADDR_VAL                         ST_DT       END_DT
0  1            HOME       POSTBOX        1002036                          2010-01-01  2012-08-14
1  1            OFFICE     POSTBOX        1002037                          2010-01-01  2012-08-14
2  1            OFFICE     STREET         102, La Trobe Street, Melbourne  2010-01-01  2012-08-14
3  1            HOME       STREET         102, La Trobe Street, Melbourne  2010-01-01  2012-08-14
Metadata stored in Neo4j
This is how we have stored the metadata in the Neo4j DB. We have stored "Subject Area" nodes at the highest level of the hierarchy (Customers). Below
that we have the table node, Address (which represents a table/document in the backend DB). For all these nodes we have stored the metadata and
backend DB information as properties, which you will see in the next two Cypher queries that I run on the Neo4j DB.
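The notebook does not show how the metadata nodes were loaded into Neo4j. As a rough sketch only, statements along the following lines could create the Subject Area -> Table -> Entity hierarchy. The property names (Name, Type, Backend_Table, Backend_Where, Backend_Col_Sel_CSV) and the Has relationship come from the queries in this demo; the helper functions are hypothetical, and node labels are left out because the original load script is unknown.

```python
import json

# Metadata for part of the demo hierarchy. The helper functions below are
# hypothetical; only the property names and the Has relationship are from the demo.
nodes = [
    {"Name": "Customer", "Type": "Subject_Area"},
    {"Name": "Address", "Type": "Table", "Backend_Table": "cust_addr"},
    {"Name": "Add_Home", "Type": "Entity", "Backend_Table": "cust_addr",
     "Backend_Where": "ADDR_TYPE = 'HOME'",
     "Backend_Col_Sel_CSV": "customer_id,addr_type,addr_val_type,addr_val"},
]
links = [("Customer", "Address"), ("Address", "Add_Home")]


def cypher_props(props):
    """Render a dict as a Cypher property map with double-quoted strings."""
    return "{" + ", ".join(f"{k}: {json.dumps(v)}" for k, v in props.items()) + "}"


def create_node(props):
    """Build a CREATE statement for one metadata node."""
    return f"CREATE (n {cypher_props(props)})"


def create_link(parent, child):
    """Build a statement that connects two existing nodes with Has."""
    return (f"MATCH (p {{Name: {json.dumps(parent)}}}), "
            f"(c {{Name: {json.dumps(child)}}}) "
            "CREATE (p)-[:Has]->(c)")


statements = [create_node(n) for n in nodes]
statements += [create_link(p, c) for p, c in links]
for statement in statements:
    print(statement)
```

In a real load, passing the values as query parameters would be preferable to rendering them into the statement text; the string form is used here only so the generated Cypher is visible.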
In [45]: ## Overall data with all relationships
         from IPython.display import Image
         Image('Graph_Data.jpg')
Out[45]: [image: Graph_Data.jpg, not preserved in this export]
Table Metadata
In [6]: z = %cypher http://neo4j:Passxxxx@localhost:7474/db/data match (a)-[r:Has]->(b) where a.Type = 'Table' return a
        pd.DataFrame(z[0])
2 rows affected.
Out[6]:
   Backend_Col_Sel  Backend_Table  Backend_Where  Catlog_Info                                        Name     Size  Source_Name  Type   Update_Freq
0                   cust_addr                     This Table stores Customer Address Info. Curre...  Address  >1GB  Customer DB  Table  Real Time
Attribute Metadata
Look at the properties stored for an Attribute node.
In [8]: z = %cypher http://neo4j:Passxxxx@localhost:7474/db/data match (a)-[r:Has]->(b) where a.Type = 'Entity' return a
        pd.DataFrame(z[0])
2 rows affected.
Out[8]: (remaining columns are cut off in the export)
   Backend_Col_Sel_CSV                           Backend_Table  Backend_Where       Catlog_Info                                         Name      Size  Source_Nam
0  customer_id,addr_type,addr_val_type,addr_val  cust_addr      ADDR_TYPE = 'HOME'  This column stores Customer Home Address Info....  Add_Home  >1GB  Customer DB
This is the key: build a function that dynamically generates a SQL statement to run on the backend data store (Oracle in our case) based on
the user's selection from the graph DB. The business user sees all metadata from Neo4j during data discovery. Once the user finds the attribute
of interest, a query is built dynamically and fired on the backend data store, and a data preview becomes available. This completely abstracts the
way data is stored and organised in the backend tables.
In [10]: def Build_SQL(where, select_cols, table):
             # Assemble the preview query from the metadata stored on the graph node.
             sql_str = "Select " + ','.join(select_cols) + ' from ' + table + ' where ' + where
             return sql_str
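Because Build_SQL concatenates strings taken straight from the metadata store, a malformed property value would flow directly into the SQL text. The following is a hedged sketch of a slightly more defensive variant, not the author's implementation: the names build_sql_checked and check_identifier and the identifier pattern are assumptions made for illustration. It also skips the WHERE keyword when the clause is empty, which appears to be the case for the Table node in the output above.

```python
import re

# Plain unquoted identifier: a letter followed by letters, digits, _, $ or #.
# This conservative pattern is an assumption chosen for the sketch.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")


def check_identifier(name):
    """Reject anything that is not a plain table or column name."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError("unsafe identifier: %r" % name)
    return name


def build_sql_checked(where, select_cols, table):
    """Like Build_SQL, but validates identifiers taken from the metadata."""
    cols = ", ".join(check_identifier(c.strip()) for c in select_cols)
    sql_str = "SELECT " + cols + " FROM " + check_identifier(table)
    if where:  # omit the WHERE keyword when no filter is stored
        sql_str += " WHERE " + where
    return sql_str


print(build_sql_checked("ADDR_TYPE = 'HOME'",
                        ["customer_id", "addr_val"], "cust_addr"))
print(build_sql_checked("", ["customer_id"], "cust_addr"))
```

The WHERE text is still trusted as-is in this sketch, so the metadata store itself has to be write-protected for the approach to be safe.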
This function extracts all the metadata from Neo4j and displays it for user selection. Once the user enters the attribute to look at, a
SQL statement is built dynamically and fired on Oracle to fetch the data preview.
In [11]: def get_attr_details():
             # List every parent/child pair in the metadata graph for the user to browse.
             results = %cypher http://neo4j:Passxxxx@localhost:7474/db/data match (a)-[r:Has]->(b) return distinct a.Name as PRNT_NAME,a.Type as PRNT_TYPE,b.Name as CHLD_NAME,b.Type as CHLD_TYPE
             df = results.get_dataframe()
             print(df)
             Attrx = input("Enter the Attribut You Want details = ")
             # Fetch the backend details stored as properties on the chosen attribute node.
             result = %cypher http://neo4j:Passxxxx@localhost:7474/db/data match (a:Attribute) where a.Name = '{Attrx}' return a.Backend_Table,a.Backend_Where,a.Backend_Col_Sel_CSV
             table = result[0][0]
             where_clause = result[0][1]
             select_cols = (result[0][2]).split(',')  # get this dynamically
             sql_script = Build_SQL(where_clause, select_cols, table)
             df = sql.read_sql_query(sql_script, con)
             return df
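The same lookup-then-query flow can be tried without Neo4j or Oracle. The sketch below is a stand-in under stated assumptions: a plain dict replaces the graph lookup, an in-memory SQLite table replaces CUST_ADDR in Oracle, and the names METADATA, make_demo_db and preview are hypothetical. The metadata values for Add_Home are the ones shown in the Attribute Metadata output.

```python
import sqlite3

# Stand-in for the Neo4j lookup: attribute name -> backend details.
# The values for Add_Home are taken from the query output in this demo.
METADATA = {
    "Add_Home": {
        "Backend_Table": "cust_addr",
        "Backend_Where": "ADDR_TYPE = 'HOME'",
        "Backend_Col_Sel_CSV": "customer_id,addr_type,addr_val_type,addr_val",
    },
}


def make_demo_db():
    """Create an in-memory SQLite table shaped like CUST_ADDR (Oracle stand-in)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cust_addr (customer_id INTEGER, addr_type TEXT, "
                "addr_val_type TEXT, addr_val TEXT)")
    con.executemany("INSERT INTO cust_addr VALUES (?, ?, ?, ?)", [
        (1, "HOME", "POSTBOX", "1002036"),
        (1, "OFFICE", "POSTBOX", "1002037"),
        (1, "OFFICE", "STREET", "102, La Trobe Street, Melbourne"),
        (1, "HOME", "STREET", "102, La Trobe Street, Melbourne"),
    ])
    return con


def preview(attribute, con):
    """Look up the attribute's metadata, build the query, and fetch the rows."""
    meta = METADATA[attribute]
    cols = meta["Backend_Col_Sel_CSV"].split(",")
    query = "SELECT " + ", ".join(cols) + " FROM " + meta["Backend_Table"]
    if meta["Backend_Where"]:
        query += " WHERE " + meta["Backend_Where"]
    return con.execute(query).fetchall()


con = make_demo_db()
rows = preview("Add_Home", con)
for row in rows:
    print(row)
con.close()
```

The caller only names a business attribute; the table, column list and filter all come from the metadata, which is the abstraction this demo is after.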
In [ ]: data = get_attr_details()
5 rows affected.
   PRNT_NAME   PRNT_TYPE     CHLD_NAME       CHLD_TYPE
0  Customer    Subject_Area  Address         Table
1  Address     Table         Add_Home        Entity
2  Address     Table         Add_Office      Entity
3  Add_Home    Entity        Add_Home_PBO    Entity
4  Add_Office  Entity        Add_Office_PBO  Entity
Invoking the function first displays a table of the metadata from Neo4j, where we see the "Customer" subject area as the parent of the "Address"
table. The table in turn has child attributes.
The function also expects the user to input the name of the attribute he/she wants to look at.
Below I have entered "Add_Home_PBO" to fetch that attribute's details from Oracle.
In [13]: data = get_attr_details()
5 rows affected.
   PRNT_NAME   PRNT_TYPE     CHLD_NAME       CHLD_TYPE
0  Customer    Subject_Area  Address         Table
1  Address     Table         Add_Home        Entity
2  Address     Table         Add_Office      Entity
3  Add_Home    Entity        Add_Home_PBO    Entity
4  Add_Office  Entity        Add_Office_PBO  Entity
Enter the Attribut You Want details = Add_Home_PBO
1 rows affected.
In [15]: data
Out[15]:
   CUSTOMER_ID  ADDR_TYPE  ADDR_VAL_TYPE  ADDR_VAL
0  1            HOME       POSTBOX        1002036
In [ ]: con.close()
Conclusion:
This small test demonstrates the concept of using a graph DB as a metadata hub for a NoSQL DB or any data lake. This way we have abstracted all the
technicalities of fetching data from a NoSQL DB, and the user can view the data in the tabular format we are all comfortable with. Thanks.
