Data Processing over Very Large Relational Databases

Final presentation of my dissertation thesis, focused on orientation in, analysis of, and finding information in large or unknown relational databases, and on data visualization.
Data Processing over Very Large Databases
Ing. Ľuboš Takáč
Supervisor: doc. Ing. Michal Zábovský, PhD.
Faculty of Management Science and Informatics, University of Žilina
Large Databases
• VLDB (very large databases)
• Relational databases with hundreds of tables and millions of rows
The Problem
• How to understand a relational database model well enough to find information in it.
• Orientation in a large RDB is difficult because of the complexity of the RDB model.
• Modification and further development of the RDB.
Existing Approaches
• Database metrics
• Database visualization
• Mapping the database to an ontology and examining the ontology
Database Metrics
• A database metric is a function that assigns a numeric value to an object in the database.
• Examples of table metrics:
– DRT(T): depth of relational tree
– TS(T): table size
– RD(T): referential degree
– …
• Rankings: combining several metrics with different weights.
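The slide names these metrics without defining them. A minimal Python sketch, assuming RD(T) counts the foreign keys declared in table T, DRT(T) is the length of the longest chain of foreign-key references starting at T, and TS(T) is the row count. The toy schema, row counts, weights and the max-normalisation in `rank_tables` are hypothetical choices, not the thesis's definitions.

```python
# Hypothetical toy schema: table -> tables referenced by its foreign keys
# (one list entry per foreign key).
SCHEMA = {
    "order_line": ["orders", "product"],
    "orders": ["customer", "employee"],
    "product": ["category", "supplier"],
    "customer": [],
    "employee": ["employee"],  # self-reference, e.g. "reports to"
    "category": [],
    "supplier": [],
}

# Hypothetical row counts used as TS(T), the table size metric.
TABLE_SIZE = {"order_line": 2155, "orders": 830, "product": 77, "customer": 91,
              "employee": 9, "category": 8, "supplier": 29}


def rd(table):
    """RD(T), assumed here to be the number of foreign keys declared in T."""
    return len(SCHEMA[table])


def drt(table, path=()):
    """DRT(T), assumed here to be the longest FK chain starting at T.

    Tables already on the current path are skipped, so cycles
    (including self-references) do not recurse forever.
    """
    path = path + (table,)
    depths = [1 + drt(ref, path) for ref in SCHEMA[table] if ref not in path]
    return max(depths, default=0)


def rank_tables(metrics, weights):
    """Order tables by a weighted sum of max-normalised metric values."""
    scores = {}
    for table in SCHEMA:
        score = 0.0
        for name, values in metrics.items():
            top = max(values.values()) or 1  # avoid division by zero
            score += weights[name] * values[table] / top
        scores[table] = score
    return sorted(scores, key=scores.get, reverse=True)


metrics = {
    "RD": {t: rd(t) for t in SCHEMA},
    "DRT": {t: drt(t) for t in SCHEMA},
    "TS": TABLE_SIZE,
}
ranking = rank_tables(metrics, {"RD": 0.4, "DRT": 0.4, "TS": 0.2})
print(ranking[:3])  # ['order_line', 'orders', 'product']
```

Normalising each metric by its maximum keeps the large TS values from swamping the small structural counts. The path-based DRT is exponential in the worst case and is only meant for small illustrative schemas.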
RDB Visualization
• Visualization of the database schema.
• A standard ER diagram is insufficient for a large RDB model.
SchemaBall
• Visualization of large or complex RDB schemas.
• Uses RDB metrics and rankings.
• We implemented and enhanced this solution.
SchemaBall
Visualization of the RDB Schema Graph
• Vertex- and edge-weighted graph based on RDB metrics.
• Gephi is used for visualization:
– automatically generated layout
– interactive visualization (selecting and examining nodes and edges)
– graph algorithms
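A weighted schema graph can be handed to Gephi through its spreadsheet (CSV) import, which takes one node table and one edge table. This sketch assumes Gephi's conventional column names (Id, Label for nodes; Source, Target, Type, Weight for edges), uses row counts as vertex weights and FK multiplicity as edge weights. The schema and the function `to_gephi_csv` are hypothetical and are not RDBAnalyzer's exporter.

```python
import csv
import io
from collections import Counter

# Hypothetical input: row counts (used as vertex weights) and foreign keys
# given as (referencing table, referenced table) pairs.
TABLES = {"orders": 830, "customer": 91, "employee": 9, "order_line": 2155}
FOREIGN_KEYS = [("orders", "customer"), ("orders", "employee"),
                ("order_line", "orders"), ("employee", "employee")]


def to_gephi_csv(tables, foreign_keys):
    """Return (nodes_csv, edges_csv) text for Gephi's spreadsheet import."""
    nodes = io.StringIO()
    writer = csv.writer(nodes, lineterminator="\n")
    writer.writerow(["Id", "Label", "Weight"])
    for table, rows in tables.items():
        writer.writerow([table, table, rows])

    # Edge weight (assumption): number of FKs between the same ordered pair.
    edges = io.StringIO()
    writer = csv.writer(edges, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (source, target), weight in Counter(foreign_keys).items():
        writer.writerow([source, target, "Directed", weight])
    return nodes.getvalue(), edges.getvalue()


nodes_csv, edges_csv = to_gephi_csv(TABLES, FOREIGN_KEYS)
print(edges_csv)
```

The two strings can be saved as files and loaded, for example, through the spreadsheet import in Gephi's Data Laboratory before running a layout algorithm.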
Analyzing the RDB Graph
• Three approaches:
– graph of the RDB model (vertex = table, edge = foreign key relation)
– alternative graph (vertex = table, one edge per tuple-level foreign key reference)
– graph of tuples (vertex = tuple, edge = foreign key reference between tuples)
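The three graph constructions can be sketched in Python over a toy two-table database. The second approach is read here as a multigraph with one table-to-table edge per referencing tuple, which is an interpretation of the slide; `FK_TARGET`, `ROWS` and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data. FK_TARGET maps (table, FK column) -> referenced table;
# ROWS maps table -> {row id: {FK column: referenced row id}}.
FK_TARGET = {("orders", "customer_id"): "customer"}
ROWS = {
    "customer": {1: {}, 2: {}},
    "orders": {10: {"customer_id": 1}, 11: {"customer_id": 1}, 12: {"customer_id": 2}},
}


def schema_graph():
    """Approach 1: vertex = table, one edge per foreign key."""
    return [(table, target) for (table, _column), target in FK_TARGET.items()]


def tuple_level_table_graph():
    """Approach 2 (as read here): vertex = table, one edge per referencing tuple."""
    return [(table, target)
            for (table, column), target in FK_TARGET.items()
            for values in ROWS[table].values()
            if values.get(column) is not None]


def tuple_graph():
    """Approach 3: vertex = (table, row id), edge = FK reference between tuples."""
    return [((table, row_id), (target, values[column]))
            for (table, column), target in FK_TARGET.items()
            for row_id, values in ROWS[table].items()
            if values.get(column) is not None]


def degrees(edges):
    """Vertex degree in the undirected (multi)graph; isolated vertices are omitted."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


for build in (schema_graph, tuple_level_table_graph, tuple_graph):
    print(build.__name__, dict(degrees(build())))
```

The same toy database gives degree 1 per table in the first graph, degree 3 per table in the second, and per-tuple degrees in the third, which is why the three degree distributions can look quite different.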
Analyzing the RDB Graph – First Approach
Figure: distribution function of vertex degree (x axis: vertex degree; y axis: probability).
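The plots show an empirical distribution of vertex degree. The slides do not say whether the plotted function is the probability mass P(k) or the cumulative F(k) = P(degree ≤ k), so this hypothetical sketch computes both from a list of vertex degrees.

```python
from collections import Counter


def degree_distribution(degrees):
    """Empirical P(k): fraction of vertices with degree k, sorted by k."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: counts[k] / n for k in sorted(counts)}


def cumulative_distribution(degrees):
    """Empirical F(k) = P(degree <= k), computed from running counts."""
    n = len(degrees)
    counts = Counter(degrees)
    total, cdf = 0, {}
    for k in sorted(counts):
        total += counts[k]
        cdf[k] = total / n
    return cdf


# Hypothetical vertex degrees of a small schema graph.
sample = [1, 1, 1, 1, 2, 2, 3, 1, 2, 8]
print(degree_distribution(sample))  # {1: 0.5, 2: 0.3, 3: 0.1, 8: 0.1}
```

The cumulative values are computed from integer running counts rather than by summing the floating-point probabilities, which avoids accumulated rounding error.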
Analyzing the RDB Graph – Second Approach
Figure: distribution function of vertex degree (x axis: vertex degree; y axis: probability).
Analyzing the RDB Graph – Third Approach
Figure: distribution function of vertex degree (x axis: vertex degree; y axis: count).
Analyzing the RDB Graph – Scale-Free Networks
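A network is scale-free when its degree distribution follows a power law, P(k) ∝ k^(−γ). The slide does not state how the RDB graph was tested; as an assumption, the sketch below uses the approximate discrete maximum-likelihood estimator of γ described by Clauset, Shalizi and Newman (2009), γ ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)), summed over the n degrees k_i ≥ k_min. The function name and sample degrees are hypothetical.

```python
import math


def powerlaw_exponent(degrees, k_min=1):
    """Approximate discrete MLE of gamma in P(k) ~ k**(-gamma) for k >= k_min.

    gamma = 1 + n / sum(ln(k_i / (k_min - 0.5))) over the n degrees k_i >= k_min.
    """
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no vertex has degree >= k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1 + len(tail) / log_sum


# Hypothetical vertex degrees of a schema graph.
sample = [1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 9, 20]
print(round(powerlaw_exponent(sample), 2))
```

The approximation is rough for small k_min, and an estimate of γ alone does not prove a power law; a goodness-of-fit test or a comparison with alternative distributions would be needed for that.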
Visualization of the RDB Schema Network
Analyzing the RDB Graph – Conclusion
• The RDB model is scale-free.
• To understand an RDB, you must first understand its centers (hubs); there are only a few of them.
• The very useful metric NR(T) (number of references) was validated by analyzing the RDB graph.
• We created two new metrics based on the three approaches mentioned above.
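Since a scale-free graph has only a few highly connected centers, a natural first step is to list them. The sketch assumes NR(T) is the number of foreign keys that reference table T (the slide gives only the name "number of references") and ranks hubs by total FK degree; the foreign-key list and both functions are hypothetical.

```python
from collections import Counter

# Hypothetical foreign keys as (referencing table, referenced table).
FOREIGN_KEYS = [
    ("orders", "customer"), ("orders", "employee"), ("order_line", "orders"),
    ("order_line", "product"), ("invoice", "orders"), ("shipment", "orders"),
    ("product", "supplier"), ("product", "category"),
]


def nr(foreign_keys):
    """NR(T), assumed here to be the number of foreign keys referencing T."""
    return Counter(target for _source, target in foreign_keys)


def hubs(foreign_keys, top=3):
    """Tables with the highest total degree (incoming + outgoing FKs)."""
    degree = Counter()
    for source, target in foreign_keys:
        degree[source] += 1
        degree[target] += 1
    return [table for table, _count in degree.most_common(top)]


print(hubs(FOREIGN_KEYS))  # ['orders', 'product', 'order_line']
```

Starting the exploration from these few tables covers a large share of the FK relationships early, which matches the slide's advice to understand the centers first.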
A Method for Analyzing Large RDBs
• Find the components of the schema graph (tables = vertices, FKs = edges).
• Examine each component, starting with the largest:
– If the component is a single isolated table, it is very probably an archive; try to confirm this or find another purpose for it.
– Otherwise, visualize it via an ER diagram, SchemaBall, or a graph using table metrics.
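The method can be sketched with a plain depth-first search for connected components of the undirected schema graph, sorted by size, followed by the slide's rule for single-table components. The table names and the `components` and `plan` helpers are hypothetical.

```python
from collections import defaultdict


def components(tables, foreign_keys):
    """Connected components of the undirected schema graph, largest first."""
    adjacency = defaultdict(set)
    for source, target in foreign_keys:
        adjacency[source].add(target)
        adjacency[target].add(source)
    seen, result = set(), []
    for start in tables:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search
            table = stack.pop()
            component.append(table)
            for neighbour in adjacency[table]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(sorted(component))
    return sorted(result, key=len, reverse=True)


def plan(tables, foreign_keys):
    """Apply the examination rule from the slide to each component."""
    return [("check if archive" if len(c) == 1 else "visualize", c)
            for c in components(tables, foreign_keys)]


tables = ["orders", "customer", "order_line", "product", "orders_A", "customer_A"]
fks = [("orders", "customer"), ("order_line", "orders"), ("order_line", "product")]
for action, component in plan(tables, fks):
    print(action, component)
```

An iterative search avoids Python's recursion limit on schemas with long FK chains.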
Practical Example
• Unknown complex RDB:
– 332 tables
– 2339 attributes
– 192 foreign keys
– size 2.4 GB
All Tables
Archive Tables
• Each isolated table is an archive table, following the naming convention “_A”.
Component A
Component B
RDBAnalyzer
• Supports every RDB system with a JDBC driver; easily scalable; online connection.
• Features:
– large online RDB schema visualization
– finding the components of the schema graph
– schema graph creation, visualization and export (Gephi)
– transformation of an RDB into a tuple graph
– metric charts and parallel coordinates visualization
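RDBAnalyzer reads the schema over a live JDBC connection; in Java, foreign keys are available through `DatabaseMetaData.getImportedKeys`, though the slides do not say which calls the tool uses. To keep all examples in one language, this sketch uses Python's built-in `sqlite3` module and SQLite's `PRAGMA foreign_key_list` as a stand-in data source; `read_schema_graph` is a hypothetical helper.

```python
import sqlite3


def read_schema_graph(conn):
    """Return (tables, foreign_keys) read from a live SQLite connection."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    foreign_keys = []
    for table in tables:
        # Each PRAGMA row: (id, seq, table, from, to, on_update, on_delete, match).
        # Table names come from sqlite_master; names containing '"' are not handled.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[1] == 0:  # keep one edge per FK, even for composite keys
                foreign_keys.append((table, fk[2]))
    return tables, foreign_keys


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(id));
""")
print(read_schema_graph(conn))  # (['customer', 'orders'], [('orders', 'customer')])
```

Composite foreign keys return one PRAGMA row per column, which is why only rows with `seq == 0` are kept.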
RDBAnalyzer
RDB to Ontology Mapping
– better understanding of, and searching for, information without knowledge of the RDB model; data mining from the RDB
– web search engines can use it to search in RDBs
– people who do not understand RDB technology (laymen) can get information from the RDB
– a method for merging multiple databases (ontology merging)
– interactive searching for information (Protégé)
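A very small mapping in the spirit of the W3C Direct Mapping (table → class, row → individual, column → data property, foreign key → object property) can be written as a function emitting N-Triples. The base IRI, the property naming and `row_to_triples` are hypothetical, and the sketch assumes each FK references a single-column primary key; it is not the mapping tool used in the thesis.

```python
# Hypothetical base IRI for generated classes, individuals and properties.
BASE = "http://example.org/db/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def row_to_triples(table, pk, row, foreign_keys):
    """Map one row to N-Triples lines.

    table -> class, row -> individual, column -> data property,
    FK column -> object property. foreign_keys maps a column name to the
    referenced table, whose single-column primary key is assumed.
    """
    subject = f"<{BASE}{table}/{row[pk]}>"
    triples = [f"{subject} {RDF_TYPE} <{BASE}{table}> ."]
    for column, value in row.items():
        if value is None:
            continue  # NULLs produce no triple
        if column in foreign_keys:
            obj = f"<{BASE}{foreign_keys[column]}/{value}>"
            triples.append(f"{subject} <{BASE}{table}#ref-{column}> {obj} .")
        else:
            # Escape backslashes, quotes and newlines for an N-Triples literal.
            literal = (str(value).replace("\\", "\\\\")
                       .replace('"', '\\"').replace("\n", "\\n"))
            triples.append(f'{subject} <{BASE}{table}#{column}> "{literal}" .')
    return triples


triples = row_to_triples("orders", "id",
                         {"id": 10, "customer_id": 1, "ship_city": "Reims"},
                         {"customer_id": "customer"})
print("\n".join(triples))
```

Because referenced rows become IRIs rather than copied values, the FK structure of the database survives as links between individuals that tools such as Protégé can browse.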
RDB Schema NORTHWIND (ER Diagram)
OntoGraf (Protégé)
How to Find Information in Ontologies
• Using a query language (SPARQL)
• Interactively (e.g. in Protégé):
– using OntoGraf combined with text search
– exploring entities and individuals
Disadvantages and Problems of RDBs Mapped to Ontologies
• It is difficult to keep the data up to date (static vs. dynamic ontology creation).
• Aggregate queries are very slow.
• Existing tools cannot cope with large RDBs (or large ontologies).
Conclusion & Scientific Contribution
• Design and creation of a method for orientation in, understanding of, and finding information in large or unknown relational databases (RDBAnalyzer supports these principles).
• Detection of an RDB graph characteristic (scale-free network) and use of this knowledge to create 2 new metrics and validate 1 existing metric.
• Design and creation of a method for finding information in ontologies generated from an RDB.
Thank you for your attention!
lubos.takac@gmail.com
