Final presentation of my dissertation thesis, which focuses on orientation, analysis, and finding information in large or unknown relational databases, and on data visualisation
Data Processing over very Large Relational Databases
1. Data Processing over Very Large Databases
Ing. Ľuboš Takáč
Supervisor: doc. Ing. Michal Zábovský, PhD.
Faculty of Management Science and Informatics
University of Žilina
2. Large Databases
• VLDB (very large databases)
• Relational Databases with hundreds of tables and millions
of rows
3. The Problem
• How to understand a relational database model so that we
can find information in it.
• Orientation in large RDB
– given by the complexity of RDB model
• Modification and development of RDB.
4. Existing approaches
• Database metrics
• Database visualization
• Database to ontology mapping and examination of ontology
5. Database Metrics
• A database metric is a function that assigns a numeric value to an
object from the database.
• Examples of table metrics
– DRT(T) – depth of relational tree
– TS(T) – table size
– RD(T) – referential degree
– …
• Rankings – grouping metrics with different weights.
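To make the idea of table metrics and rankings concrete, here is a minimal sketch. The slides do not define the metrics precisely, so the definitions below are assumptions: TS(T) is taken as the row count, RD(T) as the number of outgoing foreign keys, and DRT(T) as the longest foreign key chain starting at the table. The schema, the function names and the normalization used in the ranking are hypothetical.

```python
# Hypothetical toy schema: table -> (row count, tables it references via FK).
SCHEMA = {
    "customer": (1000, []),
    "product": (200, []),
    "orders": (5000, ["customer"]),
    "order_item": (20000, ["orders", "product"]),
    "audit_log": (90000, []),
}


def table_size(table):
    """TS(T): assumed here to be the number of rows."""
    return SCHEMA[table][0]


def referential_degree(table):
    """RD(T): assumed here to be the number of outgoing foreign keys."""
    return len(SCHEMA[table][1])


def depth_of_referential_tree(table):
    """DRT(T): assumed here to be the longest FK chain from the table.

    The sketch assumes the foreign keys contain no cycles.
    """
    referenced = SCHEMA[table][1]
    if not referenced:
        return 0
    return 1 + max(depth_of_referential_tree(t) for t in referenced)


METRICS = {
    "TS": table_size,
    "RD": referential_degree,
    "DRT": depth_of_referential_tree,
}


def ranking(weights):
    """Order tables by a weighted sum of metrics, each scaled by its maximum."""
    scores = {table: 0.0 for table in SCHEMA}
    for name, weight in weights.items():
        values = {table: METRICS[name](table) for table in SCHEMA}
        top = max(values.values()) or 1  # avoid division by zero
        for table, value in values.items():
            scores[table] += weight * value / top
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `ranking({"RD": 1.0, "DRT": 1.0})` orders the tables by their structural importance only, while adding a `"TS"` weight lets large tables move up the list.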
11. Visualization of RDB schema graph
• Vertex and edge weighted graph based on RDB metrics.
• Using Gephi for visualization
– automatically generated layout
– interactive visualization (selection and examination of nodes and
edges)
– using graph algorithms
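One simple way to hand a weighted schema graph to Gephi is an edge list in CSV form. The sketch below assumes Gephi's spreadsheet import with `Source`, `Target` and `Weight` columns; the function name, the sample foreign keys and the choice of edge weight are hypothetical and not taken from the thesis.

```python
import csv
import io


def edges_csv(foreign_keys, edge_weight):
    """Return a CSV edge list with one row per foreign key.

    foreign_keys: iterable of (referencing table, referenced table) pairs.
    edge_weight:  dict mapping such a pair to a metric-based weight
                  (pairs without an entry get weight 1).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for source, target in foreign_keys:
        writer.writerow([source, target, edge_weight.get((source, target), 1)])
    return buffer.getvalue()


# Hypothetical example: weight an edge by the number of referencing rows.
example = edges_csv(
    [("orders", "customer"), ("order_item", "orders")],
    {("orders", "customer"): 5000},
)
```

The resulting text can be saved to a file and imported as an edges table; vertex weights would go into a separate nodes table in the same way.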
14. Analysis of the RDB Graph
• Three approaches
– graph of RDB model (vertex – table, edges – foreign key relations)
– alternative (vertex – table, edge – foreign key relation for each
tuple)
– graph of tuples (vertex – tuple, edge – foreign key relation between
tuples)
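As an illustration of the first approach (vertex = table, edge = foreign key relation), the following sketch computes the vertex degree distribution of a schema graph. The foreign key list is a hypothetical example, and tables without any foreign key relation (degree 0) are not represented in this edge-based input.

```python
from collections import Counter

# Hypothetical foreign keys as (referencing table, referenced table) pairs.
FOREIGN_KEYS = [
    ("orders", "customer"),
    ("order_item", "orders"),
    ("order_item", "product"),
    ("invoice", "orders"),
    ("payment", "invoice"),
    ("shipment", "orders"),
]


def vertex_degrees(edges):
    """Count, for every table, how many FK relations touch it."""
    degrees = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def degree_distribution(edges):
    """Map each vertex degree to the fraction of tables having that degree."""
    degrees = vertex_degrees(edges)
    counts = Counter(degrees.values())
    total = len(degrees)
    return {degree: counts[degree] / total for degree in sorted(counts)}


distribution = degree_distribution(FOREIGN_KEYS)
```

In this toy schema most tables have degree 1 and only `orders` acts as a center, which is the kind of skewed distribution the analysis looks for. For the second approach, each edge would be counted once per referencing tuple instead of once per foreign key.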
15. Analysis of the RDB Graph – first approach
[Figure: Distribution function of vertex degree. x-axis: vertex degree (1 to 29); y-axis: probability (0.00 to 0.16).]
16. Analysis of the RDB Graph – second approach
[Figure: Distribution function of vertex degree. x-axis: vertex degree; y-axis: probability (0 to 1).]
17. Analysis of the RDB Graph – third approach
[Figure: Distribution function of vertex degree. x-axis: vertex degree; y-axis: count.]
20. Analysis of the RDB Graph – Conclusion
• The RDB model is scale-free.
• To understand an RDB, you must first understand its centers
(there are only a few of them).
• The very useful metric NR(T) – number of references – was validated
by the analysis of the RDB graph.
• We created 2 new metrics based on the three approaches
mentioned.
21. A Method for Analyzing Large RDB
• Find the components of the schema graph (tables = vertices, FKs =
edges).
• Examine each component in order, largest first.
– If a component is a single isolated table, it is very probably an archive;
verify this or find its other purpose.
– Otherwise, visualize it as an ER diagram, with Schemaball, or as a graph using table
metrics.
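The steps of this method can be sketched as follows. This is only an illustration under assumptions: the table and foreign key lists are hypothetical, the function names are invented for the example, and the two suggested actions are plain text labels rather than calls to any real tool.

```python
def components(tables, foreign_keys):
    """Return the connected components of the schema graph, largest first."""
    neighbours = {table: set() for table in tables}
    for source, target in foreign_keys:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen, result = set(), []
    for start in tables:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # depth-first traversal, ignoring FK direction
            table = stack.pop()
            component.add(table)
            for other in neighbours[table]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        result.append(component)
    return sorted(result, key=len, reverse=True)


def examination_plan(tables, foreign_keys):
    """Pair each component with the suggested next step of the method."""
    plan = []
    for component in components(tables, foreign_keys):
        if len(component) == 1:
            action = "isolated table: check whether it is an archive"
        else:
            action = "visualize (ER diagram, Schemaball, or metric graph)"
        plan.append((sorted(component), action))
    return plan


# Hypothetical schema used only for illustration.
TABLES = ["customer", "orders", "order_item", "product",
          "orders_2010_archive", "country", "region"]
FKS = [("orders", "customer"), ("order_item", "orders"),
       ("order_item", "product"), ("region", "country")]
PLAN = examination_plan(TABLES, FKS)
```

Here `PLAN` lists the four-table order component first, then the `country`/`region` pair, and finally the isolated `orders_2010_archive` table flagged for an archive check.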
30. RDB to Ontology Mapping
– better understanding and searching for information without
knowledge of RDB model, data mining from RDB
– can be used by web search engines to search in RDBs
– lets people who do not understand RDB technology (laymen) get
information from an RDB
– a method for merging multiple databases (ontology merging)
– interactive searching for information (Protégé)
34. How to find information in Ontologies
• using query language (SPARQL)
• interactive (e.g. Protégé)
– using OntoGraf combined with text searching
– explore entities and individuals
36. Disadvantages & Problems of RDBs Mapped to
Ontologies
• Difficult to keep the data up to date (static & dynamic ontology
creation).
• Aggregate queries are very slow.
• Existing tools cannot cope with large RDBs (or large
ontologies).
37. Conclusion & Scientific Contribution
• Design and creation of a method for orientation, understanding,
and finding information in large or unknown relational
databases (RDBAnalyzer supports the mentioned principles).
• Detection of RDB graph characteristics (scale-free network) and
use of this knowledge to create 2 new metrics and validate 1 existing
metric.
• Design and creation of a method for finding information in
ontologies generated from an RDB.