Data Processing over Very Large Databases
Ing. Ľuboš Takáč
Supervisor: doc. Ing. Michal Zábovský, PhD.
Faculty of Management Science and Informatics
University of Žilina
Large Databases
• VLDB (very large databases)
• Relational Databases with hundreds of tables and millions
of rows
The Problem
• How to understand a relational database model well enough to
find information in it.
• Orientation in a large RDB
– made difficult by the complexity of the RDB model
• Modification and development of an RDB.
Existing approaches
• Database metrics
• Database visualization
• Database to ontology mapping and examination of ontology
Database Metrics
• A database metric is a function that assigns a numeric value to an
object in the database.
• Examples of table metrics
– DRT(T) – depth of relational tree
– TS(T) – table size
– RD(T) – referential degree
– …
• Rankings – combinations of several metrics with different weights.
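A ranking of this kind can be sketched as a weighted sum of normalized metrics. The snippet below is only an illustration: the table names, metric values, weights, and the max-based normalization are assumptions, not the definitions used in the original work.

```python
from typing import Dict

# Hypothetical metric values per table. The metric names TS and RD follow the
# slide; the numbers are invented for illustration.
metrics: Dict[str, Dict[str, float]] = {
    "orders":    {"TS": 50000, "RD": 3},
    "customers": {"TS": 2000,  "RD": 0},
    "order_log": {"TS": 90000, "RD": 1},
}

def normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale one metric to [0, 1] so metrics with different units can be combined."""
    top = max(values.values())
    return {table: (value / top if top else 0.0) for table, value in values.items()}

def ranking(metrics: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of normalized metrics for every table (assumed ranking form)."""
    per_metric = {
        name: normalize({table: m[name] for table, m in metrics.items()})
        for name in weights
    }
    return {
        table: sum(weights[name] * per_metric[name][table] for name in weights)
        for table in metrics
    }

# Hypothetical weights: referential degree counts more than table size.
scores = ranking(metrics, {"TS": 0.4, "RD": 0.6})
ranked_tables = sorted(scores, key=scores.get, reverse=True)
```

Changing the weights shifts which tables come out on top, which is the point of keeping several rankings side by side.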
RDB Visualization
• Database schema visualization.
• A standard ER diagram is insufficient for a large RDB model.
SchemaBall
• Visualization of large or complex RDB schemas.
• Using RDB metrics and rankings.
• We implemented and enhanced this solution.
SchemaBall
Visualization of RDB schema graph
• Vertex and edge weighted graph based on RDB metrics.
• Using Gephi for visualization
– automatically generated layout
– interactive visualization (selection and examination of nodes and
edges)
– using graph algorithms
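One simple way to hand a weighted schema graph to Gephi is a CSV edge list. The sketch below assumes Gephi's spreadsheet import with Source, Target and Weight columns; the table names and weights are hypothetical, and the deck does not say which export format was actually used.

```python
import csv
import io

# Hypothetical weighted edges: (referencing table, referenced table, weight).
# The weight could come from any edge metric; the values here are invented.
edges = [
    ("orders", "customers", 3.0),
    ("order_items", "orders", 5.0),
    ("order_items", "products", 2.0),
]

def edges_to_gephi_csv(edges) -> str:
    """Serialize edges as a CSV edge list with Source, Target and Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for source, target, weight in edges:
        writer.writerow([source, target, weight])
    return buffer.getvalue()

csv_text = edges_to_gephi_csv(edges)
```

Writing `csv_text` to a file gives an edge table that can then be laid out and explored interactively.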
Analysis of RDB Graph
• Three approaches
– graph of RDB model (vertex – table, edges – foreign key relations)
– alternative (vertex – table, edge – foreign key relation for each
tuple)
– graph of tuples (vertex – tuple, edge – foreign key relation between
tuples)
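The three graph constructions can be sketched on a toy example. All table names and row identifiers below are hypothetical, and the second construction reflects one reading of the slide: table vertices, with one edge counted per referencing tuple.

```python
from collections import Counter

# Hypothetical foreign keys: (referencing table, referenced table).
foreign_keys = [("orders", "customers"), ("order_items", "orders")]

# Hypothetical tuple-level references: (table, row id) -> (table, row id).
tuple_refs = [
    (("orders", 1), ("customers", 10)),
    (("orders", 2), ("customers", 10)),
    (("order_items", 7), ("orders", 1)),
]

# Approach 1: vertices are tables, one edge per foreign key.
schema_edges = set(foreign_keys)

# Approach 2 (as read here): vertices are still tables, but every referencing
# tuple contributes one edge, stored as a multiplicity on the table pair.
weighted_edges = Counter((src[0], dst[0]) for src, dst in tuple_refs)

# Approach 3: vertices are tuples, one edge per tuple-level reference.
tuple_edges = set(tuple_refs)
```

The first graph depends only on the schema, while the second and third also depend on the data stored in the tables.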
Analysis of RDB Graph – first approach
[Figure: distribution function of vertex degree; x-axis: vertex degree (1 to 29), y-axis: probability (0.00 to 0.16)]
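A degree distribution like the one plotted can be computed from an edge list as follows. This is a minimal sketch on a hypothetical graph; it treats edges as undirected and only counts vertices that appear in at least one edge, which may differ from how the original charts were produced.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {degree: fraction of vertices with that degree} for undirected edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    histogram = Counter(degree.values())
    total = len(degree)
    return {k: count / total for k, count in sorted(histogram.items())}

# Hypothetical star-like schema: one hub table referenced by four others.
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "hub"), ("d", "e")]
distribution = degree_distribution(edges)
```

In this toy graph most vertices have degree 1 and a single hub has degree 4, the same qualitative shape as a distribution dominated by low-degree tables with a few centers.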
Analysis of RDB Graph – second approach
[Figure: distribution function of vertex degree; x-axis: vertex degree, y-axis: probability (0 to 1)]
Analysis of RDB Graph – third approach
[Figure: distribution function of vertex degree; x-axis: vertex degree, y-axis: count]
Analysis of RDB Graph – Scale-free networks
•
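In the usual definition, assumed here and not necessarily the exact expression used in the talk, a network is called scale-free when its vertex degree distribution follows a power law:

```latex
P(k) \sim k^{-\gamma}
```

Here P(k) is the fraction of vertices with degree k and γ is a constant exponent, typically reported between 2 and 3 for many real networks. Such networks have many low-degree vertices and a few high-degree hubs.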
Visualization of RDB schema network
Analysis of RDB Graph – Conclusion
• The RDB model is scale-free.
• To understand an RDB, you must first understand its centers
(there are only a few of them).
• The very useful metric NR(T) – number of references – was validated
by analysis of the RDB graph.
• We created 2 new metrics based on the three approaches
described earlier.
A Method for Analyzing Large RDB
• Find the components of the schema graph (tables = vertices, FK =
edges).
• Examine each component in order, starting with the largest.
– If a component is a single isolated table, it is very probably an archive;
check this or find another purpose for it.
– Otherwise, visualize it via an ER diagram, SchemaBall, or a graph using table
metrics.
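The component step of this method can be sketched with a plain depth-first search. The schema below is hypothetical, and the "_A" suffix check borrows the archive naming convention from the practical example in this deck; it is not a general rule.

```python
from collections import defaultdict

def components(tables, foreign_keys):
    """Connected components of the undirected schema graph, largest first."""
    neighbours = defaultdict(set)
    for src, dst in foreign_keys:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, result = set(), []
    for start in tables:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            table = stack.pop()
            component.add(table)
            for other in neighbours[table]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        result.append(component)
    return sorted(result, key=len, reverse=True)

# Hypothetical schema with one connected group and two isolated tables.
tables = ["orders", "customers", "order_items", "orders_A", "settings"]
foreign_keys = [("orders", "customers"), ("order_items", "orders")]

parts = components(tables, foreign_keys)
# Single-table components are the archive candidates to check first.
isolated = sorted(next(iter(c)) for c in parts if len(c) == 1)
archive_candidates = [t for t in isolated if t.endswith("_A")]
```

Isolated tables that do not match the naming convention, such as `settings` here, are the ones that need a manual look to find their purpose.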
Practical Example
• Unknown complex RDB
– 332 tables
– 2339 attributes
– 192 foreign keys
– Size 2.4 GB
All tables
Archive Tables
• Each isolated table is an archive table, following the “_A” naming convention
Component A
Component B
RDBAnalyzer
• supports all RDB systems that support JDBC, easily
scalable, online connection
• features
– large online RDB schema visualization
– finding the components of the graph
– schema graph creation, visualization and export (Gephi)
– transformation of an RDB to a tuple graph
– metrics charts, parallel coordinates visualization
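RDBAnalyzer reads schemas over JDBC. As a stand-in that runs without a database server, the sketch below extracts tables and foreign-key edges from SQLite's catalog using Python's standard library. The schema is hypothetical, and this is not RDBAnalyzer's code.

```python
import sqlite3

def schema_graph(connection):
    """Return (tables, foreign-key edges) read from SQLite's catalog."""
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    edges = []
    for table in tables:
        # PRAGMA foreign_key_list returns one row per foreign-key column;
        # index 2 holds the referenced table name. A composite key therefore
        # yields several rows for the same table pair.
        for fk in connection.execute(f'PRAGMA foreign_key_list("{table}")'):
            edges.append((table, fk[2]))
    return tables, edges

# Hypothetical in-memory schema used only to exercise the function.
connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    CREATE TABLE orders_A (id INTEGER PRIMARY KEY);
    """
)
tables, edges = schema_graph(connection)
```

The resulting `tables` and `edges` lists are exactly the vertices and edges of the schema graph, ready for component search or export.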
RDBAnalyzer
RDB to Ontology Mapping
– better understanding and searching for information without
knowledge of RDB model, data mining from RDB
– can be used by web search engines to search in RDBs
– getting information from an RDB by people who do not understand
RDB technology (laymen)
– a method for merging multiple databases (ontology merging)
– interactive searching for information (Protégé)
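The mapping itself is not spelled out on the slide. A common direct-mapping style, assumed here, turns each table into a class, each row into an individual, each plain column into a literal property, and each foreign key into a link between individuals. All data and identifier formats below are hypothetical.

```python
# Hypothetical rows keyed by table; "customer_id" is a foreign key to customers.
rows = {
    "customers": [{"id": 10, "name": "Alice"}],
    "orders": [{"id": 1, "customer_id": 10, "total": 25.0}],
}
# Hypothetical foreign keys: (table, column) -> referenced table.
foreign_keys = {("orders", "customer_id"): "customers"}

def to_triples(rows, foreign_keys, key="id"):
    """Map rows to (subject, predicate, object) triples."""
    triples = []
    for table, records in rows.items():
        for record in records:
            subject = f"{table}/{record[key]}"
            # Each table becomes a class, each row an individual of that class.
            triples.append((subject, "rdf:type", table))
            for column, value in record.items():
                if column == key:
                    continue
                target = foreign_keys.get((table, column))
                if target is not None:
                    # Foreign key: link to the referenced individual.
                    triples.append((subject, f"{table}#{column}", f"{target}/{value}"))
                else:
                    # Plain column: literal-valued property.
                    triples.append((subject, f"{table}#{column}", value))
    return triples

triples = to_triples(rows, foreign_keys)
```

Once the data is in triple form, a user can follow links between individuals without knowing which tables and join columns produced them.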
RDB Schema NORTHWIND (ER-Diagram)
OntoGraf (Protégé)
How to find information in Ontologies
• using query language (SPARQL)
• interactive (e.g. Protégé)
– using OntoGraf combined with text searching
– explore entities and individuals
Disadvantages & Problems of Mapping RDBs to
Ontologies
• Difficult to keep the data up to date (static & dynamic ontology
creation).
• Aggregate queries are very slow.
• Existing tools cannot cope with large RDBs (or large
ontologies).
Conclusion & Scientific Contribution
• Design and creation of a method for orientation in, understanding of,
and finding information in large or unknown relational
databases (RDBAnalyzer supports the mentioned principles).
• Detection of RDB graph characteristics (scale-free network) and
use of this knowledge to create 2 new metrics and validate 1 existing
metric.
• Design and creation of a method for finding information in
ontologies generated from an RDB.
Thank you for your attention!
lubos.takac@gmail.com
