A database management system is an evolving application of computer science that provides a platform for the creation, movement, and use of voluminous data. The field has witnessed a series of developments and technological advancements, from the conventional structured database to the recent buzzword, big data. This paper presents a complete model of the relational database, which remains in wide use because of its well-known ACID properties, namely atomicity, consistency, isolation, and durability. Specifically, the objective of this paper is to highlight the adoption of relational-model approaches by big data techniques. To explain the reasons for this incorporation, the paper qualitatively studies the advancements made over time in the relational data model. First, variations in data storage layout are illustrated based on the needs of the application. Second, fast data retrieval techniques such as indexing, query processing, and concurrency control methods are described. The paper provides vital insights for appraising the efficiency of the structured database in an unstructured environment, particularly when both consistency and scalability become issues in the operation of a hybrid transactional and analytical database management system.
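As a minimal sketch of the ACID behaviour the abstract refers to (not taken from the paper; the table and values are assumptions), a relational transaction either commits all of its statements or none of them:

    import sqlite3

    # Atomicity/durability illustration: both updates commit together or neither does.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
    conn.execute("INSERT INTO account (id, balance) VALUES (1, 100.0), (2, 50.0)")
    conn.commit()

    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
            conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
    except sqlite3.Error:
        pass  # on failure neither balance changes

    print(conn.execute("SELECT id, balance FROM account").fetchall())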
The document discusses advanced database management systems (ADBMS). It provides background on how databases have become essential in modern society and outlines new applications like multimedia databases, geographic information systems, and data warehouses. The document then covers the history of database applications from early hierarchical and network systems to relational databases and object-oriented databases needed for e-commerce. It also discusses how database capabilities have been extended to support new applications involving scientific data, images, videos, data mining, spatial data, and time series data.
2013 NIST Big Data Subgroups Combined Outputs Bob Marcus
This document provides an outline for combining deliverables from the NIST Big Data Working Group into an integrated document. It includes sections on introducing a definition of Big Data, describing reference architectures at a high and detailed level, mapping requirements to the architectures, discussing future directions and security/privacy challenges, and concluding with general advice. Appendices provide terminology glossaries, use case examples, roles/actors, and questions from a customer perspective.
Data is the most valuable entity in today's world, and it has to be managed. The huge volume of data available must be processed for knowledge and predictions. This huge data, in other words big data, comes from various sources such as Facebook, Twitter, and many more. The processing time taken by frameworks such as Spark, MapReduce, and the Hierarchical Distributed Matrix (HHDM) is high; hence, a Hybrid Hierarchically Distributed Data Matrix (HHHDM) is proposed. This framework is used to develop big data applications. In the existing system, programs are only roughly defined by default, and jobs are not described in a way that makes them reusable, which also reduces the ability to optimize the data flow of job sequences and pipelines. To overcome the problems of the existing framework, we introduce the HHHDM method for developing big data processing jobs. The proposed method is a hybrid method that retains the advantages of the Hierarchical Distributed Matrix (HHDM): it is functional and strongly typed for writing composable big data applications. To improve the performance of executing HHHDM jobs, multiple optimizations are applied to the HHHDM method. The experimental results show an improvement in processing time of 65-70 percent compared to the existing technology, Spark.
A Review on Classification of Data Imbalance using BigData – IJMIT JOURNAL
Classification is one of the data mining functions; it assigns items in a collection to target categories in order to provide more accurate predictions and analysis. Classification using supervised learning aims to identify the category of the class under which a new data item falls. With the advancement of technology and the increase in real-time data generated from sources such as the Internet, IoT, and social media, processing has become more demanding and challenging. One such processing challenge is data imbalance. In an imbalanced dataset, the majority classes dominate the minority classes, causing machine learning classifiers to be biased towards the majority classes; most classification algorithms also predict the majority class for all test data. In this paper, the authors analyse data imbalance models using big data and classification algorithms.
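As a hedged illustration (not one of the models surveyed above), a common countermeasure for class imbalance is to weight the minority class more heavily, for example with scikit-learn's class_weight="balanced":

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Synthetic 95/5 imbalanced dataset stands in for real-world data.
    X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))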
Getting Relational Database from Legacy Data: MDRE Approach – Alexander Decker
This document summarizes a research paper that proposes a model-driven reverse engineering (MDRE) approach to extract data from legacy COBOL systems and transform it into a normalized relational database. The approach includes four phases: 1) Extraction of data structure models from COBOL source code, 2) Merging of extracted models, 3) Transformation of the merged model into a domain class diagram and refinement by adding integrity constraints and normal forms, 4) Generation of a relational database schema from the domain class diagram. A CASE tool called CETL is developed to support the extraction and merging phases. The paper argues that this MDRE approach provides benefits over traditional reverse engineering like improved abstraction, inclusion of data integrity rules,
IRJET – Recommendation System based on Graph Database Techniques – IRJET Journal
This document proposes a recommendation system based on graph database techniques. It uses Neo4j to develop a recommendation approach using content-based filtering, collaborative filtering, and hybrid filtering. The system recommends restaurants and meals to customers based on reviews and friend recommendations. It stores data about restaurants, meals, customers and their reviews in a graph database to allow for complex queries and recommendations. The implementation and results of the proposed recommendation system are also discussed.
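A sketch of the kind of query such a system might use (the labels, properties, and ratings are assumptions, not the paper's actual data model): recommend restaurants that a customer's friends reviewed highly, issued through the Neo4j Python driver.

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    query = """
    MATCH (c:Customer {name: $name})-[:FRIEND]->(f:Customer)-[r:REVIEWED]->(rest:Restaurant)
    WHERE r.rating >= 4
    RETURN rest.name AS restaurant, count(f) AS friend_votes
    ORDER BY friend_votes DESC LIMIT 5
    """

    with driver.session() as session:
        for record in session.run(query, name="Alice"):
            print(record["restaurant"], record["friend_votes"])
    driver.close()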
This document discusses integrating big data in the banking sector to speed up analytical processes. It proposes developing a Hadoop data warehouse using ETL tools to integrate heterogeneous banking data from various sources into a single format and location. A multi-dimensional data model using star and snowflake schemas would be designed. Data would be extracted from sources, transformed and loaded into a staging area, then into the data warehouse. This would allow generating reports faster for business analysis, forecasting and developing strategies to enhance the bank's objectives.
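For illustration only (table and column names are invented, not from the document), a star schema pairs one fact table with surrounding dimension tables; a minimal DDL sketch, executed here through sqlite3 for brevity:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, segment TEXT);
    CREATE TABLE dim_branch   (branch_id   INTEGER PRIMARY KEY, city TEXT, region TEXT);
    CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
    CREATE TABLE fact_transaction (
        txn_id      INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        branch_id   INTEGER REFERENCES dim_branch(branch_id),
        date_id     INTEGER REFERENCES dim_date(date_id),
        amount      REAL
    );
    """)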
Analysis and Evaluation of Riak KV Cluster Environment using Basho Bench – StevenChike
This document analyzes and evaluates the performance of the Riak KV NoSQL database cluster using the Basho-bench benchmark tool. Experiments were conducted on a 5-node Riak KV cluster to test throughput and latency under different workloads, data sizes, and operations (read, write, update). The results found that Riak KV can handle large volumes of data and various workloads effectively with good throughput, though latency increased with larger data sizes. Overall, Riak KV is suitable for distributed big data environments where high availability, scalability and fault tolerance are important.
This document discusses perspectives on big data applications for database engineers and IT students. It summarizes key concepts of big data and MongoDB, a popular NoSQL database for managing big data. It then demonstrates practical learning activities using MongoDB, such as installation, terminology, and basic syntax. The document concludes by emphasizing the importance of skills in big data and cloud computing for IT professionals and recommends further research on MongoDB security.
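As a hedged example of the basic MongoDB syntax such a course might demonstrate (database and collection names are assumptions), using the PyMongo driver:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["teaching_demo"]  # database name is an assumption
    db.students.insert_one({"name": "Asha", "course": "ADBMS", "score": 87})
    for doc in db.students.find({"score": {"$gte": 80}}):
        print(doc["name"], doc["score"])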
A Survey on Graph Database Management Techniques for Huge Unstructured Data – IJECEIAES
Data analysis, data management, and big data have played a major role in both social and business contexts over the last decade. Nowadays, the graph database is a hot and trending research topic. A graph database is preferred for dealing with the dynamic and complex relationships in connected data and offers better results. Every data element is represented as a node; for example, on a social media site a person is represented as a node with properties such as name, age, likes, and dislikes, and the nodes are connected through relationships via edges. The use of graph databases is expected to be beneficial for business and social networking sites that generate huge amounts of unstructured data, since such big data requires proper and efficient computational techniques. This paper reviews existing graph data computational techniques and related research work in order to outline future research directions in graph database management.
Query Optimization Techniques in Graph Databases – ijdms
Graph databases (GDB) have recently arisen to overcome the limits of traditional databases for storing and managing data with a graph-like structure. Today, they are a requirement for many applications that manage graph-like data, such as social networks. Most of the techniques applied to optimize queries in graph databases have been used in traditional databases or distributed systems, or are inspired by graph theory. However, their reuse in graph databases should take account of the main characteristics of graph databases, such as dynamic structure, highly interconnected data, and the ability to efficiently access data relationships. In this paper, we survey the query optimization techniques in graph databases. In particular, we focus on the features they have in
The document describes an AI-driven Occupational Skills Generator (AIOSG) that aims to automate the process of creating occupational skills reference documents. The AIOSG utilizes an intelligent web crawler, natural language processing, neural networks, and a blockchain to gather data on occupational skills from various sources, analyze the data, and generate standardized skills reference documents. It is intended to reduce the time and resources required to manually produce these documents while ensuring more comprehensive and up-to-date skills information. The AIOSG system architecture and its use of analytics, artificial intelligence, and blockchain technologies are explained in detail.
An efficient resource sharing technique for multi-tenant databases – IJECEIAES
Multi-tenancy is a key component of the Software as a Service (SaaS) paradigm. Multi-tenant software has gained a lot of attention in academia, research, and the business arena. It provides scalability and economic benefits for both cloud service providers and tenants by sharing the same resources and infrastructure, with isolation of shared databases, network, and computing resources under Service Level Agreement (SLA) compliance. In a multi-tenant scenario, active tenants compete for resources in order to access the database. If one tenant blocks the resources, the performance of all the other tenants may be restricted and fair sharing of the resources may be compromised. The performance of tenants must not be affected by resource-intensive activities and volatile workloads of other tenants. Moreover, the prime goal of providers is to achieve a low cost of operation while satisfying the specific schemas and SLAs of each tenant. Consequently, there is a need to design and develop effective and dynamic resource sharing algorithms that can handle the above issues. This work presents a model referred to as the Multi-Tenant Dynamic Resource Scheduling Model (MTDRSM), embracing a query classification and worker sorting technique that enables efficient and dynamic resource sharing among tenants. The experiments show significant performance improvement over the existing model.
This presentation discusses the following topics:
Concepts
Component Architecture for a DDBMS
Distributed Processing
Parallel DBMS
Advantages of DDBMSs
Disadvantages of DDBMSs
Homogeneous DDBMS
Heterogeneous DDBMS
Open Database Access and Interoperability
Multidatabase System
Functions of a DDBMS
Reference Architecture for DDBMS
Components of a DDBMS
Fragmentation
Transparencies in a DDBMS
Date’s 12 Rules for a DDBMS
Lecture 03 - The Data Warehouse and Design phanleson
The chapter discusses the design process for building a data warehouse, including designing the interface from operational systems and designing the data warehouse itself. It covers topics like beginning with operational data, data and process models, data warehouse data models at different levels, normalization and denormalization, and managing complexity in transformations and integrations between operational and warehouse systems. The goal is to extract, transform, and load relevant data from operational sources into the data warehouse in a way that supports analysis and decision-making.
A New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing – ijdms
Nowadays, cloud structures use simple key-value data processing, which cannot support proximity search effectively because of the lack of effective index structures; moreover, as dimensionality increases, existing tree-like index structures can suffer from the "curse of dimensionality". In this paper, we define a new cloud computing architecture with a global index in which storage nodes are organised over a Distributed Hash Table (DHT). To reduce computation cost, different services are stored on a hyperbolic tree using virtual coordinates, so that greedy routing based on the properties of hyperbolic space improves query storage and retrieval performance. We then implement and evaluate our cloud indexing structure based on a hyperbolic tree with virtual coordinates taken in the hyperbolic plane. Our experimental results, compared with other cloud systems, show that our solution ensures consistency and scalability for the cloud platform.
Comparative Study on Graph-based Information Retrieval: the Case of XML Document – IJAEMSJORNAL
The processing of massive amounts of data has become indispensable especially with the potential proliferation of big data. The volume of information available nowadays makes it difficult for the user to find relevant information in a vast collection of documents. As a result, the exploitation of vast document collections necessitates the implementation of automated technologies that enable appropriate and effective retrieval. In this paper, we will examine the state of the art of IR in XML documents. We will also discuss some works that have used graphs to represent documents in the context of IR. In the same vein, the relationships between the components of a graph are the center of our attention.
Role of Data Cleaning in Data Warehouse – Ramakant Soni
Data cleaning is an essential part of building a data warehouse as it improves data quality by detecting and removing errors and inconsistencies. Data warehouses integrate large amounts of data from various sources, so the probability of dirty data is high. Clean data is vital for decision making based on the data warehouse. The data cleaning process involves data analysis, defining transformation rules, verification of cleaning, applying transformations, and incorporating cleaned data. Tools can help support the different phases of data cleaning from data profiling to specialized cleaning of particular domains.
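A minimal sketch of typical cleaning steps before loading into a warehouse (column names and rules are assumptions, not from the document): drop duplicates, fix types, and normalise values with pandas.

    import pandas as pd

    raw = pd.DataFrame({
        "customer_id": ["1", "2", "2", "3"],
        "amount": ["10.5", "7.0", "7.0", None],
        "country": ["US", "us", "us", "DE"],
    })

    clean = (raw.drop_duplicates()
                .assign(customer_id=lambda d: d["customer_id"].astype(int),
                        amount=lambda d: pd.to_numeric(d["amount"]).fillna(0.0),
                        country=lambda d: d["country"].str.upper()))
    print(clean)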
The key steps in developing a data warehouse can be summarized as:
1. Project initiation and requirements analysis
2. Design of the architecture, databases, and applications
3. Construction by selecting tools, developing data feeds, and building reports
4. Deployment including release and training
5. Ongoing maintenance
https://www.learntek.org/blog/types-of-databases/
Learntek is a global online training provider offering courses on Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing, and other IT and management topics.
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S... – dbpublications
This document summarizes a research paper about Big File Cloud (BFC), a high-performance distributed big-file cloud storage system based on a key-value store. BFC addresses challenges in designing an efficient storage engine for cloud systems requiring support for big files, lightweight metadata, low latency, parallel I/O, deduplication, distribution, and scalability. It proposes a lightweight metadata design with fixed-size metadata regardless of file size. It also details BFC's architecture, logical data layout using file chunks, metadata and data storage, distribution and replication, and uploading/deduplication algorithms. The results can be used to build scalable distributed data cloud storage supporting files up to terabytes in size.
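A sketch of the general chunk-and-deduplicate idea under stated assumptions (this is not BFC's actual algorithm, and the chunk size is invented): split a file into fixed-size chunks and store each chunk once, keyed by its content hash.

    import hashlib

    CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, an assumed chunk size

    def chunk_and_dedup(path, store):
        """Return ordered chunk hashes for `path`, storing unseen chunks in `store`."""
        manifest = []
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                digest = hashlib.sha256(chunk).hexdigest()
                if digest not in store:  # only new content needs to be uploaded
                    store[digest] = chunk
                manifest.append(digest)
        return manifest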
A Case Study of Innovation of an Information Communication System and Upgrade... – gerogepatton
In this paper, a case study is analysed. The case study concerns an upgrade of an industrial communication system developed by following the Frascati research guidelines. The knowledge base (KB) of the company is gained by means of different tools that provide data and information of different formats and structures to a unique bus system connected to a big data store. The initial part of the research focuses on the implementation of strategic tools able to upgrade the KB. The second part of the study concerns the implementation of innovative algorithms based on a KNIME (Konstanz Information Miner) Gradient Boosted Trees workflow that processes communication-system data travelling over an Enterprise Service Bus (ESB) infrastructure. The goal of the paper is to prove that all the new KB collected in a Cassandra big data system can be processed through the ESB by predictive algorithms, resolving possible conflicts between hardware and software. The conflicts are due to the integration of different database technologies and data structures. In order to check the outputs of the Gradient Boosted Trees algorithm, an experimental dataset suitable for machine learning testing was used. The test was performed on a prototype network system modelling part of the whole communication system. The paper shows how to validate industrial research by following the complete design and development of a whole communication-system network, improving business intelligence (BI).
IRJET – Comparative Study of ETL and E-LT in Data Warehousing – IRJET Journal
This document compares the Extract, Transform, Load (ETL) approach and the Extract, Load, Transform (ELT) approach for loading data into a data warehouse. It discusses how ETL works by extracting data from various sources, transforming it using business rules, and loading it into the data warehouse. ELT instead extracts and loads the raw data first before transforming it. The document reviews past research on both approaches and discusses their advantages and disadvantages. It aims to evaluate the performance differences between ETL and ELT.
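A toy ETL sketch under assumptions (source file, columns, and business rule are invented, not from the paper): extract rows from a CSV, transform them, then load the result into a warehouse table.

    import sqlite3
    import pandas as pd

    df = pd.read_csv("sales.csv")                    # extract (assumed source file)
    df["amount_usd"] = df["amount"] * df["fx_rate"]  # transform (assumed business rule)

    warehouse = sqlite3.connect("warehouse.db")      # load
    df[["order_id", "amount_usd"]].to_sql("fact_sales", warehouse,
                                          if_exists="append", index=False)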
A Survey on Heterogeneous Data Exchange using Xml – IRJET Journal
This document summarizes a research paper on heterogeneous data exchange using XML. It discusses how XML has become a standard for data transmission due to its flexibility, extensibility and ability to represent heterogeneous data. The document then reviews related work on XML data exchange and mapping between relational and XML models. It also describes the process of exporting data from a source database to XML, importing XML data by validating, transforming and storing it in the target database, and transmitting data between different servers.
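A hedged example of the export step described above (the schema is invented for illustration): turning relational rows into XML with the standard library.

    import sqlite3
    import xml.etree.ElementTree as ET

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Asha"), (2, "Omar")])

    root = ET.Element("customers")
    for row_id, name in conn.execute("SELECT id, name FROM customer"):
        cust = ET.SubElement(root, "customer", id=str(row_id))
        ET.SubElement(cust, "name").text = name

    print(ET.tostring(root, encoding="unicode"))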
DATACENTRE TOTAL COST OF OWNERSHIP (TCO) MODELS: A SURVEY – IJCSEA Journal
Datacenter total cost of ownership (TCO) tools and spreadsheets can be used to estimate the capital and operational costs of running datacenters. These tools help business owners evaluate and improve the costs and underlying efficiency of such facilities, or evaluate the costs of alternatives such as off-site computing. A good understanding of the cost drivers in TCO models gives business owners more opportunities to control costs. In addition, these models introduce an analytical structure in which anecdotal information can be cross-checked for consistency against other well-known parameters driving datacenter costs. This work compares a number of publicly available tools and spreadsheets for calculating datacenter TCO. The comparison covers several aspects, such as which parameters are and are not included in each tool and whether the tools are documented. Such an approach provides a solid basis for designing better tools and spreadsheets in the future.
Big Data Processing with Hadoop: A Review – IRJET Journal
1. This document provides an overview of big data processing with Hadoop. It defines big data and describes the challenges of volume, velocity, variety and variability.
2. Traditional data processing approaches are inadequate for big data due to its scale. Hadoop provides a distributed file system called HDFS and a MapReduce framework to address this.
3. HDFS uses a master-slave architecture with a NameNode and DataNodes to store and retrieve file blocks. MapReduce allows distributed processing of large datasets across clusters through mapping and reducing functions.
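A word-count sketch in the MapReduce style (illustration only, not from the review). With Hadoop Streaming the mapper and reducer would be separate scripts reading stdin; here both phases are simulated in plain Python to show the map and reduce roles.

    from collections import defaultdict

    def mapper(line):
        # Map phase: emit a (word, 1) pair for every word in the line.
        return [(word.lower(), 1) for word in line.split()]

    def reducer(pairs):
        # Reduce phase: sum the counts for each word (shuffle/sort groups keys first).
        counts = defaultdict(int)
        for word, one in pairs:
            counts[word] += one
        return dict(counts)

    lines = ["big data needs hadoop", "hadoop splits big files"]
    pairs = [kv for line in lines for kv in mapper(line)]
    print(reducer(pairs))  # e.g. {'big': 2, 'data': 1, 'needs': 1, 'hadoop': 2, ...}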
Big data service architecture: a survey – ssuser0191d4
This document discusses big data service architecture. It begins with an introduction to big data services and their economic benefits. It then describes the key components of big data service architecture, including data collection and storage, data processing, and applications. For data collection and storage, it covers Extract-Transform-Load tools, distributed file systems, and NoSQL databases. For data processing, it discusses batch, stream, and hybrid processing frameworks like MapReduce, Storm, and Spark. It concludes by noting big data applications in various fields and cloud computing services for big data.
Big data - what, why, where, when and how – bobosenthil
The document discusses big data, including what it is, its characteristics, and architectural frameworks for managing it. Big data is defined as data that exceeds the processing capacity of conventional database systems due to its large size, speed of creation, and unstructured nature. The architecture for managing big data is demonstrated through Hadoop technology, which uses a MapReduce framework and open source ecosystem to process data across multiple nodes in parallel.
Study on potential capabilities of a noDB system – ijitjournal
There is a need for an optimal data-to-query processing technique to handle the increasing size, complexity, and diversity of database use. With the introduction of commercial websites and social networks, the expectation is that highly scalable, more flexible databases will replace the RDBMS. Complex applications and big tables require highly optimized queries, and users face increasing bottlenecks in their data analysis. A growing part of the database community recognises the need for significant and fundamental changes to database design. A new philosophy for building database systems, called noDB, aims at minimizing the data-to-query time, most prominently by removing the need to load data before launching queries: queries are processed without any data preparation or loading step. There may be no need to store data at all; users can pipe raw data from websites, databases, and Excel sheets directly into queries without storing anything. This study is based on PostgreSQL systems. A series of baseline experiments is executed to evaluate the performance of such a system in terms of (a) data loading cost, (b) query processing time, (c) avoidance of collisions and deadlocks, (d) support for big data storage, and (e) query optimization. The study found significant potential capabilities of the noDB system over the traditional database management system.
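As an illustration of the in-situ querying idea (not the paper's PostgreSQL-based system; the file name is an assumption), a raw CSV can be queried directly with no explicit load step, here via DuckDB:

    import duckdb

    # 'events.csv' is an assumed file; the query runs straight against the raw file.
    result = duckdb.query("SELECT user_id, count(*) AS n FROM 'events.csv' GROUP BY user_id")
    print(result.df())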
This document discusses modern data pipelines and compares Extract-Transform-Load (ETL) and Extract-Load-Transform (ELT) approaches. ETL extracts data from sources, transforms it, and loads it into a data warehouse. ELT extracts and loads raw data directly into a data warehouse or data lake and then transforms it. The document argues that ELT is better suited for modern data as it handles both structured and unstructured data, supports cloud-based data warehouses/lakes, and is more efficient and cost-effective than ETL. Key advantages of ELT over ETL include lower costs, faster data loading, and better support for large, heterogeneous datasets.
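A minimal ELT sketch under assumptions (file, table, and column names are invented): the raw extract is loaded first, and the transformation happens afterwards inside the warehouse with SQL, in contrast to an ETL flow where transformation precedes loading.

    import sqlite3
    import pandas as pd

    warehouse = sqlite3.connect("warehouse.db")
    # Load the raw extract as-is.
    pd.read_csv("sales.csv").to_sql("raw_sales", warehouse, if_exists="replace", index=False)

    # Transform inside the warehouse.
    warehouse.executescript("""
    CREATE TABLE IF NOT EXISTS fact_sales AS
    SELECT order_id, amount * fx_rate AS amount_usd
    FROM raw_sales;
    """)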
This document provides an overview of big data storage technologies and their role in the big data value chain. It identifies key insights about data storage, including that scalable storage technologies have enabled virtually unbounded data storage and advanced analytics across sectors. However, lack of standards and challenges in distributing graph-based data limit interoperability and scalability. The document also notes the social and economic impacts of big data storage in enabling a data-driven society and transforming sectors like health and media through consolidated data analysis.
The aim of this paper is to evaluate, through indexing techniques, the performance of Neo4j and OrientDB, both graph database technologies, and to identify the strengths and weaknesses of each technology as a candidate storage mechanism for a graph structure. An index is a data structure that makes searching for a specific node faster in a graph database. The underlying data structure is usually a B-tree, although it can be a hash table or some other logical structure. The main point of having an index is to speed up search queries, primarily by reducing the number of nodes in a graph or table that must be examined. Graphs and graph databases are commonly associated with social networking or "graph search" style recommendations; these technologies are a core platform for Internet giants such as Hi5, Facebook, Google, Badoo, Twitter, and LinkedIn. The key to understanding graph database systems, in the social networking context, is that they give equal prominence to storing both the data (users, favourites) and the relationships between them (who liked what, who follows whom, which post was liked the most, what is the shortest path to reach someone). In a suitable application case study, a Twitter-like social network of almost 5,000 nodes was imported into local servers (Neo4j and OrientDB) and queried to retrieve a node with the searched data, first without an index (full scan) and then with an index, in order to compare the response time (statement query time) of the two graph databases and find out which of them performs better (in speed of data or information retrieval) and in which cases. The main results are presented in Section 6.
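A sketch of the with/without-index comparison described above, on the Neo4j side only (label, property, and handle values are assumptions, not the paper's dataset or exact queries):

    import time
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    def timed_lookup(session, handle):
        start = time.perf_counter()
        session.run("MATCH (u:User {handle: $h}) RETURN u", h=handle).consume()
        return time.perf_counter() - start

    with driver.session() as session:
        t_scan = timed_lookup(session, "user_4711")   # full label scan, no index yet
        session.run("CREATE INDEX user_handle IF NOT EXISTS FOR (u:User) ON (u.handle)")
        session.run("CALL db.awaitIndexes()")          # wait until the index is online
        t_index = timed_lookup(session, "user_4711")  # same query, index available
        print(f"full scan: {t_scan:.4f}s, with index: {t_index:.4f}s")
    driver.close()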
Bridging the gap between the semantic web and big data: answering SPARQL que... – IJECEIAES
Nowadays, the database field has become much more diverse, and as a result a variety of non-relational (NoSQL) databases have been created, including JSON-document databases and key-value stores, as well as extensible markup language (XML) and graph databases. With the emergence of a new generation of data services, some of the problems associated with big data have been resolved. In addition, in the haste to address the challenges of big data, NoSQL abandoned several core database features that make systems extremely efficient and functional, for instance the global view, which enables users to access data regardless of how it is logically structured or physically stored in its sources. In this article, we propose a method that allows us to query non-relational databases based on the ontology-based data access (OBDA) framework by delegating SPARQL protocol and RDF query language (SPARQL) queries from the ontology to the NoSQL database. We applied the method to a popular database called Couchbase and we discuss the results obtained.
This paper presents FACADE, a compiler framework that can generate highly efficient data manipulation code for big data applications written in managed languages like Java. FACADE transforms applications by separating data storage from data manipulation. Data is stored in native memory rather than Java heap objects, bounding the number of heap objects. This significantly reduces memory management overhead and improves scalability. The compiler locally transforms methods to insert data conversion functions. Experiments show the generated code runs faster, uses less memory, and scales to larger datasets than the original code for several real-world big data applications and frameworks.
Big Data Storage System Based on a Distributed Hash Tables System – ijdms
Big data is unavoidable, given that digital media are the predominant form of communication in consumers' daily lives. Controlling what is at stake, and the quality of the data, must be a priority so as not to distort the strategies derived from its processing for profit. To achieve this, a great deal of research has been carried out by companies and several platforms have been created. MapReduce, one of the enabling technologies, has proven applicable to a wide range of fields. However, despite its importance, recent work has shown its limitations, and Distributed Hash Tables (DHTs) have been used to remedy them. Thus, this document not only analyses MapReduce implementations and Top-Level Domains (TLDs) in general, but also describes a DHT model and gives some guidelines for planning future research.
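A minimal consistent-hashing sketch illustrating the DHT idea in general (not the paper's design): keys are mapped onto a hash ring and assigned to the first node clockwise from the key's position.

    import bisect
    import hashlib

    def h(value: str) -> int:
        # Hash a string onto the ring's integer space.
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, nodes):
            self.ring = sorted((h(n), n) for n in nodes)
            self.points = [p for p, _ in self.ring]

        def node_for(self, key: str) -> str:
            # First node clockwise from the key's hash, wrapping around the ring.
            idx = bisect.bisect(self.points, h(key)) % len(self.ring)
            return self.ring[idx][1]

    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("user:42"), ring.node_for("user:43"))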
IRJET – Deduplication Detection for Similarity in Document Analysis Via Vector... – IRJET Journal
The document discusses techniques for detecting similarity and deduplication in document analysis using vector analysis. It proposes analyzing documents by extracting abstract content, separating words and combining them in a word cloud to determine frequency. This approach aims to identify whether documents are duplicates by analyzing word vectors at the word, sentence and paragraph level while also applying techniques like stemming, stopping words and semantic similarity.
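As a hedged illustration of the underlying idea (not the paper's exact pipeline), document similarity can be measured with TF-IDF word vectors and cosine similarity to flag likely duplicates:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "Big data frameworks distribute processing across clusters.",
        "Processing of big data is distributed across clusters by frameworks.",
        "Graph databases store highly connected data as nodes and edges.",
    ]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sim = cosine_similarity(tfidf)
    print(sim.round(2))  # high off-diagonal values suggest possible duplicates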
This document provides a review of Hadoop storage and clustering algorithms. It begins with an introduction to big data and the challenges of storing and processing large, diverse datasets. It then discusses related technologies like cloud computing and Hadoop, including the Hadoop Distributed File System (HDFS) and MapReduce processing model. The document analyzes and compares various clustering techniques like K-means, fuzzy C-means, hierarchical clustering, and Self-Organizing Maps based on parameters such as number of clusters, size of clusters, dataset type, and noise.
AtomicDB is a proprietary software technology that uses an n-dimensional associative memory system instead of a traditional table-based database. This allows information to be stored and related in a way analogous to human memory. The technology does not require extensive programming and can rapidly build and modify information systems to meet evolving needs. It provides significant cost and performance advantages over traditional databases for managing complex, relational data.
Key aspects of big data storage and its architecture (Rahul Chaturvedi)
This paper surveys the tools and technologies of a classic big data setting. Readers, especially enterprise architects, will find it helpful when choosing among big data database technologies in a Hadoop architecture.
Re-Engineering Databases using Meta-Programming Technology (Gihan Wikramanayake)
G. N. Wikramanayake (1997) "Re-engineering Databases using Meta-Programming Technology". In: 16th National Information Technology Conference on Information Technology for Better Quality of Life, edited by R. Ganepola et al., pp. 1-14. Computer Society of Sri Lanka, Colombo: CSSL, Jul 11-13. ISBN 955-9155-05-9.
A REVIEW ON CLASSIFICATION OF DATA IMBALANCE USING BIGDATA (IJMIT Journal)
Classification is one of the data mining functions that assigns items in a collection to target categories in order to provide more accurate predictions and analysis. Classification using supervised learning aims to identify the category into which a new data item will fall. With the advancement of technology and the increase in real-time data generated from sources such as the Internet, IoT and social media, processing has become more demanding and challenging. One such processing challenge is data imbalance. In an imbalanced dataset, majority classes dominate minority classes, causing machine learning classifiers to be biased towards the majority classes, and most classification algorithms then predict all test data as belonging to the majority classes. In this paper, the authors analyse data imbalance models using big data and classification algorithms.
Here are the key points about the application and utility of database management systems based on the article:
- Database management systems allow for efficient storage, organization and retrieval of large amounts of data. They help businesses and organizations manage their data in a centralized and structured manner.
- Teaching accounting information systems (AIS) courses effectively requires hands-on experience with database software like Microsoft Access. Simply lecturing from textbooks is not sufficient in today's environment.
- Incorporating database software into the AIS curriculum gives students practical experience building and working with databases. This helps demonstrate real-world applications of concepts like database design, queries, forms and reports.
- Hands-on learning with databases helps reinforce topics covered in AIS coursework.
Data Warehouses store integrated and consistent data in a subject-oriented data repository dedicated
especially to support business intelligence processes. However, keeping these repositories updated usually
involves complex and time-consuming processes, commonly denominated as Extract-Transform-Load tasks.
These data intensive tasks normally execute in a limited time window and their computational requirements
tend to grow in time as more data is dealt with. Therefore, we believe that a grid environment could suit
rather well as support for the backbone of the technical infrastructure with the clear financial advantage of
using already acquired desktop computers normally present in the organization. This article proposes a
different approach to deal with the distribution of ETL processes in a grid environment, taking into account
not only the processing performance of its nodes but also the existing bandwidth to estimate the grid
availability in the near future and thereby optimize workflow distribution.
Growth of relational model: Interdependence and complementary to big data
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 11, No. 2, April 2021, pp. 1780-1795
ISSN: 2088-8708, DOI: 10.11591/ijece.v11i2.pp1780-1795
Journal homepage: http://ijece.iaescore.com
Growth of relational model: Interdependence and complementary to big data
Sucharitha Shetty (1), B. Dinesh Rao (2), Srikanth Prabhu (3)
(1, 3) Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, India
(2) Manipal School of Information Science, Manipal Academy of Higher Education, India
Article Info
Article history: Received Mar 20, 2020; Revised Jul 27, 2020; Accepted Nov 6, 2020
ABSTRACT
A database management system is a constant application of science that provides a platform for the creation, movement, and use of voluminous data. The area has witnessed a series of developments and technological advancements from its conventional structured database to the recent buzzword, bigdata. This paper aims to provide a complete model of a relational database that is still widely used because of its well-known ACID properties, namely atomicity, consistency, isolation and durability. Specifically, the objective of this paper is to highlight the adoption of relational model approaches by bigdata techniques. Towards addressing the reason for this incorporation, this paper qualitatively studied the advancements made over time on the relational data model. First, the variations in the data storage layout are illustrated based on the needs of the application. Second, quick data retrieval techniques like indexing, query processing and concurrency control methods are revealed. The paper provides vital insights to appraise the efficiency of the structured database in the unstructured environment, particularly when both consistency and scalability become an issue in the working of the hybrid transactional and analytical database management system.
Keywords: Big data; Hybrid transaction/analytical processing; Online analytical processing; Online transactional processing
This is an open access article under the CC BY-SA license.
Corresponding Author:
Sucharitha Shetty
Department of Computer Science and Engineering
Manipal Institute of Technology
Manipal Academy of Higher Education, Manipal, India
Email: sucha.shetty@manipal.edu
1. INTRODUCTION
Internet computing and smartphones have made every layperson, academician, industrialist, and developer a part of the database management system. In light of this awareness, there is a pressing demand to provide the best knowledge to these customers in the least time. Many different types of databases are in use today. However, for many years, the relational database, in its various models captured in the form of row-store, column-store and hybrid store, has been around. The traditional single-machine database model can no longer meet the growing demands. As the amount of information increases, it must be stored on disk or on a remote server. Therefore, the queries used to populate a visualization may take minutes, hours or longer to return, causing long waiting times between the interactions of every user and diminishing the user's ability to explore the information quickly [1].
Today, the trend is moving away from a “one-size-fits-all” structure [2]. During the early 1990s,
companies wanted to collect data from multiple operational databases into a business intelligence data
2. Int J Elec & Comp Eng ISSN: 2088-8708
Growth of relational model: interdependence and complementary to big data (Sucharitha Shetty)
1781
warehouse. This information had to be processed to deliver the best results in the shortest response time, resulting
in the distinction between online analytical processing (OLAP) and online transactional processing (OLTP).
OLAP systems provided answers to multi-dimensional queries that contribute to decision-making, planning and
problem-solving. On the other hand, OLTP mainly concentrates on the CRUD operations (create, read, update
and delete). Various column-oriented OLAP frameworks, for example BLU [3], Vertica [4] and Vectorwise, as well as many in-memory OLTP frameworks, including VoltDB [5], Hekaton [6] and many others, have risen in the database sector. Hardware developments have been the biggest push for new database designs. Over the years, disk capacity has increased in line with Moore's law [7]. The second generation of OLAP and OLTP systems makes better use of multi-core processors and large memories.
All of these innovations have contributed to the rise of a new database paradigm known as big data, which does not provide all of the relational model's operations but does scale to big data sizes. NoSQL or key-value stores, such as Voldemort [8], Cassandra [9] and RocksDB [10], offer fast write throughput, lookups on the primary key [11], and very high scale-out, but are limited in their query capabilities and provide loose transactional guarantees. SQL-on-Hadoop solutions [12] (e.g., Hive, Impala, Big SQL [13], and Spark SQL [14]) use OLAP model templates and column formats to run OLAP queries over large static data volumes. These frameworks are not designed as real-time operational databases, although they are appropriate for batch processing.
The amalgamation of simultaneous transactional and analytical workloads is a subject now examined by database vendors and industry analysts. It is known as hybrid transactional and analytical processing (HTAP). Such a system must answer everything from simple transactional queries to complex analytical queries. Applications of HTAP include banks and trading firms, which have transactions that raise the volume of data almost every second and also need to evaluate their funds at the same time. In industries such as telecom services and energy, companies need to analyze the past and present states of their networks. HTAP allows decision-makers to become more conscious of the past and present state of the company and where it is going. Not every company benefits from this system, however, since the delay may lie in the business process itself rather than in the software. HTAP tries to address a few major drawbacks of traditional approaches [15]:
a. Architectural and technical complexity: Earlier, data had to go through the extraction, transformation, and loading (ETL) phase, i.e. the data had to be extracted from the operational database and then transformed to be loaded into the analytical database.
b. Analytical latency: Previously it would take a long time to turn live data into analytical data. This may be acceptable for certain applications like month-end consolidation, but it is not feasible for real-time analyses.
c. Synchronization: If the analytical and transactional databases are segregated and the researcher attempts to drill down to the origin at some point during the analysis, then there is a risk of "out-of-synch" error messages.
d. Data duplication: To keep information consistent throughout, the conventional methodology maintained multiple copies of the same information that had to be controlled and managed.
e. Reader-writer problem: Conventional DBMSs had trouble executing OLAP and OLTP together, as readers would block writers and writers would block readers [16]; a minimal locking sketch follows this list.
As a result, the demand for mixed workloads led to several hybrid data architectures, as demonstrated in the "lambda" architecture, where multiple solutions were grouped together, but this increased complexity, time consumption, and cost. The lambda architecture is divided into three layers: a) the batch layer for handling and pre-processing an append-only collection of raw data; b) the serving layer, which indexes batch views so that low-latency ad-hoc queries can be accommodated; and c) the speed layer, which manages all low-latency requirements using quick and incremental algorithms over recent data only [17, 18].
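As a rough illustration of these three layers, the hypothetical Python sketch below keeps an append-only master dataset (batch layer), periodically rebuilds an indexed batch view (serving layer), and answers queries by merging that view with incremental counts over recent events (speed layer). The names and the merge rule are illustrative assumptions, not the design of any specific product.

```python
from collections import Counter, defaultdict

master_dataset = []                 # batch layer: immutable, append-only raw events
batch_view = {}                     # serving layer: indexed view rebuilt from the master dataset
realtime_view = defaultdict(int)    # speed layer: incremental counts over recent events only

def ingest(event):
    """New events go to the master dataset and to the speed layer."""
    master_dataset.append(event)
    realtime_view[event["user"]] += 1

def recompute_batch_view():
    """Batch job: recompute the full view, then reset the speed layer."""
    global batch_view
    batch_view = dict(Counter(e["user"] for e in master_dataset))
    realtime_view.clear()

def query(user):
    """Serve a query by merging the batch view with the real-time deltas."""
    return batch_view.get(user, 0) + realtime_view[user]

for u in ["alice", "bob", "alice"]:
    ingest({"user": u})
recompute_batch_view()
ingest({"user": "alice"})           # arrives after the last batch run
print(query("alice"))               # 3: two from the batch view plus one recent event
```

The merge in query() is what hides the latency of the batch recomputation from the user, at the cost of maintaining two code paths over the same data.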
The study covers the following objectives:
a. The paper illustrates the development and innovation carried out toward the refinement of the relational
model in terms of data storage layout and efficient data retrieval methods.
b. It acknowledges the complexities involved in the simultaneous usage of both transactional and analytical
processing in a relational model with the introduction of the current approach known as hybrid
transactional and analytical processing.
c. The paper incorporates an understanding of big data models adopting a structured database.
2. DATABASE
The database is the fundamental heart of large shared data banks, whose users are shielded from knowing how the information is organized inside the system. While structuring a database, three fundamental conditions should be taken care of: ordering dependency (whether the data should be stored in an ordered way or not, taking into account insertion and deletion as a bottleneck), indexing dependency
(the selection of attributes for indexes should be such that they can be built and removed from time to time) and access path dependency (the path chosen to fetch the data, either sequentially or by indices) [19].
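The access path dependency can be illustrated with a toy sketch (hypothetical, with made-up data): the same logical lookup can be answered either by a sequential scan or through a separately built index, and either physical choice returns the same answer.

```python
# A tiny relation stored as a list of tuples (id, name, city).
employees = [(1, "Asha", "Manipal"), (2, "Ravi", "Udupi"), (3, "Mary", "Manipal")]

def scan_by_city(city):
    """Access path 1: sequential scan; no ordering or index assumed."""
    return [t for t in employees if t[2] == city]

# Access path 2: a secondary index on city, built (and droppable) independently of the data.
city_index = {}
for pos, tup in enumerate(employees):
    city_index.setdefault(tup[2], []).append(pos)

def index_lookup_by_city(city):
    return [employees[pos] for pos in city_index.get(city, [])]

# Both access paths return the same logical answer.
assert scan_by_city("Manipal") == index_lookup_by_city("Manipal")
```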
Among the various data models for database management systems proposed over time, this paper focuses on the relational model, which is the most prominent. Early referencing strategies in programming languages did not distinguish the logical structure of the information from its physical structure. However, the relational model, introduced in the early 1970s, gave a clear division [20]: information is organized as tables and it is up to the database management framework to choose how these tables are handled in memory [21].
Later, as the size of data increased with the advent of the internet, fast retrieval of data became the need of the hour. This led to the division of the single relational model into separate transactional and analytical data models, while maintaining the structured database. On an everyday basis, the mix of insertion, deletion, update and basic queries on data is termed online transactional processing (OLTP). However, a typical issue is that the database can be very large, and query processing techniques frequently scan the whole set repeatedly. So one of the preliminary approaches used was sampling. In contrast to conventional data, however, sampled data is inherently uncertain; for example, it does not represent the entire population. It is therefore beneficial to return not only the results of the query but also confidence intervals that indicate the results' accuracy [22]. Besides, there may be no or too few samples in a certain region of a multidimensional space [23], and further research is needed to produce trustworthy findings. Conventional wisdom says that since (1) numerous samples must be taken to achieve reasonable accuracy, (2) sampling algorithms must perform one disk I/O per tuple inspected, while query evaluation algorithms can amortize the cost of a disk I/O over a page containing many tuples, and (3) in sampling the start-up overhead is incurred n times when n samples are taken, whereas evaluating the full query incurs it only once, the cost of estimating the size of a query by sampling is too high to ever be effective [24]. This eventually made online analytical processing (OLAP) attractive. Further, as the velocity, variety and volume of information have grown, the need for a combination of transactional and analytical databases has become unavoidable for continuous real-time decision making [25]. A platform that can handle a variety of information and execute both transactions and complex queries with low latency is termed hybrid transactional and analytical processing (HTAP). Figure 1 represents a block diagram that summarizes the key points covered in the paper.
Figure 1. Block diagram of HTAP
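The argument about returning confidence intervals with sampled answers can be made concrete with a short sketch. It assumes a uniform random sample and a normal approximation for the estimator; the sample mean is scaled up to estimate a SUM, and a 95% interval conveys the estimate's accuracy.

```python
import math
import random

random.seed(42)
population = [random.randint(1, 100) for _ in range(100_000)]   # the full column we cannot afford to scan

n = 1_000
sample = random.sample(population, n)

mean = sum(sample) / n
var = sum((x - mean) ** 2 for x in sample) / (n - 1)
std_err = math.sqrt(var / n)

est_sum = mean * len(population)                 # scale the sample mean up to an estimated SUM
margin = 1.96 * std_err * len(population)        # 95% interval under a normal approximation

print(f"estimated SUM = {est_sum:.0f} +/- {margin:.0f}")
print(f"exact SUM     = {sum(population)}")
```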
3. DATA STORAGE MODELS
The structure of the database can be seen as the process of representation, index creation, processing, and maintenance. Nelson et al. [26] proposed the evolutionary list file (ELF) structure, which was adopted in the design of the relational model. The ELF structure emphasizes the key elements of file structure as entities, lists,
4. Int J Elec & Comp Eng ISSN: 2088-8708
Growth of relational model: interdependence and complementary to big data (Sucharitha Shetty)
1783
and links. The earliest implementation of the relational approach can be seen in a research project launched in the summer of 1968 at the Massachusetts Institute of Technology known as the MAC advanced information management system (MacAIMS) [27], incorporating reference numbers (now known as primary keys), user access rights, and the B-tree concept. Thereafter, several relational data models were proposed, of which a few are described below.
3.1. N-ary storage model (NSM)
This model is also well-known as a row-store where N stands for the number of columns in the
table. Most such DBMSs followed the Volcano-style processing model, in which one tuple (or block) of information is processed at a time. Row stores stake their claim for workloads that query a large number of attributes from wide tables. Such workloads occur in business (e.g., network performance and management applications) as well as in scientific fields (e.g., neuroscience, chemical, and biological applications). Several important datasets in the medical field consist of more than 7,000 attributes. One solution is the constantly increasing support for large tables; for instance, SQL Server today permits 30K columns per row, but at most 4,096 columns are allowed in a SELECT query. This storage layout shows slower performance for OLAP queries that need only a limited number of attributes, as row-stores generate disk and memory bandwidth overhead [28, 29].
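A minimal sketch of the row-store idea, with a Python list standing in for a page (an assumption for illustration): every tuple is stored contiguously, so an aggregate over a single attribute still touches whole rows.

```python
# NSM / row-store: each page holds complete tuples laid out one after another.
nsm_page = [
    (1, "Asha",  52000, "Manipal"),
    (2, "Ravi",  61000, "Udupi"),
    (3, "Mary",  47000, "Manipal"),
]

# An OLAP-style aggregate over a single attribute still reads every full row.
total_salary = sum(row[2] for row in nsm_page)
print(total_salary)
```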
3.2. Decomposition storage model (DSM)
DSM [30] partitions an n-attribute relation vertically into n sub-relations, which are accessed based on the demand for the corresponding attribute values. In each sub-relation, a surrogate record-id field is required so that records can be matched back together. Sybase-IQ incorporates vertical partitioning for data warehouse applications in conjunction with bitmap indices [31]. DSM needs considerably less I/O than NSM for table scans over just a few attributes. Queries involving multiple attributes of a relation need to spend extra time bringing the participating sub-relations together, and this extra time can be significant. There is also space wastage for the extra record-id field in each of the sub-tables.
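A corresponding sketch of DSM under the same simplifying assumptions: each attribute becomes its own (record-id, value) sub-relation, so single-attribute scans are cheap, but multi-attribute queries must stitch the sub-relations back together on the surrogate record id.

```python
# DSM: one (record_id, value) sub-relation per attribute.
ids      = [(0, 1), (1, 2), (2, 3)]
names    = [(0, "Asha"), (1, "Ravi"), (2, "Mary")]
salaries = [(0, 52000), (1, 61000), (2, 47000)]

# A single-attribute scan touches only one sub-relation.
total_salary = sum(v for _, v in salaries)

# A query needing name and salary must join sub-relations back together on record_id.
name_by_rid = dict(names)
name_salary = [(name_by_rid[rid], v) for rid, v in salaries]
print(total_salary, name_salary)
```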
3.3. Partition attributes across (PAX)
PAX [32] addresses the issue of low cache utilization in NSM. Unlike NSM, PAX groups all values of a specific attribute on a minipage within each page. PAX makes full use of cache resources during sequential access to data, as only the values of the required attribute are loaded into the cache. PAX does not optimize the I/O between disk and memory relative to DSM: like NSM, PAX loads into memory all the pages that belong to the relation for scans, regardless of whether the request needs all or only a few of the attributes.
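A rough sketch of the PAX idea, assuming a dictionary of per-attribute minipages as the page container: the page holds the same tuples as an NSM page, so disk I/O is unchanged, but the values of each attribute sit together, which is what improves cache behaviour for single-attribute scans.

```python
# PAX: the same three tuples as the NSM page, but grouped per attribute inside the page.
pax_page = {
    "id":     [1, 2, 3],                 # minipage for attribute "id"
    "name":   ["Asha", "Ravi", "Mary"],  # minipage for attribute "name"
    "salary": [52000, 61000, 47000],     # minipage for attribute "salary"
}

# Scanning one attribute walks a single contiguous minipage (cache friendly),
# yet the whole page is still brought into memory, as with NSM.
total_salary = sum(pax_page["salary"])

# Reconstructing full tuples simply zips the minipages of one page.
tuples = list(zip(pax_page["id"], pax_page["name"], pax_page["salary"]))
print(total_salary, tuples)
```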
3.4. Multi-resolution block storage model (MBSM)
MBSM [33] addresses both I/O performance and main-memory cache use. It shares the positive cache behavior of PAX, where attributes are stored in physically contiguous segments. Disk blocks are grouped into super-blocks, with a single record stored in a partitioned manner among the blocks, and these blocks are in turn organized into fixed-size mega-blocks on the disk. MBSM loads from disk only the pages containing the values of the relevant attributes. It gives better performance when inserting or updating data and does not need a join to rebuild records. It works faster for sequential scans, slower for random access, and is optimized for uncompressed fixed-width data.
3.5. Column-stores
Column-store systems [34] partition a database vertically into a collection of separately stored individual columns. Here, instead of the primary keys, virtual ids are used as tuple identifiers, as these ids reduce the width of the information stored on disk. The offset location is used as the virtual identifier. This layout is useful for OLAP operations. It offers better cache usage by passing cache-line-sized blocks of tuples among operators and working on several values at a time, instead of using an ordinary tuple-at-a-time iterator. It also provides late materialization, compression and decompression techniques, database cracking and adaptive indexing features. Druid [35] is an open-source real-time analytical data store that is column-oriented in structure. It supports two subsystems: one in the historical nodes for read optimization and another in the real-time nodes for write optimization. However, the real-time subsystem is used to ingest high volumes of data but does not support data updates. The drawbacks include that frequent inserts and updates are slow.
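The sketch below (illustrative only, not Druid's implementation) shows two of the column-store ideas mentioned above: the tuple identifier is simply the offset position within each column (a virtual id), and materialization of full result tuples is delayed until the selection has narrowed the positions of interest.

```python
# Column store: each column is stored separately; a value's offset is its virtual id.
city   = ["Manipal", "Udupi", "Manipal", "Udupi"]
salary = [52000, 61000, 47000, 58000]
name   = ["Asha", "Ravi", "Mary", "John"]

# Selection works on one column and yields virtual ids (positions), not tuples.
matching_ids = [i for i, c in enumerate(city) if c == "Manipal"]

# Late materialization: only now are the other columns touched, and only at those positions.
result = [(name[i], salary[i]) for i in matching_ids]
print(result)          # [('Asha', 52000), ('Mary', 47000)]
```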
3.6. Hybrid stores
If one knows a priori the workload for a given application, then it is possible to use a sophisticated hybrid system that can be precisely calibrated for that workload. Through the analysis of query workloads, a cache-efficient attribute layout can be designed. This is the concept followed in data morphing [36]. This layout can then be used as a template for storing data in the cache. As a consequence, fewer cache misses occur during query processing compared to NSM and DSM under all types of workloads. The drawback of this technique is that, if the query workload changes dynamically, reorganizing data according to the new layout can take time and degrade performance. Along these lines, a few such hybrid systems are described below [37].
a. In the Clotho storage model, separate data layouts are designed for in-memory and non-volatile storage.
Clotho creates in-memory pages individually tailored for compound and dynamically changing workloads,
and enables efficient use of different storage technologies (e.g., disk arrays or MEMS-based storage
devices) [38]. The on-disk layout follows the PAX model, while the in-memory pages contain only the
attributes that queries demand.
b. Fractured mirrors retain the best of the two data models, NSM and DSM. The data is laid out in both NSM
and DSM formats across two disks, but the disks are not identical mirrors [39]: the NSM copy is split into
two equal segments and the DSM copy into two equal segments, and each disk holds one NSM segment
paired with the complementary DSM segment. As a result, even under query skew both disks remain well
utilized. Fractured mirrors also allow better query plans to be formulated that minimize L1 and L2 cache
misses. The drawback of the design is the duplication of storage space.
c. H2O [40] combines adaptive data layouts with multiple storage layouts inside a single engine. During
query processing, the system decides which partitioning model is best for a given query class and its
corresponding data pieces. As the query load changes, the storage and access patterns continuously adapt
to it. Vertical partitioning is performed by creating groups of columns that are frequently accessed together.
d. Dynamic vertical partitioning [41] uses a set of rules to build an active system, DYVEP, that dynamically
partitions a distributed database vertically. Under static vertical partitioning, some attributes may never be
used by queries, which wastes space. DYVEP therefore monitors queries, collects and evaluates relevant
statistics, decides whether a new partitioning is necessary and, if so, fires the vertical partitioning
algorithm. If the new partitioning scheme is better than the existing one, the system reorganizes
accordingly. The architecture of DYVEP has three modules: statistics collector, partitioning processor, and
partitioning reorganizer. Acceptable query response times were observed in experiments on the TPC-H
benchmark database.
Narrow partitions perform better for columns accessed mainly by analytical queries (for example,
through successive scans). Conversely, wider partitions perform better for columns accessed mainly by
OLTP-style queries, because such transactions frequently insert, delete, update, or access large numbers of a
row's fields. Performance improvements of 20 to 400 percent over pure all-column or all-row structures have
been reported, and such hybrid layouts are both more adaptable and deliver better plans for main-memory
systems than earlier vertical partitioning approaches. Database vendors also ship different storage engines to
serve workloads with different characteristics under the same software suite; MySQL, for example, supports
multiple storage engines (e.g., InnoDB, MyISAM), but communication at the storage layer between the
different data formats is difficult and, more specifically, a different execution engine is required for each
storage engine.
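The idea of grouping columns that are frequently accessed together, as in data morphing or H2O, can be illustrated with a simple co-access count over a query workload. The greedy pairing below is a hypothetical sketch of ours, not the algorithm of [36] or [40].

```python
from collections import Counter
from itertools import combinations

# Each query is represented by the set of attributes it touches (assumed workload).
workload = [
    {"id", "name"},
    {"id", "name", "salary"},
    {"salary", "dept"},
    {"id", "name"},
]

# Count how often every pair of attributes appears in the same query.
co_access = Counter()
for q in workload:
    for pair in combinations(sorted(q), 2):
        co_access[pair] += 1

# Greedily merge the most co-accessed pair into one vertical partition.
(best_a, best_b), _ = co_access.most_common(1)[0]
partitions = [{best_a, best_b}]
rest = {a for q in workload for a in q} - {best_a, best_b}
partitions += [{a} for a in sorted(rest)]
print(partitions)   # e.g. [{'id', 'name'}, {'dept'}, {'salary'}]
```

A real system would of course weigh query frequencies and column widths, but the principle of partitioning by co-access is the same.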
4. RELATED WORKS ON DATA RETRIEVAL
This section is divided into three subsections. The first presents various strategies adopted for data
indexing. The second describes common query processing techniques used to speed up data retrieval. The
third describes the concurrency control methods used to prevent transaction conflicts.
4.1. Indexing
Relational model indexing [42] is a boon to the database management system, enabling fast retrieval
of data by bypassing the traversal of every row. An index may cover one or more columns, and sometimes an
index may grow larger than the table it is created for, but the rapid lookup and fast access it provides
compensate for this overhead.
Table 1 provides a glimpse of various indexing strategies followed so far.
Table 1. Various indexing methods
Indexing strategy | Ref. | Properties | Challenges
B-tree | [43] | Manages situations where records are expected to exceed the size of main storage and disk operations are required | Simultaneous updates
Log-structured merge-tree | [44] | Insertion efficiency | Query performance is comparatively worse than that of a B-tree
Bloom filter | [45] | Negligible space | Storage savings come at the cost of false positives; this drawback does not affect information processing if the likelihood of an error is small enough
R-tree | [46] | Manages multidimensional queries effectively | The index consumes more memory space
Min/max (block range) indexes | [47] | Quick scanning of extremely large tables | Each page range's stored min/max values must be updated as tuples are inserted
Inverted index | [48] | Recommended for fast keyword and full-text searches; a query is resolved by looking up the word id in the inverted index | Large storage and maintenance cost for insert, update and delete operations
Function-based indexes | [49] | Built on expressions; works for linguistic and case-insensitive sorts | Can be used only with cost-based optimization
Application domain indexes | [49] | Apply to a specific application domain; used mainly in applications related to spatial data and video clips | Application-specific indexing software has to be incorporated
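Table 1 notes that a Bloom filter trades a small false-positive probability for very little space. A minimal sketch is shown below; the hash construction, bit-array size, and number of hash functions are arbitrary assumptions chosen only for illustration.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)     # one byte per bit, for simplicity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter()
bf.add("row-42")
print(bf.might_contain("row-42"))   # True
print(bf.might_contain("row-99"))   # almost certainly False
```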
4.2. Query processing
A query is a language expression that identifies information to be retrieved from a database. To answer
a query, a database optimizer must choose from several known access paths [50]. The goal of query
optimization is to find a query evaluation plan that minimizes the most significant performance metric, which
may be CPU, I/O, and network time, the time-space product of locked database objects and their lock
durations, total resource consumption, even energy use (for example, on battery-powered laptops), a mix of
the above, or some other performance measure [51]. Table 2 presents several related studies dealing with
query optimization. The internal specifics of implementing these access paths and deriving the associated cost
functions are beyond the scope of this paper.
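Before turning to those studies, the core idea of cost-based access-path selection can be sketched in a few lines. The cost formulas and constants below are our own simplifications (expressed in "page reads"), not the models of [50] or [51]: the optimizer compares a full scan against an unclustered index lookup using cardinality and selectivity estimates and picks the cheaper plan.

```python
# Hypothetical cost model: cost is expressed in page reads.
def full_scan_cost(num_pages):
    return num_pages

def index_lookup_cost(num_rows, selectivity, index_height=3):
    # Traverse the index once, then fetch one page per qualifying row
    # (a pessimistic assumption for an unclustered index).
    return index_height + num_rows * selectivity

def choose_access_path(num_pages, num_rows, selectivity):
    scan = full_scan_cost(num_pages)
    index = index_lookup_cost(num_rows, selectivity)
    return ("index", index) if index < scan else ("full scan", scan)

print(choose_access_path(num_pages=1_000, num_rows=100_000, selectivity=0.001))  # index wins
print(choose_access_path(num_pages=1_000, num_rows=100_000, selectivity=0.10))   # scan wins
```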
Table 2. Studies related to query optimization
Ref. | Objectives | Method
[52] | Proposed a method of dynamically adjusting database performance based on the requests made by database queries | Access plans are generated for each incoming query; for each plan the required memory and its estimated cost are calculated and compared with the estimated cost under the current memory
[53] | Addresses the problem of cleaning stale data from materialized views | A clean sample is used to estimate aggregate query results. The approach is a view-maintenance strategy employed periodically for a given materialized view; during the periodic interval the view is marked stale and an estimated up-to-date view is developed
[54] | Proposes a caching method whereby the compiler is optimized to perform caching | Every cache is assigned a web variable. If a query has no WHERE clause, no free web variable is assigned; if it has one, the condition field becomes the web variable. The cache is refilled according to inserts and updates on the web variable
[55] | Designs a web cache with improved data freshness | A main-memory database caches the result sets of previous queries for subsequent reuse. Cache invalidation works by replacing or deleting obsolete data directly in the caching system. For single-table queries, the parser extracts the selection predicates from the WHERE clause and maintains a list of predicates and their corresponding row sets in the cache
[56] | Extends traditional lazy evaluation | In lazy evaluation, evaluating a statement does not cause it to execute; instead, evaluation produces a single batch of all the queries involved in building the model, deferred until the data from any query in the batch is needed, either because the model needs to be displayed or because the data is needed to construct a new query
[57] | Analyses multiple queries and suggests frequent queries and query components | Query plans are mined to find candidate queries and sub-queries for materialized views, which in turn optimizes all the queries containing these components. Frequent components may represent frequent sub-queries. Importance is given to analysing the query execution plan rather than the query text, since the query is often the same while its textual representation differs
[58] | Dynamically adjusts the materialized view set using clustering | With the help of a similarity threshold, clusters of SQL queries are found; when a new query's similarity is below the threshold, a new cluster is formed
4.3. Concurrency control
An OLTP DBMS supports the part of the system that communicates with end users. End users send
requests for some purpose (e.g., to buy or sell an item) to the application; these requests are processed by the
application and then executed as transactions that read from or write to the database. Concurrency control has
a long and rich history that goes back to the start of database systems. Two-phase locking and timestamp
ordering are the two classes of algorithms used for concurrency control. The first method of ensuring the
proper execution of simultaneous DBMS transactions was two-phase locking (2PL). Under this scheme, a
transaction must obtain a lock on a particular database element before it can read or write that element. 2PL
is considered a pessimistic approach because it assumes that transactions may conflict and therefore must be
locked; if locking is uncontrolled, however, it may lead to deadlock. The other approach is the timestamp
ordering (T/O) concurrency control scheme, where monotonically increasing timestamps are generated to
serialize transactions. These timestamps are used to resolve conflicting operations, such as a read and a write
on the same element, or two writes on the same element [16].
Multi-core systems are now ubiquitous [59]; conventional DBMSs, however, still scale poorly
beyond a few cores. The main weakness of OLTP databases is their concurrency control algorithm. Previous
work has shown that concurrency control algorithms suffer from both fundamental and artificial scalability
bottlenecks [60, 61]. With a large number of cores, threads spend a significant amount of time waiting to
obtain a global lock, a fundamental bottleneck inherent in the way these algorithms work. The two major
scalability bottlenecks are: (1) conflict detection and (2) timestamp allocation. In the case of conflict detection,
detecting and resolving deadlocks is a burdensome operation that consumes most of a transaction's processing
time as the number of cores increases, even at relatively low contention levels. Although recent work removes
some of the artificial weaknesses, the fundamental bottlenecks remain.
In conventional databases, the function of concurrency control mechanisms is to decide the
interleaving order of operations between simultaneous transactions over shared data. But there is no
fundamental reason to interleave concurrency control logic with the actual execution, nor is it necessary to
force the same thread to be responsible for both executing the transaction and running the concurrency control
logic. This significant insight emerged from subsequent studies [62] and could lead to a complete paradigm
shift in how we think about transactions [63, 64]. The two tasks of ordering accesses to shared data and
implementing the transaction logic are completely independent; they can therefore technically be carried out
by separate threads in different phases of execution. For instance, Ren et al. [65] propose ORTHRUS, which
is based on pessimistic concurrency control and in which transaction-executing threads delegate locking
functionality to dedicated lock manager threads. ORTHRUS relies on explicit message passing to
communicate among threads, which can introduce unnecessary overhead to transaction execution despite the
shared memory available on a single machine.
Multi-version concurrency control (MVCC) is a commonly used method for concurrency control, as
it enables execution modes in which readers never block writers. MVCC methods also have a long history.
Rather than updating data objects in place, each update creates a new version of the object, so that concurrent
readers can still observe the old version while the update transaction proceeds. As a result, read-only
transactions never need to wait and do not need to use locking, an extremely desirable property and the reason
why numerous DBMSs implement MVCC [66]. The optimistic approach to concurrency control came from
Kung and Robinson, but only single-version databases were considered [67]. Optimistic techniques [68] are
used in applications where transaction conflicts are very rare: transactions are allowed to proceed as desired
until a write phase is reached, thereby increasing the degree of concurrency.
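A rough sketch of the multi-versioning idea follows: every write appends a new version stamped with the writer's commit timestamp, and a reader simply picks the newest version no later than its own snapshot timestamp, so it never blocks. The names and the timestamp scheme are illustrative assumptions, not any particular system's implementation.

```python
class MvccStore:
    """Toy MVCC store: a version chain per key, visibility by timestamp."""
    def __init__(self):
        self.versions = {}                    # key -> list of (commit_ts, value)
        self.clock = 0

    def next_ts(self):
        self.clock += 1
        return self.clock

    def write(self, key, value, commit_ts):
        # Updates never overwrite: they append a new version.
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, snapshot_ts):
        # A reader sees the newest version committed at or before its snapshot.
        visible = [(ts, v) for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return max(visible)[1] if visible else None

db = MvccStore()
db.write("x", "v1", commit_ts=db.next_ts())   # ts = 1
reader_snapshot = db.clock                    # reader starts at ts 1
db.write("x", "v2", commit_ts=db.next_ts())   # ts = 2, concurrent update
print(db.read("x", reader_snapshot))          # -> "v1": the reader is never blocked
print(db.read("x", db.clock))                 # -> "v2"
```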
Snapshot isolation (SI) [69] is a multi-versioning scheme used by many database systems. To isolate
read-only transactions from updates, many commercial database systems (Oracle [70], SQL Server [71] and
others) support snapshot isolation, which is not serializable. Many articles have therefore addressed the
conditions under which SI is serializable or how it can be made serializable. In 2008, Cahill et al. released a
detailed and practical solution [72]; their technique requires transactions to check for read-write dependencies.
Techniques such as read validation and predicate-repeatability checks have also been used in the past [73].
Oracle TimesTen [70] and IBM's solidDB [74] employ multiple lock types.
4.4. RUM conjecture
The fundamental challenge faced by every researcher, programmer, or architect when designing a
new access method is how to reduce read times (R), update costs (U), and memory/storage overhead (M). It
has been observed that optimizing for any two of these areas negatively impacts the third. Figure 2 presents
an overview of the RUM conjecture.
The three criteria that people working on database systems seek to design for are outlined by
researchers from the Harvard DB lab: read overhead, update overhead, and memory overhead [75]. Tree-based
data structures [76] mainly target improved read performance, traded for space overhead and increased update
time as nodes are shifted and new nodes are added. Adaptive data structures give better read performance at a
greater maintenance cost. Adding metadata to speed up traversals (for example, fractional cascading) may
affect write time and space but can improve read time.
Figure 2. Indexes in the RUM space
LSM-trees [44], the partitioned B-tree [77] and the stepped-merge algorithm [78] optimize write
performance. Updates and deletions in LSM-trees do not require searching for data on disk; by deferring and
buffering all insert, modify, and delete operations they guarantee sequential writes. This comes at the price of
higher maintenance costs, compaction requirements, and more expensive reads. At the same time, LSM-trees
help improve memory efficiency by avoiding reserved space and enabling block compression.
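The write-optimized behaviour described above can be sketched as a memtable that absorbs all inserts, updates, and deletes and is flushed to a sorted, immutable run when full; reads consult the memtable first and then the runs from newest to oldest. This is a toy illustration of the idea surveyed in [44], not a faithful implementation (there is no compaction, write-ahead log, or tombstone handling).

```python
class ToyLSM:
    def __init__(self, memtable_limit=3):
        self.memtable = {}                    # in-memory buffer, absorbs all writes
        self.runs = []                        # sorted immutable runs on "disk"
        self.memtable_limit = memtable_limit

    def put(self, key, value):                # updates and deletes are just writes
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Sequential write of a sorted run; no in-place disk updates are needed.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):       # newest run first; reads are costlier
            for k, v in run:
                if k == key:
                    return v
        return None

lsm = ToyLSM()
for i in range(5):
    lsm.put(f"k{i}", i)
print(lsm.get("k1"), lsm.get("k4"))           # -> 1 4
```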
Compression (for instance, Gorilla compression [79], delta encoding, and numerous other algorithms)
can be used to reduce memory use, at the cost of some read and write overhead. Heap files and hash indexes,
for example, give good performance guarantees and low storage overhead owing to the simplicity of the file
format. Accuracy can be traded for efficiency by using approximate data structures such as Bloom filters [45],
HyperLogLog [80], Count-Min Sketch [81], and sparse indexes such as ZoneMaps [82], column imprints [83]
and many others. Choosing which overhead(s) to optimize for, and to what degree, remains a prominent part
of designing a new access method, particularly as hardware and workloads change over time [84].
5. STORAGE-TIERING TECHNIQUES
Storage tiering [85], or moving data between different storage types so that the data's I/O
characteristics match those of the storage, is a strategy that administrators have been using for many years.
Data aging is the easiest and most common reason for this: as sections of the data age, their access patterns
change and they become less likely to be accessed; the data cools down and eventually becomes cold. Cold
data has very different I/O patterns from warm and hot data, and thus different storage needs. Hot data may
also have specific storage needs, such as the largely random I/O pattern of an OLTP server workload. For
example, SQL Server can use a per-node or per-CPU design, and even latches and memory objects can be
dynamically partitioned by SQL Server to minimize hot spots.
There are several customary strategies for assessing whether data is considered important: (i) access-
frequency analysis based on the workload, (ii) predefined business rules based on expert analysis or functional
expertise, and (iii) a caching approach. The principle behind access-frequency analysis is simple: the more
often data is accessed, the more important it is and the higher the likelihood of repeated accesses. There are
various approaches to quick and resource-efficient monitoring of data accesses; one example is the logging
scheme proposed by Levandoski et al. [86], which uses sampling and smoothing techniques to minimize
memory usage. By contrast, rule-based approaches require application experts to work manually
[87]. An example might be: "All payments older than a year are cold." Business rules have advantages: they
encode domain knowledge and can achieve high precision, especially when a time parameter is involved. At
the same time, experts can devise rules that are too weak and lose the potential for optimization. Moreover,
current online transactional or analytical processing (OLxP) systems are becoming progressively more
complex because they combine different platforms (for example, operational reporting, warehousing,
transaction processing, data-mining applications, and so forth), which makes a purely rule-driven, expert-based
approach impractical in the long run. Caching is another way of finding important data, for example by
monitoring the page buffer with a least-recently-used or most-recently-used policy [88]. Such methods can be
faster than frequency approaches and consume less memory; on the other hand, they may be less precise and
carry high bookkeeping overhead. Developers need to update their cache configuration constantly as the
workload of their application varies [89], yet in reality it is difficult to identify workload changes for large
applications [90]. Even when workload changes are understood, developers still have to spend considerable
effort to understand the new workload and reconfigure the caching system manually.
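The access-frequency principle can be illustrated with a small, exponentially smoothed counter per object: sampled accesses raise the estimate, the estimate decays over time, and objects that fall below a threshold are treated as cold. This is only a sketch in the spirit of the sampling-and-smoothing idea attributed to Levandoski et al. [86]; the constants, names, and decision rule are our own.

```python
import random

class HotColdTracker:
    def __init__(self, alpha=0.2, sample_rate=0.5, cold_threshold=0.1):
        self.alpha = alpha                    # smoothing factor
        self.sample_rate = sample_rate        # fraction of accesses actually logged
        self.cold_threshold = cold_threshold
        self.score = {}                       # object id -> smoothed access estimate

    def record_access(self, obj):
        if random.random() > self.sample_rate:
            return                            # sampling keeps logging cheap
        old = self.score.get(obj, 0.0)
        self.score[obj] = old + self.alpha * (1.0 - old)

    def decay(self):
        # Called periodically; objects that are no longer touched cool down.
        for obj in self.score:
            self.score[obj] *= (1.0 - self.alpha)

    def is_cold(self, obj):
        return self.score.get(obj, 0.0) < self.cold_threshold

tracker = HotColdTracker()
for _ in range(50):
    tracker.record_access("orders_2024")      # frequently accessed table
tracker.record_access("payments_2019")        # touched once
for _ in range(5):
    tracker.decay()
print(tracker.is_cold("orders_2024"), tracker.is_cold("payments_2019"))  # False True
```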
A technique for query filtering and tuple reconstruction was suggested by Boissier et al. [91]. All
attributes used to evaluate queries are kept in DRAM; this information is enough to separate the data into two
parts: (i) tuples kept in main memory and (ii) tuples moved to disk. Two methods to compute the hotness of a
table cell were presented, and the advantages and disadvantages of both approaches were discussed. To this
end, they introduced two kinds of bit-vectors, columnar and row vectors, containing information about the
hotness of a column or a row. They then proposed clustering columns with similar hotness behavior and
merging the columnar vectors of each cluster into a single vector, which helped reduce the complexity of
sorting large datasets and of tuple reconstruction.
5.1. Big data models-interdependence and complementary
5.1.1. Data storage model
SQL in MapReduce systems: MapReduce, well known for processing very large datasets in parallel
on computer clusters, is a highly distributed and scalable programming model. However, when it comes to
joining different datasets or finding the top-k records, it is comparatively slow. To overcome this, the relational
approach is adopted. Two strategies are employed: (1) using an SQL-like language on top of MapReduce, e.g.,
Apache Pig and Apache Hive, and (2) integrating MapReduce computation with SQL, e.g., Spark.
Not only SQL (NoSQL): NoSQL provides higher availability and scalability than traditional databases
by using a distributed and fault-tolerant architecture [92]. Machines can be added easily, data can be replicated,
and consistency is achieved eventually. The various types of NoSQL stores are key-value stores (e.g.,
Cassandra, Riak), document stores (e.g., CouchDB, MongoDB), tabular data stores (e.g., HBase, BigTable)
and graph databases. However, in most of these, creating custom views, normalization, and working with
multiple relations is difficult. Although SQL was originally not supported by NoSQL, present observations
indicate the need for a highly expressive and declarative query language like SQL; this is seen in systems such
as SlamData, Impala and Presto, which support SQL queries on MongoDB, HBase and Cassandra [93].
Graph databases are widely explored nowadays to analyze the massive online semantic web, social
networks, road networks and biological networks. Reachability and shortest-path queries are time-consuming
to perform in relational databases using SQL, and writing recursive queries and long chains of joins is
unpleasant. However, since an RDBMS stores many relations that map naturally to graphs in real applications,
a combination of the two could produce better results. AgensGraph [94] is a multi-model graph database based
on the PostgreSQL RDBMS; it enables developers to integrate the legacy relational data model and the
flexible graph data model in one database.
NewSQL: This class of databases incorporates the scalability and availability of NoSQL systems as
well as the ACID properties of the relational model. Systems with SQL engines such as InfoBright and MySQL
Cluster support the same SQL interface with better scalability. Sharding is done transparently in systems like
ScaleBase and dbShards, which provide a middle layer that fragments and distributes databases over multiple
nodes. Depending on the application scenario, most of these NewSQL systems can also be used to learn
SQL [60].
5.1.2. Indexing
Indexing in big data is a challenge for several reasons [95], primarily because the volume, variety,
and velocity (the 3V's of Gartner's definition) of big data are very large: a generated index should be small (a
fraction of the original data) and fast relative to the data, because searches must consider billions or even
trillions of data values in seconds. As big data models entertain multi-variable queries, indexes must be in
place for all variables involved in an operation so as to provide faster access for individual variable search
results that are then combined into a collective response. Models should be able to produce smaller indexes
when granularity reduction is possible. The indexes should be easy to partition into sections whenever
opportunities for parallel processing arise. Because the rate at which data is generated in the big data era is
very high, an index must be built at the same rate in order to process the data in place. Indexing strategies in
big data can be classified into an artificial intelligence (AI) approach and a non-artificial intelligence (NAI)
approach.
In the AI approach, indexes are formed based on relationships established by observing similar
patterns and traits among data items or objects. The two popular AI indexing approaches are:
a. Latent semantic indexing (LSI), which uses singular value decomposition to find patterns in unstructured
data. It consumes more memory space.
b. The hidden Markov model, which is based on the Markov model. It uses states connected by transitions
that depend only on the present state and are independent of previous states. States depend on the current
query, store the results, and also help to predict future queries. It demands high computational performance.
In the NAI approach, indexes are based on frequently queried data sets. It uses indexes from the relational
model such as the B-tree, R-tree, X-tree, LSM-tree and Bloom filter. There are also indexes such as generalized
search trees and the generalized inverted index, which is implemented similarly to a B-tree and supports
arbitrary indexes.
5.1.3. Query processing
Even for newer platforms based on MapReduce and its variants, which are well known for high
parallelism, the network-transmission cost, storage cost, and I/O cost of moving data across nodes are
significant concerns for query optimization. Popular frameworks such as MapReduce therefore lag in query
processing efficiency [95]. In MapReduce, a query also has to be translated, statement by statement, into a set
of MapReduce jobs, which is time-consuming and difficult. Hence, high-level languages proposed for
MapReduce such as Pig and Hive resemble SQL in many ways. Google's BigQuery runs in the cloud to
overcome these optimization issues: the data is not indexed but stored in columnar format, a tree architecture
is used in which the data is distributed among nodes, and a query language similar to SQL is provided.
Cassandra employs materialized views for complex query analysis, avoiding expensive table joins and data
aggregation.
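To make the statement-to-jobs translation concrete, a SQL-like aggregation such as "SELECT city, SUM(amount) GROUP BY city" maps onto one map/reduce pair. The plain-Python sketch below uses no Hadoop or Spark API, only the programming model itself, and shows the shape of the job that a higher-level language such as Pig or Hive would generate; names and data are invented for illustration.

```python
from collections import defaultdict

rows = [
    {"city": "NY", "amount": 50},
    {"city": "SF", "amount": 70},
    {"city": "NY", "amount": 20},
]

# Map phase: emit (key, value) pairs.
def map_phase(records):
    for r in records:
        yield r["city"], r["amount"]

# Shuffle phase: group values by key (done by the framework in real systems).
def shuffle(pairs):
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

# Reduce phase: aggregate each group.
def reduce_phase(groups):
    return {k: sum(vs) for k, vs in groups.items()}

print(reduce_phase(shuffle(map_phase(rows))))   # -> {'NY': 70, 'SF': 70}
```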
5.1.4. Concurrency control
Versioning the database is an important feature for controlling concurrency in big data models [95].
New changes are assigned a number, known as the version or revision number, which makes it possible to find
the latest version of a database table and eases audit trailing. MongoDB, CouchDB, HBase and Google's
BigTable are a few examples that use versioning. With this powerful technique of versioning such voluminous
data come potential problems: all the data that makes up a particular version needs to be stored again in the
database, which can lead to interdependencies between objects, so tracking the differences becomes a task in
itself.
6. HTAP
HTAP systems are capable of reducing deployment, maintenance, and storage costs by eliminating
the ETL cycle and, most significantly, by allowing real-time analysis over production data. Terms such as
online transactional analytical processing (OLTAP) or online transactional or analytical processing (OLxP)
are also synonymous with them [96]. The Oracle RDBMS in-memory option (DBIM) [97] is an industry-first
distributed dual-format architecture that enables highly optimized columnar processing of a database object
in main memory to break performance barriers in analytical query workloads, while retaining transactional
compatibility with the corresponding OLTP-optimized row-major format, which remains in storage and is
accessed through the database buffer cache.
The SAP HANA scale-out extension [98], a modern distributed database architecture, was designed
to separate transaction processing from query processing and to use a high-performance distributed shared
log as the persistence backbone. It provides full ACID (atomicity, consistency, isolation, and durability)
guarantees and is based on a logical timestamp framework that gives MVCC-based snapshot isolation without
requiring synchronous replica refreshes; instead, it uses asynchronous update propagation together with
timestamp validity to ensure consistency. MemSQL [99] is a distributed, memory-optimized SQL database
with a shared-nothing architecture that excels at mixed real-time transactional and analytical processing. It
exploits in-memory data storage with MVCC and memory-optimized lock-free data structures to permit
highly concurrent reading and writing of data, enabling ongoing analytics over an operational database. It
ingests and keeps data in row format in memory; however, once data is written to disk, it is translated to
columnar format for faster reads.
Appuswamy et al. [100] designed a new architecture for hybrid transactional and analytical
processing named heterogeneous-HTAP (H2TAP). OLAP queries are data-parallel tasks, so H2TAP uses a
kernel-based execution model to execute OLAP queries on the GPU; for OLTP queries, which are task-parallel,
it uses message-passing-based parallelism. Kemper et al. [34] proposed a hybrid system, called HyPer, which
can handle both ongoing transactional and analytical queries simultaneously by maintaining consistent
snapshots of the transactional data. HyPer preserves the ACID properties of OLTP transactions while
executing multiple OLAP queries over current, consistent snapshots. The system makes use of the operating
system's fork() functionality. All OLTP modifications are made to a separate area called the Delta, which
consists of replicated database pages, while OLAP operations are given a virtual-memory view. Whenever an
OLAP session is started, the fork() operation creates a virtual-memory snapshot that is used for executing that
session. Periodically, the OLAP view is reconciled with the Delta by forking a new process for an up-to-date
OLAP session.
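The fork()-based snapshot mechanism can be illustrated with a small, Unix-only Python sketch: the child process inherits a copy-on-write image of the parent's memory, so it can run an analytical read over a consistent state while the parent keeps applying transactional updates. This illustrates only the operating-system mechanism the text describes, not HyPer's actual code; the variable names and values are invented.

```python
import os, time

table = {"balance": 100}                      # in-memory "database"

pid = os.fork()                               # Unix-only
if pid == 0:
    # Child: works on a copy-on-write snapshot of memory as of fork() time.
    time.sleep(0.2)                           # give the parent time to update
    print("OLAP snapshot sees balance =", table["balance"])   # prints 100
    os._exit(0)                               # leave without running parent cleanup
else:
    # Parent: keeps applying transactional updates; the snapshot is unaffected.
    table["balance"] = 250
    os.waitpid(pid, 0)
    print("OLTP state has balance =", table["balance"])       # prints 250
```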
The Peloton project [101] intends to create a self-governing, multi-threaded, in-memory HTAP
system. It provides versatile data organization that changes the data layout at run time based on the type of
demand [102]. Wildfire [103] is an IBM Research project that produces an HTAP engine in which both
analytical and ongoing requests go through the same columnar data format, i.e., Parquet [104] (a
non-proprietary storage format), open to any reader for all data. Wildfire also uses the Spark ecosystem to
allow distributed analytics at large scale; using a single data format for both data ingestion and analysis allows
the latest committed data to be analyzed immediately. Sadoghi et al. [105] proposed L-Store (a lineage-based
data store) that combines ongoing transactional and analytical workload processing within a single unified
engine by performing a contention-free and lazy staging of data from a write-optimized into a read-optimized
format. Pinot [106] is an open-source, distributed relational database proposed by LinkedIn. Although it is
designed mainly to support OLAP, it can directly ingest streaming data from Kafka for applications that
require data freshness. Like a traditional database, it consists of records in tables that are further divided into
segments; the data in segments is stored in columnar format, so data size can be minimized using bit packing
and similar techniques.
7. PERFORMANCE FACTORS
Various key factors affect the performance of a database, and learning these factors helps identify
performance opportunities and avoid problems [107]. Database performance depends heavily on disk I/O and
memory utilization. The baseline performance of the hardware on which the DBMS is deployed must be
known in order to set performance expectations accurately. The performance of hardware components such
as CPUs, hard drives, disk controllers, RAM, and network interfaces has a significant impact on how quickly
a database performs; system resources are therefore one of the key factors affecting performance. Overall
system performance may also be influenced by DBMS upgrades. The formulation of the query language,
database configuration parameters, table design, data distribution, etc., permit the database query optimizer to
create the most productive access plans.
The workload comprises all the requests made to the DBMS, and it varies over time. The overall
workload at any given time is a blend of user queries, application programs, batch jobs, transactions, and
system commands. For example, during the preparation of month-end reports the workload increases, whereas
on other days the workload is minimal. Workload strongly influences database performance. Selecting data
structures that adapt to changes in workload, and planning for peak demand times, helps prepare for the most
productive use of system resources and enables the maximum possible workload to be handled.
Transaction throughput measures the speed of a database system, usually expressed as a number of
transactions per second. Disk I/O is slow relative to other system resources such as the CPU, which reduces
throughput; hence, techniques are used to keep more data in the cache or in memory. Contention is the
condition in which at least two workload components attempt to use the system in a conflicting way, for
instance multiple queries attempting to update the same piece of data simultaneously, or different workloads
competing for system resources; throughput decreases as contention increases. Latency, also called response
time or access time, is a measure of how long the database takes to respond to a single request. Two such
latencies are: (i) write latency, the number of milliseconds required to fulfil a single write request, and
(ii) read latency, the number of milliseconds required to fulfil a read request [108]. In parallel databases, every
lock operation on a processor introduces considerable latency. Overload detection is also desirable so that the
response time of a server can be kept below a certain threshold to retain customers [109]. Nevertheless, if
many user transactions are executed concurrently, a database system can be overwhelmed; computing
resources such as CPU cycles and memory space can be exhausted and, because of data contention, many
transactions may be blocked or prematurely aborted and restarted.
Cost models are useful for analyzing cost in terms of physical space and query processing time [110].
They provide the inputs (data distribution statistics and the execution algorithms of the operators in a query
plan) needed to estimate the cost of query plans and thereby facilitate the selection of optimal plans [111, 112].
An important principle underlying query plan execution is that processing less data takes fewer resources
and/or less time [113]. Which sequence of operations requires the least data and time for a given request is
not always clear; thus, the cost model provides real insight into a database system's physical design and
performance.
8. CONCLUSION
The study focused on the importance of the relational model's merits to big data. Potential
determinants (e.g., data storage layouts and data retrieval techniques) related to the relational model were also
highlighted. The paper mainly contributed to the literature on storage layouts, indexing, query processing, and
concurrency control techniques in the relational model. The study revealed a few key areas that still need work
in relational models. The identification of gaps in indexing has led to the understanding that it is not
straightforward to index data that is distributed and accessed at large scale so as to allow efficient point
lookups; faster point access to these objects is still an open issue. Efficient indexing algorithms have been
explored, but, as the RUM conjecture states, optimizing any two dimensions causes the third to suffer.
Concurrency in many-core systems is still a matter of research, and in the case of MVCC the overhead grows
with each concurrent update.
The paper also highlighted the ongoing research on hybrid transactional and analytical processing
(HTAP). While there are many systems classified as HTAP solutions, none of them supports true HTAP.
Current frameworks provide a reasonable platform for supporting transactional as well as analytical requests
when these are sent separately to the system; nonetheless, none of the existing solutions supports effective
transactional and analytical request processing within the same transaction. Today most HTAP systems use
multiple components to provide all the desired capabilities, and different groups of people usually maintain
these different components. It is, therefore, a challenging task to keep these components compatible and to
give end users the illusion of a single system. Last, the study also highlighted the acceptance and
implementation of a few significant methodologies of the relational model in big data. Present systems demand
that data be handled both "offensively" (exploratory use of data) and "defensively" (control of data).
REFERENCES
[1] M. Procopio, et al., “Load-n-go: Fast approximate join visualizations that improve over time,” in Proceedings of the
Workshop on Data Systems for Interactive Analysis (DSIA), Phoenix, AZ, USA, 2017, pp. 1-2.
[2] P. Atzeni, et al., "Data modeling in the nosql world," Computer Standards & Interfaces, vol. 67, art. no. 103149,
pp. 1-36, 2020.
[3] V. Raman, et al., "Db2 with blu acceleration: So much more than just a column store," Proceedings of the VLDB
Endowment, vol. 6, no. 11, 2013, pp. 1080-1091.
[4] M. F. Lamb, et al., “The vertica analytic database: C-store 7 years later,” Proceedings of the VLDB Endowment,
vol. 5, no. 12, 2012, pp. 1790-1801.
[5] M. Stonebraker and A. Weisberg, “The voltDB main memory DBMS,” IEEE Data Engineering Bulletin, vol. 36,
no. 2, pp. 21-27, 2013.
[6] C. Diaconu, et al., “Hekaton: Sql server’s memory-optimized oltp engine,” in Proceedings of the 2013 ACM
SIGMOD International Conference on Management of Data, 2013, pp.1243-1254.
[7] G. Moore, “Moore’s law,” Electronics Magazine, vol. 38, no. 8, p. 114, 1965.
[8] R. Sumbaly, et al., “Serving large-scale batch computed data with project voldemort,” in Proceedings of the 10th
USENIX conference on File and Storage Technologies, 2012, pp. 1-18.
[9] Cassandra, “Apache cassandra,” 2014. [Online]. Available: http://planetcassandra.org/what-is-apache-cassandra.
[10] S. Dong, et al., “Optimizing space amplification in rocksdb,” in CIDR, vol. 3, p. 3, 2017.
[11] M. A. Qader, et al., "A comparative study of secondary indexing techniques in lsm-based nosql databases," in
Proceedings of the 2018 International Conference on Management of Data, 2018, pp. 551-566.
[12] Z. H. Jin, et al., “Cirrodata: Yet another SQL-on-hadoop data analytics engine with high performance,” Journal of
Computer Science and Technology, vol. 35, no. 1, pp. 194-208, 2020.
[13] F. Özcan, et al., "Hybrid transactional/analytical processing: A survey," in Proceedings of the 2017 ACM
International Conference on Management of Data, 2017, pp. 1771-1775.
[14] M. Armbrust, et al., “Spark sql: Relational data processing in spark,” in Proceedings of the 2015 ACM SIGMOD
international conference on management of data, 2015, pp. 1383-1394.
[15] M. Pezzini, et al., “Hybrid transaction/analytical processing will foster opportunities for dramatic business
innovation,” Gartner, 2014.
[16] M. Mohamed, et al., “An improved algorithm for database concurrency control,” International Journal of
Information Technology, vol. 11, no. 1, pp. 21-30, 2019.
[17] Y. Tian, et al., "Dinodb: Efficient large-scale raw data analytics," in Proceedings of the First International
Workshop on Bringing the Value of Big Data to Users (Data4U 2014), 2014, pp. 1-6.
[18] M. Kiran, et al., “Lambda architecture for cost-effective batch and speed big data processing,” in 2015 IEEE
International Conference on Big Data (Big Data), 2015, pp. 2785-2792.
[19] E. F. Codd, “A relational model of data for large shared data banks,” Communications of the ACM, vol. 13, no. 6,
pp. 377-387, 1970.
[20] T. Wolfson, et al., “Break it down: A question understanding benchmark,” Transactions of the Association for
Computational Linguistics, vol. 8, pp. 183-198, 2020.
[21] G. Strawn and C. Strawn, “Relational databases: Codd, stonebraker, and Ellison,” IT Professional, vol. 18, no. 2,
pp. 63-65, 2016.
[22] X. Wei, et al., "Online adaptive approximate stream processing with customized error control," IEEE Access, vol. 7,
pp. 25123-25137, 2019.
[23] S. Agarwal, et al., “Blinkdb: queries with bounded errors and bounded response times on very large data,” in
Proceedings of the 8th ACM European Conference on Computer Systems, 2013, pp. 29-42.
[24] R. J. Lipton, et al., “Efficient sampling strategies for relational database operations,” Theoretical Computer Science,
vol. 116, no. 1, pp. 195-226,1993.
[25] C. W. Olofson, "The analytic-transactional data platform: Enabling the real-time enterprise," CBS Interactive, 2014.
[26] T. H. Nelson, “Complex information processing: a file structure for the complex, the changing and the
indeterminate,” in Proceedings of the 1965 20th national conference, 1965, pp. 84-100.
[27] R. C. Goldstein and A. J. Strnad, “The macaims data management system,” in Proceedings of the 1970 ACM
SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control, pp. 201-229, 1970.
[28] K. Sterjo, “Row-store/column-store/hybrid-store,” pp. 1-10, 2017.
[29] L. Sun, et al., “Skipping-oriented partitioning for columnar layouts,” Proceedings of the VLDB Endowment,
vol. 10, no. 4, 2016, pp. 421-432.
[30] D. Tsitsigkos, et al., “A two-level spatial in- memory index,” arXiv preprint arXiv: 2005.08600, 2020.
[31] H. Lang, et al., “Tree-encoded bitmaps,” in Proceedings of the 2020 ACM SIGMOD International Conference on
Management of Data, 2020, pp. 937-967.
[32] M. Olma, et al., “Adaptive partitioning and indexing for in situ query processing,” The VLDB Journal, vol. 29, no. 1,
pp. 569-591, 2020.
[33] L. Sun, “Skipping-oriented data design for large-scale analytics,” Ph.D. dissertation, UC Berkeley, 2017.
[34] T. Li, et al., “Mainlining databases: Supporting fast transactional workloads on universal columnar data file
formats,” arXiv preprint arXiv: 2004.14471, 2020.
[35] F. Yang, et al., “Druid: A real-time analytical data store,” in Proceedings of the 2014 ACM SIGMOD international
conference on Management of data, 2014., pp. 157-168
[36] J. Patel and V. Singh, “Query morphing: A proximity-based data exploration for query reformulation,” in
Computational Intelligence: Theories, Applications and Future Directions, vol. 1, pp. 247-259, 2019.
[37] M. Grund, J. Krüger, H. Plattner, A. Zeier, P. Cudre-Mauroux, and S. Madden, "Hyrise: a main memory hybrid
storage engine,” Proceedings of the VLDB Endowment, vol. 4, no. 2, pp. 105-116, 2010.
[38] M. Shao, et al., “Clotho: Decoupling memory page layout from storage organization,” in Proceedings of the
Thirtieth international conference on Very large databases, vol. 30, 2004, pp. 696-707.
[39] R. R. D. J. DeWitt and Q. Su, “A case for fractured mirrors,” in Proceedings 2002 VLDB Conference: 28th
International Conference on Very Large Databases (VLDB), 2002, p. 430.
[40] S. I. Alagiannis and A. Ailamaki, “H2o: a hands-free adaptive store,” in Proceedings of the 2014 ACM SIGMOD
International conference on Management of data, 2014, pp. 1103-1114.
[41] L. Rodríguez and X. Li, "A dynamic vertical partitioning approach for distributed database system," in 2011 IEEE
International Conference on Systems, Man, and Cybernetics, 2011, pp. 1853-1858.
[42] E. Bertino, et al., “Indexing techniques for advanced database systems,” Springer Science & Business Media, vol. 8,
2012.
[43] M. A. Awad, et al., "Engineering a high-performance gpu b-tree," in Proceedings of the 24th Symposium on
Principles and Practice of Parallel Programming, 2019, pp. 145-157.
[44] C. Luo and M. J. Carey, “Lsm-based storage techniques: a survey,” The VLDB Journal, vol. 29, no. 1, pp. 393-418,
2020.
[45] B. Ding, et al., “Bitvector-aware query optimization for decision support queries,” in Proceedings of the 2020 ACM
SIGMOD International Conference on Management of Data, 2020, pp. 2011-2026.
[46] H. Ibrahim, et al., “Analyses of indexing techniques on uncertain data with high dimensionality,” IEEE Access,
vol. 8, pp. 74101-74117, 2020.
[47] M. Zukowski and P. Boncz, “Vectorwise: Beyond column stores,” IEEE Data Engineering Bulletin, pp. 1-6, 2012.
[48] C. O. Truica, et al., “Building an inverted index at the dbms layer for fast full text search,” Journal of Control
Engineering and Applied Informatics, vol. 19, no. 1, pp. 94-101, 2017.
[49] J. M. Medina, et al., “Evaluation of indexing strategies for possibilistic queries based on indexing techniques
available in traditional rdbms,” International Journal of Intelligent Systems, vol. 31, no. 12, pp. 1135-1165, 2016.
[50] R. Singh, “Inductive learning-based sparql query optimization,” in Data Science and Intelligent Applications,
pp. 121-135, 2020.
[51] G. Graefe, “Query evaluation techniques for large databases,” ACM Computing Surveys (CSUR), vol. 25, no. 2,
pp. 73-169, 1993.
[52] P. Day, et al., “Dynamic adjustment of system resource allocation during query execution in a database management
system," US Patent App. 10/671,043, 2005.
[53] S. Krishnan, et al., “Stale view cleaning: Getting fresh answers from stale materialized views,” Proceedings of the
VLDB Endowment, vol. 8, no. 12, 2015, pp. 1370-1381.
[54] Z. Scully and A. Chlipala, “A program optimization for automatic database result caching,” in ACM SIGPLAN
Notices, vol. 52, no. 1, pp. 271-284, 2017.
[55] X. Qin and X. Zhou, “Db facade: A web cache with improved data freshness,” in 2009 Second International
Symposium on Electronic Commerce and Security, vol. 2, 2009, pp.483-487.
[56] S. M. Cheung and A. Solar-Lezama, “Sloth: Being lazy is a virtue (when issuing database queries),” ACM
Transactions on Database Systems (ToDS), vol. 41, no. 2, pp. 1-42, 2016.
[57] S. D. Thakare, et al., “Mining query plans for finding candidate queries and sub-queries for materialized views in bi
systems without cube generation,” Computing and Informatics, vol. 38, no. 2, pp. 473-496, 2019.
[58] Gong and W. Zhao, “Clustering-based dynamic materialized view selection algorithm,” in 2008 Fifth International
Conference on Fuzzy Systems and Knowledge Discovery, vol. 5, 2008, pp. 391-395.
[59] Y. Huang, et al., “Opportunities for optimism in contended main- memory multicore transactions,” Proceedings of
the VLDB Endowment, vol. 13, no. 5, pp. 629-642, 2020.
[60] Y. N. Silva, et al., “Sql: From traditional databases to big data,” in Proceedings of the 47th ACM Technical
Symposium on Computing Science Education, 2016, pp. 413-418.
[61] X. Yu, et al., “Staring into the abyss: An evaluation of concurrency control with one thousand cores,” Proceedings
of the VLDB Endowment, vol. 8, no. 3, 2014, pp. 209-220.
[62] K. Ren, et al., “Design principles for scaling multi-core oltp under high contention,” in Proceedings of the 2016
International Conference on Management of Data, 2016, pp. 1583-1598.
[63] D. A. Yao, et al., “Exploiting single-threaded model in multi-core in-memory systems,” IEEE Transactions on
Knowledge and Data Engineering, vol. 28, no. 10, pp. 2635-2650, 2016.
[64] M. Sadoghi and S. Blanas, “Transaction processing on modern hardware,” Synthesis Lectures on Data
Management, vol. 14, no. 2, pp. 1-138, 2019.
[65] K. Ren, et al., “Design principles for scaling multi-core oltp under high contention,” in Proceedings of the 2016
International Conference on Management of Data, 2016, pp. 1583-1598.
[66] M. R. Shamis, et al., “Fast general distributed transactions with opacity using global time,” arXiv preprint
arXiv: 2006.14346, 2020.
[67] H. T. Kung and J. T. Robinson, “On optimistic methods for concurrency control,” ACM Transactions on Database
Systems (TODS), vol. 6, no. 2, pp. 213-226, 1981.
[68] J. Guo, et al., “Adaptive optimistic concurrency control for heterogeneous workloads,” Proceedings of the VLDB
Endowment, vol. 12, no. 5, 2019, pp. 584-596,.
[69] H. Berenson, et al., "A critique of ansi sql isolation levels," in ACM SIGMOD Record, vol. 24, no. 2, pp. 1-10, 1995.
[70] T. Lahiri, et al., “Oracle timesten: An in-memory database for enterprise applications,” IEEE Data Enginnering
Bulletin, vol. 36, no. 2, pp. 6-13,2013.
[71] R. Rankins, et al., "Microsoft SQL Server 2012 Unleashed," Pearson Education, 2013.
[72] M. J. Cahill, et al., “Serializable isolation for snapshot databases,” ACM Transactions on Database Systems
(TODS), vol. 34, no. 4, pp. 729-738, 2009.
[73] M. A. Bornea, et al., “One-copy serializability with snapshot isolation under the hood,” in 2011 IEEE 27th
International Conference on Data Engineering, 2011, pp. 625-636.
[74] J. Lindström, et al., "IBM soliddb: In-memory database optimized for extreme speed and availability," IEEE Data
Engineering Bulletin, vol. 36, no. 2, pp. 14-20, 2013.
[75] M. Athanassoulis, et al., “Designing access methods: The rum conjecture,” in EDBT, vol. 2016, pp. 461-466, 2016.
[76] H. Zhang, et al., “Order-preserving key compression for in-memory search trees,” in Proceedings of the 2020 ACM
SIGMOD International Conference on Management of Data, 2020, pp. 1601-1615.
[77] G. Graefe, “Sorting and indexing with partitioned b-trees,” in CIDR, vol. 3, pp.5-8, 2003.
[78] H. Jagadish, et al., “Incremental organization for data recording and warehousing,” in VLDB, vol. 97, pp. 16-25,
1997.
[79] T. Pelkonen, et al., “Gorilla: A fast, scalable, in-memory time series database,” Proceedings of the VLDB
Endowment, vol. 8, no. 12, 2015, pp. 1816-1827.
[80] P. Flajolet, et al., “Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm,” in Discrete
Mathematics and Theoretical Computer Science, pp. 137-156, 2007.
[81] G. Cormode and S. Muthukrishnan, “An improved data stream summary: the count-min sketch and its
applications,” Journal of Algorithms, vol. 55, no. 1, pp. 58-75, 2005.
[82] P. Francisco, et al., “The netezza data appliance architecture: A platform for high performance data warehousing
and analytics,” IBM Corp., 2011.
[83] L. Sidirourgos and M. Kersten, “Column imprints: a secondary index structure,” in Proceedings of the 2013 ACM
SIGMOD International Conference on Management of Data, 2013, pp. 893-904.
[84] M. Athanassoulis and S. Idreos, “Design tradeoffs of data access methods,” in Proceedings of the 2016
International Conference on Management of Data, 2016, pp.2195-2200.
[85] R. Sherkat, et al., “Native store extension for sap hana,” Proceedings of the VLDB Endowment, vol. 12, no. 12,
pp. 2047-2058, 2019.
[86] J. J. Levandoski, et al., “Identifying hot and cold data in main-memory databases,” in 2013 IEEE 29th International
Conference on Data Engineering (ICDE), 2013, pp. 26-37.
[87] N. W. Paton and O. Díaz, "Active database systems," ACM Computing Surveys (CSUR), vol. 31, no. 1, pp. 63-103,
1999.
[88] S. Dar, et al., “Semantic data caching and replacement,” in VLDB, vol. 96, pp. 330-341, 1996.
[89] T. H. Chen, et al., “Cacheoptimizer: Helping developers configure caching frameworks for hibernate-based
database-centric web applications,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on
Foundations of Software Engineering, 2016, pp. 666-677.
[90] P. Zheng, “Artificial intelligence for understanding large and complex datacenters,” Ph.D. dissertation, Duke
University, 2020.
[91] M. Boissier, et al., “Improving tuple reconstruction for tiered column stores: a workload-aware ansatz based on table
reordering,” in Proceedings of the Australasian Computer Science Week Multiconference, 2017, p. 25.
[92] M. Kuszera, et al., "Toward rdb to nosql: transforming data with metamorfose framework," in Proceedings of the
34th ACM/SIGAPP Symposium on Applied Computing, 2019, pp. 456-463.
[93] R. Roman Č. and M. Kvet, "Comparison of query performance in relational and non-relational databases,"
Transportation Research Procedia, vol. 40, pp. 170-177, 2019.
[94] Zhang and J. Lu, “Holistic evaluation in multi-model databases benchmarking,” Distributed and Parallel
Databases, pp. 1-33, 2019.
[95] S. Sharma, et al., “A brief review on leading big data models,” Data Science Journal, pp. 14-41, 2014.
[96] J. P. Coelho, et al., “Htapbench: Hybrid transactional and analytical processing benchmark,” in Proceedings of the
8th ACM/SPEC on International Conference on Performance Engineering, 2017, pp. 293-304.
[97] N. Mukherjee, et al., “Distributed architecture of oracle database in-memory,” Proceedings of the VLDB
Endowment, vol. 8, no. 12, 2015, pp. 1630-1641.
[98] K. Goel, et al., “Towards scalable real-time analytics: an architecture for scale-out of olxp workloads,” Proceedings
of the VLDB Endowment, vol. 8, no. 12, 2015, pp. 1716-1727.
[99] Z. Xie, et al., “Cool, a cohort online analytical processing system,” in 2020 IEEE 36th International Conference on
Data Engineering (ICDE), 2020, pp. 577-588.
[100] R. Appuswamy, et al., “The case for heterogeneous htap,” in 8th Biennial Conference on Innovative Data Systems
Research, 2017.
[101] G. A. Pavlo, et al., “Self-driving database management systems,” in CIDR, vol. 4, p. 1, 2017.
[102] J. Arulraj, et al., “Bridging the archipelago between row-stores and column-stores for hybrid workloads,” in
Proceedings of the 2016 International Conference on Management of Data. ACM, 2016, pp. 583-598.
[103] R. Barber, et al., “Evolving databases for new-gen big data applications,” in CIDR, 2017.
[104] Vohra, “Apache parquet,” in Practical Hadoop Ecosystem, pp. 325-335, 2016.
[105] M. Sadoghi, et al., “L-store: A real-time oltp and olap system,” arXiv preprint arXiv: 1601.04084, 2016.
[106] J. F. Im, et al., “Pinot: Realtime olap for 530 million users,” in Proceedings of the 2018 International Conference
on Management of Data, 2018, pp. 583-594.
[107] Gpdb.docs.pivotal.io, "Pivotal Greenplum 6.3 Documentation - Pivotal Greenplum Docs," 2020.
[108] Niyizamwiyitira and L. Lundberg, “Performance evaluation of sql and nosql database management systems in a
cluster,” International Journal of Database Management Systems, vol. 9, no. 6, pp. 1-24, 2017.
[109] N. Bhatti, et al., “Integrating user-perceived quality into web server design,” Computer Networks, vol. 33,
pp. 1-16, 2000.
[110] S. B. Yao and A. R. Hevner, “A guide to performance evaluation of database systems,” NBS Special Publication,
1984.
[111] Olken and D. Rotem, “Random sampling from databases: a survey,” Statistics and Computing, vol. 5, no. 1,
pp. 25-42, 1995.
[112] R. K. Kurella, “Systematic literature review: Cost estimation in relational databases,” Ph.D. dissertation, University
of Magdeburg, 2018.
[113] Y. Theodoridis, et al., “Efficient cost models for spatial queries using r-trees,” IEEE Transactions on knowledge
and data engineering, vol. 12, no. 1, pp. 19-32, 2000.
BIOGRAPHIES OF AUTHORS
Sucharitha Shetty is currently working as Assistant Professor in the Department of Computer
Science and Engineering at MIT, Manipal Academy of Higher Education, Manipal. She is also
pursuing her PhD at Manipal Academy of Higher Education, Manipal. Her research interests
include databases and parallelism. She has published a few papers in international conferences.
B. Dinesh Rao acquired his MTech degree from the Indian Institute of Technology, Kanpur in the
year 1993 and his PhD degree from the Manipal Academy of Higher Education, Manipal in the year
2014. He is currently working as Professor in the Manipal School of Information Sciences, Manipal
Academy of Higher Education, Manipal. His research interests include fractals and theoretical
computer science. He has around 26 years of teaching experience and has published around 11
research papers in national/international journals/conferences.
Srikanth Prabhu received his MTech degree and PhD degree from the Indian Institute of
Technology, Kharagpur in the years 2003 and 2013 respectively. He is presently working as Associate
Professor in the Department of Computer Science and Engineering, Manipal Institute of
Technology, Manipal Academy of Higher Education, Manipal. His research interests include
biomedical image processing, agricultural image processing, face recognition, and wireless visual
sensor networks. He has around 15 years of teaching experience and has published around 35
research papers in national/international journals/conferences. He is also on the editorial board
of some journals.