The tremendous growth of digital marketing has produced a vast collection of consumer data. Consumers care about how much of that data is current, whether it drives the advertisements they see or lets them validate nearby services that have already been upgraded in the dataset systems, so a gap forms between the data producer and the client. Filling that gap requires a framework that can serve all the needs of query-driven updating of the data. Present systems fall short when faced with such large volumes of information, which repeatedly forces them back onto a decision-tree-based approach. A systematic solution for the automated incorporation of data into a Hadoop distributed file system (HDFS) warehouse includes a data hub server, a generic data loading mechanism, and a metadata model. In the proposed framework, the database governs the data processing schema. As an ever greater variety of data is archived, the data lake will play a critical role in managing it. To carry out a planned loading operation, the configuration files of the catalogue drive the data hub server to attach the miscellaneous details dynamically to its schemas.
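As a rough illustration of the kind of metadata-driven loading described above, the sketch below (all names and paths are hypothetical) reads a small metadata record for an incoming feed and stages the file into an HDFS warehouse directory with the standard `hdfs dfs` command line; a real data hub server would additionally register the listed fields in its schema catalogue.

```python
# Minimal sketch of metadata-driven ingestion into HDFS (hypothetical names and paths).
import subprocess

# A toy "metadata model": one record per incoming feed (assumed structure).
feed_metadata = {
    "name": "nearby_services",
    "format": "csv",
    "fields": ["service_id", "category", "latitude", "longitude", "updated_at"],
    "warehouse_dir": "/warehouse/admap/nearby_services",
}

def load_feed(local_path: str, meta: dict) -> None:
    """Stage a local file into the HDFS warehouse path described by its metadata."""
    target = f"{meta['warehouse_dir']}/{meta['name']}.{meta['format']}"
    # Create the warehouse directory (no-op if it already exists) and copy the file in.
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", meta["warehouse_dir"]], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, target], check=True)
    # A real data hub server would now attach meta["fields"] to its schema catalogue.

if __name__ == "__main__":
    load_feed("nearby_services.csv", feed_metadata)
```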
A Survey on Data Mapping Strategy for data stored in the storage cloud (NavNeet KuMar)
This document describes a method for processing large amounts of data stored in cloud storage using Hadoop clusters. Data is uploaded to cloud storage by users and then processed using MapReduce on Hadoop clusters. The method involves storing data in the cloud for processing and then running MapReduce algorithms on Hadoop clusters to analyze the data in parallel. The results are then stored back in the cloud for users to download. An architecture is proposed involving a controller that directs requests to Hadoop masters which coordinate nodes to perform mapping and reducing of data according to the algorithm implemented.
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT (ijwscjournal)
The computer industry is being challenged to develop methods and techniques for affordable data processing on large datasets at optimum response times. The technical challenges in dealing with the increasing demand to handle vast quantities of data are daunting and on the rise. One of the recent processing models offering a more efficient and intuitive solution for rapidly processing large amounts of data in parallel is MapReduce. It is a framework defining a template approach to programming for performing large-scale data computation on clusters of machines in a cloud computing environment. MapReduce provides automatic parallelization and distribution of computation across several processors and hides the complexity of writing parallel and distributed programming code. This paper provides a comprehensive systematic review and analysis of large-scale dataset processing and dataset handling challenges and requirements in a cloud computing environment using the MapReduce framework and its open-source implementation Hadoop. We define requirements for MapReduce systems to perform large-scale data processing, propose the MapReduce framework and one implementation of it on Amazon Web Services, and, at the end of the paper, present an experiment running a MapReduce system in a cloud environment. The paper concludes that MapReduce is one of the best techniques for processing large datasets and that it can help developers perform parallel and distributed computation in a cloud environment.
Today’s era is generally treated as the era of data: in every field of computing application, a huge amount of data is generated. Society is increasingly dependent on computers, so a large amount of data is generated every second in structured, unstructured, or semi-structured format. These huge amounts of data are generally treated as big data, and analyzing big data is one of the biggest challenges in the current world. Hadoop is an open-source framework that allows big data to be stored and processed in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage, and it generally follows horizontal processing. MapReduce programs are generally run over the Hadoop framework to process large amounts of structured and unstructured data. This paper describes different joining strategies used in MapReduce programming to combine the data of two files in the Hadoop framework and also discusses the skewness problem associated with them.
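To make the joining idea concrete, here is a minimal reduce-side join in the Hadoop Streaming style (Python reading standard input), assuming two tab-separated inputs that the mapper tags with their source file; the field layout and file names are illustrative, and skew handling is deliberately omitted.

```python
# Sketch of a reduce-side join, Hadoop Streaming style (illustrative field layout).
import sys
from itertools import groupby

def mapper(lines, tag):
    """Emit (join_key, tag, rest) for each tab-separated record; tag marks the source file."""
    for line in lines:
        key, rest = line.rstrip("\n").split("\t", 1)
        print(f"{key}\t{tag}\t{rest}")

def reducer(lines):
    """After the shuffle delivers records sorted by key, pair every left record with every right record."""
    parsed = (line.rstrip("\n").split("\t", 2) for line in lines)
    for key, group in groupby(parsed, key=lambda parts: parts[0]):
        left, right = [], []
        for _, tag, rest in group:
            (left if tag == "L" else right).append(rest)
        for l_rest in left:          # skewed keys make this cross product large,
            for r_rest in right:     # which is the skewness problem the paper discusses
                print(f"{key}\t{l_rest}\t{r_rest}")

if __name__ == "__main__":
    # e.g. `python join.py map L < users.tsv` for the mapper, `python join.py reduce < sorted_input` for the reducer.
    if sys.argv[1] == "map":
        mapper(sys.stdin, sys.argv[2])
    else:
        reducer(sys.stdin)
```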
This document provides a survey of distributed heterogeneous big data mining adaptation in the cloud. It discusses how big data is large, heterogeneous, and distributed, making it difficult to analyze with traditional tools. The cloud helps overcome these issues by providing scalable infrastructure on demand. However, directly applying Hadoop MapReduce in the cloud is inefficient due to its assumption of homogeneous nodes. The document surveys different approaches for improving MapReduce performance in heterogeneous cloud environments through techniques like optimized task scheduling and resource allocation.
Big Data Storage System Based on a Distributed Hash Tables System (ijdms)
Big data is unavoidable, considering that digital channels are now the predominant form of communication in the daily life of the consumer. Controlling what is at stake and the quality of the data must be a priority, so as not to distort the strategies derived from their processing with the aim of turning a profit. To achieve this, a great deal of research has been carried out by companies and several platforms have been created. MapReduce, one of the enabling technologies, has proven applicable to a wide range of fields; however, despite its importance, recent work has shown its limitations, and Distributed Hash Tables (DHTs) have been used to remedy them. Thus, this document not only analyses MapReduce implementations and Top-Level Domains (TLDs) in general, but also provides a description of a DHT model as well as some guidelines for planning future research.
Managing Big data using Hadoop Map Reduce in Telecom Domain (AM Publications)
MapReduce is a programming model for analysing and processing massive data sets. Apache Hadoop is an efficient framework and the most popular implementation of the MapReduce model. Hadoop’s success has motivated research interest and has led to various modifications of, and extensions to, the framework. In this paper, the challenges faced in different domains, such as data storage, analytics, online processing, and privacy/security, while handling big data are explored. The various possible solutions in the telecom domain based on a Hadoop MapReduce implementation are also discussed.
We are in the age of big data, which involves the collection of large datasets. Managing and processing large data sets is difficult with existing traditional database systems. Hadoop and MapReduce have become among the most powerful and popular tools for big data processing. Hadoop MapReduce is a powerful programming model used for analyzing large sets of data with parallelization, fault tolerance, and load balancing; it is also elastic, scalable, and efficient. MapReduce is combined with the cloud to form a framework for the storage, processing, and analysis of massive machine maintenance data in a cloud computing environment.
Big Data Analytics in the Cloud for Business Intelligence (VENKATAAVINASH10)
The document discusses how cloud computing and big data analytics can be combined to provide powerful benefits for businesses when it comes to business intelligence. It outlines that cloud computing reduces costs for businesses by providing on-demand access to computing resources, while big data analytics allows businesses to gain insights from large amounts of data. However, big data analysis requires significant computing power that can be expensive for small-to-medium enterprises. The document argues that cloud computing can provide the storage and processing power needed for big data analytics in a cost-effective way for businesses. Combining these two technologies has the potential to improve decision making for businesses.
This document discusses big data and its applications. It begins with an abstract that introduces big data and how extracting valuable information from large amounts of structured and unstructured data can help governments and organizations develop policies. It then discusses key aspects of big data including volume, velocity, and variety. Current big data technologies are outlined such as Hadoop, HBase, and Hive. Some big data problems and applications are also mentioned like using big data in commerce, business, and scientific research to improve forecasting, policies, productivity, and research.
This document evaluates the performance of a hybrid differential evolution-genetic algorithm (DE-GA) approach for load balancing in cloud computing. It first provides background on cloud computing and load balancing. It then describes the DE-GA approach, which uses differential evolution initially and switches to genetic algorithm if needed. The results show that the hybrid DE-GA approach improves performance over differential evolution and genetic algorithm alone, reducing makespan, average response time, and improving resource utilization. The study demonstrates the benefits of the hybrid evolutionary algorithm for an important problem in cloud computing.
A New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing (ijdms)
Nowadays, cloud structures rely on simple key-value data processing, which cannot support proximity search effectively because effective index structures are lacking, and as dimensionality increases, existing tree-like index structures run into the "curse of dimensionality". In this paper, we define a new cloud computing architecture in which the global index over the storage nodes is based on a Distributed Hash Table (DHT). To reduce computation cost, the different services are stored on a hyperbolic tree using virtual coordinates, so that greedy routing based on the properties of hyperbolic space improves query storage and retrieval performance. We then implement and evaluate our cloud computing indexing structure, based on a hyperbolic tree with virtual coordinates taken in the hyperbolic plane. Our experimental results, compared against other cloud systems, show that our solution ensures consistency and scalability for the cloud platform.
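As a hedged illustration of greedy routing over virtual coordinates (the paper's exact embedding is not reproduced here), the sketch below uses the Poincaré disk distance and always forwards a query to the neighbour closest to the target; node names and coordinates are invented.

```python
# Greedy routing over hypothetical virtual coordinates in the Poincaré disk model.
import math

def hyperbolic_distance(u, v):
    """Distance between two points of the open unit disk in the Poincaré model."""
    diff2 = (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2
    denom = (1 - (u[0] ** 2 + u[1] ** 2)) * (1 - (v[0] ** 2 + v[1] ** 2))
    return math.acosh(1 + 2 * diff2 / denom)

def greedy_route(coords, neighbours, source, target):
    """Forward hop by hop to the neighbour hyperbolically nearest to the target."""
    path, current = [source], source
    while current != target:
        nxt = min(neighbours[current],
                  key=lambda n: hyperbolic_distance(coords[n], coords[target]))
        if hyperbolic_distance(coords[nxt], coords[target]) >= hyperbolic_distance(coords[current], coords[target]):
            break  # stuck in a local minimum; a real system would fall back to the tree structure
        path.append(nxt)
        current = nxt
    return path

coords = {"a": (0.0, 0.0), "b": (0.5, 0.1), "c": (0.7, 0.4)}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(greedy_route(coords, neighbours, "a", "c"))  # ['a', 'b', 'c']
```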
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ... (IRJET Journal)
This document provides an overview and comparative analysis of the K-Means++ and Mini Batch K-Means clustering algorithms in cloud computing using MapReduce. It first introduces cloud computing and its advantages for processing big data using Hadoop MapReduce. It then discusses the K-Means++ algorithm as an improved version of the standard K-Means algorithm that initializes cluster centroids more intelligently. Finally, it compares the performance of K-Means++ and Mini Batch K-Means when implemented using MapReduce for large-scale clustering in cloud environments.
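For reference, a minimal K-Means++ seeding step (the "more intelligent" centroid initialization the abstract refers to) can be sketched as follows; this is the textbook squared-distance weighting on 2-D points, not the authors' exact implementation.

```python
# K-Means++ seeding: each new centroid is sampled with probability proportional
# to the squared distance from the nearest centroid already chosen.
import random

def kmeans_pp_init(points, k, rng=random.Random(0)):
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance of every point to its nearest existing centroid.
        d2 = [min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids) for p in points]
        r, acc = rng.uniform(0, sum(d2)), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centroids.append(p)
                break
    return centroids

points = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5)]
print(kmeans_pp_init(points, k=2))
```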
A CASE STUDY OF INNOVATION OF AN INFORMATION COMMUNICATION SYSTEM AND UPGRADE... (ijaia)
In this paper, a case study is analyzed: the upgrade of an industry communication system developed by following the Frascati research guidelines. The knowledge base (KB) of the industry is gathered by means of different tools that feed data and information of different formats and structures into a single bus system connected to a big data store. The initial part of the research focuses on the implementation of strategic tools able to upgrade the KB. The second part of the proposed study concerns innovative algorithms based on a KNIME (Konstanz Information Miner) Gradient Boosted Trees workflow that processes the communication-system data travelling over an Enterprise Service Bus (ESB) infrastructure. The goal of the paper is to prove that all the new KB collected into a Cassandra big data system can be processed through the ESB by predictive algorithms that resolve possible conflicts between hardware and software; the conflicts are due to the integration of different database technologies and data structures. To check the outputs of the Gradient Boosted Trees algorithm, an experimental dataset suitable for machine learning testing was used. The test was performed on a prototype network system modeling part of the whole communication system. The paper shows how to validate industrial research by following the complete design and development of a whole communication system network, improving business intelligence (BI).
Implementation of p pic algorithm in map reduce to handle big data (eSAT Publishing House)
This document presents an implementation of the p-PIC clustering algorithm using the MapReduce framework to handle big data. P-PIC is a parallel version of the Power Iteration Clustering (PIC) algorithm that is able to cluster large datasets in a distributed environment. The document first provides background on PIC and challenges with scaling to big data. It then describes how p-PIC addresses these challenges using MPI for parallelization. The design of implementing p-PIC within MapReduce is presented, including the map and reduce functions. Experimental results on synthetic datasets up to 100,000 records show that p-PIC using MapReduce has increased performance and scalability compared to the original p-PIC implementation using MPI.
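The heart of PIC is repeated multiplication of the normalized affinity matrix with a vector; as a rough sketch of how that step maps onto map and reduce functions (abstracting away the actual p-PIC/MPI details), a mapper can emit partial row products and a reducer can sum and normalise them:

```python
# One power-iteration step v' = W v expressed as map and reduce (simplified, in-memory sketch).
from collections import defaultdict

def map_step(matrix_entries, v):
    """matrix_entries: iterable of (row, col, weight). Emit (row, weight * v[col])."""
    for row, col, w in matrix_entries:
        yield row, w * v[col]

def reduce_step(mapped):
    """Sum the partial products for each row, then normalise by the largest component."""
    sums = defaultdict(float)
    for row, partial in mapped:
        sums[row] += partial
    norm = max(abs(x) for x in sums.values()) or 1.0
    return {row: s / norm for row, s in sums.items()}

# Tiny row-normalised affinity matrix with two obvious clusters {0, 1} and {2, 3}.
W = [(0, 0, .5), (0, 1, .5), (1, 0, .5), (1, 1, .5),
     (2, 2, .5), (2, 3, .5), (3, 2, .5), (3, 3, .5)]
v = {0: 1.0, 1: 0.9, 2: 0.1, 3: 0.2}
for _ in range(5):
    v = reduce_step(map_step(W, v))
print(v)  # components settle toward per-cluster values; k-means on v then labels the points
```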
A Comprehensive Study on Big Data Applications and Challenges (ijcisjournal)
Big Data has gained much interest from academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that quickly exceeds the boundary range. As information is transferred and shared at light speed over optic fiber and wireless networks, the volume of data and the speed of market growth increase. Conversely, the fast growth rate of such large data generates copious challenges, such as the rapid growth of data, transfer speed, diverse data, and security. Even so, Big Data is still in its early stage, and the domain has not been reviewed in general. Hence, this study expansively surveys and classifies an assortment of attributes of Big Data, including its nature, definitions, rapid growth rate, volume, management, analysis, and security. This study also proposes a data life cycle that uses the technologies and terminologies of Big Data. Map/Reduce is a programming model for efficient distributed computing that works well with semi-structured and unstructured data; it is a simple model, yet good for many applications such as log processing and Web index building.
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex... (IIRindia)
Numerous mobile applications for safe groceries consumption and e-health have been designed recently. Health-aware clients value such applications for safe groceries consumption, particularly to avoid irritating groceries and additives. However, there is a lack of a complete database of organized and unstructured information to support such applications. This paper proposes the Multiple Scoring Frameworks (MSF), a healthy-groceries search service for mobile applications using Hadoop and MapReduce (MR). The MSF runs behind a mobile application to provide a search service for data on groceries and additives. MSF works with logic similar to a web search engine (WSE): it crawls Web sources, cataloguing important data for possible use in responding to queries from mobile applications. The outline and development of MSF are presented in the paper through its framework design, query understanding, its use of the Hadoop/MapReduce infrastructure, and its operational content.
Efficient Cost Minimization for Big Data Processing (IRJET Journal)
This document discusses efficient cost minimization techniques for big data processing. It characterizes big data processing using a two-dimensional Markov chain model to evaluate expected completion time. The problem is formulated as a mixed non-linear programming problem to optimize data assignment, placement, and migration across distributed data centers. A weighted bloom filter approach is presented to reduce communication costs through distributed incomplete pattern matching.
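As background for the Bloom-filter idea, a plain Bloom filter that lets a remote data centre ask "might this key exist there?" without shipping the data looks roughly like the sketch below; the paper's weighted variant, which additionally tunes the number of hash positions per item, is not reproduced here.

```python
# Plain Bloom filter sketch: membership test with false positives but no false negatives.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1024, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits)

    def _positions(self, key: str):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("pattern-42")
print(bf.might_contain("pattern-42"), bf.might_contain("pattern-99"))  # True, (almost certainly) False
```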
Big Data Summarization: Framework, Challenges and Possible Solutions (aciijournal)
In this paper, we first briefly review the concept of big data, including its definition, features, and value. We then present the background technology that big data summarization brings to us. The objective of this paper is to discuss the big data summarization framework, challenges, and possible solutions, as well as methods of evaluation for big data summarization. Finally, we conclude the paper with a discussion of open problems and future directions.
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS (aciijournal)
This document discusses big data summarization, including the challenges and potential solutions. It presents a framework for big data summarization with four main stages: 1) data clustering to group similar documents, 2) data generalization to abstract data to a higher conceptual level, 3) semantic term identification to identify metadata for more efficient data representation, and 4) evaluation of the summaries. Key challenges addressed include initializing clustering methods, selecting attributes to control generalization, and ensuring semantic associations in representations. Solutions proposed are detailed assessments of clustering initialization methods and statistical approaches for clustering, generalization and term identification.
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit... (IRJET Journal)
This document provides an overview of K-means++ clustering algorithm and how it can be implemented using MapReduce in cloud computing. It first discusses cloud computing and its ability to handle big data through flexibility and scalability. It then explains Hadoop and MapReduce, which provide a framework to parallelize processing of large datasets. The document describes the K-means++ clustering algorithm, which improves upon standard K-means by better initializing cluster centroids. Finally, it outlines how K-means++ can be implemented in MapReduce by splitting data and computing distances across mappers and reducers.
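One K-means iteration phrased as map (assign each point to its nearest centroid) and reduce (recompute centroids), in the spirit of the split-and-assign scheme described above, could look like the following simplified in-memory sketch; it is not the authors' code and handles only 2-D points.

```python
# One K-means iteration as map (assignment) and reduce (centroid update), toy in-memory version.
from collections import defaultdict

def mapper(points, centroids):
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
        yield nearest, p

def reducer(assignments):
    sums, counts = defaultdict(lambda: [0.0, 0.0]), defaultdict(int)
    for cid, (x, y) in assignments:
        sums[cid][0] += x
        sums[cid][1] += y
        counts[cid] += 1
    # New centroid = mean of the points assigned to it (empty clusters simply disappear here).
    return {cid: (s[0] / counts[cid], s[1] / counts[cid]) for cid, s in sums.items()}

points = [(0, 0), (0, 1), (9, 9), (10, 10)]
centroids = [(0, 0), (10, 10)]
for _ in range(3):
    centroids = list(reducer(mapper(points, centroids)).values())
print(centroids)  # roughly [(0.0, 0.5), (9.5, 9.5)]
```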
A Reconfigurable Component-Based Problem Solving Environment (Sheila Sinclair)
This technical report describes a reconfigurable component-based problem solving environment called DISCWorld. The key features discussed are:
1) DISCWorld uses a data flow model represented as directed acyclic graphs (DAGs) of operators to integrate distributed computing components across networks.
2) It supports both long running simulations and parameter search applications by allowing complex processing requests to be composed graphically or through scripting and executed on heterogeneous platforms.
3) Operators can be simple "pure Java" implementations or wrappers to fast platform-specific implementations, and some operators may represent sub-graphs that can be reconfigured to run across multiple servers for faster execution.
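A minimal illustration of executing such a DAG of operators is a generic topological-order evaluator like the one below (Python 3.9+, using the standard `graphlib` module); the operator names are invented and this is not DISCWorld's actual engine.

```python
# Evaluate a directed acyclic graph of operators in dependency order (toy example).
from graphlib import TopologicalSorter

# Each node: (function, list of upstream node names whose outputs become its arguments).
graph = {
    "load":    (lambda: [3, 1, 4, 1, 5], []),
    "sort":    (lambda xs: sorted(xs), ["load"]),
    "head":    (lambda xs: xs[:3], ["sort"]),
    "summary": (lambda xs, ys: {"all": xs, "top3": ys}, ["load", "head"]),
}

def run(graph):
    results = {}
    # Topological order guarantees every operator runs after all of its inputs.
    order = TopologicalSorter({name: deps for name, (_, deps) in graph.items()}).static_order()
    for name in order:
        fn, deps = graph[name]
        results[name] = fn(*(results[d] for d in deps))
    return results

print(run(graph)["summary"])  # {'all': [3, 1, 4, 1, 5], 'top3': [1, 1, 3]}
```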
Big Data Processing with Hadoop: A Review (IRJET Journal)
1. This document provides an overview of big data processing with Hadoop. It defines big data and describes the challenges of volume, velocity, variety and variability.
2. Traditional data processing approaches are inadequate for big data due to its scale. Hadoop provides a distributed file system called HDFS and a MapReduce framework to address this.
3. HDFS uses a master-slave architecture with a NameNode and DataNodes to store and retrieve file blocks. MapReduce allows distributed processing of large datasets across clusters through mapping and reducing functions.
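The canonical example of the mapping and reducing functions mentioned in point 3 is word count; under Hadoop Streaming it can be written as two small Python scripts (combined here into one file for brevity), assuming the shuffle phase delivers the mapper's output to the reducer sorted by key.

```python
# Word count in the Hadoop Streaming style: the mapper emits (word, 1), the reducer sums per word.
import sys
from itertools import groupby

def mapper(stream):
    for line in stream:
        for word in line.split():
            print(f"{word}\t1")

def reducer(stream):
    pairs = (line.rstrip("\n").split("\t") for line in stream)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    # e.g. hadoop jar hadoop-streaming.jar -mapper "python wc.py map" -reducer "python wc.py reduce" ...
    (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)
```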
Performance evaluation of Map-reduce jar pig hive and spark with machine lear... (IJECEIAES)
Big data is one of the biggest challenges, since we need huge processing power and good algorithms to make decisions. We need a Hadoop environment with Pig, Hive, machine learning, and the other Hadoop ecosystem components. The data comes from industry, from the many devices and sensors around us, and from social media sites. According to McKinsey, there will be a shortage of 15,000,000 big data professionals by the end of 2020. There are many technologies to solve the problem of big data storage and processing, such as Apache Hadoop, Apache Spark, and Apache Kafka. Here we analyse the processing speed for 4 GB of data on CloudxLab with Hadoop MapReduce (varying the numbers of mappers and reducers), with Pig scripts and Hive queries, and in a Spark environment together with machine learning. From the results we can say that machine learning with Hadoop enhances processing performance, along with Spark; that Spark is better than Hadoop MapReduce, Pig, and Hive; and that Spark with Hive and machine learning gives the best performance compared with Pig, Hive, and the Hadoop MapReduce jar.
This document provides a review of Hadoop storage and clustering algorithms. It begins with an introduction to big data and the challenges of storing and processing large, diverse datasets. It then discusses related technologies like cloud computing and Hadoop, including the Hadoop Distributed File System (HDFS) and MapReduce processing model. The document analyzes and compares various clustering techniques like K-means, fuzzy C-means, hierarchical clustering, and Self-Organizing Maps based on parameters such as number of clusters, size of clusters, dataset type, and noise.
Data-Intensive Technologies for Cloud Computing (huda2018)
This document provides an overview of data-intensive computing technologies for cloud computing. It discusses key concepts like data-parallelism and MapReduce architectures. It also summarizes several data-intensive computing systems including Google MapReduce, Hadoop, and LexisNexis HPCC. Hadoop is an open source implementation of MapReduce while HPCC provides distinct processing environments for batch and online query processing using its proprietary ECL programming language.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Hybrid model for detection of brain tumor using convolution neural networks (CSITiaesprime)
The development of aberrant brain cells, some of which may turn cancerous, is known as a brain tumor. Magnetic resonance imaging (MRI) scans are the most common technique for finding brain tumors, and information about the aberrant tissue growth in the brain is discernible from them. In numerous research papers, machine learning and deep learning algorithms are used to detect brain tumors. When these algorithms are applied to MRI pictures, it takes extremely little time to predict a brain tumor, and better accuracy makes it easier to treat patients; the radiologist can make speedy decisions because of this forecast. The proposed work creates a hybrid convolutional neural network (CNN) model using a CNN for feature extraction and logistic regression (LR) for classification. The pre-trained model visual geometry group 16 (VGG16) is used for the extraction of features. To reduce the complexity and the number of parameters to train, we eliminated the last eight layers of VGG16. From this transformed model the features are extracted in the form of a vector array. These features are fed into different machine learning classifiers, such as support vector machine (SVM), naïve Bayes (NB), LR, extreme gradient boosting (XGBoost), AdaBoost, and random forest, for training and testing. The performance of the different classifiers is compared, and the CNN-LR hybrid combination outperformed the remaining classifiers. The recall, precision, F1-score, and accuracy of the proposed CNN-LR model are 94%, 94%, 94%, and 91%, respectively.
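A compressed sketch of the feature-extraction-plus-classifier idea is shown below; it uses `include_top=False` with global average pooling as a stand-in for the paper's removal of VGG16's last eight layers, and random arrays in place of the MRI dataset.

```python
# VGG16 as a frozen feature extractor feeding a logistic regression classifier (illustrative sketch).
import numpy as np
from tensorflow.keras.applications import VGG16
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for preprocessed MRI slices: 40 RGB images of 224x224 with binary labels.
X = np.random.rand(40, 224, 224, 3).astype("float32")
y = np.random.randint(0, 2, size=40)

# Frozen convolutional base; pooling="avg" yields one 512-dimensional vector per image.
base = VGG16(weights="imagenet", include_top=False, pooling="avg", input_shape=(224, 224, 3))
features = base.predict(X, verbose=0)

X_train, X_test, y_train, y_test = train_test_split(features, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```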
Implementing lee's model to apply fuzzy time series in forecasting bitcoin price (CSITiaesprime)
Over time, cryptocurrencies like Bitcoin have attracted investors' and speculators' interest. Bitcoin's dramatic rise in value in recent years has caught the attention of many who see it as a promising investment asset. Bitcoin investment, however, is inseparable from the price volatility that investors must mitigate. This research aims to use Lee's fuzzy time series approach to forecast the price of Bitcoin. Lee's fuzzy time series is a time series analysis method for getting around ambiguity and uncertainty in time series data, first introduced by Ching-Cheng Lee in his research on time series prediction. The method is a development of several earlier fuzzy time series (FTS) models, namely those of Song and Chissom and of Cheng and Chen. According to most previous studies, Lee's model conveys more precise forecasting results than the classic FTS models. This study used first and second orders, obtaining error values of 5.419% for the first order and 4.042% for the second order, which means the forecasting results are excellent. However, only the first order can be used to predict the next period's Bitcoin price: in the second order, the relations produced for the next period have no groups in their fuzzy logical relationship group (FLRG), so the price in the next period cannot be predicted. This study contributes by giving investors and the general public a factor to consider in keeping, selling, or purchasing cryptocurrencies.
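A stripped-down first-order fuzzy-time-series forecast in this lineage (partition the price range into intervals, fuzzify each value, build the logical relationship groups, forecast with interval midpoints) can be sketched as below; Lee's refinement of the relationship groups is not reproduced, and the prices are placeholders.

```python
# Minimal first-order fuzzy time series forecast (Chen-style; Lee's refinement omitted).
def fts_forecast(series, n_intervals=7):
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_intervals or 1.0

    def fuzzify(x):  # index of the interval (fuzzy set) containing x
        return min(int((x - lo) // width), n_intervals - 1)

    mids = [lo + (i + 0.5) * width for i in range(n_intervals)]

    # Fuzzy logical relationship groups: antecedent interval -> intervals observed next.
    flrg = {}
    labels = [fuzzify(x) for x in series]
    for cur, nxt in zip(labels, labels[1:]):
        flrg.setdefault(cur, []).append(nxt)

    last = labels[-1]
    if last not in flrg:               # empty relationship group: fall back to the interval midpoint
        return mids[last]
    return sum(mids[i] for i in flrg[last]) / len(flrg[last])

prices = [100, 103, 101, 98, 105, 110, 108, 112, 115, 111]  # placeholder closing prices
print(fts_forecast(prices))
```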
Sentiment analysis of online licensing service quality in the energy and mine... (CSITiaesprime)
The Ministry of Energy and Mineral Resources of the Republic of Indonesia regularly assessed public satisfaction with its online licensing services. Users rated their satisfaction at 3.42 on a scale of 4, below the organization's average of 3.53. Evaluating public service performance is crucial for quality improvement. Previous research relied solely on survey data to assess public satisfaction. This study goes further by analyzing user feedback in text form from an online licensing application to identify negative aspects of the service that need enhancement. The dataset spanned September 2019 to February 2023, with 24,112 entries. The classification method was chosen based on the highest accuracy value among decision tree, random forest, naive Bayes, stochastic gradient descent, logistic regression (LR), and k-nearest neighbor. The text data was converted into numerical form using CountVectorizer and term frequency-inverse document frequency (TF-IDF) techniques, along with unigrams and bigrams for dividing sentences into word segments. LR with bigram CountVectorizer features ranked highest, with 89% average precision, F1-score, and recall, compared to the other five classification methods. The sentiment analysis found a negative polarity level of 36.2%. The negative sentiment revealed public expectations that the ministry improve the top three aspects: system, mechanism, and procedure; infrastructure and facilities; and service specification product types.
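The winning configuration described here (bigram CountVectorizer features fed into logistic regression) corresponds to a very small scikit-learn pipeline; the toy feedback strings below are invented placeholders for the ministry's dataset.

```python
# Bag-of-words (unigrams and bigrams) + logistic regression sentiment classifier, on toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

texts = ["service was fast and clear", "portal keeps crashing", "very helpful staff",
         "slow response and confusing procedure", "easy to submit documents", "no reply for weeks"]
labels = ["positive", "negative", "positive", "negative", "positive", "negative"]

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),   # unigrams and bigrams, as in the study
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["the procedure is confusing and slow"]))
print(classification_report(labels, model.predict(texts)))  # precision/recall/F1, as reported in the study
```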
Trends in sentiment of Twitter users towards Indonesian tourism: analysis wit... (CSITiaesprime)
This research analyzes the sentiment of Twitter users regarding tourism in Indonesia using the keyword "wonderful Indonesia", the country's tourism promotion identity. The study aims to gain a deeper understanding of public sentiment towards "wonderful Indonesia" through social media data analysis, providing new and valuable insights into Indonesian tourism for the government and relevant stakeholders in promoting Indonesian tourism and enhancing tourist experiences. Tweets are analyzed and classified using the k-nearest neighbor (KNN) algorithm to determine whether their sentiment is positive, neutral, or negative. The classification results show that the majority of tweets (65.1% of a total of 14,189 tweets) have neutral sentiment, indicating that most tweets with the "wonderful Indonesia" tagline are related to advertising or promoting Indonesian tourism. However, the percentage of tweets with positive sentiment (33.8%) is higher than that with negative sentiment (1.1%). The study also achieved training results with an accuracy of 98.5%, precision of 97.6%, recall of 98.5%, and F1-score of 98.1%. However, reassessment will be needed in the future, as Twitter users' sentiments can change along with the development of Indonesian tourism itself.
The impact of usability in information technology projectsCSITiaesprime
Achieving success in information system and technology (IS/IT) projects is a complex and multifaceted endeavour that has proven difficult. The literature is replete with project failures, but identifying the critical success factors contributing to favourable outcomes remains challenging. The triad of Time-Cost-Quality is widely accepted as key to achieving project success. While time and cost can be quantified and measured, quality is a more complex construct that requires different metrics and measurement approaches. Utilizing the PRISMA Methodology, this study initiated a comprehensive search across literature databases and identified 142 relevant articles pertaining to the specified keywords. A subset of ten articles was deemed suitable for further examination through rigorous screening and eligibility assessments. Notably, a primary finding indicates that despite recognizing usability as a critical element, there is a tendency to neglect usability enhancements due to time and resource constraints. Regarding the influence of usability on project success, the active involvement of end-users emerges as a pivotal factor. Moreover, fostering the enhancement of Human Computer Interaction (HCI) knowledge within the development team is essential. Failure to provide good usability can lead to project failure, undermining user satisfaction and adoption of the technology.
Collecting and analyzing network-based evidenceCSITiaesprime
Since nearly the beginning of the Internet, malware has been a significant deterrent to productivity for end users, both personal and business related. Due to the pervasiveness of digital technologies in all aspects of human life, it is increasingly likely that a digital device is involved as the goal, the medium, or simply a 'witness' of a criminal event. Forensic investigations include the collection, recovery, analysis, and presentation of information stored on network devices and related to network crimes. These activities often involve a wide range of analysis tools and the application of different methods. This work presents methods that help digital investigators correlate and present information acquired from forensic data, with the aim of obtaining more valuable reconstructions of events or actions to reach case conclusions. The main aim of network forensics is to gather evidence. Additionally, the evidence obtained during the investigation must be produced through a rigorous investigation procedure in a legal context.
Agile adoption challenges in insurance: a systematic literature and expert re...CSITiaesprime
A drawback of agile is that it struggles to function in large businesses like banks, insurance companies, and government agencies, which are frequently associated with cumbersome processes. Traditional software development techniques are cumbersome and pay more attention to standardization and industry practice, which leads to high costs and prolonged schedules. Insurance companies that do not embrace change and agility may find themselves distracted and lose customers to agile competitors who are more relevant and customer-centric. Thus, to investigate the challenges and to recognize the prospects of agile adoption in the insurance industry, a systematic literature review (SLR) was organized in this study and validated by expert review from professionals with expertise in agile. The project performance domain from the project management body of knowledge (PMBOK) was applied to align the challenges and the solutions. Academicians and practitioners can acquire perception and knowledge and gain a deeper understanding of the challenges and solutions of agile adoption from the results.
Exploring network security threats through text mining techniques: a comprehe...CSITiaesprime
In response to the escalating cybersecurity threats, this research focuses on leveraging text mining techniques to analyze network security data effectively. The study utilizes user-generated reports detailing attacks on server networks. Employing clustering algorithms, these reports are grouped based on threat levels. Additionally, a classification algorithm discerns whether network activities pose security risks. The research achieves a noteworthy 93% accuracy in text classification, showcasing the efficacy of these techniques. The novelty lies in classifying security threat report logs according to their threat levels. Prioritizing high-risk threats, this approach aids network management in strategic focus. By enabling swift identification and categorization of network security threats, this research equips organizations to take prompt, targeted actions, enhancing overall network security.
An LSTM-based prediction model for gradient-descending optimization in virtua...CSITiaesprime
A virtual learning environment (VLE) is an online learning platform that allows many students, even millions, to study according to their interests without being limited by space and time. Online learning environments have many benefits, but they also have some drawbacks, such as high dropout rates, low engagement, and students' self-regulated behavior. Evaluating and analyzing the student data generated from online learning platforms can help instructors understand and monitor students' learning progress. In this study, we suggest a predictive model for assessing student success in online learning. We investigate the effect of hyperparameters on the prediction of student learning outcomes in VLEs by the long short-term memory (LSTM) model. A hyperparameter is a parameter that has an impact on prediction results. Two optimization algorithms, adaptive moment estimation (Adam) and Nesterov-accelerated adaptive moment estimation (Nadam), were used to tune the LSTM model's hyperparameters. Based on the findings of this research on optimizing the LSTM model with the Adam and Nadam algorithms, the average accuracy of the LSTM model using Nadam optimization is 89%, with a maximum accuracy of 93%. The LSTM model with Nadam optimisation performs better than the model with Adam optimisation when predicting student outcomes in online learning.
Generalization of linear and non-linear support vector machine in multiple fi...CSITiaesprime
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. They belong to a family of generalized linear classifiers. In other terms, SVM is a classification and regression prediction tool that uses machine learning theory to maximize predictive accuracy. In this article, the discussion about linear and non-linear SVM classifiers with their functions and parameters is investigated. Due to the equality type of constraints in the formulation, the solution follows from solving a set of linear equations. Besides this, if the under-consideration problem is in the form of a non-linear case, then the problem must convert into linear separable form with the help of kernel trick and solve it according to the methods. Some important algorithms related to sentimental work are also presented in this paper. Generalization of the formulation of linear and non-linear SVMs is also open in this article. In the final section of this paper, the different modified sections of SVM are discussed which are modified by different research for different purposes.
Designing a framework for blockchain-based e-voting system for LibyaCSITiaesprime
A transition to democratic rule is considered the first step down a long road towards Libya’s recovery and prosperity. Thus, the country strives to improve its elections by introducing new technologies. A blockchain is a distributed ledger that is characterised by independence and security. Therefore, it has been widely applied in various fields ranging from credit encryption to digital currency. With the development of internet technology, electronic voting (E-voting) systems have been greatly popularised. However, they suffer from various security threats, which create a sense of distrust in existing systems. Integrating blockchain with online elections is a promising trend, which could make an election transparent, immutable, reliable, and more secure. In this paper, we present a literature review and a case analysis of blockchain technology. Moreover, a framework for an E-voting system based on blockchain is proposed. The methodology is based on three activities: identification of the relevant literature about E-voting, system modelling, and the determination of suitable technological tools. The framework is secure and reliable. Thus, it could help increase the number of voters and ensure a high level of participation, as well as facilitate free and fair electoral processes.
Development reference model to build management reporter using dynamics great...CSITiaesprime
The digital technology transformation impacts changes in business patterns that require companies to innovate and act appropriately in making strategic decisions quickly, precisely, and accurately to increase efficiency and improve company performance. An enterprise resource planning (ERP) system is one step toward achieving that performance, and it is essential for companies that want to automate and streamline their business processes. Decisions from management in implementing the ERP system are necessary for ERP implementation to be successful. However, in practice, companies still experience complexity. For that reason, a business process reference model is essential to enhance efficiency in implementing the ERP used. This research discusses a business process reference model based on the ERP Dynamics Great Plains (GP) application aggregated using Management Reporter (MR) to help users better understand the practical overview. The methodology utilizes a reference model based on Microsoft Dynamics GP guidelines with a business process redesign approach. This contributes to developing business processes that help users understand and use the ERP Dynamics GP application.
Social media and optimization of the promotion of Lake Toba tourism destinati...CSITiaesprime
Tourism is one of the largest contributors to Indonesia's foreign exchange earnings, surpassing taxation, energy, and gas. This study seeks to investigate the use of social media to optimize the promotion of Lake Toba as a tourist destination, which has been impacted by the COVID-19 pandemic. Using interview techniques and live field observations, it was discovered that social media, particularly the Instagram platform, play a significant role in promoting Lake Toba tourism. The Department of Culture and Tourism of the North Sumatra Province uses landscape photography as its primary promotion method, which has proved to be more effective and interesting than conventional methods such as the distribution of brochures or the use of manuals. The capture procedure and techniques for landscape photography were carried out by professional photographers in collaboration with the Department of Culture and Tourism of the North Sumatra Province. In addition to providing information, tourism sumut's Instagram account functions as a platform to raise public awareness about Lake Toba tourism and as a promotional medium for North Sumatra's tourist attractions on an international scale. Department of Culture and Tourism of the North Sumatra Province collaborates with travel agencies and local communities to disseminate Lake Toba tourism information.
Caring jacket: health monitoring jacket integrated with the internet of thing...CSITiaesprime
One of the policies made by the World Health Organization (WHO) and the Indonesian government during the COVID-19 pandemic is to use an oximeter for self-isolating patients. The oximeter is used to monitor the patient for happy hypoxia, a silent killer, should it happen to the patient. To maintain endurance, exercise is needed by COVID-19 patients, but doing too much exercise can also cause decreased immunity. That is why fatigue level and exercise intensity need to be monitored. When exercising, the social distancing protocol should also be enforced, because it can lower COVID-19 spread by up to 13.6%. To solve this issue, the Caring Jacket is proposed, a health monitoring jacket integrated with an IoT system. The jacket is equipped with several sensors and a global positioning system (GPS) for tracking. The data from the test showed a temperature reading accuracy of up to 99.38%, an oxygen rate accuracy of up to 97.31%, a beats per minute (BPM) sensor accuracy of up to 97.82%, and a precision of 97.00% for all sensors compared with a calibrated device.
Net impact implementation application development life-cycle management in ba...CSITiaesprime
Digital transformation in the banking sector creates a lot of demand for application development, either new development or application enhancement. Continuous demand for reimagining, revamping, and running applications reliably needs to be supported by collaboration tools. Several big banks in Indonesia use Atlassian products, including Jira, Confluence, Bamboo, Bitbucket, and Crowd, to support strategic company projects. We need to measure the net impact of application development life-cycle management (ADLM) as a collaboration tool. Using the DeLone and McLean model, we process questionnaire data from banks in Indonesia that use ADLM. The data is processed using structural equation modeling (SEM), in which multiple variables are analyzed statistically to establish, estimate, and test the causation model. The results highlight that system quality strongly affected only user satisfaction (p=0.049 and β=0.39), information quality strongly affected use (p=0.001 and β=0.84) and user satisfaction (p=0.169 and β=0.28), and service quality strongly affected only use (p=0.127 and β=0.31). The research verifies the information system success approach described by DeLone and McLean. Importantly, it was discovered that system usability and quality were key indicators of ADLM success. To fulfill their objective, ADLM tools must be developed in a way that is simple to use, adaptable, and functional.
Circularly polarized metamaterial Antenna in energy harvesting wearable commu...CSITiaesprime
When battery-powered sensors are spread out in places that are sometimes hard to reach, sustaining them becomes difficult. Therefore, to develop this technology on a large scale, such as in the internet of things (IoT) scenario, it is necessary to figure out how to power them. The solution proffered in this work is to harvest energy from the environment using energy harvesting antennas. This work presents a wearable, circularly polarized, efficient receiving and transmitting sensor for medical, IoT, and communication systems in the WLAN and GSM frequency ranges from 900 MHz up to 6 GHz. Using a cascaded system block of a circularly polarized antenna, a rectifier, and a T-matching network, the design was successfully simulated. A DC charging voltage of 2.8 V was achieved to power up batteries of wearable and IoT sensors. The major contribution of this work is the tri-band antenna system, which is able to harvest reflected Wi-Fi frequencies as well as GSM frequencies in a miniaturized package. This innovative configuration is a step forward in building devices with over 80% duty cycle.
An ensemble approach for the identification and classification of crime tweet...CSITiaesprime
Twitter is a famous social media platform, which supports short posts limited to 280 characters. Users tweet about many topics like movie reviews, customer service, meals they just ate, and awareness posts. Tweets carrying information about some crime scenes are crime tweets. Crime tweets are crucial and informative and separate classification is required. Identification and classification of crime tweets is a challenging task and has been the researcher’s latest interest. The researchers used different approaches to identify and classify crime tweets. This research has used an ensemble approach for the identification and classification of crime tweets. Tweepy and Twint libraries were used to collect datasets from Twitter. Both libraries use contrasting methods for extracting tweets from Twitter. This research has applied many ensemble approaches for the identification and classification of crime tweets. Logistic regression (LR), support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), and random forest (RF) Classifier assigned with the weights of 1,2,1,1 and 1 respectively ensemble together by a soft weighted Voting classifier along with term frequency – inverse document frequency (TF-IDF) vectorizer gives the best performance with an accuracy of 96.2% on the testing dataset.
National archival platform system design using the microservice-based service...CSITiaesprime
Archives play a vital function concerning the dynamics of people and nations as an instrument to treasure information in diverse domains of politics, society, economics, culture, science, and technology. The acceleration of digital transformation triggers the implementation of a smart government that supports better public services. The smart government encourages a national archival system to facilitate archive producers and users. The four electronic-based government system (SPBE) factors in the archival sector and open archival information system (OAIS) as a data preservation standard are the benchmarks in developing this study's national archival platform system. An improved service computing system engineering (SCSE) framework adapted to the microservice architecture is used to aid the design of the national archival platform system. The proposed design met the four-factor service design validation of coupling, cohesion, complexity, and reusability. Also, the prototype suggests what resources are needed to put the design into action by passing the performance test of availability measurement.
Antispoofing in face biometrics: A comprehensive study on software-based tech...CSITiaesprime
The vulnerability of the face recognition system to spoofing attacks has piqued the biometric community's interest, motivating them to develop anti-spoofing techniques to secure it. Photo, video, or mask attacks can compromise face biometric systems (types of presentation attacks). Spoofing attacks are detected using liveness detection techniques, which determine whether the facial image presented at a biometric system is a live face or a fake version of it. We discuss the classification of face anti-spoofing techniques in this paper. Anti-spoofing techniques are divided into two categories: hardware and software methods. Hardware-based techniques are summarized briefly. A comprehensive study on software-based countermeasures for presentation attacks is discussed, which are further divided into static and dynamic methods. We cited a few publicly available presentation attack datasets and calculated a few metrics to demonstrate the value of anti-spoofing techniques.
The antecedent e-government quality for public behaviour intention, and exten...CSITiaesprime
The main objective of the study is to identify the antecedents of leadership quality, public satisfaction, and public behavioural intention for e-government services. This study also integrates e-government quality into the expectation-confirmation model. To achieve these goals, observational research was carried out to collect primary information, using questionnaires to obtain the opinions of 360 members of the public using the e-government service and of several e-government and software quality experts. The results of the study show a positive association between e-government service quality and public perceived usefulness, public expectation confirmation, leadership quality, and public satisfaction, which also play a positive role in public behavioural intention.
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. However, we have mentioned every productivity app included in Office 365. Additionally, we have suggested the migration situation related to Office 365 and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from downtime in 1 minute = $5-$10 thousand dollars. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...DanBrown980551
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
High performance Serverless Java on AWS- GoTo Amsterdam 2024Vadym Kazulkin
Java is for many years one of the most popular programming languages, but it used to have hard times in the Serverless community. Java is known for its high cold start times and high memory footprint, comparing to other programming languages like Node.js and Python. In this talk I'll look at the general best practices and techniques we can use to decrease memory consumption, cold start times for Java Serverless development on AWS including GraalVM (Native Image) and AWS own offering SnapStart based on Firecracker microVM snapshot and restore and CRaC (Coordinated Restore at Checkpoint) runtime hooks. I'll also provide a lot of benchmarking on Lambda functions trying out various deployment package sizes, Lambda memory settings, Java compilation options and HTTP (a)synchronous clients and measure their impact on cold and warm start times.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
In our second session, we shall learn all about the main features and fundamentals of UiPath Studio that enable us to use the building blocks for any automation project.
📕 Detailed agenda:
Variables and Datatypes
Workflow Layouts
Arguments
Control Flows and Loops
Conditional Statements
💻 Extra training through UiPath Academy:
Variables, Constants, and Arguments in Studio
Control Flow in Studio
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
QA or the Highway - Component Testing: Bridging the gap between frontend appl...zjhamm304
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
What is an RPA CoE? Session 2 – CoE RolesDianaGray10
In this session, we will review the players involved in the CoE and how each role impacts opportunities.
Topics covered:
• What roles are essential?
• What place in the automation journey does each role play?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
AdMap: a framework for advertising using MapReduce pipeline
Computer Science and Information Technologies, Vol. 3, No. 2, July 2022, pp. 82-93
ISSN: 2722-3221, DOI: 10.11591/csit.v3i2.pp82-93
Journal homepage: http://iaesprime.com/index.php/csit
AdMap: a framework for advertising using MapReduce pipeline
Abhay Chaudhary1, K R Batwada2, Namita Mittal2, Emmanuel S. Pilli2
1 Department of Computer Science and Engineering, Vellore Institute of Technology-AP, Andhra Pradesh, India
2 Department of Computer Science and Engineering, Malaviya National Institute of Technology, Jaipur, India
Article history: Received Mar 5, 2021; Revised May 29, 2022; Accepted Jun 12, 2022
Keywords: Advertising; Advertising and publishing; Data lake; Data warehouse; Hadoop distributed file system; MapReduce
This is an open access article under the CC BY-SA license.
Corresponding Author:
Abhay Chaudhary
Department of Computer Science and Engineering, Vellore Institute of Technology
Amaravati, Andhra Pradesh, India
Email: abhaychaudharydps@gmail.com
1. INTRODUCTION
Advertising can be seen as a convergent activity rooted in two essential fields: communication and sales. From both scholarly and practical perspectives, advertising has been treated as a form of communication for those engaged in contemporary economic practice, and as a way of addressing the communication needs of individual organisations, for example the newspaper. In general, advertising and communication are elements of modern social and economic processes [1], [2]. In today's culture, advertising has developed into a dynamic communication mechanism that is important for the public as well as for organisations. Over time, messaging has become an integral aspect of marketing programmes, with the potential to deliver a message precisely planned for its goals. When promoting their goods and services in global markets, enterprises ranging from multinational to local companies assign growing significance to advertising [3]. Consumers in working consumer economies have learned to use commercial information in their purchasing decisions [4], [5].
Besides broad-scope text and data analysis, MapReduce is a general-purpose programming model. Conventional MapReduce implementations, for instance Hadoop, need to load the entire collection into the cluster before any evaluation is performed [6]. In situations where the content is massive and data keeps arriving and being stored again and again, for example logs, safety reports, and even protected messages,
this leads to significant congestion. We suggest a data pipeline approach to mask data latency in MapReduce processing. Our Hadoop MapReduce-based implementation is completely transparent to the user. A distributed concurrent queue handles the allocation and synchronisation of data blocks so that data upload and execution overlap. This article addresses two main concerns: a fixed limit on the number of map tasks, together with a configurable, adaptive number of map tasks, facilitates ingestion of data whose size is not known in advance. We also use a delay scheduler to preserve data locality in the pipeline. Tests of the solution on multiple real-world data sets reveal that our methodology delivers efficiency improvements [7], [8].
The MapReduce method has significant advantages in business and academia. It offers a simple programming paradigm with built-in fault tolerance and parallel synchronisation for large-scale data processing [9]. The original design of MapReduce assumes a hierarchical data format, with huge volumes reflected in hierarchical files. The framework maps its activities and then reduces them, which suits local data sources. Clickstream log analysis is the kind of application that can benefit from a systematic approach to data loading: the logs are scanned linearly, page by page, which allows streaming approaches to load the data.
We recommend a data pipeline approach for MapReduce. The data store uses the storage block as the logical unit of the stream [10]. That is, once a block is fully uploaded, it can begin to be processed. Map tasks are started before the final upload is complete, so the data upload and the execution overlap. In particular, to coordinate with the distributed file system, we suggest adding a distributed producer/consumer queue and task scheduling for MapReduce. The file system acts as a producer and publishes block metadata on the queue, while the MapReduce workers act as consumers and are alerted when block metadata becomes available for them. Standard MapReduce, by contrast, reads all of the input data first and begins a sequence of map tasks depending on the number of input splits.
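As a minimal illustration of this producer/consumer arrangement (a sketch only; in the described system the queue would be distributed, and names such as BlockMetadata and onBlockUploaded are hypothetical rather than taken from the paper), the upload side can publish metadata for each completed block while workers consume it and launch map tasks immediately:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the block-level producer/consumer queue that lets
// map tasks start as soon as each block finishes uploading.
public class BlockPipelineSketch {

    // Metadata describing one fully uploaded block.
    record BlockMetadata(String path, long blockId, long length) {}

    private final BlockingQueue<BlockMetadata> queue = new LinkedBlockingQueue<>();

    // Producer side: called whenever a block upload completes.
    public void onBlockUploaded(BlockMetadata block) throws InterruptedException {
        queue.put(block);                       // make the block visible to consumers
    }

    // Consumer side: a worker blocks until the next block is available,
    // then launches a map task over it, overlapping upload and execution.
    public void workerLoop() throws InterruptedException {
        while (true) {
            BlockMetadata block = queue.take();
            launchMapTask(block);
        }
    }

    private void launchMapTask(BlockMetadata block) {
        System.out.println("map task started for block " + block.blockId()
                + " of " + block.path());
    }
}

In the real pipeline the same idea would be realised with a distributed queue, for example coordinated through ZooKeeper, rather than an in-process one.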
The main contributions of this work are:
a) A pipeline-coordinating architecture for data collection and processing with MapReduce.
b) Two kinds of task scheduling for data processing: locality awareness and error avoidance.
c) A client-transparent data pipeline execution that does not require revisions to existing code.
d) Experimental findings that show improved results.
Pattern matching strategies are omnipresent in the scientific world. Matching techniques have been applied in an array of research disciplines, ranging from medical electrocardiogram evaluation to gene expression in genetics, signature identification in communications, seismic signal processing in geoscience, and photo tracking and recognition in computer vision [11].
Template matching is a conventional approach since it is accurate, simple to interpret, and can be designed or prototyped quickly. The basic idea of template matching is to evaluate the correlation between two objects, which may be single- or multi-dimensional signals such as time series or images. There are different ways to measure resemblance, including total variance. In this paper, we discuss the background of the advertising field and its heterogeneous relationship with the data lake. Further, the paper describes related work in the field of advertising and gives an overall review of the novel approach towards advertising and the MapReduce framework. The last section summarises the work presented in this paper and covers the future scope in detail.
2. BACKGROUND
Advertising is one of the methods used to sell goods to customers. Many outlets sell attractive advertising packages. The media plays a significant role in revealing both the good and the bad to the public [12], [13]. Local media have been internationalised with the proliferation of emerging technology and public demand. The technique presented here concerns the efficient loading of heterogeneous data sources into an ever-evolving data warehouse. More precisely, the innovation involves the use of a MapReduce system for metadata-driven data ingestion. Usually, a data store is built on top of an available distributed framework, for instance Hadoop, in the big data sector [14]. Hadoop is a distributed, open-source computing environment that uses MapReduce. MapReduce is a system for addressing massive data sets and a large range of highly distributed problems. The machines involved are collectively called a cluster when all nodes are equipped with the same hardware, or a grid when the nodes differ in hardware. Processing may happen on data held in either an unstructured or a structured file system or database.
In Figure 1, the practical schema takes into account the sequence of operations carried out by an average internet user and all the metadata collected about that user. Map step: the master node splits the input into smaller sub-problems and distributes them to worker nodes. A worker node may repeat this in turn, leading to a multi-level tree structure. Each worker node processes its smaller problem and passes the answer back to its master node.
Reduce step: the master node then gathers the answers to all the sub-problems and combines them in some way to form the output, i.e. the solution to the problem it originally set out to solve.
Figure 1. Advertising ecosystem for heterogeneous data
MapReduce enables the map and reduce operations to be distributed. Each mapping operation can be carried out in parallel, independently of the other mapping operations, provided the data sources are independent and enough CPUs are available near each source [15]. Similarly, a set of reducers can carry out the reduction phase, provided that all outputs of the map operations that share the same key are presented to the same reducer at the same time. MapReduce can therefore be applied to far bigger datasets than a single commodity server can handle; however, the method can sometimes be inefficient compared with more sequential algorithms. A massive server farm can use MapReduce to sort a petabyte of data in just a few hours. Parallelism also allows recovery from a partial failure of servers or storage during the operation: if a mapper or reducer fails, its work can be rescheduled, as long as the input data is still available [16].
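To make the map and reduce steps described above concrete, the following is a minimal Hadoop-style sketch (not the paper's actual code) that counts ad impressions per channel, assuming click-stream log lines of the form channelId<TAB>userId<TAB>timestamp:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map step: split each log line and emit (channelId, 1).
public class ImpressionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text channel = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        if (fields.length >= 1 && !fields[0].isEmpty()) {
            channel.set(fields[0]);
            context.write(channel, ONE);
        }
    }
}

// Reduce step: sum the counts gathered for each channel.
class ImpressionReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}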
The Hadoop file system (HDFS) is a distributed, flexible, and versatile file system written in Java for the Hadoop project. Usually, every Hadoop node hosts one data node, and the HDFS cluster is the collection of data nodes; the arrangement is flexible, since not every node needs to host a data node. A block protocol specific to HDFS is used by every data node to serve chunks of information across the system [17]. The file system communicates over the transmission control protocol/internet protocol (TCP/IP) layer, and clients use remote procedure calls (RPC) for intercommunication. HDFS holds huge files across many computers (the typical block size is 64 MB). Reliability is accomplished by replicating the data on multiple hosts, so that redundant array of independent disks (RAID) storage on the hosts is not necessary. With the default replication value of 3, data is stored on three nodes: two on one rack and one on a different rack. Data nodes talk to each other to rebalance data, move copies around, and keep the replication of data high [18]. What follows is a practical examination of how to efficiently load heterogeneous data sources into a data warehouse with ever-changing schemas.
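As a small illustration of these HDFS characteristics (a sketch assuming a standard Hadoop client configuration; the target path is hypothetical), a file can be written with a 64 MB block size and a replication factor of 3 through the FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();              // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path target = new Path("/warehouse/admap/report.csv"); // hypothetical warehouse path
        long blockSize = 64L * 1024 * 1024;                    // 64 MB blocks, as described above
        short replication = 3;                                 // default replication factor

        try (FSDataOutputStream out =
                 fs.create(target, true, 4096, replication, blockSize)) {
            out.writeBytes("channel,impressions\n");
        }
        fs.close();
    }
}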
3. RELATED WORK
Prior work has investigated pipelining and streaming of data for latency reduction. The common metadata repository (C-MR) approach overlaps the downloading and retrieval of data by keeping input data in a memory buffer. However, C-MR is restricted to a single computer and is thus unfit for large data sets. In addition, C-MR requires its own application programming interface (API), so users have to adopt a new API. Claudia et al. implement a stream processing engine to overlap data transfer and processing. This method does not work with the MapReduce programming model and does not tolerate job failure. The distinction of our work is that we retain the standard MapReduce programming model and reuse existing features of Hadoop, such as failure tolerance and high scalability [19]. Moreover, our implementation is fully transparent to consumers.
Conventional cross-correlation techniques for a single template establish a linear statistical relationship between the distinctive features of two objects. To look for the highest correlation coefficient, the template is swept serially over a signal or image, and a match is declared when a threshold requirement is met. Multi-template approaches also take sets of models into account [20]. A template set serves as
a source of identification for the numerous features present in a single signal of interest. Each template in the set represents a hypothesis; the template with the highest coefficient is taken to be the winner.
Three questions arise: i) can this be done serially, given that serial algorithms do not scale to massive datasets; ii) what should be returned if several templates record identical correlation coefficients; and iii) how confident can we be that the declared template is the true match?
4. ARCHITECTURE OF AdMap
This technique provides a general approach to automatic data intake in an HDFS-based data warehouse. Its components include a data hub server, a generic data loading pipeline, and a metadata model, which together tackle the reliability of data loading, the heterogeneity of data sources, and changing data warehouse schemas. The metadata model consists of catalogue files plus configuration files, with one configuration file set up per ingestion task. The catalogue administers the schemas of the data warehouse [21]. To ingest the diverse information into the target schemas, the configuration files and the catalogue collaboratively drive the data pipeline.
In one particular use, implementations of the technique provide methods for automating the loading of marketing data into a database. As customer requirements grow and all sources of an online campaign, video, social, and email are integrated, there is a wide range of media channels. Marketers usually distribute marketing transactions through multiple platforms, not just a single advertising supplier. Such marketers and advertisers therefore have an overarching concern: they should be able to access an all-in-one central dashboard that gives a comprehensive view of where the advertising budget is spent, in addition to the aggregate expenditure across the various broadcasting networks [22].
The inventive step is a configuration-driven data intake system that supports this requirement. The heterogeneous data sources behind these media networks use various data schemas. To incorporate these various data sets into a single system, marketers and advertisers must provide detailed schema mappings. Handling this schema heterogeneity is therefore part of the innovation.
Figure 1 illustrates the advertisement network and the various traces of information it produces, such as advertisement clusters. As shown in Figure 1, TV Channels 1-3 and Video Channel 1 operate for an advertiser, ABC. Report data for the ABC advertiser is provided by each channel. The ABC advertiser needs to incorporate these reports into a single network system, such as Amobee. The advertiser asks each platform it deals with to transmit its data to the common media schema [23]. A basic query against the resulting common Report schema then gives a cross-channel view. Figure 1 also shows that the ingestion requests of advertiser ABC arrive at different times.
ABC asked Channel 1 to submit its monitoring data to common schema version 1 on 2 April 2020, and a configuration file was set up to ingest the data daily. Two months later, on 2 June 2020, ABC asked Channel 2 to send its report data to the common schema version, so another configuration file was created. One month later, on 3 July 2020, Video Channel 1 was adopted as a broadcast channel by advertiser ABC. The new channel was told to submit its reports to the standard schema of Turn. Nevertheless, as the video network has several distinct fields which were not specified by Turn's schema version 1, a new standard schema was created by introducing two additional columns (pause event, complete event). With these changes, Video Channel 1 can ingest its data into the standard schema of Turn. One month later, on 10 August 2020, ABC requests a further configuration to ingest data from Channel 3. This time, the configuration file targets schema version 2. In this dynamic business scenario, the novel approach makes the ingestion requests very smooth, whereas a conventional, fixed design can rarely be modified as the standard schema changes [24]. After the heterogeneous data is ingested, a cross-network statement that gives a worldwide viewpoint of media consumption for advertisers can be produced by merely querying the standard schema.
4.1. Data ingestion
Data pipelines move data from software-as-a-service (SaaS) platforms and database sources to computational and business intelligence (BI) tools. Businesses can now draw on a range of sources for analytics based on big data, and they need access to all of their analytical and commercial intelligence sources to make better decisions. The data is ingested into the Hadoop cluster, which then processes it as required. Figure 2 is a schematic cluster diagram which shows the architecture used by the technique for loading data. Data from various file transfer protocol (FTP) servers are received and inserted into the Hadoop cluster, which consists of numerous commodity machines connected to the service and managed by the Hadoop software [25]. A data hub server is in charge of monitoring the loading of data. The server generates a series of tasks to perform the loading job.
Some of this work is done on the datahub server; depending on its nature, other work is carried out on the Hadoop cluster. The datahub server also tracks and schedules the pipeline functions through its communication with the Hadoop and ZooKeeper systems.
Figure 2. Schematic cluster drawing
For this discussion, a ZooKeeper cluster is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level synchronisation services such as configuration maintenance, groups, and naming. It is designed to be easy to program against and uses a data model styled after the well-known file system directory tree layout. It runs in Java and has bindings for both Java and C.
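As a hedged illustration of how the pipeline might use such a coordination service (the znode path, payload, and ensemble address below are hypothetical and not taken from the paper), the datahub server could record per-stage status under a directory-tree-style path:

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class PipelineStatusSketch {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble (address is illustrative).
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000, event -> { });

        // Record that the safety-check stage of ingestion task 42 has finished.
        // A flat path is used here so that no parent znodes need to exist.
        String path = "/admap-task42-safety-check";
        byte[] status = "DONE".getBytes(StandardCharsets.UTF_8);
        zk.create(path, status, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Other pipeline stages (or the datahub server) can read the status back.
        byte[] read = zk.getData(path, false, null);
        System.out.println("stage status: " + new String(read, StandardCharsets.UTF_8));

        zk.close();
    }
}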
For this discussion, the reader is assumed to understand and be acquainted with components such as Hadoop, MapReduce, and ZooKeeper. However, it must be understood that the innovation herein is not restricted to the use of these components alone, nor to any specific combination of them.
4.2. Datahub server
Figure 3 shows the data hub cluster as a generic data loading platform. In every ingestion task, files are extracted, transformed, and loaded (ETL) through a pipeline to the target. The pipeline has five linearly dependent jobs running one after the other, and each job is responsible for a particular task of the data intake pipeline. Pipeline progress is tracked by a status record residing in persistent storage alongside the Hadoop cluster. The status record can always be coordinated between the cluster applications and the datahub server through a network link, as shown in Figure 2 [26]. The pipeline runs sequentially; to complete its corresponding task, each stage can invoke MapReduce on the Hadoop cluster.
Figure 3. Data loading framework
The data loading process for the datahub server consists of five steps (a minimal driver sketch follows the list):
a) Transfer and preprocess: this job runs mostly on the datahub server. It consults a file or network progress record, which helps it decide whether and how the source files should be downloaded from the remote repository and transferred to the datahub server (and uncompressed if required).
b) Safety check: this is a MapReduce job run on the Hadoop cluster and driven by the datahub server. It analyses the input files and determines whether the input data is a legitimate source of statistics. It passes valid input files to the later stages of the pipeline; a map-reduce job is used at this point to dramatically reduce the data analysis time.
c) MR sign-up job: a MapReduce job on the Hadoop cluster driven by the datahub server. It first reads both the newly arrived files and the existing data warehouse files at the destination. It then merges the two data sets for consumption by the following task. Again, a map-reduce job is used to process the data efficiently.
d) Commit job: a simple wrap-up job on the Hadoop cluster driven by the datahub server. It renames the MapReduce output folders of the previous job into an output folder whose contents are used as input for the next stage. It additionally updates the pipeline status records to record how the loading has progressed.
e) Data consumption: a MapReduce job on the Hadoop cluster. It takes the entire merged output of the preceding pipeline stages and incorporates the results into the target data files.
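The following is a minimal sketch of a sequential driver for these five jobs (the stage names mirror the list above, but the interfaces and status handling are hypothetical, not the paper's code):

import java.util.List;

// Hypothetical sequential driver for the five-stage ingestion pipeline.
public class IngestionPipelineSketch {

    interface Stage {
        String name();
        void run(String configFile) throws Exception;   // a stage may submit a MapReduce job
    }

    private final List<Stage> stages;

    IngestionPipelineSketch(List<Stage> stages) {
        // expected order: transfer, safety check, MR sign-up, commit, data consumption
        this.stages = stages;
    }

    // Run the stages strictly in order, recording progress after each one.
    public void run(String configFile) throws Exception {
        for (Stage stage : stages) {
            stage.run(configFile);
            recordStatus(configFile, stage.name() + " done");
        }
    }

    private void recordStatus(String configFile, String status) {
        // In the described system this would update the persistent status record
        // (for example a file on HDFS or a ZooKeeper znode); here it is only logged.
        System.out.println(configFile + ": " + status);
    }
}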
4.3. Metadata management
The datahub metadata offers a framework for routing the input data to the target. Per-task configuration files are examined by the individual pipeline instances for each ingestion activity. In Figure 3, during the transfer and preprocess step, each configuration file is read to decide which FTP site should be accessed, how the credentials should be obtained, how the files should be located, and how they should be uploaded to the Hadoop cluster. The safety check step then verifies the fitness of the data before it is written and consumed. In addition, the catalogue is consulted to check that the ingested data complies with the defined target schema [27]. The MapReduce sign-up job likewise reads the catalogue and the warehouse files for its work, and the commit job inspects the file system to determine where the output of the prior job is located in the system.
The separation of program and metadata provides a clean cut between the two, so that program optimisation and workflow modelling can be carried out independently and generically. The discussion below details the modelling of metadata during data ingestion. The metadata modelling is composed of several parts: the first part, schema modelling, is used to build the catalogue, while the second part models the configuration, where one configuration file is set up per ingestion task.
4.3.1. Data cataloguing
In big data management, changing business needs frequently alter the target schema, so schema evolution must be supported without code modification. The catalogue models changes to a target schema using the following schema properties (a minimal sketch of such a catalogue follows the list):
schemaIDs: a property holding an integer sequence of all available schemas; a unique integer is allocated as the ID of every table schema. For example, schemaIDs=1,2,3 means that three schemas are available.
ID.name: stores the short descriptive name of the table identified by the ID.
ID.latestVersion: stores the latest version of the table identified by the ID. For example, 1.latestVersion=3 means that schema 1 has three versions and the newest version is 3.
ID.storage.hadoopCluster: stores the absolute HDFS path where the table files are kept.
ID.description: stores the versioned table schema identified by the ID.
ID.version.default: stores the default values of the versioned table schema identified by the ID. For example, 1.1.default=1,0 means that the default value of the first column is 1 and that of the second column is 0.
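As an illustration only, such a catalogue can be pictured as a simple set of key-value properties; the Python dictionary below is a minimal sketch that mirrors the property names listed above, and the concrete table name, HDFS path, column names and default values are assumptions.

```python
# Hypothetical in-memory catalogue mirroring the properties described above.
catalog = {
    "schemaIDs": [1, 2, 3],                             # three schemas are available
    "1.name": "ad_clicks",                              # descriptive name of table 1 (assumed)
    "1.latestVersion": 3,                               # schema 1 has three versions
    "1.storage.hadoopCluster": "/warehouse/ad_clicks",  # absolute HDFS path (assumed)
    "1.description": {                                  # versioned table schemas for ID 1
        1: ["user_id", "clicks"],
        2: ["user_id", "clicks", "region"],
        3: ["user_id", "clicks", "region", "device"],
    },
    "1.1.default": [1, 0],                              # defaults for version 1 columns
    "1.2.default": [1, 0, "unknown"],
    "1.3.default": [1, 0, "unknown", "web"],
}

print(catalog["1.latestVersion"])   # -> 3
```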
These properties record the evolution history of each schema. A record from an earlier version of a schema can therefore be evolved dynamically into a later version of the same schema by consulting the catalogue. Suppose a record exists in schema version K; the record can be evolved into version K+1 of the same schema in two steps [28].
Step 1: The system first creates a default record of version K+1, i.e., a record instantiated with the default values of version K+1.
Step 2: The system then consults the catalogue for the correspondences between the columns of version K and those of version K+1, and uses the data of version K to overwrite the default values set in Step 1. When a column of version K corresponds to a column of version K+1, its value is copied across and, where necessary, typecast. If a version K column has no such correspondence, that column is dropped. After the two steps are completed, the new record contains the version K+1 schema populated with the version K data or, where no correspondence exists, with the version K+1 default values (a short sketch of the two steps follows).
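The two steps can be sketched as follows, reusing the hypothetical catalogue dictionary from the previous sketch; the column-correspondence rule used here (matching column names between versions) is an assumption for illustration, since the paper does not spell out how correspondences are stored.

```python
def evolve_record(catalog, schema_id, k, values):
    """Evolve a record from schema version K to K+1 by consulting the catalogue."""
    cols_k = catalog[f"{schema_id}.description"][k]
    cols_k1 = catalog[f"{schema_id}.description"][k + 1]

    # Step 1: create a default record instantiated with version K+1's default values.
    new_values = list(catalog[f"{schema_id}.{k + 1}.default"])

    # Step 2: where a version K column corresponds to a K+1 column, copy (and typecast)
    # its value; version K columns with no correspondence are simply dropped.
    for i, col in enumerate(cols_k):
        if col in cols_k1:
            j = cols_k1.index(col)
            new_values[j] = type(new_values[j])(values[i])
    return new_values

# Example: evolve a version-1 record of schema 1 to version 2.
print(evolve_record(catalog, 1, 1, [42, 7]))    # -> [42, 7, 'unknown']
```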
4.3.2. Configuration file
Reconciling heterogeneous data sources is another problem in big data integration. Different data providers come with different local schemas and infrastructure. These inherent heterogeneities must be handled precisely and efficiently so that all types of heterogeneous data can be centralised and managed. The framework uses a per-task data-ingestion configuration file to deal with this issue [29], [30]. In particular, it addresses schema mapping and several other sources of heterogeneity, such as the date format and the location of the FTP site.
The following configuration file properties satisfy the requirements of an ingestion task and can easily be extended (a small parsing sketch follows the list):
schemaID, schemaVersion: these two properties identify the destination schema and version into which the source files are ingested. For example, schemaID=1, schemaVersion=1 means that the task ingests the source data into version 1 of schema 1.
datePattern: specifies the date format of the source data. For example, datePattern=M/d/yyyy means that the source data uses dates such as 06/04/2020 10:12.
schemaMapping: describes the mapping between the schema of the source data and the schema of the destination. For example, mapping=-1,4 means that the first source column has no corresponding column in the destination schema, while the second source column maps to the fourth column of the destination schema.
partitionName: describes the physical table partition of the source data. For example, partitionName=Turn_44 means that the source files are ingested into the partition named Turn_44 of the schema.
protocol: defines the file transfer protocol. For example, protocol=SFTP means that the secure file transfer protocol (SFTP) is used to collect data from the data source.
userID: specifies the user name used to log in to the server of the data source.
password: specifies the password used to log in to the server of the data source.
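Purely as an illustration, the properties above could be written as a simple key-value file and read once per ingestion task; the section name, the concrete values and the file layout below are assumptions, and real credentials would of course not be stored in plain text.

```python
import configparser

# Hypothetical per-task ingestion configuration mirroring the properties above.
SAMPLE_CONFIG = """
[ingestion]
schemaID = 1
schemaVersion = 1
datePattern = M/d/yyyy
schemaMapping = -1,4
partitionName = Turn_44
protocol = SFTP
userID = ad_loader
password = ********
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)
task = parser["ingestion"]

print(task["schemaID"], task["schemaVersion"])   # which destination schema/version to load
print(task["schemaMapping"].split(","))          # per-column source-to-destination mapping
```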
Because the catalogue model is version-based, data ingested from different sources into a single destination can carry different versions of the same schema, and those versions can be reconciled inside the data centre quickly [31]. This novel approach therefore provides a version-aware method of loading and querying. A significant part of that method is a single record abstraction together with an API for version reconciliation:
Record abstraction: a record is a class (data structure) that stores one tuple of a given schema and version combination. It consists of a value array and a versioned schema; the value array holds the data, while the schema holds the metadata for that data.
convertToLatestSchema(): the record class provides a convertToLatestSchema() function. When this function is invoked, the current record is converted to the most recent version of its schema.
Figure 4 is a schematic illustration of different configuration setups that direct particular clients' data into different versions of the same schema, with the map function of a single MapReduce job reconciling both the heterogeneity of each client's data and its schema version [31]. Where different schema versions have to coexist in the cluster, the record abstraction is used to accommodate the diversity automatically. In Figure 4, client 1 and client 2 have separate configuration files that map their data to version 1 and version 2 of the schema, respectively; the map phase of the MapReduce job is where these different data sources meet.
Guided by the configuration files, the cluster loads all the various data sources into the HDFS file system. A MapReduce job is then started to join them with the existing data in the common destination schema [32]. Data from different clients may be mapped to different versions of the same schema because the clients were configured at different times. Client 1's data will, for example, be configured to go to schema version 1, while client 2's data can remain configured for schema version 2. A join job is performed to connect the new data to the existing data of the same schema.
The MapReduce join job reads every new record together with the existing records in the destination schema and passes them on to the reduce phase of the computation. The key to handling the various data versions is to call convertToLatestSchema() on every record in the mapper before anything else. This guarantees that every record is reconciled to the most recent version of the same schema [33].
In Figure 4, mapper 1 receives the data of client 1, which is configured to map to schema version 1, as well as the data of client 2, which is configured to map to schema version 2; they meet each other in the same mapper. The client 1 data is parsed into client 1 records by the InputFormat of the MapReduce framework, and the client 2 data is parsed into client 2 records. The mapper calls convertToLatestSchema() to bring both record formats to the latest version, and the data then flows from the mapper to the reducers of the MapReduce framework [34] (a minimal sketch of this idea follows).
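A minimal sketch of this idea, reusing the hypothetical catalog dictionary and evolve_record function from the earlier sketches; the Record class shape, the key used for joining and the version-by-version upgrade loop are illustrative assumptions rather than the framework's actual classes.

```python
class Record:
    """One tuple of a given schema and version: a value array plus a versioned schema."""

    def __init__(self, catalog, schema_id, version, values):
        self.catalog = catalog
        self.schema_id = schema_id
        self.version = version
        self.values = values

    def convertToLatestSchema(self):
        """Evolve this record, version by version, up to the latest catalogue version."""
        latest = self.catalog[f"{self.schema_id}.latestVersion"]
        while self.version < latest:
            self.values = evolve_record(self.catalog, self.schema_id,
                                        self.version, self.values)
            self.version += 1
        return self

def mapper(record):
    """First thing in the mapper: reconcile every record to the latest schema version,
    then emit it keyed by its first column so records from different clients can join."""
    record.convertToLatestSchema()
    yield record.values[0], record.values

# Client 1 data arrives in schema version 1, client 2 data in version 2;
# both meet in the same mapper and leave in the latest (version 3) layout.
for key, row in mapper(Record(catalog, 1, 1, [42, 7])):
    print(key, row)
for key, row in mapper(Record(catalog, 1, 2, [43, 9, "eu"])):
    print(key, row)
```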
At query time, the data flowing through the Hadoop cluster may likewise exist in different versions of the same schema; for example, one may want to query several data versions of the system concurrently on the Hadoop cluster [35]. The same technique is again used to convert the multiple versions of data to the current edition. Figure 5 shows an exemplary machine in which the system executes a particular set of instructions in order to carry out these methods.
Figure 4. Schematic illustration of different configuration setups directing different clients' data into different versions of the same schema
Figure 5. Schematic illustration of an exemplary machine in which the system executes a particular set of instructions to carry out the described methods
4.3.3. Summary of metadata ingestion
In summary, the novel approach encompasses a metadata-driven data ingestion technique for integrating heterogeneous data sources at scale with Hadoop MapReduce. The datahub server, the catalogue and the configuration files together give the versatility required for complex big data environments, and they constitute the presently preferred embodiment.
5. ADVANTAGES OF THE PROPOSED FRAMEWORK APPROACH
The embodiments of this novel approach address, among others, the following challenges (the high-level techniques used to solve each challenge are listed with it): many FTP servers and the need for an effective processing plan: MapReduce jobs; heterogeneity of format, protocol and schema (different FTP servers, transfer protocols, source schemas and servers): configuration files and the catalogue; fault tolerance (auto-recovery of the next batch): a pipeline status file stored on HDFS, from which a fault-tolerance protocol is built; immunity (dirty-data protection): a health check on the maps before loading; synchronisation of conflicting concurrent pipelines: ZooKeeper's distributed lock service together with the HDFS status file; schema evolution (support for different variants of the same
schema to adapt different legacy ingestion configurations): meta-tracked versions of client schemas in the data repository catalogue.
In addition to its high efficiency, this ingestion method is also flexible: adding a table is as simple as inserting a few lines of text into the catalogue; evolving a schema is as simple as adding a new version to the catalogue; different versions of client data can be ingested at the same time, e.g. client 1 ingests files with schema 1 version 1, client 2 with schema 1 version 2 and client 3 with schema 1 version 3; clients can change their file schema through a mapping configuration change without any code change; a client may request columns to be added, deleted or changed, and the destination schema fields can grow through a catalogue change; and all manner of variability in the ingested data (such as date formats and other configuration differences) is handled precisely.
6. CLAIMS OF THE NOVEL FRAMEWORK: AdMap
Claim 1. A system for automated data ingestion into a data warehouse. It makes use of a MapReduce environment to ingest a variety of heterogeneous data sources and loads the heterogeneous data into the target data warehouse schemas.
Claim 2. The system of claim 1, wherein the data sources are marketing data collected from a multitude of different media channels.
Claim 3. An apparatus for automated data ingestion, comprising: a datahub server for scheduling data-loading tasks; a generic pipeline data-loading framework that uses the MapReduce environment to ingest a plurality of heterogeneous data sources; and a processor-implemented metadata model consisting of several configuration files and a catalogue, wherein a configuration file is set up per ingestion task, the catalogue manages the data warehouse schema, the datahub server performs a scheduled data-loading task, and the configuration files and the catalogue together direct that task.
Claim 4. The apparatus of claim 3, wherein the pipeline jobs are controlled and coordinated through the Hadoop cluster as well as ZooKeeper.
Claim 5. The apparatus of claim 4, wherein each ingestion job receives, transforms and loads the source files into the destination through the pipeline data-loading framework; a pipeline status file tracks the progress of the pipeline data-loading framework, and communication between the datahub server and the ZooKeeper server is used to synchronise access to that pipeline status file.
Claim 6. The apparatus of claim 3, wherein the pipeline data-loading framework runs sequentially, requiring a corresponding MapReduce job on the Hadoop cluster for each stage of the pipeline.
Claim 7. A method for ingesting data automatically into a data warehouse, comprising: providing a datahub server for scheduling the data-loading tasks; providing a generic data-loading framework that uses the MapReduce environment to ingest a plurality of heterogeneous sources; and providing a processor-implemented metadata model that consists of a multitude of configuration files and a catalogue.
A configuration file is set up per ingestion task, and the catalogue manages the data warehouse schema. When a scheduled data-loading task is executed, said configuration files and the catalogue collaboratively direct the datahub server, automatically and independently of data-source heterogeneity and warehouse schema evolution, to load the heterogeneous new data into their destination schemas; the datahub server performs the data-loading task by running a download-and-transform job on the cluster.
Claim 8. The method of claim 7, wherein the data-loading task comprises a health-check job, an MR join job, an ingestion job, an update-and-transform job and a commit job, and wherein the health-check job, the MR join job and the ingestion job each involve a MapReduce job driven by the datahub server and run on a Hadoop cluster.
Claim 9. The method of claim 8, wherein the datahub server executes the data-loading task by carrying out the further step of the health-check job: the datahub server drives a MapReduce job to analyse the input files produced by the download-and-transform step, determine whether each input file is a valid data source, and pass the valid input files on to the next task in the pipeline.
Claim 10. The method of claim 9, wherein the data-loading task is executed with the further step of a MapReduce join: incoming client data and the relevant existing warehouse data are read, and the client records are joined with the identified target warehouse data to produce the join output.
Claim 11. The method of claim 10, wherein the data-loading task is executed with the further step of the commit job: renaming the output folders of the previous jobs into a result directory whose contents are to be consumed by the ingestion job.
Claim 12. The method of claim 11, wherein the data-loading task is executed with the further step of the ingestion job: a MapReduce job consumes the join output generated by the previous pipeline stages and merges the results into the destination data files.
Claim 13. A datahub server comprising: a processor that uses the MapReduce environment to route source data to its destination; and a metadata model that is consulted by the various pipeline instances for their ingestion tasks, wherein the metadata modelling involves schema modelling by the catalogue during data ingestion and client configuration modelling per ingestion job.
Claim 14. The datahub server of claim 13, wherein the catalogue supports schema evolution, without changing framework code, by modelling each schema with the following properties: a property holding an integer sequence of all available schemas that assigns a unique integer identity (ID) to each table schema; a property that stores a descriptive name for the table identified by the ID; a property that stores the latest version of the table identified by the ID; a property that stores the absolute path in the Hadoop file system (HDFS) where the table identified by the ID is saved; a property that stores the versioned table schema identified by the ID; and a property that stores the default values for the versioned table schema identified by the ID; wherein these properties document the evolution history of each schema, whereby a record can be evolved dynamically, by consulting the catalogue, from a previous version of a given schema to a later version of the same schema.
Claim 15. The datahub server of claim 14, wherein evolving a record from a first version to a second version of the same schema comprises: creating a default record of the second version of the schema, instantiated with the default values of the second version; then consulting the catalogue for correspondences between the columns of the first and second versions and using the data of the first version to overwrite the default values where such a correspondence exists; where no such correspondence exists, the column of the first version is dropped; and a new record is generated containing the second-version schema with the data of the first version or, where absent, the default values of the second version.
Claim 16. The datahub server of claim 14, wherein the configuration file is set up, per data ingestion task, to address schema mapping and other heterogeneity problems using: two properties that identify the schema and version into which the source files are ingested; a property that identifies the date format of the source data; a property that identifies the mapping between the source data schema and the destination schema; a property that specifies the physical table partition of the source data; a property that defines the file transfer protocol; a property that defines the user name used to log onto the server of the data source; and a property that defines the password used to log onto the server of the data source.
Claim 17. The datahub server of claim 14, which provides a record abstraction for version reconciliation, wherein a record is a class (data structure) storing one tuple of a given schema and version combination, the record comprising a value array and a versioned schema, where the value array holds the data and the schema holds the metadata for that data.
Claim 18. The datahub server of claim 17, wherein the record abstraction includes a function that, when invoked, converts the current record to the latest version of its schema by consulting the catalogue.
Claim 19. A method comprising: providing a processor that uses the MapReduce environment to route source data to its destination; and providing a metadata model that is consulted by the different pipeline instances to carry out their ingestion tasks, wherein the metadata modelling comprises schema modelling of the destination schemas by a catalogue during data ingestion, and modelling of the client configuration, per ingestion task, by a configuration file.
Claim 20. The method of claim 19, wherein schema evolution is supported, without modifying framework code, by modelling the destination schema with the following schema properties: a property with an integer array that represents all available schemas and assigns a distinctive integer identity (ID) to each table schema; a property that holds a descriptive name of the table identified by the ID; a property that contains the latest version of the table identified by the ID; a property that saves the absolute path in the Hadoop file system (HDFS) where the table identified by the ID is stored; a property that saves the versioned table schema identified by the ID; and a property that saves the default values of the versioned table schema identified by the ID; wherein these properties document the evolution history and a record can be evolved dynamically from an earlier to a later version of the same schema.
Claim 21. The method of claim 20, wherein evolving a record from a first version to a second version of the same schema comprises: creating a default record of the second version of the schema, instantiated with the default values of the second version; then consulting the catalogue to locate the correspondences between the schema of the first version and that of the second version, using the data of the first version to overwrite the default values where a correspondence exists; whereas the
column of the first version is deleted if no such correspondence exists; and wherein a new record is created that contains the schema of the second version with the data of the first version or the second version's default values.
Claim 22. The method of claim 20, wherein the configuration file is configured to address schema mapping and other heterogeneity issues with the following properties per data ingestion job: two properties that identify the schema and version into which the source files are ingested; a property that specifies the date format of the source data; a property that defines the mapping between the source data schema and the destination schema; a property that specifies the physical table partition of the source data; a property that identifies the file transfer protocol; and a property that identifies the user ID used to log onto the server of the data source.
Claim 23. The method of claim 20, further comprising: providing a record abstraction facility for version reconciliation, wherein a record is a data structure storing one tuple of a given schema and version combination and comprising a value array and a versioned schema; the value array holds the data, the schema holds the metadata, and the schema is a version of a schema defined in the catalogue held in memory.
Claim 24. The method of claim 23, further comprising a function that, when invoked, converts the current record to the latest version of its schema.
Claim 25. A mechanism for data ingestion into a data warehouse, comprising: a datahub server that consults one or more configuration files to load heterogeneous data sources onto a Hadoop (HDFS) file system; the server starts a MapReduce job to join all heterogeneous data sources with the existing data in the standard destination schema.
Claim 26. An automated data ingestion device for a data warehouse, comprising one or more configuration files on the datahub server for loading a plurality of heterogeneous data sources into Hadoop (HDFS); said server launching a MapReduce job to join the heterogeneous data sources with the existing data in the standard schema; and the datahub server carrying out the join task of connecting the client data to the existing data of the same schema by launching a MapReduce job.
Claim 27. As we have found, a sound big data strategy rests on cost-effective, analytical, versatile, pluggable and personalised technology bundles. Organisations entering the field of big data have discovered that it is not merely a passing trend but a journey. Big data opens up an area of unparalleled problems involving the rational and empirical use of evidence-driven innovations.
Claim 28. Data events may also be accessed by reference to in-memory storage. Ingestion becomes extremely fast when it is a memory-based process. However, a failure such as a crash of the workers or other equipment can lead to data loss, because the detected changes are essentially volatile. This is therefore not a safe option for essential business operations, although storage-interface records can still be kept.
7. CONCLUSION AND FUTURE SCOPE
The novel approach provides a general method for the automatic ingestion of data into an HDFS data warehouse. The framework comprises a datahub server, a generic data-loading framework and a metadata model, which together address the efficiency of data loading, the heterogeneity of data sources and the evolution of warehouse schemas. The generic loading framework carries out the datahub's loading efficiently with MapReduce. The metadata model comprises configuration files and a catalogue: a configuration file is set up for each ingestion task, while the catalogue administers the schema of the data warehouse. When a scheduled data-loading task is executed, the configuration files and the catalogue together direct the datahub server to attach the heterogeneous data to their destination schemas dynamically.
The framework can be further modified and used in various disciplines as required. The Hadoop cluster can provide insight into the user experience, which would be helpful for the advertising team, and the results gained can be used to improve the user experience. The implementation of data lakes and data warehousing in various disciplines will ease the handling of the many diverse repositories of data.
REFERENCES
[1] B. Nichifor, “Theoretical framework of advertising-some insights,” Stud. Sci. Res. Econ. Ed., no. 19, 2014.
[2] A. T. Kenyon and J. Liberman, “Controlling cross-border tobacco: advertising, promotion and sponsorship - implementing the
FCTC,” SSRN Electron. J., no. 161, 2006.
[3] F. Brassington and S. Pettitt, Principles of marketing. Pearson education, 2006.
[4] T. M. Barry, “The development of the hierarchy of effects: An historical perspective,” Curr. Issues Res. Advert., vol. 10, no. 1–2,
pp. 251–295, 1987.
[5] M. P. Gardner, “Mood states and consumer behavior: a critical review,” J. Consum. Res., vol. 12, no. 3, p. 281, Dec. 1985.
[6] G. Belch and M. Belch, “Advertising and promotion: an integrated marketing communications perspective,” New York: McGraw-Hill, p. 864, 2011.
[7] J. B. Cooper and D. N. Singer, “The Role of Emotion in Prejudice,” J. Soc. Psychol., vol. 44, no. 2, pp. 241–247, Nov. 1956.
[8] E.-J. Lee and D. W. Schumann, “Explaining the special case of incongruity in advertising: combining classic theoretical
approaches,” Mark. Theory, vol. 4, no. 1–2, pp. 59–90, Jun. 2004.
[9] X. Nan and R. J. Faber, “Advertising theory: reconceptualizing the building blocks,” Mark. Theory, vol. 4, no. 1–2, pp. 7–30, Jun.
2004.
[10] R. E. Petty, J. T. Cacioppo, and D. Schumann, “Central and peripheral routes to advertising effectiveness: the moderating role of involvement,” J. Consum. Res., vol. 10, no. 2, p. 135, Sep. 1983.
[11] I. C. Popescu, “Comunicarea în marketing : concepte, tehnici, strategii,” a II-a, Ed. Uranus, Bucuresti, p. 271, 2003.
[12] R. E. Smith and X. Yang, “Toward a general theory of creativity in advertising: examining the role of divergence,” Mark. Theory,
vol. 4, no. 1–2, pp. 31–58, Jun. 2004.
[13] E. Thorson and J. Moore, Integrated communication: synergy of persuasive voices. Psychology Press, 1996.
[14] D. Vakratsas and T. Ambler, “How advertising works: what do we really know?,” J. Mark., vol. 63, no. 1, pp. 26–43, Jan. 1999.
[15] R. Vaughan, “How advertising works: a planning model. .. putting it all together,” Advert. & Soc. Rev., vol. 1, no. 1, 2000.
[16] W. M. Weilbacher, “Point of view: does advertising cause a ‘hierarchy of effects’?,” J. Advert. Res., vol. 41, no. 6, pp. 19–26,
Nov. 2001.
[17] M. Dayalan, “MapReduce: simplified data processing on large cluster,” Int. J. Res. Eng., vol. 5, no. 5, pp. 399–403, Apr. 2018.
[18] D. Borthakur et al., “Apache hadoop goes realtime at Facebook,” in Proceedings of the 2011 international conference on
Management of data - SIGMOD ’11, 2011, p. 1071.
[19] D. Jiang, B. C. Ooi, L. Shi, and S. Wu, “The performance of mapreduce: An in-depth study,” Proc. VLDB Endow., vol. 3, no. 1–
2, pp. 472–483, 2010.
[20] A. Pavlo et al., “A comparison of approaches to large-scale data analysis,” in Proceedings of the 2009 ACM SIGMOD
International Conference on Management of data, 2009, pp. 165–178.
[21] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” ACM SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 29–43,
Dec. 2003.
[22] B. Li, E. Mazur, Y. Diao, A. McGregor, and P. Shenoy, “SCALLA: A platform for scalable one-pass analytics using
MapReduce,” ACM Trans. Database Syst., vol. 37, no. 4, pp. 1–43, Dec. 2012.
[23] D. Logothetis, C. Trezzo, K. C. Webb, and K. Yocum, “In-situ MapReduce for log processing,” in Proceedings of the 2011 USENIX Annual Technical Conference, USENIX ATC 2011, 2011, pp. 115–129.
[24] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, “Delay scheduling: a simple technique for
achieving locality and fairness in cluster scheduling,” in EuroSys’10 - Proceedings of the EuroSys 2010 Conference, 2010, pp.
265–278.
[25] N. Backman, K. Pattabiraman, R. Fonseca, and U. Çetintemel, “C-MR: continuously executing MapReduce workflows on multi-
core processors,” in MapReduce ’12 - 3rd International Workshop on MapReduce and Its Applications, 2012, pp. 1–8.
[26] R. Kienzler, R. Bruggmann, A. Ranganathan, and N. Tatbul, “Stream as you Go: the case for incremental data access and
processing in the cloud,” in 2012 IEEE 28th International Conference on Data Engineering Workshops, 2012, pp. 159–166.
[27] T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears, “MapReduce online,” in NSDI, 2010, vol. 10, no. 4, p. 20.
[28] A. Verma, B. Cho, N. Zea, I. Gupta, and R. H. Campbell, “Breaking the MapReduce stage barrier,” Cluster Comput., vol. 16, no.
1, pp. 191–206, Mar. 2013.
[29] M. Elteir, H. Lin, and W. Feng, “Enhancing MapReduce via asynchronous data processing,” in 2010 IEEE 16th International
Conference on Parallel and Distributed Systems, 2010, pp. 397–405.
[30] M. Burrows, “The Chubby lock service for loosely-coupled distributed systems,” in OSDI 2006 - 7th USENIX Symposium on
Operating Systems Design and Implementation, 2006, pp. 335–350.
[31] F. Chang et al., “BigTable: a distributed storage system for structured data,” OSDI 2006 - 7th USENIX Symp. Oper. Syst. Des.
Implement., vol. 26, no. 2, pp. 205–218, 2006.
[32] R. Vernica, A. Balmin, K. S. Beyer, and V. Ercegovac, “Adaptive MapReduce using situation-aware mappers,” in Proceedings of
the 15th International Conference on Extending Database Technology - EDBT ’12, 2012, p. 420.
[33] Z. Guo, G. Fox, and M. Zhou, “Investigation of data locality and fairness in MapReduce,” in Proceedings of third international
workshop on MapReduce and its Applications Date - MapReduce ’12, 2012, p. 25.
[34] M. Hammoud and M. F. Sakr, “Locality-aware reduce task scheduling for mapreduce,” in Proceedings - 2011 3rd IEEE
International Conference on Cloud Computing Technology and Science, CloudCom 2011, 2011, pp. 570–576.
[35] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” in 2010 IEEE 26th Symposium on Mass
Storage Systems and Technologies (MSST), 2010, pp. 1–10.