SlideShare a Scribd company logo
1 of 12
Download to read offline
Computer Science and Information Technologies
Vol. 3, No. 2, July 2022, pp. 82~93
ISSN: 2722-3221, DOI: 10.11591/csit.v3i2.pp82-93  82
Journal homepage: http://iaesprime.com/index.php/csit
AdMap: a framework for advertising using MapReduce
pipeline
Abhay Chaudhary1
, K R Batwada2
, Namita Mittal2
, Emmanuel S. Pilli2
1
Department of Computer Science and Engineering, Vellore Institute of Technology-AP, Andhra Pradesh, India
2
Department of Computer Science and Engineering, Malaviya National Institute of Technology, Jaipur, India
Article Info ABSTRACT
Article history:
Received Mar 5, 2021
Revised May 29, 2022
Accepted Jun 12, 2022
There is a vast collection of data for consumers due to tremendous
development in digital marketing. For their ads or for consumers to validate
nearby services which already are upgraded to the dataset systems,
consumers are more concerned with the amount of data. Hence there is a
void formed between the producer and the client. To fill that void, there is
the need for a framework which can facilitate all the needs for query
updating of the data. The present systems have some shortcomings by a vast
number of information that each time lead to decision tree-based approach.
A systematic solution to the automated incorporation of data into a Hadoop
distributed file system (HDFS) warehouse (Hadoop file system) includes a
data hub server, a generic data charging mechanism and a metadata model.
In our model framework, the database would be able to govern the data
processing schema. In the future, as a variety of data is archived, the
datalake will play a critical role in managing that data. To order to carry out
a planned loading function, the setup files immense catalogue move the
datahub server together to attach the miscellaneous details dynamically to its
schemas.
Keywords:
Advertising
Advertising and publishing
Data lake
Data warehouse
Hadoop distributed file system
Map reduce
This is an open access article under the CC BY-SA license.
Corresponding Author:
Abhay Chaudhary
Department of Computer Science and Engineering, Vellore Institute of Technology
Amaravati, Andhra Pradesh, India
Email: abhaychaudharydps@gmail.com
1. INTRODUCTION
Advertising can be called a convergent feature represented in two essential fields of science:
interaction and sales. From the scholarly and realistic roots, an advertisement was tackled both as a form of
networking for those interested in contemporary economic practices, and in addressing the communication
issues of separate organisations, for example, the newspaper. In general, advertisement and networking are
elements of modern social and economic processes [1], [2]. In culture today, advertisement has developed
into a dynamic communication mechanism that is important for the public as well as organisations. Over
time, messaging has become an integral aspect of marketing programmes, with the potential to submit letter
precisely planned for its goals. During their goods and services in the global markets, various enterprises,
beginning with multinational and local companies, assign growing significance to the advertisement [3].
Consumers have learned to use commercial data in their purchasing decisions in working consumer
economies [4], [5].
Besides broad scope text and data analysis, MapReduce is indeed an integer programming model.
Conventional MapReduce implementation, for instance, its Hadoop, needs to access the entire collection into
some kind of cluster even before evaluation is performed [6]. In situations whereby content is massive, data
will not be enabled but stored again, and again-for example of logs, safety reports even protected messages-
Comput Sci Inf Technol ISSN: 2722-3221 
AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary)
83
this will lead to significant congestion. We suggest an approach to the data pipeline to mask data latency in
the study of MapReduce. Our Hadoop MapReduce-based implementation is utterly transparent to the user.
The distributed competitor queue includes the allocation and syncing of data blocks such that data upload and
implementation overlap. This article addresses main concerns: the specific limit of mapping, as well as the
configurable interactive amount of mapping, facilitate unidentified data entry. We also use a delay scheduler
for data pipeline data localisation. The test of the solution for multiple real-world data sets implementations
reveals that our methodology demonstrates efficiency improvements [7], [8].
In business and academia, the MapReduce method has significant advantages. It offers a basic
programming paradigm with optimised fault tolerance and parallel synchronisation for large-scale data
processing [9]. The previous design of MapReduce follows a hierarchical data format with huge volumes
reflected throughout hierarchical files. The framework maps its activities then decreases them which fit for
regional data sources. Clickstream log analysis should be used by programmes that can benefit from a
systematic approach to data loading. The algorithms are searched linearly, page by page, to allow streaming
approaches to load data.
In MapReduce, we recommend an approach to data pipelines. As the logical unit of the stream, the
databank uses the storage node [10]. That is, once a block is fully uploaded, it shall begin processing. Map
tasks are begun, providing overlapped execution and the data up load before the final upload process is
complete. In specific, to coordinate the distributed file system, we suggest adding a distributed competition
queue and task elimination for MapReduce. The filesystem is a creator and generates block metadata on the
queue, while MapReduce employees are a customer and shall be alerted when block metadata are available
for them. Standard MapReduce practises data from the input and begins a sequence of map activities
dependent on the number of input separates.
 A pipeline-coordinating architecture for data collection
 Processing with MapReduce.
 Two types of work visualisation were vague data processing of location understanding or avoidance of
error.
 An apparent client data pipeline execution that does not require revisions to the current code.
 Experimental findings showed better results.
In the science world, prototype matching strategies are omnipresent. Matching techniques have been
applied in an array of research disciplines ranging as of medical electrocardiogram evaluation to gene
expression in genetics, communications signature identification, geoscience seismic signals processing and
computer vision photo-tracking and recognition [11].
Model matching is a conventional approach since it is accurate, simple to read, and can be measured
or prototyped quickly. To evaluate correlations between two objects, which are single or multi-dimensional
signals, such as time series of images, the basic definition of matching models is there. There are different
methods to measure resemblance and including total variance. In this paper, we will be discussing the
background of the Advertising field and its heterogeneous relationship with data lake. Further, the paper talks
about the related work done in the field of advertising and overall review of the novel approach towards
advertising and MapReduce framework. The last section summarises the work present in this paper covers
the future scope in detail.
2. BACKGROUND
Advertising is one of the methods to sell goods to customers. Many ads are available selling
attractive advertising packages. In revealing the positive and the poor to the public, the media plays a
significant role [12], [13]. Local media have been Internationalised with the proliferation of emerging
technology and public demand. The technique concerns the efficient loading of heterogeneous data sources to
an ever-evolving data warehouse. More precisely, innovation involves the usage of a MapReduce system for
meta-data driven data ingestion. Usually, a data store is designed on the highest of an accessible collection
framework, for instance, Hadoop in the broad data sector [14]. Hadoop is a distributed, open-source
computer environment that uses MapReduce. MapReduce is a system for addressing massive data sets with a
massive range of very distributed problems. Many machines, or grids (if nodes are distinct in hardware), are
collectively called the cluster (when all nodes are prepared with the same hardware). Program processing
may happen on either an unstructured or a structured file system database.
In Figure 1, the practical schema 2 takes the sequence of operations carried out by average internet
user and all the metadata collected by the user into account. Chart step: The network layer splits your
feedback through minor annoyances then allocates everything to either the branches of the team. An
employee's path that contributes to something like a layout of a network in several sections is being done
once more. The employee cluster deals more with a larger problem, then gives itself the primary cluster.
 ISSN: 2722-3221
Comput Sci Inf Technol, Vol. 3, No. 2, July 2022: 82-93
84
Lower step: The master node then gathers answers to all the subproblems and integrates them to form the
performance in some manner, i.e. the solution to the question that it wanted to solve.
Figure 1. Advertising ecosystem for heterogeneous data
MapReduce enables map analysis and reduction operations to be spread. While increasing routing
procedure can be done simultaneously, independently of specific processing procedures, throughout the fact
that amount with independent data sources as well as the number of CPUs near the increasing source is
limited [15]. Similarly, a variety of reduction stages can indeed be carried through given that only certain
operations and in the chart were concurrently displayed to the very same absorber with much the same
button. MapReduce may be extended to far bigger databases than generic servers. However, this method can
sometimes seem to have been inefficient in comparison to more sequential algorithms. Therefore, massive
server farm will use MapReduce in just a few hours to sort a petabyte of results. Parallelism also allows us to
recover from a partial server failure or storage during such kind of operation if one work can be changed as
long as the input data is available, only if mapper or reducer fails [16].
A distributed, flexible, and versatile file scheme transcribed in Java designed for the Hadoop project
function is the Hadoop file system (HDFS). Usually, every Hadoop node has one data node; the HDFS
cluster is a cluster of knowledge nodes. The condition is reasonable since not every node needs the existence
of a data node. A block protocol specific to HDFS is used for every single statistics node to serve chunks of
information across the system [17]. The file system is communicated using the transmission control
protocol/internet protocol (TCP/IP) layer. Customers are using remote procedure call (RPC) for
intercommunication. HDFS holds huge files on many computers (the optimal file size is 64 MB). Reliability
is accomplished by multiple host replication of the data, such that redundant array of independent disks
(RAID) storage on hosts is not necessary. Data on three nodes are stored with the default replication value,
i.e. 3; two on one stack as well as any on an additional stack. Information nodes will converse with each
other re-equilibrate information, push versions all around and maintain high data replication [18]. It is a
practical examination of how to effectively load heterogeneous data sources into a data centre for
everchanging schemas.
3. RELATED WORK
Exploration has investigated pipeline/data running for data latency reduction. Common metadata
repository (CMR) corresponds downloading and retrieval of data by storage of input data in a memory
buffer. C-MR is restricted to operate on a single computer and thus unfit for large data sets. Too, C-MR
needs an application programming interfaces (API) that eliminates a user's need for a new API. Claudia et al.
to overlap data transfer and process, implement a stream processing engine. This method does not work with
the programming model MapReduce and does not tolerate job failure. The distinction in our work is that we
retain the standard programming model MapReduce and use existing features from Hadoop such as failure
tolerance, and high scalability [19]. Moreover, we are fully open to consumers in our implementation.
The conventional cross-correlation techniques for single example create a linear statistical
relationship between the unique features of two objects. To look for the highest correlation coefficient, a
formula is serially interpreted over a signal or image, and a match may be found when a threshold
requirement is met. Continuous approaches also take models into account [20]. A template package serves as
Comput Sci Inf Technol ISSN: 2722-3221 
AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary)
85
a proof source of identification in one single signal of interest of numerous features. Any template in the set
represents a hypothesis; the highest coefficient template is seen to be the winner.
Three questions arise:
 What can be achieved with this serial? Algorithms do not scale more massive datasets,
 What should be returned if several templates
 Record the coefficients for identical correlations) how sure is the stated template to be true?
4. ARCHITECTURE OF AdMap
This technique provides a general approach to data intake automatically in an HDFS-based data
warehouse. Amongst other modifications is a data jack, a standard data loading pipelines and an multiple
dwelling units (MDU) pattern, which together tackle the reliability of the loading of data, heterogeneities of
data sources and changing data warehouse schemas. The meta-data model includes configuration and
catalogue files setup per ingestion task is the configure file. The database administers the schema for the data
centre [21]. Towards inserting the diverse information to the objective outlines, the configuration files in
addition to the catalogue collaboratively drive the data subway.
In one particular use, the technique's implementations provide methods for automating the loading
of marketing data into a database. If customer requirements increase and all sources of an online campaign,
video, social and email. are integrated, there is a wide range of media channels. Marketers usually distribute
marketing transactions through multiple platforms, not just to a single advertising supplier. Such marketers
and advertisers are thus overarchingly concerned. It should be able to access an all-in-one central dashboard
and then comes up with a comprehensive perspective wherever the advertising plan happens expended in
addition to the overwhelming aggregate expenditures on the various broadcasting networks [22].
The creative measures are a fixed-key software intake system to promote this criterion. There are
various data schemas across heterogeneous data sources through these media networks. To incorporate these
various data sets into a specific system, marketers and advertisers will render detailed schema mappings.
Therefore, the heterogeneous schema is a part of the innovation.
Figure 1 is the illustration which shows the advertisement network, which displays various traces of
information such as advertisement cluster. As shown in Figure 1, Tv Channels 1-3 and Video Channel 1
operate for an advertiser automated border control (ABC). The report information for the ABC advertiser is
provided for each channel. The ABC advertiser needs to incorporate stories into a specific network system,
such as Amobee. The advertiser demands the transmission to the popular media schema for each platform
that it deals with [23]. This displays a basic query against the raising schema Report. In Figure 1 installation
of ingestion demands the advertiser ABC to takes place at different times.
ABC asked Channel 1 marketers to submit the monitoring data to Short Schema Versions 1 on 2
April 2020. Set up a configuration tab to periodically enter the data every day. 2 months later, on 2 June
2020, ABC demanded the sending to the popular schema version of Channel 2 of its reports info. Thus a
configuration file has been created. One month after, on 3 July 2020, Video Channel 1 is used as broadcast
by ABC advertiser. The new channel was told to submit a report to the standard schema of Turn.
Nevertheless, as the video network has several distinct areas which remained non specified by
Turn's Schematic, edition 1, standard schematic has been created by introducing two additional columns
(pause case, case complete). Video channel 1 can be modified with these changes Ingesting data into the
standard Schema of Turn. ABC requests to configure a second configuration to absorb data on channel 3, one
month later, on 10 August 2020. This time, the configuration file was created because of the schema. The
current schema targets version 2. In this dynamic business scenario, the novel approach makes the intake
request very smooth. As the standard system changes, a conventional design setup can rarely be modified
[24]. After heterogeneous data is ingested, you can view across an over-network statement that gives a
worldwide viewpoint of broadcasting consumption meant for advertisers by merely querying the standard
schema.
4.1. Data ingestion
Data pipelines move oil data from platform and database sources to computational and business
intelligence (BI) tools from software-as-service (SaaS) data management systems. Now businesses may use a
range of sources to pick data analytics based on big data. They need access to all their analytical and
commercial intelligence sources to make better decisions. The data then will be ingested into the Hadoop
Cluster, which then will process the data as per requirements. Figure 2 is a schematic cluster drawing which
shows the architecture by the technique for loading data. Data from various file transfer protocol (FTP)
servers are received as well as inserted in the Hadoop collection that includes numerous personal computers
linked to the service in conjunction with the management of Hadoop software [25]. A data hub server is in
charge of monitoring the loading of data. The server generates a range of tasks to perform the loading task.
 ISSN: 2722-3221
Comput Sci Inf Technol, Vol. 3, No. 2, July 2022: 82-93
86
Some work is done on the datahub server; depending on the work nature, some work is carried out on the
Hadoop cluster. The datahub system also tracks and schedules pipeline function through the Hadoop and
ZooKeeper system communications.
Figure 2. Schematic cluster drawing
A Zookeeper cluster remains dispersed public synchronisation provision used for dispersed
submissions for this discussion. The event shows that a straightforward establishment of primitives knows
how to be used by distributed applications to implement higher-level synchronisation services such as
maintenance, and calling classes. It is developed uncomplicated to plan as well as utilises an information
prototype constructed according to well-known file system directory tree layout. It is operated in Java, and it
has Java and C bindings.
The professional individual should understand the elements of the technology like Hadoop,
MapReduce, ZooKeeper, and so forth and become acquainted with them for the debate. However, the
innovation herein must be understood that it is not restricted to the usage of these elements alone and in some
combination of some specific nature of these elements.
4.2. Datahub server
Figure 3 is a representation which shows the data hub cluster as a data packing generic innovation
platform. Extracts, transforms, and loads (ETL) files in every ingest process are taken through a pipeline to
the target. The tube has 5 linearly dependent jobs running one after the other. The data intake pipeline is
subject to a particular task for each job. The system creation has been tracked by a condition report residing
inside a perpetual medium along with a Hadoop cluster. A condition report can always be coordinated from
Kubernetes applications through some kind of networking link between handling huge server, as shown in
Figure 2 [26]. The tube is working sequentially. To complete its corresponding task, it can call on
MapReduce for each stage on the Hadoop cluster.
Figure 3. Data loading framework
The data loading process for the datahub server consists of 5 steps,
a) Transfer and procedure: This role runs mostly on the data hub network. This relates towards a document
or network progress report, which helps one can decide if and how the sources could be downloaded
into a locally owned and operated repository but instead transferred to the datahub client (ideally
uncompressed if required).
Comput Sci Inf Technol ISSN: 2722-3221 
AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary)
87
b) Checking safety: It is a MapReduce function, runs on a Hadoop cluster by a data hub application. It
once analyses the feedback documents and determines whether the response information is a legitimate
source of statistics. This moves correct response documents to the later step during the channel; then a
diagram decreases function at this time on the way to decrease the information analysing period
dramatically.
c) MR sign up job: a MapReduce position on Hadoop cluster that is driven by the Datahub server. It
initially reads both the files of new customers as well as the available statistics depository documents in
the destination. Subsequently, this one combines the two data points to be consumed for the following
task. Again, the map reduction job is used to analyses data efficiently.
d) Commitjob: This job is a simple wrap-up work on a Hadoop cluster driven by the data hub server. It
renames the previous work output folders of MapReduce into an output folder, whose contents are used
for input. It additionally revises channel status records toward how the stacking progresses.
e) Data Consumption: This work is a Hadoop cluster MapReduce work. It uses the entire connection
production after the preceding periods of the channel as well as incorporates the results incorporating
connection keen on target information documents.
4.3. Metadata management
The datahub repository offers a framework through computation to navigate input information to
both the objective. Through multiple pipelines examples, its documentation is examined to individual ingest
activities. In Figure 3 during the update as well as convert function, each configuration file is read in to
decide which FTP site to be accessed, how the passwords should be obtained, how documents should be
searched and how they should be uploaded to Hadoop cluster. Then, you have to verify its fitness, write data
input and then use that to search it. Throughout addition, a server will then be reviewed for analysis of
quantitative evidence to compliance with either the defined objective scheme [27]. The MapReduce career
entry appears throughout the archives but database for similar work. Then commitment role inspects the file
system to determine where even the information from either the prior employer was located throughout the
system.
The partition of curriculum and metadata benefits from clean-cut between program and metadata so
that the optimisation of programs and the modelling of workflows can be carried out independently and
generally. The discussion details the modelling of metadata during data intake. Several sections compose the
meta-data modelling. The first component schema modelling is used to build a database, and the second part
was modelling the system configuration ration where a configuration file is set up per ingestion mission.
4.3.1. Data cataloguing
Changing business needs often change in the standard schema in big data management. Schema
evolution is a must without code modification. Inventory configuration modifications to a goal schema use
the features of the schema. Metadata IDs: Such a pattern suggests an integral series resembling other
accessible schemes. For just about any column structure, a simple datatype becomes also allocated as ID. For,
e.g., Database IDs=1,2,3 implies that there have been 3 open.
ID. Name: The short name of the specified Database table is stored on this site.
ID.latestVersion: The last version of the ID-identified table is stored in this property. 1.latest version=3, for
example, means that the Schema has 3 editions, as well as the newest edition is 3.
ID.Storage.Hadoop Cluster: It stores the utter HDFS route to maintain the table file. ID.description: It stores
the ID-discovered versioned table schematic.
ID.Version. Default: A default value of this object is transferred to the versioned table schema defined by the
ID. E.g., 1.1.default=1.0 is the default value for the first column, and 0 is the second column's default value.
The structure herein shows the past of developing any schema with the properties mentioned.
Therefore, a record from a prior iteration of a system to a subsequent model of the same system may be
continuously created by checking the database. Suppose a database is usable in schema version K. The record
may be built to the same schema in two stages, but the K+1 version is the default value [28].
Step 1: The system will first create a default log using K+1, which is then instantiated the default record with
K+1 version's default values.
Step 2: The program would then scan the database for the type K variations from K+1 schemas edition,
which utilises edition K's information automatically towards overwriting the information in Phase 1.
avoidance database. When the column of Version K is equivalent to the column of Version K+1, a similar
copy would be rendered and, when possible, typecast. If a variant K column does not have this relationship,
the column would be removed. The new record created containing the K+1 edition of the schematic along
with one or the other edition K's before the nonappearance K+1 version values after the two steps are
completed.
 ISSN: 2722-3221
Comput Sci Inf Technol, Vol. 3, No. 2, July 2022: 82-93
88
4.3.2. Configurational file
The complexity of reconciling data sources is another problem for big data incorporation. Various
data providers are provided with different local schemas and hardware. It is necessary to deal with the
inherited heterogeneities precisely and efficiently to centralise all types of heterogeneous data and to manage
these data. The modifications include a framework which uses data ingestion configuration file setup to deal
with this issue [29], [30]. Specifically, schema mapping and various problems of lanes heterogeneity are
necessary to address. Like date format, and location of the FTP site.
The following software file properties are required to satisfy the criteria for, ingestion of function
and can quickly be extended: schema, schema version for destination: these two connections signify the
schema and version in which the source files are used. For, e.g., schemaID=1,schemaVersion=1 implies that
the ingestion of the most tasked source info.
Pattern Date: this property specifies the date format of the source data. For examples, date pattern=M/d/yyyy
implies that a particular date format is used for the source data, as is 06/04/2020 10:12.
Schema Mapping: this property describes the mapping amongst the schemes of the source statistics and the
schema of the objective. Examples, mapping=-1,4 implies that there is no corresponding column on the first
database root section throughout the goal database, as well as the Root Database, refers to the final endpoint
database section.
Separate Name: This property illustrates the Root data physical table partition. E.g., partition name =
Turn_44 implies in the named partition that the source files are swallowed into the Schema Turn_44.
Protocol: Its domain defines the file transfer protocol. For example, secure file transfer protocol (SFTP) can
be used to collect data from the data source using the secure FTP protocol.
UserID: This attribute specifies the identity of the user used to access the application in the database.
Password: This property specifies the keyword used to link to the registry.
Ingesting separate source code details into a single database the structure of the catalogue model is
version-based. As such, separate copies of data from the same system may be modified inside the data centre
quickly [31]. This novel approach models provide a different method of loading and querying.
The establishment of a single generalisation as well as an API to manage to understand is a
significant part of that method:
 Abstraction from records: record is a class (data structure) which stores 1-fold of a schema and a
combination of versions. The meaning list and a versioned schema are used. The data binary is stored in
the meaning sequence. For the data, the schema maintains the metadata.
 ConvertToLatestSchema(): The record class has a convertToLatestSchema() function. The current
highest remains transferred to the most recent edition of the Schematic documents when this function is
invoked.
Figure 4 is a database framework showing various configuration settings for instructing particular
user data in several implementations of almost the given format, with techniques between one chart reduction
feature consummated to both the diversity within each organisational information and knowledge's version
[31]. For a place of a series, where different Format Variants should be included in the virtual machine, that
documenting scripting language is being used to automatically accommodate the diversity. In Figure 4 client
1, client 2, several setup data through version 1, version 2 simplified schema. Version M is the mapping
feature of the MapReduce system is where these various sources of data reach each other.
The cluster supports the arrangement documents to stack into the HDFS file system and all the
various data sources. Then, work is started on MapReduce to add to the existing data in the universal
destination schema [32]. Specific client data may be applied to multiple implementations of the same schema
because of different initialisation period. Client 1 data will, for example, be configured to go to schema
version 1, and user 2 information can remain configured for schematic edition 2. To connect data to the
current data in the same system, a function is performed.
The MapReduce project is implemented that execute a combined job performance, that recognises
every new data and information results mostly on endpoint database, which contributes to reducing a
computation structure. The warning for handling the various data versions is to call convertToLatestSchema()
in mapper before anything else in every record. Such compliance means that the most current iteration of the
same system is eliminated [33].
In Figure 4 plotter 1 delivers the data from user 1, then that one is set to plan to the schema of
edition 1. It additionally follows the data of user 2 that is set to plan to the Schema edition 2. In the same
mapper, they meet each other. User 1 is parsed into a user 1 database by the InputFormat in the MapReduce
system, and the client2 data is parsed into a list. The method calls to convert to the latest schema ()to the last
iteration of the two different formats, and the data flow then flows from a mapper to a MapReduce frame
decreases [34].
Comput Sci Inf Technol ISSN: 2722-3221 
AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary)
89
At the query time, a particular data scenario could be used to flow through a Hadoop cluster with a
different implementation of the same schema. For example, on the Hadoop cluster, you want to concurrently
verify various data versions of the system [35]. This experience will again be used to transform multiple
iterations of data to the current edition. The machine in exemplarity of a system which undergoes of a system
which undergoes of a particular set of guidelines meant for the system to execute some methodologies as
shown in Figure 5.
Figure 4. Schematic Illustration for different configurational setups to instruct different clients
Figure 5. Schematic illustration for a machine in exemplarity of a system which undergoes of a system which
undergoes of a particular set of guidelines meant for the system to execute some methodologies
4.3.3. Summary of meta data ingestion
In instantaneous, the novel approach encompasses metadata-driven data intake technique to
integrate Hadoop MapReduce data sources massively, including the data hub server, collection, and setup
files which give the versatility required for the complex bigger ones, which constitute the currently favoured
embodiment World of computer convergence.
5. ADVANTAGES OF THE PROPOSED FRAMEWORK APPROACH
This novel approach embodiments are the method used to address the following questions (high
ranking methodologies for solving the challenge are included under every task) among others: MapReduce
jobs; Many FTP servers; Effective processing plan; Heterogeneity of contracts/key/transfer (different format
Protocol/FTP/design schemas/servers); Files for Settings; Catalogue; Fault resistance (the next package auto-
recovery); Store state loading on HDFS file named state pipeline. The pipeline is used to build a fault
tolerance protocol; File of status; Immunity (dirty data protection); Check for fitness before loading with a
safety test; Act on maps; Conflicting pipe-lines coordinate synchronisation; Use of Zookeeper's distributed
lock service; Using the HDFS storage status file; Evolution of Schema (support different variants of the same
 ISSN: 2722-3221
Comput Sci Inf Technol, Vol. 3, No. 2, July 2022: 82-93
90
schema for adapting different configurations of legacy intake); Meta-tracked versions of client schemas;
Catalogue of data repository.
In addition to its high efficiency, this ingestion method flexibility also occurs: It's as simple to add a
table as inserting any text lines; Catalog Catalogue; The development of a schema is so simple that a new
version is added in the Index structure; Enter various versions of customer data at once; User 1 uses
Schematic 1 Edition 1 to ingest documents; Client 2 uses schema 1 version 2 to ingest files; Client 3 uses
Schema 1 Version 3 to absorb data; Customers can modify their file schema; Mapping figuration without
computer change; The client may submit the order to add/delete/change; Growing destination schema fields
can be used; Catalogue Change; Handles with precision all manner of variability; Integration of data (date
configuration and the configuration of images).
6. CLAIMS OF THE NOVEL FRAMEWORK: AdMaP
Claim 1. System for automated data intake into a data warehouse. Makes use of a MapReduce environment
to ingest a variety of heterotopic data sources. Data warehousing schemas loads heterogeneous
data throughout the target schemas.
Claim 2. The claim 1 method whereby data sources are collected from a multitude of various media
channels on marketing data.
Claim 3. Network automated information ingest unit which involves: the database for the downloading of
data. Relational database Integrates diverse data from several databases, a standard pipelines
loading system. A multipurpose environment MapReduce Heterogeneous data sources; a
processorimplemented metadata model, consisting of several configuration files and a catalogue;
where a config file is installed by intake function: where this catalogue is handled via the data
warehouse schema: where the data storage server is performed with a programmable data loading
task; and which configurative environment.
Claim 4. Claim 3 apparatus, which controls and codes the pipeline jobs via the Hadoop assemble as well as
the Zookeeper.
Claim 5. Entitlement 4 device where respectively incorporation job accepts, transforms besides loads the
source files through a data-loading framework pipeline for the destination; The device of Claim 4.
While a pipeline status file monitors the progress of the pipeline data loader framework,
communication between that data hub server and the ZooKeeper server is used to sync the access
file for that pipeline status.
Claim 6. Claim 3, in which this pipeline data charge framework is running sequentially, requires
MapReduce jobs on that Hadoop cluster for most of the stage A corresponding task is to be
performed.
Claim 7. A way to consume info automatically in a system Warehouse comprising: provision for the data
loading tasks of a datahub server:
The generic data loading system provides the MapReduce environment to ingest a plurality of
heterogeneous sources and provides a meta-data model implemented by the processor that consists
of a multitude of configuration file and catalogue.
Where a config file is mounted per function of ingestion. Where the catalogue handles the data
centre architecture: when an expected operation is conducted to load the data. Such a web datahub
in which said configuration data and the catalogue collaboratively lead to the datahub server
mechanically and self-sufficiently of information foundation heterogeneity and data storage
schema development to load the heterogenous new data to their destination patterns; the datahub
server performs the same data loading duty by transferring and converting a job that works
happening that cluster.
Claim 8. Claim 8 contains a health check task, MapReduce (MR) task, work intake, job update, work turn
and commit, and the health check job and MR join the job and employment intake, each of which
involves a job MapReduce, powered by the datahub server as well as moving on top of a Hadoop
cluster.
Claim 9. The claim 8 technique is that the datahub server executes the data loading task by carrying out a
further step in health checking of the job; the datahub server drives MapReduce jobs to analyse
and once transform input files produced by that download, determining if a valid data source is an
input file and then transferring the valid input files for next task in the pipe.
Claim 10. The Claim 10 solution uses a MapReduce system to run that information-loading function,
interpret arrival customer data and also some highly appreciate factory data then link client records
or identifying the target warehouses data in a proposed method to generate a response.
Comput Sci Inf Technol ISSN: 2722-3221 
AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary)
91
Claim 11. The claim 11 method, which is the execution of that data-loading task by renaming previous job
output folders to a result directory with contents to be consumed by the ingestion job, is used as an
additional step to commit the work.
Claim 12. The argument 12 approaches are used to conduct this data loading function by carrying out the
other phase in the input of MapReduce research, utilising any add-ons generated from previous
pipeline stages and inserting the results into destination data files. The data reduction procedure is
focused on details.
Claim 13. A data hub server that consists of: a processor that uses the environment MapReduce to transmit
source data mining; said meta-data consultation framework for various instances of a pipeline for
ingesting tasks; in which meta-data modelling involves schema modelling by catalogue during
data ingestion and consumer configuration modelling through job ingestion.
Claim 14. The Claims 14 datahub server has mentioned catalogue supporting the evolution of schemas by
modelling the schema using any of the following schematic properties, without changing
framework code. A property with an integer sequence of all schemas accessible that gives a
specific integer identity (ID) for. Schema; an asset that provisions a table ID expressive name: a
property which stores a recent ID-identified version of a table; property stores an utter route to the
Hadoop (HDFS) file system, where the ID table is saved; a property which saves a versioned table
identification schema notified by ID; A stuff that stores are defaulting standards for the ID table
schema: wherein each schema documents the history of the evolution of these properties; and
Whereby the record can be evolved dynamically by consulting such a catalogue from a previous
variety of an assumed plan to an advanced form of the identical.
Claim 15. The information jump client of statement 15, where registration has been created in a model, from
that to the second, is to create a standard log for the same framework, while using the same similar
model where even the standard log is executed for the second iteration, to default parameters; In
any case, if there wasn't such a relationship, the columns of being the first system will be lowered
when fresh logging containing information of the very first model or even the scheme of a
subsequent one is generated.
Claim 16. Claim 14 data sub server where the configuration ration file is set up for schema mapping and
other heterogeneity problems utilising the data ingestion task: a total of two properties which
identify the schema and version of which the source files are supplied; a property which identifies
the source date format. A property identify maps between the source Scheme data and the
destination scheme: a property specifying the physical table division of the source data;a property
defining the protocol for file transfer; -a property that defines the username user name A server
data source and a password identification property used to log onto the server of the data source.
Claim 17. The Claim 14 which a record is indeed a category, a database schema, the dictionary processing of
the provided scheme, as well as the form mix of that record comprising a value array and the
version scheme: where even the label range comprises a transmitted number, as well as the
scheme, retains the location information; and where the schema is The database server provides a
records abstraction method for reconciliations versions;
Claim 18. The repository of Claim 18 data abstraction includes a feature that transforms the existing record
when invoked By consulting the catalogue for the latest version of the current schema of records.
Claim 19. A phase that includes: providing a trigger system for a processor MapReduce environment ageing
to a route data source to the purpose. It said framework metadata consulting for different purposes
pipeline instances to carry out tasks of ingestion. During data ingestion com meta-data modelling
Take a collection to create destination systems and Modeling client configuration by task intake by
a Folder for setup.
Claim 20. Statement 20 method assisted Schema creation through the usage of any of the following schema
characteristics without modifying frame codes through mapping the destination schema:a property
with an integer array that represents all available schemas and assigns a distinctive figure as its
identity (ID) used for each table schema; a property that holds a table identity descriptive name.
Created by ID: a property that contains the new table edition. The ID: property that saves an
absolute path of the Hadoop File System (HDFS) where an identity table is being stored; a
property that saves a versioned ID-identified table scheme; and a property that save default values
on a versioned IDdefined table schema: in which properties document changing history and a
dynamic scheme can be generated A later variant of th
Claim 21. The claim 21 process in which record evolution from first to second versions into the same schema
includes. Creating the default record of a schema with the same second edition, which instantiates
the default record of a second version in the same system; then looking for a database to locate
discrepancies between the schemas in the first update and that of the second version by using the
data of the first to substitute the details of that default, and if there is a connection Whereas the
 ISSN: 2722-3221
Comput Sci Inf Technol, Vol. 3, No. 2, July 2022: 82-93
92
column of the first version is deleted if no such correspondence exists, and where a new record is
created that contains the schema of the second version with the data of that first version or This
second version's default value.
Claim 22. The method of claim 20, whereby the configuration file is configured to fix diagram plotting than
other conglomeration subjects with the subsequent features per data ingestion job: no or more
features within which schema or version source files are used; a feature which specifies a date
format in which the sources data are used; a feature that determines maps between the source data
schema properties to which the division for just a graphical column helps to define the source
information; a product that identified a mechanism for transferring files; a product that identifies
the user id that users log in to a computer for the data source
Claim 23. In the case of documents, the layout, the storage for one double of a given schema and a defined
version combination and the data archives, containing a meaning list and a versioned schema, the
procedure for claim 20 requires more information: a processing abstraction facility for version
reconciliation. Schema maintains meta-data in which the diagram is a version of a schema given in
the catalogue in memory.
Claim 24. Claim 24 also involves including a feature that invokes the new iteration of the current record
structure, converting the old record to the latest version.
Claim 25. A mechanism for data input to data storage that includes: a datahub server to monitor one or more
files to load heterogeneous data sources onto a Hadoop (HDFS) file system. The server to start a
MapReduce task to add all heterogeneous data sources with current data in the standard
destination.
Claim 26. An automated data input device in a data warehouse that includes one or more configuration files
on the datacube server. Loading a plurality of heterogeneous database in Hadoop (HDFS); said
server introducing a MapReduce vacancy to reach others. The heterogeneous databases with
existing ones in the standard schema and the data hub server, which carries out a joint task of
connecting the user data to existing data of the same schema by launching a MapReduce.
Claim 27. As we have been finding, significant data policy is cost-effective, analytical, versatile, plug-in and
personalised technology bundles. Entities that are joining the field of big data have discovered it is
not only a trend towards life but a voyage. Big data creates an open area of unparalleled problems
involving rational and empirical use of innovations powered by evidence.
Claim 28. Data-incidents through reference to storage were accessed. Activity intake becomes extremely
quick since it is a memory-based process. A tragedy like some kind of collapse of investigators or
even equipment failures can lead to a loss of data, though, because as detected improvements were
essentially unpredictable. Essential business operations are not a safe option, but storage interface
records may indeed be set.
7. CONCLUSION ANF FUTUTRE SCOPE
The novel approach provides a general approach for the automatic intake of data into an HDFS
datagram. Included in the modification process are a data hub, a generic data loading frame and a metadata
model to address the efficiency of data load, heterogeneity of data sources and evolution of warehouse
schemas. The MapReduce generic loading framework is datahub charging efficient framework. The meta-
data architecture comprises of directories and a database. The configurations file is designed for each
mission. The database administers the schema for the data centre. When a planned information loading
function is done, and the datahub setup and catalogues are shared to attach the heterogeneous data to their
schemas dynamically.
The framework can further be modified and can be used in various disciplines as per the
requirements. The Hadoop cluster can provide an insight of the user experience and it would be helpful for
thew advertising team. The results gained can be used to improve the user experience. The implementation of
data lake and data warehousing in various disciplines will lead to ease in handling of various repositories of
data and its diverse variety.
REFERENCES
[1] B. Nichifor, “Theoretical framework of advertising-some insights,” Stud. Sci. Res. Econ. Ed., no. 19, 2014.
[2] A. T. Kenyon and J. Liberman, “Controlling cross-border tobacco: advertising, promotion and sponsorship - implementing the
FCTC,” SSRN Electron. J., no. 161, 2006.
[3] F. Brassington and S. Pettitt, Principles of marketing. Pearson education, 2006.
[4] T. M. Barry, “The development of the hierarchy of effects: An historical perspective,” Curr. Issues Res. Advert., vol. 10, no. 1–2,
pp. 251–295, 1987.
[5] M. P. Gardner, “Mood states and consumer behavior: a critical review,” J. Consum. Res., vol. 12, no. 3, p. 281, Dec. 1985.
Comput Sci Inf Technol ISSN: 2722-3221 
AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary)
93
[6] G. Belch and M. Belch, “Advertising and promotion: an integrated marketing communications perspective,” New York McGraw-
Hil l, p. 864, 2011.
[7] J. B. Cooper and D. N. Singer, “The Role of Emotion in Prejudice,” J. Soc. Psychol., vol. 44, no. 2, pp. 241–247, Nov. 1956.
[8] E.-J. Lee and D. W. Schumann, “Explaining the special case of incongruity in advertising: combining classic theoretical
approaches,” Mark. Theory, vol. 4, no. 1–2, pp. 59–90, Jun. 2004.
[9] X. Nan and R. J. Faber, “Advertising theory: reconceptualizing the building blocks,” Mark. Theory, vol. 4, no. 1–2, pp. 7–30, Jun.
2004.
[10] R. E. Petty, J. T. Cacioppo, and D. Schumann, “Central and peripheral routes to advertising effectiveness: the moderating tole of
involvement,” J. Consum. Res., vol. 10, no. 2, p. 135, Sep. 1983.
[11] I. C. Popescu, “Comunicarea în marketing : concepte, tehnici, strategii,” a II-a, Ed. Uranus, Bucuresti, p. 271, 2003.
[12] R. E. Smith and X. Yang, “Toward a general theory of creativity in advertising: examining the role of divergence,” Mark. Theory,
vol. 4, no. 1–2, pp. 31–58, Jun. 2004.
[13] E. Thorson and J. Moore, Integrated communication: swynergy of persuasive voices. Psychology Press, 1996.
[14] D. Vakratsas and T. Ambler, “How advertising works: what do we really know?,” J. Mark., vol. 63, no. 1, pp. 26–43, Jan. 1999.
[15] R. Vaughan, “How advertising works: a planning model. .. putting it all together,” Advert. & Soc. Rev., vol. 1, no. 1, 2000.
[16] W. M. Weilbacher, “Point of view: does advertising cause a ‘hierarchy of effects’?,” J. Advert. Res., vol. 41, no. 6, pp. 19–26,
Nov. 2001.
[17] M. Dayalan, “MapReduce: simplified data processing on large cluster,” Int. J. Res. Eng., vol. 5, no. 5, pp. 399–403, Apr. 2018.
[18] D. Borthakur et al., “Apache hadoop goes realtime at Facebook,” in Proceedings of the 2011 international conference on
Management of data - SIGMOD ’11, 2011, p. 1071.
[19] D. Jiang, B. C. Ooi, L. Shi, and S. Wu, “The performance of mapreduce: An in-depth study,” Proc. VLDB Endow., vol. 3, no. 1–
2, pp. 472–483, 2010.
[20] A. Pavlo et al., “A comparison of approaches to large-scale data analysis,” in Proceedings of the 2009 ACM SIGMOD
International Conference on Management of data, 2009, pp. 165–178.
[21] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” ACM SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 29–43,
Dec. 2003.
[22] B. Li, E. Mazur, Y. Diao, A. McGregor, and P. Shenoy, “SCALLA: A platform for scalable one-pass analytics using
MapReduce,” ACM Trans. Database Syst., vol. 37, no. 4, pp. 1–43, Dec. 2012.
[23] D. Logothetis, C. Trezzo, K. C. Webb, and K. Yocum, “In-situ MapReduce for log processing,” in Proceedings of the 2011
USENIX Annual Technical Conference, USENIX ATC 2011, 2019, pp. 115–129.
[24] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, “Delay scheduling: a simple technique for
achieving locality and fairness in cluster scheduling,” in EuroSys’10 - Proceedings of the EuroSys 2010 Conference, 2010, pp.
265–278.
[25] N. Backman, K. Pattabiraman, R. Fonseca, and U. Çetintemel, “C-MR: continuously executing MapReduce workflows on multi-
core processors,” in MapReduce ’12 - 3rd International Workshop on MapReduce and Its Applications, 2012, pp. 1–8.
[26] R. Kienzler, R. Bruggmann, A. Ranganathan, and N. Tatbul, “Stream as you Go: the case for incremental data access and
processing in the cloud,” in 2012 IEEE 28th International Conference on Data Engineering Workshops, 2012, pp. 159–166.
[27] T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears, “MapReduce online.,” in Nsdi, 2010, vol. 10, no.
4, p. 20.
[28] A. Verma, B. Cho, N. Zea, I. Gupta, and R. H. Campbell, “Breaking the MapReduce stage barrier,” Cluster Comput., vol. 16, no.
1, pp. 191–206, Mar. 2013.
[29] M. Elteir, H. Lin, and W. Feng, “Enhancing MapReduce via asynchronous data processing,” in 2010 IEEE 16th International
Conference on Parallel and Distributed Systems, 2010, pp. 397–405.
[30] M. Burrows, “The Chubby lock service for loosely-coupled distributed systems,” in OSDI 2006 - 7th USENIX Symposium on
Operating Systems Design and Implementation, 2006, pp. 335–350.
[31] F. Chang et al., “BigTable: a distributed storage system for structured data,” OSDI 2006 - 7th USENIX Symp. Oper. Syst. Des.
Implement., vol. 26, no. 2, pp. 205–218, 2006.
[32] R. Vernica, A. Balmin, K. S. Beyer, and V. Ercegovac, “Adaptive MapReduce using situation-aware mappers,” in Proceedings of
the 15th International Conference on Extending Database Technology - EDBT ’12, 2012, p. 420.
[33] Z. Guo, G. Fox, and M. Zhou, “Investigation of data locality and fairness in MapReduce,” in Proceedings of third international
workshop on MapReduce and its Applications Date - MapReduce ’12, 2012, p. 25.
[34] M. Hammoud and M. F. Sakr, “Locality-aware reduce task scheduling for mapreduce,” in Proceedings - 2011 3rd IEEE
International Conference on Cloud Computing Technology and Science, CloudCom 2011, 2011, pp. 570–576.
[35] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” in 2010 IEEE 26th Symposium on Mass
Storage Systems and Technologies (MSST), 2010, pp. 1–10.

More Related Content

Similar to AdMap: a framework for advertising using MapReduce pipeline

A New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing
A New Multi-Dimensional Hyperbolic Structure for Cloud Service IndexingA New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing
A New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexingijdms
 
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...IRJET Journal
 
A CASE STUDY OF INNOVATION OF AN INFORMATION COMMUNICATION SYSTEM AND UPGRADE...
A CASE STUDY OF INNOVATION OF AN INFORMATION COMMUNICATION SYSTEM AND UPGRADE...A CASE STUDY OF INNOVATION OF AN INFORMATION COMMUNICATION SYSTEM AND UPGRADE...
A CASE STUDY OF INNOVATION OF AN INFORMATION COMMUNICATION SYSTEM AND UPGRADE...ijaia
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataeSAT Publishing House
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challengesijcisjournal
 
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...IIRindia
 
Efficient Cost Minimization for Big Data Processing
Efficient Cost Minimization for Big Data ProcessingEfficient Cost Minimization for Big Data Processing
Efficient Cost Minimization for Big Data ProcessingIRJET Journal
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutionsaciijournal
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutionsaciijournal
 
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONSBIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONSaciijournal
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutionsaciijournal
 
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...IRJET Journal
 
A Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving EnvironmentA Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving EnvironmentSheila Sinclair
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewIRJET Journal
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...IJECEIAES
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computinghuda2018
 
Improving Service Recommendation Method on Map reduce by User Preferences and...
Improving Service Recommendation Method on Map reduce by User Preferences and...Improving Service Recommendation Method on Map reduce by User Preferences and...
Improving Service Recommendation Method on Map reduce by User Preferences and...paperpublications3
 

Similar to AdMap: a framework for advertising using MapReduce pipeline (20)

[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal
[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal
[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal
 
A New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing
A New Multi-Dimensional Hyperbolic Structure for Cloud Service IndexingA New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing
A New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing
 
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...
 
A CASE STUDY OF INNOVATION OF AN INFORMATION COMMUNICATION SYSTEM AND UPGRADE...
A CASE STUDY OF INNOVATION OF AN INFORMATION COMMUNICATION SYSTEM AND UPGRADE...A CASE STUDY OF INNOVATION OF AN INFORMATION COMMUNICATION SYSTEM AND UPGRADE...
A CASE STUDY OF INNOVATION OF AN INFORMATION COMMUNICATION SYSTEM AND UPGRADE...
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challenges
 
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
 
Efficient Cost Minimization for Big Data Processing
Efficient Cost Minimization for Big Data ProcessingEfficient Cost Minimization for Big Data Processing
Efficient Cost Minimization for Big Data Processing
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONSBIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
 
A Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving EnvironmentA Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving Environment
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
 
E018142329
E018142329E018142329
E018142329
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
 
Ck34520526
Ck34520526Ck34520526
Ck34520526
 
Improving Service Recommendation Method on Map reduce by User Preferences and...
Improving Service Recommendation Method on Map reduce by User Preferences and...Improving Service Recommendation Method on Map reduce by User Preferences and...
Improving Service Recommendation Method on Map reduce by User Preferences and...
 

More from CSITiaesprime

Hybrid model for detection of brain tumor using convolution neural networks
Hybrid model for detection of brain tumor using convolution neural networksHybrid model for detection of brain tumor using convolution neural networks
Hybrid model for detection of brain tumor using convolution neural networksCSITiaesprime
 
Implementing lee's model to apply fuzzy time series in forecasting bitcoin price
Implementing lee's model to apply fuzzy time series in forecasting bitcoin priceImplementing lee's model to apply fuzzy time series in forecasting bitcoin price
Implementing lee's model to apply fuzzy time series in forecasting bitcoin priceCSITiaesprime
 
Sentiment analysis of online licensing service quality in the energy and mine...
Sentiment analysis of online licensing service quality in the energy and mine...Sentiment analysis of online licensing service quality in the energy and mine...
Sentiment analysis of online licensing service quality in the energy and mine...CSITiaesprime
 
Trends in sentiment of Twitter users towards Indonesian tourism: analysis wit...
Trends in sentiment of Twitter users towards Indonesian tourism: analysis wit...Trends in sentiment of Twitter users towards Indonesian tourism: analysis wit...
Trends in sentiment of Twitter users towards Indonesian tourism: analysis wit...CSITiaesprime
 
The impact of usability in information technology projects
The impact of usability in information technology projectsThe impact of usability in information technology projects
The impact of usability in information technology projectsCSITiaesprime
 
Collecting and analyzing network-based evidence
Collecting and analyzing network-based evidenceCollecting and analyzing network-based evidence
Collecting and analyzing network-based evidenceCSITiaesprime
 
Agile adoption challenges in insurance: a systematic literature and expert re...
Agile adoption challenges in insurance: a systematic literature and expert re...Agile adoption challenges in insurance: a systematic literature and expert re...
Agile adoption challenges in insurance: a systematic literature and expert re...CSITiaesprime
 
Exploring network security threats through text mining techniques: a comprehe...
Exploring network security threats through text mining techniques: a comprehe...Exploring network security threats through text mining techniques: a comprehe...
Exploring network security threats through text mining techniques: a comprehe...CSITiaesprime
 
An LSTM-based prediction model for gradient-descending optimization in virtua...
An LSTM-based prediction model for gradient-descending optimization in virtua...An LSTM-based prediction model for gradient-descending optimization in virtua...
An LSTM-based prediction model for gradient-descending optimization in virtua...CSITiaesprime
 
Generalization of linear and non-linear support vector machine in multiple fi...
Generalization of linear and non-linear support vector machine in multiple fi...Generalization of linear and non-linear support vector machine in multiple fi...
Generalization of linear and non-linear support vector machine in multiple fi...CSITiaesprime
 
Designing a framework for blockchain-based e-voting system for Libya
Designing a framework for blockchain-based e-voting system for LibyaDesigning a framework for blockchain-based e-voting system for Libya
Designing a framework for blockchain-based e-voting system for LibyaCSITiaesprime
 
Development reference model to build management reporter using dynamics great...
Development reference model to build management reporter using dynamics great...Development reference model to build management reporter using dynamics great...
Development reference model to build management reporter using dynamics great...CSITiaesprime
 
Social media and optimization of the promotion of Lake Toba tourism destinati...
Social media and optimization of the promotion of Lake Toba tourism destinati...Social media and optimization of the promotion of Lake Toba tourism destinati...
Social media and optimization of the promotion of Lake Toba tourism destinati...CSITiaesprime
 
Caring jacket: health monitoring jacket integrated with the internet of thing...
Caring jacket: health monitoring jacket integrated with the internet of thing...Caring jacket: health monitoring jacket integrated with the internet of thing...
Caring jacket: health monitoring jacket integrated with the internet of thing...CSITiaesprime
 
Net impact implementation application development life-cycle management in ba...
Net impact implementation application development life-cycle management in ba...Net impact implementation application development life-cycle management in ba...
Net impact implementation application development life-cycle management in ba...CSITiaesprime
 
Circularly polarized metamaterial Antenna in energy harvesting wearable commu...
Circularly polarized metamaterial Antenna in energy harvesting wearable commu...Circularly polarized metamaterial Antenna in energy harvesting wearable commu...
Circularly polarized metamaterial Antenna in energy harvesting wearable commu...CSITiaesprime
 
An ensemble approach for the identification and classification of crime tweet...
An ensemble approach for the identification and classification of crime tweet...An ensemble approach for the identification and classification of crime tweet...
An ensemble approach for the identification and classification of crime tweet...CSITiaesprime
 
National archival platform system design using the microservice-based service...
National archival platform system design using the microservice-based service...National archival platform system design using the microservice-based service...
National archival platform system design using the microservice-based service...CSITiaesprime
 
Antispoofing in face biometrics: A comprehensive study on software-based tech...
Antispoofing in face biometrics: A comprehensive study on software-based tech...Antispoofing in face biometrics: A comprehensive study on software-based tech...
Antispoofing in face biometrics: A comprehensive study on software-based tech...CSITiaesprime
 
The antecedent e-government quality for public behaviour intention, and exten...
The antecedent e-government quality for public behaviour intention, and exten...The antecedent e-government quality for public behaviour intention, and exten...
The antecedent e-government quality for public behaviour intention, and exten...CSITiaesprime
 

More from CSITiaesprime (20)

Hybrid model for detection of brain tumor using convolution neural networks
Hybrid model for detection of brain tumor using convolution neural networksHybrid model for detection of brain tumor using convolution neural networks
Hybrid model for detection of brain tumor using convolution neural networks
 
Implementing lee's model to apply fuzzy time series in forecasting bitcoin price
Implementing lee's model to apply fuzzy time series in forecasting bitcoin priceImplementing lee's model to apply fuzzy time series in forecasting bitcoin price
Implementing lee's model to apply fuzzy time series in forecasting bitcoin price
 
Sentiment analysis of online licensing service quality in the energy and mine...
Sentiment analysis of online licensing service quality in the energy and mine...Sentiment analysis of online licensing service quality in the energy and mine...
Sentiment analysis of online licensing service quality in the energy and mine...
 
Trends in sentiment of Twitter users towards Indonesian tourism: analysis wit...
Trends in sentiment of Twitter users towards Indonesian tourism: analysis wit...Trends in sentiment of Twitter users towards Indonesian tourism: analysis wit...
Trends in sentiment of Twitter users towards Indonesian tourism: analysis wit...
 
The impact of usability in information technology projects
The impact of usability in information technology projectsThe impact of usability in information technology projects
The impact of usability in information technology projects
 
Collecting and analyzing network-based evidence
Collecting and analyzing network-based evidenceCollecting and analyzing network-based evidence
Collecting and analyzing network-based evidence
 
Agile adoption challenges in insurance: a systematic literature and expert re...
Agile adoption challenges in insurance: a systematic literature and expert re...Agile adoption challenges in insurance: a systematic literature and expert re...
Agile adoption challenges in insurance: a systematic literature and expert re...
 
Exploring network security threats through text mining techniques: a comprehe...
Exploring network security threats through text mining techniques: a comprehe...Exploring network security threats through text mining techniques: a comprehe...
Exploring network security threats through text mining techniques: a comprehe...
 
An LSTM-based prediction model for gradient-descending optimization in virtua...
An LSTM-based prediction model for gradient-descending optimization in virtua...An LSTM-based prediction model for gradient-descending optimization in virtua...
An LSTM-based prediction model for gradient-descending optimization in virtua...
 
Generalization of linear and non-linear support vector machine in multiple fi...
Generalization of linear and non-linear support vector machine in multiple fi...Generalization of linear and non-linear support vector machine in multiple fi...
Generalization of linear and non-linear support vector machine in multiple fi...
 
Designing a framework for blockchain-based e-voting system for Libya
Designing a framework for blockchain-based e-voting system for LibyaDesigning a framework for blockchain-based e-voting system for Libya
Designing a framework for blockchain-based e-voting system for Libya
 
Development reference model to build management reporter using dynamics great...
Development reference model to build management reporter using dynamics great...Development reference model to build management reporter using dynamics great...
Development reference model to build management reporter using dynamics great...
 
Social media and optimization of the promotion of Lake Toba tourism destinati...
Social media and optimization of the promotion of Lake Toba tourism destinati...Social media and optimization of the promotion of Lake Toba tourism destinati...
Social media and optimization of the promotion of Lake Toba tourism destinati...
 
Caring jacket: health monitoring jacket integrated with the internet of thing...
Caring jacket: health monitoring jacket integrated with the internet of thing...Caring jacket: health monitoring jacket integrated with the internet of thing...
Caring jacket: health monitoring jacket integrated with the internet of thing...
 
Net impact implementation application development life-cycle management in ba...
Net impact implementation application development life-cycle management in ba...Net impact implementation application development life-cycle management in ba...
Net impact implementation application development life-cycle management in ba...
 
Circularly polarized metamaterial Antenna in energy harvesting wearable commu...
Circularly polarized metamaterial Antenna in energy harvesting wearable commu...Circularly polarized metamaterial Antenna in energy harvesting wearable commu...
Circularly polarized metamaterial Antenna in energy harvesting wearable commu...
 
An ensemble approach for the identification and classification of crime tweet...
An ensemble approach for the identification and classification of crime tweet...An ensemble approach for the identification and classification of crime tweet...
An ensemble approach for the identification and classification of crime tweet...
 
National archival platform system design using the microservice-based service...
National archival platform system design using the microservice-based service...National archival platform system design using the microservice-based service...
National archival platform system design using the microservice-based service...
 
Antispoofing in face biometrics: A comprehensive study on software-based tech...
Antispoofing in face biometrics: A comprehensive study on software-based tech...Antispoofing in face biometrics: A comprehensive study on software-based tech...
Antispoofing in face biometrics: A comprehensive study on software-based tech...
 
The antecedent e-government quality for public behaviour intention, and exten...
The antecedent e-government quality for public behaviour intention, and exten...The antecedent e-government quality for public behaviour intention, and exten...
The antecedent e-government quality for public behaviour intention, and exten...
 

Recently uploaded

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Recently uploaded (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

AdMap: a framework for advertising using MapReduce pipeline

  • 1. Computer Science and Information Technologies Vol. 3, No. 2, July 2022, pp. 82~93 ISSN: 2722-3221, DOI: 10.11591/csit.v3i2.pp82-93  82 Journal homepage: http://iaesprime.com/index.php/csit AdMap: a framework for advertising using MapReduce pipeline Abhay Chaudhary1 , K R Batwada2 , Namita Mittal2 , Emmanuel S. Pilli2 1 Department of Computer Science and Engineering, Vellore Institute of Technology-AP, Andhra Pradesh, India 2 Department of Computer Science and Engineering, Malaviya National Institute of Technology, Jaipur, India Article Info ABSTRACT Article history: Received Mar 5, 2021 Revised May 29, 2022 Accepted Jun 12, 2022 There is a vast collection of data for consumers due to tremendous development in digital marketing. For their ads or for consumers to validate nearby services which already are upgraded to the dataset systems, consumers are more concerned with the amount of data. Hence there is a void formed between the producer and the client. To fill that void, there is the need for a framework which can facilitate all the needs for query updating of the data. The present systems have some shortcomings by a vast number of information that each time lead to decision tree-based approach. A systematic solution to the automated incorporation of data into a Hadoop distributed file system (HDFS) warehouse (Hadoop file system) includes a data hub server, a generic data charging mechanism and a metadata model. In our model framework, the database would be able to govern the data processing schema. In the future, as a variety of data is archived, the datalake will play a critical role in managing that data. To order to carry out a planned loading function, the setup files immense catalogue move the datahub server together to attach the miscellaneous details dynamically to its schemas. Keywords: Advertising Advertising and publishing Data lake Data warehouse Hadoop distributed file system Map reduce This is an open access article under the CC BY-SA license. Corresponding Author: Abhay Chaudhary Department of Computer Science and Engineering, Vellore Institute of Technology Amaravati, Andhra Pradesh, India Email: abhaychaudharydps@gmail.com 1. INTRODUCTION Advertising can be called a convergent feature represented in two essential fields of science: interaction and sales. From the scholarly and realistic roots, an advertisement was tackled both as a form of networking for those interested in contemporary economic practices, and in addressing the communication issues of separate organisations, for example, the newspaper. In general, advertisement and networking are elements of modern social and economic processes [1], [2]. In culture today, advertisement has developed into a dynamic communication mechanism that is important for the public as well as organisations. Over time, messaging has become an integral aspect of marketing programmes, with the potential to submit letter precisely planned for its goals. During their goods and services in the global markets, various enterprises, beginning with multinational and local companies, assign growing significance to the advertisement [3]. Consumers have learned to use commercial data in their purchasing decisions in working consumer economies [4], [5]. Besides broad scope text and data analysis, MapReduce is indeed an integer programming model. Conventional MapReduce implementation, for instance, its Hadoop, needs to access the entire collection into some kind of cluster even before evaluation is performed [6]. In situations whereby content is massive, data will not be enabled but stored again, and again-for example of logs, safety reports even protected messages-
  • 2. Comput Sci Inf Technol ISSN: 2722-3221  AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary) 83 this will lead to significant congestion. We suggest an approach to the data pipeline to mask data latency in the study of MapReduce. Our Hadoop MapReduce-based implementation is utterly transparent to the user. The distributed competitor queue includes the allocation and syncing of data blocks such that data upload and implementation overlap. This article addresses main concerns: the specific limit of mapping, as well as the configurable interactive amount of mapping, facilitate unidentified data entry. We also use a delay scheduler for data pipeline data localisation. The test of the solution for multiple real-world data sets implementations reveals that our methodology demonstrates efficiency improvements [7], [8]. In business and academia, the MapReduce method has significant advantages. It offers a basic programming paradigm with optimised fault tolerance and parallel synchronisation for large-scale data processing [9]. The previous design of MapReduce follows a hierarchical data format with huge volumes reflected throughout hierarchical files. The framework maps its activities then decreases them which fit for regional data sources. Clickstream log analysis should be used by programmes that can benefit from a systematic approach to data loading. The algorithms are searched linearly, page by page, to allow streaming approaches to load data. In MapReduce, we recommend an approach to data pipelines. As the logical unit of the stream, the databank uses the storage node [10]. That is, once a block is fully uploaded, it shall begin processing. Map tasks are begun, providing overlapped execution and the data up load before the final upload process is complete. In specific, to coordinate the distributed file system, we suggest adding a distributed competition queue and task elimination for MapReduce. The filesystem is a creator and generates block metadata on the queue, while MapReduce employees are a customer and shall be alerted when block metadata are available for them. Standard MapReduce practises data from the input and begins a sequence of map activities dependent on the number of input separates.  A pipeline-coordinating architecture for data collection  Processing with MapReduce.  Two types of work visualisation were vague data processing of location understanding or avoidance of error.  An apparent client data pipeline execution that does not require revisions to the current code.  Experimental findings showed better results. In the science world, prototype matching strategies are omnipresent. Matching techniques have been applied in an array of research disciplines ranging as of medical electrocardiogram evaluation to gene expression in genetics, communications signature identification, geoscience seismic signals processing and computer vision photo-tracking and recognition [11]. Model matching is a conventional approach since it is accurate, simple to read, and can be measured or prototyped quickly. To evaluate correlations between two objects, which are single or multi-dimensional signals, such as time series of images, the basic definition of matching models is there. There are different methods to measure resemblance and including total variance. In this paper, we will be discussing the background of the Advertising field and its heterogeneous relationship with data lake. Further, the paper talks about the related work done in the field of advertising and overall review of the novel approach towards advertising and MapReduce framework. The last section summarises the work present in this paper covers the future scope in detail. 2. BACKGROUND Advertising is one of the methods to sell goods to customers. Many ads are available selling attractive advertising packages. In revealing the positive and the poor to the public, the media plays a significant role [12], [13]. Local media have been Internationalised with the proliferation of emerging technology and public demand. The technique concerns the efficient loading of heterogeneous data sources to an ever-evolving data warehouse. More precisely, innovation involves the usage of a MapReduce system for meta-data driven data ingestion. Usually, a data store is designed on the highest of an accessible collection framework, for instance, Hadoop in the broad data sector [14]. Hadoop is a distributed, open-source computer environment that uses MapReduce. MapReduce is a system for addressing massive data sets with a massive range of very distributed problems. Many machines, or grids (if nodes are distinct in hardware), are collectively called the cluster (when all nodes are prepared with the same hardware). Program processing may happen on either an unstructured or a structured file system database. In Figure 1, the practical schema 2 takes the sequence of operations carried out by average internet user and all the metadata collected by the user into account. Chart step: The network layer splits your feedback through minor annoyances then allocates everything to either the branches of the team. An employee's path that contributes to something like a layout of a network in several sections is being done once more. The employee cluster deals more with a larger problem, then gives itself the primary cluster.
  • 3.  ISSN: 2722-3221 Comput Sci Inf Technol, Vol. 3, No. 2, July 2022: 82-93 84 Lower step: The master node then gathers answers to all the subproblems and integrates them to form the performance in some manner, i.e. the solution to the question that it wanted to solve. Figure 1. Advertising ecosystem for heterogeneous data MapReduce enables map analysis and reduction operations to be spread. While increasing routing procedure can be done simultaneously, independently of specific processing procedures, throughout the fact that amount with independent data sources as well as the number of CPUs near the increasing source is limited [15]. Similarly, a variety of reduction stages can indeed be carried through given that only certain operations and in the chart were concurrently displayed to the very same absorber with much the same button. MapReduce may be extended to far bigger databases than generic servers. However, this method can sometimes seem to have been inefficient in comparison to more sequential algorithms. Therefore, massive server farm will use MapReduce in just a few hours to sort a petabyte of results. Parallelism also allows us to recover from a partial server failure or storage during such kind of operation if one work can be changed as long as the input data is available, only if mapper or reducer fails [16]. A distributed, flexible, and versatile file scheme transcribed in Java designed for the Hadoop project function is the Hadoop file system (HDFS). Usually, every Hadoop node has one data node; the HDFS cluster is a cluster of knowledge nodes. The condition is reasonable since not every node needs the existence of a data node. A block protocol specific to HDFS is used for every single statistics node to serve chunks of information across the system [17]. The file system is communicated using the transmission control protocol/internet protocol (TCP/IP) layer. Customers are using remote procedure call (RPC) for intercommunication. HDFS holds huge files on many computers (the optimal file size is 64 MB). Reliability is accomplished by multiple host replication of the data, such that redundant array of independent disks (RAID) storage on hosts is not necessary. Data on three nodes are stored with the default replication value, i.e. 3; two on one stack as well as any on an additional stack. Information nodes will converse with each other re-equilibrate information, push versions all around and maintain high data replication [18]. It is a practical examination of how to effectively load heterogeneous data sources into a data centre for everchanging schemas. 3. RELATED WORK Exploration has investigated pipeline/data running for data latency reduction. Common metadata repository (CMR) corresponds downloading and retrieval of data by storage of input data in a memory buffer. C-MR is restricted to operate on a single computer and thus unfit for large data sets. Too, C-MR needs an application programming interfaces (API) that eliminates a user's need for a new API. Claudia et al. to overlap data transfer and process, implement a stream processing engine. This method does not work with the programming model MapReduce and does not tolerate job failure. The distinction in our work is that we retain the standard programming model MapReduce and use existing features from Hadoop such as failure tolerance, and high scalability [19]. Moreover, we are fully open to consumers in our implementation. The conventional cross-correlation techniques for single example create a linear statistical relationship between the unique features of two objects. To look for the highest correlation coefficient, a formula is serially interpreted over a signal or image, and a match may be found when a threshold requirement is met. Continuous approaches also take models into account [20]. A template package serves as
  • 4. Comput Sci Inf Technol ISSN: 2722-3221  AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary) 85 a proof source of identification in one single signal of interest of numerous features. Any template in the set represents a hypothesis; the highest coefficient template is seen to be the winner. Three questions arise:  What can be achieved with this serial? Algorithms do not scale more massive datasets,  What should be returned if several templates  Record the coefficients for identical correlations) how sure is the stated template to be true? 4. ARCHITECTURE OF AdMap This technique provides a general approach to data intake automatically in an HDFS-based data warehouse. Amongst other modifications is a data jack, a standard data loading pipelines and an multiple dwelling units (MDU) pattern, which together tackle the reliability of the loading of data, heterogeneities of data sources and changing data warehouse schemas. The meta-data model includes configuration and catalogue files setup per ingestion task is the configure file. The database administers the schema for the data centre [21]. Towards inserting the diverse information to the objective outlines, the configuration files in addition to the catalogue collaboratively drive the data subway. In one particular use, the technique's implementations provide methods for automating the loading of marketing data into a database. If customer requirements increase and all sources of an online campaign, video, social and email. are integrated, there is a wide range of media channels. Marketers usually distribute marketing transactions through multiple platforms, not just to a single advertising supplier. Such marketers and advertisers are thus overarchingly concerned. It should be able to access an all-in-one central dashboard and then comes up with a comprehensive perspective wherever the advertising plan happens expended in addition to the overwhelming aggregate expenditures on the various broadcasting networks [22]. The creative measures are a fixed-key software intake system to promote this criterion. There are various data schemas across heterogeneous data sources through these media networks. To incorporate these various data sets into a specific system, marketers and advertisers will render detailed schema mappings. Therefore, the heterogeneous schema is a part of the innovation. Figure 1 is the illustration which shows the advertisement network, which displays various traces of information such as advertisement cluster. As shown in Figure 1, Tv Channels 1-3 and Video Channel 1 operate for an advertiser automated border control (ABC). The report information for the ABC advertiser is provided for each channel. The ABC advertiser needs to incorporate stories into a specific network system, such as Amobee. The advertiser demands the transmission to the popular media schema for each platform that it deals with [23]. This displays a basic query against the raising schema Report. In Figure 1 installation of ingestion demands the advertiser ABC to takes place at different times. ABC asked Channel 1 marketers to submit the monitoring data to Short Schema Versions 1 on 2 April 2020. Set up a configuration tab to periodically enter the data every day. 2 months later, on 2 June 2020, ABC demanded the sending to the popular schema version of Channel 2 of its reports info. Thus a configuration file has been created. One month after, on 3 July 2020, Video Channel 1 is used as broadcast by ABC advertiser. The new channel was told to submit a report to the standard schema of Turn. Nevertheless, as the video network has several distinct areas which remained non specified by Turn's Schematic, edition 1, standard schematic has been created by introducing two additional columns (pause case, case complete). Video channel 1 can be modified with these changes Ingesting data into the standard Schema of Turn. ABC requests to configure a second configuration to absorb data on channel 3, one month later, on 10 August 2020. This time, the configuration file was created because of the schema. The current schema targets version 2. In this dynamic business scenario, the novel approach makes the intake request very smooth. As the standard system changes, a conventional design setup can rarely be modified [24]. After heterogeneous data is ingested, you can view across an over-network statement that gives a worldwide viewpoint of broadcasting consumption meant for advertisers by merely querying the standard schema. 4.1. Data ingestion Data pipelines move oil data from platform and database sources to computational and business intelligence (BI) tools from software-as-service (SaaS) data management systems. Now businesses may use a range of sources to pick data analytics based on big data. They need access to all their analytical and commercial intelligence sources to make better decisions. The data then will be ingested into the Hadoop Cluster, which then will process the data as per requirements. Figure 2 is a schematic cluster drawing which shows the architecture by the technique for loading data. Data from various file transfer protocol (FTP) servers are received as well as inserted in the Hadoop collection that includes numerous personal computers linked to the service in conjunction with the management of Hadoop software [25]. A data hub server is in charge of monitoring the loading of data. The server generates a range of tasks to perform the loading task.
  • 5.  ISSN: 2722-3221 Comput Sci Inf Technol, Vol. 3, No. 2, July 2022: 82-93 86 Some work is done on the datahub server; depending on the work nature, some work is carried out on the Hadoop cluster. The datahub system also tracks and schedules pipeline function through the Hadoop and ZooKeeper system communications. Figure 2. Schematic cluster drawing A Zookeeper cluster remains dispersed public synchronisation provision used for dispersed submissions for this discussion. The event shows that a straightforward establishment of primitives knows how to be used by distributed applications to implement higher-level synchronisation services such as maintenance, and calling classes. It is developed uncomplicated to plan as well as utilises an information prototype constructed according to well-known file system directory tree layout. It is operated in Java, and it has Java and C bindings. The professional individual should understand the elements of the technology like Hadoop, MapReduce, ZooKeeper, and so forth and become acquainted with them for the debate. However, the innovation herein must be understood that it is not restricted to the usage of these elements alone and in some combination of some specific nature of these elements. 4.2. Datahub server Figure 3 is a representation which shows the data hub cluster as a data packing generic innovation platform. Extracts, transforms, and loads (ETL) files in every ingest process are taken through a pipeline to the target. The tube has 5 linearly dependent jobs running one after the other. The data intake pipeline is subject to a particular task for each job. The system creation has been tracked by a condition report residing inside a perpetual medium along with a Hadoop cluster. A condition report can always be coordinated from Kubernetes applications through some kind of networking link between handling huge server, as shown in Figure 2 [26]. The tube is working sequentially. To complete its corresponding task, it can call on MapReduce for each stage on the Hadoop cluster. Figure 3. Data loading framework The data loading process for the datahub server consists of 5 steps, a) Transfer and procedure: This role runs mostly on the data hub network. This relates towards a document or network progress report, which helps one can decide if and how the sources could be downloaded into a locally owned and operated repository but instead transferred to the datahub client (ideally uncompressed if required).
  • 6. Comput Sci Inf Technol ISSN: 2722-3221  AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary) 87 b) Checking safety: It is a MapReduce function, runs on a Hadoop cluster by a data hub application. It once analyses the feedback documents and determines whether the response information is a legitimate source of statistics. This moves correct response documents to the later step during the channel; then a diagram decreases function at this time on the way to decrease the information analysing period dramatically. c) MR sign up job: a MapReduce position on Hadoop cluster that is driven by the Datahub server. It initially reads both the files of new customers as well as the available statistics depository documents in the destination. Subsequently, this one combines the two data points to be consumed for the following task. Again, the map reduction job is used to analyses data efficiently. d) Commitjob: This job is a simple wrap-up work on a Hadoop cluster driven by the data hub server. It renames the previous work output folders of MapReduce into an output folder, whose contents are used for input. It additionally revises channel status records toward how the stacking progresses. e) Data Consumption: This work is a Hadoop cluster MapReduce work. It uses the entire connection production after the preceding periods of the channel as well as incorporates the results incorporating connection keen on target information documents. 4.3. Metadata management The datahub repository offers a framework through computation to navigate input information to both the objective. Through multiple pipelines examples, its documentation is examined to individual ingest activities. In Figure 3 during the update as well as convert function, each configuration file is read in to decide which FTP site to be accessed, how the passwords should be obtained, how documents should be searched and how they should be uploaded to Hadoop cluster. Then, you have to verify its fitness, write data input and then use that to search it. Throughout addition, a server will then be reviewed for analysis of quantitative evidence to compliance with either the defined objective scheme [27]. The MapReduce career entry appears throughout the archives but database for similar work. Then commitment role inspects the file system to determine where even the information from either the prior employer was located throughout the system. The partition of curriculum and metadata benefits from clean-cut between program and metadata so that the optimisation of programs and the modelling of workflows can be carried out independently and generally. The discussion details the modelling of metadata during data intake. Several sections compose the meta-data modelling. The first component schema modelling is used to build a database, and the second part was modelling the system configuration ration where a configuration file is set up per ingestion mission. 4.3.1. Data cataloguing Changing business needs often change in the standard schema in big data management. Schema evolution is a must without code modification. Inventory configuration modifications to a goal schema use the features of the schema. Metadata IDs: Such a pattern suggests an integral series resembling other accessible schemes. For just about any column structure, a simple datatype becomes also allocated as ID. For, e.g., Database IDs=1,2,3 implies that there have been 3 open. ID. Name: The short name of the specified Database table is stored on this site. ID.latestVersion: The last version of the ID-identified table is stored in this property. 1.latest version=3, for example, means that the Schema has 3 editions, as well as the newest edition is 3. ID.Storage.Hadoop Cluster: It stores the utter HDFS route to maintain the table file. ID.description: It stores the ID-discovered versioned table schematic. ID.Version. Default: A default value of this object is transferred to the versioned table schema defined by the ID. E.g., 1.1.default=1.0 is the default value for the first column, and 0 is the second column's default value. The structure herein shows the past of developing any schema with the properties mentioned. Therefore, a record from a prior iteration of a system to a subsequent model of the same system may be continuously created by checking the database. Suppose a database is usable in schema version K. The record may be built to the same schema in two stages, but the K+1 version is the default value [28]. Step 1: The system will first create a default log using K+1, which is then instantiated the default record with K+1 version's default values. Step 2: The program would then scan the database for the type K variations from K+1 schemas edition, which utilises edition K's information automatically towards overwriting the information in Phase 1. avoidance database. When the column of Version K is equivalent to the column of Version K+1, a similar copy would be rendered and, when possible, typecast. If a variant K column does not have this relationship, the column would be removed. The new record created containing the K+1 edition of the schematic along with one or the other edition K's before the nonappearance K+1 version values after the two steps are completed.
  • 7.  ISSN: 2722-3221 Comput Sci Inf Technol, Vol. 3, No. 2, July 2022: 82-93 88 4.3.2. Configurational file The complexity of reconciling data sources is another problem for big data incorporation. Various data providers are provided with different local schemas and hardware. It is necessary to deal with the inherited heterogeneities precisely and efficiently to centralise all types of heterogeneous data and to manage these data. The modifications include a framework which uses data ingestion configuration file setup to deal with this issue [29], [30]. Specifically, schema mapping and various problems of lanes heterogeneity are necessary to address. Like date format, and location of the FTP site. The following software file properties are required to satisfy the criteria for, ingestion of function and can quickly be extended: schema, schema version for destination: these two connections signify the schema and version in which the source files are used. For, e.g., schemaID=1,schemaVersion=1 implies that the ingestion of the most tasked source info. Pattern Date: this property specifies the date format of the source data. For examples, date pattern=M/d/yyyy implies that a particular date format is used for the source data, as is 06/04/2020 10:12. Schema Mapping: this property describes the mapping amongst the schemes of the source statistics and the schema of the objective. Examples, mapping=-1,4 implies that there is no corresponding column on the first database root section throughout the goal database, as well as the Root Database, refers to the final endpoint database section. Separate Name: This property illustrates the Root data physical table partition. E.g., partition name = Turn_44 implies in the named partition that the source files are swallowed into the Schema Turn_44. Protocol: Its domain defines the file transfer protocol. For example, secure file transfer protocol (SFTP) can be used to collect data from the data source using the secure FTP protocol. UserID: This attribute specifies the identity of the user used to access the application in the database. Password: This property specifies the keyword used to link to the registry. Ingesting separate source code details into a single database the structure of the catalogue model is version-based. As such, separate copies of data from the same system may be modified inside the data centre quickly [31]. This novel approach models provide a different method of loading and querying. The establishment of a single generalisation as well as an API to manage to understand is a significant part of that method:  Abstraction from records: record is a class (data structure) which stores 1-fold of a schema and a combination of versions. The meaning list and a versioned schema are used. The data binary is stored in the meaning sequence. For the data, the schema maintains the metadata.  ConvertToLatestSchema(): The record class has a convertToLatestSchema() function. The current highest remains transferred to the most recent edition of the Schematic documents when this function is invoked. Figure 4 is a database framework showing various configuration settings for instructing particular user data in several implementations of almost the given format, with techniques between one chart reduction feature consummated to both the diversity within each organisational information and knowledge's version [31]. For a place of a series, where different Format Variants should be included in the virtual machine, that documenting scripting language is being used to automatically accommodate the diversity. In Figure 4 client 1, client 2, several setup data through version 1, version 2 simplified schema. Version M is the mapping feature of the MapReduce system is where these various sources of data reach each other. The cluster supports the arrangement documents to stack into the HDFS file system and all the various data sources. Then, work is started on MapReduce to add to the existing data in the universal destination schema [32]. Specific client data may be applied to multiple implementations of the same schema because of different initialisation period. Client 1 data will, for example, be configured to go to schema version 1, and user 2 information can remain configured for schematic edition 2. To connect data to the current data in the same system, a function is performed. The MapReduce project is implemented that execute a combined job performance, that recognises every new data and information results mostly on endpoint database, which contributes to reducing a computation structure. The warning for handling the various data versions is to call convertToLatestSchema() in mapper before anything else in every record. Such compliance means that the most current iteration of the same system is eliminated [33]. In Figure 4 plotter 1 delivers the data from user 1, then that one is set to plan to the schema of edition 1. It additionally follows the data of user 2 that is set to plan to the Schema edition 2. In the same mapper, they meet each other. User 1 is parsed into a user 1 database by the InputFormat in the MapReduce system, and the client2 data is parsed into a list. The method calls to convert to the latest schema ()to the last iteration of the two different formats, and the data flow then flows from a mapper to a MapReduce frame decreases [34].
  • 8. Comput Sci Inf Technol ISSN: 2722-3221  AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary) 89 At the query time, a particular data scenario could be used to flow through a Hadoop cluster with a different implementation of the same schema. For example, on the Hadoop cluster, you want to concurrently verify various data versions of the system [35]. This experience will again be used to transform multiple iterations of data to the current edition. The machine in exemplarity of a system which undergoes of a system which undergoes of a particular set of guidelines meant for the system to execute some methodologies as shown in Figure 5. Figure 4. Schematic Illustration for different configurational setups to instruct different clients Figure 5. Schematic illustration for a machine in exemplarity of a system which undergoes of a system which undergoes of a particular set of guidelines meant for the system to execute some methodologies 4.3.3. Summary of meta data ingestion In instantaneous, the novel approach encompasses metadata-driven data intake technique to integrate Hadoop MapReduce data sources massively, including the data hub server, collection, and setup files which give the versatility required for the complex bigger ones, which constitute the currently favoured embodiment World of computer convergence. 5. ADVANTAGES OF THE PROPOSED FRAMEWORK APPROACH This novel approach embodiments are the method used to address the following questions (high ranking methodologies for solving the challenge are included under every task) among others: MapReduce jobs; Many FTP servers; Effective processing plan; Heterogeneity of contracts/key/transfer (different format Protocol/FTP/design schemas/servers); Files for Settings; Catalogue; Fault resistance (the next package auto- recovery); Store state loading on HDFS file named state pipeline. The pipeline is used to build a fault tolerance protocol; File of status; Immunity (dirty data protection); Check for fitness before loading with a safety test; Act on maps; Conflicting pipe-lines coordinate synchronisation; Use of Zookeeper's distributed lock service; Using the HDFS storage status file; Evolution of Schema (support different variants of the same
  • 9.  ISSN: 2722-3221 Comput Sci Inf Technol, Vol. 3, No. 2, July 2022: 82-93 90 schema for adapting different configurations of legacy intake); Meta-tracked versions of client schemas; Catalogue of data repository. In addition to its high efficiency, this ingestion method flexibility also occurs: It's as simple to add a table as inserting any text lines; Catalog Catalogue; The development of a schema is so simple that a new version is added in the Index structure; Enter various versions of customer data at once; User 1 uses Schematic 1 Edition 1 to ingest documents; Client 2 uses schema 1 version 2 to ingest files; Client 3 uses Schema 1 Version 3 to absorb data; Customers can modify their file schema; Mapping figuration without computer change; The client may submit the order to add/delete/change; Growing destination schema fields can be used; Catalogue Change; Handles with precision all manner of variability; Integration of data (date configuration and the configuration of images). 6. CLAIMS OF THE NOVEL FRAMEWORK: AdMaP Claim 1. System for automated data intake into a data warehouse. Makes use of a MapReduce environment to ingest a variety of heterotopic data sources. Data warehousing schemas loads heterogeneous data throughout the target schemas. Claim 2. The claim 1 method whereby data sources are collected from a multitude of various media channels on marketing data. Claim 3. Network automated information ingest unit which involves: the database for the downloading of data. Relational database Integrates diverse data from several databases, a standard pipelines loading system. A multipurpose environment MapReduce Heterogeneous data sources; a processorimplemented metadata model, consisting of several configuration files and a catalogue; where a config file is installed by intake function: where this catalogue is handled via the data warehouse schema: where the data storage server is performed with a programmable data loading task; and which configurative environment. Claim 4. Claim 3 apparatus, which controls and codes the pipeline jobs via the Hadoop assemble as well as the Zookeeper. Claim 5. Entitlement 4 device where respectively incorporation job accepts, transforms besides loads the source files through a data-loading framework pipeline for the destination; The device of Claim 4. While a pipeline status file monitors the progress of the pipeline data loader framework, communication between that data hub server and the ZooKeeper server is used to sync the access file for that pipeline status. Claim 6. Claim 3, in which this pipeline data charge framework is running sequentially, requires MapReduce jobs on that Hadoop cluster for most of the stage A corresponding task is to be performed. Claim 7. A way to consume info automatically in a system Warehouse comprising: provision for the data loading tasks of a datahub server: The generic data loading system provides the MapReduce environment to ingest a plurality of heterogeneous sources and provides a meta-data model implemented by the processor that consists of a multitude of configuration file and catalogue. Where a config file is mounted per function of ingestion. Where the catalogue handles the data centre architecture: when an expected operation is conducted to load the data. Such a web datahub in which said configuration data and the catalogue collaboratively lead to the datahub server mechanically and self-sufficiently of information foundation heterogeneity and data storage schema development to load the heterogenous new data to their destination patterns; the datahub server performs the same data loading duty by transferring and converting a job that works happening that cluster. Claim 8. Claim 8 contains a health check task, MapReduce (MR) task, work intake, job update, work turn and commit, and the health check job and MR join the job and employment intake, each of which involves a job MapReduce, powered by the datahub server as well as moving on top of a Hadoop cluster. Claim 9. The claim 8 technique is that the datahub server executes the data loading task by carrying out a further step in health checking of the job; the datahub server drives MapReduce jobs to analyse and once transform input files produced by that download, determining if a valid data source is an input file and then transferring the valid input files for next task in the pipe. Claim 10. The Claim 10 solution uses a MapReduce system to run that information-loading function, interpret arrival customer data and also some highly appreciate factory data then link client records or identifying the target warehouses data in a proposed method to generate a response.
  • 10. Comput Sci Inf Technol ISSN: 2722-3221  AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary) 91 Claim 11. The claim 11 method, which is the execution of that data-loading task by renaming previous job output folders to a result directory with contents to be consumed by the ingestion job, is used as an additional step to commit the work. Claim 12. The argument 12 approaches are used to conduct this data loading function by carrying out the other phase in the input of MapReduce research, utilising any add-ons generated from previous pipeline stages and inserting the results into destination data files. The data reduction procedure is focused on details. Claim 13. A data hub server that consists of: a processor that uses the environment MapReduce to transmit source data mining; said meta-data consultation framework for various instances of a pipeline for ingesting tasks; in which meta-data modelling involves schema modelling by catalogue during data ingestion and consumer configuration modelling through job ingestion. Claim 14. The Claims 14 datahub server has mentioned catalogue supporting the evolution of schemas by modelling the schema using any of the following schematic properties, without changing framework code. A property with an integer sequence of all schemas accessible that gives a specific integer identity (ID) for. Schema; an asset that provisions a table ID expressive name: a property which stores a recent ID-identified version of a table; property stores an utter route to the Hadoop (HDFS) file system, where the ID table is saved; a property which saves a versioned table identification schema notified by ID; A stuff that stores are defaulting standards for the ID table schema: wherein each schema documents the history of the evolution of these properties; and Whereby the record can be evolved dynamically by consulting such a catalogue from a previous variety of an assumed plan to an advanced form of the identical. Claim 15. The information jump client of statement 15, where registration has been created in a model, from that to the second, is to create a standard log for the same framework, while using the same similar model where even the standard log is executed for the second iteration, to default parameters; In any case, if there wasn't such a relationship, the columns of being the first system will be lowered when fresh logging containing information of the very first model or even the scheme of a subsequent one is generated. Claim 16. Claim 14 data sub server where the configuration ration file is set up for schema mapping and other heterogeneity problems utilising the data ingestion task: a total of two properties which identify the schema and version of which the source files are supplied; a property which identifies the source date format. A property identify maps between the source Scheme data and the destination scheme: a property specifying the physical table division of the source data;a property defining the protocol for file transfer; -a property that defines the username user name A server data source and a password identification property used to log onto the server of the data source. Claim 17. The Claim 14 which a record is indeed a category, a database schema, the dictionary processing of the provided scheme, as well as the form mix of that record comprising a value array and the version scheme: where even the label range comprises a transmitted number, as well as the scheme, retains the location information; and where the schema is The database server provides a records abstraction method for reconciliations versions; Claim 18. The repository of Claim 18 data abstraction includes a feature that transforms the existing record when invoked By consulting the catalogue for the latest version of the current schema of records. Claim 19. A phase that includes: providing a trigger system for a processor MapReduce environment ageing to a route data source to the purpose. It said framework metadata consulting for different purposes pipeline instances to carry out tasks of ingestion. During data ingestion com meta-data modelling Take a collection to create destination systems and Modeling client configuration by task intake by a Folder for setup. Claim 20. Statement 20 method assisted Schema creation through the usage of any of the following schema characteristics without modifying frame codes through mapping the destination schema:a property with an integer array that represents all available schemas and assigns a distinctive figure as its identity (ID) used for each table schema; a property that holds a table identity descriptive name. Created by ID: a property that contains the new table edition. The ID: property that saves an absolute path of the Hadoop File System (HDFS) where an identity table is being stored; a property that saves a versioned ID-identified table scheme; and a property that save default values on a versioned IDdefined table schema: in which properties document changing history and a dynamic scheme can be generated A later variant of th Claim 21. The claim 21 process in which record evolution from first to second versions into the same schema includes. Creating the default record of a schema with the same second edition, which instantiates the default record of a second version in the same system; then looking for a database to locate discrepancies between the schemas in the first update and that of the second version by using the data of the first to substitute the details of that default, and if there is a connection Whereas the
  • 11.  ISSN: 2722-3221 Comput Sci Inf Technol, Vol. 3, No. 2, July 2022: 82-93 92 column of the first version is deleted if no such correspondence exists, and where a new record is created that contains the schema of the second version with the data of that first version or This second version's default value. Claim 22. The method of claim 20, whereby the configuration file is configured to fix diagram plotting than other conglomeration subjects with the subsequent features per data ingestion job: no or more features within which schema or version source files are used; a feature which specifies a date format in which the sources data are used; a feature that determines maps between the source data schema properties to which the division for just a graphical column helps to define the source information; a product that identified a mechanism for transferring files; a product that identifies the user id that users log in to a computer for the data source Claim 23. In the case of documents, the layout, the storage for one double of a given schema and a defined version combination and the data archives, containing a meaning list and a versioned schema, the procedure for claim 20 requires more information: a processing abstraction facility for version reconciliation. Schema maintains meta-data in which the diagram is a version of a schema given in the catalogue in memory. Claim 24. Claim 24 also involves including a feature that invokes the new iteration of the current record structure, converting the old record to the latest version. Claim 25. A mechanism for data input to data storage that includes: a datahub server to monitor one or more files to load heterogeneous data sources onto a Hadoop (HDFS) file system. The server to start a MapReduce task to add all heterogeneous data sources with current data in the standard destination. Claim 26. An automated data input device in a data warehouse that includes one or more configuration files on the datacube server. Loading a plurality of heterogeneous database in Hadoop (HDFS); said server introducing a MapReduce vacancy to reach others. The heterogeneous databases with existing ones in the standard schema and the data hub server, which carries out a joint task of connecting the user data to existing data of the same schema by launching a MapReduce. Claim 27. As we have been finding, significant data policy is cost-effective, analytical, versatile, plug-in and personalised technology bundles. Entities that are joining the field of big data have discovered it is not only a trend towards life but a voyage. Big data creates an open area of unparalleled problems involving rational and empirical use of innovations powered by evidence. Claim 28. Data-incidents through reference to storage were accessed. Activity intake becomes extremely quick since it is a memory-based process. A tragedy like some kind of collapse of investigators or even equipment failures can lead to a loss of data, though, because as detected improvements were essentially unpredictable. Essential business operations are not a safe option, but storage interface records may indeed be set. 7. CONCLUSION ANF FUTUTRE SCOPE The novel approach provides a general approach for the automatic intake of data into an HDFS datagram. Included in the modification process are a data hub, a generic data loading frame and a metadata model to address the efficiency of data load, heterogeneity of data sources and evolution of warehouse schemas. The MapReduce generic loading framework is datahub charging efficient framework. The meta- data architecture comprises of directories and a database. The configurations file is designed for each mission. The database administers the schema for the data centre. When a planned information loading function is done, and the datahub setup and catalogues are shared to attach the heterogeneous data to their schemas dynamically. The framework can further be modified and can be used in various disciplines as per the requirements. The Hadoop cluster can provide an insight of the user experience and it would be helpful for thew advertising team. The results gained can be used to improve the user experience. The implementation of data lake and data warehousing in various disciplines will lead to ease in handling of various repositories of data and its diverse variety. REFERENCES [1] B. Nichifor, “Theoretical framework of advertising-some insights,” Stud. Sci. Res. Econ. Ed., no. 19, 2014. [2] A. T. Kenyon and J. Liberman, “Controlling cross-border tobacco: advertising, promotion and sponsorship - implementing the FCTC,” SSRN Electron. J., no. 161, 2006. [3] F. Brassington and S. Pettitt, Principles of marketing. Pearson education, 2006. [4] T. M. Barry, “The development of the hierarchy of effects: An historical perspective,” Curr. Issues Res. Advert., vol. 10, no. 1–2, pp. 251–295, 1987. [5] M. P. Gardner, “Mood states and consumer behavior: a critical review,” J. Consum. Res., vol. 12, no. 3, p. 281, Dec. 1985.
  • 12. Comput Sci Inf Technol ISSN: 2722-3221  AdMap: a framework for advertising using MapReduce pipeline (Abhay Chaudhary) 93 [6] G. Belch and M. Belch, “Advertising and promotion: an integrated marketing communications perspective,” New York McGraw- Hil l, p. 864, 2011. [7] J. B. Cooper and D. N. Singer, “The Role of Emotion in Prejudice,” J. Soc. Psychol., vol. 44, no. 2, pp. 241–247, Nov. 1956. [8] E.-J. Lee and D. W. Schumann, “Explaining the special case of incongruity in advertising: combining classic theoretical approaches,” Mark. Theory, vol. 4, no. 1–2, pp. 59–90, Jun. 2004. [9] X. Nan and R. J. Faber, “Advertising theory: reconceptualizing the building blocks,” Mark. Theory, vol. 4, no. 1–2, pp. 7–30, Jun. 2004. [10] R. E. Petty, J. T. Cacioppo, and D. Schumann, “Central and peripheral routes to advertising effectiveness: the moderating tole of involvement,” J. Consum. Res., vol. 10, no. 2, p. 135, Sep. 1983. [11] I. C. Popescu, “Comunicarea în marketing : concepte, tehnici, strategii,” a II-a, Ed. Uranus, Bucuresti, p. 271, 2003. [12] R. E. Smith and X. Yang, “Toward a general theory of creativity in advertising: examining the role of divergence,” Mark. Theory, vol. 4, no. 1–2, pp. 31–58, Jun. 2004. [13] E. Thorson and J. Moore, Integrated communication: swynergy of persuasive voices. Psychology Press, 1996. [14] D. Vakratsas and T. Ambler, “How advertising works: what do we really know?,” J. Mark., vol. 63, no. 1, pp. 26–43, Jan. 1999. [15] R. Vaughan, “How advertising works: a planning model. .. putting it all together,” Advert. & Soc. Rev., vol. 1, no. 1, 2000. [16] W. M. Weilbacher, “Point of view: does advertising cause a ‘hierarchy of effects’?,” J. Advert. Res., vol. 41, no. 6, pp. 19–26, Nov. 2001. [17] M. Dayalan, “MapReduce: simplified data processing on large cluster,” Int. J. Res. Eng., vol. 5, no. 5, pp. 399–403, Apr. 2018. [18] D. Borthakur et al., “Apache hadoop goes realtime at Facebook,” in Proceedings of the 2011 international conference on Management of data - SIGMOD ’11, 2011, p. 1071. [19] D. Jiang, B. C. Ooi, L. Shi, and S. Wu, “The performance of mapreduce: An in-depth study,” Proc. VLDB Endow., vol. 3, no. 1– 2, pp. 472–483, 2010. [20] A. Pavlo et al., “A comparison of approaches to large-scale data analysis,” in Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, 2009, pp. 165–178. [21] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” ACM SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 29–43, Dec. 2003. [22] B. Li, E. Mazur, Y. Diao, A. McGregor, and P. Shenoy, “SCALLA: A platform for scalable one-pass analytics using MapReduce,” ACM Trans. Database Syst., vol. 37, no. 4, pp. 1–43, Dec. 2012. [23] D. Logothetis, C. Trezzo, K. C. Webb, and K. Yocum, “In-situ MapReduce for log processing,” in Proceedings of the 2011 USENIX Annual Technical Conference, USENIX ATC 2011, 2019, pp. 115–129. [24] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, “Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling,” in EuroSys’10 - Proceedings of the EuroSys 2010 Conference, 2010, pp. 265–278. [25] N. Backman, K. Pattabiraman, R. Fonseca, and U. Çetintemel, “C-MR: continuously executing MapReduce workflows on multi- core processors,” in MapReduce ’12 - 3rd International Workshop on MapReduce and Its Applications, 2012, pp. 1–8. [26] R. Kienzler, R. Bruggmann, A. Ranganathan, and N. Tatbul, “Stream as you Go: the case for incremental data access and processing in the cloud,” in 2012 IEEE 28th International Conference on Data Engineering Workshops, 2012, pp. 159–166. [27] T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears, “MapReduce online.,” in Nsdi, 2010, vol. 10, no. 4, p. 20. [28] A. Verma, B. Cho, N. Zea, I. Gupta, and R. H. Campbell, “Breaking the MapReduce stage barrier,” Cluster Comput., vol. 16, no. 1, pp. 191–206, Mar. 2013. [29] M. Elteir, H. Lin, and W. Feng, “Enhancing MapReduce via asynchronous data processing,” in 2010 IEEE 16th International Conference on Parallel and Distributed Systems, 2010, pp. 397–405. [30] M. Burrows, “The Chubby lock service for loosely-coupled distributed systems,” in OSDI 2006 - 7th USENIX Symposium on Operating Systems Design and Implementation, 2006, pp. 335–350. [31] F. Chang et al., “BigTable: a distributed storage system for structured data,” OSDI 2006 - 7th USENIX Symp. Oper. Syst. Des. Implement., vol. 26, no. 2, pp. 205–218, 2006. [32] R. Vernica, A. Balmin, K. S. Beyer, and V. Ercegovac, “Adaptive MapReduce using situation-aware mappers,” in Proceedings of the 15th International Conference on Extending Database Technology - EDBT ’12, 2012, p. 420. [33] Z. Guo, G. Fox, and M. Zhou, “Investigation of data locality and fairness in MapReduce,” in Proceedings of third international workshop on MapReduce and its Applications Date - MapReduce ’12, 2012, p. 25. [34] M. Hammoud and M. F. Sakr, “Locality-aware reduce task scheduling for mapreduce,” in Proceedings - 2011 3rd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2011, 2011, pp. 570–576. [35] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” in 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010, pp. 1–10.