A Comprehensive Study on Big Data Applications and Challenges
International Journal on Cybernetics & Informatics (IJCI) Vol. 5, No. 4, August 2016
DOI: 10.5121/ijci.2016.5409

A COMPREHENSIVE STUDY ON BIG DATA APPLICATIONS AND CHALLENGES

DDD Suribabu¹ and K Venkanna Naidu²
¹ Head & Assoc. Professor, Dept. of CSE, DNR College of Engg & Tech, Bhimavaram
² Head & Assoc. Professor, Dept. of ECE, DNR College of Engg & Tech, Bhimavaram
ABSTRACT
Big Data has gained much interest from the academia and the IT industry. In the digital and computing
world, information is generated and collected at a rate that quickly exceeds the boundary range. As
information is transferred and shared at light speed on optic fiber and wireless networks, the volume of
data and the speed of market growth increase. Conversely, the fast growth rate of such large data
generates copious challenges, such as the rapid growth of data, transfer speed, diverse data, and security.
Even so, Big Data is still in its early stage, and the domain has not been reviewed in general. Hence, this
study expansively surveys and classifies an assortment of attributes of Big Data, including its nature,
definitions, rapid growth rate, volume, management, analysis, and security. This study also proposes a
data life cycle that uses the technologies and terminologies of Big Data. Map/Reduce is a programming
model for efficient distributed computing. It works well with semi-structured and unstructured data. A
simple model but good for a lot of applications like Log processing and Web index building.
KEYWORDS
Big Data, HBase, Hadoop, MapReduce, Heterogeneity
1. INTRODUCTION
Big Data is valuable for business applications and is rapidly escalating as a segment of the IT industry. It has generated significant interest in various fields, including health-care equipment, banking transactions, social media, and satellite imaging. Traditionally, data is stored in a highly structured format to maximize its informational content. Current data volumes, however, are driven by both unstructured and semi-structured data. Billions of individuals are using various mobile devices as a result of this technological revolution, and these people generate tremendous amounts of data through the increased use of such devices. Remote sensors in particular continuously produce highly heterogeneous data that are either structured or unstructured. Such data is also termed Big Data. Big Data is characterized by three aspects: (a) the data are numerous; (b) the data cannot be categorized in regular relational databases; and (c) the data are generated, captured, and processed quickly. Consequently, end-to-end processing can be obstructed by the translation between structured data in relational DBMS systems and unstructured data for analytics. Big Data is a compilation of very large data sets with a great diversity of types, so that it becomes complicated to process them using state-of-the-art data processing approaches.
In general, a data set can be called Big Data if it is formidable to capture, visualize, and analyze it with current technologies. The essential characteristics of Big Data are described by five V's: Variety, Velocity, Volume, Virality, and Viscosity. Volume describes the size of the data relative to the processing capability. Viscosity measures the resistance to flow in the volume of data; this resistance may come from different data sources. Virality describes how quickly information is distributed across business-to-business (B2B) networks. Variety depicts the spread of data types from machine to machine and adds new data types to traditional transactional data. Velocity describes the frequency at which data is generated, shared, and captured.
Fig 1: Essential characteristics of Big Data
2. BIG DATA ARCHITECTURE
The architecture of Big Data must be coordinated with the support infrastructure of the organization. To date, most of the data used by organizations has been static; data is now increasingly sourced from fields that are disorganized and messy, such as information from machines or sensors and from large public and private data sources. Big Data technology improves performance, assists innovation in the products and services of business models, and provides decision-making support. Big Data technologies aim to lessen hardware and processing costs and to substantiate the value of Big Data before significant company resources are committed. Properly managed Big Data is accessible, reliable, secure, and manageable. Hence, Big Data applications can be applied in various complex scientific disciplines (either single or interdisciplinary), including atmospheric science, astronomy, medicine, biology, genomics, and biogeochemistry.
2.1 Hadoop Ecosystem:
Hadoop is composed of HBase, HCatalog, Pig, Hive, Oozie, Zookeeper, and Kafka; however, the most common components and well-known paradigms are the Hadoop Distributed File System (HDFS) and Map Reduce for Big Data. HDFS is applied when the amount of data is too large for a single machine. HBase is an open-source, versioned, and distributed management system based on Google's BigTable. Zookeeper maintains, configures, and names large amounts of data. HCatalog manages HDFS; it stores metadata and generates tables for large amounts of data. Hive structures warehouses in HDFS and other input sources, such as Amazon S3. The Pig framework provides a high-level scripting language (Pig Latin) and operates a run-time platform that enables users to execute Map Reduce on Hadoop. Mahout is a library for machine learning and data mining. Oozie coordinates, executes, and manages job flow in the Hadoop system. Avro serializes data, conducts remote procedure calls, and passes data from one program or language to another. Chukwa is a framework for data collection and analysis that is related to Map Reduce and HDFS. Flume is particularly used to aggregate and transfer large amounts of data (i.e., log data) in and out of Hadoop.
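As an illustration of how applications interact with HDFS, the following short Java sketch writes a file into HDFS and reads it back through the standard org.apache.hadoop.fs.FileSystem client. This is a minimal sketch only: the NameNode address, paths, and file contents are assumptions made for this example and do not come from the study itself.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; replace with the cluster's actual fs.defaultFS value.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/bigdata-demo/sample.txt");

        // Write a small file; HDFS transparently splits larger files into replicated blocks.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello big data\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read the same file back through the file-system client.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}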
2.2 Hadoop Usage:
Hadoop is explicitly used for:
(1) Searching: Yahoo, Amazon, Zvents
(2) Log processing: Facebook, Yahoo, ContexWeb, Joost, Last.fm
(3) Analysis of videos and images: New York Times, Eyelike
(4) Data warehousing: Facebook, AOL
(5) Recommendation systems: Facebook
2.3 System Architectures of Map Reduce and HDFS:
Fig 2: HDFS Architecture
Fig 3: Map Reduce Life Cycle
Map Reduce is the hub of Hadoop and is a programming paradigm that enables massive scalability across the numerous servers in a Hadoop cluster. In this cluster, each server contains a set of inexpensive internal disk drives. To improve performance, Map Reduce allocates workloads to the servers on which the processed data are stored. Data processing is scheduled based on the cluster nodes; a node may nevertheless be assigned a task that requires data foreign to that node.
2.3.1 MapReduce tasks:
(1) Input
(i) Data are loaded into HDFS in blocks and dispersed to the data nodes
(ii) Blocks are replicated in case of failures
(iii) The NameNode tracks the blocks and data nodes
(2) Job submission
The client submits the job and its details to the JobTracker
(3) Job initialization
(i) The JobTracker interacts with the TaskTracker on each data node
(ii) All tasks are scheduled
(4) Mapping
(i) The Mapper processes the data blocks
(ii) Key-value pairs are listed
(5) Sorting
The Mapper sorts the list of key-value pairs
(6) Shuffling
(i) The mapped output is conveyed to the Reducers
(ii) Values are rearranged in a sorted format
(7) Reduction
The Reducers merge the lists of key-value pairs to generate the final result
(8) Result
(i) Values are stored in HDFS
(ii) Results are replicated according to the configuration
(iii) Clients read the results from HDFS
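To make the job flow above concrete, the following hedged Java sketch shows a driver that configures and submits a job through the current org.apache.hadoop.mapreduce API (the YARN-era successor to the JobTracker/TaskTracker interface listed above). The class names WordCountMapper and WordCountReducer refer to the illustrative map and reduce classes sketched after the next paragraph; they are assumptions for this example, not code from the paper.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");       // (2) job submission with its details
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCountMapper.class);           // (4) mapping
        job.setCombinerClass(WordCountReducer.class);        // optional local pre-aggregation
        job.setReducerClass(WordCountReducer.class);         // (7) reduction

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // (1) input blocks already in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // (8) results written back to HDFS

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}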
Map Reduce essentially corresponds to two distinct tasks performed by Hadoop programs. The first is the map task, which takes a dataset and transforms it into another dataset in which individual elements are broken down into tuples (key/value pairs). The reduce task receives the map outputs as its input and combines those data tuples into smaller sets of tuples. Redundant data are stored in multiple locations across the cluster. The programming model handles failures automatically by running portions of the program on any of the servers in the cluster. Given this redundancy, data and the associated processing can be distributed across a very large cluster of commodity components. The redundancy also tolerates faults and enables the Hadoop cluster to repair itself if a commodity hardware component fails, which matters especially for large amounts of data. With this process, Hadoop can delegate workloads related to Big Data problems across large clusters of inexpensive machines. The Map Reduce framework can nevertheless be convoluted, particularly when complex transformational logic must be expressed. Open-source modules have attempted to simplify this framework, but these modules introduce languages of their own.
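The map and reduce tasks themselves can be sketched as follows for the classic word-count case, using the standard org.apache.hadoop.mapreduce Mapper and Reducer base classes. The class and field names are illustrative assumptions; the sketch simply shows the tuple (key/value) transformation described above.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map task: turns each input line into (word, 1) tuples.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);          // emit one tuple per word occurrence
        }
    }
}

// Reduce task: merges the tuples for each word into a single count.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        result.set(sum);
        context.write(word, result);           // final (word, totalCount) pair written to HDFS
    }
}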
Fig 4: Map Reduce Architecture
2.3.2 Process flow
Record reader → Map → Combiner → Partitioner → Shuffle and sort → Reduce → Output format
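As a hedged illustration of the partitioner stage in this flow, the small Java sketch below decides which reduce task receives each intermediate key; the hash-based routing mirrors Hadoop's default behaviour, and the class name is an assumption made for this example.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each intermediate (word, count) pair to a reduce task.
class WordPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Hash the key so that identical words always reach the same reducer,
        // which is what the shuffle-and-sort stage relies on.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

A custom partitioner like this would be registered on the driver with job.setPartitionerClass(WordPartitioner.class); by default Hadoop applies an equivalent hash-based partitioner.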
Figure 5: Proposed data life cycle using the technologies and terminologies of Big Data.
Raw Data. Researchers, agencies, and organizations integrate the collected raw data and increase their value through input from specific program offices and scientific research projects. The data are transformed from their initial state and are stored in a value-added state, including web services.
Collection/Filtering/Classification. Data collection or generation is generally the first stage of any data life cycle. Large amounts of data are created in the form of log files and of data from sensors, mobile equipment, satellites, laboratories, supercomputers, search entries, chat records, posts on Internet forums, and microblog messages. Log files are utilized in nearly all digital equipment; for example, web servers record the number of visits, clicks, click rates, and other properties of web users in log files.
Sensing. Sensors are often used to measure physical quantities, which are then converted into comprehensible digital signals for processing and storage.
Mobile Equipment. The functions of mobile devices have expanded progressively as their usage rapidly increases. As the features of such devices grow more sophisticated and as the means of data acquisition are enriched, various data types are produced.
Data Analysis. Data analysis enables an organization to handle the abundant information that can affect its business. Data analysis has two main objectives: to understand the relationships among features and to develop effective methods of data mining that can accurately predict future observations. Big Data analysis can be applied to special types of data.
Data Mining Algorithms. In data mining, hidden but potentially valuable information is extracted
from large, incomplete, fuzzy, and noisy data.
Cluster Analysis. Cluster analysis groups objects statistically according to certain rules and features, so that objects in the same group are highly homogeneous, whereas objects in different groups are highly heterogeneous.
Correlation Analysis. Correlation analysis determines the laws of relation among practical phenomena, including mutual restriction, correlation, and correlative dependence.
Statistical Analysis. Statistical analysis is based on statistical theory, which is a branch of applied mathematics.
Regression Analysis. Regression analysis is a mathematical technique that can reveal correlations
between one variable and others.
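To make the correlation and regression notions above tangible, here is a small, self-contained Java sketch (with made-up sample data) that computes the Pearson correlation coefficient and the least-squares slope and intercept for two variables; it is an illustrative aid, not part of the original study.

public class SimpleRegression {
    public static void main(String[] args) {
        // Illustrative sample data: x could be ad spend, y the observed sales.
        double[] x = {1.0, 2.0, 3.0, 4.0, 5.0};
        double[] y = {2.1, 3.9, 6.2, 8.1, 9.8};
        int n = x.length;

        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }

        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
            syy += (y[i] - meanY) * (y[i] - meanY);
        }

        double correlation = sxy / Math.sqrt(sxx * syy);  // Pearson correlation coefficient
        double slope = sxy / sxx;                         // least-squares slope
        double intercept = meanY - slope * meanX;         // least-squares intercept

        System.out.printf("r = %.3f, y = %.3f * x + %.3f%n", correlation, slope, intercept);
    }
}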
Heterogeneity. Data mining algorithms trace unknown patterns, but they generally require data in homogeneous, structured formats for analysis.
Scalability. Demanding issues in data analysis include the management and analysis of huge amounts of data and the rapid increase in the size of datasets.
Accuracy. Data analysis is typically supported by relatively accurate data obtained from structured databases with limited sources.
Storing/Sharing/Publishing. Data and its resources are collected and analyzed for storing, sharing, and publishing to benefit audiences, the public, tribal governments, academicians, researchers, scientific partners, federal agencies, and other stakeholders (e.g., industries, communities, and the media). Large Big Data datasets must be stored and managed with reliability, availability, and easy accessibility; storage infrastructures must provide reliable space and a strong access interface that can not only analyze large amounts of data but also store, manage, and organize data with relational DBMS structures. Storage capacity must keep pace with the sharp increase in data volume; for this reason, research on data storage is necessary.
Security. This stage of the data life cycle describes the security of data, governance bodies,
organizations, and agendas. It also clarifies the roles in data stewardship. Therefore,
appropriateness in terms of data type and use must be considered in developing data, systems,
tools, policies, and procedures to defend legitimate privacy, confidentiality, and intellectual
property.
Retrieve/Reuse/Discover. Data retrieval warrants data quality, value addition, and data
preservation by reusing existing data to discover new and valuable information. This area is
specifically involved in various subfields, including retrieval, management, authentication,
archiving, preservation, and representation. The classical approach to structured data
management is divided into two parts: one is a schema to store the dataset and the other is a
relational database for data retrieval.
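As a hedged sketch of this classical split between a storage schema and relational retrieval, the short Java/JDBC example below creates a table, inserts a row, and queries it back. The JDBC URL, table, and column names are assumptions for illustration (it presumes an H2 or similar JDBC driver on the classpath); any relational database could stand in here.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class RelationalRetrievalSketch {
    public static void main(String[] args) throws Exception {
        // Assumed in-memory JDBC URL; point this at any relational database with a JDBC driver.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:bigdata")) {
            try (Statement stmt = conn.createStatement()) {
                // Schema part: a fixed structure for storing the dataset.
                stmt.execute("CREATE TABLE sensor_readings (id INT PRIMARY KEY, reading DOUBLE)");
                stmt.execute("INSERT INTO sensor_readings VALUES (1, 42.5)");
            }
            // Retrieval part: a declarative query against the stored data.
            try (PreparedStatement query =
                     conn.prepareStatement("SELECT reading FROM sensor_readings WHERE id = ?")) {
                query.setInt(1, 1);
                try (ResultSet rs = query.executeQuery()) {
                    while (rs.next()) {
                        System.out.println("reading = " + rs.getDouble("reading"));
                    }
                }
            }
        }
    }
}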
3. CHALLENGES
With Big Data, users not only face abundant attractive prospects but also encounter challenges. Such complications lie in data capture, storage, searching, sharing, analysis, and visualization. Challenges in Big Data analysis include data inconsistency and incompleteness, scalability, timeliness, and security. Prior to data analysis, data must be well constructed; understanding how data can be preprocessed is important for improving data quality and the analysis results. Privacy is a major concern with outsourced data. Recently, several controversies have revealed how some security agencies use data generated by individuals for their own benefit without permission.
4. CONCLUSION
This paper presents the elementary concepts of Big Data, including the role of Big Data in the current environment of enterprise and technology. To augment the efficiency of data management, we have devised a data life cycle that uses the technologies and terminologies of Big Data. The stages in this life cycle include collection, filtering, analysis, storage, publication, retrieval, and discovery. Data are also generated in different formats (unstructured and/or semi-structured), which unfavorably affects data analysis, management, and storage. This variation in data is accompanied by complexity and by the development of further means of data acquisition. Big Data has developed to the point where it cannot be harnessed separately. Big Data is characterized by large systems, profits, and challenges. As a result, additional research is required to address these issues and to advance the efficient display, analysis, and storage of Big Data. Capital investments, human resources, and pioneering ideas are the basic requirements for such research.