Big data is a prominent term that characterizes the growth and availability of data in all three formats: structured, unstructured and semi-structured. Structured data sits in fixed fields of a record or file, as in relational databases and spreadsheets, whereas unstructured data includes text and multimedia content. The primary purpose of the big data concept is to describe extremely large data sets, both structured and unstructured. Big data is further defined by three "V" dimensions, namely Volume, Velocity and Variety, to which two more "V"s have been added: Value and Veracity. Volume denotes the size of the data, Velocity the speed at which data is generated and processed, Variety the types of data, Value the business value derived from the data, and Veracity the quality and understandability of the data. Nowadays, big data has become a distinctive and preferred research area in the field of computer science. Many open research problems exist in big data, and good solutions have been proposed by researchers, yet there is still a need to develop new techniques and algorithms for big data analysis in order to obtain optimal solutions. In this paper, a detailed study of big data, covering its basic concepts, history, applications, techniques, research issues and tools, is presented.
Characterizing and Processing of Big Data Using Data Mining Techniques (IJTET Journal)
Abstract: Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. It concerns large-volume, complex and growing data sets drawn from multiple, autonomous sources. Big data is now rapidly expanding not only in science and engineering but in all domains, including the physical and biological sciences. The main objective of this paper is to characterize the features of big data. The HACE theorem, which characterizes the features of the big data revolution, is used, and a big data processing model is proposed from the data mining perspective. The model brings together mining, analysis, information sources, user interest modeling, privacy and security. Exploring large volumes of data and extracting useful information or knowledge from them is the most fundamental challenge in big data, so these problems, and the knowledge revolution they bring, should be analyzed.
Real World Application of Big Data in Data Mining Tools (ijsrd.com)
The main aim of this paper is to study the notion of big data and its application in data mining tools such as R, Weka, RapidMiner, KNIME and Mahout. We are awash in a flood of data today. Across a broad range of application areas, data is being collected at unmatched scale. Decisions that previously were based on guesswork, or on painstakingly constructed models of reality, can now be made based on the data itself. Such big data analysis now drives nearly every aspect of modern society, including mobile services, retail, manufacturing, financial services, life sciences and the physical sciences. The paper focuses mainly on the different types of data mining tools and their use on big data in knowledge discovery.
Data Mining in the World of BIG Data: A Survey (IJCATR)
The rapid development and popularization of the internet, together with technological advancement, have introduced massive amounts of data that continue to grow daily. The very large volumes of data generated, collected, stored and transferred by applications such as sensors, smart mobile devices, cloud systems and social networks have put us in the era of BIG data: data of huge size, with complex and unstructured types, from many origins. Converting this BIG data into useful information is therefore essential, and the technique for discovering hidden interesting patterns and knowledge insights in BIG data has been introduced as BIG data mining. BIG data raises many problems and challenges related to handling, storing, managing, transferring, analyzing and mining it, but it also provides new directions, a wide range of opportunities for research and information extraction, and a future for technologies such as data mining in the form of BIG data mining. In this paper, we present the concepts of BIG data and BIG data mining, describe the problems of BIG data mining, list new research directions for it, discuss the problems traditional data mining techniques face when dealing with BIG data, and compare traditional data mining algorithms with some big data mining algorithms, which will be useful for the future of BIG data mining technology.
Big Data is the new technology, or science, for making well-informed decisions in business or any other scientific discipline using huge volumes of data from new sources of heterogeneous data. Such new sources include blogs, online media, social networks, sensor networks, image data and other forms of data that vary in volume, structure, format and other factors. Big Data applications are increasingly adopted in all science and engineering domains, including space science, the biomedical sciences, and astronomy and deep-space studies. The major challenges of big data mining lie in data access and processing, data privacy, and mining algorithms. This paper covers what big data is, data mining with big data, the challenges in big data mining, and the currently available solutions to those challenges.
A Review on Classification of Data Imbalance using Big Data (IJMIT Journal)
Classification is a data mining function that assigns items in a collection to target categories, or classes, in order to provide more accurate predictions and analysis. Classification using supervised learning aims to identify the category into which a new data item will fall. With the advancement of technology and the growing generation of real-time data from sources such as the Internet, IoT and social media, processing has become more demanding and challenging. One such processing challenge is data imbalance. In an imbalanced dataset, the majority classes dominate the minority classes, causing machine learning classifiers to be biased towards the majority classes; most classification algorithms then predict all test data as belonging to the majority classes. In this paper, the authors analyze data imbalance models using big data and classification algorithms.
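As a minimal, hedged illustration of the bias this abstract describes, the following Python sketch (assuming scikit-learn is installed; the synthetic dataset and parameters are invented for illustration, not taken from the paper) contrasts an unweighted classifier with one using class weighting on a 95/5 imbalanced dataset:

```python
# Illustrative sketch only: how class weighting counteracts majority-class
# bias on an imbalanced dataset. Data is synthetic, not from the paper.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# 95% majority class, 5% minority class.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for weighting in (None, "balanced"):
    clf = LogisticRegression(class_weight=weighting, max_iter=1000).fit(X_tr, y_tr)
    print(f"class_weight={weighting}")
    # Minority-class recall typically rises once classes are reweighted.
    print(classification_report(y_te, clf.predict(X_te), digits=3))
```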
A Review of Data Security Primitives in Data Mining (IJAEMS, September 2015, Infogain Publication)
This paper discusses various issues and security primitives in the area of data mining, such as spatial data handling, privacy protection of data, data load balancing and resource mining. A five-stage review process was conducted on 30 research papers published between 1996 and 2013. After this exhaustive review, key issues were identified: spatial data handling, data load balancing, resource mining, visual data mining, data cluster mining, privacy preservation, mining of gaps between business tools and patterns, and mining of hidden complex patterns. These have been resolved and explained with appropriate methodologies, and several solution approaches are discussed across the 30 papers. The paper presents the outcome of the review as a set of findings under these key issues, covering the algorithms and methodologies used by researchers, their strengths and weaknesses, and the scope for future work in the area.
Big data is the term for data characterized by increasing volume, velocity, variety and veracity. All these characteristics make processing big data a complex task, so such data needs to be processed differently, for example with the MapReduce framework. When an organization exchanges data so that useful information can be mined from it, the privacy of the data becomes an important problem. Several privacy-preserving algorithms have been proposed in the past; of these, anonymizing the data has been the most efficient. Anonymizing a dataset can be done through several operations, such as generalization, suppression, anatomy, specialization, permutation and perturbation. These algorithms, however, suit datasets that do not have the characteristics of big data. To preserve the privacy of large datasets, an algorithm was recently proposed that applies the top-down specialization approach for anonymizing the dataset and increases scalability by applying the MapReduce framework. In this paper we survey the growth of big data, its characteristics, the MapReduce framework and the privacy-preserving mechanisms, and propose future directions for our research.
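To make the anonymization operations listed above concrete, here is a small, self-contained Python sketch of generalization, one of the named operations; the records, quasi-identifiers and coarsening rules are invented for illustration and are not the surveyed algorithm:

```python
# Minimal sketch of anonymization by generalization: quasi-identifiers
# (age, zip code) are coarsened so records become indistinguishable.
# The records and generalization rules are illustrative only.
records = [
    {"age": 23, "zip": "47677", "disease": "flu"},
    {"age": 27, "zip": "47602", "disease": "asthma"},
    {"age": 52, "zip": "47905", "disease": "flu"},
    {"age": 58, "zip": "47909", "disease": "diabetes"},
]

def generalize(rec):
    # Replace the exact age with a 10-year band; keep only a zip prefix.
    band = (rec["age"] // 10) * 10
    return {"age": f"{band}-{band + 9}",
            "zip": rec["zip"][:3] + "**",
            "disease": rec["disease"]}

for rec in records:
    print(generalize(rec))
# Each output row now shares its (age, zip) values with at least one
# other row, the basic idea behind k-anonymity (here k = 2).
```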
Data science is at the forefront of our world today, as it enables us to predict the future and the behavior of people and systems alike. Hence, this course focuses on introducing the processing involved in data science.
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTICS (IJCSEIT)
Companies, organizations and policy makers contend with a flood of transactional data, accumulating trillions of bytes of information about their customers, suppliers and operations. Advanced networked sensors are being embedded in devices such as mobile phones, smart energy meters, automobiles and industrial machines, which sense, generate and transfer data to multiple storage devices. In fact, as these devices go about their business and interact with individuals, they produce an incredible amount of digital data. Social media sites, smartphones and other consumer devices have allowed billions of individuals around the world to contribute to the amount of data available. In addition, the rapidly increasing volume of multimedia data has also played a key role in the growth of data: high-definition video requires more than 2,000 times as many bytes to store as normal text data. Moreover, in a digitized world, consumers leave enormous amounts of data about their day-to-day communicating, browsing, buying, sharing, searching and so on. As a result, big data has emerged and has in turn motivated advances in big data analytics paradigms, which present researchers regard as a basic motivating factor.
The authors conduct a comprehensive study to explore the impact of big data analytics in key domains, namely Health Care (HC), Retail Industry (RI), Public Governance (PG), Public Security & Safety (PSS) and Personal Location Tracking (PLT). The study first looks at the data sources, and their characteristics, in each domain. It then presents highly productive and competitive big data applications built on innovative big data technologies. Subsequently, it showcases the impact of big data on each domain in capturing value addition in its services. Finally, it puts forward further research opportunities, as these domains differ in their complexity and in the maturity of their use of big data analytics.
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING (cscpconf)
Data has become an indispensable part of every economy, industry, organization, business function and individual. Big Data is a term used to identify datasets whose size is beyond the ability of typical database software tools to store, manage and analyze. Big Data introduces unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This paper presents a literature review of big data mining and its issues and challenges, with emphasis on the distinguishing features of Big Data. It also discusses some methods for dealing with big data.
A Comprehensive Study of the Big Data Environment and its Challenges (ijceronline)
Big Data is a data analysis methodology enabled by recent advances in technologies and architecture. Big data is a massive volume of both structured and unstructured data, so large that it is difficult to process with traditional database and software techniques. This paper provides insight into big data and discusses its nature and definition, including features such as Volume, Velocity and Variety. It also covers the sources of big data generation, the tools available for processing large volumes of varied data, the applications of big data, and the challenges involved in handling big data.
Due to the arrival of new technologies, devices and means of communication, the amount of data produced by mankind is growing rapidly every year, giving rise to the era of big data. The term big data brings new challenges in inputting, processing and outputting data. This paper focuses on the limitations of the traditional approach to managing data and on the components that are useful in handling big data. One approach used in processing big data is the Hadoop framework; the paper presents the major components of the framework and the working process within it.
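To illustrate the programming model at the heart of the Hadoop framework mentioned above, here is a single-process Python sketch of the MapReduce pattern using the canonical word-count example; real Hadoop distributes the map, shuffle and reduce phases across a cluster, so this is a conceptual sketch only:

```python
# Single-process sketch of the MapReduce pattern that Hadoop distributes
# across a cluster: map emits (key, value) pairs, shuffle groups them by
# key, and reduce aggregates each group.
from collections import defaultdict

def map_phase(document):
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    return word, sum(counts)

documents = ["Big data needs new tools", "Hadoop processes big data"]

# Shuffle: group all mapped values by key.
groups = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        groups[word].append(count)

print(dict(reduce_phase(w, c) for w, c in groups.items()))
# {'big': 2, 'data': 2, 'needs': 1, 'new': 1, 'tools': 1,
#  'hadoop': 1, 'processes': 1}
```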
Big data is a broad term for data sets so large or complex that tr.docx (hartrobert670)
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.
Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on."[1] Scientists, practitioners of media and advertising, and governments alike regularly meet difficulties with large data sets in areas including Internet search, finance and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics,[2] connectomics, complex physics simulations,[3] and biological and environmental research.[4]
Data sets grow in size in part because they are increasingly being gathered by cheap and numerous information-sensing mobile devices, aerial sensors (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks.[5]
[6][7] The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;[8] as of 2012, 2.5 exabytes (2.5×10^18 bytes) of data were created every day.[9] The challenge for large enterprises is determining who should own big data initiatives that straddle the entire organization.[10]
Work with big data is necessarily uncommon; most analysis is of "PC size" data, on a desktop PC or notebook[11] that can handle the available data set.
Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data. The work instead requires "massively parallel software running on tens, hundreds, or even thousands of servers".[12] What is considered "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make Big Data a moving target. Thus, what is considered to be "Big" in one year will become ordinary in later years. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."[13]
Big Data Handling Technologies, ICCCS 2014 (Love Arora, GNDU)
Big data came into existence when traditional relational database systems were unable to handle the unstructured data (weblogs, videos, photos, social updates, human behaviour) generated today by organizations, social media and other data-generating sources. Data that is so large in volume, so diverse in variety or moving with such velocity is called big data. Analyzing big data is a challenging task, as it involves large distributed file systems that must be fault tolerant, flexible and scalable. The technologies used by big data applications to handle massive data include Hadoop, MapReduce, Apache Hive, NoSQL and HPCC; these technologies handle data at scales ranging from KB and MB up through TB, PB, ZB and YB.
This research paper discusses various technologies for handling big data, along with the advantages and disadvantages of each technology for addressing the problems of dealing with massive data.
Big data has yet to be implemented fully in real time; it is still under research. People need to know what to do with enormous data. Insurance agencies are actively participating in the analysis of patients' data, from which useful information can be extracted. The analysis covers discharge summaries, drug and pharma details, diagnostic details, doctors' reports, medical history, allergies and insurance policies; it is carried out by applying MapReduce, and useful data is extracted. We analyse a larger number of factors, such as disease types with their corresponding causes, insurance policy details along with the sanctioned amount, and family grade-wise segregation.
Keywords: Big data, Stemming, MapReduce, Policy, Hadoop.
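As a rough sketch of the kind of grouping described above (disease types with sanctioned insurance amounts), the following Python fragment mimics the map, shuffle and reduce steps on invented records; the field names and values are hypothetical, not drawn from the paper:

```python
# Hypothetical sketch: aggregating sanctioned insurance amounts by disease
# type in MapReduce style. Record fields and values are invented for
# illustration, not taken from the paper.
from collections import defaultdict

claims = [
    {"disease": "diabetes", "sanctioned": 1200},
    {"disease": "asthma", "sanctioned": 800},
    {"disease": "diabetes", "sanctioned": 1500},
]

# Map: emit (disease, amount); shuffle: group amounts by disease.
grouped = defaultdict(list)
for claim in claims:
    grouped[claim["disease"]].append(claim["sanctioned"])

# Reduce: per-disease claim count and total sanctioned amount.
for disease, amounts in grouped.items():
    print(disease, "claims:", len(amounts), "total sanctioned:", sum(amounts))
```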
Two-Phase TDS Approach for Data Anonymization to Preserve Big Data Privacy (dbpublications)
While big data has gradually become a hot topic of research and business and is used everywhere in many industries, big data security and privacy have drawn increasing concern. However, there is an obvious tension between big data security and privacy on the one hand and the widespread use of big data on the other. A variety of privacy-preserving mechanisms have been developed to protect privacy at the different stages of the big data life cycle (e.g. data generation, data storage, data processing). The goal of this paper is to provide a complete overview of the privacy-preservation mechanisms in big data and to present the challenges facing existing mechanisms; we also illustrate the infrastructure of big data and the state-of-the-art privacy-preserving mechanisms at each stage of the big data life cycle. The paper focuses on the anonymization process, which significantly improves the scalability and efficiency of TDS (top-down specialization) for data anonymization over existing approaches. We also discuss the challenges and future research directions related to preserving privacy in big data.
Artificial intelligence has been a buzzword impacting every industry in the world. With the rise of such advanced technology, there will always be questions regarding its impact on our social life, environment and economy, and thus on all efforts towards sustainable development. In the information era, enormous amounts of data have become available to decision makers. Big data refers to datasets that are not only big, but also high in variety and velocity, which makes them difficult to handle using traditional tools and techniques. Due to the rapid growth of such data, solutions need to be studied and provided in order to handle these datasets and extract value and knowledge from them for different industries and business operations. Numerous use cases have shown that AI can ensure an effective supply of information to citizens, users and customers in times of crisis. This paper aims to analyse some of the methods and scenarios that can be applied to AI and big data, as well as the opportunities provided by their application in various business operations and crisis-management domains.
Similar to RESEARCH IN BIG DATA – AN OVERVIEW (20)
Issues and Challenges in Broadcast Storm Suppression Algorithms of Vehicular ...ieijjournal1
Vehicular Ad-Hoc Networks (VANETs) are the networks of vehicles which are characterized with high
mobility and dynamic changing topology. Most of the communication interchanges in VANETs take place in broadcasting mode, which is supposed to be the simplest way to disseminate (spread) emergency messages all over the vehicular network. This broadcasting technique assures the optimal delivery of emergency messages all over the VANET. However, it also results in unwanted flooding of messages which causes severe contention and collisions in VANETs leading to Broadcast Storm Problem (BSP) and in turn affects the overall performance of VANETs. A Multitude of research work have proposed Broadcast Storm Suppression Algorithms (BSSA) to control this Broadcast Storm. These mechanisms tried to control BSP by either reducing the number of rebroadcasting/ relaying nodes or by identifying the best relay node. The suppression mechanisms help to overcome BSP to certain extent, still there is need to still reduce the number of rebroadcasting nodes in existing mechanisms and also to identify the best possible rebroadcasting node. This would help to mitigate BSP completely and efficiently. This paper presents comparative analysis of various prominent BSSA in order to identify the underlying issues and challenges in controlling BSP completely. The outcome of this paper would provide the requirements for developing an efficient BSSA overcoming the identified issues and challenges.
Low Power SI Class E Power Amplifier and Rf Switch for Health Careieijjournal1
This research was to design a 2.4 GHz class E Power Amplifier (PA) for health care, with 0.18um Semiconductor Manufacturing International Corporation CMOS technology by using Cadence software. And also RF switch was designed at cadence software with power Jazz 180nm SOI process. The ultimate goal for such application is to reach high performance and low cost, and between high performance and low power consumption design. This paper introduces the design of a 2.4GHz class E power amplifier and RF switch design. PA consists of cascade stage with negative capacitance. This power amplifier can transmit 16dBm output power to a 50Ω load. The performance of the power amplifier and switch meet the specification requirements of the desired.
Informatics Engineering, an International Journal (IEIJ)ieijjournal1
Informatics is rapidly developing field. The study of informatics involves human-computer interaction and how an interface can be built to maximize user-efficiency. Due to the growth in IT, individuals and organizations increasingly process information digitally. This has led to the study of informatics to solve privacy, security, healthcare, education, poverty, and challenges in our environment. The Informatics Engineering, an International Journal (IEIJ) is a open access peer-reviewed journal that publishes articles which contribute new results in all areas of Informatics. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on the human use of computing fields such as communication, mathematics, multimedia, and human-computer interaction design and establishing new collaborations in these areas.
A NEW HYBRID FOR SOFTWARE COST ESTIMATION USING PARTICLE SWARM OPTIMIZATION A...ieijjournal1
Software Cost Estimation (SCE) is considered one of the most important sections in software engineering
that results in capabilities and well-deserved influence on the processes of cost and effort. Two factors of
cost and effort in software projects determine the success and failure of projects. The project that will be
completed in a certain time and manpower is a successful one and will have good profit to project
managers. In most of the SCE techniques, algorithmic models such as COCOMO algorithm models have
been used. COCOMO model is not capable of estimating the close approximations to the actual cost,
because it runs in the form of linear. So, the models should be adapted that simultaneously with the number
of Lines of Code (LOC) has the ability to estimate in a fair and accurate fashion for effort factors. Metaheuristic
algorithms can be a good model for SCE due to the ability of local and global search. In this
paper, we have used the hybrid of Particle Swarm Optimization (PSO) and Differential Evolution (DE) for
the SCE. Test results on NASA60 software dataset show that the rate of Mean Magnitude of Relative Error
(MMRE) error on hybrid model, in comparison with COCOMO model is reduced to about 9.55%.
THE LEFT AND RIGHT BLOCK POLE PLACEMENT COMPARISON STUDY: APPLICATION TO FLIG...ieijjournal1
It is known that if a linear-time-invariant MIMO system described by a state space equation has a number
of states divisible by the number of inputs and it can be transformed to block controller form, we can
design a state feedback controller using block pole placement technique by assigning a set of desired Block
poles. These may be left or right block poles. The idea is to compare both in terms of system’s response.
A HYPER-HEURISTIC METHOD FOR SCHEDULING THEJOBS IN CLOUD ENVIRONMENTieijjournal1
Currently cloud computing has turned into a promising technology and has become a great key for
satisfying a flexible service oriented , online provision and storage of computing resources and user’s
information in lesser expense with dynamism framework on pay per use basis.In this technology Job
Scheduling Problem is acritical issue. For well-organizedmanaging and handling resources,
administrations, scheduling plays a vital role. This paper shares out the improved Hyper- Heuristic
Scheduling Approach to schedule resources, by taking account of computation time and makespan with two
detection operators. Operators are used to select the low-level heuristics automatically. Conditional
Revealing Algorithm (CRA)idea is applied for finding the job failures while allocating the resources. We
believe that proposed hyper-heuristic achieve better results than other individual heuristics
FROM FPGA TO ASIC IMPLEMENTATION OF AN OPENRISC BASED SOC FOR VOIP APPLICATIONieijjournal1
ASIC (Application Specific Integrated Circuit) design verification takes as long as the designers take to
describe, synthesis and implement the design. The hybrid approach, where the design is first prototyped on
an FPGA (Field-Programmable Gate Array) platform for functional validation and then implemented as
an ASIC allows earlier defect detection in the design process and thus allows a significant time saving.
This paper deals with a CMOS standard-cell ASIC implementation of a SoC (System on Chip) based on the
OpenRISC processor for Voice over IP (VoIP) application; where a hybrid approach is adopted. The
architecture of the design is mainly based on the reuse of IPs cores described at the RTL level. This RTL
code is technology-independent; hence the design can be ported easily from FPGA to ASIC. Results show
that the SoC occupied the area of 2.64mm². Regarding the power consumption, RTL power estimation is
given.
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)ieijjournal1
Any abnormal activity can be assumed to be anomalies intrusion. In the literature several techniques and
algorithms have been discussed for anomaly detection. In the most of cases true positive and false positive
parameters have been used to compare their performance. However, depending upon the application a
wrong true positive or wrong false positive may have severe detrimental effects. This necessitates inclusion
of cost sensitive parameters in the performance. Moreover the most common testing dataset KDD-CUP-99
has huge size of data which intern require certain amount of pre-processing. Our work in this paper starts
with enumerating the necessity of cost sensitive analysis with some real life examples. After discussing
KDD-CUP-99 an approach is proposed for feature elimination and then features selection to reduce the
number of more relevant features directly and size of KDD-CUP-99 indirectly. From the reported
literature general methods for anomaly detection are selected which perform best for different types of
attacks. These different classifiers are clubbed to form an ensemble. A cost opportunistic technique is
suggested to allocate the relative weights to classifiers ensemble for generating the final result. The cost
sensitivity of true positive and false positive results is done and a method is proposed to select the elements
of cost sensitivity metrics for further improving the results to achieve the overall better performance. The
impact on performance trade of due to incorporating the cost sensitivity is discussed.
ADAPTIVE MODELING OF URBAN DYNAMICS DURING EPHEMERAL EVENT VIA MOBILE PHONE T...ieijjournal1
The communication devices have produced digital traces for their users either voluntarily or not. This type
of collective data can give powerful indications that are affecting the urban systems design and
development. In this study mobile phone data during Armada event is investigated. Analyzing mobile
phone traces gives conceptual views about individuals densities and their mobility patterns in the urban
city. The geo-visualization and statistical techniques have been used for understanding human mobility
collectively and individually. The undertaken substantial parameters are inter-event times, travel distances
(displacements) and radius of gyration. They have been analyzed and simulated using computing platform
by integrating various applications for huge database management, visualization, analysis, and
simulation. Accordingly, the general population pattern law has been extracted. The study contribution
outcomes have revealed both the individuals densities in static perspective and individuals mobility in
dynamic perspective with multi levels of abstraction (macroscopic, mesoscopic, microscopic).
AN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMSieijjournal1
An intrusion detection system detects various malicious behaviors and abnormal activities that might harm
security and trust of computer system. IDS operate either on host or network level via utilizing anomaly
detection or misuse detection. Main problem is to correctly detect intruder attack against computer
network. The key point of successful detection of intrusion is choice of proper features. To resolve the
problems of IDS scheme this research work propose “an improved method to detect intrusion using
machine learning algorithms”. In our paper we use KDDCUP 99 dataset to analyze efficiency of intrusion
detection with different machine learning algorithms like Bayes, NaiveBayes, J48, J48Graft and Random
forest. To identify network based IDS with KDDCUP 99 dataset, experimental results shows that the three
algorithms J48, J48Graft and Random forest gives much better results than other machine learning
algorithms. We use WEKA to check the accuracy of classified dataset via our proposed method. We have
considered all the parameter for computation of result i.e. precision, recall, F – measure and ROC.
CLASS D POWER AMPLIFIER FOR MEDICAL APPLICATIONieijjournal1
The objective of this research was to design a 2.4 GHz class AB Power Amplifier (PA), with 0.18um
Semiconductor Manufacturing International Corporation (SMIC) CMOS technology by using Cadence
software, for health care applications. The ultimate goal for such application is to minimize the trade-offs
between performance and cost, and between performance and low power consumption design. This paper
introduces the design of a 2.4GHz class D power amplifier which consists of two stage amplifiers. This
power amplifier can transmit 15dBm output power to a 50Ω load. The power added efficiency was 50%
and the total power consumption was 90.4 mW. The performance of the power amplifier meets the
specification requirements of the desired.
THE PRESSURE SIGNAL CALIBRATION TECHNOLOGY OF THE COMPREHENSIVE TEST SYSTEMieijjournal1
The pressure signal calibration technology of the comprehensive test system which involved pressure
sensors was studied in this paper. The melioration of pressure signal calibration methods was elaborated.
Compared with the calibration methods in the lab and after analyzing the relevant problems,the
calibration technology online was achieved. The test datum and reasons of measuring error analyzed,the
uncertainty evaluation was given and then this calibration method was proved to be feasible and accurate.
LOW POWER SI CLASS E POWER AMPLIFIER AND RF SWITCH FOR HEALTH CAREieijjournal1
This research was to design a 2.4 GHz class E Power Amplifier (PA) for health care, with 0.18um
Semiconductor Manufacturing International Corporation CMOS technology by using Cadence software.
And also RF switch was designed at cadence software with power Jazz 180nm SOI process. The ultimate
goal for such application is to reach high performance and low cost, and between high performance and
low power consumption design. This paper introduces the design of a 2.4GHz class E power amplifier and
RF switch design. PA consists of cascade stage with negative capacitance. This power amplifier can
transmit 16dBm output power to a 50Ω load. The performance of the power amplifier and switch meet the
specification requirements of the desired.
PRACTICE OF CALIBRATION TECHNOLOGY FOR THE SPECIAL TEST EQUIPMENT ieijjournal1
For the issues encountered in the special test equipment calibration work, based on the characteristics of special test equipment, the calibration point selection,classification of calibration parameters and calibration method of special test equipment are briefly introduced in this paper, at the same time, the preparation and management requirements of calibration specification are described.
Informatics Engineering, an International Journal (IEIJ) ieijjournal1
Informatics is rapidly developing field. The study of informatics involves human-computer interaction and how an interface can be built to maximize user-efficiency. Due to the growth in IT, individuals and organizations increasingly process information digitally. This has led to the study of informatics to solve privacy, security, healthcare, education, poverty, and challenges in our environment. The Informatics Engineering, an International Journal (IEIJ) is a open access peer-reviewed journal that publishes articles which contribute new results in all areas of Informatics. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on the human use of computing fields such as communication, mathematics, multimedia, and human-computer interaction design and establishing new collaborations in these areas.
How to Make a Field invisible in Odoo 17Celine George
It is possible to hide or invisible some fields in odoo. Commonly using “invisible” attribute in the field definition to invisible the fields. This slide will show how to make a field invisible in odoo 17.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
RESEARCH IN BIG DATA – AN OVERVIEW
Informatics Engineering, an International Journal (IEIJ), Vol.4, No.3, September 2016
DOI: 10.5121/ieij.2016.4301
Dr. S. Vijayarani¹ and Ms. S. Sharmila²
¹Assistant Professor, Department of Computer Science, Bharathiar University, Coimbatore
²Research Scholar, Department of Computer Science, Bharathiar University, Coimbatore
ABSTRACT
Big data is a prominent term which characterizes the improvement and availability of data in all three
formats like structure, unstructured and semi formats. Structure data is located in a fixed field of a record
or file and it is present in the relational data bases and spreadsheets whereas an unstructured data file
includes text and multimedia contents. The primary objective of this big data concept is to describe the
extreme volume of data sets i.e. both structured and unstructured. It is further defined with three “V”
dimensions namely Volume, Velocity and Variety, and two more “V” also added i.e. Value and Veracity.
Volume denotes the size of data, Velocity depends upon the speed of the data processing, Variety is
described with the types of the data, Value which derives the business value and Veracity describes about
the quality of the data and data understandability. Nowadays, big data has become unique and preferred
research areas in the field of computer science. Many open research problems are available in big data
and good solutions also been proposed by the researchers even though there is a need for development of
many new techniques and algorithms for big data analysis in order to get optimal solutions. In this paper,
a detailed study about big data, its basic concepts, history, applications, technique, research issues and
tools are discussed.
KEYWORDS:
Big data, Technologies, Visualization, Classification, Clustering
1. INTRODUCTION
Big data is associated with data sets whose size is beyond the capability of common database software tools to capture, store, handle and evaluate [1][2]. Big data analysis is essential for analysts, researchers and business people to make better decisions than were previously attainable. Figure 1 explains the structure of big data, which contains five dimensions, namely volume, velocity, variety, value and veracity [2][3]. Volume refers to the size of the data and mainly concerns how to handle highly scalable, high-dimensional databases and their processing needs. Velocity refers to the continuous arrival of data streams from which useful information is obtained. Furthermore, the improved throughput, connectivity and computing speed of digital devices have accelerated the retrieval, processing and production of data.
Veracity determines the quality of information gathered from various places. Variety describes the different types of data to be handled: source data includes not only structured traditional relational data but also quasi-structured, semi-structured and unstructured data such as text, sensor data, audio, video, graphs and many more types. Value is essential for extracting the economic value of different data, which varies significantly. The primary challenge is to identify which data are valuable, how to perform the transformation, and which technique should be applied to perform the data analysis [1].
Big data involves three types of knowledge discovery: novelty discovery, class discovery and association discovery. Novelty discovery finds new, rare, previously undiscovered and unknown items among billions or trillions of objects or events [2]. Class discovery finds new classes of objects and behavior, and association discovery finds unusual co-occurring associations. Big data, through these innovative methods, is changing our world. The concept is being driven by several factors: a proliferation of sensors, the creation of almost all information in digital form, dramatic cost reductions in storage, remarkable increases in network bandwidth, impressive cost reductions and scalability improvements in computation, and efficient algorithmic breakthroughs in machine learning and other areas [2]. Analysis of big data is used to reduce fraud and helps to improve scientific research and field development. Figure 1 illustrates the structure of big data [1].
Figure 1. Structure of Big Data
A typical characteristic of big data is the integration of structured, semi-structured and unstructured data. Big data addresses speed and measurability, quality and security, flexibility and stability. Another important advantage of big data is analytics: big data analytics refers to the process of collecting, organizing and analyzing large sets of data to discover patterns and other useful information. Table 1 shows a comparative study of different types of data based on size, characteristics, tools and methods [1][3].
The remaining portion of the paper is organized as follows. Section 2 explains the need for big data, its applications, advantages and characteristics. Big data tools and technologies are discussed in Section 3. Section 4 describes research issues in big data. Section 5 presents tools for data visualization, and Section 6 discusses visualization algorithms. Finally, Section 7 concludes and discusses recent trends.
2. NEED FOR BIG DATA
The massive volume of data could not be expeditiously processed by traditional database strategies and tools, which mainly focused on and handled structured data [1]. In the early days of computing, the amount of data stored in computers was very small owing to limited storage capacity. After the invention of networking, the data stored in computers increased because of improved developments in hardware components. Next, the arrival of the internet created a boom in storing vast collections of data that could be used for various purposes [2]. This situation led to the introduction of new research concepts, such as data mining, networking, image processing, grid computing and cloud computing, which are used for analyzing the different types of data arising in various domains. Many new techniques, algorithms, concepts and methods have been proposed by researchers for analyzing static data sets. In this digital era, the development of mobile and wireless technologies has provided a new platform in which people share their information through social media sites, e.g. Facebook, Twitter and Google+ [3]. In these settings the data arrive continuously and cannot be stored in computer memory because the size of the data is huge; such data are considered "Big Data". This situation also created the problem of how to perform data analysis on these dynamic datasets, since the existing algorithms and their solutions are not suitable for handling big data. This has raised the requirement for the development of new techniques, methods and algorithms [1][2].
The term 'Big Data' appeared for the first time in 1998, in a Silicon Graphics (SGI) presentation by John Mashey. The growth of big data requires increases in storage capacity and processing power. Frequently, large amounts of data (about 2.5 quintillion bytes per day) are created through social networking [1]. Big data analytics is used to examine these large amounts of data and identify hidden patterns and unknown correlations. Two technologies used in big data analytics are NoSQL and Hadoop. NoSQL is a non-relational (non-SQL) database solution; examples are HBase, Cassandra and MongoDB [2]. Hadoop is an ecosystem of software packages which includes HDFS and MapReduce. Big data decisions rely on structured, unstructured and semi-structured data. Tools like SAS, R and Matlab support decisive analysis, but they were not developed for very large datasets, whereas a DBMS or MapReduce can manage data that arrives at high rates [2].
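As a concrete illustration of the NoSQL storage style mentioned above, the following minimal sketch stores and queries schema-free documents with MongoDB's Python driver; the server address, database and collection names, and the sample document are assumptions made for this example.

    # Minimal MongoDB (NoSQL document store) sketch.
    # Assumes a MongoDB server on localhost and the pymongo driver.
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)  # assumed local server
    db = client["bigdata_demo"]               # hypothetical database name
    tweets = db["tweets"]                     # hypothetical collection name

    # Documents need no fixed schema; fields may vary per record.
    tweets.insert_one({"user": "alice", "text": "big data!", "retweets": 144})
    for doc in tweets.find({"retweets": {"$gt": 100}}):
        print(doc["user"], doc["text"])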
Big data applications are large-scale distributed applications which work with large data sets, and data analysis problems play a vital role in many sectors [1]. Existing software for big data applications, such as Apache Hadoop and Google's MapReduce framework, generates a large amount of intermediate data [2]. There are many application areas for big data, such as manufacturing, bioinformatics, health care, social networks, business, science and technology, and smart cities [3]. In bioinformatics, big data provides a Hadoop-based infrastructure which incorporates next-generation sequencing, large-scale data analysis and other biological domains, combining parallel and distributed computing frameworks and cloud computing with clusters and web interfaces [1][3].
3. BIG DATA TECHNOLOGIES
Column-oriented databases
A column-oriented database stores data in columns rather than rows, which allows massive data to be compressed and queried quickly [3].
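The benefit can be illustrated with a small sketch in plain Python (a stand-in for a real column-store engine, not an actual database): scanning a single column touches far less data than scanning whole rows.

    # Row-oriented layout: each record is stored together.
    rows = [
        {"id": 1, "name": "a", "sales": 10},
        {"id": 2, "name": "b", "sales": 20},
        {"id": 3, "name": "c", "sales": 30},
    ]

    # Column-oriented layout: each attribute is stored contiguously,
    # which compresses well and makes single-column scans fast.
    columns = {
        "id": [1, 2, 3],
        "name": ["a", "b", "c"],
        "sales": [10, 20, 30],
    }

    # Aggregating one column reads only that column's data.
    print(sum(columns["sales"]))  # 60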
Schema-less databases
Schema-less databases are otherwise known as NoSQL databases. Such a database provides a mechanism for the storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. Two common types are document stores and key-value stores, which store and retrieve massive amounts of structured, unstructured and semi-structured data [3].
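A minimal sketch of the two store types, using plain Python dictionaries to stand in for a real NoSQL engine (the keys and documents are invented):

    # Key-value store: opaque values addressed only by key.
    kv_store = {}
    kv_store["user:42"] = b"...serialized bytes..."

    # Document store: values are structured, queryable documents
    # with no fixed schema across entries.
    doc_store = {
        "user:42": {"name": "alice", "tags": ["ml", "hadoop"]},
        "user:43": {"name": "bob", "city": "Coimbatore"},
    }

    # A simple "query": select the documents having a given field.
    hits = {k: d for k, d in doc_store.items() if "city" in d}
    print(hits)  # {'user:43': {'name': 'bob', 'city': 'Coimbatore'}}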
Hadoop
Hadoop is a popular open source tool for handling big data and implements MapReduce. It is a Java-based programming framework which supports processing large data sets in a distributed computing environment. A Hadoop cluster uses a master/slave structure. The distributed file system in Hadoop helps transfer data at rapid rates and, in the event of a node failure, allows the system to continue normal operation. Hadoop has two main sub-projects, namely MapReduce and the Hadoop Distributed File System (HDFS) [4].
MapReduce
This is a programming paradigm which allows scalable execution across thousands of servers and server clusters for a large task. A MapReduce implementation consists of two tasks: the map task and the reduce task. In the map task, the input dataset is converted into key/value pairs (tuples), whereas in the reduce task the outputs of several map tasks are combined to form a reduced set of tuples.
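The canonical MapReduce example is a word count. The sketch below mimics the two phases described above in plain Python on a single machine; it is an illustration of the paradigm, not code that runs on an actual Hadoop cluster.

    from itertools import groupby

    def map_task(document):
        # Map: emit a (key, value) pair for every word.
        for word in document.split():
            yield (word, 1)

    def reduce_task(word, counts):
        # Reduce: combine all values that share the same key.
        return (word, sum(counts))

    docs = ["big data big ideas", "data mining"]
    # Sorting stands in for the shuffle/sort step between the phases.
    pairs = sorted(kv for d in docs for kv in map_task(d))
    results = [reduce_task(word, [c for _, c in group])
               for word, group in groupby(pairs, key=lambda kv: kv[0])]
    print(results)  # [('big', 2), ('data', 2), ('ideas', 1), ('mining', 1)]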
HDFS
The Hadoop Distributed File System is a file system which spans all nodes in a Hadoop cluster for data storage. It links the file systems on local nodes together to make one large file system. To overcome node failures, HDFS improves reliability by replicating data across multiple sources [4].
Hive
Hive is a data warehousing infrastructure built on top of Hadoop. It supports different storage types such as plain text, RCFile, HBase and ORC. Built-in user-defined functions are used to handle dates, strings and other data-mining types. It is an SQL-like bridge that allows BI applications to run queries against Hadoop clusters [4].
Storage Technologies
To store huge volumes of data, efficient and effective techniques are required. The main focuses of storage technologies are data compression and storage virtualization [5].
HBase
HBase is a scalable, distributed database which uses the Hadoop Distributed File System for storage. It is column-oriented and supports structured data [5].
Chukwa
Chukwa monitors large distributed systems, adds the semantics required for log collection and uses an end-to-end delivery model [5].
4. RESEARCH ISSUES IN BIG DATA
Big data has three fundamental issues: storage, management and processing, each of which exhibits a massive set of technical research problems. Storage issues arise because the quantity of data has exploded, constantly demanding new storage media. Moreover, data is being created practically everywhere; on social media, for example, 12+ TB of tweets are generated every day, and a tweet is typically re-tweeted 144 times. The next issue is management, a difficult problem in the big data domain: if data is distributed geographically, it may be managed and owned by multiple entities. Digital data collection is easier than manual data collection, and digital data largely determines the methodology for data collection. Data qualification focuses on missing data and outliers rather than on validating each item [5]; hence new approaches are needed for data qualification and validation. The processing issue concerns questions such as how to process 1,000 petabytes of data, which, handled sequentially, would require a total end-to-end processing time of roughly 635 years. Thus, effective processing of exabytes of data will require extensive parallel processing and new analytics algorithms in order to provide timely information. Many issues in big data can be addressed by e-science, which requires grid and cloud computing.
4.1 Big Data Classification
Data classification is the process of organizing data into categories for its most effective and efficient use. A well-planned data classification system makes essential data easy to find and retrieve. There are three primary aspects of data classification: methods, domains and variations. Methods covers the common techniques used for classification, for example probabilistic methods, decision trees, rule-based methods, instance-based methods, support vector machines and neural networks [2]. Domains examines the specific methods used for particular data domains such as multimedia, text, time-series, network, discrete-sequence and uncertain data; it also covers large data sets and data streams owing to the recent importance of the big data paradigm [4]. Variations in the classification process covers ensembles, rare-class learning, distance-function learning, active learning, visual learning, transfer learning and semi-supervised learning, as well as evaluation aspects of classifiers [5].
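As a small illustration of one of the methods listed above, the following sketch trains a decision-tree classifier with scikit-learn; the feature vectors and class labels are invented for the example.

    from sklearn.tree import DecisionTreeClassifier

    # Toy feature vectors and class labels (hypothetical data).
    X = [[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5]]
    y = ["small", "small", "large", "large"]

    clf = DecisionTreeClassifier(max_depth=2)
    clf.fit(X, y)                     # learn decision rules from the data
    print(clf.predict([[6.5, 3.0]]))  # -> ['large']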
The types of big data are divided into three categories, namely social networks, traditional business systems and the Internet of Things [6]. Social networks (human-sourced information) contain information which is the record of human experiences, previously recorded in books and works of art and later in photographs, audio and video. Human-sourced information is now almost entirely digitized and stored everywhere from personal computers to social networks. Such data are loosely structured and often ungoverned: social networks (Facebook, Twitter, etc.), blogs and comments, personal documents, pictures (Instagram, Flickr, Picasa), videos (YouTube), internet search engines, mobile data content, text messages, user-generated maps and e-mail. Traditional business systems produce process-mediated data; these processes record and monitor business events of interest, such as registering a customer, manufacturing a product or taking an order. The process-mediated data thus collected is highly structured and includes transactions, reference tables and relationships, as well as the metadata that sets its context [4]. Traditional business data is the vast majority of what IT has managed and processed, in both operational and BI systems; usually it is structured data stored in relational database systems. Some sources belonging to this class may fall into the category of "administrative data", i.e. data produced by public agencies, such as medical and health records [5]. Data produced by businesses include commercial transaction data, banking/stock records, e-commerce and credit card data. The last category is the Internet of Things (machine-generated data), derived from the phenomenal growth in the number of sensors and machines used to measure and record events and situations in the physical world. The output of these sensors is machine-generated data which, from simple sensor records to complex computer logs, is well structured [6]. As sensors proliferate and data volumes grow, this is becoming an increasingly important component of the information stored and processed by many businesses. Its well-structured nature is suitable for computer processing, but its size and speed are beyond traditional approaches. Data from sensors include fixed sensors, home automation, weather/pollution sensors, traffic sensors/webcams, scientific sensors, videos, mobile sensors (tracking) such as mobile phone location, cars and satellite images, plus data from computer system logs and web logs [5][6].
Big data are also classified into different categories to understand their characteristics. This classification is based on five aspects: data sources, content format, data stores, data staging and data processing, as represented in Figure 2 [5]. Each category requires new algorithms and techniques to perform classification tasks efficiently in the big data domain.
Figure 2. Big Data Classification
Data sources are simply the different origins from which data is collected. Some of the important data sources are web and social media, machine-generated data, sensor data, transaction data and the Internet of Things (IoT). Social media contains large volumes of information generated and shared via URLs (Uniform Resource Locators) in virtual communities and networks, for example Facebook, Twitter and blogs. Machine-generated data is information automatically generated by both hardware and software, for example computers and medical devices. Sensor data are collected from various sensing devices used to measure physical quantities [7]. Transaction data involves a time dimension to illustrate the data, for example financial and business data. Finally, the IoT represents a set of objects identified uniquely as part of the internet, e.g. smartphones and digital cameras.
Content format has three forms, namely structured, unstructured and semi-structured. Structured data is often managed with SQL, and the data resides in fixed fields within a record or file. Unstructured data often includes text and multimedia content and is the opposite of structured data. Semi-structured data does not reside in a relational database [7]; it includes, for example, XML documents and NoSQL databases. Data stores are classified into four categories: document-oriented, key-value, column-based and graph-based. Document-oriented stores are designed to store and collect information and support complex data, whereas column-based stores organize data in a columnar row/column format. A key-value store is an alternative to a relational database designed to scale to very large data sets, in which data can be stored and accessed easily. Finally, graph-based stores are designed to represent the graph model with edges, nodes and properties that are related to one another [8].
Data staging is classified into three steps: cleaning, transforming and normalization. Cleaning identifies incomplete data; normalization minimizes redundancy; and transformation converts the data into a suitable form. Finally, data processing is of two types, namely batch and real-time [9]. From the above analysis it is observed that content format covers all types of data: structured, unstructured and semi-structured.
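A compact sketch of the three staging steps using pandas; the column names and values are invented, and min-max scaling is used here for the normalization step.

    import pandas as pd

    df = pd.DataFrame({"sales": [10.0, None, 30.0], "region": ["n", "s", "n"]})

    # Cleaning: identify and drop incomplete records.
    df = df.dropna(subset=["sales"])

    # Transforming: map values into a more suitable form.
    df["region"] = df["region"].str.upper()

    # Normalization: rescale values to a common [0, 1] range.
    df["sales_norm"] = (df["sales"] - df["sales"].min()) / (
        df["sales"].max() - df["sales"].min())
    print(df)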
4.2 Clusters in Big Data
Grouping identical or similar elements closely together is known as clustering. Data clustering, also known as cluster analysis or segmentation analysis, organizes a collection of n objects into a partition or a hierarchy. The main aim of clustering is to assign data to clusters such that objects grouped in the same cluster are "similar" according to their similarities, traits and behavior. The algorithm families most commonly used in clustering are partitioning, hierarchical, grid-based, density-based and model-based algorithms. Partitioning algorithms are also called centroid-based clustering, and hierarchical algorithms are also called connectivity-based clustering. Density-based clustering is based on the concepts of data reachability and data connectivity, grid-based clustering is based on the size of the grid instead of the data, and model-based clustering depends upon a probability distribution. Figure 3 represents the process of data clustering [4][5][6].
Figure 3. Processes of Data Clustering
A clustering algorithm for big data must deal with a very large amount of data, and this is the most distinctive feature that places specific requirements on all the classical technologies and tools used. To guide the selection of a suitable clustering algorithm with respect to the Volume property, the following criteria are considered: size of the dataset, handling of high dimensionality, and handling of outliers/noisy data. Variety refers to the ability of a clustering algorithm to handle different types of data (numerical, categorical and hierarchical) and deals with the complexity of big data [7]. To guide the selection of a suitable clustering algorithm with respect to the Variety property, the criteria considered are the type of dataset and the cluster shapes. Velocity refers to the speed of a clustering algorithm on big data, which is generated at high speed; here the criterion considered is the complexity of the algorithm. Many clustering algorithms are available; a few are listed below [5][6][7].
K-means
Gaussian mixture models
Kernel K-means
Spectral Clustering
Nearest neighbor
Latent Dirichlet Allocation.
Figure 4 shows various existing clustering algorithms; a minimal K-means sketch follows.
Figure 4. Clustering Algorithms
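A minimal K-means sketch with scikit-learn, assuming toy two-dimensional data invented for the example:

    import numpy as np
    from sklearn.cluster import KMeans

    # Toy 2-D points forming two loose groups (invented data).
    X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)           # cluster id assigned to each point
    print(km.cluster_centers_)  # centroid of each cluster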
4.3 Association Rules
Association rules are (if/then) statements that help to uncover relationships between seemingly unrelated data in a transactional database, relational database or other information repository. An association rule has two parts, an antecedent (if) and a consequent (then) [8]. An antecedent is an item found in the data; a consequent is an item found in combination with the antecedent. Association rules are created by analyzing data for frequent if/then patterns and using the criteria of support and confidence to identify the most important relationships. Support indicates how frequently the items appear in the database, while confidence indicates how often the if/then statement has been found to be true. In data mining, association rules are useful for analyzing and predicting customer behaviour [9]. Association rule mining finds frequent patterns, associations, correlations or causal structures among sets of items or objects in transactional databases, relational databases and other information repositories. Association rules are used for market-basket analysis, cross-marketing, catalog design, loss-leader analysis, etc. Desirable properties of association rules include revealing how items or objects relate to each other and how they tend to group together, being simple to understand (comprehensibility), providing useful information (utilizability) and having efficient discovery algorithms (efficiency). Association rules are typed in several ways: by the types of values handled, i.e. Boolean and quantitative association rules; by level of abstraction, i.e. single-level or multilevel association rules; and by the dimensions of data involved, i.e. single-dimensional or multidimensional association rules [10].
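Support and confidence can be computed directly from a toy transaction database, as in the following sketch (the items and transactions are invented):

    transactions = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]

    def support(itemset):
        # Fraction of transactions containing every item in the set.
        return sum(itemset <= t for t in transactions) / len(transactions)

    # Rule: {bread} -> {milk}
    antecedent, consequent = {"bread"}, {"milk"}
    confidence = support(antecedent | consequent) / support(antecedent)
    print(support(antecedent | consequent))  # 0.5 (2 of 4 transactions)
    print(confidence)                        # ~0.67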
4.4 Big Data Visualization
Big data visualization is a process by which numerical data are converted into meaningful images, often in 3-D. It presents data in pictorial or graphical format and relies on visual representations such as graphics, tables, maps and charts, which help users understand the data more quickly and easily. There are many big data visualization tools, namely Polymaps, NodeBox, Flot, Processing, Tangle, SAS Visual Analytics, Linkscape, Leaflet, Crossfilter and OpenLayers [9]. Visualization techniques can be classified in three different ways: based upon the task, based upon the structure of the data set, or based upon the dimension. Visualizations can also be classified by whether the given data is spatial or non-spatial, whether the displayed data is 2D or 3D, and whether the visualization components are static or dynamic [10].
Visualization is used for both spatial and non-spatial data, and various visualization mechanisms are applied to represent 2D or 3D data. The processing of data in a visualization system can be batch or interactive: batch processing is used for the analysis of sets of images, while in interactive visualization the user can interact in a variety of ways, including browsing, sampling, querying and association. Various methods are available in data visualization depending on the type of data, of which there are three: univariate, bivariate and multivariate. Univariate data measures a single quantitative variable and characterizes its distribution; it is represented by two methods, the histogram and the pie chart. Bivariate data constitutes sample pairs of two quantitative variables related to each other; they are represented using scatter plots and line graphs. Multivariate data represents multidimensional data and is represented by icon-based methods, pixel-based methods and dynamic parallel coordinate systems [8][9][10].
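The univariate and bivariate methods mentioned above can be sketched with matplotlib; random numbers stand in for a real dataset.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)           # one quantitative variable
    y = 2 * x + rng.normal(size=500)   # a second, correlated variable

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.hist(x, bins=30)               # univariate: histogram
    ax1.set_title("Univariate: histogram")
    ax2.scatter(x, y, s=5)             # bivariate: scatter plot
    ax2.set_title("Bivariate: scatter plot")
    plt.show()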
5. TOOLS FOR DATA VISUALIZATION
5.1. Dygraphs
Dygraphs is a fast, versatile open source JavaScript charting library. It is highly customizable, designed to interpret dense data sets, works in all browsers, and its charts can be zoomed on mobile and tablet devices. A few characteristics of Dygraphs: it handles huge datasets, is highly customizable and highly compatible, and gives strong support for error bars and confidence intervals. A Dygraphs chart is displayed in Figure 5 [22].
Figure 5. Dygraphs Chart
5.2. ZingChart
ZingChart is a powerful charting library with the ability to create charts, dashboards and infographics. It features a rich API set that allows users to build interactive Flash or HTML5 charts, and it provides hundreds of chart variations, for example bar, scatter, radar, piano, gauge, sparkline, mixed, rankflow and word cloud. Figure 6 shows a ZingChart chart [22].
Figure 6. ZingChart
5.3. Polymaps
Polymaps is a free JavaScript charting library for image- and vector-tiled maps using Scalable Vector Graphics (SVG). It provides dynamic, interactive maps in web browsers; complex data sets can be visualized using Polymaps, and it offers multi-zoom functionality. Its characteristics are that it uses SVG, basic CSS rules, and imagery in the spherical Mercator tile format. Figure 7 shows the layout of Polymaps [22].
Figure 7. Polymaps
5.4. Timeline
Timeline is a distinctive tool which delivers an effective, interactive timeline that responds to the user's mouse and conveys a lot of information in a compressed space. Each element can be clicked to reveal more in-depth information, giving a big-picture view with full detail. Timeline is demonstrated in Figure 8 [22].
Figure 8. Timeline
5.5. Exhibit
Exhibit is an open-source data visualization tool developed by MIT. Exhibit makes it easy to create interactive maps and other data-based visualizations oriented towards teaching or static/historical data sets, such as the birthplaces of notable persons. A sample model is shown in Figure 9 [22].
Figure 9. Sample Model of Exhibit
5.6. Modest Maps
Modest Maps is a lightweight, simple mapping data visualization tool for web designers. It offers high performance, compatibility with new technology, and well-designed code that has been tested and deployed widely. Modest Maps is shown in Figure 10 [22].
Figure 10. Modest Maps
5.7. Leaflet
Leaflet is an open source JavaScript tool developed for interactive data visualization in HTML5/CSS3. The Leaflet tool is designed for clarity, performance and mobile use. Its visualization features include zooming and panning animation (such as multi-touch and double-tap zoom), hardware acceleration on iOS, and use of CSS3 features. Figure 11 shows the Leaflet structure [22].
Figure 11. Leaflet Structure
5.8. Visual.ly
Visual.ly is a combined gallery and infographic generation tool. It provides a simple toolset for building data representations, as well as a platform to share creations, going beyond pure data visualization. A representation of Visual.ly is displayed in Figure 12 [22].
Figure 12. Visual.ly
5.9. Visualize Free
Visualize Free is a free visual analysis tool that lets users take publicly available datasets, or upload their own, and build interactive visualizations to explore the data. Visualize Free works with Flash and HTML5 and is a good tool for sifting through multidimensional data. Figure 13 shows a representation of Visualize Free [22].
Figure 13. Representation of Visualize Free
5.10. jQuery Visualize
The jQuery Visualize plugin is an open source charting plugin for creating accessible charts and graphs. It uses the HTML5 canvas element and generates bar, line, area and pie charts for visualization. Figure 14 demonstrates jQuery Visualize [22].
Figure 14. Demonstration of jQuery Visualize
5.11. JqPlot
JqPlot is a good tool for line and point charts. Its features include different style options, customized formatting, automatic line computation, and highlighting of tooltips and data points. It has the ability to generate automatic trend lines and interactive points according to the dataset. Figure 15 represents JqPlot [22].
Figure 15. JqPlot
5.12. Many Eyes
Many Eyes was developed by IBM. It is a web-based tool used for structured and unstructured data analysis; data sets are text files uploaded from spreadsheets, and the tool allows users to build visualizations quickly from available or uploaded data sets. A demonstration of Many Eyes is shown in Figure 16 [22].
Figure 16. Many Eyes
5.13. JavaScript InfoVis Toolkit
The JavaScript InfoVis Toolkit is an open source toolkit with a modular structure, allowing users to download only the code that is absolutely necessary and to display the chosen data as visualizations. The toolkit has a number of different styles and classy animation effects. The JavaScript InfoVis Toolkit is demonstrated in Figure 17 [22].
Figure 17. JavaScript InfoVis Toolkit
5.14. JpGraph
JpGraph is an object-oriented graph-creating library for PHP-based data visualization. It generates drill-down graphs and a large range of charts such as pie, bar, line, scatter, point and impulse. Some features of JpGraph: it is web friendly, automatically generates client-side image maps, and supports alpha blending, flexible scales, integer, linear and logarithmic scales, and multiple Y-axes. Figure 18 shows the JpGraph representation [22].
Figure 18. Representation of JpGraph
5.15. Highcharts
Highcharts is an open source JavaScript charting library with a massive range of chart options. Output is rendered using SVG and VML. The charts are animated and displayed automatically, and the framework supports live data streams. Figure 19 displays Highcharts [22].
Figure 19. Representation of Highcharts
5.16. R
R is an effective free software tool for statistical computing and graphics, integrating data handling, calculation and graphical display. R is similar to the S language and handles data and storage effectively. R has its own documentation system in LaTeX format. Figure 20 shows the layout of R [22].
Figure 20. Layout of R
5.17. WEKA
WEKA is open source software comprising a collection of machine-learning algorithms designed for data mining. WEKA is an excellent tool for classifying and clustering data using many attributes; it explores data and generates simple plots in a powerful way. Figure 21 shows a representation of WEKA [22].
Figure 21. Representation of WEKA
5.18. Flot
Flot is a plotting library designed specially for jQuery; it works with all common web browsers and has many handy features. Data is animated, and all aspects of animation, presentation and user interaction are fully controllable, so interactive charts can be created with Flot. Figure 22 shows a demonstration of Flot [22].
Figure 22. Demonstration of Flot
5.19. RAPHAEL
RAPHAEL is a tool which provides a wide range of data visualization options rendered using SVG; it works with vector graphics on the web. The RAPHAEL tool can easily be integrated into one's own web site and code. The supported web browsers are Internet Explorer 6.0+, Firefox 3.0+, Safari 3.0+, Chrome 5.0+ and Opera 9.5+. A model of RAPHAEL is shown in Figure 23 [22].
Figure 23. Model of RAPHAEL
5.20. Crossfilter
Crossfilter is an interactive GUI tool for massive volumes of data which can reduce the input range on any one chart. It is a powerful tool for dashboards or other interactive applications with large volumes of data: it displays the data and, at the same time, as the user restricts the range of the data, updates the other linked charts. A representation of Crossfilter is shown in Figure 24 [22].
Figure 24. Representation of Crossfilter
Table 2 summarizes the characteristics, advantages and disadvantages of the data visualization tools.
Table 2: Comparison of data visualization tools
1. Crossfilter. Characteristics: exploring large multivariate datasets. Advantages: extremely fast. Disadvantages: restricts the range of the data and displays the other linked charts.
2. Dygraphs. Characteristics: handles huge data sets; zoomable charts. Advantages: interactions are easily discoverable. Disadvantages: supports only limited web browsers.
3. Exhibit. Characteristics: easily creates web pages with advanced text search and filtering functionality, with interactive maps, timelines and other visualizations. Advantages: can easily sort data and present it any way the user likes. Disadvantages: newcomers unused to coding visualizations need time to get familiar with the coding and library syntax.
4. Flot. Characteristics: plots categories and textual data. Advantages: supports lines, points and filled areas in any combination. Disadvantages: not applicable for large datasets.
5. Highcharts. Characteristics: a very powerful tool. Advantages: supports backward compatibility with IE8. Disadvantages: output is performed using only SVG and VML.
6. JavaScript InfoVis Toolkit. Characteristics: uses highly polished graphics for data analysis. Advantages: unique chart types; ability to interact with animated charts and graphs. Disadvantages: might not be a good fit for users in an organization who analyze data but do not know how to program.
7. JpGraph. Characteristics: supports various plot types such as filled line, line error, bar, box and stock plots. Advantages: supports alpha blending and advanced Gantt charts. Disadvantages: difficult for new users; it takes time to get familiar with the coding.
8. JqPlot. Characteristics: supports line, pie and bar charts. Advantages: an extension of jQuery which meets all data visualization needs. Disadvantages: not suitable for rapid visualization.
9. jQuery Visualize. Characteristics: focus on ARIA support; friendly to screen readers. Advantages: developers can completely separate JavaScript code from HTML. Disadvantages: uses only HTML5 for designing.
10. Leaflet. Characteristics: eliminates tap delay on mobile devices. Advantages: works on all major desktop and mobile browsers. Disadvantages: difficult for new users.
11. Many Eyes. Characteristics: multiple ways to display data. Advantages: uploads data sets for public use. Disadvantages: difficult to use with large datasets.
12. Modest Maps. Characteristics: used with several extensions, such as MapBox.js, HTMAPL and Easey. Advantages: designed to provide basic controls for building mapping tools. Disadvantages: supports only limited applications.
13. Polymaps. Characteristics: displays complex data sets. Advantages: uses Scalable Vector Graphics. Disadvantages: ideal only for zooming in and out of form levels.
14. R. Characteristics: a general statistical analysis platform. Advantages: also produces graphs, charts and plots. Disadvantages: difficult to use with large datasets.
15. RAPHAEL. Characteristics: multi-chart capabilities. Advantages: creates a variety of charts, graphs and other data visualizations. Disadvantages: not easy to customize.
16. Timeline. Characteristics: displays events as sequential timelines. Advantages: embeds audio and video in timelines from third-party apps. Disadvantages: builds timelines using only Google Spreadsheet data.
17. Visual.ly. Characteristics: infographic generation tool. Advantages: specially designed to provide a simple toolset for representation. Disadvantages: difficult for new users.
18. Visualize Free. Characteristics: uploads data in Excel or CSV formats. Advantages: drag-and-drop components to build visualizations; uses sandboxes for data analysis. Disadvantages: not applicable for large datasets.
19. WEKA. Characteristics: a collection of tools for data pre-processing, classification, regression, clustering, association and visualization. Advantages: free availability; portability; comprehensive collection of data preprocessing and modeling techniques. Disadvantages: sequence modeling is not covered by the algorithms included in the WEKA distribution; not capable of multi-relational data mining; memory bound.
20. ZingChart. Characteristics: supports larger datasets ranging from 10k to 5000k+ records; high performance. Advantages: offers more than 100 chart types to fit the data. Disadvantages: difficult to customize.
6. VISUALIZATION ALGORITHMS
Visualization renders digital data in visual form. There are various data sources that produce digital data, captured by equipment such as antennas. Visualizing digital signals raises issues; to overcome them, algorithms are applied to the raw data and the digital 3D data produced by various digital instruments [19]. Data should be represented in discrete form. Data objects are classified into two categories, organizing structure and data attributes. The organizing structure determines the spatial location of the data and describes the topology and geometry on which the data is illustrated; it is specified in terms of cells and points. A cell is defined as an ordered sequence of points; the cell types (vertex, polyvertex, triangle, line, polyline, pixel, voxel and tetrahedron) characterize the sequence of points, and the number of points specifies the size of the cell. Data attributes determine the format of the data [20] and are commonly described as scalars, vectors, texture coordinates, etc. A visualization algorithm applies two sets of transformations, as explained in Figure 25. The first set converts the data, or a subset of it, into a virtual scene; these transformations are applied to the structure and data type of the data. The second set is applied once the virtual scene, consisting of geometrical objects, textures and computer graphics, has been created; these transformations form the final images [21].
Figure 25. Visualization Algorithm
The main objectives of visualization are understanding the recorded data clearly, graphical representation, placing search queries to find the location of data, discovering hidden patterns, and perceptibility of data items [22]. The characteristics of transformation algorithms are explained in Figure 26: transformation algorithms are characterized by structure and type. Structure transformations have two formats, topological and geometric. Topological transformations represent changes in topology, for example the conversion of polygon data into unstructured format, while geometric transformations represent changes in coordinates, such as scaling, rotation and translation [22]. A few examples of transformations are scalar algorithms, vector algorithms and modeling algorithms.
Figure 26. Characteristics of Transformation Algorithms
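As a small illustration of a scalar transformation, the sketch below maps a discrete scalar field defined on a regular grid of points onto image colors; the field itself is synthetic, invented for the example.

    import numpy as np
    import matplotlib.pyplot as plt

    # Organizing structure: a regular grid of points (the cells are the
    # implied squares between neighbouring points).
    xs, ys = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))

    # Data attribute: a scalar value at every grid point.
    scalar = np.exp(-(xs**2 + ys**2))

    # Transformation: map scalar values to colors to form the image.
    plt.imshow(scalar, cmap="viridis", origin="lower")
    plt.colorbar(label="scalar value")
    plt.show()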
Big data visualization has to overcome five massive challenges of big data:
1. Increasing data acceleration
2. Understanding the data
3. Delivering data quality
4. Displaying significant results
5. Handling outliers
Challenges of big data
• Issues related to storage and data processing
• Data acquisition and information sharing
• Human perception and restricted screen space
• Acquiring useful information
• Storing massive amounts of data
• Combining data to extract meaningful information
• Querying, data modeling and analysis
• Elaboration of data
7. CONCLUSION AND RESEARCH TRENDS
This paper surveyed big data tools and techniques and the issues related to big data. It also provided information about how to perform big data visualization. Research trends in big data and the main operations on big data, such as storage, search and retrieval, analytics and computation, were discussed. Storage requires managing capacity, finding the best collection and retrieval methods, and synchronizing the IT and business teams; it also involves complex security and privacy issues. Big data analytics focuses on the tools, algorithms and architectures that perform proper analysis and transfer of large, massive volumes of data. Computing deals with processing, transforming, handling and storing information. In summary, this paper has reviewed the basic concepts of big data, its applications and its research issues.
REFERENCES
1. Neelam Singh, Neha Garg, Varsha Mittal, "Data – Insights, Motivation and Challenges", Volume 4, Issue 12, December 2013, ISSN 2229-5518.
2. Karthik Kambatla, Giorgos Kollias, Vipin Kumar, Ananth Grama, "Trends in Big Data Analytics", 74 (2014) 2561–2573.
3. Francis X. Diebold, "On the Origin(s) and Development of the Term 'Big Data'", 2012.
4. Venkata Narasimha Inukollu, Sailaja Arsi, Srinivasa Rao Ravuri, "Security Issues Associated with Big Data in Cloud Computing", Vol. 6, No. 3, May 2014.
5. Matzat, Ulf-Dietrich Reips, "Big Data", 2012, 7(1), 1–5, ISSN 1662-5544.
6. "Mining Big Data: Current Status, and Forecast to the Future".
7. Anil K. Jain, "Clustering Big Data", 2012.
8. Daniel Keim, "Big-Data Visualization".
9. Hsinchun Chen, "Business Intelligence and Analytics: From Big Data to Big Impact".
10. Ibrahim Abaker Targio Hashem, Ibrar Yaqoob, Nor Badrul Anuar, Salimah Mokhtar, Abdullah Gani, Samee Ullah Khan, "The Rise of 'Big Data' on Cloud Computing: Review and Open Research Issues", 2014.
11. Edd Dumbill, "Making Sense of Big Data".
12. Silva Robak, "Research Problems Associated with Big Data Utilization in Logistics and Supply Chains Design and Management", 2014, DOI: 10.15439/2014F472.
13. C. L. Philip Chen, Chun-Yang Zhang, "Data-Intensive Applications, Challenges, Techniques and Technologies: A Survey on Big Data", 275 (2014) 314–347.
14. Chaitanya Baru, Milind Bhandarkar, Raghunath Nambiar, Meikel Poess, Tilmann Rabl, "Survey of Recent Research Progress and Issues in Big Data", 2013.
15. "Tackling the Challenges of Big Data", 2014.
16. Stephen Kaisler, Alberto Espinosa, "Big Data: Issues and Challenges Moving Forward", 46th Hawaii International Conference on System Sciences, 2013.
17. "Challenges and Opportunities with Big Data", a community white paper developed by leading researchers across the United States.
18. Danyang Du, Aihua Li, "Survey on the Applications of Big Data in Chinese Real Estate Enterprises", 1st International Conference on Data Science, 2014.
19. Shilpa, Manjit Kaur, "Challenges and Issues During Visualization of Big Data", International Journal for Technological Research in Engineering, Volume 1, Issue 4, December 2013, ISSN (Online): 2347-4718.
20. http://fellinlovewithdata.com/research/the-role-of-algorithms-in-data-visualization
21. http://prosjekt.ffi.no/unik-4660/lectures04/chapters/Algorithms2.html
22. http://www.creativebloq.com/design-tools/data-visualization-712402