Big data is the term for any collection of data sets so large and complex that they become difficult to process using traditional data-processing applications. The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations. To spot business trends, predict diseases, fight crime, and so on, we need larger data sets than the smaller ones traditional tools handle. Big data is hard to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers. This paper presents an overview of the Hadoop architecture, the different tools used for big data, and its security issues.
A Review Paper on Big Data and Hadoop for Data Science (ijtsrd)
Big data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool; rather, it has become a complete subject, which involves various tools, techniques, and frameworks. Hadoop is an open-source framework that allows users to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Mr. Ketan Bagade | Mrs. Anjali Gharat | Mrs. Helina Tandel "A Review Paper on Big Data and Hadoop for Data Science" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-1, December 2019, URL: https://www.ijtsrd.com/papers/ijtsrd29816.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/29816/a-review-paper-on-big-data-and-hadoop-for-data-science/mr-ketan-bagade
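The "simple programming model" the abstract refers to is MapReduce: a map phase emits key-value pairs, the framework shuffles them by key, and a reduce phase aggregates each group. As a rough, in-memory sketch (not Hadoop's actual Java API), the classic word-count example looks like this:

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(mapped_pairs):
    """Group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: aggregate all counts emitted for one word."""
    return (key, sum(values))

documents = ["big data needs hadoop", "hadoop processes big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'needs': 1, 'hadoop': 2, 'processes': 1}
```

In Hadoop the map and reduce functions run on different cluster nodes against local blocks of the input; only the per-phase logic is what the programmer supplies.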
I have collected information for beginners to provide an overview of big data and Hadoop, which will help them understand the basics and give them a start.
In this paper, we discuss Big Data. We analyze and reveal the benefits of Big Data, examine its challenges and how Hadoop provides solutions to them, compare relational databases with Hadoop, and explain the motivation for Big Data and Hadoop.
General Terms
Data Explosion, Big Data, Big Data Analytics, Hadoop, Hadoop
Distributed File System, MapReduce
Moving Toward Big Data: Challenges, Trends and Perspectives (IJRESJOURNAL)
Abstract: Big data refers to the organizational data asset that exceeds the volume, velocity, and variety of data typically stored using traditional structured database technologies. This type of data has become an important resource from which organizations can gain valuable insight and make business decisions by applying predictive analysis. This paper provides a comprehensive view of the current status of big data development, starting from the definition and a description of Hadoop and MapReduce, the framework that standardizes the use of clusters of commodity machines to analyze big data. Organizations ready to embrace big data technology must anticipate significant adjustments to infrastructure and to the roles played by IT professionals and BI practitioners, which are discussed in the section on big data challenges. The landscape of big data development changes rapidly, which is directly related to big data trends; clearly, a major part of the trend results from attempts to deal with the challenges discussed earlier. Lastly, the paper covers the most recent job prospects related to big data, including descriptions of several job titles that make up the big data workforce.
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A... (IJSRD)
The size of data is increasing day by day with the use of social sites. Big Data is a concept for managing and mining large sets of data. Today the concept of Big Data is widely used to mine an organization's internal data as well as outside data. Many techniques and technologies are used in Big Data mining to extract useful information from distributed systems, and they are more powerful than traditional data mining techniques. One of the best-known technologies used in Big Data mining is Hadoop. It has many advantages over traditional data mining techniques, but it also has issues such as visualization and privacy.
DOCUMENT SELECTION USING MAPREDUCE, by Yenumula B Reddy and Desmond Hill (ClaraZara1)
Big data refers to large volumes of structured, unstructured, and semi-structured data that are difficult to manage and costly to store. Using exploratory analysis techniques to understand such raw data, while carefully balancing the benefits of storage and retrieval techniques, is an essential part of Big Data. The research discusses MapReduce issues, the framework for the MapReduce programming model, and its implementation. The paper includes the analysis of Big Data using MapReduce techniques and the identification of a required document from a stream of documents. Identifying a required document is part of security in a stream of documents in the cyber world. The document may be significant in business, medical, social, or terrorism contexts.
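The document-selection task described above fits the MapReduce shape naturally: a map step scores each document in the stream against a set of search terms, and a reduce step keeps only documents whose score clears a threshold. The sketch below is illustrative (the document names, terms, and threshold are assumptions, not taken from the paper):

```python
def map_score(doc_id, text, terms):
    """Map step: emit (doc_id, number of search terms the document contains)."""
    words = set(text.lower().split())
    return (doc_id, sum(1 for t in terms if t in words))

def reduce_select(scored, threshold):
    """Reduce step: keep documents matching at least `threshold` terms."""
    return [doc_id for doc_id, score in scored if score >= threshold]

stream = {
    "doc1": "quarterly business report on market trends",
    "doc2": "notes on medical trial outcomes",
    "doc3": "business plan for medical device market",
}
terms = {"business", "medical", "market"}
scored = [map_score(d, text, terms) for d, text in stream.items()]
print(reduce_select(scored, threshold=2))  # ['doc1', 'doc3']
```

On a real cluster the map step would run in parallel over document splits, so the stream can be arbitrarily large; only the small list of (doc_id, score) pairs reaches the reducer.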
Big Data refers to bulk amounts of data, while Hadoop is a framework to process this data. There are various technologies and fields under Big Data, and it finds applications in areas such as healthcare, the military, and many others.
http://www.techsparks.co.in/thesis-topics-in-big-data-and-hadoop/
Big Data and Big Data Management (BDM) with current Technologies – Review (IJERA Editor)
The emerging phenomenon called "Big Data" is driving numerous changes in businesses and many other organizations, domains, fields, and areas. Many of them are struggling just to manage the massive data sets. Big data management is about two things, "big data" and "data management", and these terms work together to achieve business and technology goals. In the past few years data generation has increased tremendously due to the digitization of data, and new computer tools and technologies for transmitting data among computers over the Internet keep growing day by day. Its relevance and importance in terms of applicability, usefulness for decision making, and performance improvement have emerged very fast across all areas in today's era. Big data management also faces numerous challenges; common complexities include low organizational maturity relative to big data, weak business support, and the need to learn new technology approaches. This paper discusses the impacts of Big Data and the issues related to data management using current technologies.
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA (ijp2p)
The objective of the proposed system is to integrate a high volume of data with important considerations such as monitoring a wide array of heterogeneous security sources. When a real-time cyber attack occurs, the Intrusion Detection System automatically stores the log in a distributed environment and checks it against an existing intrusion dictionary. At the same time, the system categorizes the severity of the log as high, medium, or low. After categorization, the system automatically takes the necessary action against the user-unit according to the severity of the log. The advantage of the system is that it employs anomaly detection, evaluates the data, and issues alert messages or reports based on abnormal behaviour.
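The dictionary-matching and severity-categorization step described above can be sketched as follows. The signature patterns, severity labels, and response actions here are illustrative assumptions, not the paper's actual intrusion dictionary:

```python
# Hypothetical intrusion dictionary mapping known attack patterns to severity.
INTRUSION_DICT = {
    "root shell spawned": "high",
    "repeated failed login": "medium",
    "port scan detected": "low",
}

# Hypothetical response per severity level, as the abstract describes.
ACTIONS = {"high": "block user-unit", "medium": "alert admin", "low": "log only"}

def categorize(log_entry):
    """Match a log entry against the dictionary and return (severity, action).

    Entries matching no known pattern default to low severity.
    """
    for pattern, severity in INTRUSION_DICT.items():
        if pattern in log_entry.lower():
            return severity, ACTIONS[severity]
    return "low", ACTIONS["low"]

print(categorize("ALERT: repeated failed login from 10.0.0.5"))
# ('medium', 'alert admin')
```

In the proposed system this classification would run over logs stored in the distributed environment, so the dictionary lookup happens close to where each log block resides.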
A Comprehensive Study on Big Data Applications and Challenges (ijcisjournal)
Big Data has gained much interest from academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that quickly exceeds storage boundaries. As information is transferred and shared at light speed over optical fiber and wireless networks, the volume of data and the speed of market growth increase. Conversely, the fast growth of such large data generates copious challenges, such as the rapid growth of data, transfer speed, data diversity, and security. Even so, Big Data is still in its early stage, and the domain has not been reviewed in general. Hence, this study comprehensively surveys and classifies an assortment of attributes of Big Data, including its nature, definitions, rapid growth rate, volume, management, analysis, and security. This study also proposes a data life cycle that uses the technologies and terminology of Big Data. MapReduce is a programming model for efficient distributed computing that works well with semi-structured and unstructured data; it is a simple model, but good for many applications such as log processing and Web index building.
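The log-processing use case mentioned above maps cleanly onto the model: map extracts a status code from each log line, and reduce counts occurrences per code. A minimal sketch, using a simplified, assumed log format (method, path, status code):

```python
from collections import Counter

def map_status(line):
    """Map step: extract the HTTP status code (third field) from one log line."""
    return line.split()[2]

def reduce_count(codes):
    """Reduce step: count occurrences of each status code."""
    return dict(Counter(codes))

log = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
]
print(reduce_count(map_status(line) for line in log))  # {'200': 2, '404': 1}
```

Web index building follows the same pattern with the roles swapped: map emits (word, document_id) pairs and reduce collects, per word, the list of documents containing it.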
Hadoop was born out of the need to process Big Data. Today data is being generated like never before, and it is becoming difficult to store and process this enormous volume and large variety of data; Big Data technology was developed to cope with this. Today the Hadoop software stack is the go-to framework for large-scale, data-intensive storage and compute solutions for Big Data analytics applications. The beauty of Hadoop is that it is designed to process large volumes of data on clustered commodity computers working in parallel. Distributing data that is too large for a single machine across the nodes of a cluster solves the problem of data sets being too large to process on one machine.
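The split-and-parallelize idea described above can be shown on a single machine: partition a data set into chunks and hand each chunk to a separate worker, the way Hadoop distributes input splits across cluster nodes. In this hedged sketch each "node" is just a thread that sums its local split; the chunk count and workload are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Work done by one simulated node on its local split of the data."""
    return sum(chunk)

def split(data, n_chunks):
    """Partition the data into roughly equal splits."""
    size = (len(data) + n_chunks - 1) // n_chunks
    return [data[i:i + size] for i in range(0, len(data), size)]

data = list(range(1, 101))
with ThreadPoolExecutor(max_workers=4) as pool:  # 4 simulated nodes
    partials = list(pool.map(process_chunk, split(data, 4)))
print(sum(partials))  # 5050
```

The per-chunk results are tiny compared with the input, which is why combining partial results at the end is cheap; Hadoop exploits exactly this asymmetry, moving computation to the data rather than data to the computation.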
Big data is data that, by virtue of its velocity, volume, or variety (the three Vs), cannot be easily stored or analyzed with traditional methods. Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware.
Music Recommendation System with User-based and Item-based Collaborative Filt...ijeei-iaes
Internet and E-commerce are the generators of abundant of data, causing information Overloading. The problem of information overloading is addressed by Recommendation Systems (RS). RS can provide suggestions about a new product, movie or music etc. This paper is about Music Recommendation System, which will recommend songs to users based on their past history i.e. taste. In this paper we proposed a collaborative filtering technique based on users and items. First user-item rating matrix is used to form user clusters and item clusters. Next these clusters are used to find the most similar user cluster or most similar item cluster to a target user. Finally songs are recommended from the most similar user and item clusters. The proposed algorithm is implemented on the benchmark dataset Last.fm. Results show that the performance of proposed method is better than the most popular baseline method.
A Real-Time Implementation of Moving Object Action Recognition System Based o...ijeei-iaes
This paper proposes a PixelStreams-based FPGA implementation of a real-time system that can detect and recognize human activity using Handel-C. In the first part of our work, we propose a GUI programmed using Visual C++ to facilitate the implementation for novice users. Using this GUI, the user can program/erase the FPGA or change the parameters of different algorithms and filters. The second part of this work details the hardware implementation of a real-time video surveillance system on an FPGA, including all the stages, i.e., capture, processing, and display, using DK IDE. The targeted circuit is an XC2V1000 FPGA embedded on Agility’s RC200E board. The PixelStreams-based implementation was successfully realized and validated for real-time motion detection and recognition.
Wireless Sensor Network for Radiation Detectionijeei-iaes
n this paper a wireless sensor network (WSN) is designed from a group of radiation detector stations with different types of sensors. These stations are located in different areas and each sensor transmits its data through GSM network to the main monitoring and control station. The design includes GPS module to determine the location of mobile and fixed station. The data is transmitted with GSM/GPRS modem. Instead of using traditional SMS data string or word messages a digital data frame is constructed and transmitted as SMS data. In the main monitoring station graphical user interface (GUI) software is designed to shows information and statues of the all stations in the network. It reports any radiation leaks, in addition to the data; the GUI contains a geographical map to display the location of the leakage station and can control the stations power consumption by sending a special command to it.
Vaccine management system project report documentation..pdfKamal Acharya
The Division of Vaccine and Immunization is facing increasing difficulty monitoring vaccines and other commodities distribution once they have been distributed from the national stores. With the introduction of new vaccines, more challenges have been anticipated with this additions posing serious threat to the already over strained vaccine supply chain system in Kenya.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
Democratizing Fuzzing at Scale by Abhishek Aryaabh.arya
Presented at NUS: Fuzzing and Software Security Summer School 2024
This keynote talks about the democratization of fuzzing at scale, highlighting the collaboration between open source communities, academia, and industry to advance the field of fuzzing. It delves into the history of fuzzing, the development of scalable fuzzing platforms, and the empowerment of community-driven research. The talk will further discuss recent advancements leveraging AI/ML and offer insights into the future evolution of the fuzzing landscape.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
Vol. 4, No. 1, March 2016, pp. 74-80
ISSN: 2089-3272, DOI: 10.11591/ijeei.v4i1.195
Received October 9, 2015; Revised January 6, 2016; Accepted January 20, 2016
Big Data-Survey
PSG Aruna Sri¹*, Anusha M²
¹Department of Electronics and Computer Engineering, K L University
²Research Scholar, Dept. of CSE, K L University
Greenfields, Vaddeswaram, Guntur District, Andhra Pradesh 522502
*Corresponding author, e-mail: arunasri_2012@kluniversity.in¹, anushaaa9@kluniversity.in²
Abstract
Big data is the term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data-processing applications. The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations. To spot business trends, prevent diseases, combat crime, and so on, we require larger data sets than before. Big data is hard to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers. This paper surveys the Hadoop architecture, the different tools used for big data, and its security issues.
Keywords: Big data, Hadoop, Software tools, Map Reduce
1. Introduction
We create 2.5 quintillion bytes of data every day [1], so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, and cell phone GPS signals. This enormous amount of data is known as "big data". Big data is a catchphrase used to describe a massive volume of both structured and unstructured data that is so large that it is difficult to process using conventional database and software techniques. In most enterprise scenarios the data is too big, it changes too quickly, or it exceeds existing processing capacity.
Big data has the potential to help organizations improve operations and make faster, more intelligent decisions. Nowadays, big data is a common term in the IT industry. Big data is an evolving term that describes any large amount of structured and semi-structured data that can potentially be mined for information. Although big data does not refer to any particular quantity, the term is often used when talking about petabytes and exabytes of data. Big data is a comprehensive term for collections of data sets so large and complex that they become difficult to work with using conventional data-processing applications. When dealing with larger data sets, organizations face challenges in creating and managing big data, since there are no standard tools and procedures for searching and analyzing large data sets in business analytics. A case of big data may be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data comprising billions to trillions of records about millions of individuals, all from different sources, for example contracts, the web, and mobile data. The data is typically loosely structured, often incomplete, and inaccessible. The problems faced by big data are analyzing, capturing, searching, sharing, storing, transferring, visualization, and privacy abuse. Larger data sets are needed in order to prevent diseases, combat crime, spot business trends, and so on. Because of the large data sets in their fields, researchers frequently run into these limitations in areas such as meteorology, genomics, connectomics, complex physics simulations, biological and ecological research, and finance and trade. Data sets have grown in size because data is now collected from sensing mobile devices, aerial sensory technology, software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks.
In March 2012 [2], the Obama administration announced the Big Data Research and Development Initiative. Leading IT companies, such as SAG, Oracle, IBM, Microsoft, SAP and HP, have spent more than $15 billion on buying data management and analytics software. Big data was defined as far back as 2001, when industry analyst Doug Laney (now with Gartner) gave the standard definition of big data as the 5 Vs of big data: volume, velocity, variety, veracity and value, as shown in Figure 1.
Figure 1. The five (5) V's of big data
Volume: Today's large-scale systems are flooded with constantly growing data, easily reaching terabytes or even petabytes.
Velocity: Data flows in at unprecedented speed and must be dealt with in a timely manner. For many organizations it is difficult to react to RFID tag, sensor, and smart-metering data in real time.
Variety: Structured and unstructured sources produce many data types; analyzing this data collectively makes it feasible to discover novel approaches, and predictions may be attained as data flows into the organization.
Veracity: Identifying and verifying inconsistent data is essential to trustworthy analysis. Establishing trust in big data is a major challenge as ever more varieties of data become available.
Value: Value must be derived from big data. There is no point in building the capacity to store and manage data if no value can be extracted from it.
One of the notable responses to the challenge of handling big data is Hadoop.
2. Hadoop
Hadoop was created by Doug Cutting and Mike Cafarella in 2005. Doug Cutting, who was working at Yahoo! at the time, named it after his child's toy elephant.
The Hadoop system is shown in Figure 2 and is [3]:
Reliable: the software can handle both hardware and software failures.
Scalable: designed for huge numbers of processors, large memory, and locally attached storage.
Distributed: Map Reduce provides a parallel programming model and the concept of replication.
Figure 2. Hadoop system [1]
Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers. It uses the Map-Reduce framework introduced by Google, building on the idea of the map and reduce functions commonly used in functional programming. Although the Hadoop framework is written in Java, it allows developers to deploy custom programs written in Java or any other language to process data in parallel across hundreds or thousands of commodity servers. It is optimized for contiguous read requests (streaming reads), where processing consists of scanning all the data. Depending on the complexity of the process and the volume of data, response time can vary from minutes to hours. While Hadoop can process data quickly, its key advantage is its massive scalability. Hadoop is currently being used for web-search indexing, email spam detection, recommendation engines, prediction in financial services, genome analysis in the life sciences, and analysis of unstructured data such as logs, text, and clickstreams. While many of these applications could indeed be implemented in a relational database (RDBMS), Figure 2, the primary focus of the Hadoop system is fundamentally different from that of an RDBMS. The following discusses some of these differences.
Hadoop is especially helpful when:
Complex data must be processed.
Unstructured data must be converted into structured data.
Queries cannot be reasonably expressed using SQL.
Heavily recursive computations are involved.
Complex geo-spatial analysis or genome sequencing is needed.
Machine learning is applied.
Data sets are too large to fit into database RAM or disks, or require too many cores (tens of TB up to PB).
The value of the data does not justify the cost of constant real-time availability; for example, archives or special-interest data can be moved to Hadoop and remain available at lower cost.
Results are not required in real time.
Fault tolerance is critical.
Significant custom coding would otherwise be required to handle job scheduling.
Hadoop was inspired by Google's Map Reduce, a software framework in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node in the cluster. The current Apache Hadoop ecosystem consists of the Hadoop kernel, Map Reduce, the Hadoop Distributed File System (HDFS) and a number of related projects, for example Apache Hive, HBase and ZooKeeper. The Hadoop framework is used by major players including Google, Yahoo and IBM, largely for applications involving web search engines and advertising. The preferred operating systems are Windows and Linux, but Hadoop can also work with BSD and OS X.
A distributed file system is a client/server-based application that permits clients to access and process data stored on the server as if it were on their own PC. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the client's PC while the data is being processed and is then returned to the server. Ideally, a distributed file system organizes the file and directory services of individual servers into a global directory in such a way that remote data access is not location-specific but is identical from any client. All files are accessible to all users of the global file system, and organization is hierarchical and directory-based. Since more than one client may access the same data simultaneously, the server must have a mechanism in place (for example, maintaining information about the times of access) to coordinate updates so that the client always receives the most current version of the data and data conflicts do not arise. Distributed file systems ordinarily use file or database replication (distributing copies of data across multiple servers) to protect against data-access failures [4]. Sun Microsystems' Network File System (NFS), Novell NetWare, Microsoft's Distributed File System, and IBM/Transarc's DFS are a few examples of distributed file systems.
3. HDFS
The Hadoop framework includes the Hadoop Distributed File System (HDFS) [4]. HDFS is designed and optimized to store data across large amounts of low-cost hardware in a distributed manner, as shown in Figure 3.
Figure 3. HDFS architecture
The structure of HDFS is master/slave. An HDFS cluster consists of a single Name Node, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of Data Nodes, usually one per node in the cluster, which manage the storage attached to the nodes they run on. HDFS exposes a file system namespace and permits user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored in a set of Data Nodes. The Name Node executes file system namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to Data Nodes. The Data Nodes are responsible for serving read and write requests from the file system's clients. The Data Nodes also perform block creation, deletion, and replication upon instruction from the Name Node.
The Name Node and Data Node are pieces of software designed to run on commodity machines. These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the Name Node or the Data Node software. The use of the highly portable Java language means that HDFS can be deployed on a wide variety of machines. A typical deployment has a dedicated machine that runs only the Name Node software. Each of the other machines in the cluster runs one instance of the Data Node software. The architecture does not preclude running multiple Data Nodes on the same machine, but in a real deployment that is rarely the case. The existence of a single Name Node in a cluster greatly simplifies the architecture of the system. The Name Node is the arbitrator and repository for all HDFS metadata. The system is designed in such a way that user data never flows through the Name Node.
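The division of labor described above, a Name Node that holds only metadata and Data Nodes that hold the blocks, can be sketched as follows. This is a minimal illustrative model in Python, not Hadoop code; the class names, the 128 MB block size, and the replication factor of 3 are assumptions chosen to mirror common HDFS defaults.

```python
import itertools

BLOCK_SIZE = 128 * 1024 * 1024   # assumed block size in bytes
REPLICATION = 3                  # assumed replication factor

class NameNode:
    """Holds only metadata: which blocks make up a file,
    and which Data Nodes store each block's replicas."""
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.block_map = {}           # filename -> [(block_id, [datanode, ...]), ...]
        self._ids = itertools.count()

    def create_file(self, filename, size):
        # Split the file into fixed-size blocks (ceiling division).
        n_blocks = max(1, -(-size // BLOCK_SIZE))
        blocks = []
        for _ in range(n_blocks):
            block_id = next(self._ids)
            # Place each replica on a different Data Node (simple round robin).
            replicas = [self.datanodes[(block_id + r) % len(self.datanodes)]
                        for r in range(REPLICATION)]
            blocks.append((block_id, replicas))
        self.block_map[filename] = blocks
        return blocks

# A 300 MB file on a five-node cluster needs three blocks,
# each replicated on three distinct Data Nodes.
nn = NameNode(datanodes=["dn1", "dn2", "dn3", "dn4", "dn5"])
layout = nn.create_file("weblog.txt", 300 * 1024 * 1024)
for block_id, replicas in layout:
    print(block_id, replicas)
```

Note that the sketch only records where blocks live; in HDFS the block contents themselves would be written to and read from the Data Nodes directly, never through the Name Node.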
The base Apache Hadoop framework is made up of the following modules: Hadoop Common, which contains the libraries and utilities required by the other Hadoop modules; the Hadoop Distributed File System (HDFS), a distributed file system that stores data on commodity machines, providing high aggregate bandwidth across the cluster; and Hadoop Map Reduce, a programming model for large-scale data processing. All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and should therefore be handled automatically in software by the framework. Apache Hadoop's Map Reduce and HDFS components were originally derived from Google's Map Reduce and Google File System (GFS) papers. "Hadoop" often refers not just to the base Hadoop package but to the Hadoop ecosystem, Figure 4, which includes most of the additional software packages that can be installed on top of or alongside Hadoop, for example Apache Hive, Apache Pig and Apache HBase.
4. Map Reduce Framework
Map Reduce, as shown in Figure 4, is a software framework for the distributed processing of big data sets on computer clusters [5]. It was first developed by Google. Map Reduce is intended to facilitate and simplify the processing of vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant way. Map Reduce is the key algorithm that the Hadoop Map Reduce engine uses to distribute work around a cluster. A typical Hadoop cluster integrates the Map Reduce and HDFS layers. In the Map Reduce layer, the job tracker assigns tasks to the task trackers; the master node's job tracker allots tasks to the slave nodes' task trackers, as shown in the figure.
Figure 4. Map reduce is based on the Master-Slave architecture
Master node contains:
Job tracker node (Map Reduce layer)
Task tracker node (Map Reduce layer)
Name node (HDFS layer)
Data node (HDFS layer)
Multiple slave nodes contain:
Task tracker node (Map Reduce layer)
Data node (HDFS layer)
The Map Reduce layer has the job tracker and task tracker nodes; the HDFS layer has the name node and data nodes.
A. Map Reduce core functionality (I):
The map and reduce stages play a significant role in Map Reduce's core functionality.
Map stage: In the map step, the master node takes the large input problem, divides it into smaller sub-problems, and allots them to worker nodes. These nodes process the smaller problems and return the results to the master node.
• Map (key1, value1) → list<key2, value2>
Reduce stage: In the reduce stage, the master node takes the answers to the sub-problems and combines them in a predefined way to form the output to the original problem, as shown in Figure 5.
• Reduce (key2, list<value2>) → list<value3>
Figure 5. Reduce stage
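The two stages above can be illustrated with the canonical word-count example, sketched here in plain Python rather than the Hadoop Java API. The intermediate grouping step between map and reduce (the shuffle) is written out explicitly, since in a real cluster Hadoop performs it for us.

```python
from collections import defaultdict

def map_fn(_, line):
    # Map(key1, value1) -> list<key2, value2>: emit (word, 1) for each word.
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # Reduce(key2, list<value2>) -> list<value3>: sum the partial counts.
    return [(word, sum(counts))]

def map_reduce(lines):
    # Shuffle phase: group intermediate (key, value) pairs by key.
    groups = defaultdict(list)
    for i, line in enumerate(lines):
        for key, value in map_fn(i, line):
            groups[key].append(value)
    # Reduce phase: one reduce call per distinct key.
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output

print(map_reduce(["big data big clusters", "big data"]))
# -> [('big', 3), ('clusters', 1), ('data', 2)]
```

In a real cluster the map calls run in parallel on the worker nodes and the reduce calls run after the shuffle, but the data flow is exactly the one shown here.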
5. PIG
Pig [6] was initially developed at Yahoo! to let people using Hadoop
concentrate more on analyzing large data sets and spend less time
writing mapper and reducer programs. Pig is composed of two parts:
the first is the language, called Pig Latin, and the second is a runtime environment in which
Pig Latin programs are executed.
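To see why this saves time, compare the word-count-style pipelines above with the kind of group-and-count dataflow a short Pig Latin script expresses (LOAD, GROUP ... BY, FOREACH ... GENERATE). The same analysis logic, sketched here in plain Python with invented click records and field names, fits in a few lines; Pig compiles statements like these into MapReduce jobs so the analyst never writes the mappers and reducers:

```python
from collections import Counter

# Hypothetical input: (user, url) click records, as a Pig script might LOAD from HDFS.
clicks = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/search"),
    ("bob", "/home"),
]

# Rough equivalent of: grouped = GROUP clicks BY url;
#                      counts  = FOREACH grouped GENERATE group, COUNT(clicks);
counts = Counter(url for _, url in clicks)
print(counts.most_common())  # [('/home', 3), ('/search', 1)]
```

The point is the level of abstraction: the analyst states what to group and count, and the runtime decides how to parallelize it across the cluster.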
6. HIVE
Apache Hive [7] was initially developed by Facebook. It provides a data warehouse structure
built on top of Hadoop for the analysis and querying of data. By default, Hive stores metadata in an
embedded Apache Derby database; other client/server databases such as MySQL can
alternatively be used. At present, four file formats are supported in Hive:
TEXTFILE, SEQUENCEFILE, ORC, and RCFILE.
7. HBase
HBase [8] is a section arranged database administration framework that keeps running
on top of HDFS. HBase applications are composed in Java much like an average Map Reduce
application. A HBase framework consists an arrangement of tables. Table Consists of rows and
columns like a conventional database.
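Conceptually, an HBase table behaves like a sorted map from row keys to cells addressed by a column family and qualifier. The toy class below is only a mental model of that layout in Python, not the real HBase Java client API; the table, family, and row names are made up for the example (and real HBase additionally versions each cell by timestamp, which this sketch omits):

```python
class ToyHTable:
    """In-memory sketch of HBase's model: row key -> {"family:qualifier": value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        # Cells are addressed by column family plus qualifier, not a fixed schema.
        self.rows.setdefault(row_key, {})[f"{family}:{qualifier}"] = value

    def get(self, row_key, family, qualifier):
        return self.rows.get(row_key, {}).get(f"{family}:{qualifier}")

    def scan(self):
        # HBase keeps rows sorted by row key; emulate that ordering here.
        return sorted(self.rows.items())

table = ToyHTable()
table.put("row1", "info", "name", "alice")
table.put("row1", "info", "city", "pune")
table.put("row2", "info", "name", "bob")
print(table.get("row1", "info", "name"))  # alice
```

Unlike a conventional relational table, rows need not share the same columns: each row simply stores whichever family:qualifier cells it has, which is what "column-oriented" buys for sparse big data.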
8. Issues
Although a considerable amount of research is going on in big data, several ideas are still to be
investigated. Researchers are attempting to upgrade security platforms so that software can better
discover advanced threats, respond accordingly, and put preventive measures in place for the
future, and to improve the quality and dependability of security frameworks. Some researchers
plan to take up data collection, pre-processing, integration, MapReduce, and analysis using
machine learning techniques, and to use the outcomes for securing enterprise data against
threats. Researchers are also trying to meet the production needs of enterprises for building
high-quality products by establishing safety with the help of big data analytics on Hadoop.
Some specialists use network monitoring tools such as Packetpig, Mahout, and so on to improve
security levels, and targeted threats can be analyzed using a Hadoop cluster.
9. Conclusion
Big data will keep growing during the coming years, and every data scientist will have to
manage ever larger amounts of data that are more diverse, bigger, and faster. We discussed
several insights about the topic and what we consider to be the fundamental concerns and main
issues for the future. Big data is becoming the new final frontier for scientific data research and
for business applications, and everyone is warmly invited to take part in this journey.
References
[1] Ms Vibhavari Chavan et al. “Survey Paper on Big Data”. International Journal of Computer Science
and Information Technologies. 2014; 5 (6), ISSN: 0975-9646.
[2] Michael R Lyu. “Service-generated Big Data and Big Data-as-a-Service”. Internetware. 2013.
[3] Suman Arora et al. “Survey Paper on Scheduling in Hadoop”. International Journal of Advanced
Research in Computer Science and Software Engineering. 2014; 4.
[4] Dhruba Borthakur. “The Hadoop Distributed File System: Architecture and Design”. The Apache
Software Foundation. 2007.
[5] Yaxiong Zhao et al. “Dache: A Data Aware Caching for Big-Data Applications Using the MapReduce
Framework”. Tsinghua Science and Technology. 2014; 19(1), ISSN: 1007-0214.
[6] Apache Pig. Available at http://pig.apache.org
[7] Apache Hive. Available at http://hive.apache.org
[8] Apache HBase. Available at http://hbase.apache.org