Submit Search
Upload
Big Data with Hadoop – For Data Management, Processing and Storing
•
1 like
•
46 views
IRJET Journal
Follow
https://www.irjet.net/archives/V4/i8/IRJET-V4I8198.pdf
Read less
Read more
Engineering
Report
Share
Report
Share
1 of 6
Download now
Download to read offline
Recommended
IJARCCE_49
IJARCCE_49
Mr.Sameer Kumar Das
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
Editor IJCATR
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Ashok Royal
Big Data simplified
Big Data simplified
Praveen Hanchinal
Big data ppt
Big data ppt
Shweta Sahu
Big Data Concepts
Big Data Concepts
Ahmed Salman
Hadoop: An Industry Perspective
Hadoop: An Industry Perspective
Cloudera, Inc.
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
Mahantesh Angadi
Recommended
IJARCCE_49
IJARCCE_49
Mr.Sameer Kumar Das
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
Editor IJCATR
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Ashok Royal
Big Data simplified
Big Data simplified
Praveen Hanchinal
Big data ppt
Big data ppt
Shweta Sahu
Big Data Concepts
Big Data Concepts
Ahmed Salman
Hadoop: An Industry Perspective
Hadoop: An Industry Perspective
Cloudera, Inc.
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
Mahantesh Angadi
Hadoop
Hadoop
Anil Reddy
Big data
Big data
rajsandhu1989
Big Data - An Overview
Big Data - An Overview
Arvind Kalyan
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
AM Publications,India
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
IT Strategy Group
Big data, map reduce and beyond
Big data, map reduce and beyond
datasalt
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET Journal
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs Spark
Laxmi8
Big Data: An Overview
Big Data: An Overview
C. Scyphers
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET Journal
Introduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
Amir Shaikh
Hadoop
Hadoop
Veera Sundari
Big data abstract
Big data abstract
nandhiniarumugam619
Big Data
Big Data
Kirubaburi R
Bigdata
Bigdata
NithiDazz
BIG DATA
BIG DATA
Shashank Shetty
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and More
Trendwise Analytics
Big data analytics, survey r.nabati
Big data analytics, survey r.nabati
nabati
Whatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
Edureka!
Big data introduction, Hadoop in details
Big data introduction, Hadoop in details
Mahmoud Yassin
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
IJERA Editor
Big data
Big data
revathireddyb
More Related Content
What's hot
Hadoop
Hadoop
Anil Reddy
Big data
Big data
rajsandhu1989
Big Data - An Overview
Big Data - An Overview
Arvind Kalyan
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
AM Publications,India
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
IT Strategy Group
Big data, map reduce and beyond
Big data, map reduce and beyond
datasalt
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET Journal
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs Spark
Laxmi8
Big Data: An Overview
Big Data: An Overview
C. Scyphers
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET Journal
Introduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
Amir Shaikh
Hadoop
Hadoop
Veera Sundari
Big data abstract
Big data abstract
nandhiniarumugam619
Big Data
Big Data
Kirubaburi R
Bigdata
Bigdata
NithiDazz
BIG DATA
BIG DATA
Shashank Shetty
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and More
Trendwise Analytics
Big data analytics, survey r.nabati
Big data analytics, survey r.nabati
nabati
Whatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
Edureka!
Big data introduction, Hadoop in details
Big data introduction, Hadoop in details
Mahmoud Yassin
What's hot
(20)
Hadoop
Hadoop
Big data
Big data
Big Data - An Overview
Big Data - An Overview
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Big data, map reduce and beyond
Big data, map reduce and beyond
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs Spark
Big Data: An Overview
Big Data: An Overview
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articles
Introduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
Hadoop
Hadoop
Big data abstract
Big data abstract
Big Data
Big Data
Bigdata
Bigdata
BIG DATA
BIG DATA
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and More
Big data analytics, survey r.nabati
Big data analytics, survey r.nabati
Whatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
Big data introduction, Hadoop in details
Big data introduction, Hadoop in details
Similar to Big Data with Hadoop – For Data Management, Processing and Storing
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
IJERA Editor
Big data
Big data
revathireddyb
Big data
Big data
revathireddyb
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET Journal
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
IRJET Journal
Big data Hadoop presentation
Big data Hadoop presentation
Shivanee garg
Big data with hadoop
Big data with hadoop
Anusha sweety
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
IRJET Journal
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
IJECEIAES
Big data analysis concepts and references
Big data analysis concepts and references
Information Security Awareness Group
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET Journal
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET Journal
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
IRJET Journal
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
Sysfore Technologies
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challenges
ijcisjournal
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
IOSR Journals
G017143640
G017143640
IOSR Journals
IJSRED-V2I3P43
IJSRED-V2I3P43
IJSRED
hadoop seminar training report
hadoop seminar training report
Sarvesh Meena
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
IRJET Journal
Similar to Big Data with Hadoop – For Data Management, Processing and Storing
(20)
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
Big data
Big data
Big data
Big data
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
Big data Hadoop presentation
Big data Hadoop presentation
Big data with hadoop
Big data with hadoop
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Big data analysis concepts and references
Big data analysis concepts and references
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challenges
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
G017143640
G017143640
IJSRED-V2I3P43
IJSRED-V2I3P43
hadoop seminar training report
hadoop seminar training report
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
More from IRJET Journal
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
IRJET Journal
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
IRJET Journal
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
IRJET Journal
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
IRJET Journal
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
IRJET Journal
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
IRJET Journal
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
IRJET Journal
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
IRJET Journal
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
IRJET Journal
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
IRJET Journal
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
IRJET Journal
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
IRJET Journal
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
IRJET Journal
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
IRJET Journal
React based fullstack edtech web application
React based fullstack edtech web application
IRJET Journal
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
IRJET Journal
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
IRJET Journal
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
IRJET Journal
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
IRJET Journal
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
IRJET Journal
More from IRJET Journal
(20)
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
React based fullstack edtech web application
React based fullstack edtech web application
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Recently uploaded
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
Asst.prof M.Gokilavani
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
britheesh05
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
SAURABHKUMAR892774
Internship report on mechanical engineering
Internship report on mechanical engineering
malavadedarshan25
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
João Esperancinha
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
Mark Billinghurst
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
NikhilNagaraju
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
Asst.prof M.Gokilavani
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
null - The Open Security Community
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
959SahilShah
Effects of rheological properties on mixing
Effects of rheological properties on mixing
viprabot1
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
k795866
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
GDSCAESB
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
Tagore Institute of Engineering And Technology
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
Asst.prof M.Gokilavani
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
wendy cai
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
eptoze12
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
koyaldeepu123
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
asadnawaz62
Past, Present and Future of Generative AI
Past, Present and Future of Generative AI
abhishek36461
Recently uploaded
(20)
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
Internship report on mechanical engineering
Internship report on mechanical engineering
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
Effects of rheological properties on mixing
Effects of rheological properties on mixing
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
Past, Present and Future of Generative AI
Past, Present and Future of Generative AI
Big Data with Hadoop – For Data Management, Processing and Storing
1.
International Research Journal
of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1111 BIG DATA WITH HADOOP – FOR DATA MANAGEMENT, PROCESSING AND STORING Revathi.V1, Rakshitha.K.R2, Sruthi.K3, Guruprasaath.S4 1Assistant Professor, Department of BCA & M.Sc SS, Sri Krishna Arts and Science College, Coimbatore, India 2,3,4 III BCA ‘A’, Department of BCA & M.Sc SS, Sri Krishna Arts and Science College, Coimbatore, India ----------------------------------------------------------------------------***----------------------------------------------------------------------------- Abstract - Big data is a term that describes a large amount of data that are generated from every digital and social media exchange. Its size and rate of growth make it more complex to maintain for this purpose Hadoop technology can be used. Hadoop is an open source software project that enables the distributed processing of big data sets across clusters of commodity servers. It is planned to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Along with Hadoop, some techniques are also used to handle the massive amount of data. Some commonly used techniques are Map Reduce, HDFS, Apache Hive etc. Key Words: Big data, Hadoop, HDFS, Map Reduce, Hadoop Components 1. INTRODUCTION Companies across the globe have been using data for a long time to help them make better decisions in order to enhance their performances. It is the inaugural decade of the 21st century that actually showcased a rapid shift in the availability of data and its applicability for improving the overall effectiveness of the organization. This change that was to revolutionize the role of data brought into advent the concept that became popular as Big Data. Big data is a large amount of data that floods from the daily uses of human life like telephone, mobiles, traveling, shopping, and computer, organization which evokes complication for storing, processing or accessing data using a traditional database. This is in all likelihood one of the important reasonswhythe concept of Big data was first embraced by online firms like Google, eBay, Facebook,Linkedin,etc.,astheseorganizations were constructed with the foundation concept of using rapidly changing data. They did not plausibly face the challenge of integrating the raw and unstructured data with the already available ones. Since HADOOP has emerged as a popular tool for BIG DATA to analyze these data using map- reduce to get the desired outputwiththehelpof programs.It is a freely available platform to perform these type of performance. The design purpose of it is to perform millions of data in a single executive server. [1] 2. CHARACTERISTICS OF DATA Fig -1: Four V’s of Big Data Volume Volume refers to the amount of data. The size of the data is represented as a volume. The size of the information is represented in terabytes and petabytes. Variety Variety forms the data too big. The files arrive in various formats and of any type, it may be structured or unstructured, such as text, audio, videos, log files and more. Velocity Velocity refers to the speed of information processing. The data adds up at high speed. Sometimes 1 minute is also late, so big data are time sensitive. Value The possible value of Big data is huge. Value is the main root for big data because it is important for businesses, IT infrastructure system to store a lot of values in the database. Veracity Veracity refers to noise, biases, and abnormality when we are handling with high volume, velocity, and a variety of data. All data not going to be 100% correct, there will be dirty data. [2]
2.
International Research Journal
of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1112 3. TECHNOLOGY USED IN BIG DATA For the determination of processing a large amount of data, the big data require exceptional technologies. The various techniques and technologies have been introduced for manipulating, examining, and visualizing the big data. On that point are many solutions to handle the Big Data, but the Hadoop is one of the most widely used technologies. 3.1 HADOOP Fig -2: Hadoop architecture Hadoop is an open source project which is developed for maintaining big data. This is implemented by Apache Software Foundation. It consists of many small sub projects which belong to the categoryofinfrastructurefordistributed computing. Hadoop mainly consists of: File System (The Hadoop File System [HDFS]) - Storage part Programming Paradigm (Map Reduce) - Processing part The other sub projects allow correlative services or they are built along the core to add higher-level abstractions. There exist several problems in dealing with storage of vast amounts of data. Though the memorycapacities ofthedrives have expanded massively, therateofreadingdata fromthem hasn't shown that considerable progress. The reading process takes a vast amount of time and the process of writing is also slower. The time can be decreased by taking from multiple disks at once. Just using one hundredth of a disk may seem wasteful. Only if there are one hundred datasets, each of which is one terabyte and providingshared access to them is also a solution. There also exist many problems with using several pieces of hardware as it increases the chances of failure. Replication method that is creating redundant copies of similar data on the different devices so that if any failure occurs the copy of the data will be available. The primary problem is combining the data being read from different machines. There are so many methods are able in distributed computing to handle this problem, but even so, it is quite challenging. All the complication discussed is well managed by Hadoop.Hadoop Distributed File System identifies the failure problem and the Map reduce programming paradigm identifies the problem of combining data. Map Reduce diminishes the complication of disk records and writes by giving a programming model dealing in computation with keys and values. [3] 4. HDFS (Hadoop Distributed File System) Fig -3: HDFS system HDFS is a file system which is constructed of storing a very large number of files by using streaming data access rule by running clusters on commodity hardware. HDFS manages storage on the cluster by breaking incoming files into pieces called blocks and string each block redundantly across the pocket billiards on the server. HDFS stores, three copies of each file by copying each piece to three different hosts. The size of each block 64MB. HDFS is divided into three modes which are Name node, Data Node, HDFS Clients/Edge Node. Name node It is a centrally placed node, which holds information about the Hadoop file system. The principal task of name node is that it records all the metadata & attributes and specific locations of files & data blocks in the data nodes. Name node acts as the master node as it stores all the information around the system and provides information which isnewly added, modified and removed from data nodes. Data node It functions as a slave node. Hadoop environment may comprise more than one data node based on capacity and performance. A data node performs twomaintasksstoringa block in HDFS and acts as the political program for running jobs. HDFS Clients/Edge node HDFS Clients sometimes also know as Edgenode.It works as a linker between name node and data nodes. Hadoopcluster at that point is only one client, but there are also many depending upon performance needs. [4]
3.
International Research Journal
of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1113 5. MAP REDUCE Fig -4: Map Reduce Hadoop Map Reduce is an implementation of the algorithm developed and managed by the Apache Hadoop project.Map Reduce (map + reduce) Is a theoretical explanation for writing applications that treat large amounts of structured and unstructured data stored in the Hadoop Distributed File System (HDFS). Map reduce is an instrument which is very crucial to the analysis of the data which are structured or unstructured. In which the data are splits in the minuscule parts of the cluster to take the original form of the data. Map reduce is 100 times quicker than the Spark and 1000 times quicker than the Drill, now Drill is also working platform. For EX: CPU is the mental capacity of computer likethatMap reduction is the heart of the Hadoop. [5] 6. BIG DATA MAPPING USING MAP REDUCE TECHNIQUE Get the big data ready When a request is made by the client to run Map Reduce program, the initial step is to search and understand about the input file. The file pattern is a random type so it is necessary to change data that can be processed by the data. This process of input process and Record reader. Input Format chooses how the file is going to be smashed into smaller pieces for processing using a function called Input Split. It then assigns a Record Reader to transform the raw data for processing by the map. Dissimilar types of Record Readers are supplied with Hadoop, offeringa widevarietyof conversion options. Let the Big Data Map Begin Client's information is now in a mode tolerable to map. For all input pairs, a specific instance of a map is called to swear out the data for each input pair. Map and reduce the need to work together to maintain the data, the program should collect output information from the separate mappers and pass it to the reducers. This action can be performed by an Output Collector. A Reporter function also present information collected from map tasks. This entire project is being done on multiple nodes in the Hadoop cluster simultaneously. Later all the map tasks are complete, the common results are collected in the partition and a shuffling occurs, sorting the output for excellent processing by reducing. Reduce and combine for big data For each output pair, Reduce is called to do its task. In a similar pattern to map, reduce collect its output while all the mappings are processed. Reduce can't begin as far as all the mapping is made out. The output of reducing is also a key and a value. Hadoop provides an Output Format feature,and it performs very much like Input Format. Output Format takes the key-value pair and formulates the output for writing to HDFS. The last task is performed to write the data to HDFS. Record Writer is used for writing and also to perform similarly as Record Reader exceptinreverse.Ituses up the Output Format data and writes it to HDFS. [6] 7. HADOOP ECOSYSTEM Fig -5: Components of Hadoop 7.1 DATA MANAGEMENT Ambari It allows installation of services, so that Pig, Hive, scoop, HBase can be picked and put in. It will work across all the nodes in the cluster and also can manage the services from one centralized location like starting up, stopping, reconfiguring. Zookeeper Zookeeper is a distributed coordination and setting up service for Hadoop cluster It is a centralized service that
4.
International Research Journal
of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1114 provides distributed synchronization and providing group services and maintains the configuration information. With Hadoop, this will be useful to track if a particular client is down and plan necessary communication protocol around node failure. Oozie It is a workflow library that permits us to play and to connect lots of those essential projects for instances, Pig, Hive, and Sqoop. 7.2 DATA INTEGRATION Sqoop Sqoop is a command-line interface platform which can be employed to transfer the data from relational database environments like Oracle, MySQL, and PostgreSQL into Hadoop environment. Flume and Chukwa Various applications, operating systems, web-services generate plenty of log data, for processing these logs a framework can be developed by using Flume and Chukwa tools. It is a way to push real time data, information right into Hadoop and also to execute real time analysis. 7.3 DATA ACCESS Pig Pig is a platform for processing and maintaining data with easy techniques. By just writing half dozen lines of code pig can process terabytes of data. Hive Hive is Data warehousing application that provides the SQL interface to process and analyze the big data storedinHDFS. Hive infrastructure is made along the top of Hadoop that help in providing summarization, query, and analysis. 7.4 DATA STORAGE HBase HBase is a distributed column oriented database where as HDFS is a file system. Simply it is built on top of HDFS system. It is completely written in Java, which is a non- relational, distributed database system. It can suffice as the input and output for the MapReduce. For example, read and write operations affect all rows but just a small subset of all columns. Cassandra Apache Cassandra is a no SQLdatabaseformanaginga heavy quantity of data across many commodity servers with high availability and no single instance of failure. 7.5 INTERACTION, VISUALIZATION, EXECUTION, DEVELOPMENT Hcatalog It is known as metadata table and storage management scheme. It is a way for tools like a Pig, Hive for interoperable also to hold a consistent view of data across those tools. Lucene Lucent is there for full text searching an API loaded with algorithms to answer thingslikestandardfull-textsearching, wild card searching, phrase searching, range searching kind of stuff. Hama Hama is there for BSP (Book Synchronous Processing). To turn with a large amount of scientific data home is used. Crunch Crunch is there for writing and testing and running map reduces pipeline. It basically gives full control to overall four phrases, which is going like 1. Map, 2. Reduce, 3. Shuffle, and 4. Combine. It is there to assist in joining and aggregation that is very hard to do with low-level map reduce, so Crunch is there to make you little easier inside map reduce pipeline. 7.6 DATA SERIALIZATION Avro Avro is a data serialization format which brings data interoperability among multiple components of Apache Hadoop. Most of the components in Hadoop started supporting the Avro data format. It starts with the basic premise of data produced by component should be readily consumed by another component Avro has following features Rich data types, Fast and compact serialization, Support many programming languages like Java, Python. Thrift It is more specific for creating flexible schemes that work with Hadoop data. It is specifically because it is intended for cross-languagecompatibility,sowebuildanapplicationwith Hadoop data in Java and if we want to use the same object in an application that you built on Ruby, Python or C++.
5.
International Research Journal
of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1115 7.7 DATA INTELLIGENCE Drill The drill is actually an incubator project and is designed to perform interactive analysis of nested data. Mahout Mahout is a library for gaining knowledge about machines and data mining. It is divided into four different groups: collective filtering, categorization, clustering, and mining of parallel frequent patterns. TheMahoutlibrary belongsto the subset that can be performed in a distributed mode and can be executed by MapReduce. [7] 8. CHALLENGES Heterogeneity and Incompleteness If we need to analyze the data, it should be structured, but when we deal with the Big Data, data may be structured or unstructured as well. Heterogeneity is the big challenge in data Analysis and analysts need to cope with it. Consider an example of a patient in the Hospital. We will arrive at each disc for each medical test. And we will likewise create a record for a hospital stay. This will be different for all patients. This figure is not well structured. Hence it goes with the team you manage heterogeneous and incompleteis required. A secure data analysis should be applied to this. Scale As the name says Big Data is having a large size of data sets. Managing with large data sets has been a large problem for teens. Earlier, this difficulty was solved by the processors getting faster, but now data volumes are becoming hugeand processors are static. The World is being given towards the Cloud technology, due to this shifted data is generated at a very high rate. This high rate of increasing data is becoming a challenging problem with the data analysts. Hard disksare used to store the Data. They are slowerI/Operformance.But now Hard Disks are replaced by the solid state drives and other applied sciences. These are not at a slower rate like Hard disks, so the new storage scheme should be designed. Timeliness Another challenge with the size is speed. If the data sets are heavy in size, longer the time it will take to analyze it. Any system which deals effectively with the size is likely to execute well in term of speed. There are caseswhenwe need the analysis results directly. For example, If there is any fraud transaction, It should be dissected before the transaction is completed. And then some newsystemshould be designed to meet this challenge in data analysis. Privacy Privacy of data is another big problem with large data. In some nations, there are strict laws regarding the data privacy, for example, in the USA there are strict laws for health records, but for others, it is less forceful. Forexample, in social media, we cannot fix the private posts of users for sentiment analysis. Human Collaborations In malice of the advanced computational models, there are many patterns that a computer cannot detect. A novel method of harnessing human ingenuity to solvea problemis crowdsourcing. The best example is Wikipedia. We are relying on the information given by the strangers, however, most of the time they are correct. But there may be other people who provide false information.Forhandlingthiskind of complexity we require a technology model to cope with this. As mankind, we can look the review of the book and find that some are positive and some are negative and come up with a decision whether to buy or not. We need systems to be that intelligent to determine. [8] 9. OPPORTUNITIES Technology Nearly every top organization like Facebook,IBM,Yahoo has adopted Big Data and are investing in big data. Facebook handles 50 Billion photos of users. Every month Google handles 100 billion searches. From these stats, we can state that there are a lot of opportunities oninternet,social media. Government Big data can be applied to handle the problems faced by the government. Obama government announced a big data research and development initiative in 2012. Big data analysis played an important role of BJP winning the elections in 2014 and Indian government is putting on big data analysis in Indian elections. Healthcare Many health care organizations are adopting big data techniques for deriving each and every information about patients. To ameliorate the health care and low down the cost big data analysis is required and certain technology should be adapted. Science and Research Big data is the latest topic of inquiry. Many researchers are solving with big data. There are so many papers being printed on big data. NASA center for climate simulation stores 32 petabytes of observations.
6.
International Research Journal
of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1116 Media Media is using big data for the promotions and selling of products by targeting the interest of the user on the net. For instance, social media posts, data analysts get the number of posts and then analyze the interest of the user. It can as well be done by getting the positive or negative reviews on the social media. [9] 10. CONCLUSION Big Data (Hadoop) is in huge demand in the marketplace nowadays. As there a huge amount of data is lying in the industry, but there is no tool to wield it and Hadoop can implement on low-cost hardware and can be used by a large set of the audience on a large number of datasets. In Hadoop map reduce is the most important component in Hadoop. Many techniques are used to make the efficient plan for the map reduce so that it can speed up the system or data retrieval Technique like Quincy, Asynchronous Processing, Speculative Execution, Job Awareness, Delay Scheduling, Copy Compute Splitting had made the schedule effective for faster processing. [10] REFERENCES [1] Bijesh Dhyani, Anurag Barthwal, “Big Data Analytics using Hadoop”, International Journal of Computer Applications (0975 – 8887) Volume 108 – No 12, December 2014 [2] Prof. Devendra P. Gadekar,HarshawardhanS.Bhosale,“A Review Paper on Big Data and Hadoop”, International Journal of Scientific and Research Publications [IJSRP], Volume 4, Issue 10, October 2014 1 ISSN 2250-3153 [3] Ms. Gurpreet Kaur, Ms. Manpreet Kaur, “REVIEW PAPER ON BIG DATA USING HADOOP”, International Journal of Computer Engineering & Technology (IJCET), Volume 6, Issue 12, Dec 2015, pp. 65-71, Article ID: IJCET_06_12_008 ISSN Print: 0976-6367 and ISSN Online: 0976–6375 [4] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chandler, "The Hadoop Distributed File System", October 2010, 978-1-4244-7153, IEEE [5] Dr. Sanjay Srivastava, Swami Singh, Viplav Mandal, “The Big Data analytics with Hadoop”, International Journal of Research in Applied Science & Engineering Technology (IJRASET), Volume 4 Issue III, March 2016, ISSN:2321-9653 [6] Varsha B. Bobade, “Survey Paper on Big Data and Hadoop”, International Research Journal of Engineeringand Technology (IRJET) e-ISSN: 2395-0056, Volume: 03, Issue: 01, Jan-2016 [7] Poonam S. Patil, Rajesh. N. Phursule, “Survey Paper on Big Data Processing andHadoopComponents”, International Journal of Science and Research (IJSR), Volume 3 Issue 10, October 2014, ISSN (Online): 2319-7064 [8] Rahul Beakta, “Big Data And Hadoop: A Review Paper”, RIEECE -2015, Volume 2, Spl. Issue 2 (2015), e-ISSN: 1694- 2329, p-ISSN: 1694-2345 [9] Ivanilton Polatoa, ReginaldoReb,AlfredoGoldman,Fabio Kona, "A Comprehensive View of Hadoop Research - A Systematic Literature Review", Journal of Network and Computer Applications,Volume46,PP1-25,November2014 [10] Dr. Madhu Goel, Suman Arora, “Survey Paper on Scheduling in Hadoop”, International Journal of Advanced Research in Computer Science and Software Engineering [IJARCSSE], Volume 4, Issue 5, May 2014, ISSN: 2277 128X
Download now