Submit Search
Upload
Survey Paper on Big Data and Hadoop
•
0 likes
•
49 views
IRJET Journal
Follow
https://irjet.net/archives/V3/i1/IRJET-V3I1152.pdf
Read less
Read more
Engineering
Report
Share
Report
Share
1 of 3
Download now
Download to read offline
Recommended
Harnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution Times
David Tjahjono,MD,MBA(UK)
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
IOSR Journals
Hadoop paper
Hadoop paper
ATWIINE Simon Alex
hadoop
hadoop
swatic018
Hadoop
Hadoop
Himanshu Soni
Managing Big data with Hadoop
Managing Big data with Hadoop
Nalini Mehta
Seminar Report Vaibhav
Seminar Report Vaibhav
Vaibhav Dhattarwal
Hadoop basics
Hadoop basics
Laxmi Rauth
Recommended
Harnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution Times
David Tjahjono,MD,MBA(UK)
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
IOSR Journals
Hadoop paper
Hadoop paper
ATWIINE Simon Alex
hadoop
hadoop
swatic018
Hadoop
Hadoop
Himanshu Soni
Managing Big data with Hadoop
Managing Big data with Hadoop
Nalini Mehta
Seminar Report Vaibhav
Seminar Report Vaibhav
Vaibhav Dhattarwal
Hadoop basics
Hadoop basics
Laxmi Rauth
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Cognizant
Hadoop
Hadoop
RittikaBaksi
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
IT Strategy Group
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Mode
inventionjournals
assignment3
assignment3
Kirti J
Understanding hdfs
Understanding hdfs
Thirunavukkarasu Ps
Hadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An Overview
rahulmonikasharma
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET Journal
Intro to Hybrid Data Warehouse
Intro to Hybrid Data Warehouse
Jonathan Bloom
IJET-V3I2P14
IJET-V3I2P14
IJET - International Journal of Engineering and Techniques
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce
cscpconf
hive lab
hive lab
marwa baich
Big Data Concepts
Big Data Concepts
Ahmed Salman
Hadoop
Hadoop
Ankit Prasad
An experimental evaluation of performance
An experimental evaluation of performance
ijcsa
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
MLconf
paper
paper
Ankeeta Battalwar
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
IRJET Journal
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET Journal
G017143640
G017143640
IOSR Journals
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET Journal
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
IRJET Journal
More Related Content
What's hot
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Cognizant
Hadoop
Hadoop
RittikaBaksi
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
IT Strategy Group
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Mode
inventionjournals
assignment3
assignment3
Kirti J
Understanding hdfs
Understanding hdfs
Thirunavukkarasu Ps
Hadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An Overview
rahulmonikasharma
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET Journal
Intro to Hybrid Data Warehouse
Intro to Hybrid Data Warehouse
Jonathan Bloom
IJET-V3I2P14
IJET-V3I2P14
IJET - International Journal of Engineering and Techniques
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce
cscpconf
hive lab
hive lab
marwa baich
Big Data Concepts
Big Data Concepts
Ahmed Salman
Hadoop
Hadoop
Ankit Prasad
An experimental evaluation of performance
An experimental evaluation of performance
ijcsa
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
MLconf
paper
paper
Ankeeta Battalwar
What's hot
(17)
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Hadoop
Hadoop
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Mode
assignment3
assignment3
Understanding hdfs
Understanding hdfs
Hadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An Overview
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
Intro to Hybrid Data Warehouse
Intro to Hybrid Data Warehouse
IJET-V3I2P14
IJET-V3I2P14
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce
hive lab
hive lab
Big Data Concepts
Big Data Concepts
Hadoop
Hadoop
An experimental evaluation of performance
An experimental evaluation of performance
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
paper
paper
Similar to Survey Paper on Big Data and Hadoop
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
IRJET Journal
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET Journal
G017143640
G017143640
IOSR Journals
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET Journal
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
IRJET Journal
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
Editor IJCATR
Big data
Big data
revathireddyb
Big data
Big data
revathireddyb
IRJET- Secured Hadoop Environment
IRJET- Secured Hadoop Environment
IRJET Journal
project report on hadoop
project report on hadoop
Manoj Jangalva
Seminar_Report_hadoop
Seminar_Report_hadoop
Varun Narang
hadoop
hadoop
swatic018
Big data
Big data
Abilash Mavila
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
IRJET Journal
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
paperpublications3
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
Hitendra Kumar
Hadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and Assessment
International Journal of Modern Research in Engineering and Technology
Big data and hadoop
Big data and hadoop
AshishRathore72
hadoop seminar training report
hadoop seminar training report
Sarvesh Meena
Hadoop info
Hadoop info
Nikita Sure
Similar to Survey Paper on Big Data and Hadoop
(20)
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
G017143640
G017143640
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
Big data
Big data
Big data
Big data
IRJET- Secured Hadoop Environment
IRJET- Secured Hadoop Environment
project report on hadoop
project report on hadoop
Seminar_Report_hadoop
Seminar_Report_hadoop
hadoop
hadoop
Big data
Big data
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and Assessment
Big data and hadoop
Big data and hadoop
hadoop seminar training report
hadoop seminar training report
Hadoop info
Hadoop info
More from IRJET Journal
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
IRJET Journal
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
IRJET Journal
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
IRJET Journal
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
IRJET Journal
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
IRJET Journal
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
IRJET Journal
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
IRJET Journal
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
IRJET Journal
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
IRJET Journal
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
IRJET Journal
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
IRJET Journal
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
IRJET Journal
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
IRJET Journal
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
IRJET Journal
React based fullstack edtech web application
React based fullstack edtech web application
IRJET Journal
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
IRJET Journal
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
IRJET Journal
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
IRJET Journal
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
IRJET Journal
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
IRJET Journal
More from IRJET Journal
(20)
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
React based fullstack edtech web application
React based fullstack edtech web application
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Recently uploaded
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
rehmti665
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
ranjana rawat
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
Suhani Kapoor
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
ranjana rawat
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
KurinjimalarL3
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
9953056974 Low Rate Call Girls In Saket, Delhi NCR
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur High Profile
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
upamatechverse
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Suman Mia
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
Suhani Kapoor
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
Soham Mondal
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Dr.Costas Sachpazis
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
misbanausheenparvam
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
purnimasatapathy1234
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
upamatechverse
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
hassan khalil
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
RajkumarAkumalla
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
9953056974 Low Rate Call Girls In Saket, Delhi NCR
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
ranjana rawat
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
srsj9000
Recently uploaded
(20)
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Survey Paper on Big Data and Hadoop
1.
International Research Journal
of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 03 Issue: 01 | Jan-2016 www.irjet.net p-ISSN: 2395-0072 © 2016, IRJET | Impact Factor value: 4.45 | ISO 9001:2008 Certified Journal | Page 861 Survey Paper on Big Data and Hadoop Varsha B.Bobade Department of Computer Engineering, JSPM’s Imperial College of Engineering & Research, Wagholi, Pune,India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - The term ‘Big Data’, refers to data sets whose size (volume), complexity (variability), and rate of growth (velocity) make them difficult to capture, manage, process or analyzed. To analyze this enormous amount of data Hadoop can be used. Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. The technologies used by big data application to handle the massive data are Hadoop, Map Reduce, Apache Hive, No SQL and HPCC. These technologies handle massive amount of data in MB, PB, YB, ZB, KB and TB Key Words: Bigdata , Hadoop Framework, HDFS, Mapreduce , Hadoop Component I. INTRODUCTION Big Data is as a collection of large dataset that can not be processed using traditional computing techniques .Big Data is not merely a data rather it has become a complete subject which involve various tools, techniques and framework. The need of big data generated from the large companies like facebook, yahoo, Google, YouTube etc for the purpose of analysis of enormous amount of data also Google contains the large amount of information Big Data is a term that refers to dataset whose volume (size), complexity and rate of growth (velocity) make them to difficult to captured ,managed ,processed or analyzed by conventional technology and tools such as relational databases. There are various technologies in the market from different vendors including Amazon, IBM, Microsoft, etc., to handle big data The data in it will be of three types. Structured data: Relational data. Semi Structured data: XML data. Unstructured data: Word, PDF, Text, Media Logs. 1.1 The challenges of big data 1. Volume: Volume refers to amount of data. volume represent the size of the data how the data is large. The size of the data is represented in terabytes and petabytes. 2. Variety: Variety makes the data too big. The files comes in various formats and of any type, it may be structured or unstructured such as text, audio, videos, log files and more. 3. Velocity: Velocity refers to the speed of data processing. The data comes at high speed. Sometimes 1 minute is too late so big data is time sensitive. 4. Value: The potential value of Big data is huge.Value is main source for big data because it is important for businesses, IT infrastructure system to store large amount of values in database. 5. Veracity: Veracity refers to noise ,biases and adnormality When we dealing with high volume, velocity and variety of data, the all of data are not going 100% correct, there will be dirty data. 2. Hadoop: Solution for Big Data Processing Hadoop is an Apache open source framework written in Java that allows distributed processing of large dataset across cluster of computers using simple programming model Hadoop creates cluster of machines and coordinates work among them . It is designed to scale up from single servers to thousands of machines, each offering local computation and storage Hadoop consists of two component Hadoop Distributed File System(HDFS) and MapReduce Framework.
2.
International Research Journal
of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 03 Issue: 01 | Jan-2016 www.irjet.net p-ISSN: 2395-0072 © 2016, IRJET | Impact Factor value: 4.45 | ISO 9001:2008 Certified Journal | Page 862 Fig -1: Architecture of Hadoop a) HDFS (Hadoop Distributed File System) HDFS is a file system designed for storing very large files with streaming data access pattern, running clusters on comodity hardware. HDFS manages storage on the cluster by breaking incoming files into pieces called ‘blocks’ and stroing each blocks redundantly across the pool of the server. HDFS stores three copies of each file by copying each piece to three different servers. Size of each block 64MB. HDFS architecture is broadly divided into following three nodes which are Name Node, Data Node, HDFS Clients/Edge Node 1. Name Node It is centrally placed node, which contains information about Hadoop file system . The main task of name node is that it records all the metadata & attributes and specific locations of files & data blocks in the data nodes. Name node acts as the master node as it stores all the information about the system .and provides information which is newly added, modified and removed from data nodes. 2. Data Node It works as slave node. Hadoop environment may contain more than one data nodes based on capacity and performance . A data node performs two main tasks storing a block in HDFS and acts as the platform for running jobs. 3. HDFS Clients/Edge node HDFS Clients sometimes also know as Edge node . It acts as linker between name node and data nodes. Hadoop cluster there is only one client but there are also many depending upon performance needs . b) MapReduce Framwork MapReduce is defined as a programming model for processing and generating large sets of data. There are two phases in MapReduce, the “Map” phase and the “Reduce” phase. The system splits the input data into multiple chunks, each of which is assigned a map task that can process the data in parallel. Each map task reads the input as a set of (key, value) pairs and produces a transformed set of (key, value) pairs as the output. The framework shuffles and sorts outputs of the map tasks, sending the intermediate (key, value) pairs to reduce task, which groups them into final results. 3. The Hadoop Ecosystem 1) HBases Hbase is distributed column oriented database where as HDFS is file system. But it is built on top of HDFS system. HBase is a management system that is open-source, versioned, and distributed based on the BigTable of Google. It is Non-relational, distributed database system written in Java. It runs on the top of HDFS. It can serve as the input and output for the MapReduce. For example, read and write operations involve all rows but only a small subset of all columns. 2) Avro: Avro is data serialization format which brings data interoperability among mutlple components of apache hadoop. Most of the components in hadoop started supporting Avro data format. It works with basic premise of data produced by component should be readily
3.
International Research Journal
of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 03 Issue: 01 | Jan-2016 www.irjet.net p-ISSN: 2395-0072 © 2016, IRJET | Impact Factor value: 4.45 | ISO 9001:2008 Certified Journal | Page 863 Fig Fig - 2: Hadoop Ecosystems consumed by other component Avro has following features Rich data types, Fast and compact serialization, Support many programming langguages like java, Python. 3) Pig: Pig is platform for big data analysis and processing. Pig adds one more level abstraction in data processing and it makes writing and maintaining data processing jobs very easy. Pig. can process tera bytes of data with half dozen lines of code. 4) Hive: Hive is a dataware housing framework on top of Hadoop. Hive allows to write SQL like queries to process and analyze the big data stored in HDFS. It is Data warehousing application that provides the SQL interface and relational model. Hive infrastructure is built on the top of Hadoop that help in providing summarization, query and analysis. 5) Sqoop: Sqoop is tool which can be used to transfer the data from relational database environments like oracle, mysql and postgresql into hadoop environment Sqoop is a command- line interface platform that is used for transferring data between relational databases and Hadoop. 6) Zookeeper: Zookeeper is a distributed coordination and governing service for hadoop cluster It is a centralized service that provides distributed synchronization and providing group services and maintains the configuration information etc. In hadoop this will be useful to track if particular node is down and plan necessary communication protocol around node failure. 7) Mahout: Mahout is a library for machine-learning and data mining. It is divided into four main groups: collective filtering, categorization, clustering, and mining of parallel frequent patterns. The Mahout library belongs to the subset that can be executed in a distributed mode and can be executed by MapReduce. 3. CONCLUSIONS I have entered an era of Big Data. The paper describes the concept of Big Data along with Operational vs. Analytical Systems of Big Data. The paper also focuses on Big Data processing problems. These technical challenges must be addressed for efficient and fast processing of Big Data. In this paper we have tried to cover all detail of Hadoop and Hadoop component and future scope. REFERENCES [1] Andrew Pavlo, “A Comparison of Approaches to Large-Scale Data Analysis”, SIGMOD, 2009. [2] Apache Hadoop: http://Hadoop.apache.org [3] Dean, J. and Ghemawat, S., “MapReduce: a flexible data processing tool”, ACM 2010. [4] DeWitt & Stonebraker, “MapReduce: A major step backwards”, 2008. [5]Hadoop Distributed File System, http://hadoop.apache.org/hdfs [6] HadoopTutorial: http://developer.yahoo.com/hadoop/tutorial/module1.html [7] J. Dean and S. Ghemawat, “Data Processing on Large Cluster”, OSDI ’04, pages 137–150, 2004 [8] J. Dean and S. Ghemawat,“MapReduce: Simplified Data Processing on Large Clusters”, p.10, (2004) [9] Jean-Pierre Dijcks, “Oracle: Big Data for the Enterprise”, 2013. [10] Greenplum Analytics Workbench ,visit us at www.greenplum.com [11] shilpa Manjit Kaur,” BIG Data and Methodology- A review” ,International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 10, October 2013.
Download now