https://irjet.net/archives/V5/i1/IRJET-V5I1118.pdf
International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | Volume: 05 Issue: 01 | Jan-2018 | www.irjet.net
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 553

Unraveling the Data Structures of Big data, the HDFS Architecture and Importance of Data Replication in HDFS

Aseema Sultana
Asst. Professor, Dept. of Information Science and Engineering, HKBK CE, Bangalore, India

Abstract – Big Data is one of the most challenging aspects of today's world, which is filled with emerging information. The fact that we must deal with massive amounts of overwhelming raw data creates new opportunities in this field. The term Big Data describes extremely large volumes of data that come from varied sources, in different formats and at high velocity. What matters here is not the amount of data but what can be done with it. In this process, useful information is fetched from the massive amount of data and analyzed. The analysis itself is very challenging and demanding work, as it requires fault-tolerant, adaptable and scalable systems that operate on the data. The ultimate goal is to make managing and maintaining data a convenient and feasible task. This paper provides a review of Big Data, its data structures, details of the HDFS architecture, and the replication of data files within HDFS.

Key Words: Big Data, Structured, Semi-structured, Quasi-structured, Un-structured Data, Hadoop, HDFS Architecture, Data Replication, Space Reclamation.

1. INTRODUCTION

What is Big Data? Big Data is not just large volumes of data, but also data with high variety and high velocity. In other words, Big Data is huge data in different forms, generated at high speed, that is difficult to manage or process using existing tools and architectures.
Let us first consider what the input to big data systems is. It could be transactions from a bank, web server logs, chatter from Twitter, Facebook or any other social network, MP3 songs, vlogs or other broadcast videos, the content of web pages, personal documents, medical reports, and much more, as mentioned in [1].

According to [2], in the 1990s data volume was measured in terabytes. In the subsequent decade it was measured in petabytes. The decade of the 2010s is dealing with exponential data volume driven by the variety and the many sources of digitized data. Almost everybody and everything is leaving a digital footprint. Because of this volume, measuring data in exabytes has come under consideration. Some applications that generate a lot of data are shown in Fig -1, where the graph shows how the amount of information has evolved over each ten-year period. Data is increasing and exploding decade after decade, and so are the sources of Big Data.

Fig -1: The rise of Big Data sources and the Evolution of data

2. DATA STRUCTURES OF BIG DATA

Big Data comes in many forms; it can be Structured, Semi-Structured, Quasi-Structured or Unstructured, as in [3]. Fig -2 shows the data structure types and the corresponding data growth.

Fig -2: The pyramid of data structures of big data

2.1 Structured Data

Structured data is data with a clear definition, description and presentation, and an explicit length or format. This type of data can be easily re-organized or processed by any data mining tool. One could envision it as a perfectly organized library where books are arranged in order, labeled and easily accessible. Hence this type of data can be used effectively by organizations. Examples of structured data include numbers, dates, etc. It is believed that this type of data comprises about 15-20% of the data present in the world. Fig -3 shows one such example: data from an Excel sheet that displays clearly formatted information such as the First Name, Last Name and Gender of a few people.

2.2 Semi-Structured Data

This type of data is loosely defined or irregularly structured, which allows a schema-less description format in which the data is less constrained than is usual in database work. The semi-structured data model allows information from several sources, with related but different properties, to be fitted together into one whole, which makes it suitable for integrating databases and sharing information on the Web. Semi-structured data may be irregular or incomplete and may have a structure that changes rapidly or unpredictably. It generally has some structure, but does not conform to a fixed schema. It is schema-less and self-describing, i.e., the data carries information about its own schema (e.g., in terms of XML element tags). The main attraction of semi-structured data is that structure can be imposed upon the data, as in [4]. Fig -3 shows an example of semi-structured data: the XML document fetched through the 'view page source' option while viewing a web page.

2.3 Quasi-Structured Data

This type of data is not among the commonly mentioned types of data.
It is the one in between the semi-structured and unstructured data. In other words, Quasi-structured data is data which is unstructured but is in an erratic format which can be handled with special tools. For example, consider visiting the website www.google.com, the URL (Uniform Resource Locator) is tracked into log files of the webpage's server; the user’s computer activity is being monitored. The URL defines a clickstream that can be parsed and mined by data scientists to discover usage patterns and uncover relationships between clicks and areas of interest on websites, as mentioned in [2]. This is illustrated in Fig -3 where the user types “HKBK College of Engineering” in the search option in Google. Upon a search, the clickstream “https://www.google.co.in/search?q=hkbk+college+of+enginee ring&rlz=1C1CHBD_enIN761IN761&oq=HKBK+col&aqs=chro me.0.0j69i57j69i61l3j0.3550j0j8&sourceid=chrome&ie=UTF-8” is generated which shall be parsed and mined. 2.4 Un-Structured Data Un-Structured data sometimes referred to as messy or unorganizedraw data is primarily in the formof text and can also be found in forms of multimedia files. This type of data doesnothave a formal structure,inotherwordsthestructure is unidentifiable. Word processing documents, Books, Journals, Health documents, e-mail messages, videos, presentations, webpages, photos, audio filesandmany other kinds of business documents are examples of unstructured data. As we know that structured data is information with a high degree of organization, unstructured data is essentially the opposite. To make use of this data, the massive data that was worthless is identified and stored in an organized fashion. Through the use of specialized software, the items can then be searched through. However, there are two aspects to be considered, firstly, the process of converting unstructureddata to organized datawouldbecostlyandtime consuming. 
Secondly, not all types of unstructured data can easily be converted into a structured model. Information aging is another factor that needs to be considered in this case where the data keeps accumulating day after day but is left unused. One example of un-structured data is illustrated in Fig -3 in which a video about Big Data from YouTube is streamed. 2.5 Heterogeneous Data This is a combination of all 4 types of data. For example, say we have a system that stores call logs for a call-center. In this case, data such as date and time stamp could represent the structured type. On the other hand, the customer chat historycouldrepresent the quasi-orsemi-structureddata.So much information can be extracted from the available set of the dataofdifferent structures.Figurebelowshowsexamples of all four forms of data. Fig -3: Illustration of data structures of big data 3. HADOOP AND HDFS ARCHITECTURE Hadoop is an open source software framework used to process big data. It was created by Doug Cuttingand Michael J. Cafarella as in [5]. Hadoop was developed for parallel and distributedprocessing systemsandwasinitiallymotivatedin
a paper by Google. Managing big data is close to impossible for traditional systems; Hadoop makes it feasible. With Hadoop, storage capacity can grow by adding nodes to the cluster, and the processing of huge volumes of data generated at high speed is moving from traditional databases to distributed systems.

Fig -4: The HDFS Architecture and Data Replication

HDFS (Hadoop Distributed File System) is a highly fault tolerant parallel file system with a master/slave architecture. Its objectives include storing and managing huge data sets, handling hardware failures, providing data access and keeping the data coherency model simple, as in [6]. The HDFS master/slave architecture is shown in Fig -4. Each cluster consists of exactly one NameNode, a master server that manages the file system namespace and regulates access to files by clients. There can be any number of DataNodes, usually one per node in the cluster. The DataNodes manage the storage of files. HDFS stores user data in files; each file is usually split into one or more blocks, and these blocks are stored on one or more DataNodes. The NameNode executes namespace operations such as opening, closing and renaming files. The DataNodes, on the other hand, serve the read and write requests from the clients. In addition, the NameNode instructs the DataNodes to perform block creation, cloning (replication) and deletion as needed. The NameNode periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.
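The Heartbeat and Blockreport mechanism described above can be sketched as a toy model. This is a minimal illustration, not HDFS code: the class `ToyNameNode`, its method names and the 30-second timeout are all hypothetical choices made for this example, and a real NameNode keeps far more state.

```python
from dataclasses import dataclass, field

# Hypothetical timeout chosen for this sketch; not an HDFS default.
HEARTBEAT_TIMEOUT_S = 30.0


@dataclass
class ToyNameNode:
    """Tracks DataNode liveness and block locations (illustration only)."""
    last_heartbeat: dict = field(default_factory=dict)   # node id -> timestamp
    block_locations: dict = field(default_factory=dict)  # block id -> set of node ids

    def receive_heartbeat(self, node_id, now):
        # A Heartbeat only tells us the node was alive at time `now`.
        self.last_heartbeat[node_id] = now

    def receive_blockreport(self, node_id, block_ids):
        # Forget what this node reported earlier, then record its current blocks.
        for holders in self.block_locations.values():
            holders.discard(node_id)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(node_id)

    def live_nodes(self, now):
        # A node counts as live if its last Heartbeat is recent enough.
        return {n for n, t in self.last_heartbeat.items()
                if now - t <= HEARTBEAT_TIMEOUT_S}

    def under_replicated(self, now, replication_factor):
        # Blocks with fewer live replicas than requested need cloning.
        live = self.live_nodes(now)
        return sorted(b for b, holders in self.block_locations.items()
                      if len(holders & live) < replication_factor)


nn = ToyNameNode()
for node in ("dn1", "dn2", "dn3"):
    nn.receive_heartbeat(node, now=0.0)
nn.receive_blockreport("dn1", ["blk_1", "blk_2"])
nn.receive_blockreport("dn2", ["blk_1"])
nn.receive_blockreport("dn3", ["blk_2"])

# dn1 and dn2 keep sending Heartbeats; dn3 goes silent.
nn.receive_heartbeat("dn1", now=40.0)
nn.receive_heartbeat("dn2", now=40.0)
print(nn.under_replicated(now=40.0, replication_factor=2))
```

In this run, dn3 has missed its Heartbeat window, so the only live replica of `blk_2` is on dn1 and the block is reported as under-replicated. That is the situation in which the NameNode would instruct another DataNode to clone the block.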
The NameNode and DataNodes are designed to run on commodity hardware that runs a GNU/Linux operating system, as mentioned in [6]. HDFS is built using the Java language; therefore, any machine that supports Java can run the NameNode or the DataNode software.

3.1 Data Cloning

Why is data replication required? The answer lies in the reliability and performance of HDFS. Having replicas on the DataNodes is what distinguishes HDFS from other parallel or distributed file systems. The main purpose of data replication is to improve data availability, data reliability and fault tolerance. HDFS is designed to store files very reliably using data replication across the cluster. Each file is stored as a sequence of blocks. The block size and the replication factor are configurable per file. Fig -4 also shows how each file can be replicated on different DataNodes. The replication factor can be specified when the file is created and can be changed later. All files in HDFS have strictly one writer at any point of time and can be written only once. All decisions regarding the cloning of blocks are made by the NameNode.

3.2 Data Replica Placement

Instances of HDFS run on a cluster of computers that spreads across a number of racks. Switches are used for communication between two nodes in different racks. The rack id to which each DataNode belongs is determined by the NameNode. A simple policy is to place the replicas on DataNodes in unique racks. This way data loss can be prevented even if an entire rack fails, since the data can be fetched from another rack that holds a replica. It also distributes the replicas evenly in the cluster. This policy makes it easy to balance load when a component fails, but it increases the cost of writes, as each write needs to transfer replica blocks to other racks.
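The unique-rack policy described above can be illustrated with a short sketch. It assumes a plain mapping from DataNode id to rack id; the function names and the node and rack names are hypothetical, and a real NameNode also takes load and free space into account.

```python
def place_on_unique_racks(rack_of, replication_factor):
    """Pick one DataNode per rack until enough replicas are placed.

    rack_of maps a DataNode id to its rack id (hypothetical names).
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    chosen, used_racks = [], set()
    for node in sorted(rack_of):
        rack = rack_of[node]
        if rack not in used_racks:
            # First node seen on a rack that holds no replica yet.
            chosen.append(node)
            used_racks.add(rack)
        if len(chosen) == replication_factor:
            return chosen
    raise ValueError("not enough racks for one replica per rack")


def survives_rack_failure(replicas, rack_of, failed_rack):
    """True if at least one replica sits outside the failed rack."""
    return any(rack_of[node] != failed_rack for node in replicas)


rack_of = {"dn1": "rackA", "dn2": "rackA", "dn3": "rackB", "dn4": "rackC"}
replicas = place_on_unique_racks(rack_of, 3)
print(replicas)
print(all(survives_rack_failure(replicas, rack_of, rack)
          for rack in set(rack_of.values())))
```

With three replicas on three different racks, the loss of any single rack still leaves two readable copies. The price is that every write crosses rack boundaries, which is the write cost mentioned above.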
In the common case, when a file has a replication factor of 3, HDFS's placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic, which generally improves write performance. Since the chance of rack failure is far less than that of node failure, this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data, since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not distribute evenly across the racks. One third of the replicas are on one node, two thirds of the replicas are on one rack, and the other third are evenly distributed across the remaining racks. This
policy improves write performance without compromising data reliability or read performance, as in [6].

3.3 Data Selection

When a read request is made, HDFS tries to satisfy it from the replica that is nearest to the reader. If a replica exists on the same rack as the reader, that replica is preferred to satisfy the request. Otherwise, if the cluster spans multiple data centers, a replica residing in the local data center is chosen over a remote replica.

3.4 Space Redeeming

On a Windows system, deleting a file or folder moves it to the Recycle Bin, which means the file's space is not freed yet. File deletes in HDFS work similarly. When a file is deleted by an application or a user, it is not removed from HDFS immediately. Upon a delete, the file is first renamed and moved to the /trash directory. The file resides in the trash directory for a configured amount of time. After the time expires, the NameNode deletes the file permanently from the HDFS namespace. The deletion of a file causes all blocks associated with the file to be freed. While the file still resides in the trash directory, it can be undeleted by restoring it from there. As per [6], the current default policy is to permanently delete files that have been in trash for more than 6 hours.

4. CONCLUSION AND FUTURE WORK

HDFS should support the use of snapshots, which store a copy of the data as of a particular instant of time. This can be particularly helpful for rolling back a corrupted HDFS instance to a previously known point of time when the system worked without issues.
There is potential for faster advances in the scientific disciplines that analyze large amounts of data. The technical challenges are common across a large variety of application domains; therefore, new cost effective and faster methods must be implemented to analyze big data.

ACKNOWLEDGEMENT

I thank Dr. Syed Mustafa, Professor and Head, Department of Information Science and Engineering, HKBK CE, for motivating me and providing expertise that greatly assisted in writing the paper. I am also grateful to the authors of the papers and documents I have referenced, and I would like to express my appreciation to them for sharing their pearls of wisdom.

REFERENCES

[1] Big Data Now: 2012 Edition, O'Reilly Media, Inc.
[2] Miroslav Vozábal, Tools and Methods for Big Data Analysis, Master Thesis, University of West Bohemia, 2016.
[3] Iqbaldeep Kaur, Navneet Kaur, Amandeep Ummat, Jaspreet Kaur, Navjot Kaur, Research Paper on Big Data and Hadoop, IJCST Vol. 7, Issue 4, Oct - Dec 2016.
[4] Adding Structure to Unstructured Data, Database Research Group (CIS), University of Pennsylvania ScholarlyCommons, 1997.
[5] T. Cowsalya and S.R. Mugunthan, Hadoop Architecture and Fault Tolerance Based Hadoop Clusters in Geographically Distributed Data Center, ARPN Journal of Engineering and Applied Sciences.
[6] The HDFS Architecture Guide, https://hadoop.apache.org/.
[7] Rahul Beakta, Big Data And Hadoop: A Review Paper, https://www.researchgate.net/publication/

BIOGRAPHIES

Prof. Aseema Sultana was born in Bangalore, India. She received the B.E degree in ISE from HKBK CE in 2010 and the M.Tech in CSE from MVJ CE in 2012. She worked as a Sr. Project Engineer at Wipro (2012-2016) and is currently working as an Asst. Professor at HKBK CE.