NewSQL systems seek to provide the scalability of NoSQL for online transaction processing while maintaining the ACID guarantees of a traditional database. There are three defining properties of big data: volume, velocity, and variety. Volume refers to the large amounts of data created each day. Velocity measures how fast data comes in, which can be real-time or in batches. Variety means data now comes in non-traditional forms like video or from devices.
Tutorial 9

a)

NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system. ... NewSQL systems attempt to reconcile the conflicts.
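The ACID guarantees mentioned above can be demonstrated with any relational engine. Below is a minimal sketch using SQLite (from the Python standard library) as a stand-in; the `accounts` table and balances are invented for illustration. A transfer that fails mid-transaction rolls back atomically, leaving the data consistent:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [("alice", 100), ("bob", 50)])
con.commit()

try:
    with con:  # the connection context manager wraps one atomic transaction
        con.execute("UPDATE accounts SET balance = balance - 70 "
                    "WHERE name = 'alice'")
        raise RuntimeError("crash mid-transfer")  # simulated failure
except RuntimeError:
    pass  # the partial UPDATE was rolled back, not half-applied

balances = dict(con.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50}
```

Atomicity is what the rollback shows; NewSQL systems aim to keep exactly this behaviour while scaling out across many machines.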
b)
There are three defining properties that can help break down the term. Dubbed the three Vs (volume, velocity, and variety), these are key to understanding how we can measure big data and just how different 'big data' is from old-fashioned data.
Volume
The most obvious one is where we'll start. Big data is about volume: volumes of data that can reach unprecedented heights, in fact. It's estimated that 2.5 quintillion bytes of data is created each day, and as a result, there will be 40 zettabytes of data created by 2020, an increase of 300 times from 2005. As a result, it is now not uncommon for large companies to have terabytes, and even petabytes, of data in storage devices and on servers. This data helps to shape the future of a company and its actions, all while tracking progress.
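As a back-of-the-envelope sanity check of those figures (using decimal SI units, where 1 EB = 10^18 bytes and 1 ZB = 10^21 bytes):

```python
# Back-of-the-envelope conversion of the volume figures quoted above.
BYTES_PER_DAY = 2.5e18   # 2.5 quintillion bytes created each day
EXABYTE = 1e18           # SI (decimal) units
ZETTABYTE = 1e21

per_day_eb = BYTES_PER_DAY / EXABYTE             # exabytes per day
per_year_zb = BYTES_PER_DAY * 365 / ZETTABYTE    # zettabytes per year

print(f"{per_day_eb:.1f} EB/day is about {per_year_zb:.2f} ZB/year")
```

So 2.5 quintillion bytes a day is roughly 0.9 ZB per year, which makes the cumulative 40 ZB estimate for 2020 plausible in order of magnitude.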
Velocity
The growth of data, and its resulting importance, has changed the way we see data. There once was a time when we didn't see the importance of data in the corporate world, but with the change in how we gather it, we've come to rely on it day to day. Velocity essentially measures how fast the data is coming in. Some data will come in in real time, whereas other data will come in fits and starts, sent to us in batches. And as not all platforms will experience the incoming data at the same pace, it's important not to generalise, discount, or jump to conclusions without having all the facts and figures.
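The real-time versus batch distinction can be sketched in a few lines. This is an illustrative toy, not tied to any particular platform, and the function names are invented: the same stream of records is either handled as each record arrives or flushed in fixed-size batches.

```python
def process_realtime(records, handle):
    for r in records:            # handle each record as soon as it arrives
        handle([r])

def process_batched(records, handle, batch_size=3):
    batch = []
    for r in records:
        batch.append(r)
        if len(batch) == batch_size:   # flush only when the batch is full
            handle(batch)
            batch = []
    if batch:                          # flush the leftover partial batch
        handle(batch)

rt_calls = []
process_realtime(range(3), rt_calls.append)
print(rt_calls)    # [[0], [1], [2]]

batch_calls = []
process_batched(range(7), batch_calls.append, batch_size=3)
print(batch_calls)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The trade-off is the one the text describes: real-time handling gives low latency per record, while batching amortises overhead but delivers data in fits and starts.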
Variety
Data was once collected from one place and delivered in one format. Once taking the shape of database files, such as Excel, CSV and Access, it is now being presented in non-traditional forms, like video, text, PDF and graphics on social media, as well as via tech such as wearable devices. Although this data is extremely useful to us, it does create more work and require more analytical skills to decipher the incoming data, make it manageable and allow it to work.
1) Set a big data strategy

At a high level, a big data strategy is a plan designed to help you oversee and improve the way you acquire, store, manage, share and use data within and outside of your organization. A big data strategy sets the stage for business success amid an abundance of data. When developing a strategy, it's important to consider existing, and future, business and technology goals and initiatives. This calls for treating big data like any other valuable business asset rather than just a byproduct of applications.
2) Know the sources of big data

Streaming data comes from the Internet of Things (IoT) and other connected devices that flow into IT systems from wearables, smart cars, medical devices, industrial equipment and more. You can analyze this big data as it arrives, deciding which data to keep or not keep, and which needs further analysis.

Social media data stems from interactions on Facebook, YouTube, Instagram, etc. This includes vast amounts of big data in the form of images, videos, voice, text and sound, useful for marketing, sales and support functions. This data is often in unstructured or semistructured forms, so it poses a unique challenge for consumption and analysis.

Publicly available data comes from massive amounts of open data sources like the US government's data.gov, the CIA World Factbook or the European Union Open Data Portal.

Other big data may come from data lakes, cloud data sources, suppliers and customers.
3) Access, manage and store big data

Modern computing systems provide the speed, power and flexibility needed to quickly access massive amounts and types of big data. Along with reliable access, companies also need methods for integrating the data, ensuring data quality, providing data governance and storage, and preparing the data for analytics. Some data may be stored on-premises in a traditional data warehouse, but there are also flexible, low-cost options for storing and handling big data via cloud solutions, data lakes and Hadoop.
4) Analyze big data

With high-performance technologies like grid computing or in-memory analytics, organizations can choose to use all their big data for analyses. Another approach is to determine upfront which data is relevant before analyzing it. Either way, big data analytics is how companies gain value and insights from data. Increasingly, big data feeds today's advanced analytics endeavors such as artificial intelligence.
5) Make intelligent, data-driven decisions

Well-managed, trusted data leads to trusted analytics and trusted decisions. To stay competitive, businesses need to seize the full value of big data and operate in a data-driven way, making decisions based on the evidence presented by big data rather than gut instinct. The benefits of being data-driven are clear. Data-driven organizations perform better, are operationally more predictable and are more profitable.
c)
HDFS Assumptions and Goals

I. Hardware failure

Hardware failure is no longer an exception; it has become the norm. An HDFS instance consists of hundreds or thousands of server machines, each of which stores part of the file system's data. With such a huge number of components, each susceptible to hardware failure, some components are always non-functional. So the core architectural goal of HDFS is quick, automatic fault detection and recovery.
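One reason failure can be treated as routine is block replication. The toy sketch below is illustrative only (node count and placement are invented; real HDFS defaults to three replicas placed with rack awareness): every block lives on several nodes, so losing any one node loses no data.

```python
import random

REPLICATION = 3
nodes = {n: set() for n in range(5)}   # node id -> block ids held

def store(block_id):
    # place each block on REPLICATION distinct nodes
    for n in random.sample(sorted(nodes), REPLICATION):
        nodes[n].add(block_id)

for b in range(10):
    store(b)

nodes.pop(0)                           # one node fails outright
survivors = set().union(*nodes.values())
print(len(survivors))                  # 10: no block was lost
```

With three replicas per block, a single failed node leaves at least two copies of every block, and the system can re-replicate them in the background to restore the target count.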
II. Streaming data access

HDFS applications need streaming access to their datasets. Hadoop HDFS is mainly designed for batch processing rather than interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. It focuses on retrieving data at the fastest possible sustained speed, for example while analyzing logs.
III. Large datasets

HDFS works with large data sets. In standard practice, a file in HDFS ranges in size from gigabytes to petabytes. The architecture of HDFS should be designed in such a way that it is best for storing and retrieving huge amounts of data. HDFS should provide high aggregate data bandwidth and should be able to scale up to hundreds of nodes in a single cluster. It should also be able to handle tens of millions of files in a single instance.
IV.Simple coherencymodel
It workson a theoryof write-once-read-manyaccessmodelforfiles.Once the file iscreated,written,
and closed,itshouldnotbe changed.Thisresolvesthe datacoherencyissuesandenableshigh
throughputof data access.A MapReduce-basedapplicationorwebcrawlerapplicationperfectlyfits
inthismodel.Asperapache notes,there isaplan to supportappendingwritestofilesinthe future.
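The write-once-read-many contract can be sketched as a toy class. This is a hypothetical, in-memory illustration only (real HDFS enforces the rule at the namenode/datanode level): writes are accepted until the file is closed, after which the contents are immutable and any number of readers can read without coordination.

```python
class WormFile:
    """Toy write-once-read-many file: immutable after close()."""

    def __init__(self):
        self._chunks = []
        self._closed = False

    def write(self, data):
        if self._closed:
            raise PermissionError("write-once: file already closed")
        self._chunks.append(data)

    def close(self):
        self._closed = True

    def read(self):
        # any number of concurrent readers, no locking needed
        return "".join(self._chunks)

f = WormFile()
f.write("hello ")
f.write("hdfs")
f.close()
print(f.read())   # hello hdfs
```

Because closed files never change, readers never see a half-updated file, which is exactly why the coherency problem disappears.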
V. Moving computation is cheaper than moving data

If an application does its computation near the data it operates on, it is much more efficient than when the computation is done far away. This fact becomes stronger when dealing with large datasets. The main advantage is that it increases the overall throughput of the system. It also minimizes network congestion. The assumption is that it is better to move computation closer to data instead of moving data to computation.
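The MapReduce pattern mentioned above illustrates this assumption well. Here is a minimal in-process word-count sketch (single machine, for illustration; in a real cluster each map call would run on the node holding that data block, so only the small intermediate pairs cross the network):

```python
from collections import defaultdict

def map_phase(block):
    """Runs 'near' the block it is given; emits (word, 1) pairs."""
    return [(word, 1) for word in block.split()]

def reduce_phase(pairs):
    """Sums the counts for each word across all map outputs."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

blocks = ["big data big", "data moves less than code"]
pairs = [p for b in blocks for p in map_phase(b)]
result = reduce_phase(pairs)
print(result)   # {'big': 2, 'data': 2, 'moves': 1, 'less': 1, 'than': 1, 'code': 1}
```

The shipped code (the two small functions) is tiny compared with the blocks themselves, which is the whole point of moving computation to the data.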
VI. Portability across heterogeneous hardware and software platforms

HDFS is designed to be portable from one platform to another. This enables the widespread adoption of HDFS as a platform of choice when dealing with large sets of data.