Tutorial 9
a)
NewSQL is a class of relational database management systems that seek to provide the scalability of
NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID
guarantees of a traditional database system.... NewSQL systems attempt to reconcile the conflicts.
b)
There are three defining properties that can help break down the term. Dubbed the three Vs
(volume, velocity, and variety), these are key to understanding how we can measure big data and just
how different ‘big data’ is from old-fashioned data.
Volume
The most obvious one is where we’ll start. Big data is about volume: volumes of data that can reach
unprecedented heights. It’s estimated that 2.5 quintillion bytes of data is created each day,
and as a result, there will be 40 zettabytes of data created by 2020, which highlights an increase of
300 times from 2005. It is now not uncommon for large companies to have terabytes, and even
petabytes, of data in storage devices and on servers. This data helps to shape the future
of a company and its actions, all while tracking progress.
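The volume figures above can be sanity-checked with a little arithmetic. The sketch below is only illustrative: it assumes decimal (SI) units, where one quintillion is 10^18 and one zettabyte is 10^21 bytes, and the function names are made up for this example.

```python
# Figures quoted in the text; SI (decimal) units are an assumption of this sketch.
BYTES_PER_DAY = 25 * 10**17   # 2.5 quintillion bytes per day
ZETTABYTE = 10**21            # bytes in one zettabyte
TOTAL_2020_ZB = 40            # zettabytes expected by 2020
GROWTH_FACTOR = 300           # "an increase of 300 times from 2005"


def zettabytes_per_year(bytes_per_day: int, days: int = 365) -> float:
    """Convert a daily creation rate in bytes into zettabytes per year."""
    return bytes_per_day * days / ZETTABYTE


def implied_2005_zb(total_2020_zb: float, growth_factor: float) -> float:
    """Work backwards from the 2020 total to the 2005 baseline it implies."""
    return total_2020_zb / growth_factor
```

Under these assumptions, `zettabytes_per_year(BYTES_PER_DAY)` comes to a little under one zettabyte per year, and `implied_2005_zb(TOTAL_2020_ZB, GROWTH_FACTOR)` gives a 2005 baseline of roughly 0.13 zettabytes.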
Velocity
The growth of data, and its resulting importance, has changed the way we see data. There once
was a time when we didn’t see the importance of data in the corporate world, but with the change
in how we gather it, we’ve come to rely on it day to day. Velocity essentially measures how fast the
data is coming in. Some data will come in in real time, whereas other data will come in fits and starts,
sent to us in batches. And as not all platforms will experience the incoming data at the same pace,
it’s important not to generalise, discount, or jump to conclusions without having all the facts and
figures.
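One way to make velocity concrete is to measure it from event timestamps. The following is a minimal sketch, not a standard metric: `arrival_profile` is a hypothetical helper that reports the average arrival rate together with the largest gap between consecutive events, so a steady real-time feed and a batched feed can be told apart even when their average rates match.

```python
from datetime import datetime, timedelta


def arrival_profile(timestamps):
    """Return (events per second, largest gap in seconds) for a list of datetimes."""
    if len(timestamps) < 2:
        raise ValueError("need at least two events")
    ordered = sorted(timestamps)
    span = (ordered[-1] - ordered[0]).total_seconds()
    if span <= 0:
        raise ValueError("events must span a non-zero interval")
    # Gaps between consecutive events show how bursty the arrivals are.
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return (len(ordered) - 1) / span, max(gaps)


start = datetime(2020, 1, 1)
# Real-time style feed: one event every second.
steady = [start + timedelta(seconds=i) for i in range(10)]
# Batch style feed: five events at once, then five more nine seconds later.
batched = [start] * 5 + [start + timedelta(seconds=9)] * 5
```

Both sample feeds average one event per second, but the batched feed has a nine-second gap, which is the "fits and starts" pattern described above.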
Variety
Data was once collected from one place and delivered in one format. Once taking the shape of
database files (such as Excel, CSV and Access), it is now being presented in non-traditional forms, like
video, text, PDF, and graphics on social media, as well as via tech such as wearable devices. Although
this data is extremely useful to us, it does create more work and require more analytical skills to
decipher this incoming data, make it manageable and allow it to work.
1) Set a big data strategy
At a high level, a big data strategy is a plan designed to help you oversee and improve the way you
acquire, store, manage, share and use data within and outside of your organization. A big data
strategy sets the stage for business success amid an abundance of data. When developing a strategy,
it’s important to consider existing (and future) business and technology goals and initiatives. This
calls for treating big data like any other valuable business asset rather than just a byproduct of
applications.
2) Know the sources of big data
Streaming data comes from the Internet of Things (IoT) and other connected devices that flow into
IT systems from wearables, smart cars, medical devices, industrial equipment and more. You can
analyze this big data as it arrives, deciding which data to keep or not keep, and which needs further
analysis.
Social media data stems from interactions on Facebook, YouTube, Instagram, etc. This includes vast
amounts of big data in the form of images, videos, voice, text and sound, useful for marketing, sales
and support functions. This data is often in unstructured or semistructured forms, so it poses a
unique challenge for consumption and analysis.
Publicly available data comes from massive amounts of open data sources like the US government’s
data.gov, the CIA World Factbook or the European Union Open Data Portal.
Other big data may come from data lakes, cloud data sources, suppliers and customers.
3) Access, manage and store big data
Modern computing systems provide the speed, power and flexibility needed to quickly access
massive amounts and types of big data. Along with reliable access, companies also need methods for
integrating the data, ensuring data quality, providing data governance and storage, and preparing
the data for analytics. Some data may be stored on-premises in a traditional data warehouse, but
there are also flexible, low-cost options for storing and handling big data via cloud solutions, data
lakes and Hadoop.
4) Analyze big data
With high-performance technologies like grid computing or in-memory analytics, organizations can
choose to use all their big data for analyses. Another approach is to determine upfront which data is
relevant before analyzing it. Either way, big data analytics is how companies gain value and insights
from data. Increasingly, big data feeds today’s advanced analytics endeavors such as artificial
intelligence.
5) Make intelligent, data-driven decisions
Well-managed, trusted data leads to trusted analytics and trusted decisions. To stay competitive,
businesses need to seize the full value of big data and operate in a data-driven way, making
decisions based on the evidence presented by big data rather than gut instinct. The benefits of being
data-driven are clear. Data-driven organizations perform better, are operationally more predictable
and are more profitable.
c)
HDFS Assumptions and Goals
I. Hardware failure
Hardware failure is no longer the exception; it has become the norm. An HDFS instance consists of
hundreds or thousands of server machines, each of which stores part of the file system’s data.
There is a huge number of components, each susceptible to hardware failure. This means
that some components are always non-functional. So a core architectural goal of
HDFS is quick and automatic fault detection and recovery.
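As a rough illustration of automatic fault detection, the sketch below tracks periodic heartbeats and flags any node that has been silent for too long. It is a toy model under stated assumptions, not HDFS code: the class name, node identifiers and the timeout value are all hypothetical, and time is passed in as plain numbers to keep the example deterministic.

```python
class HeartbeatMonitor:
    """Toy failure detector: a node is presumed failed once it stays silent too long."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node id -> time of the most recent heartbeat

    def record_heartbeat(self, node_id: str, now: float) -> None:
        """Remember when this node last reported in."""
        self.last_seen[node_id] = now

    def failed_nodes(self, now: float) -> list:
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

For example, with a 30-second timeout, a node that last reported at time 0 is listed as failed when checked at time 50, while a node that reported at time 40 is not. Recovery (such as copying the lost data elsewhere) would be a separate step and is left out of this sketch.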
II. Streaming data access
HDFS applications need streaming access to their data sets. Hadoop HDFS is mainly designed for
batch processing rather than interactive use by users. The emphasis is on high throughput of data access
rather than low latency of data access. It focuses on how to retrieve data at the fastest possible
speed while analyzing logs.
III. Large datasets
HDFS works with large data sets. In standard practice, a file in HDFS ranges in size from gigabytes
to petabytes. The architecture of HDFS should be designed so that it is well suited to
storing and retrieving huge amounts of data. HDFS should provide high aggregate data bandwidth
and should be able to scale up to hundreds of nodes in a single cluster. It should also be able
to handle tens of millions of files in a single instance.
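To get a feel for what large files mean for storage, the sketch below works out how many blocks a file occupies and how much raw disk space its replicas use. The 128 MiB block size and replication factor of 3 are not taken from the text above; they are commonly cited defaults in recent Hadoop releases and are treated here purely as assumptions, and the function names are illustrative.

```python
MIB = 1024**2
TIB = 1024**4

ASSUMED_BLOCK_BYTES = 128 * MIB  # assumption: a typical configured block size
ASSUMED_REPLICATION = 3          # assumption: a typical replication factor


def block_count(file_bytes: int, block_bytes: int = ASSUMED_BLOCK_BYTES) -> int:
    """Number of blocks needed to hold the file (the last block may be partial)."""
    return -(-file_bytes // block_bytes)  # integer ceiling division


def raw_storage_bytes(file_bytes: int, replication: int = ASSUMED_REPLICATION) -> int:
    """Total bytes on disk across the cluster once every block is replicated."""
    return file_bytes * replication
```

Under these assumptions, a 1 TiB file is split into 8192 blocks and takes 3 TiB of raw storage, which is why aggregate bandwidth and cluster size matter more than the capacity of any single machine.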
IV. Simple coherency model
HDFS follows a write-once-read-many access model for files. Once a file is created, written,
and closed, it should not be changed. This resolves data coherency issues and enables high
throughput of data access. A MapReduce-based application or a web crawler application fits
this model perfectly. As per the Apache notes, there is a plan to support appending writes to files in the future.
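The write-once-read-many idea can be sketched as a small in-memory class. This is only a teaching model with a hypothetical name, and it simplifies matters by allowing reads only after the file has been closed; it does not reflect how HDFS is implemented.

```python
class WriteOnceFile:
    """Toy write-once-read-many file: writable until closed, read-only afterwards."""

    def __init__(self):
        self._chunks = []
        self._closed = False

    def write(self, data: bytes) -> None:
        """Append data while the file is still open for writing."""
        if self._closed:
            raise PermissionError("file is closed; its contents can no longer change")
        self._chunks.append(data)

    def close(self) -> None:
        """Seal the file. From here on, every reader sees the same bytes."""
        self._closed = True

    def read(self) -> bytes:
        """Return the full contents (simplification: only once the file is sealed)."""
        if not self._closed:
            raise RuntimeError("file is still being written")
        return b"".join(self._chunks)
```

Because the contents never change after `close()`, any number of readers can fetch the data in parallel without locks or cache invalidation, which is where the throughput benefit comes from.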
V. Moving computation is cheaper than moving data
If an application does its computation near the data it operates on, it is much more efficient than
doing it far away. This effect becomes stronger when dealing with large data sets. The main advantage
is that it increases the overall throughput of the system. It also minimizes network congestion. The
assumption is that it is better to move computation closer to the data instead of moving the data to
the computation.
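The locality argument can be illustrated with a small placement sketch. Everything here is hypothetical (the node names, the helper functions and the simple tie-breaking rule); it only shows that running a task on a node that already holds the data moves no bytes over the network, while running it elsewhere moves the whole block.

```python
MIB = 1024**2


def bytes_moved(block_bytes: int, block_holders: set, compute_node: str) -> int:
    """Network bytes needed to bring the block to the node that will process it."""
    return 0 if compute_node in block_holders else block_bytes


def pick_node(block_holders: set, free_nodes: list) -> str:
    """Prefer a free node that already stores the block; otherwise use any free node."""
    if not free_nodes:
        raise ValueError("no free nodes available")
    local = sorted(set(block_holders) & set(free_nodes))
    return local[0] if local else sorted(free_nodes)[0]


holders = {"dn1", "dn3"}      # nodes assumed to store a copy of the block
free = ["dn2", "dn3"]         # nodes assumed to have spare compute capacity
chosen = pick_node(holders, free)
```

Here `chosen` is `"dn3"`, so `bytes_moved(128 * MIB, holders, chosen)` is zero, whereas scheduling the same task on `"dn2"` would transfer the full 128 MiB block.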
VI. Portability across heterogeneous hardware and software platforms
HDFS is designed to be easily portable from one platform to
another. This enables the widespread adoption of HDFS as a platform for
large data sets.
