Big Data Processing System
Shima Jafari
Overview
● Evolution of data
● What is big data?
● Problems with Big Data
● Hadoop Distributed File System (HDFS)
● Solutions of Big Data problems
● MapReduce Framework
Evolution of Data
● Evolution of Technology
○ Internet of Things
○ Social media
○ Smart cars
Notice
● The volume of data has increased exponentially
● Relational databases can’t handle this kind of data
Evolution of Data
● Image
● Video
● Text
● ...
Unstructured
What is Big Data
"Big data" is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.
What is Big Data
● The 5 V’s of big data: Volume, Variety, Velocity, Value, Veracity
What is Big Data
● Volume
● Variety
○ Structured
○ Semi-structured
○ Unstructured
● Velocity
● Value
● Veracity
○ Uncertainty and inconsistencies in data
Problems with Big Data
● Storing exponentially growing huge datasets
Problems with Big Data
● Processing data with complex structures
○ Structured: organized data format with a fixed schema. Ex: RDBMS data
○ Semi-structured: partially organized data that lacks the formal structure of a data model. Ex: XML and JSON files
○ Unstructured: unorganized data with an unknown schema. Ex: multimedia files
Problems with Big Data
● Processing data faster
○ Data is growing at a much faster rate than disk read/write speeds
○ Moving huge amounts of data to the computation unit becomes a bottleneck
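To make the bottleneck concrete, here is a back-of-the-envelope sketch. The dataset size (1 TB) and the per-disk read speed (100 MB/s) are illustrative assumptions, not figures from the slides.

```python
# Illustrative numbers only: both constants below are assumptions.
DATASET_BYTES = 10**12             # a 1 TB dataset
DISK_BYTES_PER_SEC = 100 * 10**6   # 100 MB/s sequential read per disk


def scan_seconds(num_disks: int) -> float:
    """Time to read the whole dataset when it is spread evenly over
    num_disks disks that are all read in parallel."""
    return DATASET_BYTES / (DISK_BYTES_PER_SEC * num_disks)


if __name__ == "__main__":
    for disks in (1, 10, 100):
        print(f"{disks:>3} disk(s): {scan_seconds(disks) / 60:.1f} minutes")
```

Under these assumptions a single disk needs roughly 167 minutes for one full scan, while 100 disks read in parallel need under 2 minutes, which is the motivation for spreading data across a cluster.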
Solution
Apache Hadoop is an open-source framework for distributed computing that processes large datasets of widely varying kinds.
● HDFS
○ Storage
■ Lets you store any kind of data across the cluster
● MapReduce
○ Processing
■ Allows parallel processing of the data stored in HDFS
● YARN
○ Scheduling and resource allocation for the Hadoop system
Hadoop Distributed File System
● NameNode
○ Stores the file system metadata
● DataNode
○ Stores the actual data blocks
YARN
● Resource Manager
○ Scheduler
○ Applications Manager
● Node Manager
○ Monitors resource usage on its node
○ Reports to the Resource Manager/Scheduler
Solutions of Big Data Problems
● Problems of Big Data
○ Storing exponentially growing huge datasets
○ Processing data with complex structures
○ Processing data faster
Solutions of Big Data Problems
Problem 1: storing exponentially growing huge datasets
Solution: HDFS
● Storage unit of Hadoop
● It is a distributed file system
● Divides files (input data) into smaller chunks and stores them across the cluster
● Scales as required
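The chunking idea can be sketched in a few lines. This is a simplified illustration, not HDFS code: the function names, node names, and round-robin placement are assumptions made for the example. Real HDFS uses much larger blocks (128 MB by default in recent versions) and rack-aware replica placement.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list, replication: int) -> dict:
    """Assign every block to `replication` distinct nodes, round-robin.

    Hypothetical placement policy for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    print(blocks)
    print(place_blocks(len(blocks), ["node1", "node2", "node3"], replication=2))
```

Adding a node to the list gives the placement more targets, which is the sense in which this kind of storage scales out.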
Solutions of Big Data Problems
Problem 2: storing unstructured data
Solution: HDFS
● Allows storing any kind of data, whether structured, semi-structured, or unstructured
● Follows the WORM (Write Once, Read Many) model
● No schema validation is done when data is written
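A minimal sketch of what "no schema validation on write" means in practice. The in-memory `store` dictionary and the `put` and `read_json_lines` helpers are hypothetical stand-ins for a file system, not an HDFS API.

```python
import json

# Hypothetical stand-in for a file system: path -> raw bytes.
store = {}


def put(path: str, payload: bytes) -> None:
    """Write once: keep the bytes exactly as given, with no schema check."""
    if path in store:
        raise FileExistsError(f"{path} already exists (write once, read many)")
    store[path] = payload


def read_json_lines(path: str) -> list:
    """Interpret the bytes only at read time (schema on read)."""
    text = store[path].decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    put("/logs/events.json", b'{"id": 1}\n{"id": 2, "tag": "x"}\n')
    put("/images/photo.bin", bytes([0xFF, 0xD8]))  # arbitrary binary is accepted too
    print(read_json_lines("/logs/events.json"))
```

Both writes succeed regardless of content. Any structure is imposed by the reader, so the two JSON records are free to have different fields.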
Solutions of Big Data Problems
Problem 3: processing data faster
Solution: MapReduce
● Provides parallel processing of the data stored in HDFS
● Processes data locally, i.e. each node works on the part of the data that is stored on it
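Local processing can be illustrated with a small simulation. The node names and sensor values are made up, and the nodes are simulated sequentially in one process. The point is only that each chunk is summarized where it lives and just the small summaries are combined.

```python
# Hypothetical cluster: each node holds one chunk of readings locally.
chunks_by_node = {
    "node1": [3, 5, 8],
    "node2": [1, 9],
    "node3": [4, 4, 6, 2],
}


def local_summary(values: list) -> tuple:
    """Runs where the chunk is stored; returns (sum, count)."""
    return sum(values), len(values)


def global_mean(summaries: list) -> float:
    """Combines the per-node summaries; only these small tuples would
    need to travel over the network."""
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count


summaries = [local_summary(chunk) for chunk in chunks_by_node.values()]

if __name__ == "__main__":
    print(summaries)
    print(global_mean(summaries))
```

Each node ships two numbers instead of its whole chunk, so the data movement that causes the bottleneck is avoided.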
Distributed Processing
● Map
○ Reads input data from HDFS and emits intermediate key/value pairs
● Reduce
○ Combines the intermediate values for each key and stores the result
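The two phases can be sketched as a single-process word count. This is a teaching sketch, not the Hadoop API: the function names are assumptions, and the `shuffle` step, which the slide does not show, stands in for the grouping by key that the framework performs between map and reduce.

```python
from collections import defaultdict


def map_phase(line: str) -> list:
    """Map: turn one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list) -> dict:
    """Group all values by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: combine all values that share a key."""
    return key, sum(values)


def word_count(lines: list) -> dict:
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters process data"]))
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the reduce calls run in parallel per key group.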
Map/Reduce
How Map/Reduce is used in IoT
Problems of Map/Reduce
● It is a two-step process
● After data passes through map and reduce, the result has to be written back to storage before the next job can use it
Solution: a distributed in-memory processing system, Apache Spark
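The difference can be sketched with two chained steps in plain Python. This is not Spark or Hadoop code. The step functions and the JSON hand-off file are assumptions chosen to contrast a storage round trip between jobs with keeping the intermediate result in memory.

```python
import json
import os
import tempfile


def square_all(values: list) -> list:
    return [v * v for v in values]


def keep_even(values: list) -> list:
    return [v for v in values if v % 2 == 0]


def run_with_disk_handoff(values: list, workdir: str) -> list:
    """MapReduce style: job 1 writes its output to storage and job 2
    reads it back before it can start."""
    path = os.path.join(workdir, "job1_output.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(square_all(values), f)
    with open(path, encoding="utf-8") as f:
        intermediate = json.load(f)
    return keep_even(intermediate)


def run_in_memory(values: list) -> list:
    """In-memory style: the intermediate result never leaves memory."""
    return keep_even(square_all(values))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(run_with_disk_handoff([1, 2, 3, 4], workdir))
    print(run_in_memory([1, 2, 3, 4]))
```

Both paths produce the same answer. The first pays for a write and a read between every pair of steps, which adds up in multi-stage or iterative workloads.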
To Be Continued
