The document discusses big data processing systems. It begins with an overview of big data and its evolution due to technologies like IoT, social media, and smart cars. This has led to an exponential increase in data volume and variety, including structured, semi-structured and unstructured data. Traditional databases cannot handle this type and size of data. The document then introduces Hadoop as an open source framework to process large, diverse datasets across clusters. It uses HDFS for storage and MapReduce for parallel processing of data stored in HDFS. Hadoop provides scalable solutions to the problems of storing huge, growing datasets and processing complex, diverse data faster.
2. Overview
● Evolution of data
● What is big data?
● Problems with Big Data
● Hadoop Distributed File System(HDFS)
● Solutions of Big Data problems
● MapReduce Framework
3. Evolution of Data
● Evolution of Technology
○ Internet of Things
○ Social media
○ Smart cars
7. Notice
● The volume of data has increased exponentially
● Relational databases can't handle this variety of data formats
11. What is Big Data
"Big data" is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.
12. What is Big Data
The 5 V's of big data:
Volume, Variety, Velocity, Value, Veracity
20. Problems with Big Data
● Storing exponentially growing huge datasets
21. Problems with Big Data
● Processing data having complex structures
○ Structured data: organized format with a fixed schema. Ex: RDBMS tables, etc.
○ Semi-structured data: partially organized; lacks the formal structure of a data model. Ex: XML & JSON files, etc.
○ Unstructured data: un-organized, with an unknown schema. Ex: multimedia files, etc.
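The three data classes above can be made concrete with a small sketch. The records here are hypothetical examples, not data from the deck: a dict stands in for a fixed-schema RDBMS row, JSON/XML for semi-structured data, and raw bytes for an unstructured blob.

```python
import json
import xml.etree.ElementTree as ET

# Structured: fixed schema, every record has the same columns (like an RDBMS row).
row = {"id": 1, "name": "Alice", "age": 30}

# Semi-structured: self-describing, but fields can vary record to record.
record = json.loads('{"id": 2, "tags": ["iot", "sensor"], "extra": {"unit": "C"}}')
doc = ET.fromstring("<reading sensor='t1'><value>21.5</value></reading>")

# Unstructured: raw bytes with no known schema (e.g. an image or audio clip);
# interpretation is left entirely to the application.
blob = b"\x89PNG\r\n..."

# Semi-structured fields are navigated, not queried by column:
print(record["tags"])
print(doc.find("value").text)
```

The point of the distinction: a relational database can validate and query the first form, but the other two need storage that accepts data without a schema up front.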
22. Problems with Big Data
● Processing data faster
○ Data is growing at a much faster rate than disk read/write speeds
○ Moving huge amounts of data to the computation unit becomes a bottleneck
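The bottleneck above can be made concrete with back-of-the-envelope arithmetic. The figures are assumptions chosen for illustration (a 1 TB dataset and a ~100 MB/s disk), not numbers from the deck:

```python
# Assumed numbers: a 1 TB dataset and a single disk reading ~100 MB/s.
dataset_mb = 1_000_000          # 1 TB expressed in MB (decimal)
disk_mb_per_s = 100

single_disk_seconds = dataset_mb / disk_mb_per_s
print(f"One disk:  {single_disk_seconds / 3600:.1f} hours")    # ~2.8 hours

# Spread the same data over 100 disks read in parallel (the HDFS idea):
parallel_seconds = single_disk_seconds / 100
print(f"100 disks: {parallel_seconds / 60:.1f} minutes")       # ~1.7 minutes
```

Reading the whole dataset from one disk takes hours; reading 100 partitions in parallel takes minutes. This is the motivation for distributing both storage and computation.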
24. Apache Hadoop is an open-source framework for distributed computing that processes large sets of a wide variety of data.
26. ● HDFS
○ Storage
■ Allows any kind of data to be dumped across the cluster
● MapReduce
○ Processing
■ Allows parallel processing of the data stored in HDFS
● YARN
○ Scheduling and resource allocation for the Hadoop system
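The map/shuffle/reduce phases that MapReduce runs across a cluster can be sketched in-process. This is a minimal single-machine illustration of the programming model (word count), not the Hadoop Java API; function names are my own:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs big storage", "big data needs fast processing"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])   # 3
```

In real Hadoop, many mappers and reducers run this pattern in parallel on different nodes, and the shuffle moves data between them over the network.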
30. Solutions of Big Data Problems
● Problems of Big Data
○ Storing exponentially growing huge datasets
○ Processing data having complex structures
○ Processing data faster
31. Solutions of Big Data Problems
Problem 1: storing exponentially growing huge datasets
Solution: HDFS
● The storage unit of Hadoop
● A distributed file system
● Divides files (input data) into smaller chunks and stores them across the cluster
● Scalable as per requirement
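The chunk-and-distribute idea above can be sketched as follows. HDFS's 128 MB default block size and 3x replication are real defaults, but the tiny block size, node names, and round-robin placement here are simplifying assumptions (real HDFS placement is rack-aware):

```python
# Tiny block size so the example runs instantly; HDFS defaults to 128 MB.
BLOCK_SIZE = 16  # bytes

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    # Split a file's bytes into fixed-size chunks, like HDFS blocks.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def assign_to_nodes(blocks, nodes, replication=3):
    # Round-robin placement with replication (a simplification of HDFS).
    placement = {}
    for i, _block in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

data = b"exponentially growing dataset spread across the cluster"
blocks = split_into_blocks(data)
placement = assign_to_nodes(blocks, ["node1", "node2", "node3", "node4"])
print(len(blocks), placement[0])
```

Scaling then means adding nodes: new blocks simply land on the new machines, which is why HDFS handles exponentially growing datasets.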
33. Solutions of Big Data Problems
Problem 2: storing unstructured data
Solution: HDFS
● Allows storing any kind of data, be it structured, semi-structured, or unstructured
● Follows WORM (Write Once, Read Many)
● No schema validation is done while dumping data
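This "no schema validation at write time" behavior is often called schema-on-read, and a small sketch makes it concrete. The `store` list and helper functions are stand-ins of my own, not an HDFS API:

```python
import json

# Schema-on-read sketch: the store accepts raw bytes unchecked (like an
# HDFS write); a schema is applied only when the data is read back.
store = []  # stand-in for files in HDFS

def write(raw: bytes):
    store.append(raw)          # no validation at write time (WORM-style append)

def read_as_json():
    records = []
    for raw in store:
        try:
            records.append(json.loads(raw))   # schema interpreted at read time
        except ValueError:
            pass                              # non-JSON blobs are simply skipped
    return records

write(b'{"sensor": "t1", "value": 21.5}')     # semi-structured record
write(b"\x00\x01 raw binary frame")           # unstructured blob, stored as-is
print(read_as_json())
```

A relational database would reject the binary blob at insert time; HDFS accepts both writes and leaves interpretation to whichever job reads the data later.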
35. Solutions of Big Data Problems
Problem 3: processing data faster
Solution: MapReduce
● Provides parallel processing of data present in HDFS
● Allows data to be processed locally, i.e. each node works with the part of the data that is stored on it
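The data-locality principle above, "move the computation to the data," can be sketched as follows. The node names and partitions are hypothetical; the point is that only small per-node results travel, never the raw data:

```python
# Data-locality sketch: each "node" already holds a partition and computes
# on it locally; only one small result per node reaches the coordinator.
partitions = {
    "node1": [4, 8, 15],
    "node2": [16, 23],
    "node3": [42],
}

def local_compute(values):
    # Runs where the data lives, so no bulk data crosses the network.
    return sum(values)

# Ship the computation, not the data: gather one number per node.
partials = {node: local_compute(vals) for node, vals in partitions.items()}
total = sum(partials.values())
print(partials, total)
```

This is what removes the bottleneck from the earlier slide: instead of dragging terabytes to a central computation unit, each node processes its local blocks and sends back only a summary.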
40. Problems of MapReduce
● It is a two-step process
● Once data has been processed through map and reduce, the results have to be written back to storage before the next job can use them
Solution: a distributed in-memory processing system such as Apache Spark
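The contrast between the two models can be sketched without a real cluster. This is an analogy of my own, not actual Spark code: list comprehensions stand in for MapReduce stages that materialize every intermediate result, while lazy generators stand in for Spark's chained in-memory transformations that compute nothing until a final action:

```python
data = range(1, 6)

# MapReduce-style: each stage fully materializes its output ("written to
# HDFS") before the next stage starts.
stage1 = [x * x for x in data]              # job 1 output, stored in full
stage2 = [x for x in stage1 if x % 2 == 1]  # job 2 output, stored in full
mr_result = sum(stage2)

# Spark-style: lazy transformations chained in memory; nothing is computed
# until the action (sum) pulls results through the whole pipeline.
pipeline = (x * x for x in data)
pipeline = (x for x in pipeline if x % 2 == 1)
spark_result = sum(pipeline)                # the only point where work happens

print(mr_result, spark_result)              # both 35
```

Both produce the same answer; the difference is that the generator pipeline never stores intermediate collections, which is the essence of why Spark's in-memory execution avoids MapReduce's repeated disk round-trips.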