SlideShare a Scribd company logo
1 of 27
Download to read offline
BIG DATA ANALYTICS
K.Kiruthika
II M.Sc(CS)
Map Reduce application
Introduction :
Map Reduce is a programming modal as an
associated implementation for processing .
generating big data sets with
a parallel, distributed algorithm on a cluster.
 Map: The map function to the local data, and writes the
output to a temporary storage.
 Shuffle: worker nodes redistribute data based on the output
keys such that all data belong to one key is located on worker
node.
 Reduce: worker nodes process each group of output data, per
key, in parallel.
 MapReduce is as a 5-step parallel
and distributed computation:
Prepare the Map() input : The
"MapReduce system" designates Map
processors the input key value K1.
Run the user-provided Map()
code : Map() is run exactly once for
each K1 key value, generating output organized
by key values K2.
"Shuffle" the Map output to the
Reduce processors :The MapReduce
system designates assigns the K2 key value each
processer.
Run the user-provided Reduce()
code : Reduce() is run the program exactly once
for each K2 key value produced by the Map step.
Produce the final output : The
MapReduce system collects all the Reduce output,
and sorts it final outcome.
 DATA SERIALIZATION AND WORKING COMMON
SERIALIZATION FORMAT :
 A container is nothing but a data structure, or an object to
store data in.
 When we transfer data over a network the container is
converted into a byte stream.
 This refer to process.
Uses :
 A method of transferring data through the wires
(messaging).
 A method of storing data (in databases, on hard disk drives).
 A method of remote procedure calls, e.g., as in SOAP.
 A method for distributing objects, especially in component-
based software engineering such as COM, CORBA, etc.
 A method for detecting changes in time-varying data.
Drawback :
 Serialization breaks the opacity of an abstract data type.
 private implementation details.
 Serialize all data Private members may violate encapsulation.
 products, publishers of proprietary software their programs.
 serialization formats a trade secret.
 obfuscate or even encrypt the serialization of data.
 Therefore remote method call architectures such
as CORBA define their serialization formats in detail.
 Serialization formats :
 JSON is a lighter plain-text alternative .
 XMLcommonly used for client-server communication in
web applications.
 JSON is based on JavaScript syntax, but is supported in
other programming languages as well.
 YAML :
 powerful for serialization.
 "human friendly," and potentially more compact.
 These features include a notion of tagging data types.
 support for non-hierarchical data structures.
 Another human-readable serialization.
 Eg: format is property list format used
in NeXTSTEP, GNUstep, and macOS Cocoa.
 For large volume scientific datasets,
 such as satellite data and output
 numerical climate, weather, or ocean models, specific
binary serialization standards have been developed.
 e.g. HDF, netCDF and the older GRIB.
 BIG DATA SERIALIZATION FORMATS:
 Big Data serialization also refers to converting byte streams.
 But it has another goal which is schema control.
 data structure data can be validated on writing phase.
 It avoids to have some surprises data is read.
 Example: a field is missing or has bad type eg: (instead of
array).
 To resume, we can list following points to characterize
serialization in Big Data systems:
 splittability - easier to achieve splits on byte streams rather
than JSON or XML files.
 portability - schema can be consumed by different languages.
 versioning - flexibity to define fields with default values or
continue to use old schema version.
 data integrity - serialization schemas data corecteness. errors
can be detected earlier, when data is written.
 NoSQL solutions are very often related to the word
schemaless.
 Sometimes to maintenance or backward compatibility
problems.
 One of solutions to these issues in Big Data systems are
serialization frameworks.
 This purely theoretical, article to explain the role and the
importance of serialization in Big Data systems.
 Conclusion:
 Converting for bit stream.
 Json or xml formate in serialization data is splittable easier.
THANK YOU

More Related Content

What's hot

Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management SystemHardik Patil
 
MapReduce and parallel DBMSs: friends or foes?
MapReduce and parallel DBMSs: friends or foes?MapReduce and parallel DBMSs: friends or foes?
MapReduce and parallel DBMSs: friends or foes?Spyros Eleftheriadis
 
An Approach for the Incremental Export of Relational Databases into RDF Graphs
An Approach for the Incremental Export of Relational Databases into RDF GraphsAn Approach for the Incremental Export of Relational Databases into RDF Graphs
An Approach for the Incremental Export of Relational Databases into RDF GraphsNikolaos Konstantinou
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed DatasetsAlessandro Menabò
 
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)Beat Signer
 
Lec 8 (distributed database)
Lec 8 (distributed database)Lec 8 (distributed database)
Lec 8 (distributed database)Sudarshan Mondal
 
Cassandra advanced part-ll
Cassandra advanced part-llCassandra advanced part-ll
Cassandra advanced part-llachudhivi
 
Distributed Database Management Systems (Distributed DBMS)
Distributed Database Management Systems (Distributed DBMS)Distributed Database Management Systems (Distributed DBMS)
Distributed Database Management Systems (Distributed DBMS)Rushdi Shams
 
ETL and pivoting in spark
ETL and pivoting in sparkETL and pivoting in spark
ETL and pivoting in sparkSubhasish Guha
 
Xs path navigation on xml schemas made easy
Xs path navigation on xml schemas made easyXs path navigation on xml schemas made easy
Xs path navigation on xml schemas made easyShakas Technologies
 
Dynamic Width File in Spark_2016
Dynamic Width File in Spark_2016Dynamic Width File in Spark_2016
Dynamic Width File in Spark_2016Subhasish Guha
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesappaji intelhunt
 
Storage Management - Lecture 8 - Introduction to Databases (1007156ANR)
Storage Management - Lecture 8 - Introduction to Databases (1007156ANR)Storage Management - Lecture 8 - Introduction to Databases (1007156ANR)
Storage Management - Lecture 8 - Introduction to Databases (1007156ANR)Beat Signer
 

What's hot (20)

Discover Database
Discover DatabaseDiscover Database
Discover Database
 
Bt export
Bt exportBt export
Bt export
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management System
 
Coma
ComaComa
Coma
 
MapReduce and parallel DBMSs: friends or foes?
MapReduce and parallel DBMSs: friends or foes?MapReduce and parallel DBMSs: friends or foes?
MapReduce and parallel DBMSs: friends or foes?
 
Spark
SparkSpark
Spark
 
An Approach for the Incremental Export of Relational Databases into RDF Graphs
An Approach for the Incremental Export of Relational Databases into RDF GraphsAn Approach for the Incremental Export of Relational Databases into RDF Graphs
An Approach for the Incremental Export of Relational Databases into RDF Graphs
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed Datasets
 
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
 
Lec 8 (distributed database)
Lec 8 (distributed database)Lec 8 (distributed database)
Lec 8 (distributed database)
 
Cassandra advanced part-ll
Cassandra advanced part-llCassandra advanced part-ll
Cassandra advanced part-ll
 
Distributed Database Management Systems (Distributed DBMS)
Distributed Database Management Systems (Distributed DBMS)Distributed Database Management Systems (Distributed DBMS)
Distributed Database Management Systems (Distributed DBMS)
 
Bhagaban Mallik
Bhagaban MallikBhagaban Mallik
Bhagaban Mallik
 
ETL and pivoting in spark
ETL and pivoting in sparkETL and pivoting in spark
ETL and pivoting in spark
 
Xs path navigation on xml schemas made easy
Xs path navigation on xml schemas made easyXs path navigation on xml schemas made easy
Xs path navigation on xml schemas made easy
 
Dynamic Width File in Spark_2016
Dynamic Width File in Spark_2016Dynamic Width File in Spark_2016
Dynamic Width File in Spark_2016
 
Dbms quiz
Dbms quiz Dbms quiz
Dbms quiz
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
 
Storage Management - Lecture 8 - Introduction to Databases (1007156ANR)
Storage Management - Lecture 8 - Introduction to Databases (1007156ANR)Storage Management - Lecture 8 - Introduction to Databases (1007156ANR)
Storage Management - Lecture 8 - Introduction to Databases (1007156ANR)
 
MapReduce in Cloud Computing
MapReduce in Cloud ComputingMapReduce in Cloud Computing
MapReduce in Cloud Computing
 

Similar to Big Data MapReduce and Serialization Formats

Applying stratosphere for big data analytics
Applying stratosphere for big data analyticsApplying stratosphere for big data analytics
Applying stratosphere for big data analyticsAvinash Pandu
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom IndustryCloudera, Inc.
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic WebStefan Ceriu
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebStefan Prutianu
 
Big data & Hadoop
Big data & HadoopBig data & Hadoop
Big data & HadoopAhmed Gamil
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Derryck Lamptey, MPhil, CISSP
 
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Sameer Farooqui
 
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim Remani
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim RemaniFinding your Way in the Midst of the NoSQL Haze - Abdelmonaim Remani
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim RemaniJAXLondon2014
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introductionHektor Jacynycz García
 
Big data technology unit 3
Big data technology unit 3Big data technology unit 3
Big data technology unit 3RojaT4
 
5. 19. Database Migration between various Applications Over Network (JAVA)
5. 19. Database Migration between various Applications Over Network (JAVA)5. 19. Database Migration between various Applications Over Network (JAVA)
5. 19. Database Migration between various Applications Over Network (JAVA)Ghazala Syed
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsRavindra kumar
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poliivascucristian
 

Similar to Big Data MapReduce and Serialization Formats (20)

Applying stratosphere for big data analytics
Applying stratosphere for big data analyticsApplying stratosphere for big data analytics
Applying stratosphere for big data analytics
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic Web
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic Web
 
Cassandra
CassandraCassandra
Cassandra
 
Big data & Hadoop
Big data & HadoopBig data & Hadoop
Big data & Hadoop
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
 
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
 
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim Remani
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim RemaniFinding your Way in the Midst of the NoSQL Haze - Abdelmonaim Remani
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim Remani
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introduction
 
Big data technology unit 3
Big data technology unit 3Big data technology unit 3
Big data technology unit 3
 
5. 19. Database Migration between various Applications Over Network (JAVA)
5. 19. Database Migration between various Applications Over Network (JAVA)5. 19. Database Migration between various Applications Over Network (JAVA)
5. 19. Database Migration between various Applications Over Network (JAVA)
 
CLOUD BIOINFORMATICS Part1
 CLOUD BIOINFORMATICS Part1 CLOUD BIOINFORMATICS Part1
CLOUD BIOINFORMATICS Part1
 
Eg4301808811
Eg4301808811Eg4301808811
Eg4301808811
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skills
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poli
 
2 mapreduce-model-principles
2 mapreduce-model-principles2 mapreduce-model-principles
2 mapreduce-model-principles
 

Recently uploaded

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 

Recently uploaded (20)

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 

Big Data MapReduce and Serialization Formats

  • 2. Map Reduce application Introduction : Map Reduce is a programming modal as an associated implementation for processing . generating big data sets with a parallel, distributed algorithm on a cluster.
  • 3.
  • 4.  Map: The map function to the local data, and writes the output to a temporary storage.  Shuffle: worker nodes redistribute data based on the output keys such that all data belong to one key is located on worker node.  Reduce: worker nodes process each group of output data, per key, in parallel.
  • 5.
  • 6.  MapReduce is as a 5-step parallel and distributed computation: Prepare the Map() input : The "MapReduce system" designates Map processors the input key value K1. Run the user-provided Map() code : Map() is run exactly once for each K1 key value, generating output organized by key values K2.
  • 7. "Shuffle" the Map output to the Reduce processors :The MapReduce system designates assigns the K2 key value each processer. Run the user-provided Reduce() code : Reduce() is run the program exactly once for each K2 key value produced by the Map step. Produce the final output : The MapReduce system collects all the Reduce output, and sorts it final outcome.
  • 8.
  • 9.  DATA SERIALIZATION AND WORKING COMMON SERIALIZATION FORMAT :  A container is nothing but a data structure, or an object to store data in.  When we transfer data over a network the container is converted into a byte stream.  This refer to process.
  • 10.
  • 11. Uses :  A method of transferring data through the wires (messaging).  A method of storing data (in databases, on hard disk drives).  A method of remote procedure calls, e.g., as in SOAP.  A method for distributing objects, especially in component- based software engineering such as COM, CORBA, etc.  A method for detecting changes in time-varying data.
  • 12.
  • 13. Drawback :  Serialization breaks the opacity of an abstract data type.  private implementation details.  Serialize all data Private members may violate encapsulation.  products, publishers of proprietary software their programs.  serialization formats a trade secret.  obfuscate or even encrypt the serialization of data.  Therefore remote method call architectures such as CORBA define their serialization formats in detail.
  • 14.
  • 15.  Serialization formats :  JSON is a lighter plain-text alternative .  XMLcommonly used for client-server communication in web applications.  JSON is based on JavaScript syntax, but is supported in other programming languages as well.
  • 16.
  • 17.
  • 18.
  • 19.  YAML :  powerful for serialization.  "human friendly," and potentially more compact.  These features include a notion of tagging data types.  support for non-hierarchical data structures.
  • 20.
  • 21.  Another human-readable serialization.  Eg: format is property list format used in NeXTSTEP, GNUstep, and macOS Cocoa.  For large volume scientific datasets,  such as satellite data and output  numerical climate, weather, or ocean models, specific binary serialization standards have been developed.  e.g. HDF, netCDF and the older GRIB.
  • 22.  BIG DATA SERIALIZATION FORMATS:  Big Data serialization also refers to converting byte streams.  But it has another goal which is schema control.  data structure data can be validated on writing phase.  It avoids to have some surprises data is read.  Example: a field is missing or has bad type eg: (instead of array).
  • 23.
  • 24.  To resume, we can list following points to characterize serialization in Big Data systems:  splittability - easier to achieve splits on byte streams rather than JSON or XML files.  portability - schema can be consumed by different languages.  versioning - flexibity to define fields with default values or continue to use old schema version.  data integrity - serialization schemas data corecteness. errors can be detected earlier, when data is written.
  • 25.  NoSQL solutions are very often related to the word schemaless.  Sometimes to maintenance or backward compatibility problems.  One of solutions to these issues in Big Data systems are serialization frameworks.  This purely theoretical, article to explain the role and the importance of serialization in Big Data systems.
  • 26.  Conclusion:  Converting for bit stream.  Json or xml formate in serialization data is splittable easier.