SlideShare a Scribd company logo
1 of 27
Download to read offline
BIG DATA ANALYTICS
Name : K.Kiruthika
Class : II M.Sc(CS)
Batch : 2017-2019
Incharge staff : M.FLORANCE DAYANA
Map Reduce application
Introduction :
Map Reduce is a programming modal as an
associated implementation for processing .
generating big data sets with
a parallel, distributed algorithm on a cluster.
 Map: The map function to the local data, and writes the
output to a temporary storage.
 Shuffle: worker nodes redistribute data based on the output
keys such that all data belong to one key is located on worker
node.
 Reduce: worker nodes process each group of output data, per
key, in parallel.
 MapReduce is as a 5-step parallel
and distributed computation:
Prepare the Map() input : The
"MapReduce system" designates Map
processors the input key value K1.
Run the user-provided Map()
code : Map() is run exactly once for
each K1 key value, generating output organized
by key values K2.
"Shuffle" the Map output to the
Reduce processors :The MapReduce
system designates assigns the K2 key value each
processer.
Run the user-provided Reduce()
code : Reduce() is run the program exactly once
for each K2 key value produced by the Map step.
Produce the final output : The
MapReduce system collects all the Reduce output,
and sorts it final outcome.
 DATA SERIALIZATION AND WORKING COMMON
SERIALIZATION FORMAT :
 A container is nothing but a data structure, or an object to
store data in.
 When we transfer data over a network the container is
converted into a byte stream.
 This refer to process.
Uses :
 A method of transferring data through the wires
(messaging).
 A method of storing data (in databases, on hard disk drives).
 A method of remote procedure calls, e.g., as in SOAP.
 A method for distributing objects, especially in component-
based software engineering such as COM, CORBA, etc.
 A method for detecting changes in time-varying data.
Drawback :
 Serialization breaks the opacity of an abstract data type.
 private implementation details.
 Serialize all data Private members may violate encapsulation.
 products, publishers of proprietary software their programs.
 serialization formats a trade secret.
 obfuscate or even encrypt the serialization of data.
 Therefore remote method call architectures such
as CORBA define their serialization formats in detail.
 Serialization formats :
 JSON is a lighter plain-text alternative .
 XMLcommonly used for client-server communication in
web applications.
 JSON is based on JavaScript syntax, but is supported in
other programming languages as well.
 YAML :
 powerful for serialization.
 "human friendly," and potentially more compact.
 These features include a notion of tagging data types.
 support for non-hierarchical data structures.
 Another human-readable serialization.
 Eg: format is property list format used
in NeXTSTEP, GNUstep, and macOS Cocoa.
 For large volume scientific datasets,
 such as satellite data and output
 numerical climate, weather, or ocean models, specific
binary serialization standards have been developed.
 e.g. HDF, netCDF and the older GRIB.
 BIG DATA SERIALIZATION FORMATS:
 Big Data serialization also refers to converting byte streams.
 But it has another goal which is schema control.
 data structure data can be validated on writing phase.
 It avoids to have some surprises data is read.
 Example: a field is missing or has bad type eg: (instead of
array).
 To resume, we can list following points to characterize
serialization in Big Data systems:
 splittability - easier to achieve splits on byte streams rather
than JSON or XML files.
 portability - schema can be consumed by different languages.
 versioning - flexibity to define fields with default values or
continue to use old schema version.
 data integrity - serialization schemas data corecteness. errors
can be detected earlier, when data is written.
 NoSQL solutions are very often related to the word
schemaless.
 Sometimes to maintenance or backward compatibility
problems.
 One of solutions to these issues in Big Data systems are
serialization frameworks.
 This purely theoretical, article to explain the role and the
importance of serialization in Big Data systems.
 Conclusion:
 Converting for bit stream.
 Json or xml formate in serialization data is splittable easier.
THANK YOU

More Related Content

What's hot

SPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication MiddlewareSPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication MiddlewareMaria Stylianou
 
Spatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use CasesSpatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use Casesmathieuraj
 
MapReduce and parallel DBMSs: friends or foes?
MapReduce and parallel DBMSs: friends or foes?MapReduce and parallel DBMSs: friends or foes?
MapReduce and parallel DBMSs: friends or foes?Spyros Eleftheriadis
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management SystemHardik Patil
 
Hadoop, mapreduce and yarn networks
Hadoop, mapreduce and yarn networksHadoop, mapreduce and yarn networks
Hadoop, mapreduce and yarn networksHariniA7
 
Querying Mongo Without Programming Using Funql
Querying Mongo Without Programming Using FunqlQuerying Mongo Without Programming Using Funql
Querying Mongo Without Programming Using FunqlMongoDB
 
Lec 8 (distributed database)
Lec 8 (distributed database)Lec 8 (distributed database)
Lec 8 (distributed database)Sudarshan Mondal
 
ETL and pivoting in spark
ETL and pivoting in sparkETL and pivoting in spark
ETL and pivoting in sparkSubhasish Guha
 
Spark cluster computing with working sets
Spark cluster computing with working setsSpark cluster computing with working sets
Spark cluster computing with working setsJinxinTang
 
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)Beat Signer
 
Dynamic Width File in Spark_2016
Dynamic Width File in Spark_2016Dynamic Width File in Spark_2016
Dynamic Width File in Spark_2016Subhasish Guha
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesappaji intelhunt
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReducePietro Michiardi
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Using R for Cyber Security Part 1
Using R for Cyber Security Part 1Using R for Cyber Security Part 1
Using R for Cyber Security Part 1Ajay Ohri
 
Summary of "Cassandra" for 3rd nosql summer reading in Tokyo
Summary of "Cassandra" for 3rd nosql summer reading in TokyoSummary of "Cassandra" for 3rd nosql summer reading in Tokyo
Summary of "Cassandra" for 3rd nosql summer reading in TokyoCLOUDIAN KK
 

What's hot (20)

SPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication MiddlewareSPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication Middleware
 
Spatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use CasesSpatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use Cases
 
MapReduce and parallel DBMSs: friends or foes?
MapReduce and parallel DBMSs: friends or foes?MapReduce and parallel DBMSs: friends or foes?
MapReduce and parallel DBMSs: friends or foes?
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management System
 
Hadoop, mapreduce and yarn networks
Hadoop, mapreduce and yarn networksHadoop, mapreduce and yarn networks
Hadoop, mapreduce and yarn networks
 
Coma
ComaComa
Coma
 
Bt export
Bt exportBt export
Bt export
 
Querying Mongo Without Programming Using Funql
Querying Mongo Without Programming Using FunqlQuerying Mongo Without Programming Using Funql
Querying Mongo Without Programming Using Funql
 
Lec 8 (distributed database)
Lec 8 (distributed database)Lec 8 (distributed database)
Lec 8 (distributed database)
 
ETL and pivoting in spark
ETL and pivoting in sparkETL and pivoting in spark
ETL and pivoting in spark
 
Spark cluster computing with working sets
Spark cluster computing with working setsSpark cluster computing with working sets
Spark cluster computing with working sets
 
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
 
Dynamic Width File in Spark_2016
Dynamic Width File in Spark_2016Dynamic Width File in Spark_2016
Dynamic Width File in Spark_2016
 
Bhagaban Mallik
Bhagaban MallikBhagaban Mallik
Bhagaban Mallik
 
MapReduce in Cloud Computing
MapReduce in Cloud ComputingMapReduce in Cloud Computing
MapReduce in Cloud Computing
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReduce
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Using R for Cyber Security Part 1
Using R for Cyber Security Part 1Using R for Cyber Security Part 1
Using R for Cyber Security Part 1
 
Summary of "Cassandra" for 3rd nosql summer reading in Tokyo
Summary of "Cassandra" for 3rd nosql summer reading in TokyoSummary of "Cassandra" for 3rd nosql summer reading in Tokyo
Summary of "Cassandra" for 3rd nosql summer reading in Tokyo
 

Similar to Big data analytics K.Kiruthika II-M.Sc.,Computer Science Bonsecours college for women

Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom IndustryCloudera, Inc.
 
Applying stratosphere for big data analytics
Applying stratosphere for big data analyticsApplying stratosphere for big data analytics
Applying stratosphere for big data analyticsAvinash Pandu
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
Big data & Hadoop
Big data & HadoopBig data & Hadoop
Big data & HadoopAhmed Gamil
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic WebStefan Ceriu
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebStefan Prutianu
 
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim Remani
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim RemaniFinding your Way in the Midst of the NoSQL Haze - Abdelmonaim Remani
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim RemaniJAXLondon2014
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Derryck Lamptey, MPhil, CISSP
 
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Sameer Farooqui
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayJosef Adersberger
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsRavindra kumar
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introductionHektor Jacynycz García
 
TYBSC IT PGIS Unit II Chapter I Data Management and Processing Systems
TYBSC IT PGIS Unit II Chapter I Data Management and Processing SystemsTYBSC IT PGIS Unit II Chapter I Data Management and Processing Systems
TYBSC IT PGIS Unit II Chapter I Data Management and Processing SystemsArti Parab Academics
 

Similar to Big data analytics K.Kiruthika II-M.Sc.,Computer Science Bonsecours college for women (20)

Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
 
Applying stratosphere for big data analytics
Applying stratosphere for big data analyticsApplying stratosphere for big data analytics
Applying stratosphere for big data analytics
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
CLOUD BIOINFORMATICS Part1
 CLOUD BIOINFORMATICS Part1 CLOUD BIOINFORMATICS Part1
CLOUD BIOINFORMATICS Part1
 
Eg4301808811
Eg4301808811Eg4301808811
Eg4301808811
 
Big data & Hadoop
Big data & HadoopBig data & Hadoop
Big data & Hadoop
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic Web
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic Web
 
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim Remani
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim RemaniFinding your Way in the Midst of the NoSQL Haze - Abdelmonaim Remani
Finding your Way in the Midst of the NoSQL Haze - Abdelmonaim Remani
 
Cassandra
CassandraCassandra
Cassandra
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
 
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice Way
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skills
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introduction
 
TYBSC IT PGIS Unit II Chapter I Data Management and Processing Systems
TYBSC IT PGIS Unit II Chapter I Data Management and Processing SystemsTYBSC IT PGIS Unit II Chapter I Data Management and Processing Systems
TYBSC IT PGIS Unit II Chapter I Data Management and Processing Systems
 
Azure and cloud design patterns
Azure and cloud design patternsAzure and cloud design patterns
Azure and cloud design patterns
 
E031201032036
E031201032036E031201032036
E031201032036
 
Big data
Big dataBig data
Big data
 

Recently uploaded

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxabhijeetpadhi001
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 

Recently uploaded (20)

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 

Big data analytics K.Kiruthika II-M.Sc.,Computer Science Bonsecours college for women

  • 1. BIG DATA ANALYTICS Name : K.Kiruthika Class : II M.Sc(CS) Batch : 2017-2019 Incharge staff : M.FLORANCE DAYANA
  • 2. Map Reduce application Introduction : Map Reduce is a programming modal as an associated implementation for processing . generating big data sets with a parallel, distributed algorithm on a cluster.
  • 3.
  • 4.  Map: The map function to the local data, and writes the output to a temporary storage.  Shuffle: worker nodes redistribute data based on the output keys such that all data belong to one key is located on worker node.  Reduce: worker nodes process each group of output data, per key, in parallel.
  • 5.
  • 6.  MapReduce is as a 5-step parallel and distributed computation: Prepare the Map() input : The "MapReduce system" designates Map processors the input key value K1. Run the user-provided Map() code : Map() is run exactly once for each K1 key value, generating output organized by key values K2.
  • 7. "Shuffle" the Map output to the Reduce processors :The MapReduce system designates assigns the K2 key value each processer. Run the user-provided Reduce() code : Reduce() is run the program exactly once for each K2 key value produced by the Map step. Produce the final output : The MapReduce system collects all the Reduce output, and sorts it final outcome.
  • 8.
  • 9.  DATA SERIALIZATION AND WORKING COMMON SERIALIZATION FORMAT :  A container is nothing but a data structure, or an object to store data in.  When we transfer data over a network the container is converted into a byte stream.  This refer to process.
  • 10.
  • 11. Uses :  A method of transferring data through the wires (messaging).  A method of storing data (in databases, on hard disk drives).  A method of remote procedure calls, e.g., as in SOAP.  A method for distributing objects, especially in component- based software engineering such as COM, CORBA, etc.  A method for detecting changes in time-varying data.
  • 12.
  • 13. Drawback :  Serialization breaks the opacity of an abstract data type.  private implementation details.  Serialize all data Private members may violate encapsulation.  products, publishers of proprietary software their programs.  serialization formats a trade secret.  obfuscate or even encrypt the serialization of data.  Therefore remote method call architectures such as CORBA define their serialization formats in detail.
  • 14.
  • 15.  Serialization formats :  JSON is a lighter plain-text alternative .  XMLcommonly used for client-server communication in web applications.  JSON is based on JavaScript syntax, but is supported in other programming languages as well.
  • 16.
  • 17.
  • 18.
  • 19.  YAML :  powerful for serialization.  "human friendly," and potentially more compact.  These features include a notion of tagging data types.  support for non-hierarchical data structures.
  • 20.
  • 21.  Another human-readable serialization.  Eg: format is property list format used in NeXTSTEP, GNUstep, and macOS Cocoa.  For large volume scientific datasets,  such as satellite data and output  numerical climate, weather, or ocean models, specific binary serialization standards have been developed.  e.g. HDF, netCDF and the older GRIB.
  • 22.  BIG DATA SERIALIZATION FORMATS:  Big Data serialization also refers to converting byte streams.  But it has another goal which is schema control.  data structure data can be validated on writing phase.  It avoids to have some surprises data is read.  Example: a field is missing or has bad type eg: (instead of array).
  • 23.
  • 24.  To resume, we can list following points to characterize serialization in Big Data systems:  splittability - easier to achieve splits on byte streams rather than JSON or XML files.  portability - schema can be consumed by different languages.  versioning - flexibity to define fields with default values or continue to use old schema version.  data integrity - serialization schemas data corecteness. errors can be detected earlier, when data is written.
  • 25.  NoSQL solutions are very often related to the word schemaless.  Sometimes to maintenance or backward compatibility problems.  One of solutions to these issues in Big Data systems are serialization frameworks.  This purely theoretical, article to explain the role and the importance of serialization in Big Data systems.
  • 26.  Conclusion:  Converting for bit stream.  Json or xml formate in serialization data is splittable easier.