DATA SERIALIZATION IN BIG DATA ANALYSIS
S.SUBHALAKSHMI,
II M.SC(CS),
NADAR SARASWATHI COLLEGE OF ARTS AND SCIENCE,
THENI.
CONTENT:
Serialization
Uses of serialization
Drawbacks
Serialization formats
Programming language support
SERIALIZATION:
Serialization is the process of translating data
structures or object state into a format that can be stored
or transmitted and reconstructed later.
This process of serializing an object is also called
marshalling an object.
The opposite operation, extracting a data structure
from a series of bytes, is unmarshalling.
Once an object has been serialized into a sequence of
bytes, those bytes can be used in several ways:
i) send it to another process
ii) send it to the clipboard, to be browsed or
used by another application
iii) send it to another machine
iv) send it to a file on disk
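As a minimal sketch of the last case (sending the bytes to a file on disk), the Python example below assumes a hypothetical record named reading and uses the standard json module as the serialization format. Any other format would follow the same pattern.

```python
import json
import os
import tempfile

# Hypothetical in-memory data structure, used only for illustration.
reading = {"sensor": "t1", "values": [20.5, 21.0, 19.8]}

# Serialize: data structure -> sequence of bytes.
payload = json.dumps(reading).encode("utf-8")

# Send the bytes to a file on disk (a temporary directory is used here).
path = os.path.join(tempfile.mkdtemp(), "reading.json")
with open(path, "wb") as f:
    f.write(payload)

# Later, or in another process: read the bytes back and deserialize.
with open(path, "rb") as f:
    restored = json.loads(f.read().decode("utf-8"))
```

The same payload bytes could equally be written to a socket or a pipe instead of a file.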
USES OF SERIALIZATION:
A method of transferring data over the wire
(messaging).
A method of storing data (in databases, on hard disk
drives).
A method of remote procedure calls, e.g., as in
SOAP.
A method for detecting changes in time-varying
data.
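One way to detect changes in time-varying data is to serialize each snapshot to a canonical byte sequence and compare hashes. The sketch below is an illustration only: the fingerprint helper and the snapshot values are hypothetical, and JSON with sorted keys is an assumed choice of canonical format.

```python
import hashlib
import json

def fingerprint(data):
    """Return a hash of the canonical serialized form of `data`."""
    # sort_keys gives the same byte sequence regardless of key order.
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

# Hypothetical snapshots of time-varying data.
snapshot_a = {"temp": 21.0, "humidity": 40}
snapshot_b = {"humidity": 40, "temp": 21.0}   # same content, different key order
snapshot_c = {"temp": 21.5, "humidity": 40}   # content changed

unchanged = fingerprint(snapshot_a) == fingerprint(snapshot_b)
changed = fingerprint(snapshot_a) != fingerprint(snapshot_c)
```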
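For data structures that contain references, the
serialization process includes a step called unswizzling or
pointer unswizzling, in which direct pointer references are
converted to references based on name or position.
The deserialization process includes the inverse step,
called pointer swizzling.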
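The two steps can be sketched in Python as follows. The Node class and the unswizzle and swizzle helpers are hypothetical names chosen for illustration. Direct object references are replaced by list positions before storage and restored afterwards.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None  # direct in-memory reference to another Node

def unswizzle(nodes):
    """Replace direct references with position-based indices."""
    index = {id(n): i for i, n in enumerate(nodes)}
    return [
        {"name": n.name, "next": None if n.next is None else index[id(n.next)]}
        for n in nodes
    ]

def swizzle(records):
    """Inverse step: turn indices back into direct references."""
    nodes = [Node(r["name"]) for r in records]
    for node, r in zip(nodes, records):
        if r["next"] is not None:
            node.next = nodes[r["next"]]
    return nodes

a, b = Node("a"), Node("b")
a.next, b.next = b, a          # a cycle: a -> b -> a
records = unswizzle([a, b])    # plain data, safe to write out
restored = swizzle(records)    # references rebuilt, cycle included
```

Because the records hold only names and integer positions, they can be written with any serialization format, and the cycle survives the round trip.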
DRAWBACKS:
Serialization breaks the opacity of an abstract data
type by potentially exposing private implementation details.
Trivial implementations which serialize all data
members may violate encapsulation.
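The encapsulation problem can be illustrated with a short Python sketch. The Account class, its _pin field, and the to_public_dict method are hypothetical. The example contrasts a trivial serializer that dumps every data member with one where the class chooses what is exposed.

```python
import json

class Account:
    def __init__(self, owner, pin):
        self.owner = owner
        self._pin = pin  # private implementation detail

    def to_public_dict(self):
        # Explicit choice of which fields form the serialized representation.
        return {"owner": self.owner}

account = Account("asha", "1234")

# Trivial approach: dump every data member, private ones included.
naive = json.dumps(vars(account))

# Controlled approach: the class decides what leaves the object.
controlled = json.dumps(account.to_public_dict())
```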
Many institutions, such as archives and libraries,
attempt to future-proof their backup archives by storing them
in a relatively human-readable serialized format.
SERIALIZATION FORMATS:
The Xerox Network Systems Courier technology in the
early 1980s influenced the first widely adopted standard.
Sun Microsystems published the External Data
Representation (XDR) in 1987.
XML was used to produce a human readable text-based
encoding.
Binary XML has been proposed as a compromise which is
not readable by plain-text editors but is more compact than
regular XML.
In the 2000s, XML was often used for asynchronous
transfer of structured data between client and server in Ajax web
applications.
JSON is a lighter plain-text alternative to XML which is
also commonly used for client-server communication in web
applications.
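To make the comparison concrete, the sketch below encodes the same hypothetical record as JSON and as XML using only the Python standard library. The element name person and the field names are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
record = {"name": "Asha", "city": "Theni"}

# JSON encoding.
as_json = json.dumps(record)

# Equivalent XML encoding: one child element per field.
root = ET.Element("person")
for key, value in record.items():
    ET.SubElement(root, key).text = value
as_xml = ET.tostring(root, encoding="unicode")
```

For this record the JSON text is shorter than the XML text, because XML repeats each field name in an opening and a closing tag.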
YAML is similar to JSON and includes features
that make it more powerful for serialization, more "human
friendly," and potentially more compact.
For large volume scientific datasets, such as satellite
data and output of numerical climate, weather, or ocean
models, specific binary serialization standards have been
developed, e.g. HDF, netCDF and the older GRIB.
PROGRAMMING LANGUAGE SUPPORT:
Several object-oriented programming languages directly
support object serialization.
The languages which do so include Ruby, Smalltalk,
Python, PHP, Objective-C, Delphi, Java, and the .NET family of
languages.
There are also libraries available that add serialization
support to languages that lack native support for it.
CFML:
CFML allows data structures to be serialized to
WDDX.
OCAML:
OCaml's standard library provides marshalling
through the Marshal module.
PERL:
Several Perl modules available from CPAN provide
serialization mechanisms, including Storable, JSON::XS
and FreezeThaw.
DELPHI:
Delphi provides a built-in mechanism for serialization
of components which is fully integrated with its IDE.
C and C++:
C and C++ do not provide serialization as
any sort of high-level construct, but both languages support
writing any of the built-in data types, as well as plain old
data structs, as binary data.
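The idea of writing a plain-old-data record as raw binary can be sketched in Python with the standard struct module. The record layout below, a 32-bit integer followed by a 64-bit float, is a hypothetical example. The explicit little-endian format is an assumed choice that avoids the byte-order and padding differences which can make raw C struct dumps non-portable between machines.

```python
import struct

# Hypothetical record layout: int32 id, float64 value.
# "<" selects little-endian byte order with no padding between fields.
RECORD = struct.Struct("<id")

blob = RECORD.pack(42, 3.5)             # 4 + 8 = 12 bytes of binary data
record_id, value = RECORD.unpack(blob)  # read the fields back
```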
SWIFT:
The Swift standard library provides two protocols,
Encodable and Decodable.
JAVASCRIPT:
Since ECMAScript 5.1, JavaScript has included the
built-in JSON object and its methods JSON.parse() and
JSON.stringify().
JAVA:
Java provides automatic serialization which requires
that the object be marked by implementing the
java.io.Serializable interface.
.NET FRAMEWORK:
.NET Framework has several serializers designed by
Microsoft.
PYTHON:
The core general serialization mechanism is the
pickle standard library module.
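A minimal round trip with pickle might look like the sketch below. The data dictionary is a hypothetical example built from built-in types only. It also shows that pickle keeps shared references shared after loading.

```python
import pickle

# Hypothetical data: the same list object is referenced twice.
shared = [1, 2, 3]
data = {"first": shared, "second": shared, "label": "demo"}

blob = pickle.dumps(data)       # serialize to bytes
restored = pickle.loads(blob)   # deserialize from bytes

# Shared structure is preserved, not duplicated.
same_object = restored["first"] is restored["second"]
```

Note that pickle.loads should only be used on data from a trusted source, since loading a pickle can execute arbitrary code.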
PHP:
PHP originally implemented serialization through the
built-in serialize() and unserialize() functions.
R:
R has the function dput which writes an ASCII text
representation of an R object to a file or connection.
REBOL:
REBOL will serialize to file (save/all) or to a string!
(mold/all).
RUBY:
Ruby includes the standard module Marshal.
SMALLTALK:
In general, non-recursive and non-sharing objects can
be stored and retrieved in a human readable form using the
storeOn:/readFrom: protocol.
LISP:
Generally a Lisp data structure can be serialized with
the functions "read" and "print".
HASKELL:
In Haskell, serialization is supported for types that
are members of the Read and Show type classes.
WINDOWS POWERSHELL:
Windows PowerShell implements serialization
through the built-in cmdlet Export-CliXML.
JULIA:
Julia implements serialization through the serialize()
and deserialize() functions of its Serialization standard
library module.