SlideShare a Scribd company logo
1 of 41
Text ProcessingText Processing
•Comma Sepearated values (CSV)
•JavaScript Object Notation(JSON),
•Python and XML
CSVCSV
The csv module is useful for working
with data exported from
spreadsheets and databases into
text files formatted with fields and
records, commonly referred to as
comma-separated value (CSV) format
because commas are often used to
separate the fields in a record.
CSV is an short form for Comma Separated Values.
The CSV format is a very common data format
used by various programs and applications as a
format to export or import data.
Most spreadsheet and database applications offer
export and import facilities for CSV files.
CSV files generally store tables of data that
contain fields stored within records.
Each field in the record is separated by a
character within the CSV file and generally
new records start on a new line.  
This is what a CSV file looks like that has been
exported from Excel:
CSV files are simple, lacking many of the
features of an Excel spreadsheet. For
example, CSV files:
Don’t have types for their values—
everything is a string
Don’t have settings for font size or
color
Don’t have multiple worksheets
Can’t specify cell widths and heights
Can’t have merged cells
Can’t have images or charts
embedded in them
The advantage of CSV files is simplicity.
CSV files are widely supported by many
types of programs, can be viewed in text
editors (including IDLE’s file editor), and
are a straightforward way to represent
spreadsheet data.
The CSV format is exactly as advertised:
It’s just a text file of comma-
separated values.
Spread sheet and corresponding CSV fileSpread sheet and corresponding CSV file
CSV formatCSV format
As simple as that sounds, even CSV format is
not completely universal ,different apps have
small variations
Python provides a module to deal with these
variations called the csv module
This module allows you to read
spreadsheet info into your program
We load the module in the usual way using
import:
writerwriter
 import csv #import modules
 f=open("sanmple.csv","w+")
 a=["Name","Address","phone"],["ravi","amreli","12345"],
["raviraj","rajkot","132345"]
 w1=csv.writer(f) #create writer object
 w1.writerow(a) #write data
 print("data written")
 f.close() #close file
readerreader
import csv
#reading CSV data
f=open("sanmple.csv","r+")
r1=csv.reader(f)
for row in r1:
 print(row)
f.close()
QuotingQuoting
JavaScript Object Notation (JSON)JavaScript Object Notation (JSON)
JSON or JavaScript Object Notation is a lightweight
text-based open standard designed for human-
readable data interchange.
Conventions used by JSON are known to
programmers, which include C, C++, Java, Python,
Perl, etc.
JSON stands for JavaScript Object Notation.
The format was specified by Douglas Crockford.
It was designed for human-readable data
interchange.
It has been extended from the JavaScript scripting
language.
Characteristics of JSONCharacteristics of JSON
JSON is easy to read and write.
It is a lightweight text-based interchange
format.
JSON is language independent.
JSON is Not
Overly Complex
A “document” format
A markup language
A programming language
JSON like XML
Plain text formats
“Self-describing“ (human readable)
Hierarchical (Values can contain lists of objects
or values)
Lighter and faster than XML
JSON uses typed objects. All XML values are
type-less strings and must be parsed at runtime.
Less syntax, no semantics
Properties are immediately accessible to
JavaScript code
JSON Object Syntax
Unordered sets of name/value pairs
Begins with { (left brace)
Ends with } (right brace)
Each name is followed by : (colon)
Name/value pairs are separated by ,
(comma)
JSON example
employeeData = {
"employee_id": 1234567,
"name": "Jeff Fox",
"hire_date": "1/1/2013",
"location": "Norwalk, CT",
"consultant": false
};
Arrays in JSON
An ordered collection of values
Begins with [ (left bracket)
Ends with ] (right bracket)
Name/value pairs are separated by ,
(comma)
employeeData = {
"employee_id": 1236937,
"name": "Jeff Fox",
"hire_date": "1/1/2013",
"location": "Norwalk, CT",
"consultant": false,
"random_nums": [ 24,65,12,94 ]
};
Data Types: Object & Array
Objects: Unordered key/value pairs
wrapped in { }
Arrays: Ordered key/value pairs wrapped
in [ ]
How & When to use JSON
Transfer data to and from a server
Perform asynchronous data calls without
requiring a page refresh
Working with data stores
Compile and save form or user data for
local storage
JSONJSON
In Python, the json module provides an
API similar to convert in-memory Python
objects to a serialized representation
known as JavaScript Object Notation
(JSON) and vice-a-versa.
Encode Python objects as JSON
strings, and decode JSON strings
into Python objects
Encode Python objects as JSONEncode Python objects as JSON
stringsstrings
if you want to dump the JSON into a
file/socket or whatever, then you should
go for dump().
If you only need it as a string (for
printing, parsing or whatever) then use
dumps() (dump string)
 Options :
 The default value of skipkeys is False. If skipkeys is True, then dict keys
that are not of a basic type (str, int, float, bool, None) will be skipped
instead of raising a TypeError.
 The json module always produces str objects, not bytes objects.
Therefore, fp.write() must support str input.
 If ensure_ascii is True (the default), the output is guaranteed to have
all incoming non-ASCII characters escaped. If ensure_ascii is False,
these characters will be output as-is.
 The default value of check_circular is True. If check_circular is False,
then the circular reference check for container types will be skipped
and a circular reference will result in an OverflowError.
 The default value of allow_nan is True. If allow_nan is False, then it
will be a ValueError to serialize out of range float values (nan, inf, -inf)
in strict compliance with the JSON specification, instead of using the
JavaScript equivalents (NaN, Infinity, -Infinity).
 If indent is a non-negative integer or string, then JSON
array elements and object members will be pretty-
printed with that indent level. An indent level of 0,
negative, or "" will only insert newlines. None (the
default) selects the most compact representation. Using
a positive integer indent indents that many spaces per
level. If indent is a string (such as "t"), that string is
used to indent each level.
 Use (',', ': ') as default if indent is not None.
 The default value of sort_keys is False. If sort_keys is
True, then the output of dictionaries will be sorted by
key.
Decode JSON strings into PythonDecode JSON strings into Python
objectsobjects
Options :
 The default value of skipkeys is False. If skipkeys is True, then dict
keys that are not of a basic type (str, int, float, bool, None) will be
skipped instead of raising a TypeError.
 The json module always produces str objects, not bytes objects.
Therefore, fp.write() must support str input.
 If ensure_ascii is True (the default), the output is guaranteed to have
all incoming non-ASCII characters escaped. If ensure_ascii is False,
these characters will be output as-is.
 The default value of check_circular is True. If check_circular is False,
then the circular reference check for container types will be skipped
and a circular reference will result in an OverflowError.
 The default value of allow_nan is True. If allow_nan is False (default:
True), then it will be a ValueError to serialize out of range float values
(nan, inf, -inf) in strict compliance with the JSON specification, instead
of using the JavaScript equivalents (NaN, Infinity, -Infinity).
 If indent is a non-negative integer or string, then JSON
array elements and object members will be pretty-printed
with that indent level. An indent level of 0, negative, or ""
will only insert newlines. None (the default) selects the
most compact representation. Using a positive integer
indent indents that many spaces per level. If indent is a
string (such as "t"), that string is used to indent each level.
 Use (',', ': ') as default if indent is not None.
 default(obj) is a function that should return a serializable
version of obj or raise TypeError. The default simply
raises TypeError.
 The default value of sort_keys is False. If sort_keys is
True, then the output of dictionaries will be sorted by key.
Python and XMLPython and XML
XMLXML
The Extensible Markup Language (XML) is a
markup language much like HTML or
SGML (standard generalized markup
language).
This is recommended by the World Wide
Web Consortium and available as an open
standard.
XML is extremely useful for keeping track
of small to medium amounts of data
without requiring a SQL-based backbone.
XML Parser Architectures and APIsXML Parser Architectures and APIs
 A parser is a compiler or interpreter component that breaks
data into smaller elements for easy translation into another
language.
 The Python standard library provides a minimal but useful set of interfaces
to work with XML.
 The two most basic and broadly used APIs to XML data are the
SAX and DOM interfaces.
 Simple API for XML (SAX) :
 Here, you register callbacks for events of interest and then let the
parser proceed through the document.
 This is useful when your documents are large or you have
memory limitations, it parses the file as it reads it from disk and
the entire file is never stored in memory.
 Document Object Model (DOM) API :
 This is a World Wide Web Consortium recommendation wherein the
entire file is read into memory and stored in a hierarchical (tree-
based) form to represent all the features of an XML document.
SAX obviously cannot process information
as fast as DOM can when working with
large files.
On the other hand, using DOM exclusively
can really kill your resources, especially if
used on a lot of small files.
SAX is read-only, while DOM allows
changes to the XML file.
Since these two different APIs literally
complement each other, there is no reason
why you can not use them both for large
projects.
Parsing XML with SAX APIsParsing XML with SAX APIs
 SAX is a standard interface for event-driven XML parsing.
 Parsing XML with SAX generally requires you to create
your own ContentHandler by subclassing
xml.sax.ContentHandler.
 Your ContentHandler handles the particular tags and
attributes of your flavor(s) of XML.
 A ContentHandler object provides methods to handle
various parsing events.
 Its owning parser calls ContentHandler methods as it parses
the XML file.
 The methods startDocument and endDocument are
called at the start and the end of the XML file.
 The method characters(text) is passed character data
of the XML file via the parameter text.
The ContentHandler is called at the start
and end of each element.
If the parser is not in namespace mode,
the methods startElement(tag, attributes)
and endElement(tag) are called;
otherwise, the corresponding methods
startElementNS and endElementNS are
called.
Here, tag is the element tag, and attributes
is an Attributes object.
TheThe make_parsermake_parser MethodMethod
Following method creates a new parser
object and returns it.
The parser object created will be of the first
parser type the system finds.
xml.sax.make_parser( [parser_list] )
parser_list: The optional argument
consisting of a list of parsers to use which
must all implement the make_parser
method.

parseMethodparseMethod
Following method creates a SAX parser and
uses it to parse a document.
xml.sax.parse( xmlfile, contenthandler[,
errorhandler])
Here is the detail of the parameters:
xmlfile: This is the name of the XML file to read
from.
contenthandler: This must be a ContentHandler
object.
errorhandler: If specified, errorhandler must be a
SAX ErrorHandler object.
parseStringMethodparseStringMethod
There is one more method to create a SAX parser
and to parse the specified XML string.
xml.sax.parseString (xmlstring, contenthandler[, errorhandler])
Here is the detail of the parameters:
xmlstring: This is the name of the XML string to
read from.
contenthandler: This must be a ContentHandler
object.
errorhandler: If specified, errorhandler must be a
SAX ErrorHandler object.
Parsing XML with DOM APIsParsing XML with DOM APIs
 The Document Object Model (DOM) is a cross-language
API from the World Wide Web Consortium (W3C) for
accessing and modifying XML documents.
 The DOM is extremely useful for random-access
applications.
 SAX only allows you a view of one bit of the document at a
time. If you are looking at one SAX element, you have
no access to another.
 Here is the easiest way to quickly load an XML
document and to create a minidom object using the
xml.dom module.
 The minidom object provides a simple parser method
that quickly creates a DOM tree from the XML file.
 The sample phrase calls the parse( file [,parser] ) function
of the minidom object to parse the XML file
designated by file into a DOM tree object.

More Related Content

What's hot

Basic data types in python
Basic data types in pythonBasic data types in python
Basic data types in pythonsunilchute1
 
‘go-to’ general-purpose sequential collections - from Java To Scala
‘go-to’ general-purpose sequential collections -from Java To Scala‘go-to’ general-purpose sequential collections -from Java To Scala
‘go-to’ general-purpose sequential collections - from Java To ScalaPhilip Schwarz
 
Scala 3 by Example - Algebraic Data Types for Domain Driven Design - Part 1
Scala 3 by Example - Algebraic Data Types for Domain Driven Design - Part 1Scala 3 by Example - Algebraic Data Types for Domain Driven Design - Part 1
Scala 3 by Example - Algebraic Data Types for Domain Driven Design - Part 1Philip Schwarz
 
C++ Notes by Hisham Ahmed Rizvi for Class 12th Board Exams
C++ Notes by Hisham Ahmed Rizvi for Class 12th Board ExamsC++ Notes by Hisham Ahmed Rizvi for Class 12th Board Exams
C++ Notes by Hisham Ahmed Rizvi for Class 12th Board Examshishamrizvi
 
Chap1introppt2php(finally done)
Chap1introppt2php(finally done)Chap1introppt2php(finally done)
Chap1introppt2php(finally done)monikadeshmane
 
Ti1220 Lecture 7: Polymorphism
Ti1220 Lecture 7: PolymorphismTi1220 Lecture 7: Polymorphism
Ti1220 Lecture 7: PolymorphismEelco Visser
 
358 33 powerpoint-slides_7-structures_chapter-7
358 33 powerpoint-slides_7-structures_chapter-7358 33 powerpoint-slides_7-structures_chapter-7
358 33 powerpoint-slides_7-structures_chapter-7sumitbardhan
 
Data Structures in Python
Data Structures in PythonData Structures in Python
Data Structures in PythonDevashish Kumar
 
Xbase - Implementing Domain-Specific Languages for Java
Xbase - Implementing Domain-Specific Languages for JavaXbase - Implementing Domain-Specific Languages for Java
Xbase - Implementing Domain-Specific Languages for Javameysholdt
 
Tokens expressionsin C++
Tokens expressionsin C++Tokens expressionsin C++
Tokens expressionsin C++HalaiHansaika
 
Python data type
Python data typePython data type
Python data typeJaya Kumari
 

What's hot (20)

Basic data types in python
Basic data types in pythonBasic data types in python
Basic data types in python
 
Java 8 Lambda Expressions
Java 8 Lambda ExpressionsJava 8 Lambda Expressions
Java 8 Lambda Expressions
 
C# String
C# StringC# String
C# String
 
Java script summary
Java script summaryJava script summary
Java script summary
 
‘go-to’ general-purpose sequential collections - from Java To Scala
‘go-to’ general-purpose sequential collections -from Java To Scala‘go-to’ general-purpose sequential collections -from Java To Scala
‘go-to’ general-purpose sequential collections - from Java To Scala
 
Scala 3 by Example - Algebraic Data Types for Domain Driven Design - Part 1
Scala 3 by Example - Algebraic Data Types for Domain Driven Design - Part 1Scala 3 by Example - Algebraic Data Types for Domain Driven Design - Part 1
Scala 3 by Example - Algebraic Data Types for Domain Driven Design - Part 1
 
C++ Notes by Hisham Ahmed Rizvi for Class 12th Board Exams
C++ Notes by Hisham Ahmed Rizvi for Class 12th Board ExamsC++ Notes by Hisham Ahmed Rizvi for Class 12th Board Exams
C++ Notes by Hisham Ahmed Rizvi for Class 12th Board Exams
 
XML and XPath details
XML and XPath detailsXML and XPath details
XML and XPath details
 
Streams in Java 8
Streams in Java 8Streams in Java 8
Streams in Java 8
 
Php
PhpPhp
Php
 
Chap1introppt2php(finally done)
Chap1introppt2php(finally done)Chap1introppt2php(finally done)
Chap1introppt2php(finally done)
 
Ti1220 Lecture 7: Polymorphism
Ti1220 Lecture 7: PolymorphismTi1220 Lecture 7: Polymorphism
Ti1220 Lecture 7: Polymorphism
 
358 33 powerpoint-slides_7-structures_chapter-7
358 33 powerpoint-slides_7-structures_chapter-7358 33 powerpoint-slides_7-structures_chapter-7
358 33 powerpoint-slides_7-structures_chapter-7
 
Java 8 streams
Java 8 streamsJava 8 streams
Java 8 streams
 
Data Structures in Python
Data Structures in PythonData Structures in Python
Data Structures in Python
 
Xbase - Implementing Domain-Specific Languages for Java
Xbase - Implementing Domain-Specific Languages for JavaXbase - Implementing Domain-Specific Languages for Java
Xbase - Implementing Domain-Specific Languages for Java
 
Variables In Php 1
Variables In Php 1Variables In Php 1
Variables In Php 1
 
Tokens expressionsin C++
Tokens expressionsin C++Tokens expressionsin C++
Tokens expressionsin C++
 
Python data type
Python data typePython data type
Python data type
 
Python dictionary
Python   dictionaryPython   dictionary
Python dictionary
 

Similar to Text processing by Rj

Similar to Text processing by Rj (20)

json.ppt download for free for college project
json.ppt download for free for college projectjson.ppt download for free for college project
json.ppt download for free for college project
 
Javascript2839
Javascript2839Javascript2839
Javascript2839
 
Json
JsonJson
Json
 
Scala in Places API
Scala in Places APIScala in Places API
Scala in Places API
 
JSON Data Parsing in Snowflake (By Faysal Shaarani)
JSON Data Parsing in Snowflake (By Faysal Shaarani)JSON Data Parsing in Snowflake (By Faysal Shaarani)
JSON Data Parsing in Snowflake (By Faysal Shaarani)
 
Json the-x-in-ajax1588
Json the-x-in-ajax1588Json the-x-in-ajax1588
Json the-x-in-ajax1588
 
Jackson beyond JSON: XML, CSV
Jackson beyond JSON: XML, CSVJackson beyond JSON: XML, CSV
Jackson beyond JSON: XML, CSV
 
Basics of JSON (JavaScript Object Notation) with examples
Basics of JSON (JavaScript Object Notation) with examplesBasics of JSON (JavaScript Object Notation) with examples
Basics of JSON (JavaScript Object Notation) with examples
 
Python - Lecture 11
Python - Lecture 11Python - Lecture 11
Python - Lecture 11
 
JavaScript.pptx
JavaScript.pptxJavaScript.pptx
JavaScript.pptx
 
Json demo
Json demoJson demo
Json demo
 
javascript
javascript javascript
javascript
 
Esoft Metro Campus - Certificate in c / c++ programming
Esoft Metro Campus - Certificate in c / c++ programmingEsoft Metro Campus - Certificate in c / c++ programming
Esoft Metro Campus - Certificate in c / c++ programming
 
JSON - Quick Overview
JSON - Quick OverviewJSON - Quick Overview
JSON - Quick Overview
 
json.pptx
json.pptxjson.pptx
json.pptx
 
Json
JsonJson
Json
 
Java Script Introduction
Java Script IntroductionJava Script Introduction
Java Script Introduction
 
Unit 3
Unit 3Unit 3
Unit 3
 
JCConf 2021 - Java17: The Next LTS
JCConf 2021 - Java17: The Next LTSJCConf 2021 - Java17: The Next LTS
JCConf 2021 - Java17: The Next LTS
 
Ajax and JavaScript Bootcamp
Ajax and JavaScript BootcampAjax and JavaScript Bootcamp
Ajax and JavaScript Bootcamp
 

More from Shree M.L.Kakadiya MCA mahila college, Amreli

More from Shree M.L.Kakadiya MCA mahila college, Amreli (20)

Machine Learning by Rj
Machine Learning by RjMachine Learning by Rj
Machine Learning by Rj
 
Listeners and filters in servlet
Listeners and filters in servletListeners and filters in servlet
Listeners and filters in servlet
 
Servlet unit 2
Servlet unit 2 Servlet unit 2
Servlet unit 2
 
Servlet by Rj
Servlet by RjServlet by Rj
Servlet by Rj
 
Research paper on python by Rj
Research paper on python by RjResearch paper on python by Rj
Research paper on python by Rj
 
Networking in python by Rj
Networking in python by RjNetworking in python by Rj
Networking in python by Rj
 
Jsp in Servlet by Rj
Jsp in Servlet by RjJsp in Servlet by Rj
Jsp in Servlet by Rj
 
Motion capture by Rj
Motion capture by RjMotion capture by Rj
Motion capture by Rj
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
Python and data analytics
Python and data analyticsPython and data analytics
Python and data analytics
 
Multithreading by rj
Multithreading by rjMultithreading by rj
Multithreading by rj
 
Django by rj
Django by rjDjango by rj
Django by rj
 
Database programming
Database programmingDatabase programming
Database programming
 
CGI by rj
CGI by rjCGI by rj
CGI by rj
 
Seminar on Project Management by Rj
Seminar on Project Management by RjSeminar on Project Management by Rj
Seminar on Project Management by Rj
 
Spring by rj
Spring by rjSpring by rj
Spring by rj
 
Python by Rj
Python by RjPython by Rj
Python by Rj
 
Leadership & Motivation
Leadership & MotivationLeadership & Motivation
Leadership & Motivation
 
Event handling
Event handlingEvent handling
Event handling
 
Layout manager
Layout managerLayout manager
Layout manager
 

Recently uploaded

Improved Approval Flow in Odoo 17 Studio App
Improved Approval Flow in Odoo 17 Studio AppImproved Approval Flow in Odoo 17 Studio App
Improved Approval Flow in Odoo 17 Studio AppCeline George
 
8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital ManagementMBA Assignment Experts
 
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...Nguyen Thanh Tu Collection
 
UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024Borja Sotomayor
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSAnaAcapella
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsSandeep D Chaudhary
 
An overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismAn overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismDabee Kamal
 
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUMDEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUMELOISARIVERA8
 
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjStl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjMohammed Sikander
 
MOOD STABLIZERS DRUGS.pptx
MOOD     STABLIZERS           DRUGS.pptxMOOD     STABLIZERS           DRUGS.pptx
MOOD STABLIZERS DRUGS.pptxPoojaSen20
 
Book Review of Run For Your Life Powerpoint
Book Review of Run For Your Life PowerpointBook Review of Run For Your Life Powerpoint
Book Review of Run For Your Life Powerpoint23600690
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi RajagopalEADTU
 
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...Nguyen Thanh Tu Collection
 
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...Gary Wood
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project researchCaitlinCummins3
 
How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17Celine George
 
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽中 央社
 

Recently uploaded (20)

Improved Approval Flow in Odoo 17 Studio App
Improved Approval Flow in Odoo 17 Studio AppImproved Approval Flow in Odoo 17 Studio App
Improved Approval Flow in Odoo 17 Studio App
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management
 
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
 
UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
An overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismAn overview of the various scriptures in Hinduism
An overview of the various scriptures in Hinduism
 
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUMDEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
 
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjStl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
 
MOOD STABLIZERS DRUGS.pptx
MOOD     STABLIZERS           DRUGS.pptxMOOD     STABLIZERS           DRUGS.pptx
MOOD STABLIZERS DRUGS.pptx
 
Book Review of Run For Your Life Powerpoint
Book Review of Run For Your Life PowerpointBook Review of Run For Your Life Powerpoint
Book Review of Run For Your Life Powerpoint
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopal
 
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
 
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project research
 
Including Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdfIncluding Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdf
 
ESSENTIAL of (CS/IT/IS) class 07 (Networks)
ESSENTIAL of (CS/IT/IS) class 07 (Networks)ESSENTIAL of (CS/IT/IS) class 07 (Networks)
ESSENTIAL of (CS/IT/IS) class 07 (Networks)
 
How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17
 
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
 

Text processing by Rj

  • 1. Text ProcessingText Processing •Comma Sepearated values (CSV) •JavaScript Object Notation(JSON), •Python and XML
  • 2. CSVCSV The csv module is useful for working with data exported from spreadsheets and databases into text files formatted with fields and records, commonly referred to as comma-separated value (CSV) format because commas are often used to separate the fields in a record.
  • 3. CSV is an short form for Comma Separated Values. The CSV format is a very common data format used by various programs and applications as a format to export or import data. Most spreadsheet and database applications offer export and import facilities for CSV files. CSV files generally store tables of data that contain fields stored within records. Each field in the record is separated by a character within the CSV file and generally new records start on a new line.   This is what a CSV file looks like that has been exported from Excel:
  • 4. CSV files are simple, lacking many of the features of an Excel spreadsheet. For example, CSV files: Don’t have types for their values— everything is a string Don’t have settings for font size or color Don’t have multiple worksheets Can’t specify cell widths and heights Can’t have merged cells Can’t have images or charts embedded in them
  • 5. The advantage of CSV files is simplicity. CSV files are widely supported by many types of programs, can be viewed in text editors (including IDLE’s file editor), and are a straightforward way to represent spreadsheet data. The CSV format is exactly as advertised: It’s just a text file of comma- separated values.
  • 6. Spread sheet and corresponding CSV fileSpread sheet and corresponding CSV file
  • 7. CSV formatCSV format As simple as that sounds, even CSV format is not completely universal ,different apps have small variations Python provides a module to deal with these variations called the csv module This module allows you to read spreadsheet info into your program We load the module in the usual way using import:
  • 8. writerwriter  import csv #import modules  f=open("sanmple.csv","w+")  a=["Name","Address","phone"],["ravi","amreli","12345"], ["raviraj","rajkot","132345"]  w1=csv.writer(f) #create writer object  w1.writerow(a) #write data  print("data written")  f.close() #close file
  • 9. readerreader import csv #reading CSV data f=open("sanmple.csv","r+") r1=csv.reader(f) for row in r1:  print(row) f.close()
  • 11. JavaScript Object Notation (JSON)JavaScript Object Notation (JSON) JSON or JavaScript Object Notation is a lightweight text-based open standard designed for human- readable data interchange. Conventions used by JSON are known to programmers, which include C, C++, Java, Python, Perl, etc. JSON stands for JavaScript Object Notation. The format was specified by Douglas Crockford. It was designed for human-readable data interchange. It has been extended from the JavaScript scripting language.
  • 12. Characteristics of JSONCharacteristics of JSON JSON is easy to read and write. It is a lightweight text-based interchange format. JSON is language independent.
  • 13. JSON is Not Overly Complex A “document” format A markup language A programming language
  • 14. JSON like XML Plain text formats “Self-describing“ (human readable) Hierarchical (Values can contain lists of objects or values) Lighter and faster than XML JSON uses typed objects. All XML values are type-less strings and must be parsed at runtime. Less syntax, no semantics Properties are immediately accessible to JavaScript code
  • 15. JSON Object Syntax Unordered sets of name/value pairs Begins with { (left brace) Ends with } (right brace) Each name is followed by : (colon) Name/value pairs are separated by , (comma)
  • 16. JSON example employeeData = { "employee_id": 1234567, "name": "Jeff Fox", "hire_date": "1/1/2013", "location": "Norwalk, CT", "consultant": false };
  • 17. Arrays in JSON An ordered collection of values Begins with [ (left bracket) Ends with ] (right bracket) Name/value pairs are separated by , (comma)
  • 18. employeeData = { "employee_id": 1236937, "name": "Jeff Fox", "hire_date": "1/1/2013", "location": "Norwalk, CT", "consultant": false, "random_nums": [ 24,65,12,94 ] };
  • 19. Data Types: Object & Array Objects: Unordered key/value pairs wrapped in { } Arrays: Ordered key/value pairs wrapped in [ ]
  • 20. How & When to use JSON Transfer data to and from a server Perform asynchronous data calls without requiring a page refresh Working with data stores Compile and save form or user data for local storage
  • 21.
  • 22. JSONJSON In Python, the json module provides an API similar to convert in-memory Python objects to a serialized representation known as JavaScript Object Notation (JSON) and vice-a-versa.
  • 23. Encode Python objects as JSON strings, and decode JSON strings into Python objects
  • 24. Encode Python objects as JSONEncode Python objects as JSON stringsstrings
  • 25. if you want to dump the JSON into a file/socket or whatever, then you should go for dump(). If you only need it as a string (for printing, parsing or whatever) then use dumps() (dump string)
  • 26.
  • 27.  Options :  The default value of skipkeys is False. If skipkeys is True, then dict keys that are not of a basic type (str, int, float, bool, None) will be skipped instead of raising a TypeError.  The json module always produces str objects, not bytes objects. Therefore, fp.write() must support str input.  If ensure_ascii is True (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is False, these characters will be output as-is.  The default value of check_circular is True. If check_circular is False, then the circular reference check for container types will be skipped and a circular reference will result in an OverflowError.  The default value of allow_nan is True. If allow_nan is False, then it will be a ValueError to serialize out of range float values (nan, inf, -inf) in strict compliance with the JSON specification, instead of using the JavaScript equivalents (NaN, Infinity, -Infinity).
  • 28.  If indent is a non-negative integer or string, then JSON array elements and object members will be pretty- printed with that indent level. An indent level of 0, negative, or "" will only insert newlines. None (the default) selects the most compact representation. Using a positive integer indent indents that many spaces per level. If indent is a string (such as "t"), that string is used to indent each level.  Use (',', ': ') as default if indent is not None.  The default value of sort_keys is False. If sort_keys is True, then the output of dictionaries will be sorted by key.
  • 29. Decode JSON strings into PythonDecode JSON strings into Python objectsobjects
  • 30. Options :  The default value of skipkeys is False. If skipkeys is True, then dict keys that are not of a basic type (str, int, float, bool, None) will be skipped instead of raising a TypeError.  The json module always produces str objects, not bytes objects. Therefore, fp.write() must support str input.  If ensure_ascii is True (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is False, these characters will be output as-is.  The default value of check_circular is True. If check_circular is False, then the circular reference check for container types will be skipped and a circular reference will result in an OverflowError.  The default value of allow_nan is True. If allow_nan is False (default: True), then it will be a ValueError to serialize out of range float values (nan, inf, -inf) in strict compliance with the JSON specification, instead of using the JavaScript equivalents (NaN, Infinity, -Infinity).
  • 31.  If indent is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0, negative, or "" will only insert newlines. None (the default) selects the most compact representation. Using a positive integer indent indents that many spaces per level. If indent is a string (such as "t"), that string is used to indent each level.  Use (',', ': ') as default if indent is not None.  default(obj) is a function that should return a serializable version of obj or raise TypeError. The default simply raises TypeError.  The default value of sort_keys is False. If sort_keys is True, then the output of dictionaries will be sorted by key.
  • 33. XMLXML The Extensible Markup Language (XML) is a markup language much like HTML or SGML (standard generalized markup language). This is recommended by the World Wide Web Consortium and available as an open standard. XML is extremely useful for keeping track of small to medium amounts of data without requiring a SQL-based backbone.
  • 34. XML Parser Architectures and APIsXML Parser Architectures and APIs  A parser is a compiler or interpreter component that breaks data into smaller elements for easy translation into another language.  The Python standard library provides a minimal but useful set of interfaces to work with XML.  The two most basic and broadly used APIs to XML data are the SAX and DOM interfaces.  Simple API for XML (SAX) :  Here, you register callbacks for events of interest and then let the parser proceed through the document.  This is useful when your documents are large or you have memory limitations, it parses the file as it reads it from disk and the entire file is never stored in memory.  Document Object Model (DOM) API :  This is a World Wide Web Consortium recommendation wherein the entire file is read into memory and stored in a hierarchical (tree- based) form to represent all the features of an XML document.
  • 35. SAX obviously cannot process information as fast as DOM can when working with large files. On the other hand, using DOM exclusively can really kill your resources, especially if used on a lot of small files. SAX is read-only, while DOM allows changes to the XML file. Since these two different APIs literally complement each other, there is no reason why you can not use them both for large projects.
  • 36. Parsing XML with SAX APIsParsing XML with SAX APIs  SAX is a standard interface for event-driven XML parsing.  Parsing XML with SAX generally requires you to create your own ContentHandler by subclassing xml.sax.ContentHandler.  Your ContentHandler handles the particular tags and attributes of your flavor(s) of XML.  A ContentHandler object provides methods to handle various parsing events.  Its owning parser calls ContentHandler methods as it parses the XML file.  The methods startDocument and endDocument are called at the start and the end of the XML file.  The method characters(text) is passed character data of the XML file via the parameter text.
  • 37. The ContentHandler is called at the start and end of each element. If the parser is not in namespace mode, the methods startElement(tag, attributes) and endElement(tag) are called; otherwise, the corresponding methods startElementNS and endElementNS are called. Here, tag is the element tag, and attributes is an Attributes object.
  • 38. TheThe make_parsermake_parser MethodMethod Following method creates a new parser object and returns it. The parser object created will be of the first parser type the system finds. xml.sax.make_parser( [parser_list] ) parser_list: The optional argument consisting of a list of parsers to use which must all implement the make_parser method. 
  • 39. parseMethodparseMethod Following method creates a SAX parser and uses it to parse a document. xml.sax.parse( xmlfile, contenthandler[, errorhandler]) Here is the detail of the parameters: xmlfile: This is the name of the XML file to read from. contenthandler: This must be a ContentHandler object. errorhandler: If specified, errorhandler must be a SAX ErrorHandler object.
  • 40. parseStringMethodparseStringMethod There is one more method to create a SAX parser and to parse the specified XML string. xml.sax.parseString (xmlstring, contenthandler[, errorhandler]) Here is the detail of the parameters: xmlstring: This is the name of the XML string to read from. contenthandler: This must be a ContentHandler object. errorhandler: If specified, errorhandler must be a SAX ErrorHandler object.
  • 41. Parsing XML with DOM APIsParsing XML with DOM APIs  The Document Object Model (DOM) is a cross-language API from the World Wide Web Consortium (W3C) for accessing and modifying XML documents.  The DOM is extremely useful for random-access applications.  SAX only allows you a view of one bit of the document at a time. If you are looking at one SAX element, you have no access to another.  Here is the easiest way to quickly load an XML document and to create a minidom object using the xml.dom module.  The minidom object provides a simple parser method that quickly creates a DOM tree from the XML file.  The sample phrase calls the parse( file [,parser] ) function of the minidom object to parse the XML file designated by file into a DOM tree object.