Python Module of the Week
                 Release 1.91




               Doug Hellmann




                     May 31, ...
CONTENTS



1   Data Persistence and Exchange                                                                             ...
7    File Formats                                                                                               123
     7...
14.3 pipes – Unix shell command pipeline templates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
   14.4 p...
iv
Python Module of the Week, Release 1.91


About Python Module of the Week

PyMOTW is a series of blog posts written by Dou...
Python Module of the Week, Release 1.91


      DOUG HELLMANN DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, IN-
 ...
CHAPTER

                                                                                                                O...
Python Module of the Week, Release 1.91


All access to the database is through the Python DBI 2.0 API, by default, as no ...
CHAPTER

                                                                                                                T...
Python Module of the Week, Release 1.91


LookupError

Base class for errors raised when something can’t be found.


Envir...
Python Module of the Week, Release 1.91



class NoAttributes(object):
    pass

o = NoAttributes()
print o.attribute


$ ...
Python Module of the Week, Release 1.91


FloatingPointError

Raised by floating point operations that result in errors, wh...
Python Module of the Week, Release 1.91


   1. If a module does not exist.

import module_does_not_exist


$ python excep...
Python Module of the Week, Release 1.91


KeyboardInterrupt

A KeyboardInterrupt occurs whenever the user presses Ctrl-C (...
Python Module of the Week, Release 1.91



1 3
(error, discarding existing list)
2 1
2 2
2 3
(error, discarding existing l...
Python Module of the Week, Release 1.91



  File quot;exceptions_NotImplementedError.pyquot;, line 18, in do_something
  ...
Python Module of the Week, Release 1.91



$ python exceptions_OverflowError.py
Regular integer: (maxint=2147483647)
No ov...
Python Module of the Week, Release 1.91


RuntimeError

A RuntimeError exception is used when no other more specific except...
Python Module of the Week, Release 1.91


SystemExit

When sys.exit() is called, it raises SystemExit instead of exiting i...
Python Module of the Week, Release 1.91


ValueError

A ValueError is used when a function receives a value that has the r...
CHAPTER

                                                                                                         THREE


...
Python Module of the Week, Release 1.91


3.1.2 Differ Example

Reproducing output similar to the diff command line tool i...
Python Module of the Week, Release 1.91



22: + adipiscing. Duis vulputate tristique enim. Donec quis lectus a justo
23: ...
Python Module of the Week, Release 1.91



 Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer
-eu lacus ac...
Python Module of the Week, Release 1.91



$ python difflib_html.py

     <table class=quot;diffquot; id=quot;difflib_chg_...
Python Module of the Week, Release 1.91


3.1.6 SequenceMatcher

SequenceMatcher, which implements the comparison algorith...
Python Module of the Week, Release 1.91



     if n.startswith(’_’):
         continue
     v = getattr(string, n)
     i...
Python Module of the Week, Release 1.91



import string

leet = string.maketrans(’abegiloprstz’, ’463611092572’)

s = ’Th...
Python Module of the Week, Release 1.91


One key difference between templates and standard string interpolation is that t...
Python Module of the Week, Release 1.91


Let’s look at the default pattern:

import string

t = string.Template(’$var’)
p...
Python Module of the Week, Release 1.91


See Also:

string Standard library documentation for this module.
PEP 292 Simple...
Python Module of the Week, Release 1.91


The application we are building at work includes a shell scripting interface in ...
Python Module of the Week, Release 1.91


The packed value is converted to a sequence of hex bytes for printing, since som...
Python Module of the Week, Release 1.91



Format string       :   @ I 2s f for native, native
Uses                :   12 ...
Python Module of the Week, Release 1.91



print ’After   :’, binascii.hexlify(a)
print ’Unpacked:’, s.unpack_from(a, 0)

...
Python Module of the Week, Release 1.91



import textwrap
from textwrap_example import sample_text

print ’No dedent:n’
p...
Python Module of the Week, Release 1.91


3.5.4 Combining Dedent and Fill

Next, let’s see what happens if we take the ded...
Python Module of the Week, Release 1.91


This makes it relatively easy to produce a hanging indent, where the first line i...
CHAPTER

                                                                                                                F...
Python Module of the Week, Release 1.91



a = array.array(’i’, xrange(5))
print ’Initial :’, a

a.extend(xrange(5))
print...
Python Module of the Week, Release 1.91


4.1.4 Alternate Byte Ordering

If the data in the array is not in the native byt...
Python Module of the Week, Release 1.91


4.2.1 Times

Time values are represented with the time class. Times have attribu...
Python Module of the Week, Release 1.91


4.2.2 Dates

Basic date values are represented with the date class. Instances ha...
Python Module of the Week, Release 1.91


The resolution for dates is whole days.

$ python datetime_date_minmax.py
Earlie...
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
PyMOTW-1
Upcoming SlideShare
Loading in...5
×

PyMOTW-1

4,875

Published on

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
4,875
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
72
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Transcript of "PyMOTW-1"

  1. 1. Python Module of the Week Release 1.91 Doug Hellmann May 31, 2009
  2. 2. CONTENTS 1 Data Persistence and Exchange 3 1.1 Serializing Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2 Storing Serialized Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Relational Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.4 Data Exchange Through Standard Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Builtin Objects 5 2.1 exceptions – Builtin error classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3 String Services 17 3.1 difflib – Compute differences between sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.2 string – Working with text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.3 StringIO and cStringIO – Work with text buffers using file-like API . . . . . . . . . . . . . . . . . . 27 3.4 struct – Working with Binary Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.5 textwrap – Formatting text paragraphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4 Data Types 35 4.1 array – Sequence of fixed-type data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2 datetime – Date/time value manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.3 calendar – Work with dates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.4 collections – Container data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.5 heapq – In-place heap sort algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.6 bisect – Maintain lists in sorted order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.7 sched – Generic event scheduler. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.8 Queue – A thread-safe FIFO implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4.9 weakref – Garbage-collectable references to objects . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.10 copy – Duplicate objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.11 pprint – Pretty-print data structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 5 Numeric and Mathematical Modules 81 5.1 itertools – Iterator functions for efficient looping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 5.2 functools – Tools for making decorators and other function wrappers. . . . . . . . . . . . . . . . . . 89 5.3 operator – Functional interface to built-in operators. . . . . . . . . . . . . . . . . . . . . . . . . . . 95 6 Internet Data Handling 103 6.1 base64 – Encode binary data into ASCII characters . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.2 json – JavaScript Object Notation Serializer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 6.3 mailbox – Access and manipulate email archives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 6.4 mhlib – Work with MH mailboxes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 i
  3. 3. 7 File Formats 123 7.1 csv – Comma-separated value files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 7.2 ConfigParser – Work with configuration files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 8 Cryptographic Services 131 8.1 hashlib – Cryptographic hashes and message digests . . . . . . . . . . . . . . . . . . . . . . . . . . 131 8.2 hmac – Cryptographic signature and verification of messages. . . . . . . . . . . . . . . . . . . . . . 134 9 File and Directory Access 139 9.1 os.path – Platform-independent manipulation of file names. . . . . . . . . . . . . . . . . . . . . . . 139 9.2 fileinput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 9.3 filecmp – Compare files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 9.4 tempfile – Create temporary filesystem resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 9.5 glob – Filename pattern matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 9.6 fnmatch – Compare filenames against Unix-style glob patterns. . . . . . . . . . . . . . . . . . . . . 158 9.7 linecache – Read text files efficiently . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 9.8 shutil – High-level file operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 9.9 dircache – Cache directory listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 10 Data Compression and Archiving 171 10.1 bz2 – bzip2 compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 10.2 gzip – Read and write GNU zip files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 10.3 tarfile – Tar archive access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 10.4 zipfile – Read and write ZIP archive files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 10.5 zlib – Low-level access to GNU zlib compression library . . . . . . . . . . . . . . . . . . . . . . . . 198 11 Data Persistence 207 11.1 anydbm – Access to DBM-style databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 11.2 dbhash – DBM-style API for the BSD database library . . . . . . . . . . . . . . . . . . . . . . . . . 209 11.3 dbm – Simple database interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 11.4 dumbdbm – Portable DBM Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 11.5 gdbm – GNU’s version of the dbm library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 11.6 pickle and cPickle – Python object serialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 11.7 shelve – Persistent storage of arbitrary Python objects . . . . . . . . . . . . . . . . . . . . . . . . . 216 11.8 whichdb – Identify DBM-style database formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 12 Generic Operating System Services 221 12.1 os – Portable access to operating system specific features. . . . . . . . . . . . . . . . . . . . . . . . 221 12.2 time – Functions for manipulating clock time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237 12.3 optparse – Command line option parser to replace getopt. . . . . . . . . . . . . . . . . . . . . . . . 242 12.4 getopt – Command line option parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246 12.5 logging – Report status, error, and informational messages. . . . . . . . . . . . . . . . . . . . . . . . 251 12.6 getpass – Prompt the user for a password without echoing. . . . . . . . . . . . . . . . . . . . . . . . 254 12.7 platform – Access system hardware, OS, and interpreter version information. . . . . . . . . . . . . . 256 13 Optional Operating System Services 261 13.1 threading – Manage concurrent threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 13.2 mmap – Memory-map files instead of reading the contents directly. . . . . . . . . . . . . . . . . . . 277 13.3 multiprocessing – Manage processes like threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280 13.4 readline – Interface to the GNU readline library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300 13.5 rlcompleter – Adds tab-completion to the interactive interpreter . . . . . . . . . . . . . . . . . . . . 310 14 Unix-specific Services 311 14.1 commands – Run external shell commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311 14.2 grp – Unix Group Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313 ii
  4. 4. 14.3 pipes – Unix shell command pipeline templates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 14.4 pwd – Unix Password Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 15 Interprocess Communication and Networking 325 15.1 asynchat – Asynchronous protocol handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 15.2 asyncore – Asynchronous I/O handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 15.3 signal – Receive notification of asynchronous system events . . . . . . . . . . . . . . . . . . . . . . 343 15.4 subprocess – Spawn and communicate with additional processes. . . . . . . . . . . . . . . . . . . . 348 16 Internet Protocols and Support 355 16.1 BaseHTTPServer – base classes for implementing web servers . . . . . . . . . . . . . . . . . . . . . 355 16.2 Cookie – HTTP Cookies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359 16.3 imaplib - IMAP4 client library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363 16.4 SimpleXMLRPCServer – Implements an XML-RPC server. . . . . . . . . . . . . . . . . . . . . . . 378 16.5 smtpd – Sample SMTP Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386 16.6 smtplib – Simple Mail Transfer Protocol client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390 16.7 SocketServer – Creating network servers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394 16.8 urllib – simple interface for network resource access . . . . . . . . . . . . . . . . . . . . . . . . . . 401 16.9 urlparse – Split URL into component pieces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407 16.10 uuid – Universally unique identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410 16.11 webbrowser – Displays web pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414 16.12 xmlrpclib – Client-side library for XML-RPC communication . . . . . . . . . . . . . . . . . . . . . 416 17 Internationalization 425 17.1 locale – POSIX cultural localization API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425 18 Program Frameworks 431 18.1 cmd – Create line-oriented command processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431 18.2 shlex – Lexical analysis of shell-style syntaxes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440 19 Development Tools 449 19.1 unittest – Automated testing framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449 20 Debugging and Profiling 457 20.1 profile, cProfile, and pstats – Performance analysis of Python programs. . . . . . . . . . . . . . . . . 457 20.2 timeit – Time the execution of small bits of Python code. . . . . . . . . . . . . . . . . . . . . . . . . 462 20.3 trace – Follow Python statements as they are executed . . . . . . . . . . . . . . . . . . . . . . . . . 468 21 Python Runtime Services 475 21.1 atexit – Call functions when a program is closing down . . . . . . . . . . . . . . . . . . . . . . . . . 475 21.2 contextlib – Context manager utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478 21.3 inspect – Inspect live objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481 21.4 traceback – Extract, format, and print exceptions and stack traces. . . . . . . . . . . . . . . . . . . . 491 21.5 warnings – Non-fatal alerts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496 22 Importing Modules 503 22.1 imp – Interface to module import mechanism. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503 22.2 pkgutil – Extend the module search path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506 22.3 zipimport – Load Python code from inside ZIP archives . . . . . . . . . . . . . . . . . . . . . . . . 512 23 Miscelaneous 517 23.1 EasyDialogs – Carbon dialogs for Mac OS X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517 Module Index 531 Index 533 iii
  5. 5. iv
  6. 6. Python Module of the Week, Release 1.91 About Python Module of the Week PyMOTW is a series of blog posts written by Doug Hellmann. It was started as a way to build the habit of writing something on a regular basis. The focus of the series is building a set of example code for the modules in the Python standard library. Complete documentation for the standard library can be found on the Python web site at http://docs.python.org/library/contents.html. Tools The source text for PyMOTW is reStructuredText and the HTML and PDF output are created using Sphinx. Subscribe As new articles are written, they are posted to my blog. Updates are available by RSS and email. See the project home page for details (http://www.doughellmann.com/PyMOTW/). Translations and Other Versions Ernesto Rico Schmidt provides a Spanish translation that follows the English version posts. Ernesto is in Bolivia, and is translating these examples as a way to contribute to the members of the Bolivian Free Software community who use Python. The full list of articles available in Spanish can be found at http://denklab.org/articles/category/pymotw/, and of course there is an RSS feed. Gerard Flanagan is working on a “Python compendium” called The Hazel Tree. He is converting a collection of old and new of Python-related reference material into reStructuredText and then building a single searchable repository from the results. I am very pleased to have PyMOTW included with works from authors like Mark Pilgrim, Fredrik Lundh, Andrew Kuchling, and a growing list of others. Other Contributors Thank you to John Benediktsson for the original HTML-to-reST conversion. Copyright All of the prose from the Python Module of the Week is licensed under a Creative Commons Attribution, Non- commercial, Share-alike 3.0 license. You are free to share and create derivative works from it. If you post the material online, you must give attribution and link to the PyMOTW home page (http://www.doughellmann.com/PyMOTW/). You may not use this work for commercial purposes. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one. The source code included here is copyright Doug Hellmann and licensed under the BSD license. Copyright Doug Hellmann, All Rights Reserved Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Doug Hellmann not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. CONTENTS 1
  7. 7. Python Module of the Week, Release 1.91 DOUG HELLMANN DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, IN- CLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL DOUG HELLMANN BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. This section of the PyMOTW guide includes feature-based introductions to several modules in the standard library, organized by what your goal might be. Each article may include cross-references to several modules from different parts of the library, and show how they relate to one another. 2 CONTENTS
  8. 8. CHAPTER ONE DATA PERSISTENCE AND EXCHANGE Python provides several modules for storing data. There are basically two aspects to persistence: converting the in-memory object back and forth into a format for saving it, and working with the storage of the converted data. 1.1 Serializing Objects Python includes two modules capable of converting objects into a transmittable or storable format (serializing): pickle and json. It is most common to use pickle, since there is a fast C implementation and it is integrated with some of the other standard library modules that actually store the serialized data, such as shelve. Web-based applications may want to examine json, however, since it integrates better with some of the existing web service storage applications. 1.2 Storing Serialized Objects Once the in-memory object is converted to a storable format, the next step is to decide how to store the data. A simple flat-file with serialized objects written one after the other works for data that does not need to be indexed in any way. But Python includes a collection of modules for storing key-value pairs in a simple database using one of the DBM format variants. The simplest interface to take advantage of the DBM format is provided by shelve. Simply open the shelve file, and access it through a dictionary-like API. Objects saved to the shelve are automatically pickled and saved without any extra work on your part. One drawback of shelve is that with the default interface you can’t guarantee which DBM format will be used. That won’t matter if your application doesn’t need to share the database files between hosts with different libraries, but if that is needed you can use one of the classes in the module to ensure a specific format is selected (Specific Shelf Types). If you’re going to be passing a lot of data around via JSON anyway, using json and anydbm can provide another persistence mechanism. Since the DBM database keys and values must be strings, however, the objects won’t be automatically re-created when you access the value in the database. 1.3 Relational Databases The excellent sqlite3 in-process relational database is available with most Python distributions. It stores its database in memory or in a local file, and all access is from within the same process, so there is no network lag. The compact nature of sqlite3 makes it especially well suited for embedding in desktop applications or develop- ment versions of web apps. 3
  9. 9. Python Module of the Week, Release 1.91 All access to the database is through the Python DBI 2.0 API, by default, as no object relational mapper (ORM) is included. The most popular general purpose ORM is SQLAlchemy, but others such as Django’s native ORM layer also support SQLite. SQLAlchemy is easy to install and set up, but if your objects aren’t very complicated and you are worried about overhead, you may want to use the DBI interface directly. 1.4 Data Exchange Through Standard Formats Although not usually considered a true persistence format csv, or comma-separated-value, files can be an effective way to migrate data between applications. Most spreadsheet programs and databases support both export and import using CSV, so dumping data to a CSV file is frequently the simplest way to move data out of your application and into an analysis tool. 4 Chapter 1. Data Persistence and Exchange
  10. 10. CHAPTER TWO BUILTIN OBJECTS 2.1 exceptions – Builtin error classes Purpose The exceptions module defines the builtin errors used throughout the standard library and by the interpreter. Python Version 1.5 and later 2.1.1 Description In the past, Python has supported simple string messages as exceptions as well as classes. Since 1.5, all of the standard library modules use classes for exceptions. Starting with Python 2.5, string exceptions result in a DeprecationWarning, and support for string exceptions will be removed in the future. 2.1.2 Base Classes The exception classes are defined in a hierarchy, described in the standard library documentation. In addition to the obvious organizational benefits, exception inheritance is useful because related exceptions can be caught by catching their base class. In most cases, these base classes are not intended to be raised directly. BaseException Base class for all exceptions. Implements logic for creating a string representation of the exception using str() from the arguments passed to the constructor. Exception Base class for exceptions that do not result in quitting the running application. All user-defined exceptions should use Exception as a base class. StandardError Base class for builtin exceptions used in the standard library. ArithmeticError Base class for math-related errors. 5
  11. 11. Python Module of the Week, Release 1.91 LookupError Base class for errors raised when something can’t be found. EnvironmentError Base class for errors that come from outside of Python (the operating system, filesystem, etc.). 2.1.3 Raised Exceptions AssertionError An AssertionError is raised by a failed assert statement. assert False, ’The assertion failed’ $ python exceptions_AssertionError_assert.py Traceback (most recent call last): File quot;exceptions_AssertionError_assert.pyquot;, line 12, in <module> assert False, ’The assertion failed’ AssertionError: The assertion failed It is also used in the unittest module in methods like failIf(). import unittest class AssertionExample(unittest.TestCase): def test(self): self.failUnless(False) unittest.main() $ python exceptions_AssertionError_unittest.py F ====================================================================== FAIL: test (__main__.AssertionExample) ---------------------------------------------------------------------- Traceback (most recent call last): File quot;exceptions_AssertionError_unittest.pyquot;, line 17, in test self.failUnless(False) AssertionError ---------------------------------------------------------------------- Ran 1 test in 0.000s FAILED (failures=1) AttributeError When an attribute reference or assignment fails, AttributeError is raised. For example, when trying to reference an attribute that does not exist: 6 Chapter 2. Builtin Objects
  12. 12. Python Module of the Week, Release 1.91 class NoAttributes(object): pass o = NoAttributes() print o.attribute $ python exceptions_AttributeError.py Traceback (most recent call last): File quot;exceptions_AttributeError.pyquot;, line 16, in <module> print o.attribute AttributeError: ’NoAttributes’ object has no attribute ’attribute’ Or when trying to modify a read-only attribute: class MyClass(object): @property def attribute(self): return ’This is the attribute value’ o = MyClass() print o.attribute o.attribute = ’New value’ $ python exceptions_AttributeError_assignment.py This is the attribute value Traceback (most recent call last): File quot;exceptions_AttributeError_assignment.pyquot;, line 20, in <module> o.attribute = ’New value’ AttributeError: can’t set attribute EOFError An EOFError is raised when a builtin function like input() or raw_input() do not read any data before en- countering the end of their input stream. The file methods like read() return an empty string at the end of the file. while True: data = raw_input(’prompt:’) print ’READ:’, data $ echo hello | python PyMOTW/exceptions/exceptions_EOFError.py prompt:READ: hello prompt:Traceback (most recent call last): File quot;PyMOTW/exceptions/exceptions_EOFError.pyquot;, line 13, in <module> data = raw_input(’prompt:’) EOFError: EOF when reading a line 2.1. exceptions – Builtin error classes 7
  13. 13. Python Module of the Week, Release 1.91 FloatingPointError Raised by floating point operations that result in errors, when floating point exception control (fpectl) is turned on. Enabling fpectl requires an interpreter compiled with the --with-fpectl flag. Using fpectl is discouraged in the stdlib docs. import math import fpectl print ’Control off:’, math.exp(1000) fpectl.turnon_sigfpe() print ’Control on:’, math.exp(1000) GeneratorExit Raised inside a generator the generator’s close() method is called. def my_generator(): try: for i in range(5): print ’Yielding’, i yield i except GeneratorExit: print ’Exiting early’ g = my_generator() print g.next() g.close() $ python exceptions_GeneratorExit.py Yielding 0 0 Exiting early IOError Raised when input or output fails, for example if a disk fills up or an input file does not exist. f = open(’/does/not/exist’, ’r’) $ python exceptions_IOError.py Traceback (most recent call last): File quot;exceptions_IOError.pyquot;, line 12, in <module> f = open(’/does/not/exist’, ’r’) IOError: [Errno 2] No such file or directory: ’/does/not/exist’ ImportError Raised when a module, or member of a module, cannot be imported. There are a few conditions where an ImportError might be raised. 8 Chapter 2. Builtin Objects
  14. 14. Python Module of the Week, Release 1.91 1. If a module does not exist. import module_does_not_exist $ python exceptions_ImportError_nomodule.py Traceback (most recent call last): File quot;exceptions_ImportError_nomodule.pyquot;, line 12, in <module> import module_does_not_exist ImportError: No module named module_does_not_exist 1. If from X import Y is used and Y cannot be found inside the module X, an ImportError is raised. from exceptions import MadeUpName $ python exceptions_ImportError_missingname.py Traceback (most recent call last): File quot;exceptions_ImportError_missingname.pyquot;, line 12, in <module> from exceptions import MadeUpName ImportError: cannot import name MadeUpName IndexError An IndexError is raised when a sequence reference is out of range. my_seq = [ 0, 1, 2 ] print my_seq[3] $ python exceptions_IndexError.py Traceback (most recent call last): File quot;exceptions_IndexError.pyquot;, line 13, in <module> print my_seq[3] IndexError: list index out of range KeyError Similarly, a KeyError is raised when a value is not found as a key of a dictionary. d = { ’a’:1, ’b’:2 } print d[’c’] $ python exceptions_KeyError.py Traceback (most recent call last): File quot;exceptions_KeyError.pyquot;, line 13, in <module> print d[’c’] KeyError: ’c’ 2.1. exceptions – Builtin error classes 9
  15. 15. Python Module of the Week, Release 1.91 KeyboardInterrupt A KeyboardInterrupt occurs whenever the user presses Ctrl-C (or Delete) to stop a running program. Unlike most of the other exceptions, KeyboardInterrupt inherits directly from BaseException to avoid being caught by global exception handlers that catch Exception. try: print ’Press Return or Ctrl-C:’, ignored = raw_input() except Exception, err: print ’Caught exception:’, err except KeyboardInterrupt, err: print ’Caught KeyboardInterrupt’ else: print ’No exception’ Pressing Ctrl-C at the prompt causes a KeyboardInterrupt exception. $ python exceptions_KeyboardInterrupt.py Press Return or Ctrl-C: ^CCaught KeyboardInterrupt MemoryError If your program runs out of memory and it is possible to recover (by deleting some objects, for example), a Memory- Error is raised. import itertools # Try to create a MemoryError by allocating a lot of memory l = [] for i in range(3): try: for j in itertools.count(1): print i, j l.append(’*’ * (2**30)) except MemoryError: print ’(error, discarding existing list)’ l = [] $ python exceptions_MemoryError.py python(49670) malloc: *** mmap(size=1073745920) failed (error code=12) *** error: can’t allocate region *** set a breakpoint in malloc_error_break to debug python(49670) malloc: *** mmap(size=1073745920) failed (error code=12) *** error: can’t allocate region *** set a breakpoint in malloc_error_break to debug python(49670) malloc: *** mmap(size=1073745920) failed (error code=12) *** error: can’t allocate region *** set a breakpoint in malloc_error_break to debug 0 1 0 2 0 3 (error, discarding existing list) 1 1 1 2 10 Chapter 2. Builtin Objects
  16. 16. Python Module of the Week, Release 1.91 1 3 (error, discarding existing list) 2 1 2 2 2 3 (error, discarding existing list) NameError NameErrors are raised when your code refers to a name that does not exist in the current scope. For example, an unqualified variable name. def func(): print unknown_name func() $ python exceptions_NameError.py Traceback (most recent call last): File quot;exceptions_NameError.pyquot;, line 15, in <module> func() File quot;exceptions_NameError.pyquot;, line 13, in func print unknown_name NameError: global name ’unknown_name’ is not defined NotImplementedError User-defined base classes can raise NotImplementedError to indicate that a method or behavior needs to be defined by a subclass, simulating an interface. class BaseClass(object): quot;quot;quot;Defines the interfacequot;quot;quot; def __init__(self): super(BaseClass, self).__init__() def do_something(self): quot;quot;quot;The interface, not implementedquot;quot;quot; raise NotImplementedError(self.__class__.__name__ + ’.do_something’) class SubClass(BaseClass): quot;quot;quot;Implementes the interfacequot;quot;quot; def do_something(self): quot;quot;quot;really does somethingquot;quot;quot; print self.__class__.__name__ + ’ doing something!’ SubClass().do_something() BaseClass().do_something() $ python exceptions_NotImplementedError.py SubClass doing something! Traceback (most recent call last): File quot;exceptions_NotImplementedError.pyquot;, line 27, in <module> BaseClass().do_something() 2.1. exceptions – Builtin error classes 11
  17. 17. Python Module of the Week, Release 1.91 File quot;exceptions_NotImplementedError.pyquot;, line 18, in do_something raise NotImplementedError(self.__class__.__name__ + ’.do_something’) NotImplementedError: BaseClass.do_something OSError OSError serves as the error class for the os module, and is raised when an error comes back from an os-specific function. import os for i in range(10): print i, os.ttyname(i) $ python exceptions_OSError.py 0 /dev/ttys000 1 Traceback (most recent call last): File quot;exceptions_OSError.pyquot;, line 15, in <module> print i, os.ttyname(i) OSError: [Errno 25] Inappropriate ioctl for device OverflowError When an arithmetic operation exceeds the limits of the variable type, an OverflowError is raise. Long integers allocate more space as values grow, so they end up raising MemoryError. Floating point exception handling is not standardized, so floats are not checked. Regular integers are converted to long values as needed. import sys print ’Regular integer: (maxint=%s)’ % sys.maxint try: i = sys.maxint * 3 print ’No overflow for ’, type(i), ’i =’, i except OverflowError, err: print ’Overflowed at ’, i, err print print ’Long integer:’ for i in range(0, 100, 10): print ’%2d’ % i, 2L ** i print print ’Floating point values:’ try: f = 2.0**i for i in range(100): print i, f f = f ** 2 except OverflowError, err: print ’Overflowed after ’, f, err 12 Chapter 2. Builtin Objects
  18. 18. Python Module of the Week, Release 1.91 $ python exceptions_OverflowError.py Regular integer: (maxint=2147483647) No overflow for <type ’long’> i = 6442450941 Long integer: 0 1 10 1024 20 1048576 30 1073741824 40 1099511627776 50 1125899906842624 60 1152921504606846976 70 1180591620717411303424 80 1208925819614629174706176 90 1237940039285380274899124224 Floating point values: 0 1.23794003929e+27 1 1.53249554087e+54 2 2.34854258277e+108 3 5.5156522631e+216 Overflowed after 5.5156522631e+216 (34, ’Result too large’) ReferenceError When a weakref proxy is used to access an object that has already been garbage collected, a ReferenceError occurs. import gc import weakref class ExpensiveObject(object): def __init__(self, name): self.name = name def __del__(self): print ’(Deleting %s)’ % self obj = ExpensiveObject(’obj’) p = weakref.proxy(obj) print ’BEFORE:’, p.name obj = None print ’AFTER:’, p.name $ python exceptions_ReferenceError.py BEFORE: obj (Deleting <__main__.ExpensiveObject object at 0xcd170>) AFTER: Traceback (most recent call last): File quot;exceptions_ReferenceError.pyquot;, line 26, in <module> print ’AFTER:’, p.name ReferenceError: weakly-referenced object no longer exists 2.1. exceptions – Builtin error classes 13
  19. 19. Python Module of the Week, Release 1.91 RuntimeError A RuntimeError exception is used when no other more specific exception applies. The interpreter does not raise this exception itself very often, but some user code does. StopIteration When an iterator is done, it’s next() method raises StopIteration. This exception is not considered an error. l=[0,1,2] i=iter(l) print i print i.next() print i.next() print i.next() print i.next() $ python exceptions_StopIteration.py <listiterator object at 0x87db0> 0 1 2 Traceback (most recent call last): File quot;exceptions_StopIteration.pyquot;, line 19, in <module> print i.next() StopIteration SyntaxError A SyntaxError occurs any time the parser finds source code it does not understand. This can be while importing a module, invoking exec, or calling eval(). Attributes of the exception can be used to find exactly what part of the input text caused the exception. try: print eval(’five times three’) except SyntaxError, err: print ’Syntax error %s (%s-%s): %s’ % (err.filename, err.lineno, err.offset, err.text) print err $ python exceptions_SyntaxError.py Syntax error <string> (1-10): five times three invalid syntax (<string>, line 1) SystemError When an error occurs in the interpreter itself and there is some chance of continuing to run successfully, it raises a SystemError. SystemErrors probably indicate a bug in the interpreter and should be reported to the maintainer. 14 Chapter 2. Builtin Objects
  20. 20. Python Module of the Week, Release 1.91 SystemExit When sys.exit() is called, it raises SystemExit instead of exiting immediately. This allows cleanup code in try:finally blocks to run and callers (like debuggers and test frameworks) to catch the exception and avoid exiting. TypeError TypeErrors are caused by combining the wrong type of objects, or calling a function with the wrong type of object. result = (’tuple’,) + ’string’ $ python exceptions_TypeError.py Traceback (most recent call last): File quot;exceptions_TypeError.pyquot;, line 12, in <module> result = (’tuple’,) + ’string’ TypeError: can only concatenate tuple (not quot;strquot;) to tuple UnboundLocalError An UnboundLocalError is a type of NameError specific to local variable names. def throws_global_name_error(): print unknown_global_name def throws_unbound_local(): local_val = local_val + 1 print local_val try: throws_global_name_error() except NameError, err: print ’Global name error:’, err try: throws_unbound_local() except UnboundLocalError, err: print ’Local name error:’, err The difference between the global NameError and the UnboundLocal is the way the name is used. Because the name “local_val” appears on the left side of an expression, it is interpreted as a local variable name. $ python exceptions_UnboundLocalError.py Global name error: global name ’unknown_global_name’ is not defined Local name error: local variable ’local_val’ referenced before assignment UnicodeError UnicodeError is a subclass of ValueError and is raised when a Unicode problem occurs. There are separate subclasses for UnicodeEncodeError, UnicodeDecodeError, and UnicodeTranslateError. 2.1. exceptions – Builtin error classes 15
  21. 21. Python Module of the Week, Release 1.91 ValueError A ValueError is used when a function receives a value that has the right type but an invalid value. print chr(1024) $ python exceptions_ValueError.py Traceback (most recent call last): File quot;exceptions_ValueError.pyquot;, line 12, in <module> print chr(1024) ValueError: chr() arg not in range(256) ZeroDivisionError When zero shows up in the denominator of a division operation, a ZeroDivisionError is raised. print 1/0 $ python exceptions_ZeroDivisionError.py Traceback (most recent call last): File quot;exceptions_ZeroDivisionError.pyquot;, line 12, in <module> print 1/0 ZeroDivisionError: integer division or modulo by zero 2.1.4 Warning Categories There are also several exceptions defined for use with the warnings module. Warning The base class for all warnings. UserWarning Base class for warnings coming from user code. DeprecationWarning Used for features no longer being maintained. PendingDeprecationWarning Used for features that are soon going to be deprecated. SyntaxWarning Used for questionable syntax. RuntimeWarning Used for events that happen at runtime that might cause problems. FutureWarning Warning about changes to the language or library that are coming at a later time. ImportWarning Warn about problems importing a module. UnicodeWarning Warn about problems with unicode text. See Also: exceptions The standard library documentation for this module. 16 Chapter 2. Builtin Objects
  22. 22. CHAPTER THREE STRING SERVICES 3.1 difflib – Compute differences between sequences Purpose Library of tools for computing and working with differences between sequences, especially of lines in text files. Python Version 2.1 The SequenceMatcher class compares any 2 sequences of values, as long as the values are hashable. It uses a recursive algorithm to identify the longest contiguous matching blocks from the sequences, eliminating “junk” values. The Dif- fer class works on sequences of text lines and produces human-readable deltas, including differences within individual lines. The HtmlDiff class produces similar results formatted as an HTML table. 3.1.1 Test Data The examples below will all use this common test data in the difflib_data module: text1 = quot;quot;quot;Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer eu lacus accumsan arcu fermentum euismod. Donec pulvinar porttitor tellus. Aliquam venenatis. Donec facilisis pharetra tortor. In nec mauris eget magna consequat convallis. Nam sed sem vitae odio pellentesque interdum. Sed consequat viverra nisl. Suspendisse arcu metus, blandit quis, rhoncus ac, pharetra eget, velit. Mauris urna. Morbi nonummy molestie orci. Praesent nisi elit, fringilla ac, suscipit non, tristique vel, mauris. Curabitur vel lorem id nisl porta adipiscing. Suspendisse eu lectus. In nunc. Duis vulputate tristique enim. Donec quis lectus a justo imperdiet tempus.quot;quot;quot; text1_lines = text1.splitlines() text2 = quot;quot;quot;Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer eu lacus accumsan arcu fermentum euismod. Donec pulvinar, porttitor tellus. Aliquam venenatis. Donec facilisis pharetra tortor. In nec mauris eget magna consequat convallis. Nam cras vitae mi vitae odio pellentesque interdum. Sed consequat viverra nisl. Suspendisse arcu metus, blandit quis, rhoncus ac, pharetra eget, velit. Mauris urna. Morbi nonummy molestie orci. Praesent nisi elit, fringilla ac, suscipit non, tristique vel, mauris. Curabitur vel lorem id nisl porta adipiscing. Duis vulputate tristique enim. Donec quis lectus a justo imperdiet tempus. Suspendisse eu lectus. In nunc. quot;quot;quot; text2_lines = text2.splitlines() 17
  23. 23. Python Module of the Week, Release 1.91 3.1.2 Differ Example Reproducing output similar to the diff command line tool is simple with the Differ class: import difflib from difflib_data import * d = difflib.Differ() diff = d.compare(text1_lines, text2_lines) print ’n’.join(list(diff)) The output includes the original input values from both lists, including common values, and markup data to indicate what changes were made. Lines may be prefixed with - to indicate that they were in the first sequence, but not the second. Lines prefixed with + were in the second sequence, but not the first. If a line has an incremental change between versions, an extra line prefixed with ? is used to try to indicate where the change occurred within the line. If a line has not changed, it is printed with an extra blank space on the left column to let it line up with the other lines which may have other markup. The beginning of both text segments is the same. 1: Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer The second line has been changed to include a comma in the modified text. Both versions of the line are printed, with the extra information on line 4 showing the column where the text was modified, including the fact that the , character was added. 2: - eu lacus accumsan arcu fermentum euismod. Donec pulvinar porttitor 3: + eu lacus accumsan arcu fermentum euismod. Donec pulvinar, porttitor 4: ? + 5: Lines 6-9 of the output shows where an extra space was removed. 6: - tellus. Aliquam venenatis. Donec facilisis pharetra tortor. In nec 7: ? - 8: 9: + tellus. Aliquam venenatis. Donec facilisis pharetra tortor. In nec Next a more complex change was made, replacing several words in a phrase. 10: - mauris eget magna consequat convallis. Nam sed sem vitae odio 11: ? - -- 12: 13: + mauris eget magna consequat convallis. Nam cras vitae mi vitae odio 14: ? +++ +++++ + 15: The last sentence in the paragraph was changed significantly, so the difference is represented by simply removing the old version and adding the new (lines 20-23). 16: pellentesque interdum. Sed consequat viverra nisl. Suspendisse arcu 17: metus, blandit quis, rhoncus ac, pharetra eget, velit. Mauris 18: urna. Morbi nonummy molestie orci. Praesent nisi elit, fringilla ac, 19: suscipit non, tristique vel, mauris. Curabitur vel lorem id nisl porta 20: - adipiscing. Suspendisse eu lectus. In nunc. Duis vulputate tristique 21: - enim. Donec quis lectus a justo imperdiet tempus. 18 Chapter 3. String Services
  24. 24. Python Module of the Week, Release 1.91 22: + adipiscing. Duis vulputate tristique enim. Donec quis lectus a justo 23: + imperdiet tempus. Suspendisse eu lectus. In nunc. The ndiff() function produces essentially the same output. The processing is specifically tailored to working with text data and eliminating “noise” in the input. import difflib from difflib_data import * diff = difflib.ndiff(text1_lines, text2_lines) print ’n’.join(list(diff)) $ python difflib_ndiff.py Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer - eu lacus accumsan arcu fermentum euismod. Donec pulvinar porttitor + eu lacus accumsan arcu fermentum euismod. Donec pulvinar, porttitor ? + - tellus. Aliquam venenatis. Donec facilisis pharetra tortor. In nec ? - + tellus. Aliquam venenatis. Donec facilisis pharetra tortor. In nec - mauris eget magna consequat convallis. Nam sed sem vitae odio ? ------ + mauris eget magna consequat convallis. Nam cras vitae mi vitae odio ? +++ +++++++++ pellentesque interdum. Sed consequat viverra nisl. Suspendisse arcu metus, blandit quis, rhoncus ac, pharetra eget, velit. Mauris urna. Morbi nonummy molestie orci. Praesent nisi elit, fringilla ac, suscipit non, tristique vel, mauris. Curabitur vel lorem id nisl porta - adipiscing. Suspendisse eu lectus. In nunc. Duis vulputate tristique - enim. Donec quis lectus a justo imperdiet tempus. + adipiscing. Duis vulputate tristique enim. Donec quis lectus a justo + imperdiet tempus. Suspendisse eu lectus. In nunc. 3.1.3 Other Diff Formats Where the Differ class shows all of the inputs, a unified diff only includes modified lines and a bit of context. In version 2.3, a unified_diff() function was added to produce this sort of output: import difflib from difflib_data import * diff = difflib.unified_diff(text1_lines, text2_lines, lineterm=’’) print ’n’.join(list(diff)) The output should look familiar to users of svn or other version control tools: $ python difflib_unified.py --- +++ @@ -1,10 +1,10 @@ 3.1. difflib – Compute differences between sequences 19
  25. 25. Python Module of the Week, Release 1.91 Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer -eu lacus accumsan arcu fermentum euismod. Donec pulvinar porttitor -tellus. Aliquam venenatis. Donec facilisis pharetra tortor. In nec -mauris eget magna consequat convallis. Nam sed sem vitae odio +eu lacus accumsan arcu fermentum euismod. Donec pulvinar, porttitor +tellus. Aliquam venenatis. Donec facilisis pharetra tortor. In nec +mauris eget magna consequat convallis. Nam cras vitae mi vitae odio pellentesque interdum. Sed consequat viverra nisl. Suspendisse arcu metus, blandit quis, rhoncus ac, pharetra eget, velit. Mauris urna. Morbi nonummy molestie orci. Praesent nisi elit, fringilla ac, suscipit non, tristique vel, mauris. Curabitur vel lorem id nisl porta -adipiscing. Suspendisse eu lectus. In nunc. Duis vulputate tristique -enim. Donec quis lectus a justo imperdiet tempus. +adipiscing. Duis vulputate tristique enim. Donec quis lectus a justo +imperdiet tempus. Suspendisse eu lectus. In nunc. Using context_diff() produces similar readable output: $ python difflib_context.py *** --- *************** *** 1,10 **** Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer ! eu lacus accumsan arcu fermentum euismod. Donec pulvinar porttitor ! tellus. Aliquam venenatis. Donec facilisis pharetra tortor. In nec ! mauris eget magna consequat convallis. Nam sed sem vitae odio pellentesque interdum. Sed consequat viverra nisl. Suspendisse arcu metus, blandit quis, rhoncus ac, pharetra eget, velit. Mauris urna. Morbi nonummy molestie orci. Praesent nisi elit, fringilla ac, suscipit non, tristique vel, mauris. Curabitur vel lorem id nisl porta ! adipiscing. Suspendisse eu lectus. In nunc. Duis vulputate tristique ! enim. Donec quis lectus a justo imperdiet tempus. --- 1,10 ---- Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer ! eu lacus accumsan arcu fermentum euismod. Donec pulvinar, porttitor ! tellus. Aliquam venenatis. Donec facilisis pharetra tortor. In nec ! mauris eget magna consequat convallis. Nam cras vitae mi vitae odio pellentesque interdum. Sed consequat viverra nisl. Suspendisse arcu metus, blandit quis, rhoncus ac, pharetra eget, velit. Mauris urna. Morbi nonummy molestie orci. Praesent nisi elit, fringilla ac, suscipit non, tristique vel, mauris. Curabitur vel lorem id nisl porta ! adipiscing. Duis vulputate tristique enim. Donec quis lectus a justo ! imperdiet tempus. Suspendisse eu lectus. In nunc. 3.1.4 HTML Output HtmlDiff (new in Python 2.4) produces HTML output with the same information as the Diff class. This example uses make_table(), but the make_file() method produces a fully-formed HTML file as output. import difflib from difflib_data import * d = difflib.HtmlDiff() print d.make_table(text1_lines, text2_lines) 20 Chapter 3. String Services
  26. 26. Python Module of the Week, Release 1.91 $ python difflib_html.py <table class=quot;diffquot; id=quot;difflib_chg_to0__topquot; cellspacing=quot;0quot; cellpadding=quot;0quot; rules=quot;groupsquot; > <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup> <tbody> <tr><td class=quot;diff_nextquot; id=quot;difflib_chg_to0__0quot;><a href=quot;#difflib_chg_to0__0quot;>f</a></td <tr><td class=quot;diff_nextquot;><a href=quot;#difflib_chg_to0__1quot;>n</a></td><td class=quot;diff_headerquot; <tr><td class=quot;diff_nextquot;></td><td class=quot;diff_headerquot; id=quot;from0_3quot;>3</td><td nowrap=quot;now <tr><td class=quot;diff_nextquot; id=quot;difflib_chg_to0__1quot;></td><td class=quot;diff_headerquot; id=quot;from0_ <tr><td class=quot;diff_nextquot;></td><td class=quot;diff_headerquot; id=quot;from0_5quot;>5</td><td nowrap=quot;now <tr><td class=quot;diff_nextquot;></td><td class=quot;diff_headerquot; id=quot;from0_6quot;>6</td><td nowrap=quot;now <tr><td class=quot;diff_nextquot;></td><td class=quot;diff_headerquot; id=quot;from0_7quot;>7</td><td nowrap=quot;now <tr><td class=quot;diff_nextquot;></td><td class=quot;diff_headerquot; id=quot;from0_8quot;>8</td><td nowrap=quot;now <tr><td class=quot;diff_nextquot;><a href=quot;#difflib_chg_to0__topquot;>t</a></td><td class=quot;diff_heade <tr><td class=quot;diff_nextquot;></td><td class=quot;diff_headerquot; id=quot;from0_10quot;>10</td><td nowrap=quot;n </tbody> </table> 3.1.5 Junk Data All of the functions which produce diff sequences accept arguments to indicate which lines should be ignored, and which characters within a line should be ignored. This can be used to ignore markup or whitespace changes in two versions of file, for example. from difflib import SequenceMatcher A = quot; abcdquot; B = quot;abcd abcdquot; print ’A = quot;%squot;’ % A print ’B = quot;%squot;’ % B s = SequenceMatcher(None, A, B) i, j, k = s.find_longest_match(0, 5, 0, 9) print ’isjunk=None :’, (i, j, k), ’quot;%squot;’ % A[i:i+k], ’quot;%squot;’ % B[j:j+k] s = SequenceMatcher(lambda x: x==quot; quot;, A, B) i, j, k = s.find_longest_match(0, 5, 0, 9) print ’isjunk=(x==quot; quot;) :’, (i, j, k), ’quot;%squot;’ % A[i:i+k], ’quot;%squot;’ % B[j:j+k] The default for Differ is to not ignore any lines or characters explicitly, but to rely on the SequenceMatcher’s ability to detect noise. The default for ndiff is to ignore space and tab characters. $ python difflib_junk.py A = quot; abcdquot; B = quot;abcd abcdquot; isjunk=None : (0, 4, 5) quot; abcdquot; quot; abcdquot; isjunk=(x==quot; quot;) : (1, 0, 4) quot;abcdquot; quot;abcdquot; 3.1. difflib – Compute differences between sequences 21
  27. 27. Python Module of the Week, Release 1.91 3.1.6 SequenceMatcher SequenceMatcher, which implements the comparison algorithm, can be used with sequences of any type of object as long as the object is hashable. For example, two lists of integers can be compared, and using get_opcodes() a set of instructions for converting the original list into the newer can be printed: import difflib from difflib_data import * s1 = [ 1, 2, 3, 5, 6, 4 ] s2 = [ 2, 3, 5, 4, 6, 1 ] matcher = difflib.SequenceMatcher(None, s1, s2) for tag, i1, i2, j1, j2 in matcher.get_opcodes(): print (quot;%7s s1[%d:%d] (%s) s2[%d:%d] (%s)quot; % (tag, i1, i2, s1[i1:i2], j1, j2, s2[j1:j2])) $ python difflib_seq.py delete s1[0:1] ([1]) s2[0:0] ([]) equal s1[1:4] ([2, 3, 5]) s2[0:3] ([2, 3, 5]) insert s1[4:4] ([]) s2[3:4] ([4]) equal s1[4:5] ([6]) s2[4:5] ([6]) replace s1[5:6] ([4]) s2[5:6] ([1]) You can use SequenceMatcher with your own classes, as well as built-in types. See Also: difflib The standard library documentation for this module. Pattern Matching: The Gestalt Approach Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener published in Dr. Dobb’s Journal in July, 1988. 3.2 string – Working with text Purpose Contains constants and classes for working with text. Python Version 2.5 The string module dates from the earliest versions of Python. In version 2.0, many of the functions previously im- plemented only in the module were moved to methods of string objects. Legacy versions of those functions are still available, but their use is deprecated and they will be dropped in Python 3.0. The string module still contains several useful constants and classes for working with string and unicode objects, and this discussion will concentrate on them. 3.2.1 Constants The constants in the string module can be used to specify categories of characters such as ascii_letters and digits. Some of the constants are locale-dependent, such as lowercase, so the value changes to reflect the language settings of the user. Others, such as hexdigits, do not change when the locale changes. import string for n in dir(string): 22 Chapter 3. String Services
  28. 28. Python Module of the Week, Release 1.91 if n.startswith(’_’): continue v = getattr(string, n) if isinstance(v, basestring): print ’%s=%s’ % (n, repr(v)) print Most of the names for the constants are self-explanatory. $ python string_constants.py ascii_letters=’abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ’ ascii_lowercase=’abcdefghijklmnopqrstuvwxyz’ ascii_uppercase=’ABCDEFGHIJKLMNOPQRSTUVWXYZ’ digits=’0123456789’ hexdigits=’0123456789abcdefABCDEF’ letters=’abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ’ lowercase=’abcdefghijklmnopqrstuvwxyz’ octdigits=’01234567’ printable=’0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!quot;#$%&’()*+,-./:;<=>?@[]^ punctuation=’!quot;#$%&’()*+,-./:;<=>?@[]^_‘{|}~’ uppercase=’ABCDEFGHIJKLMNOPQRSTUVWXYZ’ whitespace=’tnx0bx0cr ’ 3.2.2 Functions There are two functions not moving from the string module. capwords() capitalizes all of the words in a string. import string s = ’The quick brown fox jumped over the lazy dog.’ print s print string.capwords(s) The results are the same as if you called split(), capitalized the words in the resulting list, then called join() to combine the results. $ python string_capwords.py The quick brown fox jumped over the lazy dog. The Quick Brown Fox Jumped Over The Lazy Dog. The other function creates translation tables that can be used with the translate() method to change one set of characters to another. 3.2. string – Working with text 23
  29. 29. Python Module of the Week, Release 1.91 import string leet = string.maketrans(’abegiloprstz’, ’463611092572’) s = ’The quick brown fox jumped over the lazy dog.’ print s print s.translate(leet) In this example, some letters are replaced by their l33t number alternatives. $ python string_maketrans.py The quick brown fox jumped over the lazy dog. Th3 qu1ck 620wn f0x jum93d 0v32 7h3 142y d06. 3.2.3 Templates String templates were added in Python 2.4 as part of PEP 292 and are intended as an alternative to the built-in interpolation syntax. With string.Template interpolation, variables are identified by name prefixed with $ (e.g., $var) or, if necessary to set them off from surrounding text, they can also be wrapped with curly braces (e.g., ${var}). This example compares a simple template with a similar string interpolation setup. import string values = { ’var’:’foo’ } t = string.Template(quot;quot;quot; $var $$ ${var}iable quot;quot;quot;) print ’TEMPLATE:’, t.substitute(values) s = quot;quot;quot; %(var)s %% %(var)siable quot;quot;quot; print ’INTERPLOATION:’, s % values As you see, in both cases the trigger character ($ or %) is escaped by repeating it twice. $ python string_template.py TEMPLATE: foo $ fooiable INTERPLOATION: foo % fooiable 24 Chapter 3. String Services
  30. 30. Python Module of the Week, Release 1.91 One key difference between templates and standard string interpolation is that the type of the arguments is not taken into account. The values are converted to strings and the strings are inserted. No formatting options are available. For example, there is no way to control the number of digits used to represent a floating point value. A benefit, though, is that by using the safe_substitute() method, it is possible to avoid exceptions if not all of the values needed by the template are provided as arguments. import string values = { ’var’:’foo’ } t = string.Template(quot;$var is here but $missing is not providedquot;) try: print ’TEMPLATE:’, t.substitute(values) except KeyError, err: print ’ERROR:’, str(err) print ’TEMPLATE:’, t.safe_substitute(values) Since there is no value for missing in the values dictionary, a KeyError is raised by substitute(). Instead of raising the error, safe_substitute() catches it and leaves the variable expression alone in the text. $ python string_template_missing.py TEMPLATE: ERROR: ’missing’ TEMPLATE: foo is here but $missing is not provided 3.2.4 Advanced Templates If the default syntax for string.Template is not to your liking, you can change the behavior by adjusting the regular expression patterns it uses to find the variable names in the template body. A simple way to do that is to change the delimiter and idpattern class attributes. import string class MyTemplate(string.Template): delimiter = ’%’ idpattern = ’[a-z]+_[a-z]+’ t = MyTemplate(’%% %with_underscore %notunderscored’) d = { ’with_underscore’:’replaced’, ’notunderscored’:’not replaced’, } print t.safe_substitute(d) In this example, variable ids must include an underscore somewhere in the middle, so %notunderscored is not replaced by anything. $ python string_template_advanced.py % replaced %notunderscored For more complex changes, you can override the pattern attribute and define an entirely new regular expression. The pattern provided must contain 4 named groups for capturing the escaped delimiter, the named variable, a braced version of the variable name, and invalid delimiter patterns. 3.2. string – Working with text 25
  31. 31. Python Module of the Week, Release 1.91 Let’s look at the default pattern: import string t = string.Template(’$var’) print t.pattern.pattern Since t.pattern is a compiled regular expression, we have to access its pattern attribute to see the actual string. $ python string_template_defaultpattern.py $(?: (?P<escaped>$) | # Escape sequence of two delimiters (?P<named>[_a-z][_a-z0-9]*) | # delimiter and a Python identifier {(?P<braced>[_a-z][_a-z0-9]*)} | # delimiter and a braced identifier (?P<invalid>) # Other ill-formed delimiter exprs ) If we wanted to create a new type of template using, for example, {{var}} as the variable syntax, we could use a pattern like this: import re import string class MyTemplate(string.Template): delimiter = ’{{’ pattern = r’’’ {{(?: (?P<escaped>{{)| (?P<named>[_a-z][_a-z0-9]*)}}| (?P<braced>[_a-z][_a-z0-9]*)}}| (?P<invalid>) ) ’’’ t = MyTemplate(’’’ {{{{ {{var}} ’’’) print ’MATCHES:’, t.pattern.findall(t.template) print ’SUBSTITUTED:’, t.safe_substitute(var=’replacement’) We still have to provide both the named and braced patterns, even though they are the same. Here’s the output: $ python string_template_newsyntax.py MATCHES: [(’{{’, ’’, ’’, ’’), (’’, ’var’, ’’, ’’)] SUBSTITUTED: {{ replacement 3.2.5 Deprecated Functions For information on the deprecated functions moved to the string and unicode classes, refer to String Methods in the manual. 26 Chapter 3. String Services
  32. 32. Python Module of the Week, Release 1.91 See Also: string Standard library documentation for this module. PEP 292 Simpler String Substitutions 3.3 StringIO and cStringIO – Work with text buffers using file-like API Purpose Work with text buffers using file-like API Python Version StringIO: 1.4, cStringIO: 1.5 The StringIO class provides a convenient means of working with text in-memory using the file API (read, write. etc.). There are 2 separate implementations. The cStringIO module is written in C for speed, while the StringIO module is written in Python for portability. Using cStringIO to build large strings can offer performance savings over some other string conctatenation techniques. 3.3.1 Example Here are some pretty standard, simple, examples of using StringIO buffers: # Find the best implementation available on this platform try: from cStringIO import StringIO except: from StringIO import StringIO # Writing to a buffer output = StringIO() output.write(’This goes into the buffer. ’) print >>output, ’And so does this.’ # Retrieve the value written print output.getvalue() output.close() # discard buffer memory # Initialize a read buffer input = StringIO(’Inital value for read buffer’) # Read from the buffer print input.read() This example uses read(), but of course the readline() and readlines() methods are also available. The StringIO class also provides a seek() method so it is possible to jump around in a buffer while reading, which can be useful for rewinding if you are using some sort of look-ahead parsing algorithm. $ python stringio_examples.py This goes into the buffer. And so does this. Inital value for read buffer Real world applications of StringIO include a web application stack where various parts of the stack may add text to the response, or testing the output generated by parts of a program which typically write to a file. 3.3. StringIO and cStringIO – Work with text buffers using file-like API 27
  33. 33. Python Module of the Week, Release 1.91 The application we are building at work includes a shell scripting interface in the form of several command line programs. Some of these programs are responsible for pulling data from the database and dumping it on the console (either to show the user, or so the text can serve as input to another command). The commands share a set of formatter plugins to produce a text representation of an object in a variety of ways (XML, bash syntax, human readable, etc.). Since the formatters normally write to standard output, testing the results would be a little tricky without the StringIO module. Using StringIO to intercept the output of the formatter gives us an easy way to collect the output in memory to compare against expected results. See Also: StringIO Standard library documentation for this module. The StringIO module ::: www.effbot.org effbot’s examples with StringIO Efficient String Concatenation in Python Examines various methods of combining strings and their relative merits. 3.4 struct – Working with Binary Data Purpose Convert between strings and binary data. Python Version 1.4 and later The struct module includes functions for converting between strings of bytes and native Python data types such as numbers and strings. 3.4.1 Functions vs. Struct Class There are a set of module-level functions for working with structured values, and there is also the Struct class (new in Python 2.5). Format specifiers are converted from their string format to a compiled representation, similar to the way regular expressions are. The conversion takes some resources, so it is typically more efficient to do it once when creating a Struct instance and call methods on the instance instead of using the module-level functions. All of the examples below use the Struct class. 3.4.2 Packing and Unpacking Structs support packing data into strings, and unpacking data from strings using format specifiers made up of characters representing the type of the data and optional count and endian-ness indicators. For complete details, refer to the standard library documentation. In this example, the format specifier calls for an integer or long value, a 2 character string, and a floating point number. The spaces between the format specifiers are included here for clarity, and are ignored when the format is compiled. import struct import binascii values = (1, ’ab’, 2.7) s = struct.Struct(’I 2s f’) packed_data = s.pack(*values) print ’Original values:’, values print ’Format string :’, s.format print ’Uses :’, s.size, ’bytes’ print ’Packed Value :’, binascii.hexlify(packed_data) 28 Chapter 3. String Services
  34. 34. Python Module of the Week, Release 1.91 The packed value is converted to a sequence of hex bytes for printing, since some of the characters are nulls. $ python struct_pack.py Original values: (1, ’ab’, 2.7000000000000002) Format string : I 2s f Uses : 12 bytes Packed Value : 0100000061620000cdcc2c40 If we pass the packed value to unpack(), we get basically the same values back (note the discrepancy in the floating point value). import struct import binascii packed_data = binascii.unhexlify(’0100000061620000cdcc2c40’) s = struct.Struct(’I 2s f’) unpacked_data = s.unpack(packed_data) print ’Unpacked Values:’, unpacked_data $ python struct_unpack.py Unpacked Values: (1, ’ab’, 2.7000000476837158) 3.4.3 Endianness By default values are encoded using the native C library notion of “endianness”. It is easy to override that choice by providing an explicit endianness directive in the format string. import struct import binascii values = (1, ’ab’, 2.7) print ’Original values:’, values endianness = [ (’@’, ’native, native’), (’=’, ’native, standard’), (’<’, ’little-endian’), (’>’, ’big-endian’), (’!’, ’network’), ] for code, name in endianness: s = struct.Struct(code + ’ I 2s f’) packed_data = s.pack(*values) print print ’Format string :’, s.format, ’for’, name print ’Uses :’, s.size, ’bytes’ print ’Packed Value :’, binascii.hexlify(packed_data) print ’Unpacked Value :’, s.unpack(packed_data) $ python struct_endianness.py Original values: (1, ’ab’, 2.7000000000000002) 3.4. struct – Working with Binary Data 29
  35. 35. Python Module of the Week, Release 1.91 Format string : @ I 2s f for native, native Uses : 12 bytes Packed Value : 0100000061620000cdcc2c40 Unpacked Value : (1, ’ab’, 2.7000000476837158) Format string : = I 2s f for native, standard Uses : 10 bytes Packed Value : 010000006162cdcc2c40 Unpacked Value : (1, ’ab’, 2.7000000476837158) Format string : < I 2s f for little-endian Uses : 10 bytes Packed Value : 010000006162cdcc2c40 Unpacked Value : (1, ’ab’, 2.7000000476837158) Format string : > I 2s f for big-endian Uses : 10 bytes Packed Value : 000000016162402ccccd Unpacked Value : (1, ’ab’, 2.7000000476837158) Format string : ! I 2s f for network Uses : 10 bytes Packed Value : 000000016162402ccccd Unpacked Value : (1, ’ab’, 2.7000000476837158) 3.4.4 Buffers Working with binary packed data is typically reserved for highly performance sensitive situations or passing data into and out of extension modules. One way to optimize is to avoid allocating a new buffer for each packed structure. The pack_into() and unpack_from() methods support writing to pre-allocated buffers directly. import struct import binascii s = struct.Struct(’I 2s f’) values = (1, ’ab’, 2.7) print ’Original:’, values print print ’ctypes string buffer’ import ctypes b = ctypes.create_string_buffer(s.size) print ’Before :’, binascii.hexlify(b.raw) s.pack_into(b, 0, *values) print ’After :’, binascii.hexlify(b.raw) print ’Unpacked:’, s.unpack_from(b, 0) print print ’array’ import array a = array.array(’c’, ’0’ * s.size) print ’Before :’, binascii.hexlify(a) s.pack_into(a, 0, *values) 30 Chapter 3. String Services
  36. 36. Python Module of the Week, Release 1.91 print ’After :’, binascii.hexlify(a) print ’Unpacked:’, s.unpack_from(a, 0) $ python struct_buffers.py Original: (1, ’ab’, 2.7000000000000002) ctypes string buffer Before : 000000000000000000000000 After : 0100000061620000cdcc2c40 Unpacked: (1, ’ab’, 2.7000000476837158) array Before : 000000000000000000000000 After : 0100000061620000cdcc2c40 Unpacked: (1, ’ab’, 2.7000000476837158) See Also: struct The standard library documentation for this module. array The array module, for working with sequences of fixed-type values. binascii The binascii module, for producing ASCII representations of binary data. WikiPedia: Endianness Explanation of byte order and endianness in encoding. 3.5 textwrap – Formatting text paragraphs Purpose Formatting text by adjusting where line breaks occur in a paragraph. Python Version 2.5 The textwrap module can be used to format text for output in situations where pretty-printing is desired. It offers programmatic functionality similar to the paragraph wrapping or filling features found in many text editors. 3.5.1 Example Data The examples below use textwrap_example.py, which contains a string sample_text: sample_text = ’’’ The textwrap module can be used to format text for output in situations where pretty-printing is desired. It offers programmatic functionality similar to the paragraph wrapping or filling features found in many text editors. ’’’ 3.5.2 Filling Paragraphs The fill() convenience function takes text as input and produces formatted text as output. Let’s see what it does with the sample_text provided. 3.5. textwrap – Formatting text paragraphs 31
  37. 37. Python Module of the Week, Release 1.91 import textwrap from textwrap_example import sample_text print ’No dedent:n’ print textwrap.fill(sample_text) The results are something less than what we want: $ python textwrap_fill.py No dedent: The textwrap module can be used to format text for output in situations where pretty-printing is desired. It offers programmatic functionality similar to the paragraph wrapping or filling features found in many text editors. 3.5.3 Removing Existing Indentation Notice the embedded tabs and extra spaces mixed into the middle of the output. It looks pretty rough. Of course, we can do better. We want to start by removing any common whitespace prefix from all of the lines in the sample text. This allows us to use docstrings or embedded multi-line strings straight from our Python code while removing the formatting of the code itself. The sample string has an artificial indent level introduced for illustrating this feature. import textwrap from textwrap_example import sample_text dedented_text = textwrap.dedent(sample_text).strip() print ’Dedented:n’ print dedented_text The results are starting to look better: $ python textwrap_dedent.py Dedented: The textwrap module can be used to format text for output in situations where pretty-printing is desired. It offers programmatic functionality similar to the paragraph wrapping or filling features found in many text editors. Since “dedent” is the opposite of “indent”, the result is a block of text with the common initial whitespace from each line removed. If one line is already indented more than another, some of the whitespace will not be removed. One tab. Two tabs. One tab. becomes One tab. Two tabs. One tab. 32 Chapter 3. String Services
  38. 38. Python Module of the Week, Release 1.91 3.5.4 Combining Dedent and Fill Next, let’s see what happens if we take the dedented text and pass it through fill() with a few different width values. import textwrap from textwrap_example import sample_text dedented_text = textwrap.dedent(sample_text).strip() for width in [ 20, 60, 80 ]: print print ’%d Columns:n’ % width print textwrap.fill(dedented_text, width=width) This gives several sets of output in the specified widths: $ python textwrap_fill_width.py 20 Columns: The textwrap module can be used to format text for output in situations where pretty- printing is desired. It offers programmatic functionality similar to the paragraph wrapping or filling features found in many text editors. 60 Columns: The textwrap module can be used to format text for output in situations where pretty-printing is desired. It offers programmatic functionality similar to the paragraph wrapping or filling features found in many text editors. 80 Columns: The textwrap module can be used to format text for output in situations where pretty-printing is desired. It offers programmatic functionality similar to the paragraph wrapping or filling features found in many text editors. 3.5.5 Hanging Indents Besides the width of the output, you can control the indent of the first line independently of subsequent lines. import textwrap from textwrap_example import sample_text dedented_text = textwrap.dedent(sample_text).strip() print textwrap.fill(dedented_text, initial_indent=’’, subsequent_indent=’ ’) 3.5. textwrap – Formatting text paragraphs 33
  39. 39. Python Module of the Week, Release 1.91 This makes it relatively easy to produce a hanging indent, where the first line is indented less than the other lines. $ python textwrap_hanging_indent.py The textwrap module can be used to format text for output in situations where pretty-printing is desired. It offers programmatic functionality similar to the paragraph wrapping or filling features found in many text editors. The indent values can include non-whitespace characters, too, so the hanging indent can be prefixed with * to produce bullet points, etc. That came in handy when I converted my old zwiki content so I could import it into trac. I used the StructuredText package from Zope to parse the zwiki data, then created a formatter to produce trac’s wiki markup as output. Using textwrap, I was able to format the output pages so almost no manual tweaking was needed after the conversion. See Also: textwrap Standard library documentation for this module. 34 Chapter 3. String Services
  40. 40. CHAPTER FOUR DATA TYPES 4.1 array – Sequence of fixed-type data Purpose Manage sequences of fixed-type numerical data efficiently. Python Version 1.4 and later The array module defines a sequence data structure that looks very much like a list except that all of the members have to be of the same type. The types supported are listed in the standard library documentation. They are all numeric or other fixed-size primitive types such as bytes. 4.1.1 array Initialization An array is instantiated with an argument describing the type of data to be allowed, and possibly an initialization sequence. import array import binascii s = ’This is the array.’ a = array.array(’c’, s) print ’As string:’, s print ’As array :’, a print ’As hex :’, binascii.hexlify(a) In this example, the array is configured to hold a sequence of bytes and is initialized with a simple string. $ python array_string.py As string: This is the array. As array : array(’c’, ’This is the array.’) As hex : 54686973206973207468652061727261792e 4.1.2 Manipulating Arrays An array can be extended and otherwise manipulated in the same ways as other Python sequences. import array 35
  41. 41. Python Module of the Week, Release 1.91 a = array.array(’i’, xrange(5)) print ’Initial :’, a a.extend(xrange(5)) print ’Extended:’, a print ’Slice :’, a[3:6] print ’Iterator:’, list(enumerate(a)) $ python array_sequence.py Initial : array(’i’, [0, 1, 2, 3, 4]) Extended: array(’i’, [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]) Slice : array(’i’, [3, 4, 0]) Iterator: [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 0), (6, 1), (7, 2), (8, 3), (9, 4)] 4.1.3 Arrays and Files The contents of an array can be written to and read from files using built-in methods coded efficiently for that purpose. import array import binascii import tempfile a = array.array(’i’, xrange(5)) print ’A1:’, a # Write the array of numbers to the file output = tempfile.NamedTemporaryFile() a.tofile(output.file) # must pass an *actual* file output.flush() # Read the raw data input = open(output.name, ’rb’) raw_data = input.read() print ’Raw Contents:’, binascii.hexlify(raw_data) # Read the data into an array input.seek(0) a2 = array.array(’i’) a2.fromfile(input, len(a)) print ’A2:’, a2 This example illustrates reading the data “raw”, directly from the binary file, versus reading it into a new array and converting the bytes to the appropriate types. $ python array_file.py A1: array(’i’, [0, 1, 2, 3, 4]) Raw Contents: 0000000001000000020000000300000004000000 A2: array(’i’, [0, 1, 2, 3, 4]) 36 Chapter 4. Data Types
  42. 42. Python Module of the Week, Release 1.91 4.1.4 Alternate Byte Ordering If the data in the array is not in the native byte order, or needs to be swapped before being written to a file intended for a system with a different byte order, it is easy to convert the entire array without iterating over the elements from Python. import array import binascii def to_hex(a): chars_per_item = a.itemsize * 2 # 2 hex digits hex_version = binascii.hexlify(a) num_chunks = len(hex_version) / chars_per_item for i in xrange(num_chunks): start = i*chars_per_item end = start + chars_per_item yield hex_version[start:end] a1 = array.array(’i’, xrange(5)) a2 = array.array(’i’, xrange(5)) a2.byteswap() fmt = ’%10s %10s %10s %10s’ print fmt % (’A1 hex’, ’A1’, ’A2 hex’, ’A2’) print fmt % ((’-’ * 10,) * 4) for values in zip(to_hex(a1), a1, to_hex(a2), a2): print fmt % values $ python array_byteswap.py A1 hex A1 A2 hex A2 ---------- ---------- ---------- ---------- 00000000 0 00000000 0 01000000 1 00000001 16777216 02000000 2 00000002 33554432 03000000 3 00000003 50331648 04000000 4 00000004 67108864 See Also: array The standard library documentation for this module. struct The struct module. Numerical Python NumPy is a Python library for working with large datasets efficiently. 4.2 datetime – Date/time value manipulation Purpose The datetime module includes functions and classes for doing date parsing, formatting, and arithmetic. Python Version 2.3 and later 4.2. datetime – Date/time value manipulation 37
  43. 43. Python Module of the Week, Release 1.91 4.2.1 Times Time values are represented with the time class. Times have attributes for hour, minute, second, and microsecond. They also, optionally, include time zone information. The arguments to initialize a time instance are optional, but the default of 0 is unlikely to be what you want. import datetime t = datetime.time(1, 2, 3) print t print ’hour :’, t.hour print ’minute:’, t.minute print ’second:’, t.second print ’microsecond:’, t.microsecond print ’tzinfo:’, t.tzinfo $ python datetime_time.py 01:02:03 hour : 1 minute: 2 second: 3 microsecond: 0 tzinfo: None A time instance only holds values of time, and not dates. import datetime print ’Earliest :’, datetime.time.min print ’Latest :’, datetime.time.max print ’Resolution:’, datetime.time.resolution The min and max class attributes reflect the valid range of times in a single day. $ python datetime_time_minmax.py Earliest : 00:00:00 Latest : 23:59:59.999999 Resolution: 0:00:00.000001 The resolution for time is limited to microseconds. More precise values are truncated. import datetime for m in [ 1, 0, 0.1, 0.6 ]: print ’%02.1f :’ % m, datetime.time(0, 0, 0, microsecond=m) In fact, using floating point numbers for the microsecond argument generates a DeprecationWarning. $ python datetime_time_resolution.py datetime_time_resolution.py:15: DeprecationWarning: integer argument expected, got float print ’%02.1f :’ % m, datetime.time(0, 0, 0, microsecond=m) 1.0 : 00:00:00.000001 0.0 : 00:00:00 0.1 : 00:00:00 0.6 : 00:00:00 38 Chapter 4. Data Types
  44. 44. Python Module of the Week, Release 1.91 4.2.2 Dates Basic date values are represented with the date class. Instances have attributes for year, month, and day. It is easy to create a date representing today’s date using the today() class method. import datetime today = datetime.date.today() print today print ’ctime:’, today.ctime() print ’tuple:’, today.timetuple() print ’ordinal:’, today.toordinal() print ’Year:’, today.year print ’Mon :’, today.month print ’Day :’, today.day This example prints today’s date in several formats: $ python datetime_date.py 2009-05-31 ctime: Sun May 31 00:00:00 2009 tuple: time.struct_time(tm_year=2009, tm_mon=5, tm_mday=31, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, ordinal: 733558 Year: 2009 Mon : 5 Day : 31 There are also class methods for creating instances from integers (using proleptic Gregorian ordinal values, which starts counting from Jan. 1 of the year 1) or POSIX timestamp values. import datetime import time o = 733114 print ’o:’, o print ’fromordinal(o):’, datetime.date.fromordinal(o) t = time.time() print ’t:’, t print ’fromtimestamp(t):’, datetime.date.fromtimestamp(t) This example illustrates the different value types used by fromordinal() and fromtimestamp(). $ python datetime_date_fromordinal.py o: 733114 fromordinal(o): 2008-03-13 t: 1243809727.59 fromtimestamp(t): 2009-05-31 The range of date values supported can be determined using the min and max attributes. import datetime print ’Earliest :’, datetime.date.min print ’Latest :’, datetime.date.max print ’Resolution:’, datetime.date.resolution 4.2. datetime – Date/time value manipulation 39
  45. 45. Python Module of the Week, Release 1.91 The resolution for dates is whole days. $ python datetime_date_minmax.py Earliest : 0001-01-01 Latest : 9999-12-31 Resolution: 1 day, 0:00:00 Another way to create new date instances uses the replace() method of an existing date. For example, you can change the year, leaving the day and month alone. import datetime d1 = datetime.date(2008, 3, 12) print ’d1:’, d1 d2 = d1.replace(year=2009) print ’d2:’, d2 $ python datetime_date_replace.py d1: 2008-03-12 d2: 2009-03-12 4.2.3 timedeltas Using replace() is not the only way to calculate future/past dates. You can use datetime to perform basic arithmetic on date values via the timedelta class. Subtracting dates produces a timedelta, and a timedelta can be added or subtracted from a date to produce another date. The internal values for timedeltas are stored in days, seconds, and microseconds. import datetime print quot;microseconds:quot;, datetime.timedelta(microseconds=1) print quot;milliseconds:quot;, datetime.timedelta(milliseconds=1) print quot;seconds :quot;, datetime.timedelta(seconds=1) print quot;minutes :quot;, datetime.timedelta(minutes=1) print quot;hours :quot;, datetime.timedelta(hours=1) print quot;days :quot;, datetime.timedelta(days=1) print quot;weeks :quot;, datetime.timedelta(weeks=1) Intermediate level values passed to the constructor are converted into days, seconds, and microseconds. $ python datetime_timedelta.py microseconds: 0:00:00.000001 milliseconds: 0:00:00.001000 seconds : 0:00:01 minutes : 0:01:00 hours : 1:00:00 days : 1 day, 0:00:00 weeks : 7 days, 0:00:00 4.2.4 Arithmetic Date math uses the standard arithmetic operators. This example with date objects illustrates using timedeltas to compute new dates, and subtracting date instances to produce timedeltas (including a negative delta value). 40 Chapter 4. Data Types

×