Guide to NumPy
Travis E. Oliphant, PhD
Dec 7, 2006
This book is under restricted distribution using a Market-Determined, Tempo-
rary, Distribution-Restriction (MDTDR) system (see http://www.trelgol.com) until
October 31, 2010 at the latest. If you receive this book, you are asked not to copy it
in any form (electronic or paper) until the temporary distribution-restriction lapses.
If you have multiple users at an institution, you should either share a single copy
using some form of digital library check-out, or buy multiple copies. The more
copies purchased, the sooner the documentation can be released from this incon-
venient distribution restriction. After October 31, 2010 this book may be freely
copied in any format and used as source material for other books as long as ac-
knowledgement of the original author is given. Your support of this temporary
distribution restriction plays an essential role in allowing the author and others like
him to produce more quality books and software.
List of Tables
2.1 Built-in array-scalar types corresponding to data-types for an ndar-
ray. The bold-face types correspond to standard Python types. The
object type is special because arrays with dtype=’O’ do not return
an array scalar on item access but instead return the actual object
referenced in the array. . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1 Attributes of the ndarray. . . . . . . . . . . . . . . . . . . . . . . . 47
3.2 Array conversion methods . . . . . . . . . . . . . . . . . . . . . . . . 59
3.3 Array item selection and shape manipulation methods. If axis is an
argument, then the calculation is performed along that axis. An axis
value of None means the array is ﬂattened before calculation proceeds. 67
3.4 Array object calculation methods. If axis is an argument, then the
calculation is performed along that axis. An axis value of None means
the array is ﬂattened before calculation proceeds. All of these meth-
ods can take an optional out= argument which can specify the output
array to write the results into. . . . . . . . . . . . . . . . . . . . . . 71
6.1 Array scalar types that inherit from basic Python types. The intc
array data type might also inherit from the IntType if it has the same
number of bits as the int array data type on your platform. . . . . . 129
9.1 Universal function (ufunc) attributes. . . . . . . . . . . . . . . . . . 161
10.1 Functions in numpy.dual (both in NumPy and SciPy) . . . . . . . . 178
Origins of NumPy
A complex system that works is invariably found to have evolved
from a simple system that worked
Copy from one, it’s plagiarism; copy from two, it’s research.
NumPy builds on (and is a successor to) the successful Numeric array object.
Its goal is to create the corner-stone for a useful environment for scientiﬁc com-
puting. In order to better understand the people surrounding NumPy and (its
library-package) SciPy, I will explain a little about how SciPy and (current) NumPy
originated. In 1998, as a graduate student studying biomedical imaging at the
Mayo Clinic in Rochester, MN, I came across Python and its numerical extension
(Numeric) while I was looking for ways to analyze large data sets for Magnetic
Resonance Imaging and Ultrasound using a high-level language. I quickly fell in
love with Python programming which is a remarkable statement to make about a
programming language. If I had not seen others with the same view, I might have
seriously doubted my sanity. I became rather involved in the Numeric Python com-
munity, adding the C-API chapter to the Numeric documentation (for which Paul
Dubois graciously made me a co-author).
As I progressed with my thesis work, programming in Python was so enjoyable
that I felt inhibited when I worked with other programming frameworks. As a result,
when a task I needed to perform was not available in the core language, or in the
Numeric extension, I looked around and found C or Fortran code that performed
the needed task, wrapped it into Python (either by hand or using SWIG), and used
the new functionality in my programs.
Along the way, I learned a great deal about the underlying structure of Numeric
and grew to admire it’s simple but elegant structures that grew out of the mechanism
by which Python allows itself to be extended.
Numeric was originally written in 1995 largely by Jim Hugunin
while he was a graduate student at MIT. He received help from
many people including Jim Fulton, David Ascher, Paul Dubois,
and Konrad Hinsen. These individuals and many others added
comments, criticisms, and code which helped the Numeric exten-
sion reach stability. Jim Hugunin did not stay long as an active
member of the community — moving on to write Jython and, later,
By operating in this need-it-make-it fashion I ended up with a substantial li-
brary of extension modules that helped Python + Numeric become easier to use
in a scientiﬁc setting. These early modules included raw input-output functions,
a special function library, an integration library, an ordinary diﬀerential equation
solver, some least-squares optimizers, and sparse matrix solvers. While I was doing
this laborious work, Pearu Peterson noticed that a lot of the routines I was wrap-
ping were written in Fortran and there was no simpliﬁed wrapping mechanism for
Fortran subroutines (like SWIG for C). He began the task of writing f2py which
made it possible to easily wrap Fortran programs into Python. I helped him a little
bit, mostly with testing and contributing early function-call-back code, but he put
forth the brunt of the work. His result was simply amazing to me. I’ve always been
impressed with f2py, especially because I knew how much eﬀort writing and main-
taining extension modules could be. Anybody serious about scientiﬁc computing
with Python will appreciate that f2py is distributed along with NumPy.
When I ﬁnished my Ph.D. in 2001, Eric Jones (who had recently completed his
Ph.D. at Duke) contacted me because he had a collection of Python modules he had
developed as part of his thesis work as well. He wanted to combine his modules with
mine into one super package. Together with Pearu Peterson we joined our eﬀorts,
and SciPy was born in 2001. Since then, many people have contributed module
code to SciPy including Ed Schoﬁeld, Robert Cimrman, David M. Cooke, Charles
(Chuck) Harris, Prabhu Ramachandran, Gary Strangman, Jean-Sebastien Roy, and
Fernando Perez. Others such as Travis Vaught, David Morrill, Jeﬀ Whitaker, and
Louis Luangkesorn have contributed testing and build support.
At the start of 2005, SciPy was at release 0.3 and relatively stable for an early
version number. Part of the reason it was diﬃcult to stabilize SciPy was that the
array object upon which SciPy builds was undergoing a bit of an upheaval. At about
the same time as SciPy was being built, some Numeric users were hitting up against
the limited capabilities of Numeric. In particular, the ability to deal with memory
mapped ﬁles (and associated alignment and swapping issues), record arrays, and
altered error checking modes were important but limited or non-existent in Numeric.
As a result, numarray was created by Perry Greenﬁeld, Todd Miller, and Rick White
at the Space Science Telescope Institute as a replacement for Numeric. Numarray
used a very diﬀerent implementation scheme as a mix of Python classes and C
code (which led to slow downs in certain common uses). While improving some
capabilities, it was slow to pick up on the more advanced features of Numeric’s
universal functions (ufuncs) — never re-creating the C-API that SciPy depended
on. This made it diﬃcult for SciPy to “convert” to numarray.
Many newcomers to scientiﬁc computing with Python were told that numarray
was the future and started developing for it. Very useful tools were developed
that could not be used with Numeric (because of numarray’s change in C-API),
and therefore could not be used easily in SciPy. This state of aﬀairs was very
discouraging for me personally as it left the community fragmented. Some developed
for numarray, others developed as part of SciPy. A few people even rejected adopting
Python for scientiﬁc computing entirely because of the split. In addition, I estimate
that quite a few Python users simply stayed away from both SciPy and numarray,
leaving the community smaller than it could have been given the number of people
that use Python for science and engineering purposes.
It should be recognized that the split was not intentional, but simply an out-
growth of the diﬀerent and exacting demands of scientiﬁc computing users. My
describing these events should not be construed as assigning blame to anyone. I
very much admire and appreciate everyone I’ve met who is involved with scientiﬁc
computing and Python. Using a stretched biological metaphor, it is only through
the process of dividing and merging that better results are born. I think this concept
applies to NumPy.
In early 2005, I decided to begin an eﬀort to help bring the diverging community
together under a common framework if it were possible. I ﬁrst looked at numarray
to see what could be done to add the missing features to make SciPy work with
it as a core array object. After a couple of days of studying numarray, I was not
enthusiastic about this approach. My familiarity with the Numeric code base no
doubt biased my opinion, but it seemed to me that the features of Numarray could
be added back to Numeric with a few fundamental changes to the core object. This
would make the transition of SciPy to a more enhanced array object much easier
in my mind.
Therefore, I began to construct this hybrid array object complete with an en-
hanced set of universal (broadcasting) functions that could deal with it. Along the
way, quite a few new features and signiﬁcant enhancements were added to the array
object and its surrounding infrastructure. This book describes the result of that
year-and-a-half-long eﬀort which culminated with the release of NumPy 0.9.2 in
early 2006 and NumPy 1.0 in late 2006. I ﬁrst named the new package, SciPy Core,
and used the scipy namespace. However, after a few months of testing under that
name, it became clear that a separate namespace was needed for the new package.
As a result, a rapid search for a new name resulted in actually coming back to the
NumPy name which was the unoﬃcial name of Numerical Python but never the
actual namespace. Because the new package builds on the code-base of and is a
successor to Numeric, I think the NumPy name is ﬁtting and hopefully not too
confusing to new users.
This book only brieﬂy outlines some of the infrastructure that surrounds the
basic objects in NumPy to provide the additional functionality contained in the older
Numeric package (i.e. LinearAlgebra, RandomArray, FFT). This infrastructure in
NumPy includes basic linear algebra routines, Fourier transform capabilities, and
random number generators. In addition, the f2py module is described in its own
documentation, and so is only brieﬂy mentioned in the second part of the book.
There are also extensions to the standard Python distutils and testing frameworks
included with NumPy that are useful in constructing your own packages built on top
of NumPy. The central purpose of this book, however, is to describe and document
the basic NumPy system that is available under the numpy namespace.
The numpy namespace includes all names under the numpy.core
and numpy.lib namespaces as well. Thus, import numpy will
also import the names from numpy.core and numpy.lib. This is the
recommended way to use numpy.
The following table gives a brief outline of the sub-packages contained in numpy
Sub-Package Purpose Comments
core basic objects all names exported to numpy
lib additional utilities all names exported to numpy
linalg basic linear algebra old LinearAlgebra from Numeric
ﬀt discrete Fourier transforms old FFT from Numeric
random random number generators old RandomArray from Numeric
distutils enhanced build and distribution improvements built on standard distutils
testing unit-testing utility functions useful for testing
f2py automatic wrapping of Fortran code a useful utility needed by SciPy
Our programs last longer if we manage to build simple abstractions
I will tell you the truth as soon as I ﬁgure it out.
NumPy provides two fundamental objects: an N-dimensional array object
(ndarray) and a universal function object (ufunc). In addition, there are other
objects that build on top of these which you may ﬁnd useful in your work, and these
will be discussed later. The current chapter will provide background information
on just the ndarray and the ufunc that will be important for understanding the
attributes and methods to be discussed later.
An N-dimensional array is a homogeneous collection of “items” indexed using N
integers. There are two essential pieces of information that deﬁne an N-dimensional
array: 1) the shape of the array, and 2) the kind of item the array is composed of.
The shape of the array is a tuple of N integers (one for each dimension) that
provides information on how far the index can vary along that dimension. The
other important information describing an array is the kind of item the array is
composed of. Because every ndarray is a homogeneous collection of exactly the
same data-type, every item takes up the same size block of memory, and each block
of memory in the array is interpreted in exactly the same way1
All arrays in NumPy are indexed starting at 0 and ending at M-1
following the Python convention.
For example, consider the following piece of code:
>>> a = array([[1,2,3],[4,5,6]])
for all code in this book it is assumed that you have ﬁrst entered
from numpy import *. In addition, any previously deﬁned ar-
rays are still deﬁned for subsequent examples.
This code deﬁnes an array of size 2×3 composed of 4-byte (little-endian) integer
elements (on my 32-bit platform). We can index into this two-dimensional array
using two integers: the ﬁrst integer running from 0 to 1 inclusive and the second
from 0 to 2 inclusive. For example, index (1, 1) selects the element with value 5:
All code shown in the shaded-boxes in this book has been (automatically) exe-
cuted on a particular version of NumPy. The output of the code shown below shows
which version of NumPy was used to create all of the output in your copy of this
>>> import numpy; print numpy. version
1By using OBJECT arrays, one can eﬀectively have heterogeneous arrays, but the system still
sees each element of the array as exactly the same thing (a reference to a Python object).
Figure 2.1: Conceptual diagram showing the relationship between the three fun-
damental objects used to describe the data in an array: 1) the ndarray itself, 2)
the data-type object that describes the layout of a single ﬁxed-size element of the
array, 3) the array-scalar Python object that is returned when a single element of
the array is accessed.
2.1 Data-Type Descriptors
In NumPy, an ndarray is an N-dimensional array of items where each item takes
up a ﬁxed number of bytes. Typically, this ﬁxed number of bytes represents a
number (e.g. integer or ﬂoating-point). However, this ﬁxed number of bytes could
also represent an arbitrary record made up of any collection of other data types.
NumPy achieves this ﬂexibility through the use of a data-type (dtype) object. Every
array has an associated dtype object which describes the layout of the array data.
Every dtype object, in turn, has an associated Python type-object that determines
exactly what type of Python object is returned when an element of the array is
accessed. The dtype objects are ﬂexible enough to contain references to arrays
of other dtype objects and, therefore, can be used to deﬁne nested records. This
advanced functionality will be described in better detail later as it is mainly useful
for the recarray (record array) subclass that will also be deﬁned later. However, all
ndarrays can enjoy the ﬂexibility provided by the dtype objects. Figure 2.1 provides
a conceptual diagram showing the relationship between the ndarray, its associated
data-type object, and an array-scalar that is returned when a single-element of the
array is accessed. Note that the data-type points to the type-object of the array
scalar. An array scalar is returned using the type-object and a particular element
of the ndarray.
Every dtype object is based on one of 21 built-in dtype objects. These built-
in objects allow numeric operations on a wide-variety of integer, ﬂoating-point,
and complex data types. Associated with each data-type is a Python type object
whose instances are array scalars. This type-object can be obtained using the type
attribute of the dtype object. Python typically deﬁnes only one data-type of a
particular data class (one integer type, one ﬂoating-point type, etc.). This can be
convenient for some applications that don’t need to be concerned with all the ways
data can be represented in a computer. For scientiﬁc applications, however, this is
not always true. As a result, in NumPy, their are 21 diﬀerent fundamental Python
data-type-descriptor objects built-in. These descriptors are mostly based on the
types available in the C language that CPython is written in. However, there are a
few types that are extremely ﬂexible, such as str , unicode , and void.
The fundamental data-types are shown in Table 2.1. Along with their (mostly)
C-derived names, the integer, ﬂoat, and complex data-types are also available using
a bit-width convention so that an array of the right size can always be ensured
(e.g. int8, ﬂoat64, complex128). The C-like names are also accessible using a
character code which is also shown in the table (use of the character codes, however,
is discouraged). Names for the data types that would clash with standard Python
object names are followed by a trailing underscore, ’ ’. These data types are so
named because they use the same underlying precision as the corresponding Python
data types. Most scientiﬁc users should be able to use the array-enhanced scalar
objects in place of the Python objects. The array-enhanced scalars inherit from the
Python objects they can replace and should act like them under all circumstances
(except for how errors are handled in math computations).
The array types bool , int , complex , ﬂoat , object , uni-
code , and str are enhanced-scalars. They are very similar to
the standard Python types (without the trailing underscore) and
inherit from them (except for bool and object ). They can be used
in place of the standard Python types whenever desired. Whenever
a data type is required, as an argument, the standard Python types
are recognized as well.
Three of the data types are ﬂexible in that they can have items that are of an
arbitrary size: the str type, the unicode type, and the void type. While, you
can specify an arbitrary size for these types, every item in an array is still of that
speciﬁed size. The void type, for example, allows for arbitrary records to be deﬁned
as elements of the array, and can be used to deﬁne exotic types built on top of the
Table 2.1: Built-in array-scalar types corresponding to data-types for an ndarray.
The bold-face types correspond to standard Python types. The object type is
special because arrays with dtype=’O’ do not return an array scalar on item access
but instead return the actual object referenced in the array.
Type Bit-Width Character
bool boolXX ’?’
byte intXX ’b’
ubyte uintXX ’B’
single ﬂoatXX ’f’
csingle complexXX ’F’
The two types intp and uintp are not separate types. They are
names bound to a speciﬁc integer type just large enough to hold a
memory address (a pointer) on the platform.
Numeric Compatibility: If you used old typecode characters in
your Numeric code (which was never recommended), you will need
to change some of them to the new characters. In particular,
the needed changes are ’c->’S1’, ’b’->’B’, ’1’->’b’, ’s’->’h’, ’w’-
>’H’, and ’u’->’I’. These changes make the typecharacter conven-
tion more consistent with other Python modules such as the struct
The fundamental data-types are arranged into a hierarchy of Python type-
objects shown in Figure 2.2. Each of the leaves on this hierarchy correspond to
actual data-types that arrays can have (in other words, there is a built in dtype ob-
ject associated with each of these new types). They also correspond to new Python
objects that can be created. These new objects are “scalar” types corresponding
to each fundamental data-type. Their purpose is to smooth out the rough edges
that result when mixing scalar and array operations. These scalar objects will be
discussed in more detail in Chapter 6. The other types in the hierarchy deﬁne
particular categories of types. These categories can be useful for testing whether
or not the object returned by self.dtype.type is of a particular class (using
2.2 Basic indexing (slicing)
Indexing is a powerful tool in Python and NumPy takes full advantage of this power.
In fact, some of capabilities of Python’s indexing were ﬁrst established by the needs
of Numeric users.2
Indexing is also sometimes called slicing in Python, and slicing
for an ndarray works very similarly as it does for other Python sequences. There
are three big diﬀerences: 1) slicing can be done over multiple dimensions, 2) exactly
one ellipsis object can be used to indicate several dimensions at once, 3) slicing
cannot be used to expand the size of an array (unlike lists).
A few examples should make slicing more clear. Suppose A is a 10 × 20 array,
then A is the same as A[3, :] and represents the 4th length-20 “row” of the array.
On the other hand, A[:, 3] represents the 4th length-10 “column” of the array. Every
2For example, the ability to index with a comma separated list of objects and have it correspond
to indexing with a tuple is a feature added to Python at the request of the NumPy community.
The Ellipsis object was also added to Python explicitly for the NumPy community. Extended
slicing (wherein a step can be provided) was also a feature added to Python because of Numeric.
Figure 2.2: Hierarchy of type objects representing the array data types. Not shown
are the two integer types intp and uintp which just point to the integer type
that holds a pointer for the platform. All the number types can be obtained using
bit-width names as well.
third element of the 4th column can be selected as A[:: 3, 3]. Ellipses can be used to
replace zero or more “:” terms. In other words, an Ellipsis object expands to zero
or more full slice objects (“:”) so that the total number of dimensions in the slicing
tuple matches the number of dimensions in the array. Thus, if A is 10×20×30×40,
then A[3 :, ..., 4] is equivalent to A[3 :, :, :, 4] while A[..., 3] is equivalent to A[:, :, :, 3].
The following code illustrates some of these concepts:
>>> a = arange(60).reshape(3,4,5); print a
[[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
[[20 21 22 23 24]
[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]]
[[40 41 42 43 44]
[45 46 47 48 49]
[50 51 52 53 54]
[55 56 57 58 59]]]
>>> print a[...,3]
[[ 3 8 13 18]
[23 28 33 38]
[43 48 53 58]]
>>> print a[1,...,3]
[23 28 33 38]
>>> print a[:,:,2]
[[ 2 7 12 17]
[22 27 32 37]
[42 47 52 57]]
>>> print a[0,::2,::2]
[[ 0 2 4]
[10 12 14]]
2.3 Memory Layout of ndarray
On a fundamental level, an N-dimensional array object is just a one-dimensional se-
quence of memory with fancy indexing code that maps an N-dimensional index into
a one-dimensional index. The one-dimensional index is necessary on some level be-
cause that is how memory is addressed in a computer. The fancy indexing, however,
can be very helpful for translating our ideas into computer code. This is because
many concepts we wish to model on a computer have a natural representation as
an N-dimensional array. While this is especially true in science and engineering,
it is also applicable to many other arenas which can be appreciated by considering
the popularity of the spreadsheet as well as “image processing” applications.
Some high-level languages give pre-eminence to a particular use of
2-dimensional arrays as Matrices. In NumPy, however, the core
object is the more general N-dimensional array. NumPy deﬁnes a
matrix object as a sub-class of the N-dimensional array.
In order to more fully understand the array object along with its attributes
and methods it is important to learn more about how an N-dimensional array is
represented in the computer’s memory. A complete understanding of this layout is
only essential for optimizing algorithms operating on general purpose arrays. But,
even for the casual user, a general understanding of memory layout will help to
explain the use of certain array attributes that may otherwise be mysterious.
2.3.1 Contiguous Memory Layout
There is a fundamental ambiguity in how the mapping to a one-dimensional index
can take place which is illustrated for a 2-dimensional array in Figure 2.3. In that
ﬁgure, each block represents a chunk of memory that is needed for representing
the underlying array element. For example, each block could represent the 8 bytes
needed to represent a double-precision ﬂoating point number.
In the ﬁgure, two arrays are shown, a 4x3 array and a 3x4 array. Each of these
arrays takes 12 blocks of memory shown as a single, contiguous segment. How this
memory is used to form the abstract 2-dimensional array can vary, however, and
the ndarray object supports both styles. Which style is in use can be interrogated
by the use of the ﬂags attribute which returns a dictionary of the state of array
(0,0) (0,1) (0,2)
(1,0) (1,1) (1,2)
(2,0) (2,1) (2,2)
Figure 2.3: Options for memory layout of a 2-dimensional array.
In the C-style of N-dimensional indexing shown on the left of Figure 2.3 the
last N-dimensional index “varies the fastest.” In other words, to move through
computer memory sequentially, the last index is incremented ﬁrst, followed by the
second-to-last index and so forth. Some of the algorithms in NumPy that deal with
N-dimensional arrays work best with this kind of data.
In the Fortran-style of N-dimensional indexing shown on the right of Figure 2.3,
the ﬁrst N-dimensional index “varies the fastest.” Thus, to move through computer
memory sequentially, the ﬁrst index is incremented ﬁrst until it reaches the limit in
that dimension, then the second index is incremented and the ﬁrst index is reset to
zero. While NumPy can be compiled without the use of a Fortran compiler, several
modules of SciPy (available separately) rely on underlying algorithms written in
Fortran. Algorithms that work on N-dimensional arrays that are written in Fortran
typically expect Fortran-style arrays.
The two-styles of memory layout for arrays are connected through the transpose
operation. Thus, if A is a (contiguous) C-style array, then the same block of mem-
ory can be used to represent AT
as a (contiguous) Fortran-style array. This kind
of understanding can be useful when trying to optimize the wrapping of Fortran
subroutines, or if a more detailed understanding of how to write algorithms for
generally-indexed arrays is desired. But, fortunately, the casual user who does not
care if an array is copied occasionally to get it into the right orientation needed for
a particular algorithm can forget about how the array is stored in memory and just
visualize it as an N-dimensional array (that is, after all, the whole point of creating
the ndarray object in the ﬁrst place).
2.3.2 Non-contiguous memory layout
Both of the examples presented above are single-segment arrays where the entire
array is visited by sequentially marching through memory one element at a time.
When an algorithm in C or Fortran expects an N-dimensional array, this single
segment (of a certain fundamental type) is usually what is expected along with
the shape N-tuple. With a single-segment of memory representing the array, the
one-dimensional index into computer memory can always be computed from the
N-dimensional index. This concept is explored further in the following paragraphs.
Let ni be the value of the ith
index into an array whose shape is represented by
the N integers di (i = 0 . . . N − 1). Then, the one-dimensional index into a C-style
contiguous array is
while the one-dimensional index into a Fortran-style contiguous array is
In these formulas we are assuming that
dj = dkdk+1 · · · dm−1dm
so that if m < k, the product is 1. While perfectly general, these formulas may be
a bit confusing at ﬁrst glimpse. Let’s see how they expand out for determining the
one-dimensional index corresponding to the element (1, 3, 2) of a 4 × 5 × 6 array. If
the array is stored as Fortran contiguous, then
= n0 · (1) + n1 · (4) + n2 · (4 · 5)
= 1 + 3 · 4 + 2 · 20 = 53.
On the other hand, if the array is stored as C contiguous, then
= n0 · (5 · 6) + n1 · (6) + n2 · (1)
= 1 · 30 + 3 · 6 + 2 · 1 = 50.
The general pattern should be more clear from these examples.
The formulas for the one-dimensional index of the N-dimensional arrays reveal
what results in an important generalization for memory layout. Notice that each
formula can be written as
i gives the stride for dimension i.3
Thus, for C and Fortran contiguous
arrays respectively we have
dj = di+1di+2 · · · dN−1,
dj = d0d1 · · · di−1.
The stride is how many elements in the underlying one-dimensional layout of
the array one must jump in order to get to the next array element of a speciﬁc
dimension in the N-dimensional layout. Thus, in a C-style 4 × 5 × 6 array one must
jump over 30 elements to increment the ﬁrst index by one, so 30 is the stride for
the ﬁrst dimension (sC
0 = 30). If, for each array, we deﬁne a strides tuple with N
integers, then we have pre-computed and stored an important piece of how to map
the N-dimensional index to the one-dimensional one used by the computer.
In addition to providing a pre-computed table for index mapping, by allowing
the strides tuple to consist of arbitrary integers we have provided a more general
layout for the N-dimensional array. As long as we always use the stride information
to move around in the N-dimensional array, we can use any convenient layout we
wish for the underlying representation as long as it is regular enough to be deﬁned
by constant jumps in each dimension. The ndarray object of NumPy uses this
stride information and therefore the underlying memory of an ndarray can be laid
Several algorithms in NumPy work on arbitrarily strided arrays.
However, some algorithms require single-segment arrays. When an
irregularly strided array is passed in to such algorithms, a copy is
3Our deﬁnition of stride here is an element-based stride, while the strides attribute returns a
byte-based stride. The byte-based stride is the element itemsize multiplied by the element-based
An important situation where irregularly strided arrays occur is array indexing.
Consider again Figure 2.3. In that ﬁgure a high-lighted sub-array is shown. Deﬁne
C to be the 4 × 3 C contiguous array and F to be the 3 × 4 Fortran contiguous
array. The highlighted areas can be written respectively as C[1:3,1:3] and F[1:3,1:3].
As evidenced by the corresponding highlighted region in the one-dimensional view
of the memory, these sub-arrays are neither C contiguous nor Fortran contiguous.
However, they can still be represented by an ndarray object using the same striding
tuple as the original array used. Therefore, a regular indexing expression on an
ndarray can always produce an ndarray object without copying any data. This
is sometimes referred to as the “view” feature of array indexing, and one can see
that it is enabled by the use of striding information in the underlying ndarray
object. The greatest beneﬁt of this feature is that it allows indexing to be done
very rapidly and without exploding memory usage (because no copies of the data
2.4 Universal Functions for arrays
NumPy provides a wealth of mathematical functions that operate on then ndarray
object. From algebraic functions such as addition and multiplication to trigonomet-
ric functions such as sin, and cos. Each universal function (ufunc) is an instance of
a general class so that function behavior is the same. All ufuncs perform element-
by-element operations over an array or a set of arrays (for multi-input functions).
The ufuncs themselves and their methods are documented in Part 9.
One important aspect of ufunc behavior that should be introduced early, how-
ever, is the idea of broadcasting. Broadcasting is used in several places throughout
NumPy and is therefore worth early exposure. To understand the idea of broad-
casting, you ﬁrst have to be conscious of the fact that all ufuncs are always element-
by-element operations. In other words, suppose we have a ufunc with two inputs
and one output (e.g. addition) and the inputs are both arrays of shape 4 × 6 × 5.
Then, the output is going to be 4 × 6 × 5, and will be the result of applying the
underlying function (e.g. +) to each pair of inputs to produce the output at the
corresponding N-dimensional location.
Broadcasting allows ufuncs to deal in a meaningful way with inputs that do not
have exactly the same shape. In particular, the ﬁrst rule of broadcasting is that
if all input arrays do not have the same number of dimensions, then a “1” will
be repeatedly pre-pended to the shapes of the smaller arrays until all the arrays
have the same number of dimensions. The second rule of broadcasting ensures that
arrays with a size of 1 along a particular dimension act as if they had the size of the
array with the largest shape along that dimension. The value of the array element
is assumed to be the same along that dimension for the “broadcasted” array. After
application of the broadcasting rules, the sizes of all arrays must match.
While a little tedious to explain, the broadcasting rules are easy to pick up by
looking at a couple of examples. Suppose there is a ufunc with two inputs, A and
B. Now supposed that A has shape 4 × 6 × 5 while B has shape 4 × 6 × 1. The
ufunc will proceed to compute the 4 × 6 × 5 output as if B had been 4 × 6 × 5 by
assuming that B[..., k] = B[..., 0] for k = 1, 2, 3, 4.
Another example illustrates the idea of adding 1’s to the beginning of the array
shape-tuple. Suppose A is the same as above, but B is a length 5 array. Because
of the ﬁrst rule, B will be interpreted as a 1 × 1 × 5 array, and then because of the
second rule B will be interpreted as a 4 × 6 × 5 array by repeating the elements of
B in the obvious way.
The most common alteration needed is to route-around the automatic pre-
pending of 1’s to the shape of the array. If it is desired, to add 1’s to the end
of the array shape, then dimensions can always be added using the newaxis name
in NumPy: B[..., newaxis, newaxis] returns an array with 2 additional 1’s appended
to the shape of B.
One important aspect of broadcasting is the calculation of functions on regularly
spaced grids. For example, suppose it is desired to show a portion of the multipli-
cation table by computing the function a ∗ b on a grid with a running from 6 to 9
and b running from 12 to 16. The following code illustrates how this could be done
using ufuncs and broadcasting.
>>> a = arange(6, 10); print a
[6 7 8 9]
>>> b = arange(12, 17); print b
[12 13 14 15 16]
>>> table = a[:,newaxis] * b
>>> print table
[[ 72 78 84 90 96]
[ 84 91 98 105 112]
[ 96 104 112 120 128]
[108 117 126 135 144]]
2.5 Summary of new features
More information about using arrays in Python can be found in the old Numeric doc-
umentation at http://numeric.scipy.org http://numeric.scipy.org. Quite a
bit of that documentation is still accurate, especially in the discussion of array ba-
sics. There are signiﬁcant diﬀerences, however, and this book seeks to explain them
in detail. The following list tries to summarize the signiﬁcant new features (over
Numeric) available in the ndarray and ufunc objects of NumPy:
1. more data types (all standard C-data types plus complex ﬂoats, Boolean,
string, unicode, and void *);
2. ﬂexible data types where each array can have a diﬀerent itemsize (but all
elements of the same array still have the same itemsize);
3. there is a true Python scalar type (contained in a hierarchy of types) for every
data-type an array can have;
4. data-type objects deﬁne the data-type with support for data-type objects with
ﬁelds and subarrays which allow record arrays with nested records;
5. many more array methods in addition to functional counterparts;
6. attributes more clearly distinguished from methods (attributes are intrinsic
parts of an array so that setting them changes the array itself);
7. array scalars covering all data types which inherit from Python scalars when
8. arrays can be misaligned, swapped, and in Fortran order in memory (facilitates
9. arrays can be more easily read from text ﬁles and created from buﬀers and
10. arrays can be quickly written to ﬁles in text and/or binary mode;
11. arrays support the removal of the 64-bit memory limitation as long as you
have Python 2.5 or later;
12. fancy indexing can be done on arrays using integer sequences and Boolean
13. coercion rules are altered for mixed scalar / array operations so that scalars
(anything that produces a 0-dimensional array internally) will not determine
the output type in such cases.
14. when coercion is needed, temporary buﬀer-memory allocation is limited to a
15. errors are handled through the IEEE ﬂoating point status ﬂags and there is
ﬂexibility on a per-thread level for handling these errors;
16. one can register an error callback function in Python to handle errors are set
to ’call’ for their error handling;
17. ufunc reduce, accumulate, and reduceat can take place using a diﬀerent type
then the array type if desired (without copying the entire array);
18. ufunc output arrays passed in can be a diﬀerent type than expected from the
19. ufuncs take keyword arguments which can specify 1) the error handling explic-
itly and 2) the speciﬁc 1-d loop to use by-passing the type-coercion detection.
20. arbitrary classes can be passed through ufuncs ( array wrap and
array priority expand previous array method);
21. ufuncs can be easily created from Python functions;
22. ufuncs have attributes to detail their behavior, including a dynamic doc string
that automatically generates the calling signature;
23. several new ufuncs (frexp, modf, ldexp, isnan, isﬁnite, isinf, signbit);
24. new types can be registered with the system so that specialized ufunc loops
can be written over new type objects;
25. new types can also register casting functions and rules for ﬁtting into the
26. C-API enhanced so that more of the functionality is available from compiled
27. C-API enhanced so array structure access can take place through macros;
28. new iterator objects created for easy handling in C of non-contiguous arrays;
29. new multi-iterator object created for easy handling in C of broadcasting;
30. types have more functions associated with them (no magic function lists in
the C-code). Any function needed is part of the type structure.
All of these enhancements will be documented more thoroughly in the remaining
portions of this book.
2.6 Summary of diﬀerences with Numeric
An eﬀort was made to retain backwards compatibility with Numeric all the way to
the C-level. This was mostly accomplished, with a few changes that needed to be
made for consistency of the new system. If you are just starting out with NumPy,
then this section may be skipped.
There are two steps (one required and one optional) to converting code that
works with Numeric to work fully with NumPy The ﬁrst step uses a com-
patibility layer and requires only small changes which can be handled by the
numpy.oldnumeric.alter code1 module. Code written to the compatibility layer will
work and be supported. The purpose of the compatibility layer is to make it easy to
convert to NumPy and many codes may only take this ﬁrst step and work ﬁne with
NumPy. The second step is optional as it removes dependency on the compatibility
layer and therefore requires a few more extensive changes. Many of these changes
can be performed by the numpy.oldnumeric.alter code2 module, but you may still
need to do some ﬁnal tweaking by hand. Because many users will probably be con-
tent to only use the ﬁrst step, the alter code2 module for second-stage migration
may not be as complete as it otherwise could be.
2.6.1 First-step changes
In order to use the compatibility layer there are still a few changes that need to be
made to your code. Many of these changes can be made by running the alter code1
module with your code as input.
1. Importing (the alter code1 module handles all these changes)
(a) import Numeric –> import numpy.oldnumeric as Numeric
(b) import Numeric as XX –> import numpy.oldnumeric as XX
(c) from Numeric import <name1>,...<nameN> –> from
numpy.oldnumeric import <name1>,...,<nameN>
(d) from Numeric import * –> from numpy.oldnumeric import *
(e) Similar name changes need to be made for Matrix, MLab, UserAr-
ray, LinearAlgebra, RandomArray RNG, RNG.Statistics, and FFT. The
new names are numpy.oldnumeric.<pkg> where <pkg> is matrix, mlab,
user array, linear algebra, random array, rng, rng stats, and ﬀt.
(f) multiarray and umath (if you used them directly) are now
numpy.core.multiarray and numpy.core.umath, but it is more future
proof to replace usages of these internal modules with numpy.oldnumeric.
2. Method name changes and methods converted to attributes. The alter code1
module handles all these changes.
(a) arr.typecode() –> arr.dtype.char
(b) arr.iscontiguous() –> arr.ﬂags.contiguous
(c) arr.byteswapped() –> arr.byteswap()
(d) arr.toscalar() –> arr.item()
(e) arr.itemsize() –> arr.itemsize
(f) arr.spacesaver() eliminated
(g) arr.savespace() eliminated
3. Some of the typecode characters have changed to be more consistent with
other Python modules (array and struct). You should only notice this change
if you used the actual typecode characters (instead of the named constants).
The alter code1 module will change uses of ’b’ to ’B’ for internal Numeric func-
tions that it knows about because NumPy will interpret ’b’ to mean a signed
byte type (instead of the old unsigned). It will also change the character codes
when they are used explicitly in the .astype method. In the compatibility layer
(and only in the compatibility layer), typecode-requiring function calls (e.g.
zeros, array) understand the old typecode characters.
The changes are (Numeric –> NumPy):
(a) ’b’ –> ’B’
(b) ’1’ –> ’b’
(c) ’s’ –> ’h’
(d) ’w’ –> ’H’
(e) ’u’ –> ’I’
4. arr.ﬂat now returns an indexable 1-D iterator. This behaves correctly when
passed to a function, but if you expected methods or attributes on arr.ﬂat
— besides .copy() — then you will need to replace arr.ﬂat with arr.ravel()
(copies only when necessary) or arr.ﬂatten() (always copies). The alter code1
module will change arr.ﬂat to arr.ravel() unless you used the construct arr.ﬂat
= obj or arr.ﬂat[ind].
5. If you used type-equality testing on the objects returned from arrays, then you
need to change this to isinstance testing. Thus type(a) is ﬂoat or type(a)
== ﬂoat should be changed to isinstance(a, ﬂoat). This is because array
scalar objects are now returned from arrays. These inherit from the Python
scalars where they can, but deﬁne their own methods and attributes. This
conversion is done by alter code1 for the types (ﬂoat, int, complex, and Ar-
6. If your code should produce 0-d arrays. These no-longer have a length as they
should be interpreted similarly to real scalars which don’t have a length.
7. Arrays cannot be tested for truth value unless they are empty (returns False)
or have only one element. This means that if Z: where Z is an array will fail
(unless Z is empty or has only one element). Also the ’and’ and ’or’ operations
(which test for object truth value) will also fail on arrays of more than one
element. Use the .any() and .all() methods to test for truth value of an array.
8. Masked arrays return a special nomask object instead of None when there is
no mask on the array for the functions getmask and attribute access arr.mask
9. Masked array functions have a default axis of None (meaning ravel), make
sure to specify an axis if your masked arrays are larger than 1-d.
10. If you used the construct arr.shape=<tuple>, this will not work for array
scalars (which can be returned from array operations). You cannot set the
shape of an array-scalar (you can read it though). As a result, for more general
code you should use arr=arr.reshape(<tuple>) which works for both
array-scalars and arrays.
The alter code1 script should handle the changes outlined in steps 1-5 above. The
ﬁnal incompatibilities in 6-9 are less common and must be modiﬁed by hand if
2.6.2 Second-step changes
During the second phase of migration (should it be necessary) the compatibility
layer is dropped. This phase requires additional changes to your code. There is
another conversion module (alter code2) which can help but it is not complete.
The changes required to drop dependency on the compatibility layer are
(a) numpy.oldnumeric –> numpy
(b) from numpy.oldnumeric import * –> from numpy import * (this may
clobber more names and therefore require further ﬁxes to your code but
then you didn’t do this regularly anyway did you). The recommended
procedure if this replacement causes problems is to ﬁx the use of from
numpy.oldnumeric import * to extract only the required names and then
(c) numpy.oldnumeric.mlab –> None, the functions come from other places.
(d) numpy.oldnumeric.linear algebra –> numpy.lilnalg with name changes to
the functions (made lower case and shorter).
(e) numpy.oldnumeric.random array –> numpy.random with some name
changes to the functions.
(f) numpy.oldnumeic.ﬀt –> numpy.ﬀt with some name changes to the func-
(g) numpy.oldnumeric.rng –> None
(h) numpy.oldnumeric.rng stats –> None
(i) numpy.oldnumeric.user array –> numpy.lib.user array
(j) numpy.oldnumeric.matrix –> numpy
2. The typecode names are all lower-case and refer to type-objects corresponding
to array scalars. The character codes are understood by array-creation func-
tions but are not given names. All named type constants should be replaced
with their lower-case equivalents. Also, the old character codes ’1’, ’s’, ’w’,
and ’u’ are not understood as data-types. It is probably easiest to manually
replace these with Int8, Int16, UInt16, and UInt32 and let the alter code2
script convert the names to lower-case typeobjects.
3. Keyword and argument changes
(a) All typecode= keywords must be changed to dtype=.
(b) The savespace keyword argument has been removed from all functions
where it was present (array, sarray, asarray, ones, and zeros). The sarray
function is equivalent to asarray.
4. The default data-type in NumPy is ﬂoat unlike in Numeric (and
numpy.oldnumeric) where it was int. There are several functions aﬀected
by this so that if your code was relying on the default data-type, then it must
be changed to explicitly add dtype=int.
5. The nonzero function in NumPy returns a tuple of index arrays just like
the corresponding method. There is a ﬂatnonzero function that ﬁrst ravels
the array and then returns a single index array. This function should be
interchangeable with the old use of nonzero.
6. The default axis is None (instead of 0) to match the methods for the func-
tions take, repeat, sum, average, product, sometrue, alltrue, cumsum, and
cumproduct (from Numeric) and also for the functions average, max, min,
ptp, prod, std, and mean (from MLab).
7. The default axis is None (instead of -1) to match the methods for the functions
argmin, argmax, compress
2.6.3 Updating code that uses Numeric using alter codeN
Despite the long list of changes that might be needed given above, it is likely that
your code does not use any of the incompatible corners and it should not be too
diﬃcult to convert from Numeric to NumPy. For example all of SciPy was converted
in about 2-3 days. The needed changes are largely search-and replace type changes,
and the alter codeN modules can help. The modules have two functions which help
convertﬁle (ﬁlename, orig=1)
Convert the ﬁle with the given ﬁlename to use NumPy. If orig is True, then
a backup is ﬁrst made and given the name ﬁlename.orig. Then, the ﬁle is
converted and the updated code written over the top of the old ﬁle.
convertall (direc=os.path.curdir, orig=1)
Converts all the “.py” ﬁles in the given directory to use NumPy. Backups of all
the ﬁles are ﬁrst made if orig is True as explained for the convertﬁle function.
convertsrc (direc=os.path.curdir, ext=None, orig=1)
Replace ’’Numeric/arrayobject.h’’ with ’’numpy/oldnumeric.h’’
in all ﬁles ending in the list of extensions given by ext (if ext is None, then all
ﬁles are updated). If orig is True, then ﬁrst make a backup ﬁle with “.orig”
as the extension.
Walks the tree pointed to by direc and converts all “.py” modules in each sub-
directory to use NumPy. No backups of the ﬁles are made. Also, con-
verts all .h and .c ﬁles to replace ’’Numeric/arrayobject.h’’ with
’’numpy/oldnumeric.h’’ so that NumPy is used.
2.6.4 Changes to think about
Even if you don’t make changes to your old code. If you are used to coding in
Numeric, then you may need to adjust your coding style a bit. This list provides
some helpful things to remember.
1. Switch from using typecode characters to bitwidth type names or c-type names
2. Convert use of uppercase type-names Int32, Float, etc., to lower case int32,
3. Convert use of functions to method calls where appropriate but explicitly
specify any axis arguments for arrays greater than 1-d.
4. The names for standard computations like Fourier transforms, linear algebra,
and random-number generation have changed to conform to the standard of
lower-case names possibly separated by an underscore.
5. Look for ways to take advantage of advanced slicing, but remember it always
returns a copy and may be slower at times.
6. Remove any kludges you inserted to eliminate problems with Numeric that
are now gone.
7. Look for ways to take advantage of new features like expanded data-types
8. See if you can inherit from the ndarray directly, rather than using
user array.container (UserArray). However, if you were using UserArray in
a multiple-inheritance hierarchy this is going to be more diﬃcult and you
can continue to use the standard container class in user array (but notice the
9. Watch your usage of scalars extracted from arrays. Treating Numeric arrays
like lists and then doing math on the elements 1 by 1 was always about 2x
slower than using real lists in Python. This can now be 3x-6x slower than using
lists in NumPy because of the increased complexity of both the indexing of
ndarrays and the math of array scalars. If you must select individual scalars
from NumPy, you can get some speed increases by using the item method
to get a standard Python scalar from an N-d array and by using the itemset
method to place a scalar into a particular location in the N-d array. This
complicates the appearance of the code, however. Also, when using these
methods inside a loop, be sure to assign the methods to a local variable to
avoid the attribute look-up at each loop iteration.
Throughout this book, warnings are inserted when compatibility issues with old
Numeric are raised. While you may not need to make any changes to get code to
run with the ndarray object, you will likely want to make changes to take advantage
of the new features of NumPy. If you get into a jam during the conversion process,
you should be aware that Numeric and NumPy can both be used together and they
will not interfere with each other. In addition, if you have Numeric 24.0 or newer,
they can even share the same memory. This makes it easy to use NumPy as well
as third-party tools that have not made the switch from Numeric yet.
2.7 Summary of diﬀerences with Numarray
Conversion from Numarray can also be relatively painless, depending on how de-
pendent your code is on the speciﬁc structure of the Numarray ufuncs, cfuncs,
and various array-like objects. The internals of Numarray can be quite diﬀer-
ent and so depending on how intimately you used those internals adapting to
NumPy can be more or less diﬃcult. C-code that used the Numarray C-API
can be easily adapted because NumPy includes a Numarray-compatible C-API
module. All you need to do is replace usage of “numarray/libnumarray.h” with
“numpy/libnumarray.h” and be sure the directory returned from the Python com-
mand numpy.get numarray include() is included in the list of directories used for
On the Python-side the largest number of diﬀerences are in the methods and
attributes of the array and the way array data-types are represented. In addition,
arrays containing Python Objects, strings, and records are an integral part of the
array object and not handled using a separate class (although enhanced separate
classes do exist for the case of character arrays and record arrays).
As is the case with Numeric, there is a two-step process available for migrat-
ing code written for Numarray to work with NumPy. This process involves run-
ning functions in the modules alter code1 and alter code2 located in the numar-
ray sub-package of NumPy. These modules have interfaces identical to the ones
that convert Numeric code, but they work to convert code written for numarray.
The ﬁrst module will convert your code to use the numarray compatibility module
(numpy.numarray), while the second will try and help convert code to move away
from dependency on the compatibility module. Because many users will proba-
bly be content to only use the ﬁrst step, the alter code2 module for second-stage
migration may not be as complete as it otherwise could be.
Also, the alter code1 module is not guaranteed to convert every piece of working
numarray code to use NumPy. If your code relied on the internal module structure of
numarray or on how the class hierarchy was laid out, then it will need to be changed
manually to run with NumPy. Of course you can still use your code with Numarray
installed side-by-side and the two array objects should be able to exchange data
2.7.1 First-step changes
The alter code1 script makes the following import and attribute/method changes
220.127.116.11 Import changes
• import numarray –> import numpy.numarray as numarray
• import numarray.package –> import numpy.numarray.package as numar-
ray package with all usages of numarray.package in the code replaced by nu-
• import numarray as <name> –> import numpy.numarray s <name>
• import numarray.package as <name> –> import numpy.numarray.package as
• from numarray import <names> –> from numpy.numarray import <names>
• from numarray.package import <names> –> from numpy.numarray.package
18.104.22.168 Attribute and method changes
• .imaginary –> .imag
• .ﬂat –> probably .ravel() (Many usages will still work correctly because you
can index and assign to self.ﬂat)
• .byteswapped() –> .byteswap(False)
• .byteswap() –> .byteswap(True) (Returns a reference to self instead of None).
• self.info() –> numarray.info(self)
• .isaligned() –> .ﬂags.aligned
• .isbyteswapped() –> not .dtype.isnative (the byte-order is a property of the
data-type object not the array itself in NumPy).
• .iscontiguous() –> .ﬂags.c contiguous
• .is c array() –> .dtype.isnative and .ﬂags.carray
• .is fortran contiguous() –> .ﬂags.f contiguous
• .is f array() –> .dtype.isnative and .ﬂags.farray
• .itemsize() –> .itemsize
• .nelements() –> .size
• self.new(type) –> numarray.newobj(self, type)
• .repeat(r) –> .repeat(r, axis=0)
• .size() –> .size
• .type() –> numarray.typefrom(self)
• .typecode() –> .dtype.char
• .stddev() –> .std()
• .togglebyteorder() –> numarray.togglebyteorder(self)
• .getshape() –> .shape
• .setshape(obj) –> .shape = obj
• .getﬂat() –> .ravel()
• .getreal() –> .real
• .setreal(obj) –> .real = obj
• .getimag() –> .imag
• .setimag(obj) –> .imag = obj
• .getimaginary() –> .imag
• .setimaginary(obj) –> .imag = obj
2.7.2 Second-step changes
One of the notable diﬀerences is that several functions (array, arange, fromﬁle, and
fromstring) do not take the shape= keyword argument. Instead you simply reshape
the result using the reshape method. Another notable diﬀerence is that instead of
allowing typecode=, type=, and dtype= variants for specifying the data-types, you
must use the dtype= keyword. Other diﬀerences include
• matrixmultiply(a,b) –> dot(a,b)
• innerproduct(a,b) –> inner(a,b)
• outerproduct(a,b) –> outer(a,b)
• kroneckerproduct(a,b) –> kron(a,b)
• tensormultiply(a,b) –> None
2.7.3 Additional Extension modules
There are three extension packages that come included with numarray which are
now downloaded separately. Stubs for these packages exist in numpy.numarray but
they try and ﬁnd the actual code by looking at what is currently installed. These
packages are available in SciPy but can be installed separately as well:
• nd image –> scipy.ndimage
• convolve –> scipy.stsci.convolve
• image –> scipy.stsci.image
If you don’t want to install all of scipy, you can grab just these packages from SVN
svn co http://svn.scipy.org/svn/scipy/trunk/Lib/ndimage ndimage
svn co http://svn.scipy.org/svn/scipy/trunk/Lib/stsci stsci
and then run
cd ndimage; sudo python setup.py install
cd stsci; sudo python setup.py install
On a Windows system, you can use the Tortoise SVN client which is integrated into
the Windows Explorer. It can be downloaded from http://tortoisesvn.tigris.org.
Instructions on how to use it are also provided on that site. After downloading
the packages from SVN, installation will still require a C-compiler (the mingw32
compiler works ﬁne even with MSVC-compiled Python as long as you specify –
compiler=mingw32). Alternatively you can download binary releases of scipy from
http://www.scipy.org to get the needed functionality or use the Enthon edition of
The Array Object
Don’t worry about people stealing your ideas. If your ideas are any
good, you’ll have to ram them down people’s throats.
—Howard Aiken, IBM engineer
No idea is so antiquated that it was not once modern; no idea is so
modern that it will not someday be antiquated.
3.1 ndarray Attributes
Array attributes reﬂect information that is intrinsic to the array itself. Generally,
accessing an array through its attributes allows you to get and sometimes set intrin-
sic properties of the array without creating a new array. The exposed attributes are
the core parts of an array and only some of them can be reset meaningfully without
creating a new array. Table 3.1 shows all the attributes with a brief description.
Detailed information on each attribute is given below.
Numeric Compatibility: you should check your old use of the .ﬂat
attribute. This attribute now returns an iterator object which acts
like a 1-d array in terms of indexing. while it does not share all
the attributes or methods of an array, it will be interpreted as
an array in functions that take objects and convert them to arrays.
Furthermore, Any changes in an array converted from a 1-d iterator
will be reﬂected back in the original array when the converted array
3.1.1 Memory Layout attributes
Array ﬂags provide information about how the memory area used for the array is
to be interpreted. There are 6 Boolean ﬂags in use which govern whether or
C CONTIGUOUS (C) the data is in a single, C-style contiguous segment;
F CONTIGUOUS (F) the data is in a single, Fortran-style contiguous seg-
OWNDATA (O) the array owns the memory it uses or if it borrows it from
another object (if this is False, the base attribute retrieves a reference to
the object this array obtained its data from);
WRITEABLE (W) the data area can be written to;
ALIGNED (A) the data and strides are aligned appropriately for the hard-
ware (as determined by the compiler);
UPDATEIFCOPY (U) this array is a copy of some other array (refer-
enced by .base). When this array is deallocated, the base array will be
updated with the contents of this array.
Only the UPDATEIFCOPY, WRITEABLE, and ALIGNED ﬂags can be
changed by the user. This can be done using the special array-connected,
dictionary-like object that the ﬂags attribute returns. By setting elements in
this dictionary, the underlying array obect’s ﬂags are altered. Flags can also
be changed using the method setflags(...). All ﬂags in the dictionary can
be accessed using their ﬁrst (upper case) letter as well as the full name.
Table 3.1: Attributes of the ndarray.
Attribute Settable Description
ﬂags No special array-connected dictionary-like object with
attributes showing the state of ﬂags in this ar-
ray; only the ﬂags WRITEABLE, ALIGNED, and
UPDATEIFCOPY can be modiﬁed by setting at-
tributes of this object
shape Yes tuple showing the array shape; setting this at-
tribute re-shapes the array
strides Yes tuple showing how many bytes must be jumped in
the data segment to get from one entry to the next
ndim No number of dimensions in array
data Yes buﬀer object loosely wrapping the array data (only
works for single-segment arrays)
size No total number of elements
itemsize No size (in bytes) of each element
nbytes No total number of bytes used
base No object this array is using for its data buﬀer, or
None if it owns its own memory
dtype Yes data-type object for this array
real Yes real part of the array; setting copies data to real
part of current array
imag Yes imaginary part, or read-only zero array if type is
not complex; setting works only if type is complex
ﬂat Yes one-dimensional, indexable iterator object that
acts somewhat like a 1-d array
ctypes No object to simplify the interaction of this array with
the ctypes module
array interface No dictionary with keys (data, typestr, descr, shape,
strides) for compliance with Python side of array
array struct No array interface on C-level
array priority No always 0.0 for base type ndarray
Certain logical combinations of ﬂags can also be read using named keys to the
special ﬂags dictionary. These combinations are
FNC Returns F CONTIGUOUS and not C CONTIGUOUS
FORC Returns F CONTIGUOUS or C CONTIGUOUS (one-segment test).
BEHAVED (B) Returns ALIGNED and WRITEABLE
CARRAY (CA) Returns BEHAVED and C CONTIGUOUS
FARRAY (FA) Returns BEHAVED and F CONTIGUOUS and not
The array ﬂags cannot be set arbitrarily. UPDATEIFCOPY can
only be set False. the ALIGNED ﬂag can only be set True if the
data is truly aligned. The ﬂag WRITEABLE can only be set True
if the array owns its own memory or the ultimate owner of the
memory exposes a writeable buﬀer interface (or is a string). The
exception for string is made so that unpickling can be done without
Flags can also be set and read using attribute access with the lower-
case key equivalent (without ﬁrst letter support). Thus, for example,
self.ﬂags.c contiguous returns whether or not the array is C-style contiguous,
and self.ﬂags.writeable=True changes the array to be writeable (if possible).
The shape of the array is a tuple giving the number of elements in each dimension.
The shape can be reset for single-segment arrays by setting this attribute to
another tuple. The total number of elements cannot change. However, a -1
may be used in a dimension entry to indicate that the array length in that
dimension should be computed so that the total number of elements does not
change. a.shape=x is equivalent to a=a.reshape(x) except the latter can
be used even if the array is not single-segment and even if a is an array scalar.
Setting the shape attribute to () for a 1-element array will turn self
into a 0-dimensional array. This is one of the few ways to get a
0-dimensional array in Python. Most other operations will return
an array scalar. Other ways to get a 0-dimensional array in Python
include calling array with a scalar argument and calling the squeeze
method of an array whose shape is all 1’s.
The strides of an array is a tuple showing for each dimension how many bytes
must be skipped to get to the next element in that dimension. Setting this
attribute to another tuple will change the way the memory is viewed. This
attribute can only be set to a tuple that will not cause the array to access
unavailable memory. If an attempt is made to do so, ValueError is raised.
The number of dimensions of an array is sometimes called the rank of the array.
Getting this attribute reveals the length of the shape tuple and the strides
A buﬀer object referencing the actual data for this array if this array is single-
segment. If the array is not single-segment, then an AttributeError is raised.
The buﬀer object is writeable depending on the status of self.ﬂags.writeable.
The total number of elements in the array.
The number of bytes each element of the array requires.
The total number of bytes used by the array. This is equal to
If the array does not own its own memory, then this attribute returns the object
whose memory this array is referencing. The returned object may not be the
original allocator of the memory, but may be borrowing it from still another
object. If this array does own its own memory, then None is returned unless
the UPDATEIFCOPY ﬂag is True in which case self.base is the array that
will be updated when self is deleted. UPDATEIFCOPY gets set for an array
that is created as a behaved copy of a general array. The intent is for the
misaligned array to get any changes that occur to the copy.
3.1.2 Data Type attributes
There are several ways to specify the kind of data that the array is composed of.
The fullest description that preserves ﬁeld information is always obtained using
an actual dtype object. See Chapter 7 for more discussion on data-type objects
and acceptable arguments to construct data-type objects. Three commonly-used
attributes of the data-type object returned are also documented here.
A data-type object that fully describes (including any deﬁned ﬁelds) each ﬁxed-
length item in the array. Whether or not the data is in machine byte-order
is also determined by the data-type. The data-type attribute can be set to
anything that can be interpreted as a data-type (see Chapter 7 for more
information). Setting this attribute allows you to change the interpretation of
the data in the array. The new data-type must be compatible with the array’s
current data-type. The new data-type is compatible if it has the same itemsize
as the current data-type descriptor, or (if the array is a single-segment array)
if the the array with the new data-type ﬁts in the memory already consumed
by the array.
A Python type object gives the typeobject whose instances represent elements of
the array. This type object can be used to instantiate a scalar of that type.
A typecode character unique to each of the 21 built-in types.
This string consists of a required ﬁrst character giving the “endianness” of the
data (“<” for little endian, “>” for big endian, and “|” for irrelevant), the
second character is a code for the kind of data (’b’ for Boolean, ’i’ for signed
integer, ’u’ for unsigned integer, ’f’ for ﬂoating-point, ’c’ for complex ﬂoating
point, ’O’ for object, ’S’ for ASCII string, ’U’ for unicode, and ’V’ for void),
the ﬁnal characters give the number of bytes each element uses.
3.1.3 Other attributes
Equivalent to self.transpose(). For self.ndim < 2, it returns a view of self.
The real part of an array. For arrays that are not complex this attribute returns
the array itself. Setting this attribute allows setting just the real part of an
array. If the array is already real then setting this attribute is equivalent to
self[...] = values.
A view of the imaginary part of an array. For arrays that are not complex, this
returns a read-only array of zeros. Setting this array allows in-place alteration
of the complex part of an imaginary array. If the array is not complex, then
trying to set this attribute raises an Error.
Return an iterator object (numpy.ﬂatiter) that acts like a 1-d version of the array.
1-d indexing works on this array and it can be passed in to most routines as
an array wherein a 1-d array will be constructed from it. The new 1-d array
will reference this array’s data if this array is C-style contiguous, otherwise,
new memory will be allocated for the 1-d array, the UPDATEIFCOPY ﬂag
will be set for the new array, and this array will have its WRITEABLE ﬂag
set FALSE until the the last reference to the new array disappears. When the
last reference to the new 1-d array disappears, the data will be copied over to
this non-contiguous array. This is done so that a.ﬂat eﬀectively references the
current array regardless of whether or not it is contiguous or non-contiguous.
As an example, consider the following code:
>>> a = zeros((4,5))
>>> b = ones(6)
array([[ 2., 2., 2.],
[ 2., 2., 2.]])
>>> print a
[[ 0. 0. 0. 0. 0.]
[ 2. 2. 2. 0. 0.]
[ 2. 2. 2. 0. 0.]
[ 0. 0. 0. 0. 0.]]
The numpy.ﬂatiter object has two methods: array () and copy() and one
attribute: base. The base attribute returns a reference to the underlying
The array priority attribute is a ﬂoating point number useful in mixed operations
involving two subtypes to decide which subtype is returned. The base ndarray
object has priority 0.0 and 1.0 is the default subtype priority.
3.1.4 Array Interface attributes
The array interface (sometimes called array protocol) was created in 2005 as a
means for array-like Python objects to re-use each other’s data buﬀers intelligently
whenever possible. The ndarray object supports both the Python-side and the C-
side of the array interface. The system is able to consume objects that expose the
array interface, and array objects can expose their inner workings to other objects
that support the array interface.
The python-side of the array interface. It is a dictionary with the following
data A 2-tuple (dataptr, read-only ﬂag). The dataptr is a string giving the
address (in hexadecimal format) of the array data. The read-only ﬂag is
True if the array memory is read-only.
strides The strides tuple. Same as strides attribute except None is returned
if the array is C-style contiguous.
shape The shape tuple. Same as shape attribute.
typestr A string giving the format of the data. Same as dtype.str attribute.
descr A list of tuples providing the detailed description of this data type.
This information is obtained from the arrdescr attribute of the dtypedescr
object associated with each array. For arrays with ﬁelds, this will return
a valid array-protocol descriptor list. For arrays without deﬁned ﬁelds,
this returns [(”,typestr)].
A PyCObject that wraps a pointer to a PyArrayInterface structure. This is only
useful on the C-level for rapid implementation of the array interface, using a
single attribute lookup.
This attribute creates an object that makes it easier to use arrays when calling
out to shared libraries with the ctypes module. The returned object has data,
shape, and strides attributes which return ctypes objects that can be used as
arguments to a shared library. These attributes are:
data A pointer to the memory area of the array as a Python integer. This
memory area may contain data that is not aligned, or not in correct
byte-order. The memory area may not even be writeable. The array
ﬂags and data-type of this array should be respected when passing this
attribute to arbitrary C-code to avoid trouble that can include Python
crashing. User Beware! The value of this attribute is exactly the same
as self. array interface [’data’].
shape (c intp*self.ndim) A ctypes array of length self.ndim where the base-
type is the C-integer corresponding to dtype(’p’) on this platform. This
base-type could be c int, c long, or c longlong depending on the platform.
The c intp type is deﬁned accordingly in numpy.ctypeslib. The ctypes
array contains the shape of the underlying array.
strides (c intp*self.ndim) A ctypes array of length self.ndim where the base-
type is the same as for the shape attribute. This ctypes array contains the
strides information from the underlying array. This strides information
is important for showing how many bytes must be jumped to get to the
next element in the array.
as parameter (c void p) Returns the data-pointer to the array as a ctypes
object. Among other possible uses, this enables this ctypes object to be
used directly in a ctypes-loaded call to an arbitrary function. Be sure
to respect the ﬂags on the array and the size and strides of the array so
as not to use this memory in-appropriately (see the ndpointer function
for how to return a class that can be used with the argtypes attribute of
Be careful using the ctypes attribute — especially on temporary
arrays or arrays constructed on the ﬂy. For example, calling
(a+b).ctypes.data as(ctypes.c void p) returns a pointer to memory
that is invalid because the array created as (a+b) is deallocated be-
fore the next Python statement. You can avoid this problem using
either c=a+b or ct=(a+b).ctypes. In the latter case, ct will hold
a reference to the array until ct is deleted or re-assigned.
The ctypes object also has several methods which can alter how the shape, strides,
and data of the underlying object is returned.
data as (obj) Return the data pointer cast-to a particular c-types ob-
ject. For example, calling self. as parameter is equivalent
to self.data as(ctypes.c void p). Perhaps you want to use
the data as a pointer to a ctypes array of ﬂoating-point data:
self.data as(ctypes.POINTER(ctypes.c double)).
shape as (obj) Return the shape tuple as an array of some other c-types
type. For example: self.shape as(ctypes.c short).
strides as (obj) Return the strides tuple as an array of some other c-types
type. For example: self.strides as(ctypes.c longlong).
If the ctypes module is not available, then the ctypes attribute of array objects still
returns something useful, but ctypes objects are not returned and errors may
be raised instead. In particular, the object will still have the as parameter
attribute which will return an integer equal to the data attribute.