NumPy (Numeric Python) is an open-source add-on module for Python that provides common mathematical
and numerical routines as fast, pre-compiled functions. The NumPy package provides basic routines for
manipulating large arrays and matrices of numeric data. You can find more tutorials at
http://wiki.scipy.org/Tentative_NumPy_Tutorial. Also check http://www.numpy.org for additional
information.
Exercise time
When a program makes many calls to NumPy functions, writing numpy.X over and over again becomes
tedious. Instead, it is common to import the module under the shorter name np:
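A minimal sketch of that convention follows. The array values are only illustrative and are not taken from the tutorial.

```python
import numpy as np  # conventional short alias for the numpy module

# Every NumPy name is now reachable through the np prefix.
values = np.array([1, 2, 3])  # illustrative data
total = np.sum(values)        # the same function as numpy.sum
print(total)
```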
Array Creation
In [3]: # 1D array
np.array([1, 2, 3, 4])
In [4]: # 2D array
np.array([[1, 2], [3, 4]])
(array([[1, 2],
        [3, 4]]), <type 'numpy.ndarray'>)
arange
A range can be quickly created with the arange function (like indgen in IDL).
In [10]: np.arange(10)
Additional arguments set the lower and upper bounds as well as the step of the range.
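As a small sketch of those extra arguments (the variable names and bounds here are chosen only for illustration), np.arange accepts a start, a stop, and a step, where the stop value is excluded:

```python
import numpy as np

# np.arange(start, stop, step): start is included, stop is excluded.
bounded = np.arange(2, 10)            # values 2, 3, ..., 9
stepped = np.arange(2, 10, 3)         # values 2, 5, 8
quarters = np.arange(0.0, 1.0, 0.25)  # a float step also works: 0.0, 0.25, 0.5, 0.75

print(bounded)
print(stepped)
print(quarters)
```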
Indexing
In 1D, NumPy arrays are accessed with the same slicing syntax as lists.
Reminder: [start:end:step]
In [12]: b = np.arange(10)
b[0:9:2]
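The slice in the cell above, along with a few common variations, can be sketched as follows. The variable names other than b are illustrative.

```python
import numpy as np

b = np.arange(10)

every_other = b[0:9:2]  # indices 0, 2, 4, 6, 8
first_three = b[:3]     # start defaults to 0
reversed_b = b[::-1]    # a negative step walks backwards

print(every_other)
print(first_three)
print(reversed_b)
```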
Indexing in n-dimensions
The first index represents the row, the second represents the column.
Dimensions need to be separated with commas ','.
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
13
21
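Output like the one above can be reproduced with a sketch such as the following. It assumes a 5x5 array holding 0 to 24 filled row by row; the name grid and the chosen indices are illustrative.

```python
import numpy as np

# Assumed setup: a 5x5 array holding 0..24, filled row by row.
grid = np.arange(25).reshape(5, 5)

print(grid)
print(grid[2, 3])  # row 2, column 3 -> 13
print(grid[4, 1])  # row 4, column 1 -> 21

first_column = grid[:, 0]  # a slice per dimension: every row, column 0
second_row = grid[1, :]    # row 1, every column
```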
Operations on vectors and matrices are handled easily. When the operands have different types, the
result is upcast to the more general one (for example, int => float).
[[1. 1.]
 [0. 0.]]
[[1 2]
 [3 4]]
In [21]: np.dot(a, c)
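The following sketch illustrates both points, upcasting and the matrix product. The values of a and c are assumed from the printed arrays above; the other names are illustrative.

```python
import numpy as np

a = np.array([[1.0, 1.0], [0.0, 0.0]])  # float array (assumed values)
c = np.array([[1, 2], [3, 4]])          # integer array (assumed values)

mixed = a + c            # int + float: the result is upcast to float64
ints = c + c             # int + int: the result stays an integer type
elementwise = a * c      # * multiplies element by element
product = np.dot(a, c)   # np.dot computes the matrix product

print(mixed.dtype, ints.dtype)
print(product)
```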
For more complex operations on vectors and matrices, NumPy has a submodule dedicated to linear
algebra, np.linalg, which covers eigenvector decomposition, matrix inversion, etc.
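As a brief sketch of that submodule, the example below inverts a small matrix and computes its eigendecomposition. The matrix m is illustrative; np.linalg.inv and np.linalg.eig are the standard NumPy functions for these tasks.

```python
import numpy as np

m = np.array([[1.0, 2.0], [3.0, 4.0]])  # illustrative matrix

m_inv = np.linalg.inv(m)                      # matrix inverse
eigenvalues, eigenvectors = np.linalg.eig(m)  # eigendecomposition

# The inverse times the original gives the identity (up to rounding).
inverse_ok = np.allclose(np.dot(m_inv, m), np.eye(2))

# Column j of eigenvectors, multiplied by m, equals itself scaled by eigenvalue j.
eig_ok = np.allclose(np.dot(m, eigenvectors), eigenvectors * eigenvalues)

print(inverse_ok, eig_ok)
```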
Masked Array
In [22]: a = np.array([1,2,3,4])
mask_a = np.array(a == 2)
print(a)
print(mask_a) # True when masked
[1 2 3 4]
[False True False False]
In [24]: masked_a = np.ma.array(a, mask=mask_a) # build the masked array from the boolean mask
print(a.sum())
print(masked_a.sum()) # The mask is taken into account
10
8
In [25]: masked_b = np.ma.array(a, mask=(a==1))
masked_b
Array operations
Basic operations
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
In [28]: c + 1
In [29]: c * 5 - 3
ufuncs
Common arithmetic operations are implemented in NumPy in compiled C code. As a result, squaring an
array or computing its sum is much more efficient in NumPy than doing the same with Python lists.
Element-wise operations are implemented as universal functions (ufuncs,
http://docs.scipy.org/doc/numpy/reference/ufuncs.html), and convenience methods such as min, max,
mean, and std can provide
information on the whole array
In [31]: c
In [32]: c.mean()
Out[32]: 15.0
In [33]: c.std()
Out[33]: 7.211102550927978
or, for multidimensional data, can provide information along a given axis (axis=0 runs down the rows
and gives one result per column; axis=1 runs across the columns and gives one result per row)
In [34]: c.sum(axis=1)
In [35]: c.max(axis=0)
These methods are shorthand for applying the corresponding functions, such as np.min() or np.std(),
to the array.
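The axis argument can be sketched on a small array. The array data and the result names are illustrative.

```python
import numpy as np

data = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]

column_sums = data.sum(axis=0)  # collapse the rows: one value per column
row_sums = data.sum(axis=1)     # collapse the columns: one value per row

# The method form and the function form give the same result.
same_result = np.array_equal(data.max(axis=0), np.max(data, axis=0))

print(column_sums)
print(row_sums)
print(same_result)
```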
Broadcasting
One of the major features of NumPy is array broadcasting. Broadcasting allows operations (such
as addition, multiplication, etc.) which are normally element-wise to be carried out on arrays of
different shapes. It is a virtual replication of the arrays along the missing dimensions. It can be seen
as a generalization of operations involving an array and a scalar.
The addition of a scalar to a matrix can be seen as the addition of a matrix with identical elements (and
the same dimensions)
In [37]: matrix + 6
The addition of a row to a matrix will be seen as the addition of a matrix with replicated rows (the
number of columns must match).
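The scalar and row cases can be sketched as follows. The (4, 5) matrix shape is taken from the note later in this section, while the concrete values and the np.tile comparison are illustrative.

```python
import numpy as np

matrix = np.arange(20).reshape(4, 5)  # shape (4, 5), illustrative values
row = np.arange(5)                    # shape (5,)

plus_scalar = matrix + 6  # 6 is added to every element
plus_row = matrix + row   # row is added to each of the 4 rows

# Same result with an explicit (non-virtual) replication of the row.
explicit = matrix + np.tile(row, (4, 1))

print(plus_row)
```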
The addition of a column to a matrix will be seen as the addition of a matrix with replicated columns
(the number of rows must match).
In [40]: column = np.ones(4)
column
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-b29c50e50df5> in <module>()
----> 1 matrix + column # This will fail bec
This one failed because the rightmost dimensions differ. For columns, an additional dimension must
therefore be added on the right by indexing the array with np.newaxis, or simply None.
Out[42]: array([[0],
                [1],
                [2],
                [3]])
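A sketch of the column case, assuming a (4, 5) matrix of ones and a column built with np.arange(4). Both are illustrative choices.

```python
import numpy as np

matrix = np.ones((4, 5))  # shape (4, 5), illustrative values
column = np.arange(4)     # shape (4,): cannot be broadcast against (4, 5)

# Adding a trailing axis turns the shape into (4, 1);
# column[:, None] is an equivalent spelling.
column_2d = column[:, np.newaxis]

result = matrix + column_2d  # the column is added to each of the 5 columns
print(column_2d.shape, result.shape)
```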
NOTE: In the row case above, the shapes did not match either: (4, 5) for the matrix and (5,) for the
row. The actual rule of broadcasting is that, for arrays of different rank, dimensions of length 1 are
prepended (added on the left of the array shape) until the two arrays have the same rank. Each pair of
dimensions must then be equal, or one of them must be 1. For this reason, arrays with the following
shapes can be broadcast together:
(1, 1, 1, 8) and (9, 1)
(4, 1, 9) and (3, 1)
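The two shape pairs from the note can be checked with a short sketch. Arrays of zeros are used only as placeholders, since only the shapes matter here.

```python
import numpy as np

# (9, 1) is treated as (1, 1, 9, 1) before being combined with (1, 1, 1, 8).
first = np.zeros((1, 1, 1, 8)) + np.zeros((9, 1))

# (3, 1) is treated as (1, 3, 1) before being combined with (4, 1, 9).
second = np.zeros((4, 1, 9)) + np.zeros((3, 1))

print(first.shape)   # (1, 1, 9, 8)
print(second.shape)  # (4, 3, 9)
```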
For loop
Generally, we want to avoid iterating over the elements of arrays whenever we can. The
reason is that in an interpreted language like Python (or MATLAB), iterations are really slow compared
to vectorized operations. However, sometimes iterations are unavoidable. For such cases, the Python
for loop is the most convenient way to iterate over an array:
In [5]: v = np.array([1,2,3,4])
for element in v:
    print(element)
1
2
3
4
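To make the contrast with vectorized operations concrete, the sketch below computes a sum of squares both ways. The sum-of-squares task is an illustrative choice, and np.ndenumerate is shown as one option when the element position is also needed.

```python
import numpy as np

v = np.array([1, 2, 3, 4])

# Explicit loop: one Python-level iteration per element.
total = 0
for element in v:
    total += element ** 2

# Vectorized equivalent: the iteration happens inside NumPy's compiled code.
total_vectorized = (v ** 2).sum()

# np.ndenumerate yields (index, value) pairs, with the index given as a tuple.
for index, value in np.ndenumerate(v):
    print(index, value)
```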
Writing numpy_data.txt