Data Analysis With Python

Arnaud Beck

Laboratoire Leprince-Ringuet, École Polytechnique, CNRS/IN2P3

Space Science Training Week


Outline

1 Introduction
2 Using Python
3 Basic Python
4 Scipy
5 Data I/O
6 Visualization

Why come to Python?

Should I use a low-level, compiled language or an interpreted language?
Commercial or open source?

                                    C/C++   Matlab   Python
Easy and flexible                             X        X
Performances                          X
Free and available on any system      X                X

Why stick to Python?

Python is distinguished by its large and active scientific computing community.
There are people developing libraries for virtually anything.

Glue to other languages
  Libraries to interface other languages (C/C++/Fortran)...
  ...with the same performance!
  Critical parts of codes are written in a lower-level language.

Parallelization
  MPI
  OpenMP
  GPU

Data management and visualization
  I/O of data in any format (HDF5, VTK, ...)
  Dedicated data management libraries (scipy, pandas)
  Direct visualization or interfaces with other software (Paraview, Mayavi)

2 Using Python

Getting Python for data analysis

Basic Python distribution
  Available on any Linux or Mac OS.

Critical for data analysis
  Modules: Scipy, Matplotlib

Application specific
  Modules: mpi4py, VTK, PyTables, etc.

It is possible to install a fully pre-built scientific Python environment:
Enthought Python Distribution, or Python(x,y) for Windows.

Running Python

Interactive mode in a Python shell

Use of a script

Turn your Python script into a Unix script

You can compile scripts into binary .pyc files. Mostly for developers.
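
A minimal sketch of the script workflow, assuming a hypothetical file name hello.py and a hypothetical function greet. The first line (the "shebang") tells a Unix shell which interpreter to use, so after "chmod +x hello.py" the file can be started as ./hello.py as well as with "python3 hello.py".

```python
#!/usr/bin/env python3
# hello.py (hypothetical file name): run with "python3 hello.py",
# or make it executable with "chmod +x hello.py" and run "./hello.py".

def greet(name):
    """Return a greeting string for the given name."""
    return "Hello, " + name + "!"

if __name__ == "__main__":
    # Executed only when the file is run as a script, not when it is imported.
    print(greet("world"))
```

The same file can be byte-compiled with "python3 -m py_compile hello.py", which writes a .pyc file.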
IPython: a convenient and comfortable Python shell

Interesting features
  Command history
  Any shell (xterm) command accessible via !
  Command auto-completion
  Quick help through the use of ?
  Inline and interactive graphics
  Timing and profiling tools
  Many, many more...

Best tool for exploring, debugging or working interactively. Have a look!


IPython example

3 Basic Python

Python is an object-oriented language

In Python, we do things with stuff!
  things = operations
  stuff = objects

Type      Example
Numbers   128, 3.14, 4+5j
Strings   'Rony', "Giovannis"
Lists     [1,"string",2.45]
Tuples    (1,"string",2.45)

Strings, Lists and Tuples are sequences.
Strings and Tuples are immutable.
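
A short sketch of what "sequence" and "immutable" mean in practice, using the example values from the table. The variable names are illustrative only.

```python
# One example of each basic type.
number = 4 + 5j
text = "Rony"
items = [1, "string", 2.45]      # list: mutable sequence
frozen = (1, "string", 2.45)     # tuple: immutable sequence

# Sequences support len() and indexing.
print(len(text), items[1], frozen[2])

# A list can be modified in place...
items[0] = 100

# ...but a tuple (or a string) cannot: the assignment raises TypeError.
try:
    frozen[0] = 100
except TypeError as err:
    print("tuples are immutable:", err)
```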
Numbers

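
A minimal sketch of the three numeric types (integer, float, complex) and a few operations on them. The values are illustrative, and Python 3 division rules are assumed.

```python
a = 128          # int
b = 3.14         # float
c = 4 + 5j       # complex

print(type(a), type(b), type(c))

# Arithmetic promotes to the more general type.
print(a + b)                    # int + float gives a float
print(a / 5, a // 5)            # true division and floor division
print(2 ** 10)                  # exponentiation
print(c.real, c.imag, abs(c))   # parts and modulus of a complex number
```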
Strings
Ordered collection (or sequence) of characters

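
A possible illustration of a string used as an ordered collection of characters. The string itself is an arbitrary example.

```python
s = "Space Science"

print(len(s))        # number of characters
print(s[0], s[-1])   # first and last character
print(s + " Week")   # concatenation builds a new string
print(s * 2)         # repetition
print("Sci" in s)    # membership test

# Strings are immutable: item assignment raises TypeError.
try:
    s[0] = "s"
except TypeError:
    print("cannot modify a string in place")
```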
String Methods

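
A sketch of some commonly used string methods, applied to a made-up line of text such as one might read from a data file. Every method returns a new object and leaves the original string unchanged.

```python
line = "  time=0.5, charge=12.3  "

clean = line.strip()                 # remove surrounding whitespace
fields = clean.split(", ")           # split into a list of strings
name, value = fields[1].split("=")   # "charge" and "12.3"

print(clean.upper())
print(clean.replace("=", ": "))
print(clean.find("charge"))          # index of the first match, -1 if absent
print(name, float(value))            # convert the text to a number
print(";".join(fields))              # inverse of split
```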
Lists
Sequence of any objects

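
A minimal sketch of list operations. Since a list can hold any objects, the example mixes numbers, strings and a nested list; the values are illustrative.

```python
mixed = [1, "string", 2.45]

mixed.append([10, 20])     # a list can even contain another list
mixed.insert(0, "first")   # insert before index 0
print(mixed)

last = mixed.pop()         # remove and return the last item
mixed.remove("string")     # remove the first matching item
print(last, mixed)

numbers = [3, 1, 2]
numbers.sort()             # sorts in place
numbers.reverse()          # reverses in place
print(numbers, len(numbers))
```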
Slices
Manipulating sequences

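
A short sketch of the slice syntax seq[start:stop:step], which applies to every sequence type. The list is an arbitrary example.

```python
seq = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(seq[2:5])    # items 2, 3, 4 (the stop index is excluded)
print(seq[:3])     # first three items
print(seq[-3:])    # last three items
print(seq[::2])    # every second item
print(seq[::-1])   # reversed copy

# The same syntax works on any sequence, e.g. a string.
print("Python"[1:4])

# Slice assignment works on mutable sequences (lists) only.
seq[0:2] = ["a", "b"]
print(seq)
```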
Importing modules

Modules define new object types and operations.

The large and growing Python user community provides an increasing number
of modules that already do what you need.
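
A minimal sketch of the three usual import forms, using standard-library modules chosen only for illustration.

```python
import math                      # access names as math.sqrt, math.pi, ...
import os.path as osp            # import under a shorter alias
from collections import Counter  # import a single name

print(math.sqrt(16.0), math.pi)
print(osp.basename("/data/run1/output.txt"))
print(Counter("abracadabra").most_common(1))
```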
4 Scipy

The Scipy module

Scipy is a collection of powerful, high-level functions for mathematics and data
management. It is based on the numpy.ndarray object type and vectorized
operations. The operations are optimized and coded in C to deliver high
performance.

If you are using a for loop, you are probably doing something wrong!
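
A sketch of what "vectorized" means, assuming NumPy is imported as np. Both versions compute the same illustrative expression, but the second leaves the loop to compiled code.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1001)

# Loop version: one Python-level operation per element (slow).
y_loop = np.empty_like(x)
for i in range(len(x)):
    y_loop[i] = x[i] ** 2 + 2.0 * x[i]

# Vectorized version: the loop runs inside compiled code.
y_vec = x ** 2 + 2.0 * x

print(np.allclose(y_loop, y_vec))
```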
Creating an ndarray

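
A minimal sketch of common ways to create an ndarray, assuming NumPy is imported as np. The shapes and values are arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # from nested lists
z = np.zeros((2, 3))                   # filled with 0.0
o = np.ones(4, dtype=np.float32)       # explicit element type
r = np.arange(0, 10, 2)                # like range(), but returns an array
lin = np.linspace(0.0, 1.0, 5)         # 5 evenly spaced points, ends included

print(a.shape, a.dtype, a.ndim)
print(z.shape, o.dtype)
print(r, lin)
```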
Manipulating ndarrays

Slicing is still the basis of array manipulation.
  Reshape > Change the number and size of the array's dimensions.
  Sort > Quite self-explanatory.
  Delete, insert, append > Remove or add parts of the array.
  Squeeze, flatten, ravel > More ways to control the dimensionality of the array.
  Transpose, swapaxes, rollaxis > More ways to arrange the dimensions as you want.

These functions are important because well-arranged data is quickly
processed data.
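
The operations listed above can be sketched as follows, assuming NumPy is imported as np. The array contents are illustrative.

```python
import numpy as np

a = np.arange(12)                    # [0, 1, ..., 11]
b = a.reshape(3, 4)                  # same data seen as 3 rows x 4 columns

print(b[1, :])                       # slicing: second row
print(b[:, ::2])                     # slicing: every second column
print(b.T.shape)                     # transpose gives shape (4, 3)
print(np.swapaxes(b, 0, 1).shape)    # same result for a 2D array

c = np.sort(np.array([3, 1, 2]))     # sorted copy
d = np.delete(a, [0, 1])             # copy without the first two items
e = np.append(d, [100, 101])         # copy with items added at the end
f = np.squeeze(np.zeros((1, 5, 1)))  # drop length-1 dimensions
g = b.ravel()                        # back to one dimension

print(c, d.size, e[-1], f.shape, g.shape)
```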
Extracting information from your data

  Intersection (convenient for filtering)
  Histograms (perfect for distribution functions)
  Convolution
  Integration
  Interpolation
  Name it...
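
A sketch of the operations listed above using NumPy functions only (scipy offers more elaborate versions). The data are made up, and the trapezoidal integration is written by hand with slices as one possible approach.

```python
import numpy as np

# Intersection: values common to two arrays (sorted, unique).
common = np.intersect1d([1, 2, 3, 4], [3, 4, 5])

# Histogram: counts per bin and the bin edges.
samples = np.array([0.1, 0.2, 0.25, 0.7, 0.9])
counts, edges = np.histogram(samples, bins=2, range=(0.0, 1.0))

# Convolution: smooth a signal with a 3-point moving average.
signal = np.array([0.0, 0.0, 3.0, 0.0, 0.0])
smooth = np.convolve(signal, np.ones(3) / 3.0, mode="same")

# Integration: trapezoidal rule written with slices (integral of x on [0, 1]).
x = np.linspace(0.0, 1.0, 11)
integral = np.sum(0.5 * (x[1:] + x[:-1]) * np.diff(x))

# Interpolation: linear interpolation between tabulated points.
mid = np.interp(0.5, [0.0, 1.0], [10.0, 20.0])

print(common, counts, smooth, integral, mid)
```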
5 Data I/O

Reading data

The whole game is to fit your data into an ndarray.

data = numpy.fromfile("file", dtype=numpy.float32, count=-1, sep=" ")

Works with raw binary files (sep="") and ASCII files but is not very flexible.

data = numpy.loadtxt("file", skiprows=0, delimiter=",")

More flexible but works only with text files.

Both functions belong to NumPy; older SciPy releases re-exported them as
scipy.fromfile and scipy.loadtxt.
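
A minimal sketch of both readers, assuming NumPy is imported as np. The files are created in a temporary directory first so the example is self-contained; the file names and contents are hypothetical.

```python
import os
import tempfile
import numpy as np

tmpdir = tempfile.mkdtemp()

# Text file: two comma-separated columns with one header line.
txt_path = os.path.join(tmpdir, "data.csv")   # hypothetical file name
with open(txt_path, "w") as fid:
    fid.write("x,y\n0.0,1.0\n1.0,2.5\n2.0,4.0\n")
table = np.loadtxt(txt_path, skiprows=1, delimiter=",")

# Raw binary file: sep="" (the default) means binary; the dtype must be known.
bin_path = os.path.join(tmpdir, "data.bin")   # hypothetical file name
np.arange(6, dtype=np.float32).tofile(bin_path)
raw = np.fromfile(bin_path, dtype=np.float32, count=-1)

print(table.shape, table[:, 1])
print(raw)
```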


The file object

The file object is a basic Python type. It is created by

fid = open("filename","r")

"r" for read, "w" for write.

  fid.readline() > reads one line into a string
  fid.readlines() > reads all lines into a list of strings
  fid.tell() > returns the file's current position (in bytes)
  fid.seek(n) > goes to position n
  fid.read() > reads the whole file into a string
  fid.close() > closes the file
Manipulating a file

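
A sketch of the file object methods in action. The file is written to a temporary directory first; its name and contents are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # hypothetical file name

# Write three lines; "with" closes the file automatically.
with open(path, "w") as fid:
    fid.write("first\nsecond\nthird\n")

fid = open(path, "r")
line1 = fid.readline()     # the first line, newline included
pos = fid.tell()           # position just after the first line
rest = fid.readlines()     # remaining lines as a list of strings
fid.seek(0)                # rewind to the beginning
everything = fid.read()    # whole file as one string
fid.close()

print(repr(line1), pos, rest)
print(repr(everything))
```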
Quick words about reading HDF5 files

Reading HDF5 files is module dependent. You can use either tables or h5py,
for instance.

These modules coexist well with Scipy and load data directly into ndarrays.

tables example
Writing data

  numpy.save("file",ndarray) and numpy.load("file.npy") use the binary
  NumPy format (.npy) to store arrays.
  ndarray.tofile() stores an array in a text file or as raw binary.
  fileobject.write("any_string") writes a string to a text file.
  The h5py and tables modules are used to write HDF5 files.

VTK script
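
A minimal sketch of the array-writing calls, assuming NumPy is imported as np. Files go to a temporary directory and the file names are hypothetical. Note that the raw binary format stores neither shape nor dtype, so both must be supplied again when reading.

```python
import os
import tempfile
import numpy as np

tmpdir = tempfile.mkdtemp()
arr = np.arange(6, dtype=np.float64).reshape(2, 3)

# Binary .npy format: shape and dtype are stored with the data.
npy_path = os.path.join(tmpdir, "array.npy")
np.save(npy_path, arr)
back = np.load(npy_path)

# Raw binary: only the values are written.
raw_path = os.path.join(tmpdir, "array.bin")
arr.tofile(raw_path)
raw = np.fromfile(raw_path, dtype=np.float64).reshape(2, 3)

# Text: write strings through a file object, one row per line.
txt_path = os.path.join(tmpdir, "array.txt")
with open(txt_path, "w") as fid:
    for row in arr:
        fid.write(" ".join(str(v) for v in row) + "\n")

print(np.array_equal(arr, back), np.array_equal(arr, raw))
```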
6 Visualization

Visualization workflow

Raw Data --(Python)--> Postprocessed data --(Python)--> Formatted data file
Postprocessed data --(Python, matplotlib)--> Plot
Formatted data file --(Visualization software: Paraview, Visit, Mayavi ...)--> Visualization

Matplotlib: the figure object

fig = figure([options])

Options include:
  Size in inches
  Dpi
  Face and edge colors
  Frame layout

Operations include (pyplot functions acting on the current figure and axes):
  Title and axis labels: xlabel("string")
  Axis ticks and extent: xticks(ndarray)
  Display a colorbar: colorbar()
  Display a legend: legend()
  Save figure (png or eps): savefig("filename")

[Figure: Injected Charge, Bubble Size and Laser Amplitude, normalized to maximum, versus propagation length [mm]]
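
A minimal sketch of creating, decorating and saving a figure, assuming matplotlib's object-oriented interface (labels and ticks are set on an Axes object obtained from the figure). The data, labels and output file name are illustrative, and the Agg backend is selected so the example runs without a display.

```python
import os
import tempfile
import numpy as np
import matplotlib
matplotlib.use("Agg")              # non-interactive backend: render to files only
import matplotlib.pyplot as plt

x = np.linspace(0.0, 15.0, 50)     # illustrative data

fig = plt.figure(figsize=(6, 4), dpi=100, facecolor="white")
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, np.exp(-x / 5.0), label="curve A")
ax.plot(x, x / 15.0, label="curve B")

ax.set_title("Figure object demo")
ax.set_xlabel("x")
ax.set_ylabel("Normalized to maximum")
ax.set_xticks(np.arange(0, 16, 5))   # tick positions
ax.set_xlim(0, 15)                   # axis extent
ax.legend()

out_path = os.path.join(tempfile.mkdtemp(), "figure.png")  # hypothetical name
fig.savefig(out_path)
plt.close(fig)
```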
Matplotlib: Simple plots

plot(x,y,[options])

If x is omitted, the default is x=range(len(y)).

All typical options are here: lines (style, color, width ...), markers (size, shape,
colors ...), labels for legend, antialiasing, transparency, many more ...
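
A sketch of plot with and without x, and with a few of the options mentioned above. The data and option values are arbitrary, and the Agg backend is assumed so that no display is needed.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

y = np.array([1.0, 4.0, 9.0, 16.0])
x = np.array([10.0, 20.0, 30.0, 40.0])

fig, ax = plt.subplots()

# x omitted: the points are plotted against 0, 1, 2, 3.
(line_default,) = ax.plot(y, linestyle="--", color="red", linewidth=2.0,
                          label="index on x")

# x given, with markers and transparency.
(line_xy,) = ax.plot(x, y, marker="o", markersize=8, alpha=0.5,
                     label="explicit x")

ax.legend()
print(line_default.get_xdata(), line_xy.get_xdata())
plt.close(fig)
```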
Matplotlib 2D plots: imshow and pcolor

ar2d = rand(100,100)
imshow(ar2d,[options])

ar2d = rand(100,100)
pcolor(ar2d,[options])

[Figure: the same kind of random 100x100 array drawn with imshow (left, row 0 at the top) and with pcolor (right, row 0 at the bottom), each with a colorbar from 0.0 to 1.0]
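
A minimal sketch comparing the two calls on the same random array, assuming NumPy and matplotlib with the Agg backend. The variable name ar2d is illustrative. The main visible difference is the orientation of the vertical axis.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

ar2d = np.random.rand(100, 100)    # uniform random values in [0, 1)

fig, (ax_im, ax_pc) = plt.subplots(1, 2, figsize=(10, 4))

# imshow: image convention, row 0 is drawn at the top by default.
im = ax_im.imshow(ar2d, vmin=0.0, vmax=1.0)
fig.colorbar(im, ax=ax_im)

# pcolor: plot convention, row 0 is drawn at the bottom.
pc = ax_pc.pcolor(ar2d, vmin=0.0, vmax=1.0)
fig.colorbar(pc, ax=ax_pc)

print(ax_im.get_ylim(), ax_pc.get_ylim())
plt.close(fig)
```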
2D plots with a little bit of tuning

[Figure: tuned 2D plot with a colorbar from 0.0 to 3.0; axis labels as extracted: "y [m]" and "x ct [m]"]

Other features of matplotlib

Matplotlib has native LaTeX rendering

label = r"$Math \LaTeX code$"
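
A short sketch of math labels, assuming matplotlib's built-in math text parser (a subset of TeX that needs no LaTeX installation) and the Agg backend. The formulas and the output file name are arbitrary examples.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Raw strings (r"...") keep the backslashes intact for the math parser.
ax.set_xlabel(r"$\alpha$ [rad]")
ax.set_ylabel(r"$\sqrt{x^2 + y^2}$")
ax.set_title(r"$E = \frac{1}{2} m v^2$")

out_path = os.path.join(tempfile.mkdtemp(), "latex_labels.png")  # hypothetical name
fig.savefig(out_path)   # the math text is parsed when the figure is drawn
plt.close(fig)
```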
The future of visualization in Python

It is an extremely vast, active and changing domain.

New modules are emerging: Chaco, MayaVi, Bokeh, stressing interactivity and
dynamic data visualizations in web browsers and in 3D.

What you saw today is extremely basic and is only a tiny part of what Python is
capable of.
