
An Introduction to High Performance Computing

Stuart Rankin
sjr20@cam.ac.uk
High Performance Computing Service (http://www.hpc.cam.ac.uk/)
University Information Services (http://www.uis.cam.ac.uk/)
29th June 2015 / UIS Training for the MRC Cancer Unit
Health and Safety

Welcome

I Please sign in on the attendance sheet.


I Please fill in the online feedback at the end of the course:
http://feedback.training.cam.ac.uk/ucs/form.php
I Keep your belongings with you.
I The printer will not work.
I Please ask questions and let us know if you need assistance.

Plan of the Course

Part 1: Basics
Part 2: High Performance Computing Service
Part 3: Using a HPC Facility
11:00-11:30 Practical and break
12:00-13:30 Practical and break for lunch
14:00-14:30 Practical
14:45-15:15 Practical
15:30-CLOSE Further discussion


Part I: Basics
Basics: Outline

Why Buy a Big Computer?

Inside a Modern Computer

How to Build a Supercomputer

Programming a Multiprocessor Machine

Basics: Why Buy a Big Computer?

What types of big problem might require a Big Computer?


Compute Intensive: A single problem requiring a large amount of
computation.
Memory Intensive: A single problem requiring a large amount of
memory.
Data Intensive: Operation on a large amount of data.
High Throughput: Many unrelated problems to be executed over a
long period.

Basics: Compute Intensive Problems
I Distribute the work across multiple CPUs to reduce the execution
time as far as possible.
I Program workload must be parallelised.
Parallel programs split into copies (threads).
Each thread performs a part of the work on its own
CPU, concurrently with the others.
A well-parallelised program will fully exercise as
many CPUs as it has threads.
I The CPUs may need to exchange data rapidly, using specialized
hardware.
I Large systems running multiple parallel jobs also need fast access
to storage.
I Many use cases from Physics, Chemistry, Engineering, Astronomy,
Biology...
I The traditional domain of HPC and the Supercomputer.

Basics: Scaling & Amdahl's Law

I Using more CPUs is not necessarily faster.

I Typically parallel codes have a scaling limit.
I Partly due to the system overhead of managing more threads, but
also to more basic constraints:
I Amdahl's Law (slightly simplistic model)

S(N) = 1 / ((1 - p) + p/N)

where
p is the fraction of the program which can be parallelized
N is the number of processors
S(N) is the factor by which the program has sped up
relative to N = 1.
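A quick worked example of what the formula implies: with p = 0.95 and N = 16, S(16) = 1 / ((1 - 0.95) + 0.95/16) ≈ 9.1, and however many processors are added the speedup can never exceed 1/(1 - p) = 20.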
Basics: Amdahl's Law

http://en.wikipedia.org/wiki/File:AmdahlsLaw.svg

The Bottom Line

I Parallelisation requires effort:


I First optimise performance on one CPU.
I Then make p as large as possible.
I Eventually using more CPUs is detrimental.

Basics: Data Intensive Problems

I Distribute the data across multiple CPUs to process it in a reasonable time.
I Note that the same work may be done on each data segment.
I Rapid movement of data in and out of (disk) storage becomes
important.
NB Memory and storage are usually different things.
I Big Data and how to efficiently process it currently occupies much
thought.
I Life Sciences (genomics).

Basics: High Throughput

I Distribute work across multiple CPUs to reduce the overall execution time as far as possible.
I Workload is trivially (or embarrassingly) parallel.
Workload breaks up naturally into independent pieces.
Each piece is performed by a separate process on a separate CPU
(concurrently).
I Emphasis is on throughput over a period, rather than on
performance on a single problem.
I Obviously a supercomputer can do this too.

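A minimal sketch of the embarrassingly parallel pattern (illustrative only: task, input_1 ... input_16 and output_1 ... output_16 are made-up names, and on a shared cluster this would normally be launched through the job scheduler rather than by hand):

for i in $(seq 1 16); do
    ./task input_$i > output_$i &   # one independent process per CPU core
done
wait                                # wait for all 16 processes to finish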
Basics: Memory Intensive Problems

I Aggregate sufficient memory to enable solution at all.


I Technically more challenging if the program cannot be parallelised
efficiently.
I Historically, the arena of large SGI systems.

Basics: Inside a Modern Computer

I Even small computers now have multiple CPU cores per socket
⇒ each socket contains a Symmetric Multi-Processor (SMP).
I Larger computers have multiple sockets (each with local memory)
⇒ Non-Uniform Memory Architecture (NUMA).
I CPU cores also have vector (data-parallel) acceleration (SSE/AVX).
I Today's ordinary computer is yesterday's supercomputer (with
much of the same complication).
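You can see this layout for yourself on any Linux machine with standard tools (nothing HPC-specific; numactl may need to be installed):

lscpu                # sockets, cores per socket, NUMA nodes, and the SSE/AVX flags
numactl --hardware   # NUMA nodes and the memory local to each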
Basics: How to Build a Supercomputer

I A supercomputer aggregates contemporary CPUs to obtain increased computing power.
I Usually today these are clusters.

Basics: How to Build a Supercomputer

1. Take some (multicore) CPUs and add some memory.


I Could be an off-the-shelf server, or something more special.
I A NUMA multiprocessor building block: a node.
I All CPU cores (unequally) share the node memory
⇒ the node is a shared memory multiprocessor.
Basics: How to Build a Supercomputer

2. Connect the nodes with a network or networks:
Gbit Ethernet: 100 MB/sec
FDR Infiniband: 5 GB/sec

Faster network is for inter-CPU communication across nodes.
Slower network is for
management and provisioning.
Storage may use either.

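(Rough arithmetic behind those figures: 1 Gbit/s divided by 8 is 125 MB/s, of which roughly 100 MB/s is achievable in practice; 4X FDR Infiniband runs 4 lanes at 14 Gbit/s each, i.e. 56 Gbit/s, which after encoding and protocol overheads leaves on the order of 5 GB/s of usable bandwidth.)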
Basics: How to Build a Supercomputer

3. Logically bind the nodes


I Clusters consist of distinct nodes (i.e. separate Linux computers)
on common private network(s) and managed centrally.
Clusters are distributed memory machines.
Each task sees only its local node (without help).
Each task must fit within a single node's memory.
I More expensive machines logically bind nodes into a single Linux
system.
E.g. SGI UV.
These are shared memory machines.
Logically one big node (but very non-uniform).

Basics: Programming a Multiprocessor Machine

I Non-parallel (serial) code


For a single node as for a workstation.
Typically run as many copies per node as cores, assuming node
memory is sufficient.
Replicate across multiple nodes.

Basics: Programming a Multiprocessor Machine

I Parallel code
Shared memory methods within a node.
E.g. pthreads, OpenMP.
Distributed memory methods between nodes.
Message Passing Interface (MPI).

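A rough illustration of how the two models are typically launched (a sketch only: omp_prog and mpi_prog are hypothetical executables, and the exact commands depend on the MPI library and scheduler in use):

export OMP_NUM_THREADS=16    # shared memory: 16 threads within one node
./omp_prog

mpirun -np 64 ./mpi_prog     # distributed memory: 64 MPI ranks spread across nodes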
Basics: Summary

I Why have a supercomputer?


I Big problems, long problems, many problems, Big Data.
I Most current supercomputers are clusters of separate nodes.
I Each node has multiple cores and non-uniform shared memory.
I Parallel code uses shared memory (pthreads/OpenMP) within a
node, distributed memory (MPI) between nodes.
I Non-parallel code uses the memory of one node, but may be
copied across many.


Part II: The High Performance Computing Service


HPCS: Outline

A Brief History

Darwin - an Infiniband CPU Cluster

Wilkes - a Dual-Rail Infiniband GPU Cluster

The CU Cluster

Other Activities

Recent Developments

HPCS: A Brief History

Created: 1996 (as the HPCF).


Mission: Delivery and support of a large HPC resource for use by
the University of Cambridge research community.
Self-funding: Paying and non-paying service levels.
User base: Includes DiRAC (STFC) and industrial users.
Plus: Hosted clusters and research projects.

Absorbed into the UIS in 2014 (part of Research & Institutional Services).

HPCS: A Brief History

1997 76.8 Gflop/s
2002 1.4 Tflop/s
2006 18.27 Tflop/s
2010 30 Tflop/s
2012 183.38 Tflop/s
2013 183.38 CPU + 239.90 GPU Tflop/s

http://www.top500.org

Darwin1 (2006–2012)

Darwin3 (2012)(b) & Wilkes (2013)(f)

Darwin: an Infiniband CPU Cluster

I Each compute node:


2x8 cores, Intel Sandy Bridge 2.6 GHz.
64 GB RAM (63900 MB usable).
56 Gb/sec (4X FDR) Infiniband (for MPI and storage).
I 600 compute nodes (300 belong to Cambridge).
I 8 login nodes (login.hpc.cam.ac.uk).

Wilkes: a Dual-Rail Infiniband GPU Cluster

I Each compute node:


2 NVIDIA Tesla K20c GPUs.
2x6 cores, Intel Ivy Bridge 2.6 GHz.
64 GB RAM (63900 MB usable).
2 × 56 Gb/sec (4X FDR) Infiniband (for MPI and storage).
I 128 compute nodes.
I 8 login nodes (login.hpc.cam.ac.uk).
I Environment shared with Darwin (same filesystems, user
environment, scheduler).

HPCS Production Cluster Schematic

The CU Cluster

I Each compute node:


2x8 cores, Intel Sandy Bridge 2.6 GHz.
64 GB RAM (63900 MB usable).
56 Gb/sec (4X FDR) Infiniband (for MPI).
1 Gb/sec Ethernet (for storage).
I 15 compute nodes.
I 1 login node (192.168.43.182).
I 14 TB NFS storage (2 TB /home + 12 TB /scratch)

HPCS: Other Activities

I Hosted clusters
e.g. MRC BSU, Cardiovascular Epidemiology, Whittle Lab.
I Research projects
e.g. Square Kilometre Array, Jaguar Land Rover.
I Integration and consultancy services
I Industrial services

HPCS: Recent Developments

I Closer integration with UIS services (e.g. UIS Password).

I Relocated during February 2015 to the West Cambridge Data Centre.
I New UIS organisational structure in May:
HPCS is a subdivision of Research & Institutional Services.
I New services trialled soon as part of the CBC Pilot Project:
I Virtual Server Service

The West Cambridge Data Centre

The West Cambridge Data Centre: Hall 1


Part III: Using HPC


Using HPC: Outline

Security

Connecting

User Environment

Software

Job Submission

Using HPC: Security

I Cambridge IT is under constant attack by would-be intruders.


I Big systems are big, juicy targets.
I Your data and research career are threatened by intruders.
I Don't let intruders in.

Using HPC: Security

1. Keep your password (or private key passphrase) safe.


2. Keep the software on your laptops and PCs up to date.
3. Don't share accounts.
4. Never connect from untrusted machines (e.g. internet cafes).
5. Always use SSH (never rlogin or telnet).
6. Never, ever do "xhost +".

Using HPC: Connecting

I SSH secure protocol only.


Supports login, file transfer, remote desktop...
I The CU cluster currently follows the same pattern as the HPCS
(but you have an extra firewall).
I HPCS allows access from registered IP addresses only.
Almost all Cambridge University addresses already registered.
Connection from home possible via the VPN service
http://www.ucs.cam.ac.uk/vpn
or SSH tunnel through a departmental gateway.

Connecting: Windows Clients

I putty, pscp, psftp


http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
I WinSCP
http://winscp.net/eng/download.php
I TurboVNC (remote desktop, 3D optional)
http://sourceforge.net/projects/turbovnc/files/
I Cygwin (provides an application environment similar to Linux)
http://cygwin.com/install.html
Includes X server for displaying graphical applications running remotely.
I MobaXterm
http://mobaxterm.mobatek.net/

Connecting: Linux/MacOSX/UNIX Clients

I ssh, scp, sftp, rsync


Installed (or installable).
I TurboVNC (remote desktop, 3D optional)
http://sourceforge.net/projects/turbovnc/files/
I On MacOSX, install XQuartz to display remote graphical
applications.
http://xquartz.macosforge.org/landing/

Connecting: Login

I For the CU cluster, replace login.hpc.cam.ac.uk with 192.168.43.182.
I From Linux/MacOSX/UNIX (or Cygwin):
ssh -Y abc123@login.hpc.cam.ac.uk
I From graphical clients:
Host: login.hpc.cam.ac.uk
Username: abc123 (your local account name)
I Don't connect to the head node (darwin.hpc in our case).
I Non-registered addresses will fail with "Connection refused".

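If you connect frequently, an entry in ~/.ssh/config on your own machine saves typing. A sketch (the alias hpcs is made up; abc123 stands for your username):

Host hpcs
    HostName login.hpc.cam.ac.uk
    User abc123
    ForwardX11 yes
    ForwardX11Trusted yes   # together these two give the effect of ssh -Y

After this, "ssh hpcs" is enough.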
Connecting: First time login

I The first connection to a particular hostname produces the following:
The authenticity of host 'login-sand2.hpc.cam.ac.uk (131.111.1.214)'
can't be established.
RSA key fingerprint is
0b:ef:59:90:fb:13:4a:c9:56:82:7b:cd:4b:2b:e1:3b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'login-sand2.hpc.cam.ac.uk' (RSA) to the
list of known hosts.

I One should always check the fingerprint before typing yes.


I Graphical SSH clients should ask a similar question.
I Designed to detect fraudulent servers.
I Exercise 1 - Log into your HPCS training account.

MobaXterm SSH (Windows)

Connecting: File Transfer

I From Linux/MacOSX/UNIX (or Cygwin):


rsync -av old_directory/ abc123@login.hpc.cam.ac.uk:scratch/new_directory
copies contents of old_directory to /scratch/new_directory.
rsync -av old_directory abc123@login.hpc.cam.ac.uk:scratch/new_directory
copies old_directory (and contents) to
/scratch/new_directory/old_directory.
Rerun to update or resume after interruption.
All transfers are checksummed.
For transfers in the opposite direction, place the remote machine as
the first argument.
I With graphical clients, connect as before and drag and drop.
I Exercise 2 - File transfer.

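For example, to pull results back in the opposite direction (directory names are again just placeholders):

rsync -av abc123@login.hpc.cam.ac.uk:scratch/new_directory/ results/
# copies the contents of the remote new_directory into the local directory results/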
Connecting: Remote Desktop

I First time starting a remote desktop:

[sjr20@login-sand2 ~]$ vncserver

You will require a password to access your desktops.

Password:
Verify:
Would you like to enter a view-only password (y/n)? n

New X desktop is login-sand2:8

Starting applications specified in /home/sjr20/.vnc/xstartup.turbovnc


Log file is /home/sjr20/.vnc/login-sand2:8.log

I For 3D graphics sessions, use login-gfx1.

Connecting: Remote Desktop

I Remote desktop already running:

[sjr20@login-sand2 ~]$ vncserver -list

TurboVNC server sessions:

X DISPLAY # PROCESS ID
:8 12745

I Kill it:

[sjr20@login-sand2 ~]$ vncserver -kill :8


Killing Xvnc process ID 12745

I Typically you only need one remote desktop.


I Keeps running until killed, or the node reboots.

Connecting: Remote Desktop

I To connect to the desktop from Linux:

[sjr20@themis ~]$ vncviewer -via sjr20@login-sand2.hpc.cam.ac.uk localhost:8


Connected to RFB server, using protocol version 3.8
Enabling TightVNC protocol extensions
Performing standard VNC authentication
Password:

I Press F8 to bring up the control panel.


I Exercise 3 - Remote desktop.

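If your VNC viewer has no -via option (e.g. on Windows), the same effect can be obtained with an SSH tunnel. A sketch, assuming the desktop is display :8 (VNC display :N listens on TCP port 5900+N):

ssh -L 5908:localhost:5908 abc123@login-sand2.hpc.cam.ac.uk
# then point the viewer at localhost:8 (i.e. port 5908) on your own machine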
HPCS TurboVNC Session

Linux TurboVNC Control Panel

Connecting: Remote Desktop (MobaXterm)

3D Remote Visualization

I Choose login-gfx1.
I Launch any application requiring 3D (OpenGL) with vglrun.
I May need to adjust the compression level for your network
connection.

Using HPC: User Environment

I The CU cluster is based on the HPCS 22nd April 2015 image.


I Scientific Linux 6.6 (Red Hat Enterprise Linux 6.6 rebuild)
I bash
I GNOME2 desktop (if you want)
I Lustre 2.4.1 (patched), Mellanox OFED 2.4, CUDA 6.5
I But you don't need to know that. (Probably...)

User Environment: Filesystems

I /home/abc123
I 40GB soft quota (45GB hard).
I Visible equally from all nodes.
I Single storage server.
I Backed up nightly to tape.
I Not intended for job outputs or large/many input files.
I /scratch/abc123
I Visible equally from all nodes.
I Larger and faster.
I Intended for job inputs and outputs.
I Not backed up.

Filesystems: Quotas

I quota
====================================================================================
Usage on /home (lfs quota -u abc123 /home):
====================================================================================
Disk quotas for user abc123 (uid 456):
Filesystem kbytes quota limit grace files quota limit grace
/home 24513908 41943040 47185920 - 75364 0 0 -
====================================================================================
Usage on /scratch (lfs quota -u abc123 /scratch):
====================================================================================
Disk quotas for user abc123 (uid 456):
Filesystem kbytes quota limit grace files quota limit grace
/lustre1 5467644384 0 0 - 3864823 0 0 -
...

I Aim to stay below the soft limit (quota). When usage exceeds it, the
kbytes figure is flagged with an asterisk (e.g. *43567687).
I Once over the soft limit, you have 7 days grace to return below.
I When the grace period expires, or you reach the hard limit (limit),
no more data can be written.
I It is important to rectify an out-of-quota condition ASAP.

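To track down what is taking up the space, standard commands are enough (nothing HPCS-specific):

du -sh ~/* | sort -h              # size of each item in your home directory, largest last
du -sh /scratch/abc123/* | sort -h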
Filesystems: Backups

I Tape backups normally commence at 22:00 every night.


I Backups are not an undelete facility - take care when deleting.
I Successful restoration depends on:
I The file having existed long enough to have been backed up at all.
I The last good version existing in a current backup.
I Request restoration as soon as possible, giving the location and exact
time of loss.
I Scratch files are not backed up.

56 of 82
Filesystems: Permissions

I Be careful; if unsure, please ask support@hpc.
I Incorrect permissions can lead to accidental destruction of your data or
account compromise.
I Avoid changing the permissions on your home directory.
I Files under /home are particularly security sensitive.
I It is easy to break passwordless communication between nodes
(a quick check is sketched below).
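A quick way to check (rather than change) the permissions on the security-sensitive locations mentioned above - a sketch only, exact requirements may vary:

ls -ld $HOME $HOME/.ssh              # neither should be group- or world-writable
ls -l $HOME/.ssh/authorized_keys     # should be accessible only by you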

57 of 82
Using HPC: Software

I Free software accompanying Red Hat Enterprise Linux 6 is (or can be)
provided.
I Other software (free and non-free) is available via modules.
I Some proprietary software may not be generally accessible.
I See http://www.hpc.cam.ac.uk/using-clusters/software.
I New software may be provided on request where possible.
I Self-installed software must be properly licensed.

58 of 82
User Environment: Environment Modules

I Modules load or unload additional software packages.
I Some are required and automatically loaded on login.
I Others are optional extras, or possible replacements for other
modules.
I Beware unloading default modules in ~/.bashrc.
I Beware overwriting environment variables such as PATH and
LD_LIBRARY_PATH in ~/.bashrc. If necessary, append or prepend
(see the sketch below).
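A minimal sketch of appending rather than overwriting in ~/.bashrc (the directories are only examples):

# Safe: extends the values already set up by the modules system
export PATH="$PATH:$HOME/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$HOME/lib"
# Unsafe: would discard whatever the modules have already set
# export PATH="$HOME/bin"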

59 of 82
User Environment: Environment Modules

I Currently loaded:

module list
Currently Loaded Modulefiles:
1) dot 6) intel/impi/4.1.3.045 11) default-impi
2) scheduler 7) global
3) java/jdk1.7.0_60 8) intel/cce/12.1.10.319
4) turbovnc/1.1 9) intel/fce/12.1.10.319
5) vgl/2.3.1/64 10) intel/mkl/10.3.10.319

I Available:

module av

60 of 82
User Environment: Environment Modules

I Currently loaded:

module list
Currently Loaded Modulefiles:
1) dot 4) turbovnc/1.1 7) global
2) scheduler 5) vgl/2.3.1/64 8) use.own
3) java/jdk1.7.0_60 6) openmpi/gcc/1.8.6 9) default-ompi

I Available:

module av

60 of 82
User Environment: Environment Modules

I Show:
module show castep/impi/7.0.3
-------------------------------------------------------------------
/usr/local/Cluster-Config/modulefiles/castep/impi/7.0.3:

module-whatis adds CASTEP 7.0.3 (Intel MPI) to your environment

Note that this software is restricted to registered users.

prepend-path PATH /usr/local/Cluster-Apps/castep/impi/7.0.3/bin:/usr/local/...


-------------------------------------------------------------------

I Load:

module load castep/impi/7.0.3

I Unload:

module unload castep/impi/7.0.3

61 of 82
User Environment: Environment Modules

I Purge:

module purge

I Defaults:

module show default-impi


module unload default-impi
module load default-impi-LATEST

I The run-time environment must match the compile-time environment.

62 of 82
User Environment: Compilers

Intel: icc, icpc, ifort (recommended)

icc -O3 -xHOST -ip code.c -o prog


mpicc -O3 -xHOST -ip mpi_code.c -o mpi_prog

GCC: gcc, g++, gfortran

gcc -O3 -mtune=native code.c -o prog


mpicc -cc=gcc -O3 -mtune=native mpi_code.c -o mpi_prog

PGI: pgcc, pgCC, pgf90

pgcc -O3 -tp=sandybridge code.c -o prog


mpicc -cc=pgcc -O3 -tp=sandybridge mpi_code.c -o mpi_prog

Exercise 4: Modules and Compilers

63 of 82
Using HPC: Job Submission

64 of 82
Using HPC: Job Submission

I Compute resources are managed by a scheduler:
SLURM/PBS/SGE/LSF/. . .
I Jobs are submitted to the scheduler,
analogous to submitting jobs to a print queue.

65 of 82
Using HPC: Job Submission

I Jobs are submitted from the login nodes,
which are not themselves managed by the scheduler.
I Jobs may be either non-interactive (batch) or interactive.
I Batch jobs run a shell script on the first of a list of allocated nodes.
I Interactive jobs provide a command line on the first of a list of
allocated nodes.

66 of 82
Using HPC: Job Submission

I The HPCS moved away from Torque (a form of PBS) to SLURM
in February 2014.
I The CU cluster scheduler currently imitates the HPCS clusters;
the single partition is blade (replacing sandybridge/tesla).
I The HPCS dedicates entire nodes to each job;
the job owner receives exclusive access.
I Template submission scripts are available.

67 of 82
Job Submission: Using SLURM or PBS

I SLURM

[abc123@login]$ sbatch slurm_submission_script


Submitted batch job 790299

I PBS (Torque, OpenPBS, PBS Pro)

[abc123@login]$ qsub pbs_submission_script


790299.master.cluster

68 of 82
Job Submission: Show Queue

I SLURM
[abc123@login]$ squeue -u abc123
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
790299 sandybrid Test3 abc123 PD 0:00 8 (Priority)
790290 sandybrid Test2 abc123 R 27:56:10 8 sand-6-[38-40],sand-7-[27-31]

I PBS (Torque, OpenPBS, PBS Pro)


[abc123@login]$ qstat -u abc123
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- ----------- -------- ---------------- ------ ----- ------ ------ ----- - -----
790290.master.cl abc123 tesla Test2 5519 8 32 248000 36:00 R 27:56
790281.master.cl abc123 tesla Test1 31905 4 16 124000 36:00 C 26:17
790299.master.cl abc123 tesla Test3 -- 8 32 248000 36:00 Q --

69 of 82
Job Submission: Show Queue

I SLURM
[abc123@login]$ squeue -u abc123
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
790299 sandybrid Test3 abc123 PD 0:00 8 (Resources)
790290 sandybrid Test2 abc123 R 27:56:10 8 sand-6-[38-40],sand-7-[27-31]

I PBS (Torque, OpenPBS, PBS Pro)


[abc123@login]$ qstat -u abc123
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- ----------- -------- ---------------- ------ ----- ------ ------ ----- - -----
790290.master.cl abc123 tesla Test2 5519 8 32 248000 36:00 R 27:56
790281.master.cl abc123 tesla Test1 31905 4 16 124000 36:00 C 26:17
790299.master.cl abc123 tesla Test3 -- 8 32 248000 36:00 Q --

69 of 82
Job Submission: Monitor Job

I SLURM

[abc123@login]$ scontrol show job=790299

I PBS (Torque, OpenPBS, PBS Pro)

[abc123@login]$ qstat -f 790299

70 of 82
Job Submission: Cancel Job

I SLURM

[abc123@login]$ scancel 790299

I PBS (Torque, OpenPBS, PBS Pro)

[abc123@login]$ qdel 790299

71 of 82
Job Submission: Scripts

I SLURM
See slurm_submit.darwin, slurm_submit.wilkes.
#!/bin/bash
#! Name of the job:
#SBATCH -J darwinjob
#! Which project should be charged:
#SBATCH -A CHANGEME
#! How many whole nodes should be allocated?
#SBATCH --nodes=2
#! How many (MPI) tasks will there be in total? (<= nodes*16)
#SBATCH --ntasks=32
#! How much wallclock time will be required?
#SBATCH --time=02:00:00
#! Select partition:
#SBATCH -p sandybridge
...

I #SBATCH lines are structured comments
that correspond to sbatch command line options.

72 of 82
Job Submission: Scripts

I PBS (Torque, OpenPBS, PBS Pro)


#!/bin/bash
#! Name of the job:
#PBS -N darwinjob
#! Which project should be charged:
#PBS -A CHANGEME
#! How many nodes, cores per node, memory and wall-clock time should be allocated?
#PBS -l nodes=8:ppn=16,mem=512000mb,walltime=02:00:00
#! Select queue:
#PBS -q sandybridge
...

I #PBS lines are structured comments
that correspond to qsub command line options.

73 of 82
Job Submission: Accounting Commands [HPCS]
I How many core hours do I have available?
mybalance

User Usage | Account Usage | Account Limit Available (CPU hrs)
---------- --------- + -------------- --------- + ------------- ---------
abc123 18 | STARS 171 | 100,000 99,829
abc123 18 | STARS-SL2 35 | 101,000 100,965
abc123 925 | BLACKH 10,634 | 166,667 156,033

I How many core hours does some other project or user have?
gbalance -p HALOS

User Usage | Account Usage | Account Limit Available (CPU hrs)
---------- --------- + --------- --------- + ------------- ---------
pq345 0 | HALOS 317,656 | 600,000 282,344
xyz10 11,880 | HALOS 317,656 | 600,000 282,344

(Use -u for user.)

I List all jobs charged to a project/user between certain times:


gstatement -p HALOS -u xyz10 -s "2014-01-01-00:00:00" -e "2014-01-20-00:00:00"
JobID User Account JobName Partition End NCPUS CPUTimeRAW ExitCode State
------------ --------- ---------- -------- ---------- ------------------- ---------- ---------- -------- ----------
14505 xyz10 halos help sandybrid+ 2014-01-07T12:59:40 16 32 0:9 COMPLETED
14506 xyz10 halos help sandybrid+ 2014-01-07T13:00:11 16 48 2:0 FAILED
...

74 of 82
Job Submission: Single Node Jobs

I Serial jobs requiring large memory, or OpenMP codes.

#!/bin/bash
...
#SBATCH --nodes=1
...
export OMP_NUM_THREADS= # For OpenMP across cores.
options=<specific option for multithreading>
$application $options
...

75 of 82
Job Submission: Single Node Jobs

I Serial jobs requiring large memory, or OpenMP codes.

#!/bin/bash
...
#SBATCH --nodes=1
...
export OMP_NUM_THREADS=16 # For OpenMP across 16 cores.
options=<specific option for multithreading>
$application $options
...

75 of 82
Job Submission: Single Node Jobs

I Serial jobs requiring large memory, or OpenMP codes.

#!/bin/bash
...
#SBATCH --nodes=1
...
export OMP_NUM_THREADS=8 # For OpenMP across 8 cores.
options=<specific option for multithreading>
$application $options
...

75 of 82
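Putting the fragments above together, a complete minimal single-node OpenMP submission script might look like the sketch below (the job name, application, project and wall time are placeholders; the partition follows the earlier template):

#!/bin/bash
#SBATCH -J omp_example          # placeholder job name
#SBATCH -A CHANGEME             # project to be charged
#SBATCH -p sandybridge
#SBATCH --nodes=1
#SBATCH --time=01:00:00
export OMP_NUM_THREADS=16       # one thread per core on a 16-core node
./my_openmp_prog                # placeholder application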
Job Submission: MPI Jobs

I Parallel job across multiple nodes.

#!/bin/bash
...
#SBATCH --nodes=4
#SBATCH --ntasks=64 # i.e. 16x4 MPI tasks in total.
...
mpirun -np 64 $application $options
...
I SLURM-aware MPI launches remote tasks via SLURM.
I The template script uses $SLURM_TASKS_PER_NODE to set PPN.

76 of 82
Job Submission: MPI Jobs

I Parallel job across multiple nodes.

#!/bin/bash
...
#SBATCH --nodes=4
#SBATCH --ntasks=32 # i.e. 8x4 MPI tasks in total.
...
mpirun -ppn 8 -np 32 $application $options
...
I SLURM-aware MPI launches remote tasks via SLURM.
I The template script uses $SLURM_TASKS_PER_NODE to set PPN.

76 of 82
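A sketch of how the per-node task count might be derived from $SLURM_TASKS_PER_NODE (its value can take forms such as "8(x4)" when tasks are spread evenly; this is an assumption based on standard SLURM behaviour and the actual template may differ):

ppn=${SLURM_TASKS_PER_NODE%%(*}              # e.g. "8(x4)" becomes "8"
mpirun -ppn $ppn -np $SLURM_NTASKS $application $options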
Job Submission: Hybrid Jobs

I Parallel jobs using both MPI and OpenMP.

#!/bin/bash
...
#SBATCH --nodes=4
#SBATCH --ntasks=32 # i.e. 8x4 MPI tasks in total.
...
export OMP_NUM_THREADS=2 # i.e. 2 threads per MPI task.
mpirun -ppn 8 -np 32 $application $options
...
I This job uses 64 cores (each MPI task splits into 2 OpenMP threads).

77 of 82
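As a quick check of the resource arithmetic above:

4 nodes x 8 MPI tasks per node = 32 MPI tasks
32 MPI tasks x 2 OpenMP threads each = 64 cores in total (16 per node)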
Job Submission: High Throughput Jobs

I Multiple serial jobs across multiple nodes.


I Use srun to launch tasks (job steps) within a job.

#!/bin/bash
...
#SBATCH --nodes=4
...
cd directory_for_job1
srun --exclusive -N 1 -n 1 $application $options_for_job1 > output 2> error &
cd directory_for_job2
srun --exclusive -N 1 -n 1 $application $options_for_job2 > output 2> error &
...
cd directory_for_job64
srun --exclusive -N 1 -n 1 $application $options_for_job64 > output 2> error
wait
I Exercise 5 & 6 - Submitting Jobs.

78 of 82
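Rather than writing out all 64 srun lines by hand, the same pattern can be generated with a shell loop - a sketch only; the per-job directory layout is hypothetical and the per-job options would normally differ:

#!/bin/bash
#SBATCH --nodes=4
for i in $(seq 1 64); do
  cd $HOME/jobs/job$i                      # hypothetical directory for job $i
  srun --exclusive -N 1 -n 1 $application $options > output 2> error &
done
wait                                        # wait for all background job steps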
Job Submission: Interactive [HPCS]

I Compute nodes are accessible via SSH while you have a job
running on them.
I Alternatively, submit an interactive job:
sintr -A MYPROJECT -p sandybridge -N2 -t 2:0:0

I Within the window (screen session):


Launches a shell on the first node (when the job starts).
Graphical applications should display correctly.
Create new shells with ctrl-a c, navigate with ctrl-a n and ctrl-a p.
ssh or srun can be used to start processes on any nodes in the job.
SLURM-aware MPI will do this automatically.

79 of 82
Job Submission: Array Jobs
I This feature varies between SLURM versions.
I http://slurm.schedmd.com/job_array.html
I Used for submitting and managing large sets of similar jobs.
I Each job in the array has the same initial options.
I SLURM
[abc123@login]$ sbatch --array=1-7:2 -A STARS-SL2 submission_script
Submitted batch job 791609
[abc123@login-sand2]$ squeue -u abc123
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
791609_1 sandybrid hpl abc123 R 0:06 1 sand-6-32
791609_3 sandybrid hpl abc123 R 0:06 1 sand-6-37
791609_5 sandybrid hpl abc123 R 0:06 1 sand-6-59
791609_7 sandybrid hpl abc123 R 0:06 1 sand-7-27

I The array elements are 791609_1, 791609_3, 791609_5, 791609_7,
i.e. ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}.
I SLURM_ARRAY_JOB_ID = SLURM_JOBID for the first element.

80 of 82
Job Submission: Array Jobs
I This feature varies between SLURM versions.
I http://slurm.schedmd.com/job_array.html
I Used for submitting and managing large sets of similar jobs.
I Each job in the array has the same initial options.
I SLURM
[abc123@login]$ sbatch --array=1,3,5,7 -A STARS-SL2 submission_script
Submitted batch job 791609
[abc123@login-sand2]$ squeue -u abc123
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
791609_1 sandybrid hpl abc123 R 0:06 1 sand-6-32
791609_3 sandybrid hpl abc123 R 0:06 1 sand-6-37
791609_5 sandybrid hpl abc123 R 0:06 1 sand-6-59
791609_7 sandybrid hpl abc123 R 0:06 1 sand-7-27

I The array elements are 791609_1, 791609_3, 791609_5, 791609_7,
i.e. ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}.
I SLURM_ARRAY_JOB_ID = SLURM_JOBID for the first element.

80 of 82
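Inside the submission script, each array element can select its own input via $SLURM_ARRAY_TASK_ID - a sketch; the program and file names are illustrative:

#!/bin/bash
#SBATCH --nodes=1
# Each array element processes a different numbered input file.
./my_prog input_${SLURM_ARRAY_TASK_ID}.dat > output_${SLURM_ARRAY_TASK_ID}.log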
Job Submission: Array Jobs (ctd)

I Updates can be applied to specific array elements using
${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}.
I Alternatively, operate on the entire array via
${SLURM_ARRAY_JOB_ID}.
I Some commands still require the SLURM_JOB_ID (sacct, sreport,
sshare, sstat and a few others).
I Exercise 7 - Array Jobs.

81 of 82
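For example, individual elements or the whole array can be cancelled (job IDs as in the earlier example):

scancel 791609_3    # cancel a single array element
scancel 791609      # cancel every element of the array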
Job Submission: Scheduling Top Dos & Don'ts

I Do . . .
I Give reasonably accurate wall times (allows backfilling).
I Check your balance occasionally (mybalance).
I Test on a small scale first.
I Implement checkpointing if possible (reduces resource wastage).

I Don't . . .
I Request more cores than you need;
you will wait longer and use more credits.
I Cancel jobs unnecessarily;
priority increases over time.

82 of 82
