Research Computing
Library & Technology Services
https://researchcomputing.lehigh.edu
About Us?
➤ Who?
❏ Unit of Lehigh's Library & Technology Services within the Center for Innovation in
Teaching & Learning
➤ Our Mission
❏ We enable Lehigh Faculty, Researchers, and Scholars to achieve their goals by providing
various computational resources (hardware, software, and storage), consulting, and
training.
➤ Research Computing Staff
2/91
Background and Definitions
➤ Computational Science and Engineering
➤ HPC
3/91
Why use HPC?
➤ HPC may be the only way to achieve computational goals in a given amount of time
❏ Size: Many problems that are interesting to scientists and engineers cannot fit on a PC,
usually because they need more than a few GB of RAM or more than a few hundred GB
of disk.
❏ Speed: Many problems that are interesting to scientists and engineers would take a very
long time to run on a PC: months or even years; but a problem that would take a month
on a PC might take only a few hours on a supercomputer
4/91
Parallel Computing
➤ Many calculations are carried out simultaneously
➤ Based on the principle that large problems can often be divided into smaller ones, which are
then solved in parallel
➤ Parallel computers can be roughly classified according to the level at which the hardware
supports parallelism.
❏ Multicore computing
❏ Symmetric multiprocessing
❏ Distributed computing
❏ Grid computing
5/91
What does HPC do?
➤ Simulation of Physical Phenomena
➤ Design
❏ Fraud detection
➤ Visualization
6/91
HPC by Disciplines
➤ Traditional Disciplines
❏ Finance
❆ Predictive Analytics
❆ Trading
❏ Humanities
7/91
What is Linux?
➤ Linux is an operating system that evolved from a kernel created by Linus Torvalds when he
was a student at the University of Helsinki.
➤ It’s meant to be used as an alternative to other operating systems such as Windows, Mac OS,
MS-DOS, Solaris, and others.
➤ If you are using a Supercomputer/High Performance Computer for your research, it will be
based on a *nix OS.
➤ It is required/necessary/mandatory to learn Linux (commands, shell scripting) if your
research involves the use of High Performance Computing or Supercomputing resources.
8/91
Difference between Shell and Command
➤ What is a Shell?
❏ The command line interface is the primary interface to Linux/Unix operating systems.
❏ Each shell has varying capabilities and features and the user should choose the shell that best suits
their needs.
❏ The shell is simply an application running on top of the kernel and provides a powerful interface to
the system.
➤ What is a command and how do you use it?
❏ A command prompt (or just prompt) is a sequence of one or more characters used in a
command-line interface to indicate readiness to accept commands.
❏ A prompt usually ends with one of the characters $, %, #, :, > and often includes other information,
such as the path of the current working directory.
9/91
Types of Shell
➤ sh : Bourne Shell
➤ csh : C Shell
➤ bash : Bourne Again Shell
❏ backward-compatible with the Bourne shell and includes many features of the C shell
❏ Developed by Brian Fox for the GNU Project as a free software replacement for the Bourne shell
(sh).
❏ Default shell on Linux and Mac OSX
➤ tcsh : TENEX C Shell
❏ It is essentially the C shell with programmable command-line completion, command-line editing,
and a few other features.
10/91
Directory Structure
➤ All files are arranged in a hierarchical structure, like an inverted tree.
11/91
Relative & Absolute Path
➤ Path means a position in the directory tree.
❏ Relative path: not defined uniquely; it depends on the current working directory.
❏ Absolute path: defined uniquely; it does not depend on the current working directory.
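➤ A minimal sketch contrasting the two (using the /home/alp514/sum2017 directory that appears later in these slides):
cd /home/alp514/sum2017   # absolute path: starts at the root (/) and works from anywhere
cd ../sum2017             # relative path: interpreted from the current working directory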
12/91
Variables
➤ Linux permits the use of variables, similar to any programming language such as C, C++,
Fortran, etc.
➤ A variable is a named object that contains data used by one or more applications.
➤ There are two types of variables, Environment and User Defined; a variable can contain a
number, a character, or a string of characters.
➤ Environment Variables provide a simple way to share configuration settings between
multiple applications and processes in Linux.
➤ By convention, environment variables are often named using all uppercase letters.
➤ To reference a variable (environment or user defined), prepend $ to the name of the variable
➤ It is good practice to protect your variable name within {...}, such as ${PATH}, when
referencing it. (We’ll see an example in a few slides)
❏ $PATH, ${LD_LIBRARY_PATH}
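➤ For example (the user-defined variable name below is only an illustration; the output will differ on your system):
echo ${HOME}                    # environment variable: your home directory
## /home/alp514
myvar=somevalue                 # user-defined variable
echo "myvar is set to ${myvar}"
## myvar is set to somevalue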
13/91
Variables (contd)
➤ Rules for Variable Names
❏ Case sensitive
➤ Examples
❏ Shell: myvar=somevalue
➤ You can convert any Shell variable to Environment variable using the export command
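➤ For instance (the variable name is illustrative):
myvar=somevalue        # shell (user-defined) variable, local to the current shell
export myvar           # promote it to an environment variable visible to child processes
export MYVAR=somevalue # or set and export in a single step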
14/91
Linux Commands
➤ man shows the manual for a command or program.
❏ man pwd
➤ pwd: print working directory, gives the absolute path of your current location in the directory
hierarchy
➤ cd dirname: change to the directory called dirname
❏ If you omit the directory name (i.e. just enter cd), you will end up in your home directory
15/91
Linux Command (contd)
➤ cp file1 file2: command to copy file1 to file2
❏ You can use an absolute or relative path for the source and destination, e.g. cp ../file .
❏ If you need to copy over a directory and its contents, add the -r flag
❏ cp -r /home/alp514/sum2017 .
➤ ls: list the contents of a directory
❏ If you provide a directory path as an argument, then the contents of that directory will be listed
➤ echo $HOME: prints the contents of the variable HOME, i.e. your home directory, to the screen
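➤ A short example session (the file and directory names below are hypothetical):
cd sum2017           # move into a sub-directory using a relative path
pwd
## /home/alp514/sum2017
cp ../notes.txt .    # copy a file from the parent directory to the current one
cp -r ../scripts .   # copy a directory and its contents
cd                   # no argument: return to your home directory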
16/91
File Editing
➤ The two most commonly used editors on Linux/Unix systems are:
❏ vi/vim
❏ emacs
➤ vi/vim is installed by default on Linux/Unix systems and has only a command line interface
(CLI).
➤ emacs has both a CLI and a graphical user interface (GUI).
17/91
vi/emacs commands
[Table: vi commands for inserting/appending text, and vi/emacs cursor-movement commands]
18/91
vi/emacs commands
FILE MANIPULATION      VI        EMACS
save file              :w        C‑x C‑s
undo edit              u         C‑_ or C‑x u
replace a character    r
edit/open file         :e file   C‑x C‑f file
end of line            $         C‑e
19/91
File Permission
➤ In *NIX OSs, you have three types of file permissions:
❏ read (r)
❏ write (w)
❏ execute (x)
➤ for three types of users:
❏ user
❏ group
❏ world, i.e. everyone else who has access to the system
## -rw-r--r-- 1 apacheco staff 53423 Feb 13 2018 Rplot.jpeg
## drwxr-xr-x 8 apacheco staff   256 Feb 26 2017 USmap
## drwxr-xr-x 7 apacheco staff   224 Feb 26 2017 assets
➤ The first character signifies the type of file
❏ d: directory
❏ -: regular file
❏ l: symbolic link
➤ The next three characters (the first triad) signify what the owner (u) can do
➤ The second triad signifies what group (g) members can do
➤ The third triad signifies what everyone else (o) can do
❏ Example: d rwx r-x r-x → a directory where the user has rwx, the group has r-x, and others have r-x
20/91
File Permission
➤ Read carries a weight of 4, write a weight of 2, and execute a weight of 1; the weights are added to give the numeric permission for each triad.
➤ Instead of using numerical permissions, you can also use symbolic permissions (u/g/o with +, -, = and r/w/x) with chmod.
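➤ A sketch of both styles (the file name and listing below are illustrative):
chmod 755 myscript.sh      # numeric: 7 = 4+2+1 (rwx) for user; 5 = 4+1 (r-x) for group and others
chmod u+x,go-w myscript.sh # symbolic: add execute for the user, remove write for group and others
ls -l myscript.sh
## -rwxr-xr-x 1 apacheco staff 52 Feb 13 2018 myscript.sh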
21/91
Scripting Language
➤ A scripting language or script language is a programming language that supports the writing
of scripts.
➤ Scripting Languages tend to be good for automating the execution of other programs.
❏ analyzing data
➤ They are also good for writing a program that is going to be used only once and then
discarded.
➤ A script is a program written for a software environment that automates the execution of
tasks which could alternatively be executed one-by-one by a human operator.
➤ The majority of script programs are “quick and dirty”, where the main goal is to get the
program written quickly.
22/91
Writing your first script
➤ Write a script
#!/bin/bash
# My First Script
echo "Hello World!"
➤ Set permissions
chmod +x hello.sh
./hello.sh
## Hello World!
23/91
Breaking down the script
➤ My First Script
## #!/bin/bash
## # My First Script
## echo "Hello World!"
➤ The first line is called the "shebang" line. It tells the OS which interpreter to use.
❏ bash: #!/bin/bash
❏ sh : #!/bin/sh
❏ ksh : #!/bin/ksh
❏ csh : #!/bin/csh
❏ tcsh: #!/bin/tcsh
➤ The third line tells the OS to print "Hello World!" to the screen.
24/91
Quotation
➤ Double Quotation " " : variables inside double quotes are expanded to their values
➤ Single Quotation ' ' : everything inside single quotes is treated literally (no expansion)
➤ Back Quotation ` ` or $(...) : command substitution; the enclosed command is executed and replaced by its output
myvar=hello
myname=Alex
echo "$myvar $myname"
echo '$myvar $myname'
echo $(pwd)
## hello Alex
## $myvar $myname
## /Users/apacheco/Tutorials/DCVS/bitbucket/lurc
25/91
Arithmetic Operations
OPERATION OPERATOR EXAMPLE
Addition + $((1+2))
Multiplication * $[$a*$b]
Exponentiation ** $[2**3]
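➤ A quick sketch of shell arithmetic (the values are arbitrary):
a=6 ; b=7
echo $((a + b))    # addition
## 13
echo $[$a * $b]    # multiplication, using the older $[ ] syntax from the table
## 42
echo $((2 ** 3))   # exponentiation
## 8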
26/91
Declare command
➤ Use the declare command to set variable and function attributes.
❏ declare -r var or declare -r varName=value : declare a read-only variable
❏ declare -i var or declare -i varName=value : declare an integer variable
j=10/5 ; echo $j
## 10/5
declare -i j
j=10/5 ; echo $j
## 2
27/91
Flow Control
➤ Shell Scripting Languages execute commands in sequence similar to programming
languages such as C, Fortran, etc.
❏ Conditionals: if
28/91
Conditionals: if
➤ An if ... then construct tests whether the exit status of a condition is 0, and if so, executes one or more commands.
if [ condition1 ]
then
some commands
fi
➤ If the exit status of a condition is non-zero, bash allows an if ... then ... else construct
if [ condition1 ]; then
some commands
else
some commands
fi
➤ You can also add another test, i.e. else if (elif), if you want to check another condition
if [ condition1 ]; then
some commands
elif [ condition2 ]; then
some commands
else
some commands
fi
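➤ A concrete sketch of the construct above (the variable and thresholds are arbitrary):
a=3
if [ $a -lt 0 ]; then
  echo "a is negative"
elif [ $a -eq 0 ]; then
  echo "a is zero"
else
  echo "a is positive"
fi
## a is positive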
29/91
Comparison operators
➤ integers and strings
OPERATION   INTEGERS          STRINGS
equal to    if [ 1 -eq 2 ]    if [ $a == $b ]
30/91
Other Operators
➤ File Test Operators
OPERATION EXAMPLE
➤ Logical Operators
OPERATION EXAMPLE
NOT if [ ! -e .bashrc ]
OR if [[ $a -gt 0 || $x -lt 5 ]]
31/91
Nested ifs
➤ if statements can be nested and/or simplified using logical operators
if [ $a -gt 0 ]; then
if [ $a -lt 5 ]; then
echo "The value of a lies somewhere between 0 and 5"
fi
fi
if [[ $a -gt 0 || $x -lt 5 ]]; then
echo "a is positive or x is less than 5"
fi
32/91
Loops
➤ A loop is a block of code that iterates a list of commands.
➤ while (until) tests for a condition at the top of a loop, and keeps looping as long as that
condition is true (false)
while [ condition ]; do
some commands
done
until [ condition ]; do
some commands
done
33/91
Loop Example
#!/bin/bash
echo -n " Enter a number less than 10: "
read counter
factorial=1
for i in $(seq 1 $counter); do
  let factorial*=$i
done
echo $factorial

bash factorial.sh << EOF
10
EOF
## Enter a number less than 10: 3628800

#!/bin/bash
echo -n " Enter a number less than 10: "
read counter
factorial=1
until [ $counter -le 1 ]; do
  factorial=$[ $factorial * $counter ]
  let counter-=1
done
echo $factorial
## Enter a number less than 10: 3628800
34/91
Arrays
➤ Array elements may be initialized with the variable[xx] notation, e.g. variable[xx]=1
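➤ A brief sketch (the array name and values are arbitrary):
myarray[0]=1            # initialize individual elements
myarray[1]=two
myarray=(1 two 3)       # or initialize the whole array at once
echo ${myarray[1]}      # reference a single element
## two
echo ${myarray[@]}      # all elements
## 1 two 3
echo ${#myarray[@]}     # number of elements
## 3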
35/91
Command Line Arguments
➤ Similar to programming languages, bash (and other shell scripting languages) can also take
command line arguments
❏ ./scriptname arg1 arg2 arg3 arg4 ... : available inside the script as $0, $1, $2, $3, $4, ... respectively
❏ for more than 9 arguments, positional parameters need to be protected, e.g. ${10}, OR
❏ use shift N: positional parameters from N+1 through $# are renamed to $1 through $(($# - N))
36/91
Example - Command Line Arguments
#!/bin/bash
USAGE="USAGE: $0 <at least 1 argument>"
if [[ "$#" -lt 1 ]]; then
echo $USAGE
exit
fi
echo "Number of Arguments: " $#
echo "List of Arguments: " $@
echo "Name of script that you are running: " $0
echo "Command You Entered:" $0 $*
while [ "$#" -gt 0 ]; do
echo "Argument List is: " $@
echo "Number of Arguments: " $#
shift
done
sh ./shift.sh $(seq 1 3)
## Number of Arguments: 3
## List of Arguments: 1 2 3
## Name of script that you are running: ./shift.sh
## Command You Entered: ./shift.sh 1 2 3
## Argument List is: 1 2 3
## Number of Arguments: 3
## Argument List is: 2 3
## Number of Arguments: 2
## Argument List is: 3
## Number of Arguments: 1
37/91
Functions
➤ Like "real" programming languages, bash has functions.
➤ A function is a subroutine, a code block that implements a set of operations, a "black box"
that performs a specified task.
➤ Wherever there is repetitive code, when a task repeats with only slight variations in
procedure, then consider using a function.
function function_name {
somecommands
}
OR
function_name () {
some commands
}
38/91
Example Function
#!/bin/bash
usage () {
  echo "USAGE: $0 [atleast 11 arguments]"
  exit
}

sh ./shift10.sh $(seq 1 10)
## USAGE: ./shift10.sh [atleast 11 arguments]
39/91
Function Arguments
➤ You can also pass arguments to a function.
➤ All function parameters or arguments can be accessed via $1, $2, $3,..., $N.
➤ The array variable FUNCNAME contains the names of all shell functions currently in the
execution call stack.
➤ By default all variables are global; use the local keyword to restrict a variable's scope to the function.
local var=value
local varName
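➤ A small sketch combining function arguments, a local variable, and FUNCNAME (the function name is illustrative):
greet () {
  local name=$1   # local: visible only inside the function
  echo "Hello, $name! (inside ${FUNCNAME[0]})"
}
greet Alex
## Hello, Alex! (inside greet)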
40/91
Recursive Function
➤ A function may recursively call itself.
factorial() {
  local i=$1 ; local f
  declare -i i ; declare -i f
  if [[ "$i" -le 2 && "$i" -ne 0 ]]; then
    echo $i
  elif [[ "$i" -eq 0 ]]; then
    echo 1
  else
    f=$(( $i - 1 ))
    f=$( factorial $f )
    f=$(( $f * $i ))
    echo $f
  fi
}

## Factorial of 1 is 1
## Factorial of 3 is 6
## Factorial of 5 is 120
## Factorial of 7 is 5040
## Factorial of 9 is 362880
## Factorial of 11 is 39916800
## Factorial of 13 is 6227020800
## Factorial of 15 is 1307674368000
## Factorial of 17 is 355687428096000
41/91
Research Computing Resources
➤ Maia
❏ 32-core Symmetric Multiprocessor (SMP) system available to all Lehigh Faculty, Staff and
Students
❏ dual 16-core AMD Opteron 6380 2.5GHz CPU
❏ Theoretical Performance: 640 GFLOPs (640 billion floating point operations per second)
GFLOPs = cores × clock (GHz) × FLOPs per cycle, i.e. 32 × 2.5 × 8 = 640
42/91
Research Computing Resources
➤ Sol: 80 node Shared Condominium Cluster
❏ 9 nodes, dual 10-core Intel Xeon E5-2650 v3 2.3GHz CPU, 25MB Cache, 128GB RAM
❏ 33 nodes, dual 12-core Intel Xeon E5-2670 v3 2.3GHz CPU, 30 MB Cache, 128GB RAM
❏ 14 nodes, dual 12-core Intel Xeon E5-2650 v4 2.3GHz CPU, 30 MB Cache, 64GB RAM
❏ 1 node, dual 8-core Intel Xeon 2630 v3 2.4GHz CPU, 20 MB Cache, 512GB RAM
❏ 23 nodes, dual 18-core Intel Xeon Gold 6140 2.3GHz CPU, 24.7 MB Cache, 192GB RAM
❏ Access: Batch Scheduled, interactive on login node for compiling, editing only
43/91
Sol
PROCESSOR   PARTITION             NODES   CPUS   GPUS   CPU MEMORY (GB)   GPU MEMORY (GB)   CPU TFLOPS   GPU TFLOPS   ANNUAL SUS
Gold 6140   enge, engi, unnamed   23      828    48     4416              528               39.744       17.626       7,253,280
➤ Haswell (v3) and Broadwell (v4) provide 256-bit Advanced Vector Extensions SIMD
instructions
44/91
What about Storage resources
➤ LTS provides various storage options for research and teaching
➤ Research groups can purchase sharable project space on Ceph at $375/TB for a 5-year
duration
➤ Ceph is built, operated, and administered in-house by LTS Research Computing Staff.
➤ HPC users can write job output directly to their Ceph volume
➤ Ceph volumes can be mounted as a network drive on Windows, or via CIFS on Mac and Linux
➤ Storage quota on
❏ Maia: 5GB
❏ Sol: 150GB
45/91
Network Layout Sol & Ceph Storage Cluster
46/91
How do I get started using HPC resources?
➤ Login to sol: ssh -X username@sol.cc.lehigh.edu
❏ You should see something like [alp514@sol ~]$ if you are logged into sol
❏ sol is the head/login and storage node for the monocacy cluster.
❏ Running intense computation on this node causes a high load on the storage that would
cause other users' jobs to run slowly.
❏ All compute nodes are labelled as sol-[a-e][1-6][01-18]
❆ [01-18] is the location in the rack with 01 at the bottom and 18 at the top
47/91
Available Software
➤ Commercial, Free and Open source software is installed on
❏ Maia: /zhome/Apps
❏ Sol: /share/Apps
❏ Why? We may have different versions of the same software, or software built with different
compilers
❏ The module environment allows you to dynamically change your *nix environment based on
the software being used
❏ Standard on many university and national High Performance Computing resources since
circa 2011
➤ LTS provides licensed and open source software for Windows, Mac and Linux and Gogs, a
self hosted Git Service or Github clone
48/91
Module Command
COMMAND DESCRIPTION
module load abc add software abc to your environment (modify your PATH, LD_LIBRARY_PATH etc as
needed)
module show abc display what variables are added or modified in your environment
module help abc display help message for the module abc
➤ Users who prefer not to use the module environment will need to modify their .bashrc or
.tcshrc files. Run module show abc to list the variables that need to be modified, appended or
prepended.
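➤ Typical usage looks like this (the module name gcc is only an example; run module avail to see what is actually installed):
module avail        # list all available modules
module load gcc     # add the software to your environment
module list         # show currently loaded modules
module show gcc     # display the PATH/LD_LIBRARY_PATH changes it makes
module unload gcc   # remove it from your environment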
49/91
Installed Software
➤ Chemistry/Materials Science
❏ CPMD
❏ GAMESS
❏ Gaussian
❏ NWCHEM
❏ Desmond
❏ GROMACS
❏ LAMMPS
❏ NAMD
➤ Computational Fluid Dynamics
❏ Abaqus
❏ Ansys
❏ Comsol
❏ OpenFOAM
➤ Mathematics
❏ Magma
❏ Maple
❏ Mathematica
❏ Matlab
MPI enabled
50/91
More Software
➤ Machine & Deep Learning
❏ TensorFlow
❏ Caffe
❏ SciKit-Learn
❏ SciKit-Image
❏ Theano
❏ Keras
➤ Natural Language Processing (NLP)
❏ Natural Language Toolkit (NLTK)
❏ Stanford NLP
Python packages
➤ Bioinformatics
❏ BamTools
❏ BayeScan
❏ bgc
❏ BWA
❏ FreeBayes
❏ SAMTools
❏ tabix
❏ trimmomatic
❏ barcode_splitter
❏ phyluce
❏ VelvetOptimiser
51/91
More Software
➤ Scripting Languages
❏ R
❏ Perl
❏ Python
➤ Compilers & MPI
❏ GNU
❏ Intel
❏ JAVA
❏ PGI
❏ CUDA
❏ MVAPICH2
❏ OpenMPI
➤ Libraries
❏ BLAS/LAPACK/GSL/SCALAPACK
❏ Boost
❏ FFTW
❏ HDF5
❏ NetCDF
❏ METIS/PARMETIS
❏ PetSc
❏ QHull/QRupdate
❏ SuperLU
52/91
More Software
➤ Visualization Tools
❏ Avogadro
❏ GNUPlot
❏ PWGui
❏ PyMol
❏ RDKit
❏ VESTA
❏ VMD
❏ XCrySDen
➤ Other Tools
❏ CMake
❏ Lmod
❏ Numba
❏ Scons
❏ SPACK
❏ MD Tools
❆ BioPython
❆ CCLib
❆ MDAnalysis
53/91
Using your own Software?
➤ You can always install software in your home directory
❏ SPACK is an excellent package manager that can even create module files
➤ create a module and dynamically load it so that it doesn't interfere with other software
installed on the system
❏ e.g. you might want to use openmpi instead of mvapich2
❏ the system admin may not want to install it system wide for just one user
➤ Add the directory where you will install the module files to the variable MODULEPATH in
.bashrc/.tcshrc
# My .bashrc file
export MODULEPATH=${MODULEPATH}:/home/alp514/modulefiles
54/91
Compilers
➤ Various versions of compilers are installed on Sol
➤ Open Source: GNU Compiler (also called gcc even though gcc is the C compiler)
➤ On Sol, all except gcc 4.8.5 are available via the module environment
55/91
Compiling Code
➤ Usage: <compiler> <options> <source code>
➤ Example:
❏ -o myexec: compile code and create an executable myexec, default executable is a.out.
❏ -I{directory path}: directory to search for include files and Fortran modules.
❏ target Sandy Bridge and later processors for optimization using a unified binary
❆ Intel: -axCORE-AVX512,CORE-AVX2,CORE-AVX-I,AVX
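➤ For example, a possible compile line with the Intel compiler (the source file name and include path are illustrative):
icc -o myexec -I${HOME}/include -axCORE-AVX512,CORE-AVX2,CORE-AVX-I,AVX mycode.c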
56/91
Compiling and Running Serial Codes
[2018-02-22 08:47.27] ~/Workshop/2017XSEDEBootCamp/OpenMP
[alp514.sol-d118](842): icc -o laplacec laplace_serial.c
[2018-02-22 08:47.46] ~/Workshop/2017XSEDEBootCamp/OpenMP
[alp514.sol-d118](843): ./laplacec
Maximum iterations [100-4000]?
1000
---------- Iteration number: 100 ------------
[995,995]: 63.33 [996,996]: 72.67 [997,997]: 81.40 [998,998]: 88.97 [999,999]: 94.86 [1000,1000]: 98.67
---------- Iteration number: 200 ------------
[995,995]: 79.11 [996,996]: 84.86 [997,997]: 89.91 [998,998]: 94.10 [999,999]: 97.26 [1000,1000]: 99.28
---------- Iteration number: 300 ------------
[995,995]: 85.25 [996,996]: 89.39 [997,997]: 92.96 [998,998]: 95.88 [999,999]: 98.07 [1000,1000]: 99.49
---------- Iteration number: 400 ------------
[995,995]: 88.50 [996,996]: 91.75 [997,997]: 94.52 [998,998]: 96.78 [999,999]: 98.48 [1000,1000]: 99.59
---------- Iteration number: 500 ------------
[995,995]: 90.52 [996,996]: 93.19 [997,997]: 95.47 [998,998]: 97.33 [999,999]: 98.73 [1000,1000]: 99.66
---------- Iteration number: 600 ------------
[995,995]: 91.88 [996,996]: 94.17 [997,997]: 96.11 [998,998]: 97.69 [999,999]: 98.89 [1000,1000]: 99.70
---------- Iteration number: 700 ------------
[995,995]: 92.87 [996,996]: 94.87 [997,997]: 96.57 [998,998]: 97.95 [999,999]: 99.01 [1000,1000]: 99.73
---------- Iteration number: 800 ------------
[995,995]: 93.62 [996,996]: 95.40 [997,997]: 96.91 [998,998]: 98.15 [999,999]: 99.10 [1000,1000]: 99.75
---------- Iteration number: 900 ------------
[995,995]: 94.21 [996,996]: 95.81 [997,997]: 97.18 [998,998]: 98.30 [999,999]: 99.17 [1000,1000]: 99.77
---------- Iteration number: 1000 ------------
[995,995]: 94.68 [996,996]: 96.15 [997,997]: 97.40 [998,998]: 98.42 [999,999]: 99.22 [1000,1000]: 99.78
57/91
Compilers for Parallel Programming: OpenMP & TBB
➤ OpenMP support is built into the compiler and enabled with a compiler flag
GNU    -fopenmp
Intel  -qopenmp
PGI    -mp
58/91
Compiling and Running OpenMP Codes
[2018-02-22 08:47.56] ~/Workshop/2017XSEDEBootCamp/OpenMP/Solutions
[alp514.sol-d118](845): icc -qopenmp -o laplacec laplace_omp.c
[2018-02-22 08:48.09] ~/Workshop/2017XSEDEBootCamp/OpenMP/Solutions
[alp514.sol-d118](846): OMP_NUM_THREADS=4 ./laplacec
Maximum iterations [100-4000]?
1000
---------- Iteration number: 100 ------------
[995,995]: 63.33 [996,996]: 72.67 [997,997]: 81.40 [998,998]: 88.97 [999,999]: 94.86 [1000,1000]: 98.67
---------- Iteration number: 200 ------------
[995,995]: 79.11 [996,996]: 84.86 [997,997]: 89.91 [998,998]: 94.10 [999,999]: 97.26 [1000,1000]: 99.28
---------- Iteration number: 300 ------------
[995,995]: 85.25 [996,996]: 89.39 [997,997]: 92.96 [998,998]: 95.88 [999,999]: 98.07 [1000,1000]: 99.49
---------- Iteration number: 400 ------------
[995,995]: 88.50 [996,996]: 91.75 [997,997]: 94.52 [998,998]: 96.78 [999,999]: 98.48 [1000,1000]: 99.59
---------- Iteration number: 500 ------------
[995,995]: 90.52 [996,996]: 93.19 [997,997]: 95.47 [998,998]: 97.33 [999,999]: 98.73 [1000,1000]: 99.66
---------- Iteration number: 600 ------------
[995,995]: 91.88 [996,996]: 94.17 [997,997]: 96.11 [998,998]: 97.69 [999,999]: 98.89 [1000,1000]: 99.70
---------- Iteration number: 700 ------------
[995,995]: 92.87 [996,996]: 94.87 [997,997]: 96.57 [998,998]: 97.95 [999,999]: 99.01 [1000,1000]: 99.73
---------- Iteration number: 800 ------------
[995,995]: 93.62 [996,996]: 95.40 [997,997]: 96.91 [998,998]: 98.15 [999,999]: 99.10 [1000,1000]: 99.75
---------- Iteration number: 900 ------------
[995,995]: 94.21 [996,996]: 95.81 [997,997]: 97.18 [998,998]: 98.30 [999,999]: 99.17 [1000,1000]: 99.77
---------- Iteration number: 1000 ------------
[995,995]: 94.68 [996,996]: 96.15 [997,997]: 97.40 [998,998]: 98.42 [999,999]: 99.22 [1000,1000]: 99.78
59/91
Compilers for Parallel Programming: MPI
➤ MPI is a library, not a compiler; it must be built or compiled separately for different compilers.
Fortran mpif90
C mpicc
C++ mpicxx
60/91
MPI Libraries
➤ There are two different MPI implementations commonly used
❏ MPICH: used as a starting point for various commercial and open source MPI libraries
❆ MVAPICH2: Developed by D. K. Panda with support for InfiniBand, iWARP, RoCE, and Intel Omni-
Path. (default MPI on Sol)
❆ Intel MPI: Intel's version of MPI. You need this for Xeon Phi MICs.
❏ OpenMPI: A free, open source implementation from the merger of three well-known MPI implementations.
Can be used on commodity networks as well as high speed networks
❆ FT-MPI from the University of Tennessee
61/91
Running MPI Programs
➤ Every MPI implementation come with their own job launcher: mpiexec (MPICH,OpenMPI &
MVAPICH2), mpirun (OpenMPI) or mpirun_rsh (MVAPICH2)
➤ Required options: number of processes and list of hosts on which to run program
➤ To run a MPI code, you need to use the launcher from the same implementation that was
used to compile the code.
➤ For example, you cannot compile code with OpenMPI and run it using MPICH's or MVAPICH2's
launcher
❏ Since MVAPICH2 is based on MPICH, you can launch MVAPICH2 compiled code using
MPICH's launcher.
➤ The SLURM scheduler provides srun as a wrapper around all MPI launchers
62/91
Compiling and Running MPI Codes
[2018-02-22 08:48.27] ~/Workshop/2017XSEDEBootCamp/MPI/Solutions
[alp514.sol-d118](848): mpicc -o laplacec laplace_mpi.c
[2018-02-22 08:48.41] ~/Workshop/2017XSEDEBootCamp/MPI/Solutions
[alp514.sol-d118](849): mpiexec -n 4 ./laplacec
Maximum iterations [100-4000]?
1000
---------- Iteration number: 100 ------------
[995,995]: 63.33 [996,996]: 72.67 [997,997]: 81.40 [998,998]: 88.97 [999,999]: 94.86 [1000,1000]: 98.67
---------- Iteration number: 200 ------------
[995,995]: 79.11 [996,996]: 84.86 [997,997]: 89.91 [998,998]: 94.10 [999,999]: 97.26 [1000,1000]: 99.28
---------- Iteration number: 300 ------------
[995,995]: 85.25 [996,996]: 89.39 [997,997]: 92.96 [998,998]: 95.88 [999,999]: 98.07 [1000,1000]: 99.49
---------- Iteration number: 400 ------------
[995,995]: 88.50 [996,996]: 91.75 [997,997]: 94.52 [998,998]: 96.78 [999,999]: 98.48 [1000,1000]: 99.59
---------- Iteration number: 500 ------------
[995,995]: 90.52 [996,996]: 93.19 [997,997]: 95.47 [998,998]: 97.33 [999,999]: 98.73 [1000,1000]: 99.66
---------- Iteration number: 600 ------------
[995,995]: 91.88 [996,996]: 94.17 [997,997]: 96.11 [998,998]: 97.69 [999,999]: 98.89 [1000,1000]: 99.70
---------- Iteration number: 700 ------------
[995,995]: 92.87 [996,996]: 94.87 [997,997]: 96.57 [998,998]: 97.95 [999,999]: 99.01 [1000,1000]: 99.73
---------- Iteration number: 800 ------------
[995,995]: 93.62 [996,996]: 95.40 [997,997]: 96.91 [998,998]: 98.15 [999,999]: 99.10 [1000,1000]: 99.75
---------- Iteration number: 900 ------------
[995,995]: 94.21 [996,996]: 95.81 [997,997]: 97.18 [998,998]: 98.30 [999,999]: 99.17 [1000,1000]: 99.77
---------- Iteration number: 1000 ------------
[995,995]: 94.68 [996,996]: 96.15 [997,997]: 97.40 [998,998]: 98.42 [999,999]: 99.22 [1000,1000]: 99.78
63/91
Cluster Environment
➤ A cluster is a group of computers (nodes) that work together closely
❏ Head/Login Node
❏ Compute Node
➤ Multi-user environment
64/91
How to run jobs
➤ All compute intensive jobs are scheduled
➤ Need to specify
❆ number of nodes
65/91
Scheduler & Resource Management
➤ Software that manages resources (CPU time, memory, etc.) and schedules job execution
❆ Scheduler: Maui
➤ A job can be considered as a user’s request to use a certain amount of resources for a
certain amount of time
66/91
Job Scheduling
➤ Map jobs onto the node-time space
67/91
Backfilling
➤ A strategy to improve utilization
68/91
How much time must I request
➤ Ask for an amount of time that is
69/91
Available Queues
➤ Sol
PARTITION NAME   MAX RUNTIME (HOURS)   MAX SU CONSUMED PER NODE PER HOUR
lts 72 20
imlab 48 22
imlab‑gpu 48 24
eng 72 22
eng‑gpu 72 24
engc 72 24
himem 72 48
enge 72 36
engi 72 36
➤ Maia
smp‑test 1 4
smp 96 384
70/91
How much memory can or should I use per core?
➤ The amount of installed memory less the amount that is used by the operating system and
other utilities
➤ A general rule of thumb on most HPC resources: leave 1-2GB for the OS to run.
➤ Sol
himem 32 31.5
➤ if you need to run a single core job that requires 10GB memory in the imlab partition, you
need to request 2 cores even though you are only using 1 core.
➤ Maia: Users need to specify memory required in their submit script. Max memory that
should be requested is 126GB.
71/91
Basic Job Manager Commands
➤ Submission
➤ Monitoring
➤ Manipulating
➤ Reporting
72/91
Job Types
➤ Interactive Jobs
❏ Will log you into a compute node and wait at the prompt for your commands
❏ Purpose: testing and debugging code. Do not run jobs on head node!!!
➤ Batch Jobs
❏ Workflow: write a script -> submit script -> take mini vacation -> analyze results
73/91
Useful SLURM Directives
SLURM DIRECTIVE DESCRIPTION
‑‑time=hh:mm:ss Request resources to run job for hh hours, mm minutes and ss seconds.
74/91
Useful SLURM Directives (contd)
SLURM DIRECTIVE DESCRIPTION
‑‑qos=nogpu Request a quality of service (qos) for the job in imlab, engc partitions.
You can request 1 or 2 gpus with a minimum of 1 core or cpu per gpu
➤ SLURM can also take short hand notation for the directives
‑‑partition=queuename ‑p queuename
‑‑time=hh:mm:ss ‑t hh:mm:ss
‑‑nodes=m ‑N m
‑‑ntasks=n ‑n n
‑‑account=mypi ‑A mypi
‑‑job‑name=jobname ‑J jobname
‑‑output=filename.out ‑o filename.out
75/91
SLURM Filename Patterns
➤ sbatch allows a filename pattern to contain one or more replacement symbols, which are
a percent sign "%" followed by a letter (e.g. %j).
PATTERN DESCRIPTION
%n Node identifier relative to current job (e.g. "0" is the first node of the running job) This will create a
separate IO file per node.
%t task identifier (rank) relative to current job. This will create a separate IO file per task.
%u User name.
%x Job name.
76/91
Useful PBS Directives
PBS DIRECTIVE DESCRIPTION
‑l walltime=hh:mm:ss Request resources to run job for hh hours, mm minutes and ss seconds.
for e.g. -m abe will send email when the job begins and either aborts or ends
77/91
Useful PBS/SLURM environmental variables
SLURM COMMAND DESCRIPTION PBS COMMAND
SLURM_JOB_NODELIST Name of the file that contains a list of the HOSTS provided for the job PBS_NODEFILE
Name of the job. This can be set using the ‑N option in the PBS script PBS_JOBNAME
value of the SHELL variable in the environment in which qsub was PBS_O_SHELL
executed
78/91
Job Types: Interactive
➤ PBS: Use qsub -I command with PBS Directives
➤ SLURM: Use srun command with SLURM Directives followed by --pty /bin/bash --login
❏ If you have soltools module loaded, then use interact with at least one SLURM Directive
➤ To run a job interactively, replace --pty /bin/bash --login with the appropriate command.
❏ For e.g. srun -t 20 -n 1 -p imlab --qos=nogpu $(which lammps) -in in.lj -var x
1 -var n 1
❏ Default values are 3 days, 1 node, 20 tasks per node and lts partition
79/91
Job Types: Batch
➤ Workflow: write a script -> submit script -> take mini vacation -> analyze results
➤ Add PBS or SLURM directives after the shebang line but before any shell commands
➤ qsub and sbatch can take the options for #PBS and #SBATCH as command line arguments
80/91
Minimal submit script for Serial Jobs
#!/bin/bash
#PBS -q smp
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=4GB
#PBS -N myjob
cd ${PBS_O_WORKDIR}
./myjob < filename.in > filename.out
#!/bin/bash
#SBATCH --partition=lts
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --job-name myjob
cd ${SLURM_SUBMIT_DIR}
./myjob < filename.in > filename.out
81/91
Minimal submit script for MPI Job
#!/bin/bash
#SBATCH --partition=lts
#SBATCH --time=1:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=20
## For --partition=imlab,
### use --ntasks-per-node=22
### and --qos=nogpu
#SBATCH --job-name myjob
cd ${SLURM_SUBMIT_DIR}
srun ./myjob < filename.in > filename.out
exit
82/91
Minimal submit script for OpenMP Job
#!/bin/tcsh
#SBATCH --partition=imlab
# Directives can be combined on one line
#SBATCH --time=1:00:00 --nodes=1 --ntasks-per-node=22
#SBATCH --qos=nogpu
#SBATCH --job-name myjob
cd ${SLURM_SUBMIT_DIR}
# Use either
setenv OMP_NUM_THREADS 22
./myjob < filename.in > filename.out
# OR
OMP_NUM_THREADS=22 ./myjob < filename.in > filename.out
exit
83/91
Minimal submit script for LAMMPS GPU job
#!/bin/tcsh
#SBATCH --partition=imlab
# Directives can be combined on one line
#SBATCH --time=1:00:00
#SBATCH --nodes=1
# 1 CPU can be paired with only 1 GPU
# 1 GPU can be paired with all 24 CPUs
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
# Need both GPUs, use --gres=gpu:2
#SBATCH --job-name myjob
cd ${SLURM_SUBMIT_DIR}
# Load LAMMPS Module
module load lammps/17nov16-gpu
# Run LAMMPS for input file in.lj
srun $(which lammps) -in in.lj -sf gpu -pk gpu 1 gpuID ${CUDA_VISIBLE_DEVICE}
exit
84/91
Need to run multiple jobs in sequence?
➤ Option 1: Submit jobs as soon as previous jobs complete
❏ one node: your submit script should be able to run several serial jobs in the background
and then use the wait command to wait for all jobs to finish (see the sketch after this list)
❏ more than one node: this requires some background in scripting, but the idea is the
same as above
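➤ A minimal sketch of the one-node case, assuming a serial executable ./myjob and input files named input1.in ... input4.in (all names are illustrative):
#!/bin/bash
#SBATCH --partition=lts
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
cd ${SLURM_SUBMIT_DIR}
# launch four independent serial runs in the background
for i in 1 2 3 4; do
  ./myjob < input${i}.in > output${i}.out &
done
# wait for all background processes to finish before the job (and its allocation) ends
wait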
85/91
Monitoring & Manipulating Jobs
SLURM COMMAND DESCRIPTION PBS COMMAND
squeue ‑‑start Show estimated start time of jobs in queue showstart jobid
scontrol show job jobid Check status of your job identified by jobid checkjob jobid
scontrol hold jobid Put your job identified by jobid on hold qhold jobid
scontrol release jobid Release the hold that you put on jobid qrls jobid
➤ The following scripts written by RC staff can also be used for monitoring jobs.
86/91
Modifying Resources for Queued Jobs
➤ Modify a job after submission but before starting:
❏ request gpus (when changing to one of the gpu partitions): gres=gpu:<1 or 2>
➤ SPECIFICATIONs can be combined; for example, the command to move queued job 123456 to the
imlab partition and change its time limit to 48 hours is
❏ scontrol update partition=imlab qos=nogpu timelimit=48:00:00
jobid=123456
87/91
Usage Reporting
➤ sacct: displays accounting data for all jobs and job steps in the SLURM job accounting log or
Slurm database
❏ alloc_summary.sh
❏ balance
❏ solreport
❆ PIs can obtain a usage report for all or specific users on their allocation
88/91
Usage Reporting
89/91
Online Usage Reporting
➤ Monthly usage summary (updated daily)
❏ AY 17-18
❏ AY 16-17
90/91
Additional Help & Information
➤ Issue with running jobs or need help to get started:
➤ More Information
❏ Research Computing
➤ Subscribe
➤ My contact info
❏ eMail: alp514@lehigh.edu
❏ My Schedule
91/91