Steve Anthony
October 30, 2014
http://www.lehigh.edu/go/rcwiki
Welcome to HPC
All Lehigh HPC resources run the GNU/Linux operating
system: mostly CentOS, with a smattering of Debian.
A good introductory Linux lecture is available through XSEDE/TACC:
http://www.tacc.utexas.edu/user-services/training/course-materials
NICS Unix training: 11/13 and 11/18, via webcast and in EWFM 292 and 299.
SSH is the protocol used to access the HPC resources at
Lehigh.
OS X: open Applications->Utilities->Terminal and type ssh LehighID@hostname
Windows: an SSH client is available at Public Sites and
www.lehigh.edu/software.
Off-campus access requires the VPN or the SSH gateway.
VPN information is available at lehigh.edu/vpn.
To use the SSH gateway, connect to ssh.lehigh.edu, then connect to the
resource you need using ssh hostname.
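As a rough sketch of the two access routes above, the commands might look like the following. The user ID abc123 is a placeholder, corona.cc.lehigh.edu is used only as an example target, and the one-line gateway form with ssh -t is an assumption about how the two-step connection could be combined, not something the slides prescribe. The block only prints the commands; run them yourself from a terminal.

```shell
# Hypothetical values: replace with your own Lehigh ID and target host.
LEHIGH_ID="abc123"
TARGET="corona.cc.lehigh.edu"

# On campus or over the VPN: connect directly.
direct_cmd="ssh ${LEHIGH_ID}@${TARGET}"

# Off campus without the VPN: hop through the gateway first.
# -t forces a terminal so the second ssh gets an interactive session.
gateway_cmd="ssh -t ${LEHIGH_ID}@ssh.lehigh.edu ssh ${TARGET}"

echo "$direct_cmd"
echo "$gateway_cmd"
```

The two-step form from the slide (log in to ssh.lehigh.edu, then type ssh hostname at its prompt) does the same thing interactively.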
Get an Account
What's Available?
$0    $450/allocation + $50/account (per year)
Level 1: Maia
System Statistics
1 SMP node, 2x AMD Opteron 16-core 6380, 128GB
RAM. 4TB local scratch space. GbE interconnect.
Hostname is maia.cc.lehigh.edu; logins go through
polaris.cc.lehigh.edu.
Scheduling
Job scheduling per core and per memory using PBS
Torque/Maui is required for all jobs; submit on polaris.
Queues
Jobs may be submitted to smp-test or smp queues
depending on requirements.
Level 2: Corona
System Statistics
64 Nodes, 2x AMD Opteron 8-core 6128 (16
cores/node), 32 or 64GB RAM per node. 1 or 2TB
scratch space per node. Infiniband or GbE interconnect.
Hostname is corona.cc.lehigh.edu.
Scheduling
Job scheduling using PBS Torque/Maui is required for
all jobs.
Queues
Jobs may be submitted to bf, short, normal or p-ib
queues depending on requirements.
Corona: allocations
Corona makes use of system allocations to ensure equitable access
and provide resource guarantees to users. Users are granted 1 Corona
node-year of priority access.
o Priority boost for all jobs.
o Run jobs longer (96 hours).
o Run more jobs simultaneously.
PATH is a Linux environment variable used by the shell to determine
which directories to search for executable files when the user issues a
command at the prompt.
o PATH=/usr/local/bin:/usr/bin:/bin tells the shell to search
/usr/local/bin, then /usr/bin, and finally /bin when the user types a
command such as matlab. If it does not exist in one of these
directories you will see "bash: matlab: command not found".
LD_LIBRARY_PATH is also an environment variable. It is used to
determine which directories should be searched for libraries an
executable requires to run correctly.
To add a directory to either of these, issue a command like
PATH=/home/sma310/bin:$PATH
This tells the shell to look in my user bin directory before searching
the rest of the existing PATH.
Complex and error prone!
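The search-order behavior described above can be checked with a small experiment. This is a sketch under assumptions: it uses a temporary directory in place of a personal bin directory, and the hello command and the lib subdirectory are hypothetical names made up for the demonstration.

```shell
# Throwaway directory standing in for a personal bin directory
# such as /home/sma310/bin (the "hello" command is hypothetical).
demo_dir="$(mktemp -d)"
printf '#!/bin/sh\necho "hello from %s"\n' "$demo_dir" > "$demo_dir/hello"
chmod +x "$demo_dir/hello"

# Prepend: directories are searched left to right, so this one wins.
PATH="$demo_dir:$PATH"
export PATH

# Ask the shell which file it would run for "hello".
resolved="$(command -v hello)"
echo "hello resolves to: $resolved"

# Same idea for libraries; the :+ form avoids a stray trailing colon
# when LD_LIBRARY_PATH was previously unset or empty.
demo_lib="$demo_dir/lib"   # hypothetical library directory
LD_LIBRARY_PATH="$demo_lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
echo "LD_LIBRARY_PATH is now: $LD_LIBRARY_PATH"
```

These changes last only for the current shell session; a new login starts again from the default values.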
Environment Modules
The Problem: Managing environment variables is complex.
o May need to change depending on workflow, compilers used.
o Need to know exactly where various binary and library directories are
located on the system.
o Potential for conflicts and dependencies: e.g. the PGI OpenMPI toolchain
requires the PGI compiler. Prepending GCC will break it.
Environment modules handle setting up the user environment and
managing dependencies and conflicts.
o module avail - list available modules.
o module list - list loaded modules.
o module load name/version - load the module name/version, or
give an error if it conflicts with another loaded module.
o module unload name/version - unload the module name/version.
o module purge - unload all modules.
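A typical session with the commands listed above might look like the sketch below. The module command exists only on systems with Environment Modules installed, so the block checks for it first; name/version is left as the placeholder from the list because actual module names are site-specific.

```shell
# Sketch only: "module" exists only where Environment Modules is installed.
if type module >/dev/null 2>&1; then
    module purge    # start from a clean environment
    module avail    # see what can be loaded
    # module load name/version   # placeholder; pick a real entry from "module avail"
    module list     # confirm what is currently loaded
    modules_demo="ran"
else
    echo "module command not found; try this on a cluster login node"
    modules_demo="skipped"
fi
```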
Environment Modules
Go from managing PATH and LD_LIBRARY_PATH to:
module purge
Submitting a Job
Job Management
qsub file.submit: submit the job described in
file.submit.
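For illustration, a minimal file.submit could be created and submitted as sketched here. The queue and resource request are borrowed from the Corona example in these slides; the job name example_job and the program ./my_program are hypothetical placeholders, and the cd to PBS_O_WORKDIR is a common Torque convention rather than a stated Lehigh requirement.

```shell
# Write a minimal batch script. The job name and the program it runs
# (./my_program) are hypothetical placeholders.
cat > file.submit <<'EOF'
#!/bin/bash
#PBS -N example_job
#PBS -q short
#PBS -l walltime=00:15:00,nodes=1:ppn=16

# Torque starts jobs in the home directory; move to where qsub was run.
cd "$PBS_O_WORKDIR"

./my_program > output.log 2>&1
EOF

# Submit only where the scheduler is installed.
if command -v qsub >/dev/null 2>&1; then
    qsub file.submit
else
    echo "qsub not found; copy file.submit to a submit host first"
fi
```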
https://webapps.lehigh.edu/dokuwiki/sites/researchcomputing/doku.php?id=corona#queue_structure
polaris:/training/oct3014/interact/job.pbs
Interactive Sessions
o No tasks beyond code compilation and editing are allowed on corona1.
o Polaris is purely a submit host. If you need to compile for Maia, you
must submit an interactive job.
o Interactive sessions are the required method for testing and
troubleshooting jobs.
o They allow you to test jobs step by step without impacting other users.
o When you're finished, type 'logout' to exit, or press Ctrl+D.
# Use the smp-test queue on Polaris
#PBS -q smp-test
# Request 1 core and 2GB RAM for 15 min
#PBS -l walltime=00:15:00,nodes=1,ncpus=1,mem=2gb
-----------------------------------------------------
# Use the short queue on Corona
#PBS -q short
# Request 1 corona node for 15 min
#PBS -l walltime=00:15:00,nodes=1:ppn=16
-----------------------------------------------------
Then, submit the job:
$ qsub -I job.pbs
qsub: waiting for job 604001.corona1.cc.lehigh.edu to start
qsub: job 604001.corona1.cc.lehigh.edu ready
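The same requests can also be passed to qsub on the command line with its standard -q and -l options instead of being read from job.pbs. The helper below is a hypothetical convenience, not part of the training materials; it only prints the command for each system, using the queue names and resource strings from the examples above.

```shell
# Print (not run) the interactive qsub command for a given system.
# Queue names and resource strings are copied from the examples above.
interactive_cmd() {
    case "$1" in
        polaris) echo "qsub -I -q smp-test -l walltime=00:15:00,nodes=1,ncpus=1,mem=2gb" ;;
        corona)  echo "qsub -I -q short -l walltime=00:15:00,nodes=1:ppn=16" ;;
        *)       echo "unknown system: $1" >&2; return 1 ;;
    esac
}

interactive_cmd polaris
interactive_cmd corona
```

Copy the printed line to the matching submit host to start the session.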
Information to include:
o What you are trying to do, including software or
compiler names and versions.
o What system you're using as a platform.
o What you've tried to do, and the result.
o Any logs you generated while working.
o A test suite is always helpful!
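Some of the items above can be gathered automatically before writing a help request. The following is a sketch, not an official tool: the output file name help_request_info.txt is a hypothetical choice, and you would still need to describe what you tried and attach your own logs.

```shell
# Collect basic details into a text file to attach to a help request.
report="help_request_info.txt"   # hypothetical file name
{
    echo "Date:   $(date)"
    echo "Host:   $(uname -n)"
    echo "System: $(uname -a)"
    echo "PATH:   $PATH"
    echo "LD_LIBRARY_PATH: ${LD_LIBRARY_PATH:-<unset>}"
    # Loaded modules, if Environment Modules is available here.
    if type module >/dev/null 2>&1; then
        echo "Loaded modules:"
        module list 2>&1
    fi
} > "$report"

echo "Wrote $report"
```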
http://www.lehigh.edu/go/rcwiki