
HPC Resources at Lehigh

Steve Anthony
October 30, 2014
http://www.lehigh.edu/go/rcwiki
Welcome to HPC
All Lehigh HPC resources run the GNU/Linux operating
system. Mostly CentOS, with a smattering of Debian.
A good introductory Linux lecture is available through XSEDE/TACC:
http://www.tacc.utexas.edu/user-services/training/course-materials
NICS Unix training on 11/13 and 11/18: webcast, and shown in EWFM 292 and 299.
SSH is the protocol used to access the HPC resources at
Lehigh.
OS X: Applications->Utilities->Terminal and type ssh LehighID@hostname
Windows: SSH Client available at Public Sites and
www.lehigh.edu/software.
Off-campus access requires VPN or SSH gateway.
VPN information available at lehigh.edu/vpn.
To use the SSH gateway, connect to ssh.lehigh.edu, then connect to the
resource you need using ssh hostname.
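For example, an off-campus login to Corona through the SSH gateway might
look like the following (sma310 here is just an example Lehigh ID;
substitute your own):

# From your local machine, connect to the gateway first:
ssh sma310@ssh.lehigh.edu
# Then, from the gateway prompt, connect to the resource you need:
ssh corona.cc.lehigh.edu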
Get an Account

What's Available?

Service Level 1    Service Level 2
Maia               Capella, Trit1-3, Cuda0, and Corona
$0                 $450/allocation + $50/account (per year)
Level 1: Maia
System Statistics
1 SMP node, 2x AMD Opteron 16-core 6380, 128GB
RAM. 4TB local scratch space. GbE interconnect.
Hostname is maia.cc.lehigh.edu, logins through
polaris.cc.lehigh.edu
Scheduling
Per-core/per-memory job scheduling using PBS Torque/Maui is required
for all jobs; submit on polaris.
Queues
Jobs may be submitted to smp-test or smp queues
depending on requirements.
Level 2: Corona
System Statistics
64 Nodes, 2x AMD Opteron 8-core 6128 (16
cores/node), 32 or 64GB RAM per node. 1 or 2TB
scratch space per node. Infiniband or GbE interconnect.
Hostname is corona.cc.lehigh.edu.
Scheduling
Job scheduling using PBS Torque/Maui is required for
all jobs.
Queues
Jobs may be submitted to bf, short, normal or p-ib
queues depending on requirements.
Corona: allocations
Corona makes use of system allocations to ensure equitable access
and provide resource guarantees to users. Users are granted 1 corona
node-year of priority access.
Priority boost for all jobs.
Run jobs longer (96 hours)
Run more jobs simultaneously.

Once exhausted, users may either buy additional priority time
($450/node-year) or opt to run in the freecycle queues.
Priority lower than any allocation job.
Reduced maximum walltime (72 hours)
Can run fewer jobs simultaneously (6 nodes @72hr each)
Jobs in freecycle queues are NOT preemptable by allocation jobs.
Filesystem Map

/home -- local user directories


/zhome/Apps -- NAS2 software installations
common to all systems.
/zhome/userid -- NAS2 user files, shared and available on all systems.
/Projects -- NAS1 user/group projects.
/usr/local -- local software installations.
/scratch -- local disk for temporary files.
PATH and LD_LIBRARY_PATH

PATH is a Linux environment variable used by the shell to determine
which directories to search for executable files when the user issues a
command at the prompt.
o PATH=/usr/local/bin:/usr/bin:/bin tells the shell to search
/usr/local/bin, then /usr/bin, and finally /bin when the user types a
command such as matlab. If it does not exist in one of these
directories, you will see "bash: matlab: command not found".
LD_LIBRARY_PATH is also an environment variable. It is used to
determine which directories should be searched for the libraries an
executable requires to run correctly.
To add a directory to either of these, you would issue a command like
PATH=/home/sma310/bin:$PATH; this tells the shell to look in my user
bin directory before searching the rest of the existing PATH.
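As a minimal sketch (the directories shown are illustrative), prepending
your own bin and lib directories to both variables looks like this:

# Look in my personal bin directory before the rest of the existing PATH
export PATH=/home/sma310/bin:$PATH
# Let executables find shared libraries installed under my personal lib directory
export LD_LIBRARY_PATH=/home/sma310/lib:$LD_LIBRARY_PATH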
Managing PATH and LD_LIBRARY_PATH by hand is complex and error prone!
Environment Modules
The Problem: Managing environment variables is complex.
o May need to change depending on workflow, compilers used.
o Need to know exactly where various binary and library directories are
located on the system.
o Potential for conflicts and dependencies: e.g., the PGI OpenMPI
toolchain requires the PGI compiler; prepending GCC will break it.
Environment modules handle setting up the user environment and
managing dependencies and conflicts.
o module avail -- list available modules.
o module list -- list loaded modules.
o module load name/version -- load the name/version toolchain, or give
an error if there is a conflict with another loaded module.
o module unload name/version -- unload the name/version module.
o module purge -- unload all modules.
Environment Modules
Go from managing PATH and LD_LIBRARY_PATH to:

module purge

module load openmpi/1.8/pgi/14.3

...Do some work.

module unload openmpi/1.8/pgi/14.3

module load openmpi/1.6.5/intel/14.0.1
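To confirm the switch took effect, you can list the loaded modules and
check which compiler wrapper the shell now finds (the exact output and
paths will vary by system):

module list    # should now show openmpi/1.6.5/intel/14.0.1
which mpicc    # the path should point into the Intel-built OpenMPI installation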


Queue Structure: Polaris

Queue      Priority   Preemptable   Max Walltime   Simultaneous Limit
smp        100,000    No            96 hours       384 core-hours
smp-test   500,000    No            1 hour         4 core-hours


polaris:/training/oct3014/hellow/job.pbs

Job submission file


# Specifies the name PBS gives to this job.
#PBS -N hello_world
# Use the smp-test queue for this job.
#PBS -q smp-test
# Request your resources.
#PBS -l nodes=1,ncpus=1,mem=2gb,walltime=12:00:00
# Send e-mail notification to this address
#PBS -M sma310@lehigh.edu
# E-mail when the job [b]egins, [e]nds, or [a]borts
#PBS -m ea
# Determine how many cores to run on
NPROCS=`wc -l < $PBS_NODEFILE`
echo "PBS job id : $PBS_JOBID"
echo "PBS_O_WORKDIR : $PBS_O_WORKDIR"
# Clean environment
module purge
# Load modules
module load openmpi/1.6.0/gcc/4.7.1
# Run the job
mpirun -np $NPROCS $PBS_O_WORKDIR/helloworld
# sleep so we can see the output of commands
sleep 300
Remember!
# This is a comment.
#PBS This is a PBS directive; lines beginning with #PBS are read by the scheduler.
polaris:/training/oct3014/hellow/job.pbs

Submitting a Job

Job Management
qsub file.submit: submit the job described in file.submit.
qstat -a: show jobs you have submitted to the queue.
qdel jobID: removes jobID from the queue.
checkjob/tracejob jobID: check the status of a submitted/running job or
show historical information (past 24 hours).
showstart jobID: show computed start time for job.
http://corona.cc.lehigh.edu/q : snapshots of the queue, updated every
15 minutes.
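A typical cycle using these commands might look like the following (the
job ID shown is illustrative):

$ qsub job.pbs
604123.corona1.cc.lehigh.edu
$ qstat -a           # confirm the job is queued or running
$ showstart 604123   # estimate when the job will start
$ checkjob 604123    # inspect scheduler details if it stays queued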
Queue Structure: Corona

Queue    Priority   Preemptable   Max Walltime   Other Limits
p-ib     600,000    No            96/72 hours    Min. 2 nodes / max 4 node-days (all users)
normal   600,000    No            96/72 hours
short    800,000    No            24/12 hours
bf       -800,000   Yes           96 hours
def-q    Attempts to route jobs to p-ib, normal, short, and bf, in that order.

Walltimes are shown as allocation/freecycle limits.

https://webapps.lehigh.edu/dokuwiki/sites/researchcomputing/doku.php?id=corona#queue_structure
polaris:/training/oct3014/interact/job.pbs

Interactive Sessions
No tasks beyond code compilation and editing are allowed on corona1.
Polaris is purely a submit host. If you need to compile for Maia, you
must submit an interactive job.
Interactive sessions are the required method for testing and
troubleshooting jobs. They allow you to test jobs step by step without
impacting other users.
When you're finished, type 'logout' to exit, or press Ctrl+D.

# Use the smp-test queue on Polaris
#PBS -q smp-test
# Request 1 core and 2GB RAM for 15 min
#PBS -l walltime=00:15:00,nodes=1,ncpus=1,mem=2gb
-----------------------------------------------------
# Use the short queue on Corona
#PBS -q short
# Request 1 corona node for 15 min
#PBS -l walltime=00:15:00,nodes=1:ppn=16
-----------------------------------------------------
Then, submit the job:
$ qsub -I job.pbs
qsub: waiting for job 604001.corona1.cc.lehigh.edu to start
qsub: job 604001.corona1.cc.lehigh.edu ready

Tue Apr 9 10:48:01 EDT 2013 : erasing contents of corona28:/scratch
Tue Apr 9 10:48:01 EDT 2013 : /scratch erased
[sma310@corona28 ~]$
corona:/training/oct3014/p-ib_bf/job.pbs

Backfill and Infiniband


Why Bother?
General use of the Infiniband nodes is limited to 4 node-days of
simultaneous use between all users.
o Relaxed limits in backfill mean you can run jobs on more nodes, and
for much longer, than in the p-ib queue.
o You aren't stuck waiting for another user's job to complete.
o Fewer jobs capable of preempting backfill jobs on Infiniband nodes
are submitted. This means preemption is much less likely than in
general bf.

# Specifies the name PBS gives to this job.
#PBS -N mpich2_hello_world
# Use the bf queue for this job.
#PBS -q bf
# Notice the request specifies IB nodes
#PBS -l nodes=4:ppn=16:ib,walltime=00:05:00
# Send e-mail notification to this address
#PBS -M sma310@lehigh.edu
# E-mail when the job [b]egins, [e]nds, or [a]borts
#PBS -m bea
# Determine how many cores to run on
NPROCS=`wc -l < $PBS_NODEFILE`
# Run the job
mpirun --hostfile $PBS_NODEFILE -np $NPROCS $PBS_O_WORKDIR/helloworld
Data Staging
User home directories are hosted on Corona1 and shared to all nodes.
Lots of I/O from multiple users can cause this to act as a major
bottleneck. There have been instances of user jobs spending more than
half their allocated CPU time waiting for I/O.
We can mitigate this issue by moving data from /home to /scratch on the
node where our job is running.
There is a catch; this only works for jobs using a single node.
Multi-node jobs which need shared access to data cannot use this
approach.

# Use the smp-test queue
#PBS -q smp-test
# Request 2 cores and 4GB RAM for 25 minutes
#PBS -l walltime=00:25:00,nodes=1,ncpus=2,mem=4gb
NPROCS=`wc -l < $PBS_NODEFILE`
# Define executable
executable=~/zhome/Apps/hellow2
# Move data to the node
cp -a ~/job/data /scratch/userid/data
cd /scratch/userid/data
# Run the job
$executable -data /scratch/userid/data
# Move results back to your home directory
cp -a /scratch/userid/data ~/job/output
Fairshare
Measure of system utilization (proc-hours) which is used to adjust job
priority.
Used to prevent a single user from monopolizing the system for a long
period of time.
14-day window with a decay policy. Total FS usage is determined from
usage over that time, compared to the per-user FS target, and used to
adjust priority.

Note: If there are no other jobs in the queue, FS won't help; it does
not prevent low-priority jobs from running. This also means a low
priority isn't always bad; it only hurts when there is resource
contention.
Using Maia/Corona
- Compile and submit jobs from the head node; don't run jobs on
corona1/polaris, they will be killed.
- Submit jobs early and often. If the resource is busy, Maui can't hold
a spot for you if you haven't submitted any jobs.
- Don't grossly overestimate walltimes, especially if you submit the
same type of job regularly.
- Remember to clean up /scratch, if you use it, before your job
completes. On Corona, this space is erased at the beginning of each new
job. On Maia, it's erased without warning when it begins to get close
to full (past 80%).
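For example, a job script that staged data to local scratch (paths are
illustrative, following the Data Staging example above) can finish by
copying results home and then removing its scratch directory:

# Copy results home, then remove what was staged to local scratch
cp -a /scratch/userid/data ~/job/output
rm -rf /scratch/userid/data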
Troubleshooting
What to do when your code/job fails to run:
Check the job output and error logs for clues. Googling the error
message can often put you on the right track.
Verify modules are loaded and run ldd on the executable to make sure it
finds all the libraries it requires to run (see the sketch after this
list).
o If a library is not found, verify the prerequisites are installed and
the modules for them are loaded, and check .bashrc for conflicting
modules.
Check your quota by running the myQuota command. If you're using
Polaris, you need to run df -h instead.
Create a test suite and submit a help ticket.
o The test suite should include: input files, directions to run the
code (submit script), and known good output (if available).
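A quick library check might look like this (the binary path is
illustrative; load the same modules the program was built with first):

module load openmpi/1.6.0/gcc/4.7.1
ldd ~/bin/helloworld
# Any "not found" line points at a missing or unloaded dependency.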
Requesting Help
The best way is to submit a ticket at:
http://www.lehigh.edu/help

Information to include:
o What you are trying to do, including software or
compiler names and versions.
o What system you're using as a platform.
o What you've tried to do, and the result.
o Any logs you generated while working.
o Test suite is always helpful!
http://www.lehigh.edu/go/rcwiki
