Distributed & Parallel Computing Cluster: Patrick Mcguigan

Distributed & Parallel Computing Cluster
Patrick McGuigan
mcguigan@cse.uta.edu
2/12/04
DPCC Background

NSF funded Major Research Instrumentation (MRI) grant Goals Personnel

PI Co-PIs Senior Personnel Systems Administrator

2/12/04
DPCC Goals

Establish a regional Distributed and Parallel Computing Cluster at UTA (DPCC@UTA) An inter-departmental and inter-institutional facility Facilitate collaborative research that requires large scale storage (tera to peta bytes), high speed access (gigabit or more) and mega processing (100s of processors)
2/12/04
DPCC Research Areas

Data mining / KDD

Association rules, graph-mining, Stream processing etc. Simulation, moving towards a regional D Center
High Energy Physics Dermatology/skin cancer
Image database, lesion detection and monitoring

Grid computing, PICO Non-intrusive network performance evaluation Formal specification and verification Video streaming, scene analysis
Distributed computing Networking Software Engineering Multimedia Facilitate collaborative efforts that need high-performance computing
2/12/04
DPCC Personnel

PI Dr. Chakravarthy CO-PIs
Drs. Aslandogan, Das, Holder, Yu
Senior Personnel Paul Bergstresser, Kaushik De, Farhad Kamangar, David Kung, Mohan Kumar, David Levine, Jung-Hwan Oh, Gregely Zaruba Systems Administrator
Patrick McGuigan
2/12/04
DPCC Components
Establish a distributed memory cluster (150+ processors) Establish a Symmetric or shared multiprocessor system Establish a large shareable high speed storage (100s of Terabytes)
2/12/04
DPCC Cluster as of 2/1/2004

Located in 101 GACB Inauguration 2/23 as part of E-Week 5 racks of equipment + UPS
2/12/04
2/12/04
2/12/04
2/12/04
Photos (scaled for presentation)
2/12/04
DPCC Resources
97 machines

81 worker nodes 2 interactive nodes 10 IDE based RAID servers 4 nodes support Fibre Channel SAN 4.5 TB in each IDE RAID 5.2 TB in FC SAN
2/12/04
50+ TB storage

DPCC Resources (continued)
1 Gb/s network interconnections

core switch satellite switches
1 Gb/s SAN network UPS
2/12/04
DPCC Layout
node1 node2 node3 node4 node5 node6 node7 node8 node9 node10 node11 node12 node13 node14 Node15 node16 node17 node18 node19 node20 node21 node22 100 Mbs switch node23 node24 node25 node26 node27 node28 node29 node30 node31 node32 ArcusII 5.2 TB Brocade 3200 gfsnode3 gfsnode2 lockserver gfsnode1 node53 node54 node55 node56 node57 node58 node59 node60 node61 node62 raid4 IDE 6TB node63 node64 node65 node66 node67 node68 node69 node70 node71 node72 raid5 IDE 6TB raid7 IDE 6TB node73 node74 node75 node76 node77 node78 node79 node80 node81
Open Open Open Open
Campus
Campus
Linksys 3512
Linksys 3512
master.dpcc.uta.edu
grid.dpcc.uta.edu
raid3
node33 node34 node35
node43 node44 node45 node46 node47 node48 node49 Node50 node51 node52 6TB raid2 IDE 6TB
raid6 Foundry FastIron 800 raid8
node36 node37 node38 node39 node40
raid9
node41 node42
raid10
raid1
IDE
Linksys 3512
Linksys 3512
Linksys 3512
2/12/04
DPCC Resource Details
Worker nodes
Dual Xeon processors
32 machines @ 2.4GHz 49 machines @ 2.6GHz
2 GB RAM IDE Storage
32 machines @ 60 GB 49 machines @ 80 GB
Redhat 7.3 Linux (2.4.20 kernel)
2/12/04
DPCC Resource Details (cont.)
Raid Server

Dual Xeon processors (2.4 GHz) 2 GB RAM 4 Raid Controllers

2 port controller (qty 1) Mirrored OS disks 8 port controller (qty 3) RAID5 with hot spare
24 250GB disks 2 40GB disk NFS used to support worker nodes

2/12/04
DPCC Resource Details (cont.)
FC SAN

RAID5 Array 42 142GB FC disks FC Switch 3 GFS nodes

Dual Xeon (2.4 Ghz) 2 GB RAM Global File System (GFS) Serve to cluster via NFS
1 GFS Lockserver
2/12/04
Using DPCC
Two nodes available for interactive use

master.dpcc.uta.edu grid.dpcc.uta.edu
More nodes are likely to support other services (Web, DB access) Access through SSH (version 2 client)

Freeware Windows clients are available (ssh.com) File transfers through SCP/SFTP
2/12/04
Using DPCC (continued)

User quotas not implemented on home directory yet. Be sensible in your usage. Large data sets will be stored on RAIDs (requires coordination with sys admin) All storage visible to all nodes.
2/12/04
Getting Accounts

Have your supervisor request account Account will be created Bring ID to 101 GACB to receive password
Keep password safe master.dpcc.uta.edu grid.dpcc.uta.edu
Login to any interactive machine

USE yppasswd command to change password If you forget your password
See me in my office, I will reset your password (with ID) Call or e-mail me, I will reset your password to the original password
2/12/04
User environment
Default shell is bash

Change with ypchsh Customize user environment using startup files

.bash_profile (login session) .bashrc (non-login) export <variable>=<value> source <shell file>
Customize with statements like:
Much more information in man page
2/12/04
Program development tools
GCC 2.96

C C++ Java (gcj) Objective C Chill JDK Sun J2SDK 1.4.2

2/12/04
Java
Development tools (cont.)
Python

python = version 1.5.2 python2 = version 2.2.2 Version 5.6.1
Perl
Flex, Bison, gdb If your favorite tool is not available, well consider adding it!
2/12/04
Batch Queue System
OpenPBS

Server runs on master pbs_mom runs on worker nodes Scheduler runs on master Jobs can be submitted from any interactive node User commands

qsub submit a job for execution qstat determine status of job, queue, server qdel delete a job from the queue qalter modify attributes of a job
Single queue (workq)

2/12/04
PBS qsub
qsub used to submit jobs to PBS

A job is represented by a shell script Shell script can alter environment and proceed with execution Script may contain embedded PBS directives Script is responsible for starting parallel jobs (not PBS)
2/12/04
Hello World
[mcguigan@master pbs_examples]$ cat helloworld echo Hello World from $HOSTNAME [mcguigan@master pbs_examples]$ qsub helloworld 15795.master.cluster [mcguigan@master pbs_examples]$ ls helloworld helloworld.e15795 helloworld.o15795 [mcguigan@master pbs_examples]$ more helloworld.o15795 Hello World from node1.cluster
2/12/04
Hello World (continued)

Job ID is returned from qsub Default attributes allow job to run

1 Node 1 CPU 36:00:00 CPU time
standard out and standard error streams are returned
2/12/04
Hello World (continued)
Environment of job

Defaults to login shell (overide with #!) or S switch Login environment variable list with PBS additions:
PBS_O_HOST PBS_O_QUEUE PBS_O_WORKDIR PBS_ENVIRONMENT PBS_JOBID PBS_JOBNAME PBS_NODEFILE PBS_QUEUE
Additional environment variables may be transferred using -v switch

2/12/04
PBS Environment Variables

PBS_ENVIRONMENT PBS_JOBCOOKIE PBS_JOBID PBS_JOBNAME PBS_MOMPORT PBS_NODENUM PBS_O_HOME PBS_O_HOST PBS_O_LANG PBS_O_LOGNAME PBS_O_MAIL PBS_O_PATH PBS_O_QUEUE PBS_O_SHELL PBS_O_WORKDIR PBS_QUEUE=workq PBS_TASKNUM=1
2/12/04
qsub options
Output streams:

-e (error output path) -o (standard output path) -j (join error + output as either output or error) -m [aben] when to mail (abort, begin, end, none) -M who to mail -N (15 printable characters MAX first is alphabetical) -q [name] Unimportant for now -v pass specific variables -V pass all environment variables of qsub to job
Mail options

Name of job

Which queue to submit job to Environment variables Additional attributes -w specify dependencies
2/12/04
Qsub options (continued)
-l switch used to specify needed resources
Number of nodes
nodes = x ncpus = x cput=hh:mm:ss walltime=hh:mm:ss
Number of processors
CPU time
Walltime
See man page for pbs_resources

2/12/04
Hello World

qsub l nodes=1 l ncpus=1 l cput=36:00:00 N helloworld m a q workq helloworld Options can be included in script: #PBS -l nodes=1 #PBS -l ncpus=1 #PBS -m a #PBS -N helloworld2 #PBS -l cput=36:00:00 echo Hello World from $HOSTNAME
2/12/04
qstat

Used to determine status of jobs, queues, server $ qstat $ qstat <job id> Switches

-u <user> list jobs of user -f provides extra output -n provides nodes given to job -q status of the queue -i show idle jobs
2/12/04
qdel & qalter

qdel used to remove a job from a queue
qdel <job ID>
qalter used to alter attributes of currently queued job
qalter <job id> attributes (similar to qsub)
2/12/04
Processing on a worker node
All RAID storage visible to all nodes

/dataxy where x is raid ID, y is Volume (1-3) /gfsx where x is gfs volume (1-3) /scratch
Local storage on each worker node
Data intensive applications should copy input data (when possible) to /scratch for manipulation and copy results back to raid storage
2/12/04
Parallel Processing
MPI installed on interactive + worker nodes

MPICH 1.2.5 Path: /usr/local/mpich-1.2.5 -l nodes=x -l ncpus=2x
Asking for multiple processors

2/12/04
Parallel Processing (continued)

PBS node file created when job executes Available to job via $PBS_NODEFILE Used to start processes on remote nodes

mpirun rsh
2/12/04
Using node file (example job)

#!/bin/sh #PBS -m n #PBS -l nodes=3:ppn=2 #PBS -l walltime=00:30:00 #PBS -j oe #PBS -o helloworld.out #PBS -N helloword_mpi NN=`cat $PBS_NODEFILE | wc -l` echo "Processors received = "$NN echo "script running on host `hostname`" cd $PBS_O_WORKDIR echo echo "PBS NODE FILE" cat $PBS_NODEFILE echo /usr/local/mpich-1.2.5/bin/mpirun -machinefile $PBS_NODEFILE -np $NN ~/mpi-example/helloworld
2/12/04
MPI

Shared Memory vs. Message Passing MPI
C based library to allow programs to communicate Each cooperating execution is running the same program image Different images can do different computations based on notion of rank MPI primitives allow for construction of more sophisticated synchronization mechanisms (barrier, mutex)
2/12/04
helloworld.c
#include #include #include #include <stdio.h> <unistd.h> <string.h> "mpi.h"
int main( argc, argv ) int argc; char **argv; { int rank, size; char host[256]; int val; val = gethostname(host,255); if ( val != 0 ){ strcpy(host,"UNKNOWN"); } MPI_Init( &argc, &argv ); MPI_Comm_size( MPI_COMM_WORLD, &size ); MPI_Comm_rank( MPI_COMM_WORLD, &rank ); printf( "Hello world from node %s: process %d of %d\n", host, rank, size ); MPI_Finalize(); 2/12/04 return 0;
Using MPI programs

Compiling
$ /usr/local/mpich-1.2.5/bin/mpicc helloworld.c $ /usr/local/mpich-1.2.5/bin/mpirun <options> \ helloworld Common options:
Executing

-np number of processes to create -machinefile list of nodes to run on
2/12/04
Resources for MPI

http://www-hep.uta.edu/~mcguigan/dpcc/mpi
MPI documentation
http://www-unix.mcs.anl.gov/mpi/indexold.html
Links to various tutorials
Parallel programming course
2/12/04

Distributed & Parallel Computing Cluster: Patrick Mcguigan

Încărcat de

Informații document

Descriere originală:

Titlu original

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

Distributed & Parallel Computing Cluster: Patrick Mcguigan

Încărcat de

Drepturi de autor:

Formate disponibile

Distributed & Parallel Computing Cluster

NSF funded Major Research Instrumentation (MRI) grant Goals Personnel

PI Co-PIs Senior Personnel Systems Administrator

DPCC Research Areas

Data mining / KDD

High Energy Physics Dermatology/skin cancer

Image database, lesion detection and monitoring

PI Dr. Chakravarthy CO-PIs

Drs. Aslandogan, Das, Holder, Yu

DPCC Cluster as of 2/1/2004

Photos (scaled for presentation)

DPCC Resources (continued)

1 Gb/s network interconnections

core switch satellite switches

1 Gb/s SAN network UPS

node33 node34 node35

raid6 Foundry FastIron 800 raid8

node36 node37 node38 node39 node40

DPCC Resource Details

Dual Xeon processors

32 machines @ 2.4GHz 49 machines @ 2.6GHz

2 GB RAM IDE Storage

Redhat 7.3 Linux (2.4.20 kernel)

DPCC Resource Details (cont.)

Dual Xeon processors (2.4 GHz) 2 GB RAM 4 Raid Controllers

24 250GB disks 2 40GB disk NFS used to support worker nodes

DPCC Resource Details (cont.)

RAID5 Array 42 142GB FC disks FC Switch 3 GFS nodes

Two nodes available for interactive use

Using DPCC (continued)

Keep password safe master.dpcc.uta.edu grid.dpcc.uta.edu

Login to any interactive machine

USE yppasswd command to change password If you forget your password

Default shell is bash

Change with ypchsh Customize user environment using startup files

Customize with statements like:

Much more information in man page

Program development tools

C C++ Java (gcj) Objective C Chill JDK Sun J2SDK 1.4.2

Development tools (cont.)

python = version 1.5.2 python2 = version 2.2.2 Version 5.6.1

Batch Queue System

Single queue (workq)

qsub used to submit jobs to PBS

Hello World (continued)

Job ID is returned from qsub Default attributes allow job to run

1 Node 1 CPU 36:00:00 CPU time

standard out and standard error streams are returned

Hello World (continued)

PBS_O_HOST PBS_O_QUEUE PBS_O_WORKDIR PBS_ENVIRONMENT PBS_JOBID PBS_JOBNAME PBS_NODEFILE PBS_QUEUE

Additional environment variables may be transferred using -v switch

PBS Environment Variables

Qsub options (continued)

-l switch used to specify needed resources

nodes = x ncpus = x cput=hh:mm:ss walltime=hh:mm:ss

See man page for pbs_resources

qdel & qalter

qdel used to remove a job from a queue

qdel <job ID>

qalter used to alter attributes of currently queued job

qalter <job id> attributes (similar to qsub)

Processing on a worker node