What has changed?
The Linux servers in the Statistics department were historically managed as independent units, which made it difficult to balance load across them: some nodes ended up running a large number of processes while others sat idle. To help address this issue, we implemented a queuing system.
What is SLURM?
Our cluster consists of many compute nodes and, at the same time, has many users submitting many jobs. We therefore need a mechanism to distribute jobs across the nodes in a reasonable fashion, and SLURM is the one we are using now. SLURM (Simple Linux Utility for Resource Management) is a highly configurable, open-source workload and resource manager designed for Linux clusters of all sizes. Its key features are:
- extensive scheduling options including advanced reservations,
- suspend/resume for supporting binaries,
- scheduler backfill,
- fair-share scheduling, and
- preemptive scheduling for critical jobs.
SLURM provides similar functionality to Torque, but some of the commands differ between the two. For example, to see a list of all jobs on the cluster under Moab/Torque, one would issue the qstat command, whereas the SLURM equivalent is the squeue command.
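The most common translations can be sketched as a small lookup: the command names below are standard Torque and SLURM client commands, and the helper function is just an illustration, not part of either tool.

```shell
# Map a Torque/Moab command name to its SLURM equivalent (illustrative helper).
slurm_equiv() {
    case "$1" in
        qstat) echo "squeue"  ;;  # list jobs on the cluster
        qsub)  echo "sbatch"  ;;  # submit a batch script
        qdel)  echo "scancel" ;;  # cancel a job
        *)     echo "unknown" ;;
    esac
}

# Print the mapping for the three everyday commands:
for torque_cmd in qstat qsub qdel; do
    printf '%-6s -> %s\n' "$torque_cmd" "$(slurm_equiv "$torque_cmd")"
done
```

The full Torque-to-SLURM translation table is in the Rosetta cheat sheet linked later on this page.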
How to access the SLURM queue for Stat
Log in to the machine pronto.las.iastate.edu from any SSH client. Here is a tutorial for SSH Terminal Access
How to submit jobs?
A basic job submission workflow can be found at http://www.brightcomputing.com/Blog/bid/174099/Slurm-101-Basic-Slurm-Usage-for-Linux-Clusters.
You can submit your jobs in either PBS or SLURM format to queue a job for execution via SLURM. You can find some example submission scripts on Research IT's SLURM basics page.
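As a sketch, a minimal SLURM submission script looks like the following; the resource values and the myscript.R program name are placeholders, and the Research IT page linked above has site-specific examples.

```shell
#!/bin/bash
#SBATCH --job-name=myjob        # name shown by squeue
#SBATCH --time=01:00:00         # walltime limit (hh:mm:ss)
#SBATCH --ntasks=1              # number of tasks
#SBATCH --cpus-per-task=1       # CPU cores per task
#SBATCH --mem=2G                # memory for the job

# SLURM reads the #SBATCH comment lines above as directives;
# everything below runs on the allocated compute node.
cd "$SLURM_SUBMIT_DIR"          # directory sbatch was invoked from
Rscript myscript.R              # placeholder for your own program
```

Save this as, for example, myjob.sh and submit it with `sbatch myjob.sh`.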
Some useful commands
- srun myscript: submits the job. Jobs are scheduled onto queues on the basis of the resources they request.
- squeue: shows the status of all queues and the current queue structure.
A helpful comparison cheat sheet is available at http://www.schedmd.com/slurmdocs/rosetta.pdf.
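The comment block below sketches typical day-to-day usage (the job ID and script name are placeholders); the loop that follows simply reports whether each SLURM client command is present on the machine you are on, since these commands only exist on the cluster itself.

```shell
# Typical usage on pronto (<jobid> and myjob.sh are placeholders):
#   sbatch myjob.sh      # submit a batch script
#   squeue               # show all jobs and queues
#   squeue -u "$USER"    # show only your own jobs
#   scancel <jobid>      # cancel a job
# Report which SLURM client commands exist on the current machine:
for cmd in srun sbatch squeue scancel; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd: available"
    else
        echo "$cmd: not found (log in to pronto.las.iastate.edu)"
    fi
done
```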
The queue structure was designed to be as flexible as possible given the available hardware and the types of jobs typically run in the department. It should allow small jobs to get through the queue quickly, without waiting behind larger jobs, while also giving large jobs of unknown length the flexibility to run without being interrupted by an arbitrary queue walltime limit.
File Storage on STAT cluster
Each user has a home directory, i.e. /home/<username>, which is mounted from a central storage server called 'shome'. This server holds the common home directories for the STAT cluster, and your home directory can be accessed by all of the compute nodes in the cluster (linux*, impact*, thirteen*, etc.). Your home directory is where you should put your job submission scripts, programs, and input/output files. This directory is not backed up: when your job is done running, you should copy the results you want to save to your work folder or another location that is secure and backed up.
Within your home folder, there is a folder called 'work'. This folder lives on the EMC Isilon (aka MyFiles), within the STAT department folder; it is backed up and served by redundant servers. You should not run jobs from this folder, or reference files in it as input or output to your jobs. Instead, use it to store your raw data and final results. You can copy files between this folder and your home directory while on smaster, before submitting your job.
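The copy-in, run, copy-out pattern described above can be sketched as follows. A temporary directory stands in for your real home directory, and all file and project names are made up for illustration.

```shell
# A scratch directory stands in for /home/<username> in this sketch.
home=$(mktemp -d)
mkdir -p "$home/work/project1"                # 'work' lives on backed-up Isilon storage
echo "x,y" > "$home/work/project1/input.csv"  # pretend raw data

# 1. Stage raw data from the backed-up work folder into the (unbacked-up) home area:
mkdir -p "$home/project1"
cp "$home/work/project1/input.csv" "$home/project1/"

# 2. Run your job against the copy in the home area (your sbatch script goes here).
echo "done" > "$home/project1/results.txt"    # pretend job output

# 3. Copy the results you want to keep back to the backed-up work folder:
cp "$home/project1/results.txt" "$home/work/project1/"
```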
Available Software Modules
To find out which software modules are available on the STAT cluster, you can use the command "module spider" to find the right module. More detail on how to use the "module" command can be found at https://researchit.las.iastate.edu/research-it-software-archive and https://researchit.las.iastate.edu/spack-based-software-modules.