
[Original] Submitting MPI Jobs with Slurm

First, prepare an MPI program. Here helloworld.py is written in Python with the mpi4py library:

#!/usr/bin/env python
"""
Parallel Hello World
"""
from mpi4py import MPI
import sys
import time

size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()

sys.stdout.write("Hello, World! I am process %d of %d on %s.\n" % (rank, size, name))
time.sleep(300)

The Slurm batch script, helloworld.sh:

#!/bin/sh
#SBATCH -o /apps/mpi/myjob.out
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
mpirun python /apps/mpi/helloworld.py

Submit the MPI job:

$ sbatch helloworld.sh

Checking the MPI job

Check the job status:

$ squeue
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    40   control hellowor  jhadmin  R       3:06      2 centos6x[1-2]

Check the detailed job information:

$ scontrol show jobs
JobId=40 JobName=helloworld.sh
   UserId=jhadmin(500) GroupId=jhadmin(500) MCS_label=N/A
   Priority=4294901724 Nice=0 Account=(null) QOS=(null)
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:05:01 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2016-09-12T04:27:00 EligibleTime=2016-09-12T04:27:00
   StartTime=2016-09-12T04:27:00 EndTime=2016-09-12T04:32:01 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=control AllocNode:Sid=centos6x1:2239
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=centos6x[1-2]
   BatchHost=centos6x1
   NumNodes=2 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,node=2
   Socks/Node=* NtasksPerN:B:S:C=2:0:*:* CoreSpec=*
   MinCPUsNode=2 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/apps/mpi/helloworld.sh
   WorkDir=/apps/mpi
   StdErr=/apps/mpi/myjob.out
   StdIn=/dev/null
   StdOut=/apps/mpi/myjob.out
   Power=

MPI output:

$ cat /apps/mpi/myjob.out
srun: cluster configuration lacks support for cpu binding
Hello, World! I am process 0 of 4 on centos6x1.
Hello, World! I am process 1 of 4 on centos6x1.
Hello, World! I am process 2 of 4 on centos6x2.
Hello, World! I am process 3 of 4 on centos6x2.

Job process information

centos6x1:

pstree -apl 6290
slurmstepd,6290
  ├─slurm_script,6294 /tmp/slurmd/job00040/slurm_script
  │   └─mpirun,6295 python /apps/mpi/helloworld.py
  │       ├─python,6306 /apps/mpi/helloworld.py
  │       │   └─{python},6309
  │       ├─python,6307 /apps/mpi/helloworld.py
  │       │   └─{python},6308
  │       ├─srun,6297 --ntasks-per-node=1 --kill-on-bad-exit --cpu_bind=none --nodes=1 --nodelist=centos6x2 --ntasks=1 orted -mca orte_ess_jobid37944
  │       │   ├─srun,6300 --ntasks-per-node=1 --kill-on-bad-exit --cpu_bind=none --nodes=1 --nodelist=centos6x2 --ntasks=1 orted -mca orte_ess_jobid37944
  │       │   ├─{srun},6301
  │       │   ├─{srun},6302
  │       │   └─{srun},6303
  │       └─{mpirun},6296
  ├─{slurmstepd},6292
  └─{slurmstepd},6293

centos6x2:

pstree -apl 4655
slurmstepd,4655
  ├─orted,4660 -mca orte_ess_jobid 3794403328 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca orte_hnp_uri"3794403
  │   ├─python,4663 /apps/mpi/helloworld.py
  │   │   └─{python},4665
  │   └─python,4664 /apps/mpi/helloworld.py
  │       └─{python},4666
  ├─{slurmstepd},4657
  ├─{slurmstepd},4658
  └─{slurmstepd},4659

Another way to submit an MPI job is to allocate resources with salloc and launch mpiexec directly:

$ salloc -n 8 mpiexec python /apps/mpi/helloworld.py
...
Hello, World! I am process 1 of 8 on centos6x1.
Hello, World! I am process 0 of 8 on centos6x1.
Hello, World! I am process 3 of 8 on centos6x1.
Hello, World! I am process 2 of 8 on centos6x1.
Hello, World! I am process 4 of 8 on centos6x2.
Hello, World! I am process 6 of 8 on centos6x2.
Hello, World! I am process 7 of 8 on centos6x2.
Hello, World! I am process 5 of 8 on centos6x2.
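The salloc run spreads 8 ranks over the two nodes. The same placement can also be reported from inside the program with a collective instead of being read off stdout line by line. The following is a minimal mpi4py sketch, not part of the original post (the file name placement.py is hypothetical): rank 0 gathers every rank's hostname with comm.gather and prints the mapping in rank order.

#!/usr/bin/env python
"""Report rank-to-node placement on rank 0 (sketch)."""
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
name = MPI.Get_processor_name()

# Every rank contributes its (rank, hostname) pair; only rank 0 receives the list.
pairs = comm.gather((rank, name), root=0)

if rank == 0:
    for r, host in sorted(pairs):
        print("rank %d of %d runs on %s" % (r, size, host))

It can be launched the same way as helloworld.py, for example with salloc -n 8 mpiexec python placement.py.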
Job process information

centos6x1:

$ pstree -apl 8212
salloc,8212 -n 8 mpiexec python /apps/mpi/helloworld.py
  ├─mpiexec,8216 python /apps/mpi/helloworld.py
  │   ├─python,8227 /apps/mpi/helloworld.py
  │   │   └─{python},8231
  │   ├─python,8228 /apps/mpi/helloworld.py
  │   │   └─{python},8232
  │   ├─python,8229 /apps/mpi/helloworld.py
  │   │   └─{python},8233
  │   ├─python,8230 /apps/mpi/helloworld.py
  │   │   └─{python},8234
  │   ├─srun,8218 --ntasks-per-node=1 --kill-on-bad-exit --cpu_bind=none --nodes=1 --nodelist=centos6x2 --ntasks=1 orted -mca orte_ess_jobid36682
  │   │   ├─srun,8221 --ntasks-per-node=1 --kill-on-bad-exit --cpu_bind=none --nodes=1 --nodelist=centos6x2 --ntasks=1 orted -mca orte_ess_jobid36682
  │   │   ├─{srun},8222
  │   │   ├─{srun},8223
  │   │   └─{srun},8224
  │   └─{mpiexec},8217
  └─{salloc},8213
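In the tree above, the four ranks on centos6x1 run directly under mpiexec, while the remaining four run under the orted daemon that srun started remotely (see the centos6x2 tree below). Inside the program, ranks that share a node can be identified by splitting COMM_WORLD into per-node communicators. This is only a sketch, not part of the original post; it assumes an MPI-3 library underneath mpi4py (for MPI_Comm_split_type) and a hypothetical file name node_split.py.

#!/usr/bin/env python
"""Group ranks by node with a shared-memory communicator split (sketch)."""
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
name = MPI.Get_processor_name()

# Ranks that can share memory -- i.e. that live on the same node --
# end up in the same communicator.
node_comm = comm.Split_type(MPI.COMM_TYPE_SHARED)
local_rank = node_comm.Get_rank()
local_size = node_comm.Get_size()

print("global rank %d is local rank %d of %d on %s"
      % (rank, local_rank, local_size, name))

node_comm.Free()

Launched the same way as helloworld.py, the ranks on each node should report local ranks 0 through N-1 for that node.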

centos6x2:

$ pstree -apl 6356
slurmstepd,6356
  ├─orted,6369 -mca orte_ess_jobid 3668246528 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca orte_hnp_uri"3668246
  │   ├─python,6372 /apps/mpi/helloworld.py
  │   │   └─{python},6376
  │   ├─python,6373 /apps/mpi/helloworld.py
  │   │   └─{python},6378
  │   ├─python,6374 /apps/mpi/helloworld.py
  │   │   └─{python},6377
  │   └─python,6375 /apps/mpi/helloworld.py
  │       └─{python},6379
  ├─{slurmstepd},6366
  ├─{slurmstepd},6367
  └─{slurmstepd},6368
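helloworld.py only prints; it never exchanges messages between ranks. As a final check that the two-node allocation really communicates, here is a small ring-exchange sketch, not part of the original post (the file name ring.py is hypothetical): each rank passes its id one step around a ring with comm.sendrecv, which combines the send and the receive and so avoids the deadlock a naive blocking send/recv pairing could cause.

#!/usr/bin/env python
"""Pass each rank's id one step around a ring (sketch)."""
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
name = MPI.Get_processor_name()

right = (rank + 1) % size   # neighbour we send to
left = (rank - 1) % size    # neighbour we receive from

# Combined send/receive: ship our rank to the right, take the left neighbour's id.
received = comm.sendrecv(rank, dest=right, sendtag=0,
                         source=left, recvtag=0)

print("rank %d on %s received %d from rank %d" % (rank, name, received, left))

Submitted with the same sbatch script or salloc command, each rank should report the id of its left neighbour regardless of which node it landed on.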


