About the cluster job management systems Maui, PBS and Torque

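The Maui Cluster Scheduler, the predecessor of the Moab Cluster Suite, is an open-source job scheduler for clusters and supercomputers. PBS (Portable Batch System) is a job scheduler whose main task is to dispatch batch computing jobs onto the available compute resources. The following versions of PBS are currently available: OpenPBS, the original open-source version, which comes without technical support; PBS Pro (PBS Professional), a commercial version distributed and supported by Altair Engineering; and Torque, a derivative of OpenPBS developed, supported and maintained by Cluster Resources Inc.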

Thursday, June 18, 2009

Installing the Maui job management system

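In a cluster, the job management system is a very important component. A good one allocates compute resources fairly and sensibly and keeps them from being wasted.
In small clusters, Torque PBS is commonly used as the job management system. It ships with its own scheduler, pbs_sched, which schedules jobs on a first-in, first-out (FIFO) basis; for ordinary cluster management that is usually enough. But once a cluster has dozens of nodes divided into several queues, pbs_sched can no longer keep up.
To fill this gap, the free Maui scheduler can be used with Torque. Maui manages jobs across multiple queues and multiple users and lets administrators define a wide range of job queuing policies, which makes it a very good scheduler for small clusters.
Below is a brief installation guide. It assumes Torque PBS is already installed and working, and that Maui will replace pbs_sched. I hope it is useful to cluster administrators.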
1. Build and install Maui on the server node, where it will run as the scheduler.

$$ /home/tgz/torque/maui-3.2.6p21/configure --with-pbs=/usr/local
$$ make
$$ make install
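Before running configure, it is worth confirming that the directory passed to --with-pbs really is the Torque install prefix. The sketch below assumes Torque installed its client header pbs_ifl.h under <prefix>/include; check_torque_prefix and build_maui are hypothetical helper names, not commands shipped with Maui or Torque.

```sh
# Hypothetical helper: sanity-check a Torque install prefix before handing it
# to Maui's configure via --with-pbs=<prefix>.
# Assumption: Torque put its client header pbs_ifl.h under <prefix>/include.
check_torque_prefix() {
    local prefix=$1
    if [ -f "$prefix/include/pbs_ifl.h" ]; then
        return 0
    fi
    echo "no pbs_ifl.h under $prefix/include; is Torque installed there?" >&2
    return 1
}

# Hypothetical wrapper around the three commands above: it runs configure
# from inside the Maui source tree and stops at the first failing step.
build_maui() {
    local src=$1 pbs_prefix=$2
    check_torque_prefix "$pbs_prefix" || return 1
    (cd "$src" && ./configure --with-pbs="$pbs_prefix" && make && make install)
}
```

For the layout used in this post the call would be `build_maui /home/tgz/torque/maui-3.2.6p21 /usr/local`, run as root so that make install can write to /usr/local/maui.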

2. Install and edit the maui.d init script.

$$ cp /home/tgz/torque/maui-3.2.6p21/etc/maui.d /etc/init.d/
$$ vi /etc/init.d/maui.d

Set the installation prefix:
"
MAUI_PREFIX=/usr/local/maui
"
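The same edit can be scripted instead of done in vi. This is a minimal sketch, assuming the init script contains one line that starts with MAUI_PREFIX=; set_maui_prefix is a hypothetical name.

```sh
# Hypothetical helper: rewrite the MAUI_PREFIX= line of an init script.
# All other lines are left untouched.
set_maui_prefix() {
    local file=$1 prefix=$2 tmp rc
    tmp=$(mktemp) || return 1
    # '|' is the sed delimiter because the prefix itself contains slashes.
    # Copy back with cat rather than mv so the script keeps its owner and
    # execute permission (mktemp creates files with mode 0600).
    if sed "s|^MAUI_PREFIX=.*|MAUI_PREFIX=$prefix|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"; then
        rc=0
    else
        rc=1
    fi
    rm -f "$tmp"
    return $rc
}
```

Usage: `set_maui_prefix /etc/init.d/maui.d /usr/local/maui`.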

3. Configure Maui (parameter reference: http://www.clusterresources.com/products/maui/docs/a.fparameters.shtml).

$$ vi /usr/local/maui/maui.cfg

Edit it to read:
"
# maui.cfg 3.2.6p20
SERVERHOST server
# primary admin must be first in list
ADMIN1 root
# Resource Manager Definition
#RMCFG[SERVER] TYPE=PBS@RMNMHOST@
RMCFG[0] TYPE=PBS HOST=server
# Allocation Manager Definition
AMCFG[bank] TYPE=NONE
# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
RMPOLLINTERVAL 00:00:30
SERVERPORT 42559
SERVERMODE NORMAL
# Admin: http://supercluster.org/mauidocs/a.esecurity.html
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
QUEUETIMEWEIGHT 1
# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
#FSPOLICY PSDEDICATED
#FSDEPTH 7
#FSINTERVAL 86400
#FSDECAY 0.80
# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
# NONE SPECIFIED
# Backfill: http://supercluster.org/mauidocs/8.2backfill.html
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
#NODEALLOCATIONPOLICY MINRESOURCE
#NODEALLOCATIONPOLICY CPULOAD
NODEALLOCATIONPOLICY FIRSTAVAILABLE
# QOS: http://supercluster.org/mauidocs/7.3qos.html
# QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test] 17:00:00
# SRDAYS[test] MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test] 0:30:00
# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
# USERCFG[DEFAULT] FSTARGET=25.0
# USERCFG[john] PRIORITY=100 FSTARGET=10.0-
# GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch] FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR
DEFERTIME 00:05:00
JOBMAXOVERRUN 00:05:00
ENABLEMULTIREQJOBS TRUE
ENABLEMULTINODEJOBS TRUE
JOBNODEMATCHPOLICY EXACTNODE
NODEACCESSPOLICY SHARED
USERCFG[DEFAULT] MAXJOB=2 MAXNODE=3
"
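A misspelled parameter name in maui.cfg (for example ENABLEMUITINODEJOBS for ENABLEMULTINODEJOBS) is easy to miss. The sketch below lists active lines whose parameter name is not on an allowlist. The allowlist is an assumption: it holds only the parameters used in the configuration above, so it should be extended from the Maui parameter reference before being relied on. KNOWN_MAUI_PARAMS and unknown_maui_params are hypothetical names.

```sh
# Hypothetical allowlist: only the parameters that appear in the example
# maui.cfg above. Extend it from the Maui parameter reference.
KNOWN_MAUI_PARAMS="SERVERHOST ADMIN1 RMCFG AMCFG RMPOLLINTERVAL SERVERPORT
SERVERMODE LOGFILE LOGFILEMAXSIZE LOGLEVEL QUEUETIMEWEIGHT FSPOLICY FSDEPTH
FSINTERVAL FSDECAY BACKFILLPOLICY RESERVATIONPOLICY NODEALLOCATIONPOLICY
QOSCFG SRSTARTTIME SRENDTIME SRDAYS SRTASKCOUNT SRMAXTIME USERCFG GROUPCFG
CLASSCFG DEFERTIME JOBMAXOVERRUN ENABLEMULTIREQJOBS ENABLEMULTINODEJOBS
JOBNODEMATCHPOLICY NODEACCESSPOLICY"

# Print "line-number: NAME" for every active line whose parameter name is not
# in the allowlist. Comment lines and blank lines are skipped.
unknown_maui_params() {
    KNOWN="$KNOWN_MAUI_PARAMS" awk '
        BEGIN {
            n = split(ENVIRON["KNOWN"], names)  # splits on blanks and newlines
            for (i = 1; i <= n; i++) ok[names[i]] = 1
        }
        /^[ \t]*#/ { next }
        NF == 0    { next }
        {
            name = $1
            sub(/\[.*/, "", name)               # RMCFG[0] -> RMCFG
            if (!(name in ok)) print NR ": " name
        }' "$1"
}
```

Usage: `unknown_maui_params /usr/local/maui/maui.cfg` prints nothing when every active parameter is on the list.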

4. Start Maui daemon.

$$ /etc/init.d/maui.d start
$$ chkconfig --add maui.d
$$ chkconfig --level 3456 maui.d on
$$ chkconfig --list maui.d
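The output of chkconfig --list can be checked automatically to confirm that maui.d is switched on in the runlevels passed to --level. This sketch assumes the usual Red Hat chkconfig output format (service name followed by runlevel:on/off pairs); runlevels_on is a hypothetical name.

```sh
# Hypothetical helper: read one line of `chkconfig --list` output on stdin and
# print the runlevels in which the service is on, e.g. "3456".
# Assumed input format:
#   maui.d   0:off  1:off  2:off  3:on  4:on  5:on  6:on
runlevels_on() {
    awk '{
        out = ""
        for (i = 2; i <= NF; i++) {        # field 1 is the service name
            split($i, kv, ":")
            if (kv[2] == "on") out = out kv[1]
        }
        print out
    }'
}
```

Usage: `chkconfig --list maui.d | runlevels_on` should print 3456 after the commands above.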

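5. Configure Torque to work with Maui. The queue and server settings used here are shown below as printed by the qmgr command p s (print server):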
$$ qmgr
"
Max open servers: 4
Qmgr: p s
#
# Create queues and set their attributes.
#
#
# Create and define queue buyx
#
create queue buyx
set queue buyx queue_type = Execution
set queue buyx Priority = 20
set queue buyx max_running = 128
set queue buyx resources_max.cput = 14400:00:00
set queue buyx resources_max.nice = 20
set queue buyx resources_max.nodect = 128
set queue buyx resources_max.walltime = 14400:00:00
set queue buyx resources_min.nice = 16
set queue buyx resources_min.nodect = 1
set queue buyx resources_default.cput = 14400:00:00
set queue buyx resources_default.nice = 16
set queue buyx acl_group_enable = True
set queue buyx acl_groups = gauss
set queue buyx max_user_run = 5
set queue buyx enabled = True
set queue buyx started = True
#
# Set server attributes.
#
set server scheduling = True
set server max_user_run = 5
set server acl_host_enable = True
set server acl_hosts = server
set server managers = root@*.server
set server managers += root@server
set server operators = root@*.server
set server operators += root@server
set server default_queue = buyx
set server log_events = 511
set server mail_from = root
set server query_other_jobs = True
set server resources_default.cput = 14400:00:00
set server resources_default.neednodes = 1
set server resources_default.nodect = 1
set server resources_default.nodes = 1
set server scheduler_iteration = 10
set server node_check_rate = 150
set server tcp_timeout = 6
set server default_node = 1
set server mom_job_sync = True
set server keep_completed = 0
set server submit_hosts = server
set server next_job_number = 8
"
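The attribute that matters most for Maui is set server scheduling = True. One way to check it, as a sketch that reads the non-interactive qmgr -c "print server" output on stdin; qmgr_server_attr is a hypothetical name.

```sh
# Hypothetical helper: print the value of one server attribute from
# `qmgr -c "print server"` output on stdin. Only plain "=" assignments are
# matched; "+=" lines (e.g. extra managers) are ignored.
qmgr_server_attr() {
    awk -v attr="$1" '
        $1 == "set" && $2 == "server" && $3 == attr && $4 == "=" {
            sub(/^[^=]*=[ \t]*/, "")   # drop everything up to "= "
            print
        }'
}
```

Usage: `qmgr -c "print server" | qmgr_server_attr scheduling` should print True on the configuration above.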

This article comes from the Molecular Simulation Forum, http://www.mdbbs.org/; original post: http://www.mdbbs.org/thread-13974-1-1.html
