
24 August 2013

501. Briefly: Adding a new node to SGE

I've done this a couple of times by now, and I always forget one step or another. Most of the information is on http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html but here it is in a briefer form:

In the example I've used krypton as the node name, and 192.168.1.180 as the IP.
My front node is called beryllium and has an IP of 192.168.1.1.

0. On the front node
Add the new node name to the front node/queue master

Add execution host
qconf -ae 

which opens a text file in vim

Edit the hostname (krypton) but nothing else. Saving returns
added host krypton to exec host list

Add krypton as a submit host
qconf -as krypton
krypton added to submit host list

Doing this before touching the node makes life a little bit easier.
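
If you'd rather skip the editor, qconf can also read the host definition from a file with -Ae. Here is a minimal sketch of that route. The helper name write_exec_host_file is made up, and the field list is my assumption of what the qconf -ae template looks like, so compare it with what your own qconf -ae shows before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: write an execution host definition for `qconf -Ae`.
# ASSUMPTION: the field list mirrors the template that `qconf -ae` opens;
# check it against your SGE version and adjust if it differs.
write_exec_host_file() {
    node="$1"   # node name, e.g. krypton
    out="$2"    # file to write the definition to
    cat > "$out" <<EOF
hostname              $node
load_scaling          NONE
complex_values        NONE
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE
EOF
}
```

On the front node this would be used roughly as
write_exec_host_file krypton /tmp/krypton.conf
qconf -Ae /tmp/krypton.conf
qconf -as krypton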

1. Edit /etc/hosts on the node
Leave
127.0.0.1 localhost
but remove
127.0.1.1 krypton
and make sure that it says
192.168.1.180 krypton
instead.

Throw in
192.168.1.1 beryllium
as well.
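
The same /etc/hosts edits can be scripted. This is only a sketch: fix_hosts is a made-up helper, and it assumes the host name is the second field on each line (aliases further along the line are not handled). Back up /etc/hosts before trying it.

```shell
#!/bin/sh
# Hypothetical helper: apply the step 1 edits to a hosts file.
# Arguments: file, node name, node IP, master name, master IP.
fix_hosts() {
    file="$1"; node="$2"; node_ip="$3"; master="$4"; master_ip="$5"
    tmp=$(mktemp) || return 1
    # Drop every line whose second field is the node or the master name
    # (this covers the "127.0.1.1 krypton" line); keep everything else,
    # including "127.0.0.1 localhost".
    awk -v n="$node" -v m="$master" '$2 == n || $2 == m { next } { print }' \
        "$file" > "$tmp"
    # Append the entries we actually want.
    printf '%s %s\n' "$node_ip" "$node"     >> "$tmp"
    printf '%s %s\n' "$master_ip" "$master" >> "$tmp"
    cat "$tmp" > "$file"   # overwrite in place, keeping owner and mode
    rm -f "$tmp"
}
```

Run as root on the node, that would be
fix_hosts /etc/hosts krypton 192.168.1.180 beryllium 192.168.1.1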

2. Install SGE on node
sudo apt-get install gridengine-exec gridengine-client

You'll be asked about
Configure automatically: yes
Cell name: rupert
Master hostname: beryllium

3. Add node to queue and group
I maintain separate queues and groups depending on how many cores each node has. See e.g. http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html for how to create queues and groups.

If they already exist, just do

qconf -aattr hostgroup hostlist krypton @fourcores
qconf -aattr queue slots "[krypton=4]" fourcores.q

to add the new node.
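
Since I repeat this for every new node, here is a small sketch that prints the two commands for a given node, core count, group and queue. sge_add_node_cmds is a made-up name, and it only prints the commands; nothing is run until you pipe the output to sh on the front node.

```shell
#!/bin/sh
# Hypothetical helper: print the step 3 commands for a given node.
# Arguments: node name, number of slots, host group, queue.
# Nothing is executed here; review the output, then pipe it to sh.
sge_add_node_cmds() {
    node="$1"; slots="$2"; group="$3"; queue="$4"
    printf 'qconf -aattr hostgroup hostlist %s %s\n' "$node" "$group"
    printf 'qconf -aattr queue slots "[%s=%s]" %s\n' "$node" "$slots" "$queue"
}
```

For example
sge_add_node_cmds krypton 4 @fourcores fourcores.q
prints the two lines shown above. Afterwards, qconf -shgrp @fourcores and qconf -sq fourcores.q should show the new node.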

4. Add pe to queue if necessary
Since I have different queues depending on the number of cores of a node, I tend to have to fiddle with this.

See e.g. http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html for how to create parallel environments (pe).

If the pe you need is already created, you can do
qconf -mq fourcores.q

and edit pe_list
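
To see whether a pe is already attached before opening the editor, something like the sketch below should work. queue_has_pe is a made-up helper that reads qconf -sq output on stdin; it assumes pe_list is a plain space-separated list on a single line, so wrapped lines and per-host overrides in square brackets are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read `qconf -sq <queue>` output on stdin and
# succeed (exit 0) if the named pe appears on the pe_list line.
# ASSUMPTION: pe_list is one line of space-separated names.
queue_has_pe() {
    awk -v pe="$1" '
        $1 == "pe_list" {
            for (i = 2; i <= NF; i++) if ($i == pe) found = 1
        }
        END { exit !found }
    '
}
```

With mype as a placeholder for whatever your pe is called:
qconf -sq fourcores.q | queue_has_pe mype || qconf -aattr queue pe_list mype fourcores.q
The qconf -aattr form adds the pe to the list without going through qconf -mq.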

5. Check
On the front node, do
qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
beryllium               lx26-amd64      3  0.16    7.8G    5.3G   14.9G  398.2M
boron                   lx26-amd64      6  6.02    7.6G    1.6G   14.9G     0.0
helium                  lx26-amd64      2     -    2.0G       -    1.9G       -
lithium                 lx26-amd64      3     -    3.9G       -     0.0       -
neon                    lx26-amd64      8  8.01   31.4G    1.3G   59.6G     0.0
krypton                 lx26-amd64      4  4.01   15.6G    2.8G   14.9G     0.0
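
A node that is registered but not reporting shows a - in the LOAD column, which in my experience usually means the machine is off or its execd isn't talking to the master. A minimal sketch for picking those out of the qhost output follows; hosts_without_load is a made-up helper and it assumes the column layout shown above, with LOAD as the fourth field.

```shell
#!/bin/sh
# Hypothetical helper: read `qhost` output on stdin and print the hosts
# whose LOAD column is "-". Skips the two header lines and the "global" row.
# ASSUMPTION: LOAD is the fourth whitespace-separated field.
hosts_without_load() {
    awk 'NR > 2 && $1 != "global" && $4 == "-" { print $1 }'
}
```

Use it as
qhost | hosts_without_load
For the listing above that would print helium and lithium. If the new node shows up there, check step 1 and step 2 on the node again.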