Archive for ‘cluster’ Category

I ran into this a couple of weeks ago and it’s been driving me bonkers. I finally figured out what’s wrong. I was just trying to get my feet wet with the Sun Grid Engine, so I figured I’d follow their instruction page, try out the example shell script, and submit it with the “qsub” command. I was doing this on the frontend machine, which had been configured properly as a ROCKS cluster frontend. It was not working, and the error I kept getting was: “Unable to run job: denied: host ‘name_of_computer’ is no submit host. Exiting.”

After googling around for a couple of days I found the answer (at least the answer in my case). Issuing the following command on the frontend solved my problem:

qconf -as frontend-name

Apparently the SGE roll does not set up the frontend node as a “submit host” during install. After running the above command everything seems to work properly: now I can run “qstat -f” and “qsub” from the frontend.
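In case it helps anyone else, here’s a quick way to double-check the fix (frontend-name and example.sh are placeholders; qconf needs to be run as an SGE admin user, typically root on the qmaster):

qconf -ss                 # list current submit hosts; the frontend will be missing
qconf -as frontend-name   # add the frontend as a submit host
qconf -ss                 # the frontend should now show up in the list
qsub example.sh           # re-submit the example script; it should be accepted now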

Notes on installing Rocks Cluster Software…..

Posted on 16:37, January 27th, 2009 by Many Ayromlou

– Front-end machine is a Dell 2950 with 2 x 1 GigE Broadcom ports onboard and one myricom 10GigE card.
– Broadcom port #2 is disabled in BIOS
– Broadcom port #1 is enabled and configured for external network (Internet)
– The Myricom 10 GigE card is hooked into a Foundry 8x10GigE switch that uplinks into our private Class B network. We own a portion of this network, aa.bb.cc.130-190; the netmask is 255.255.255.192. The Foundry also downlinks to a Force10 48-port GigE switch (with an optional 10GigE port). This is where the cluster compute nodes are connected at 1 GigE (soon to be bonded 4GigE).
– The Rocks install on the front-end machine likes to bring up the Broadcom #1 port (external) as eth0, and the Myricom driver is not installed by default. So when prompted during install I configure the IP addresses as per normal (eth0 for the inside/private network and eth1 for the external one). This will keep the config files sane!
– Then I have to install the myri10ge device driver from Myricom’s site.
– Now eth0 and eth1 are backwards: Rocks wants eth0 to be private and eth1 to be public. To swap them we have to tell the kernel to rename the devices via udev rules. Edit /etc/udev/rules.d/11-local.rules and insert the following line:
KERNEL=="eth*",SYSFS{address}=="00:60:dd:47:75:a6",NAME="eth0"
This will force the NIC with MAC address 00:60:dd:47:75:a6 to come up as eth0. We also have to change the ifcfg-eth0 and ifcfg-eth1 files in /etc/sysconfig/network-scripts to make sure the right IP goes with the right interface/MAC address.
– Lastly, we have to add “modprobe myri10ge” and “route add -net aa.bb.cc.0/26 gw aa.bb.cc.129 dev eth0” to /etc/rc.d/rc.local to shoehorn in the driver and the static route (a consolidated sketch of these tweaks follows below).

This should bring up a sane frontend machine.
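For reference, here’s a rough sketch of how those frontend tweaks hang together. The second udev rule uses a placeholder MAC for the Broadcom port, and the ifcfg file is just the typical RHEL-style shape filled in with the private-side values from these notes (frontend at aa.bb.cc.130, netmask 255.255.255.192), so treat it as a guide rather than a drop-in config:

# /etc/udev/rules.d/11-local.rules : pin interface names to MAC addresses
KERNEL=="eth*",SYSFS{address}=="00:60:dd:47:75:a6",NAME="eth0"
KERNEL=="eth*",SYSFS{address}=="xx:xx:xx:xx:xx:xx",NAME="eth1"   # placeholder Broadcom MAC

# /etc/sysconfig/network-scripts/ifcfg-eth0 : private (cluster) side
DEVICE=eth0
HWADDR=00:60:dd:47:75:a6
IPADDR=aa.bb.cc.130
NETMASK=255.255.255.192
ONBOOT=yes

# /etc/rc.d/rc.local : load the Myricom driver and add the static route by hand
modprobe myri10ge
route add -net aa.bb.cc.0/26 gw aa.bb.cc.129 dev eth0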

– Before doing insert-ethers on the frontend, we have to edit /opt/rocks/lib/python2.4/site-packages/rocks/commands/sync/dns/plugin_dns.py, since we only have a portion of a larger subnet as our private address space. The Python file assumes a private class C address/mask, which is not the case for me: with our /26 netmask, clip works out to 26/8 = 3, plus 1 for the remainder, i.e. 4, so the code strips all four octets and breaks the reverse DNS configuration instead of keeping just the host octet. We have to make a small change so the function looks like this (thanks to Scott Hamilton for his post):

def reverseIP(self, addr, mask):
    "Reverses the elements of a dot-decimal address."

    if type(addr) != types.ListType:
        addr = string.split(addr, ".")

    addr.reverse()

    clip = mask / 8
    if (mask % 8):
        clip += 1
    # I added this section to fix a bug that breaks the DNS configuration
    # when installing on subnets smaller than 255.255.255.0
    if (clip == 4):
        clip = 3

    # Only show the host portion of the address.
    addr = addr[:-clip]

    reversed = addr[0]
    for i in addr[1:]:
        reversed = "%s.%s" % (reversed, i)

    return reversed

– This gets insert-ethers going, but there is still the problem of telling the program that you don’t want it to start at 190 (the end of my address space) and count down whenever a new compute node comes online. I want it to start at 180 and count down (the 180-190 range I want to reserve for admin stuff for the Xserve RAIDs). So the command to issue is:
insert-ethers --baseip=aa.bb.cc.180
– Now I can power up the compute nodes, which have four interfaces each (2 Broadcom onboard plus 2 extra Intel GigE cards), making sure that Broadcom port #1 is hooked up to the switch on all the machines; this is the default PXE port on the Dell 1950 IIIs. If everything is groovy, insert-ethers will detect the machine and hand it aa.bb.cc.179 as its IP address.
– At this point, once the install is done on compute-0-0 (the first machine you turn on), you can check /etc/dhcpd.conf on the frontend and notice that all of that node’s interface instances have the same IP. This is something we may have to change once we bond the interfaces (maybe not… not sure yet).
– If something screws up during insert-ethers on the frontend, you can get a listing using “rocks list host” or “rocks list host interface”. Once you find the offending node you can remove it with, for example, “rocks remove host compute-0-0”, followed by “rocks sync config” and “rocks sync dns”.
– I initially ran into a problem where Ganglia would not update the nodes’ info. This, I think, was caused by the fact that Ganglia uses multicast to pass info between the clients (compute nodes) and the server (frontend machine). I changed the /etc/gmond.conf file on the compute nodes to be as follows (only the relevant portion shown here):

/* UDP Channels for Send and Recv */

udp_recv_channel {
  port = 8649
}

udp_send_channel {
  host = aa.bb.cc.130
  port = 8649
}

This way the listening portion of Ganglia on each compute node can communicate with itself on port 8649, and the collected stats are then sent to aa.bb.cc.130, which is my frontend machine. Similarly, on the frontend machine I modified /etc/gmond.conf to look like this:

/* UDP Channels for Send and Recv */

udp_recv_channel {
  /* mcast_join = 236.149.78.5 */
  port = 8649
}

udp_send_channel {
  /* mcast_join = 236.149.78.5 */
  host = aa.bb.cc.130
  port = 8649
}

Note the commented-out multicast address, which is not in use anymore. This way all the clients (compute nodes) send their info to the server (frontend), which is listening on port 8649. The server also sends its own information to its own IP address (snake eating its own tail kinda thing). Once this is done I do a “/etc/init.d/gmond restart” on all the machines (compute nodes and frontend). Now the Ganglia web page should be happy and full of info about the nodes.
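Restarting gmond by hand on every box gets old quickly, so here’s a rough sketch of doing it from the frontend. It assumes the passwordless root ssh that a stock Rocks install sets up, and the compute-0-N names are just examples, so adjust the list to match your cluster:

# restart gmond on the frontend first
/etc/init.d/gmond restart

# then on each compute node over ssh (node names are examples; list yours here)
for node in compute-0-0 compute-0-1 compute-0-2; do
    ssh $node '/etc/init.d/gmond restart'
done

If I remember right, the cluster-fork utility that ships with Rocks can run the same restart across all compute nodes in one shot, which saves typing out the node list.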

More later…..