Friday, October 11, 2013

Easy way to find your public IP in scripts or CLI

There are a lot of online tools that give your public IP, but most of them either don't accept CLI User-Agents or require nasty parsing with awk, grep, sed, etc.

The best way to get your public IP address is to use ifconfig.me, which returns only the IP; simple and efficient.
 $ curl http://ifconfig.me  
The only problem is the response time (a few seconds).

Another cool way is to query the OpenDNS servers :
 $ dig +short myip.opendns.com @resolver1.opendns.com
This method is pretty fast but requires the dig command (part of the bind-utils package).
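
If you want a ready-to-use helper for scripts, here is a minimal sketch combining both methods (my own glue, nothing official): it prefers the fast DNS lookup and falls back to curl when dig is missing.
 #!/bin/bash  
 # Print the public IP: fast DNS lookup first, HTTP fallback  
 if command -v dig >/dev/null 2>&1; then  
   dig +short myip.opendns.com @resolver1.opendns.com  
 else  
   curl -s http://ifconfig.me  
 fi  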

Dell Firmware update fails with "mktemp: too many templates"

If you get the following error while updating a Dell server's firmware (BIOS, RAID, etc.) via a Linux binary (*.BIN) :
 mktemp: too many templates  

Then check the binary's filename for special characters; in my case Chrome added a "(1)" at the end of the filename. Remove it, restart the update process and you're good to go !
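
For example, assuming a hypothetical package name R720_BIOS.BIN that the browser saved with a duplicate suffix:
 $ mv "R720_BIOS (1).BIN" R720_BIOS.BIN  
 $ chmod +x R720_BIOS.BIN && ./R720_BIOS.BIN  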

Linux server sends SYNACK packet only after receiving 8 SYN

I ran into a really weird issue recently: in some rare cases (mostly Mac and mobile-phone clients), the connection to a Linux server was really, really slow (about 12s).

The issue impacted not only Apache but all TCP services (SSH included), so it was not a particular service's issue or misconfiguration.

The Chrome console on a MacBook Pro showed that the initial connection took about 10s; a Win7 client on the same LAN, on the other hand, had no problem at all.

After some digging on the client and server side, I found out that the client needed to send 8 SYN packets before the server replied with a SYNACK, which explains why the connection was so slow. Once the SYNACK is sent back to the client, the communication speed is back to normal.

A one-hour headache later, it turned out that I had enabled some sysctl TCP tuning values that somehow introduced the issue.

I disabled the net.ipv4.tcp_tw_recycle and net.ipv4.tcp_tw_reuse features and everything went back to normal.
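
For reference, a minimal way to turn both options off at runtime (to make it persistent, also remove or zero the corresponding lines in /etc/sysctl.conf):
 # sysctl -w net.ipv4.tcp_tw_recycle=0  
 # sysctl -w net.ipv4.tcp_tw_reuse=0  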

I think the problem comes from the net.ipv4.tcp_tw_reuse option, but as the issue impacted a production service (and is really hard to reproduce) I didn't try to re-enable tcp_tw_recycle alone to confirm. (Note that in the capture below, the only SYN that finally gets an answer is the one sent without the TCP timestamp option, which rather points at tcp_tw_recycle and its per-host timestamp checks.)

Some posts advise disabling window scaling; I strongly discourage this, as it would result in poor network performance.

Hope that helps !

Below is the tcpdump output showing the client's 8 SYN packets before the SYNACK is finally sent back. The test was performed on the SSH service; as you can see, the TCP handshake took 10 seconds.

 # SYN 1  
 15:57:26.303076 IP (tos 0x0, ttl 53, id 9488, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xdf5f (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835124724 ecr 0,sackOK,eol], length 0  
 # SYN 2  
 15:57:27.306416 IP (tos 0x0, ttl 53, id 37141, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xdb71 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835125730 ecr 0,sackOK,eol], length 0  
 # SYN 3  
 15:57:28.315804 IP (tos 0x0, ttl 53, id 2415, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xd785 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835126734 ecr 0,sackOK,eol], length 0  
 # SYN 4  
 15:57:29.330233 IP (tos 0x0, ttl 53, id 62758, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xd398 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835127739 ecr 0,sackOK,eol], length 0  
 # SYN 5  
 15:57:30.335779 IP (tos 0x0, ttl 53, id 29003, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xcfa9 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835128746 ecr 0,sackOK,eol], length 0  
 # SYN 6  
 15:57:31.345254 IP (tos 0x0, ttl 53, id 5246, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xcbba (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835129753 ecr 0,sackOK,eol], length 0  
 # SYN 7  
 15:57:33.382242 IP (tos 0x0, ttl 53, id 5958, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xc3dc (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835131767 ecr 0,sackOK,eol], length 0  
 # SYN 8  
 15:57:37.881881 IP (tos 0x0, ttl 53, id 21274, offset 0, flags [DF], proto TCP (6), length 48)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0x5c3d (correct), seq 2356956535, win 65535, options [mss 1460,sackOK,eol], length 0  
 # SYNACK (at last !!!)  
 15:57:37.881907 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 48)  
   server_ip.ssh > client_ip.49316: Flags [S.], cksum 0x7a12 (correct), seq 3228952474, ack 2356956536, win 14600, options [mss 1460,nop,nop,sackOK], length 0  
 # ACK  
 15:57:37.885362 IP (tos 0x0, ttl 53, id 62772, offset 0, flags [DF], proto TCP (6), length 40)  
   client_ip.49316 > server_ip.ssh: Flags [.], cksum 0xdfde (correct), seq 1, ack 1, win 65535, length 0  

Wednesday, October 9, 2013

Juniper JunOS transfer on commit fails

I had quite a surprise when I discovered that transfer-on-commit had stopped working on my SRX firewall.

The error in the logfile was :
 ACCT_XFER_FAILED: Error transferring /var/transfer/config/*  

Not really informative...

It turns out that /cf/var/ was full; it needs at least a few free MB to work properly, which sounds weird as the config file only requires a few KB.

The configuration is indeed copied to /cf/var/transfer/config/ before being transferred over the network. If /cf/var/ is full then the configuration cannot be copied and the transfer process finds nothing to send hence the error message above.

If you have the same error, clean up some old logfiles and maybe decrease the amount of data you're logging.
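
On the SRX itself, you can check the partitions and reclaim space with two standard JunOS operational commands (the cleanup command lists the log/temporary files it wants to delete and asks for confirmation):
 > show system storage  
 > request system storage cleanup  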

For any other errors, I recommend this post which explains all the other issues you may have with transfer on commit.

http://www.net-gyver.com/?p=655

Hope that helps !

Monday, September 2, 2013

SquirrelMail crashes with "PHP Fatal error: Call to undefined function sqimap_run_literal_command()"

I encountered a bug in SquirrelMail (version 1.4.8-21): the client's credentials are accepted, but the page then gets stuck on "webmail/src/redirect.php"

On the Apache side, I noticed the following PHP error :
  PHP Fatal error: Call to undefined function sqimap_run_literal_command() in /usr/share/squirrelmail/functions/imap_general.php on line 518  

After some debugging, I found out that SquirrelMail does NOT properly handle special characters; in this case there was an accented character in the user's password.
I reset the user's password and everything went back to normal.

Hope that helps !


Friday, August 30, 2013

Bash : Wait for a command with timeout

Here is a very useful little command that waits for a process to finish and kills it if it doesn't exit within a predefined timeout.

The command is called timeout and is part of the coreutils package, which is installed by default on CentOS 6.

Usage is very simple: you just specify the timeout, the command to run and, optionally, the signal to send when the timeout expires.

timeout returns the exit value of the process; if the timeout has been reached, the value 124 is returned.

Here are some examples on how to use timeout :

Successful process : 
 $ timeout 10s ls /tmp  
 ....  
 ....  
 $ echo $?  
 0  

Unsuccessful process :
 $ timeout 10s ls /blabla  
 ls: cannot access /blabla: No such file or directory  
 $ echo $?  
 2  

Timed out process :
 $ timeout 5s sleep 10  
 $ echo $?  
 124  
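
To pick the signal sent on expiry, use the -s option (SIGTERM is the default). Note that with KILL the exit status becomes 128+9=137 instead of 124:
 $ timeout -s KILL 5s sleep 10  
 $ echo $?  
 137  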

More details in the man page.

Hope that helps !

Tuesday, July 16, 2013

Cobbler reposync failed

I run daily cobbler reposync cron jobs, and sometimes the process fails with the following error :
 Exception occured: <class 'cobbler.cexceptions.CX'>  
 Exception value: 'cobbler reposync failed'  
 Exception Info:  
  File "/usr/lib/python2.6/site-packages/cobbler/utils.py", line 126, in die  
   raise CX(msg)  
   
 Exception occured: <class 'cobbler.cexceptions.CX'>  
 Exception value: 'cobbler reposync failed'  
 Exception Info:  
  File "/usr/lib/python2.6/site-packages/cobbler/action_reposync.py", line 125, in run  
   self.sync(repo)  
   File "/usr/lib/python2.6/site-packages/cobbler/action_reposync.py", line 169, in sync  
   return self.yum_sync(repo)  
   File "/usr/lib/python2.6/site-packages/cobbler/action_reposync.py", line 402, in yum_sync  
   utils.die(self.logger,"cobbler reposync failed")  
   File "/usr/lib/python2.6/site-packages/cobbler/utils.py", line 134, in die  
   raise CX(msg)  
...
!!! TASK FAILED !!!

I don't really have an explanation for this, and the reposync process doesn't obviously fail: if I run the reposync command manually, the sync itself goes fine.

The exact reposync command line is shown when you execute "cobbler reposync", for example :
 hello, reposync  
 run, reposync, run!  
 creating: /var/www/cobbler/repo_mirror/Dell-CentOS5/.origin/Dell-CentOS5.repo  
 running: /usr/bin/reposync -l -m -d --config=/var/www/cobbler/repo_mirror/Dell-CentOS5/.origin/Dell-CentOS5.repo --repoid=Dell-CentOS5 --download_path=/var/www/cobbler/repo_mirror  

In this case the command line is :
reposync -l -m -d --config=/var/www/cobbler/repo_mirror/Dell-CentOS5/.origin/Dell-CentOS5.repo --repoid=Dell-CentOS5 --download_path=/var/www/cobbler/repo_mirror  

The sync itself goes fine; however, the return value $? is '1', which explains why the cobbler command fails.
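
To reproduce what cobbler sees, you can re-run that exact command line and print its exit code:
 $ /usr/bin/reposync -l -m -d --config=/var/www/cobbler/repo_mirror/Dell-CentOS5/.origin/Dell-CentOS5.repo --repoid=Dell-CentOS5 --download_path=/var/www/cobbler/repo_mirror  
 $ echo $?  
 1  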

Monday, July 15, 2013

Emulate bad or WAN network performances from a particular IP on a Gigabit LAN network

If you're developing Web or mobile applications, you'll certainly be confronted with poor network conditions at some point.

The problem then is: "how can I test my application under bad network conditions?" Well, you could rent a foreign internet connection, or use tools that report performance from various remote countries, but this is not a good debugging environment.

The solution is to use TC and NetEM on your front development server (typically a Web or reverse-proxy server), then use filters so that only one client station (the debugging station) is impacted.
Don't forget the filters, otherwise all your clients will be impacted.

Below is an example of how to emulate a network with :
  • 1Mbps bandwidth
  • 400ms delay
  • 5% packet loss
  • 1% Corrupted packet
  • 1% Duplicate packet
The debugging client IP is 192.168.0.42 (i.e. the IP impacted by the bad network performance).
The following commands need to be executed on the front development server; set the appropriate NIC for your environment (eth0 is used below) :
 # Clean up rules  
   
 tc qdisc del dev eth0 root  
   
 # root htb init 1:  
   
 tc qdisc add dev eth0 handle 1: root htb  
   
 # Create class 1:42 with 1Mbit bandwidth (beware: tc parses "Mbps" as megaBYTES per second; use "mbit" for megabits)  
   
 tc class add dev eth0 parent 1:1 classid 1:42 htb rate 1mbit  
   
 # Set network degradations on class 1:42  
   
 tc qdisc add dev eth0 parent 1:42 handle 30: netem loss 5% delay 400ms duplicate 1% corrupt 1%  
   
 # Filter class 1:42 to 192.168.0.42 only (match destination IP)  
   
 tc filter add dev eth0 protocol ip prio 1 u32 match ip dst 192.168.0.42 flowid 1:42  
   
 # Filter class 1:42 to 192.168.0.42 only (match source IP)  
   
 tc filter add dev eth0 protocol ip prio 1 u32 match ip src 192.168.0.42 flowid 1:42  

To check that the rules are properly set use the following commands :
 tc qdisc show dev eth0  
 tc class show dev eth0  
 tc filter show dev eth0  

Once you're done with the testing, cleanup the rules with the command :
 tc qdisc del dev eth0 root   


There are many other options you can use (correlation, distribution, packet reordering, etc.); check the documentation available at :

http://www.linuxfoundation.org/collaborate/workgroups/networking/netem

If this setup fits your requirements, I advise you to create a shell script so you can start/stop the rules with custom values (a sketch follows below). Be aware that you can also build filters based on source/destination ports, etc.
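
Here is a minimal sketch of such a wrapper (the interface, client IP and rate defaults are assumptions to adjust; it simply replays the commands above):
 #!/bin/bash  
 # netem-degrade.sh start|stop : toggle bad-network emulation for one client IP  
 IF=${IF:-eth0}                  # NIC to shape (adjust to your server)  
 CLIENT=${CLIENT:-192.168.0.42}  # debugging client IP  
 RATE=${RATE:-1mbit}             # bandwidth limit  
   
 case "$1" in  
   start)  
     tc qdisc add dev "$IF" handle 1: root htb  
     tc class add dev "$IF" parent 1:1 classid 1:42 htb rate "$RATE"  
     tc qdisc add dev "$IF" parent 1:42 handle 30: netem loss 5% delay 400ms duplicate 1% corrupt 1%  
     tc filter add dev "$IF" protocol ip prio 1 u32 match ip dst "$CLIENT" flowid 1:42  
     tc filter add dev "$IF" protocol ip prio 1 u32 match ip src "$CLIENT" flowid 1:42  
     ;;  
   stop)  
     tc qdisc del dev "$IF" root  
     ;;  
   *)  
     echo "Usage: $0 start|stop" >&2 ; exit 1  
     ;;  
 esac  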

If you have more complex requirements, you can try WANem, which is a live Linux Distribution with a graphical interface on top of NetEM. Please be aware that this requires route modifications on your client and server (or any other routing tricks).

http://wanem.sourceforge.net/
http://sourceforge.net/projects/wanem/files/Documents/WANemv11-Setup-Guide.pdf

I haven't had the opportunity to try it; please let me know if you have any feedback.

Monday, July 8, 2013

Dell DRAC Console/KVM with Chrome or Firefox

Here is a really simple trick to access your DRAC remote console (i.e. virtual KVM) with Chrome or Firefox.
This trick has been tested with DRAC 5, 6 and 7 only.

Requirement : You need to have a working JRE

  • Log in to your DRAC Web interface and go to "System -> Console Media"
  • Click on "Launch Virtual Console"
  • The browser will ask you to open or save a file; save it to your hard drive
  • The downloaded file has the form "viewer.jnlp(x.x.x.x@x@idrac-xxxxxxx,+xxxxxxxxx,+User-xxxxx@xxxxxxxxx)"
  • Rename the file "viewer.jnlp", i.e. remove the garbage data after the extension (see the one-liner below)
  • Double-click the file and you're done.
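
On Linux, the rename is a one-liner (the escaped glob matches the parenthesized garbage; run it from your download directory):
 $ mv viewer.jnlp\(*\) viewer.jnlp  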

Really easy but so handy !

Hope that helps

Wednesday, May 22, 2013

omreport : failed to load external entity "/opt/dell/srvadmin/var/lib/openmanage/xslroot//oma/cli/about.xsl"

If you get the following error when executing omreport :
 I/O warning : failed to load external entity "/opt/dell/srvadmin/var/lib/openmanage/xslroot//oma/cli/about.xsl"  
 error  
 xsltParseStylesheetFile : cannot parse /opt/dell/srvadmin/var/lib/openmanage/xslroot//oma/cli/about.xsl  
 Error! XML Transformation failed  

Then install the srvadmin-omcommon package :
 # yum install srvadmin-omcommon  

Tuesday, May 21, 2013

DRAC Firmware update failed : Error: 30001 Method httpCgiErrorPage()

I tried to update an old DRAC4 firmware from 1.5 to 1.75 via the Linux binary and got an unpleasant surprise :
 Dell Remote Access Controller 4/P  
 The version of this Update Package is newer than the currently installed version.  
 Software application name: Dell Remote Access Controller 4/P Firmware  
 Package version: 1.75  
 Installed version: 1.50  
 Continue? Y/N:Y  
 Executing update...  
 WARNING: DO NOT STOP THIS PROCESS OR INSTALL OTHER DELL PRODUCTS WHILE UPDATE IS IN PROGRESS.  
 THESE ACTIONS MAY CAUSE YOUR SYSTEM TO BECOME UNSTABLE!  
 ......................................................................................
 /tmp/duptmp.xml:6: parser error : Extra content at the end of the document  
 <SVMExecution lang = "en">  
 ^  
 /tmp/.dellSP-XmlResult12908-32487.M19124:6: parser error : Extra content at the end of the document  
 <SVMExecution lang = "en">  
 ^  
 unable to parse /tmp/.dellSP-XmlResult12908-32487.M19124  
 /tmp/.dellSP-XmlResult12908-32487.M19124:6: parser error : Extra content at the end of the document  
 <SVMExecution lang = "en">  
 ^  
 unable to parse /tmp/.dellSP-XmlResult12908-32487.M19124  

It doesn't look good, and of course when I tried to access the DRAC via HTTPS, I got a nice CGI error :
 Error: 30001 Method httpCgiErrorPage()  

I looked on the web, and somebody (who had contacted Dell Support) advised shutting the server down, unplugging the DRAC card for a while and plugging it back in... Well, explain to your CTO that you need to shut down a production server, unrack it and unplug a card just because a DRAC update failed o_O
Reference: http://lists.us.dell.com/pipermail/linux-poweredge/2008-January/034556.html

The solution that worked for me was to install the Dell racadm tool on my bastion host and reset the DRAC remotely.

  • First install racadm :
 # yum install srvadmin-racadm4.x86_64  
Note : This is for DRAC4; I didn't have the issue with newer DRACs.
Note 2 : You need to have the Dell OMSA repository installed on your server:
http://www.openfusion.net/linux/dell_omsa

  •  Then run the following command :
 # racadm -rDRAC_IP -i racreset  
Note : Replace DRAC_IP with your DRAC IP.
Note 2 : This operation will NOT erase your DRAC configuration.
  •  Wait a while, pray, and if you're as lucky as me you should be back online (with the original firmware version of course).
Final word: I stopped being lazy and updated the firmware via the Web GUI, which is a long and annoying process. Of course I used Internet Explorer, as I felt like Murphy's law was hanging around that day ^^

Hope that helps !

Yum stuck/hangs at "Running Transaction Test"

If yum is stuck at the "Running Transaction Test" step, double check that you don't have a stalled network mount (NFS, SMB, etc.) somewhere.

Umount it and retry your yum/rpm command.
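
To spot a dead mount without freezing your shell, you can list the network mounts from /proc/mounts and probe each one with the coreutils timeout command (the 5-second limit and the /mnt/nfs_share path are just examples):
 $ awk '$3 ~ /^(nfs|nfs4|cifs)$/ {print $2}' /proc/mounts  
 $ timeout 5s stat -t /mnt/nfs_share >/dev/null || echo "stalled mount"  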

More info on how to umount a stalled NFS share :
http://sysnet-adventures.blogspot.fr/2013/05/umount-stalledfrozen-nfs-mount-point.html

Umount a stalled/frozen NFS mount point

NFS is known to be a little nasty when it comes to unmounting stalled shares.

Most of the time a simple umount doesn't work, which is a bit frustrating, especially on production servers; the process just hangs and there is no way to interrupt it...

Below are two procedures to unmount stalled NFS shares. Try method one before method two, as the latter requires some network "hacks".

Method 1 :

Use a forced lazy umount; this method works 90% of the time :
 # umount -f -l /mnt/nfs_share
 Note : Don't use bash auto-completion when typing the path (completing it would stat the dead mount and hang your shell) !!!  


Method 2:

This method is to be used only if method one failed.

The trick is to temporarily steal the NFS server's IP address on the NFS client (the one with the stalled mount) so that the client thinks the NFS server is still alive.

Warning : Use method 1 above if your NFS server is still reachable from the NFS client. Otherwise you'll create an IP conflict, and trust me, you really don't want that to happen.

Let's assume the NFS server IP is 192.168.0.1
  1. Double check that the NFS server is down with ping or nmap.
  2. If your NFS client has very restrictive IPTables rules, shut them down temporarily.
  3. On the NFS client, set the NFS server IP as a secondary address:
      # ifconfig eth0:0 192.168.0.1  
     Note : Adjust the interface to your own needs.
  4. Unmount the NFS share with a forced lazy umount:
      # umount -f -l /mnt/nfs_share  
     Note : Don't use bash auto-completion !!!
  5. Check that the NFS mount is gone.
  6. Remove the secondary interface:
      # ifconfig eth0:0 down  
     Note : Adjust the interface to your own needs.
  7. Restart IPTables if needed.
  8. Be happy.
  9. Go to sleep, it's been a long day (or night).

 If you have multiple stalled NFS clients, you can set the secondary IP on one client only:
  • Client 1 : steps 1 to 4
  • Client 2 to n : steps 4 and 5
  • Client 1 : steps 6 to 9

This will only work if your NFS clients can communicate with each other (watch out for IPTables or any other filtering software/devices).

Hope that helps ! (that helped me a lot :)

Monday, May 20, 2013

Remove absolute path from MD5 file

The following command will remove absolute paths from an MD5 file :
  sed -i -r "s/ .*\/(.+)/  \1/g" file.md5  
This is quite useful when you download backup files whose MD5 files contain absolute paths.

For example :
 $ cat file.md5  
 8ee6e966f2cb7a84620ed137604e00c5 /data/prod/file  

If you want to check this file on another server, you won't be able to do it unless you put the data file in the exact same directory (/data/prod/).

After running the above sed command, your md5 file will look like :
 $ cat file.md5  
 8ee6e966f2cb7a84620ed137604e00c5 file  

You can then verify the checksum with "md5sum -c file.md5"; you just need the MD5 file in the same directory as the data file.
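
For example, once both files sit in the same directory:
 $ md5sum -c file.md5  
 file: OK  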

Whitelist files with Clamav

Exclude files from scan :

Sometimes you will need to exclude files from a scan; clamscan offers the --exclude option, but its usage is not really user-friendly...

Imagine that you need to exclude the two files /data/rep/file1 and /data/rep2/file2; the command line would be :
 # clamscan -r -i --exclude=/data/rep/file1 --exclude=/data/rep2/file2  

This is fine if you have a few files to whitelist, but it quickly becomes unreadable when you have dozens of files and directories.

The solution is to feed a list file to clamscan via xargs. Create a text file containing all the files/directories you need to whitelist (one per line) :
 # cat /var/lib/clamav/whitelist-files.txt  
 /data/rep/file1  
 /data/rep2/file2  

You can also add patterns like *.mp3 (be aware that this is quite dangerous).

Run clamscan with the following command :
 # sed -e 's/^/--exclude=/' /var/lib/clamav/whitelist-files.txt | xargs clamscan -r -i /directory_to_scan/  

Don't forget to double-quote or escape paths that contain special characters (especially spaces).

Last but not least, always double check that the files you're whitelisting are completely safe. You can check that out with a meta AV engine like Jotti :
http://virusscan.jotti.org/en

Whitelist a virus signature :

To whitelist a virus signature, you need the ClamAV signature name; it's the code on the right side of the infected-file line. For example :
 /data/file.flv: CVE_2012_0773-2 FOUND  

In this case the signature name is CVE_2012_0773-2; add it to /var/lib/clamav/whitelist-signatures.ign2
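
Appending the name is a one-liner (ClamAV picks up any *.ign2 file in its database directory, so the filename above is just a convention):
 # echo "CVE_2012_0773-2" >> /var/lib/clamav/whitelist-signatures.ign2  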

That's all ! Be very cautious when whitelisting Virus signatures.
Hope that helps !

Tuesday, May 7, 2013

Enable LDAP SSL/TLS user authentication in Zabbix

By default, the Zabbix Web interface doesn't expose the SSL/TLS encryption option for the LDAP connector; however, the feature is available in the PHP code.

If you need to enable the startTLS feature, you will have to edit the PHP file manually :

Edit /usr/share/zabbix/include/classes/class.cldap.php on Zabbix Web server :
 vi /usr/share/zabbix/include/classes/class.cldap.php  

Search for the 'starttls' definition (line 44) and set the value to 'true' :
 'starttls' => true,  
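
If you prefer to script the change, a sed one-liner works too (this assumes the stock value is 'false'):
 # sed -i "s/'starttls' => false/'starttls' => true/" /usr/share/zabbix/include/classes/class.cldap.php  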

Save the file and you're good to go !

Hope that helps !

Monday, April 29, 2013

Shrew VPN Client + Juniper SRX : "session terminated by gateway" (Autodisconnect)

If, like me, you're trying to connect to a Juniper dynamic VPN with the Shrew VPN client, be aware that this is not yet possible.

The connection works, but the tunnel is systematically disconnected after 60 seconds.

I asked the core developer Matthew Grooms about this, and after some debugging it seems a fix is needed in Shrew's code:

"It's pretty clear whats going on but it won't be possible to fix without 
a rewrite of the modecfg code on the Shrew Soft VPN client, which is probably 
needed anyway."

Full technical details are available at :
https://lists.shrew.net/pipermail/vpn-help/2012-December/014091.html

If anybody has found an alternative solution, please share !

Monday, April 22, 2013

Finding the least used wireless channel on Linux

Want to find the least used wireless channel around you ? This little one-liner gives a summary of all wireless channels with the number of SSIDs found on each.
 echo "Nb SSID - Channel" ; iwlist scan 2>/dev/null | grep "Channel:" | cut -d':' -f2 | sort -n | uniq -c   
 Nb SSID - Channel  
       4 1  
       3 2  
       3 3  
       1 5  
       5 6  
       4 7  
       3 9  
       4 11  
       3 12  
The right column is the channel number and the left column is the number of SSIDs found on that channel.

Of course, this is just a basic overview; for real deployments you should use a more sophisticated tool like inSSIDer.

Also an interesting post on how to choose the right channel :
http://www.dslreports.com/faq/14250

Set scale in (non interactive) scripts with the bc command

If you need to do some kind of math operation in a shell script, you might write :
 $ echo "9/2" | bc  
 4  

By default bc truncates the result, which is a bit annoying...

Using the bc -l option gives you a far too precise result :
 $ echo "9/2" | bc -l  
 4.50000000000000000000  

Fortunately, bc provides the "scale" variable to set whatever precision you wish.

To do it non-interactively, you need to specify the scale in the echo before the operation.
For example, to set a precision of 2 digits :
 $ echo -e "scale=2 \n 9/2" | bc  
 4.50  

So the script code would look like :
 val=$(echo -e "scale=2 \n 9/2" | bc)  

Friday, April 19, 2013

Replace end of lines (\n) with any character

Just found out a classic UNIX command I never used before.

The command "paste" allows you to replace end of lines (i.e \n) by any character of your choice.

I normally used sed for this purpose, but the syntax is quite... dirty. For example, to replace all end of lines with a space using sed, the command is :
 $ sed ':a;N;$!ba;s/\n/ /g' /path/to/file  

With paste the syntax is much more human readable :
 $ paste -s /path/to/file  

By default "paste" replaces end of lines with tabs, to specify a delimiter use the -d option.

Replace end of lines with spaces :
 $ paste -s -d' ' /path/to/file  

Replace end of lines with commas :
 $ paste -s -d',' /path/to/file  

 Use '-' to read from stdin :
 echo -e "a\nb\nc\nd" | paste -s -d',' -  
 a,b,c,d  

More info in the man page, as always.

This tip will not change your life, but I found it quite useful !

Hope that helps !

Thursday, April 4, 2013

Choosing RAID Level / Stripe Size

Below are some interesting articles on how to choose your RAID level / stripe size.

Good literature on RAID :
http://www.fccps.cz/download/adv/frr/hdd/hdd.html

RAID Level Explained :
http://www.techrepublic.com/blog/datacenter/choose-a-raid-level-that-works-for-you/3237

RAID Stripe Explained :
http://www.anandtech.com/show/788/5

RAID Benchmarks :
https://raid.wiki.kernel.org/index.php/Performance

RAID Calculator :
http://www.z-a-recovery.com/art-raid-estimator.htm

In any case, always plan for your workload type (reads/writes, sequential/random, large/small files, number of concurrent accesses).

Wednesday, April 3, 2013

Create large partitions on Linux / Bypass the 2TB partition Limit

The default partitioning scheme (MBR-based) limits partitions to 2.2TB. With today's hard drives this limit is easily reached.

In order to create partitions bigger than 2.2TB, you need to switch from an MBR to a GUID (GPT) partition table.
This can be done with the "parted" utility on Linux.

For example, if you want to create a single big partition on /dev/sdb :

 # parted /dev/sdb  
 (parted) mklabel GPT  
 (parted) mkpart partition_name fstype 1 -1  
 (parted) print  
 Model: DELL PERC H700 (scsi)  
 Disk /dev/sdb: 4000GB  
 Sector size (logical/physical): 512B/512B  
 Partition Table: gpt  
 Number Start  End   Size  File system Name Flags  
  1   1049kB 4000GB 4000GB        data  

Note : I found out that partition name and fstype are quite useless.

You can then format the partition with the filesystem of your choice or create an LVM PV.
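
For example, with the single big partition created above (hence the /dev/sdb1 device name):
 # mkfs.ext4 /dev/sdb1  
 # pvcreate /dev/sdb1   # alternatively, to use it as an LVM PV  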

More info on GUID / MBR Limits :
http://en.wikipedia.org/wiki/GUID_Partition_Table

Parted official website :
http://www.gnu.org/software/parted/

More parted examples :
http://www.thegeekstuff.com/2011/09/parted-command-examples/

Hope that helps ! 

Thursday, March 28, 2013

Omreport doesn't update disk rebuild progress

Had to replace a hard drive on a Dell Server and omreport rebuild progress got stuck at 1%.

The solution is to restart the srvadmin service :

 # srvadmin-services.sh restart  

This is quite dirty but it's the only solution I found. This also happened when I changed a PERC H700 battery.

Another way to check the rebuild progress is to export the controller log with omconfig :

 # omconfig storage controller action=exportlog controller=0  

This creates a /var/log/lsi_MMDD.log file containing the rebuild progress :

 03/09/13 22:07:51: EVT#13296-03/09/13 22:07:51: 99=Rebuild complete on VD 01/1  
 03/09/13 22:07:51: EVT#13297-03/09/13 22:07:51: 100=Rebuild complete on PD 05(e0x20/s5)  
 03/09/13 22:07:51: EVT#13298-03/09/13 22:07:51: 114=State change on PD 05(e0x20/s5) from REBUILD(14) to ONLINE(18)  
 03/09/13 22:07:51: EVT#13299-03/09/13 22:07:51: 81=State change on VD 01/1 from DEGRADED(2) to OPTIMAL(3)  
 03/09/13 22:07:51: EVT#13300-03/09/13 22:07:51: 249=VD 01/1 is now OPTIMAL  
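
To pull just the rebuild events out of the exported log (the 0309 date suffix matches the example lines above; adjust it to your own file):
 # grep -i rebuild /var/log/lsi_0309.log  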

Same thing for the battery learn cycle.

Hope that helps !

Tuesday, March 26, 2013

Dell Openmanage/Omreport failed after updating to CentOS 6.4

After updating a testing machine from CentOS 6.3 to 6.4, the Dell OpenManage tools stopped working AT ALL.
It seems that with the latest CentOS kernel (2.6.32-358.2.1.el6.x86_64), some IPMI drivers were moved from loadable kernel modules to "built-in".

The result is :

 # omreport chassis  
 Health   
 # srvadmin-services.sh start  
 Starting Systems Management Device Drivers:  
 Starting dell_rbu:                     [ OK ]  
 Starting ipmi driver:                   [FAILED]  
 Starting Systems Management Device Drivers:  
 Starting dell_rbu: Already started             [ OK ]  
 Starting ipmi driver:                   [FAILED]  
 Starting DSM SA Shared Services:              [ OK ]  
 /var/log/messages reports :   
 instsvcdrv: /etc/rc.d/init.d//dsm_sa_ipmi start command failed with status 1  

Solution : 

 # yum install OpenIPMI  

Note : There is no need to start or chkconfig the service.

You can check that the IPMI components are seen with the following command :

 # service ipmi status  
 ipmi_msghandler module in kernel.  
 ipmi_si module in kernel.  
 ipmi_devintf module loaded.  
 /dev/ipmi0 exists.  

Then start the OpenManage services :
 # srvadmin-services.sh start