Tuesday, December 28, 2010

Executing cronjobs on specific days in a month

You can use this post as a reference for scheduling cron jobs in a more granular fashion.

We had a requirement to execute a couple of scripts on specific Saturdays of the month throughout the year: the first script (say /opt/firstscript.sh) should be executed on all Saturdays EXCEPT the first Saturday of the month, and the second script (say /opt/secondscript.sh) should be executed ONLY on the first Saturday of the month. We needed this for scheduling weekly tape backups.

Executing a script on all Saturdays is fairly simple: we just need to specify the number 6 in the 5th column of the crontab entry. Here, however, the scenario is different.
The solution is shown below. It uses the ‘date’ command to check whether the current date falls on the first Saturday of the month.

0  2  *  *  6     [ "$(date +\%d)" -gt 7 ]  && /opt/firstscript.sh
# Executes at 2 AM on all Saturdays EXCEPT the first Saturday of the month (any Saturday other than the first falls after the 7th).

0  22  *  *  6   [ "$(date +\%d)" -le 7 ]  && /opt/secondscript.sh
# Executes at 10 PM ONLY on the first Saturday of the month (the first Saturday always falls on or before the 7th).
# Note: the % is escaped as \% because cron treats an unescaped % in the command as a newline.
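
The date test used in the two crontab entries can be tried out by hand before trusting it to cron. The following is a minimal sketch, assuming GNU date (for the -d option); the function name which_script is made up for illustration and is not part of the original setup.

```shell
#!/bin/sh
# which_script DATE
# Hypothetical helper: prints the script that the two crontab entries
# would run on DATE (YYYY-MM-DD), or "none" if DATE is not a Saturday.
# Assumes GNU date, which accepts -d to format an arbitrary date.
which_script() {
    dow=$(date -d "$1" +%u)    # day of week, 1 = Monday ... 6 = Saturday
    dom=$(date -d "$1" +%d)    # day of month, 01 to 31
    if [ "$dow" -ne 6 ]; then
        echo "none"
    elif [ "$dom" -le 7 ]; then
        echo "/opt/secondscript.sh"    # first Saturday of the month
    else
        echo "/opt/firstscript.sh"     # any later Saturday
    fi
}

which_script 2010-12-04    # first Saturday of December 2010
which_script 2010-12-25    # fourth Saturday of December 2010
which_script 2010-12-28    # a Tuesday
```

Inside a shell script the % needs no escaping; the backslash is only required when the command sits directly in the crontab.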

Thursday, December 23, 2010

Checking NFS and SAMBA shares


How to check the NFS shares available from an NFS client?
#  showmount -e <NFS Server IP>

[root@hostxyz /]# showmount -e nfsserver.domain.com
Export list for nfsserver.domain.com:
/VOL0           stoash01-d3,staat01-d3,rdbash01,,, , 
/VOL1           (everyone)
/ORAAPPS        (everyone)
/ret_vol3       (everyone)
/nas_server_bkp (everyone)
/testnfs        (everyone)
[root@hostxyz /]#

From the NFS server, how to find the list of NFS clients currently accessing the NFS shares?

Execute the following one-liner:
for i in $(awk -F: '{print $1}' /var/lib/nfs/rmtab | sort -u); do nslookup "$i" | tail -2 | awk '{print $4}'; done


How to find the list of Samba shares available from a Windows Server?

# smbclient -L <Samba server IP> -U <username>

[root@hostxyz /]# smbclient -L windows.tcprod.com -Uadevaraju
Domain=[TCPROD] OS=[Windows Server 2003 R2 3790 Service Pack 2] Server=[Windows Server 2003 R2 5.2]

        Sharename       Type      Comment
        ---------       ----      -------
        IPC$            IPC       Remote IPC
        sqldb_htk       Disk
        webash01-dev    Disk
        C$              Disk      Default share
        VOL3            Disk
        testsmb         Disk
        SMBSYSENG       Disk
        HPUniver.3      Printer   HP Universal Printing PS
        HPUniver.2      Printer   HP Universal Printing PCL 5
        ADMIN$          Disk      Remote Admin
        HPUniver        Printer   HP Universal Printing PCL 6
        print$          Disk      Printer Drivers
        F$              Disk      Default share
        nas_server_bkp  Disk
        SMBVOL1         Disk
        windowsprodbackup Disk
        SMBVOL0         Disk
        Report_server_bkp Disk
        E$              Disk      Default share
        exchange_db     Disk
        sysengdata      Disk

How to mount a Windows Share on to a Linux mount point:

Put an entry in the following format in /etc/fstab and run 'mount -a':
//<Windows server IP>/<Sambashare>  /<Linux_mount>  cifs  username=XXX,password=XXXX  0 0

Example:
//Windows_server/sambashare      /linux_mnt cifs username=ashok,password=XXXX  0 0

Sudo access to a specific command set

Let’s say we have a requirement to give sudo access only to a particular command set. Let’s take a couple of scenarios:

1.   We want to give the DBA team the privilege to mount/umount ONLY a particular filesystem (/oracle_data), but we don’t want them to mount/umount any other filesystem.
2.   We want to give the NOC team the privilege to start/stop ONLY the httpd service, but we don’t want them to start/stop any other service.

The syntax in the /etc/sudoers file should be as follows:

%dbateam          ALL=(ALL) NOPASSWD:  /bin/mount /oracle_data, /bin/umount /oracle_data

%nocteam          ALL=(ALL) NOPASSWD:  /sbin/service httpd start, /sbin/service httpd stop, /sbin/service httpd status

With this in place, the respective team members can execute the commands as follows:

# sudo /bin/mount /oracle_data      # Works
# sudo /bin/umount /oracle_data    # Works

# sudo /bin/mount /other_filesystem     # This will fail

# sudo /sbin/service httpd start       #  Works
# sudo /sbin/service httpd stop       #  Works
# sudo /sbin/service httpd restart   #  Fails, since restart is not specified

# sudo /sbin/service nfs start           #  This will fail

Wednesday, December 22, 2010

Finding total size of files owned by a particular user


# find <pathname> -user <username> -ls | awk '{sum += $7} END {printf "Total size: %8.4f MB\n", sum/1024/1024}'

Foldername to search: /nasllm-ih
Username:  rram

[root@hostxyz ~]# find /nasllm-ih -user rram -ls | awk '{sum += $7} END {printf "Total size: %8.4f MB\n", sum/1024/1024}'                   
Total size: 958.7282 MB
[root@hostxyz ~]#


Here I have created a folder /root/ashok containing files owned by user “adevaraju”, with a total size of 160 MB spread across its sub-directories.

[root@hostxyz ~]# ls -lRh /root/ashok
total 21M
drwxr-xr-x 3 adevaraju root 4.0K Dec 14 06:30 d1
drwxr-xr-x 2 adevaraju root 4.0K Dec 14 06:35 d2
-rw-r--r-- 1 adevaraju root  10M Dec 14 06:21 file1
-rw-r--r-- 1 adevaraju root  10M Dec 14 06:22 file4

total 21M
drwxr-xr-x 2 adevaraju root 4.0K Dec 14 06:35 dx
-rw-r--r-- 1 adevaraju root  10M Dec 14 06:22 file2
-rw-r--r-- 1 adevaraju root  10M Dec 14 06:22 file3

total 21M
-rw-r--r-- 1 adevaraju root 10M Dec 14 06:30 file5
-rw-r--r-- 1 adevaraju root 10M Dec 14 06:30 file6

total 101M
-rw-r--r-- 1 adevaraju root 100M Dec 14 06:30 file7
[root@oralsb11-new ~]#
 [root@oralsb11-new ~]# du -sh ashok/
161M    ashok/

[root@hostxyz ~]# find /root/ashok -user adevaraju -ls | awk '{sum += $7} END {printf "SUM: %8.4f MB\n", sum/1024/1024}'
SUM: 160.0156 MB
[root@hostxyz ~]#

PS: By changing the search folder to '/', we can find the total size of files owned by a specified user across the entire system.
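
A variation on the same idea, as a sketch rather than a replacement: the -ls output also counts the sizes of the directories themselves, which would explain why the total above is slightly more than 160 MB. Assuming GNU find (for -printf), the following counts regular files only. The function name user_total_bytes is hypothetical.

```shell
#!/bin/sh
# user_total_bytes DIR USER
# Hypothetical helper: prints the total size in bytes of the regular
# files under DIR that are owned by USER (a name or a numeric user ID).
# Assumes GNU find, which provides -printf.
user_total_bytes() {
    find "$1" -type f -user "$2" -printf '%s\n' |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Example: files owned by the current user under /tmp, shown in MB
user_total_bytes /tmp "$(id -u)" 2>/dev/null |
    awk '{ printf "Total size: %8.4f MB\n", $1/1024/1024 }'
```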

Monday, November 15, 2010

Port numbers in Linux

A port in computer networking is an application-specific or process-specific software construct serving as a communications endpoint. It is used by the Transport Layer protocols of the Internet Protocol Suite, such as Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). A specific port is identified by its number (commonly known as the port number), the IP address with which it is associated, and the protocol used for communication.
The Internet Assigned Numbers Authority (IANA) is responsible for maintaining the official assignments of port numbers for specific uses. However, many unofficial uses of both well-known and registered port numbers occur in practice.

The port numbers are divided into three ranges:

Well-Known/Standard Ports   (Range: 0 to 1023)
Used by system processes that provide widely used types of network services such as SSH, Telnet, SMTP, FTP etc.

Registered Ports   (Range: 1024 to 49151)
Assigned to specific services and applications such as Oracle database listener (1521), MySQL (3306), Microsoft Terminal Server (3389) etc.

Dynamic and/or Private Ports   (Range: 49152 to 65535)
These ports cannot be registered with IANA. They are used for custom or temporary purposes and for the automatic allocation of short-lived (ephemeral) ports, which applications and processes use internally. You can see these ports in the “Local Address” column of the ‘netstat’ command output.

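In Linux, the assigned port details can be viewed in the /etc/services file, and the range of ephemeral (dynamically allocated) ports used by the server can be found in the /proc/sys/net/ipv4/ip_local_port_range file. Note that the range shown below (32768 to 61000) differs from the IANA dynamic range.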

[adevaraju@host01 ~]$ cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000
[adevaraju@host01 ~]$
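
The three ranges and the local ephemeral range can be checked with a few lines of shell. This is only a sketch; the function names port_class and in_local_range are made up for illustration.

```shell
#!/bin/sh
# port_class PORT
# Hypothetical helper: prints which of the three IANA ranges PORT is in.
port_class() {
    if [ "$1" -lt 0 ] || [ "$1" -gt 65535 ]; then
        echo "invalid"
    elif [ "$1" -le 1023 ]; then
        echo "well-known"
    elif [ "$1" -le 49151 ]; then
        echo "registered"
    else
        echo "dynamic"
    fi
}

# in_local_range PORT [FILE]
# Hypothetical helper: succeeds if PORT lies inside the ephemeral range
# listed in FILE (default: /proc/sys/net/ipv4/ip_local_port_range).
in_local_range() {
    read -r low high < "${2:-/proc/sys/net/ipv4/ip_local_port_range}"
    [ "$1" -ge "$low" ] && [ "$1" -le "$high" ]
}

port_class 22       # SSH
port_class 3306     # MySQL
port_class 50000
```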

Finding file creation time in Linux

Question:  In Linux, is it possible to find the actual time when a file was first created?

The answer is NO, up to and including Ext3.

File creation time ISN’T maintained in Linux with the Ext3 file-system or earlier versions.
What these file-systems keep is the change time (ctime), and it is updated whenever the file content or its attributes change. This is explained with an example here:

[adevaraju@cac1-t1 ~]$ date
Tue Nov  9 01:22:25 CST 2010                                                      # Time now is 01:22:25
[adevaraju@cac01-t1 ~]$ echo "Some text on the file " > testfile       # Creating a file
[adevaraju@cac01-t1 ~]$ ls -lc testfile
-rw-r--r-- 1 adevaraju domain users 23 Nov  9 01:22 testfile     # Shows file change time as 01:22
 [adevaraju@cac01-t1 ~]$ ls -lu testfile
-rw-r--r-- 1 adevaraju domain users 23 Nov  9 01:22 testfile     # Access time is also 01:22

[adevaraju@cac01-t1 ~]$ cat testfile                                       # Now am accessing the file
Some text on the file

 [adevaraju@cac01-t1 ~]$ ls -lc testfile                            
-rw-r--r-- 1 adevaraju domain users 23 Nov  9 01:22 testfile     # Change time remains the same
[adevaraju@cac01-t1 ~]$ ls -lu testfile
-rw-r--r-- 1 adevaraju domain users 23 Nov  9 01:23 testfile     # Access time changed to 01:23
[adevaraju@cac01-t1 ~]$ date
Tue Nov  9 01:23:32 CST 2010

[adevaraju@cac01-t1 ~]$ echo "Additional text to the file" >> testfile # file content changed
 [adevaraju@cac01-t1 ~]$ ls -lc testfile                                                                            
-rw-r--r-- 1 adevaraju domain users 51 Nov  9 01:24 testfile      # Change time changes to 01:24
[adevaraju@cacllm01-t1 ~]$ ls -lu testfile
-rw-r--r-- 1 adevaraju domain users 51 Nov  9 01:23 testfile      # But access time remains the same since we did only changes to file.
[adevaraju@cacllm01-t1 ~]$ date
Tue Nov  9 01:24:22 CST 2010
[adevaraju@cacllm01-t1 ~]$

In the end, the original file creation time (01:22) is gone, and we CAN’T retrieve that information.

PS:  The EXT4 file-system introduces a feature to record the file creation time.
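
On a file-system that records it, the creation (birth) time can be displayed with GNU stat, as a rough sketch. The %w format prints the birth time, or "-" when the kernel or file-system cannot supply it, so the result depends on the versions in use. The function name birth_time is made up for illustration.

```shell
#!/bin/sh
# birth_time FILE
# Hypothetical helper: prints the creation (birth) time of FILE, or "-"
# when the file-system or kernel does not report one. Assumes GNU stat.
birth_time() {
    stat -c '%w' "$1"
}

birth_time /etc/passwd
```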

Sunday, November 7, 2010

What should be an optimal Load average for a Linux server ?

When a system shows sluggish performance, we examine the load average, which reports some values. For a long time I wondered how these values are actually calculated and what the ideal value should be for a given Linux server.
After some serious searching, I found this article, which gives a satisfactory explanation of load average.

For people who don’t have time to go through the article, here is the summary:

1.   Load average is NOT to be confused with CPU percentage.

2.   Optimal Load average equals your number of CPU Cores.
     “The point of perfect utilization, meaning that the CPUs are always busy and, yet, no process ever waits for one, is the average matching the number of CPUs. If there are four CPUs on a machine and the reported one-minute load average is 4.00, the machine has been utilizing its processors perfectly for the last 60 seconds. This understanding can be extrapolated to the 5- and 15-minute averages.”

3.   Load average is good for getting a general feeling of the server's condition, but isn't the whole picture.

To make it simpler still, if you have 8 CPU cores (can be found using cat /proc/cpuinfo) on a Linux server, the ideal load average should be around 8 (+/- 1). If it’s > 8, the server resources are over-utilized, and if it’s < 8, the server isn’t running at its full potential.
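
The rule of thumb above can be turned into a quick check, sketched here under the assumption that each core shows up as a "processor" line in /proc/cpuinfo (true on common x86 systems, not guaranteed everywhere). The function name load_per_core is hypothetical.

```shell
#!/bin/sh
# load_per_core LOAD CORES
# Hypothetical helper: prints LOAD divided by CORES with two decimals.
# A value around 1.00 matches the "load average equals number of cores" rule.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Count the cores; fall back to 1 if no "processor" lines are found
cores=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null) || cores=1

# The first field of /proc/loadavg is the 1-minute load average
if [ -r /proc/loadavg ]; then
    read -r one_min rest < /proc/loadavg
    echo "1-minute load per core: $(load_per_core "$one_min" "$cores")"
fi
```

A result well above 1.00 suggests processes are waiting for a CPU; a result well below 1.00 suggests spare capacity.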

Wednesday, October 13, 2010

Finding Load average on Linux

The load average of a Linux server can be found in various ways, shown below.  The 3 values displayed are the system load averages for the past 1, 5, and 15 minutes.

 [root@S3 ~/]# uptime
 07:18:07 up 57 days,  8:04,  8 users,  load average: 1.82, 1.24, 0.97

[root@S3 ~/]# cat /proc/loadavg
1.67  1.22  0.97  1/283 24487

 [root@S3 ~/]# w
 07:18:19 up 57 days,  8:04,  8 users,  load average: 1.56, 1.21, 0.97
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
adevaraj pts/0    gtlslhh.tcprod. 01:41    0.00s  0.10s  0.03s sshd: adevaraju [priv]
tkhalef  pts/3    gtwash01.tcprod. Mon05   45:35   0.38s  0.03s sshd: tkhalef [priv]
nimmika  pts/4   05:49    1:28m  0.02s  0.02s -bash
[root@S3 ~/]#

[root@S3 ~/]# top
top - 07:19:14 up 57 days,  8:05,  8 users,  load average: 1.57, 1.28, 1.00
Tasks: 284 total,   1 running, 281 sleeping,   0 stopped,   2 zombie
Cpu(s):  2.2%us,  0.7%sy,  0.0%ni, 95.9%id,  1.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16634296k total, 14157704k used,  2476592k free,   438604k buffers
Swap:  5406712k total,       56k used,  5406656k free, 12835252k cached

21914 apache    15   0     0    0    0 Z  7.7  0.0   0:02.06 httpd <defunct>
24774 root      15   0  2508 1072  720 R  3.8  0.0   0:00.02 top
20504 apache    15   0 50596  24m 4800 S  1.9  0.1   0:03.89 httpd
    1 root      15   0  2072  616  532 S  0.0  0.0   0:08.33 init

Sunday, October 10, 2010

What does the useradd command in Linux do?

When you execute the useradd command (“# useradd <username>”), the following happens:

1.       The new user entry is added to the following files: /etc/passwd and /etc/shadow

2.       A group with the same name as the user is created, and it is recorded in the following files: /etc/group and /etc/gshadow.

3.       The home folder for the user is created (/home/<username>) and the default profile settings are copied to it from /etc/skel.

# ls -la /etc/skel/
total 56
drwxr-xr-x   3 root root  4096 Aug 16 13:03 .
drwxr-xr-x 111 root root 12288 Oct  6 15:10 ..
-rw-r--r--   1 root root    33 Jan 21  2009 .bash_logout
-rw-r--r--   1 root root   176 Jan 21  2009 .bash_profile
-rw-r--r--   1 root root   124 Jan 21  2009 .bashrc
-rw-r--r--   1 root root   515 May 24  2008 .emacs
drwxr-xr-x   4 root root  4096 Jul 26 18:19 .mozilla
-rw-r--r--   1 root root   658 Sep 21  2009 .zshrc

Please note that if you happen to accidentally delete either /etc/passwd or /etc/shadow, you can restore it from its corresponding backup file (i.e. /etc/passwd- or /etc/shadow-). These backup files are refreshed by the account-management tools (useradd, passwd and so on) just before they modify the original, so they reflect the state before the most recent change; don’t expect them to contain the most recently added user accounts.

SUID (Set User ID) Explained

The password information of a user account is saved in the /etc/shadow file. When you check its file permissions, you will see that it has Read permission ONLY for root. So ever wondered how a normal user is able to write to this file while executing the ‘passwd’ command to change his password?

[adevaraju@hostx ~]$ ls -l /etc/shadow
-r-------- 1 root root 1436 Oct  6 14:40 /etc/shadow
[adevaraju@hostx ~]$

This is where SUID comes into the picture. If you check the file permissions of the ‘passwd’ command, you can see that it has the SUID (Set User ID) bit set, as shown below. Now let me give the definition of SUID: “When the SUID bit is set on a command, whoever executes that command executes it with the privileges of the file owner”.

With respect to the ‘passwd’ command, when a normal user executes it, it runs with “root” privileges. As the root user can overwrite any local file, the command can update /etc/shadow even though the file has no Write permission set. And that’s how a normal user can change his password.

[adevaraju@hostx ~]$ ls -l /usr/bin/passwd
-rwsr-xr-x 1 root root 22984 Jan  6  2007 /usr/bin/passwd
[adevaraju@hostx ~]$

How to set SUID?

# chmod u+s <command/script name>
(or)
# chmod 4755 <command/script name>

How to search files with SUID set?

# find / -perm -4000  -type f -print

Please note that while doing a security audit on a server, finding and reviewing executables with SUID set is an important action item, as they can be very dangerous.
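
The chmod and find commands above can be tried safely on a scratch file. The following is a small sketch; the file name demo and the function name has_suid are arbitrary choices for illustration.

```shell
#!/bin/sh
# Demonstration on a scratch file; nothing outside the temporary
# directory is touched.
dir=$(mktemp -d)
touch "$dir/demo"
chmod 4755 "$dir/demo"

ls -l "$dir/demo"                          # owner's execute slot shows "s"
find "$dir" -perm -4000 -type f -print     # same search, limited to $dir

# has_suid FILE
# Hypothetical helper: succeeds if FILE has the SUID bit set.
has_suid() {
    [ -u "$1" ]
}

has_suid "$dir/demo" && echo "SUID is set"

chmod u-s "$dir/demo"                      # remove the bit again
has_suid "$dir/demo" || echo "SUID is cleared"
rm -rf "$dir"
```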

Refer: http://www.bashguru.com/2010/03/unixlinux-advanced-file-permissions.html

Thursday, October 7, 2010

Resolving Too many files open error in Linux

Users couldn't execute *ANY* commands. Every command failed with a "Too many open files" error, as shown below:

orabi@miaash02-t1$ top
ksh: top: /usr/bin/top: cannot execute [Too many open files in system]
dwilliams@miaash02-t1$ ls -l
ksh: ls: /bin/ls: cannot execute [Too many open files in system]

Everything in Linux is a file; Linux treats most things, including devices, sockets and pipes, as files.  There is a kernel parameter called “file-max” which controls the maximum number of files that can be open in the system. The default value is 65K (approx); it can be found using the following command:
“sysctl -a | grep file-max”.
To check the number of files currently open, we can use the following command: “lsof | wc -l”. However, this will not give you the exact number, because a single file can be opened multiple times for reading, and each additional concurrent open counts towards the file-max value.  In addition, connections to network ports also count towards the ‘file-max’ value.

As a temporary solution, we can increase the ‘file-max’ value by issuing the following command:
# echo <value> > /proc/sys/fs/file-max
Then we need to identify the problem by analyzing either the system logs or the output of the lsof command itself with appropriate options.
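
For a more direct view than counting lsof lines, the kernel publishes its own counters in /proc/sys/fs/file-nr: the number of allocated file handles, the number of unused handles, and the maximum (file-max). The following sketch reads them; the function name file_handle_usage is hypothetical.

```shell
#!/bin/sh
# file_handle_usage [FILE]
# Hypothetical helper: reads the three fields of file-nr (allocated
# handles, unused handles, maximum) and prints allocated, maximum and
# the percentage in use. FILE defaults to /proc/sys/fs/file-nr.
file_handle_usage() {
    read -r allocated unused max < "${1:-/proc/sys/fs/file-nr}"
    awk -v a="$allocated" -v m="$max" \
        'BEGIN { printf "allocated=%s max=%s used=%.1f%%\n", a, m, a * 100 / m }'
}

[ -r /proc/sys/fs/file-nr ] && file_handle_usage
```

When the used percentage approaches 100, the "Too many open files in system" error shown earlier is to be expected.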

Finding default block size in Linux

How to find default block size in Linux?
# tune2fs -l /dev/sda1 | grep Block
Block count:                4980736
Block size:                    4096
Blocks per group:     32768
From this example, you can see that the block size for the filesystem on the /dev/sda1 partition is 4096 bytes, or 4k. That's the default block size for an ext3 filesystem.

How to define the block size while creating an ext3 filesystem?

# mkfs.ext3 -b <block size>  <Device>
Example:   # mkfs.ext3 -b 2048 /dev/sda1                
This creates a filesystem with 2KB block size on partition /dev/sda1.

Going in depth into a Linux process

At times the need arises to dig deep into a running process. We may see a process consuming a lot of system resources; in that case, we have to find out what that process is actually doing and which files it is accessing.

There are multiple ways:

1.       Using ‘ps’ with appropriate options  (This will show you the process tree of all processes)

# ps axjf 
# ps eauxf

2.       Using ‘lsof’   (‘-p’ and ‘-c’ show you the files opened by a particular process and command respectively)

# lsof -p  <PID>
# lsof -c  <Command name>

To find the PID of a process by name (e.g. smbd):
# pidof  <process name>      (e.g. # pidof smbd)

Wanna go still deeper?

3.       Using ‘strace’  (This traces a process: it records the system calls made by the process and the signals received by it)

#  strace -f -p  <PID>

How to find PID associated with a Port number

At times we need to find the PID associated with a particular port number. We may not be able to start/restart a particular service (say JBoss) because the port it needs is occupied by another process. You would have seen the error saying “Address already in use”.

Find the PID associated with the particular port, kill it if it is not required, and start the service.

A couple of straightforward commands:

# lsof -i      (or)  lsof -i :<port number>

# netstat -lantp   (or)  netstat -lautp


[root@hostx ~]# lsof -i:80                        # Second column shows the PID associated with HTTPD service running on port 80.
httpd   28064   root    4u  IPv6 27763387       TCP *:http (LISTEN)
httpd   28066 apache    4u  IPv6 27763387       TCP *:http (LISTEN)
httpd   28067 apache    4u  IPv6 27763387       TCP *:http (LISTEN)
httpd   28068 apache    4u  IPv6 27763387       TCP *:http (LISTEN)
httpd   28069 apache    4u  IPv6 27763387       TCP *:http (LISTEN)
httpd   28070 apache    4u  IPv6 27763387       TCP *:http (LISTEN)
httpd   28071 apache    4u  IPv6 27763387       TCP *:http (LISTEN)
httpd   28072 apache    4u  IPv6 27763387       TCP *:http (LISTEN)
httpd   28073 apache    4u  IPv6 27763387       TCP *:http (LISTEN)                                                                                                                         

[root@hostx ~]# lsof -i:123                   # 1582 is the PID for NTP service running on port 123
ntpd    1582  ntp   16u  IPv4   3830       UDP *:ntp
ntpd    1582  ntp   17u  IPv6   3831       UDP *:ntp
ntpd    1582  ntp   18u  IPv6   3846       UDP [::1]:ntp
ntpd    1582  ntp   19u  IPv6   3847       UDP [fe80::3c8c:7ff:fe96:2705]:ntp
ntpd    1582  ntp   20u  IPv6   3890       UDP [fe80::1c43:d9ff:fe07:dfc0]:ntp
ntpd    1582  ntp   21u  IPv4   3964       UDP localhost.localdomain:ntp
ntpd    1582  ntp   22u  IPv4   3965       UDP kesav01-t2:ntp

[root@hostx ~]# netstat -lantp | grep java      # The 4th column (Local Address) shows the local server port number and the last column shows the PID and process associated with it.
tcp        0      0     *                   LISTEN      20586/java
tcp        0      0     *                   LISTEN      20586/java
tcp        0      0  *                   LISTEN      20607/java
tcp        0      0  *                   LISTEN      20586/java
tcp        0      0     *                   LISTEN      20586/java
tcp        0      0     *                   LISTEN      20607/java
tcp        0      0            ESTABLISHED 20607/java
tcp        0      0            ESTABLISHED 20586/java
tcp        0      0             ESTABLISHED 20607/java
tcp        0      0             ESTABLISHED 20607/java
tcp        0      0             ESTABLISHED 20607/java
tcp        2      0             ESTABLISHED 20586/java
tcp        0      0             ESTABLISHED 20607/java
tcp        0      0             ESTABLISHED 20586/java
tcp        0      0             ESTABLISHED 20586/java
tcp        0      0             ESTABLISHED 20607/java
tcp        0      0             ESTABLISHED 20586/java
tcp        0      0             ESTABLISHED 20607/java
tcp        0      0             ESTABLISHED 20586/java
tcp        0      0             ESTABLISHED 20586/java
tcp        0      0             ESTABLISHED 20586/java
tcp        0      0             ESTABLISHED 20607/java
[root@hostx ~]#

[root@hostx ~]# netstat -lautp | grep java       # Without the ‘n’ option, host names and port numbers are resolved to names; the ‘u’ option additionally includes UDP sockets.
tcp        0      0 *:55980                     *:*                         LISTEN      20586/java
tcp        0      0 *:11025                     *:*                         LISTEN      20586/java
tcp        0      0 kseav01-t2:radan-http      *:*                         LISTEN      20607/java
tcp        0      0 kesav01-t2:8089            *:*                         LISTEN      20586/java
tcp        0      0 *:42587                     *:*                         LISTEN      20586/java
tcp        0      0 *:55933                     *:*                         LISTEN      20607/java
[root@hostx ~]#
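
When several processes of the same program hold ports, as with the two java PIDs above, the PIDs can be pulled out of the last column with a little awk. This is only a sketch; the function name pids_for_program is made up, and it assumes the program name contains no spaces.

```shell
#!/bin/sh
# pids_for_program NAME
# Hypothetical helper: reads "netstat -lantp" style lines on stdin and
# prints the unique PIDs whose last column is PID/NAME.
pids_for_program() {
    awk -v name="$1" '{
        n = split($NF, part, "/")          # last column looks like 20586/java
        if (n == 2 && part[2] == name) print part[1]
    }' | sort -un
}

# Sample input in the same layout as the netstat output above
pids_for_program java <<'EOF'
tcp        0      0 *:55980      *:*      LISTEN      20586/java
tcp        0      0 *:55933      *:*      LISTEN      20607/java
tcp        0      0 *:11025      *:*      LISTEN      20586/java
EOF
```

On a live system the input would come from the command itself, for example "netstat -lantp | pids_for_program java", run as root so that the PID column is populated.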

Memory leakage in Linux

What is Memory leakage?

Memory leakage basically refers to a situation where the memory allocated to an application (it can be a database query as well) is not getting freed up. This can be due to a bug in the program that uses the memory. In my opinion, when an application closes, it should exit properly so that it frees up the memory it occupied and terminates all its child processes.
So memory leakage is basically an application issue, and any leaked memory will be freed up after the application is killed or stopped.

How to find the Memory leakage?

We first need to determine which application/process is consuming more and more memory.  This can be done through data gathering: we need to periodically check what is running on the server and what the memory utilization is. Over time, when the server runs out of available memory to allocate, the leaking application might crash, or it may even crash the system.

Valgrind is a popular open source tool for detecting memory leaks in an application (URL:  http://valgrind.org/ ). A sample Valgrind output is attached.

How to fix it?

We need to correct the application so that it stops using more and more memory (done by a programmer or DBA).  I don’t think we can do much on the server side, since the job of the kernel is to hand out the memory requested by an application. Certain things we can do at the server end are tuning some kernel parameters, like restricting the number of child processes forked by a single process, restricting the number of processes forked by a user etc. If the kernel had the intelligence to distinguish which request for more memory is valid and which one is caused by a memory leak, it would be great, but I am not sure if it’s possible to do such a level of tuning :)

In Linux, likely candidates for memory leakage can be spotted using the 'ps' command given below. It lists all processes sorted by their memory usage.
# ps aux --sort pmem
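
The periodic data gathering described earlier can be sketched roughly as follows: sample the resident memory of one suspect process at intervals and watch whether it keeps growing. The sketch assumes a Linux /proc file-system; the function names rss_kb and sample_rss are made up for illustration.

```shell
#!/bin/sh
# rss_kb PID
# Hypothetical helper: prints the resident memory (VmRSS) of PID in kB,
# as reported by /proc/PID/status.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# sample_rss PID COUNT INTERVAL
# Hypothetical helper: prints COUNT samples, INTERVAL seconds apart,
# one "time rss" line per sample.
sample_rss() {
    i=0
    while [ "$i" -lt "$2" ]; do
        echo "$(date +%H:%M:%S) $(rss_kb "$1") kB"
        i=$((i + 1))
        if [ "$i" -lt "$2" ]; then
            sleep "$3"
        fi
    done
}

sample_rss $$ 3 1    # three samples of the current shell, one second apart
```

A value that rises steadily over hours or days, without ever dropping back, hints at a leak; in practice the interval would be minutes rather than seconds.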