Tuesday, March 31, 2009

Listener Hangs, Child Listener Process Remains Persistent

Problem Description
A few days ago the TNS listener on our database server hung. The symptoms of the problem were as follows.

•The lsnrctl status, lsnrctl stop and lsnrctl reload commands do not respond. They appear to hang after displaying the message connecting to ..... .

•No one from outside can connect to the database.

•Local connections that do not go through the listener work fine.

•The listener process uses more CPU than normal.

•The listener process forks. Fork is a UNIX term: it means the listener process creates a copy of itself. The copy is called the child process and the original is called the parent process. Under load the listener creates a child listener process, and that child remains persistent. Whenever we run ps -ef, two tnslsnr processes are shown, as below.

$ ps -ef | grep tnslsnr
oracle 3102 1 0 Jan 01 ? 12:28 /var/opt/oracle/bin/tnslsnr LISTENER -inherit
oracle 5012 3102 0 Jan 25 ? 10:15 /var/opt/oracle/bin/tnslsnr LISTENER -inherit


In this output the first line is the parent listener process and the second line is the child listener process. The parent process ID of the child (third column) is 3102, which is the PID of the parent listener.
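The first symptom in the list above, lsnrctl status never coming back, can also be checked from a script instead of by hand. Below is a minimal sketch, assuming the GNU coreutils timeout command is installed (it is not available on every UNIX). The function name listener_responds and the 30 second limit are only illustrative.

```shell
# Run a command with a time limit and report whether it finished in time.
# $1 = limit in seconds, remaining arguments = the command to run.
# Assumes GNU coreutils `timeout`, which exits with status 124
# when the limit is reached.
listener_responds() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
}

# Example (not run here):
#   listener_responds 30 lsnrctl status || echo "no answer in 30s, or an error"
```

A non-zero status only says that the command failed or timed out; combine it with the ps check before deciding that this particular problem is the cause.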

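Just killing the child process allows new connections to work until the problem recurs. So if you see the above output and the listener hangs, run
$ kill -9 5012 3102
Note that this command kills the parent listener (3102) as well as the child (5012), so the listener has to be started again afterwards with lsnrctl start.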

Cause of the Problem
This problem exists in Oracle 10.1.0.3, 10.1.0.4, 10.1.0.4.2, 10.1.0.5, 10.2.0.1 and 10.2.0.2. The listener hangs if the child listener process is not closed, i.e. it persists after being created. Note that child listener processes are not unusual in themselves; whether you see one depends on the traffic and on the moment the ps snapshot is taken. However, a secondary process that persists for longer than 5 seconds is not normal and may be a result of this problem.
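The check described above (a child tnslsnr that is still there more than 5 seconds later) can be sketched in shell. This is only an illustration, assuming a POSIX awk and a ps -ef layout like the sample output earlier in this post (PID in column 2, parent PID in column 3). The function names child_listeners and persistent_child_listeners and the default 10 second interval are made up for this example.

```shell
# Print the PIDs of tnslsnr processes whose parent is also a tnslsnr
# process. Reads `ps -ef` style output on stdin.
child_listeners() {
    awk '
        /tnslsnr/ && !/grep/ { pid[$2] = 1; ppid[$2] = $3 }
        END {
            for (p in pid)
                if (ppid[p] in pid) print p
        }
    ' | sort -n
}

# Take two snapshots $1 seconds apart (default 10, i.e. longer than the
# 5 seconds mentioned above) and print the child listener PIDs that
# appear in both of them.
persistent_child_listeners() {
    interval=${1:-10}
    first=$(ps -ef | child_listeners)
    sleep "$interval"
    second=$(ps -ef | child_listeners)
    for p in $first; do
        for q in $second; do
            if [ "$p" = "$q" ]; then echo "$p"; fi
        done
    done
}
```

Running persistent_child_listeners with no arguments prints nothing on a healthy system; any PID it prints is a candidate for the persistent child process described in this post.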

This listener hang can happen on a standalone server or on a RAC server.

Solution of the Problem
1) The issue is fixed in patchsets 10.2.0.3 and 10.2.0.4, so apply one of these patchsets.

2) Apply Patch 4518443, which is available on Metalink. Download it from Metalink and apply it on your database server.

3) As a workaround, if you do not want to apply a patch right now, you can follow the two steps below.

Step 01: Add the following entry to your listener.ora file.
SUBSCRIBE_FOR_NODE_DOWN_EVENT_{listener_name}=OFF

Where {listener_name} should be replaced with the actual listener name configured in the LISTENER.ORA file.

Suppose you have the default listener name, LISTENER. Then in the listener.ora file (by default located in $ORACLE_HOME/network/admin on UNIX) add the following entry on a new line:

SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF
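If you would rather script Step 01 than edit the file by hand, something like the following could be used. It is a rough sketch: the function name disable_node_down_event is invented for this example, the backup name listener.ora.bak is an arbitrary choice, and the check only looks for the parameter at the start of a line.

```shell
# Append SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener>=OFF to a listener.ora
# file unless a line for that parameter is already present.
# $1 = path to listener.ora, $2 = listener name (default LISTENER)
disable_node_down_event() {
    file=$1
    name=${2:-LISTENER}
    param="SUBSCRIBE_FOR_NODE_DOWN_EVENT_${name}"
    if grep -qi "^[[:space:]]*${param}[[:space:]]*=" "$file"; then
        echo "$param is already set in $file"
    else
        # Keep a copy of the original file before changing it.
        cp "$file" "$file.bak" || return 1
        # Make sure the new entry starts on its own line.
        if [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s=OFF\n' "$param" >> "$file"
    fi
}

# Example (not run here):
#   disable_node_down_event "$ORACLE_HOME/network/admin/listener.ora" LISTENER
```

Calling it a second time leaves the file unchanged, so it is safe to run more than once.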


Step 02: Go to the directory $ORACLE_HOME/opmn/conf, find ons.config and move it to another location or rename it. For example,

cd $ORACLE_HOME/opmn/conf
mv ons.config ons.config.bak


After completing the above two steps, bounce the listener.

lsnrctl stop
lsnrctl start


Alternatively, if database availability is important, you can simply issue
$ lsnrctl reload

Note that adding the SUBSCRIBE_FOR_NODE_DOWN_EVENT_{listener_name} entry to the listener.ora file on RAC and disabling the ONS file mean that FAN (Fast Application Notification) will not be possible. So if you have a RAC configuration, apply the patch and do not disable ONS or FAN.
