occasional SIGSEGV in exim4 under heavy load
Chris Lightfoot
Posted: Wed Jul 12, 2006 2:21 pm    Post subject: occasional SIGSEGV in exim4 under heavy load

(I've already submitted this as a bug in the Debian exim
package, but I've now reproduced it on stock 4.62, so it
may be of interest to this list as well.)

Under heavy SMTP load, we occasionally observe the exim4
daemon crashing (with the result that no further
connections can be accepted, obviously). We can reproduce
this here with the `postal' SMTP benchmark (package
postal) and the following command-line:

postal -p 10 -c 10 -m 1 localhost users -

(running on the host running exim4). The file `users'
contains a single line with an email address; in this case
I used a local address aliased to /dev/null in
/etc/aliases. While running these tests I had the whole of
/var/spool/exim4 mounted on a tmpfs (to simulate a
hardware configuration with very fast disk); since the bug
is likely timing-related you may have to do the same to
reproduce it. Here it typically shows up within the
first five minutes of postal's run.

Here's a stack trace from the exim4 daemon when it
crashes:

#0 0x00000000 in ?? ()
#1 0x40361825 in __pthread_sighandler () from /lib/libpthread.so.0
#2 <signal handler called>
#3 0x403d05d9 in __libc_sigaction () from /lib/libc.so.6
#4 0x4035e828 in sigaction () from /lib/libpthread.so.0
#5 0x080866c5 in os_non_restarting_signal (sig=17, handler=0x805c930 <main_sigchld_handler>) at os.c:267
#6 0x0805e9f3 in daemon_go () at daemon.c:1842
#7 0x0806e06b in main (argc=3, cargv=0xbfffdbc4) at exim.c:3922

-- for reasons related to the Debian package, the line
numbers in os.c in that trace don't correspond to those in
the official source tree. The relevant source line in the
official source is 103.

Looking at the backtrace, it appears that what's happened
is that a signal (presumably SIGCHLD) has arrived while
os_non_restarting_signal is running. The SIGCHLD handler
itself calls os_non_restarting_signal, and a crash
results. I'm not sure why, though -- there's nothing in
the code for that function that's obviously nonreentrant
(it only uses automatic variables and calls sigaction(2),
which is async-signal-safe).

Note that exim in this case is linked against -lpthread,
presumably because it's also linked against -lpq. The
problem does not occur (or, if it does occur, does so
sufficiently rarely as not to have been caught by my
tests) in an exim which is not linked against -lpthread.

The following patch to src/os.c, which blocks the signal
for which a handler is being installed over the call to
sigaction, appears to fix the problem. That is at least
compatible with the above hypothesis, though it's not a
great fix.

--- os.c.orig 2006-07-11 18:02:09.000000000 +0100
+++ os.c 2006-07-11 18:05:15.000000000 +0100
@@ -261,11 +261,20 @@
 
 #ifdef SA_RESTART
 struct sigaction act;
+sigset_t mask, curmask;
+
+sigemptyset(&mask);
+sigprocmask(SIG_BLOCK, &mask, &curmask);
+sigaddset(&mask, sig);
+sigprocmask(SIG_SETMASK, &mask, NULL);
+
 act.sa_handler = handler;
 sigemptyset(&(act.sa_mask));
 act.sa_flags = 0;
 sigaction(sig, &act, NULL);
 
+sigprocmask(SIG_SETMASK, &curmask, NULL);
+
 #ifdef STAND_ALONE
 printf("Used sigaction() with flags = 0\n");
 #endif

-- although looking at that patch now I'm left wondering,
what *was* I thinking? What I meant was,

sigemptyset(&mask);
sigaddset(&mask, sig);
sigprocmask(SIG_BLOCK, &mask, &curmask);

/* ... */

sigprocmask(SIG_SETMASK, &curmask, NULL);

or similar. Actually I don't think there'd be any harm in
blocking all signals over the call to sigaction. I haven't
tried that though.

--
``Nothing so gives the illusion of intelligence
as personal association with large sums of money.''
(John Kenneth Galbraith)

--
## List details at http://www.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://www.exim.org/eximwiki/
Marc Haber
Posted: Tue Jul 18, 2006 4:29 pm    Post subject: Re: occasional SIGSEGV in exim4 under heavy load

On Wed, 12 Jul 2006 15:21:05 +0100, Chris Lightfoot
<chris@ex-parrot.com> wrote:
> Under heavy SMTP load, we occasionally observe the exim4
> daemon crashing (with the result that no further
> connections can be accepted, obviously).

This seems to be a kernel issue, as it cannot be reproduced on any
system running a Linux 2.6 kernel. The original poster uses Linux 2.4.

See also the Debian BTS, http://bugs.debian.org/377857

Greetings
Marc

--
-------------------------------------- !! No courtesy copies, please !! -----
Marc Haber | " Questions are the | Mailadresse im Header
Mannheim, Germany | Beginning of Wisdom " | http://www.zugschlus.de/
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 621 72739834

Chris Lightfoot
Posted: Thu Jul 20, 2006 10:40 pm    Post subject: Re: occasional SIGSEGV in exim4 under heavy load

On Tue, Jul 18, 2006 at 06:29:22PM +0200, Marc Haber wrote:
> On Wed, 12 Jul 2006 15:21:05 +0100, Chris Lightfoot
> <chris@ex-parrot.com> wrote:
> > Under heavy SMTP load, we occasionally observe the exim4
> > daemon crashing (with the result that no further
> > connections can be accepted, obviously).
>
> This seems to be a kernel issue, as it cannot be reproduced on any
> system running a Linux 2.6 kernel. The original poster uses Linux 2.4.
>
> See also the Debian BTS, http://bugs.debian.org/377857

I doubt it's a kernel issue, though a libc bug looks
probable. However, I haven't yet managed to reproduce it
outside exim, so I'm not sure of that yet.

Test code here (tell me if you can make it crash on linux
2.4 -- compile it with -lpthread):
http://bitter.ukcod.org.uk/~chris/tmp/20060720/sigchld.c

--
``What does it mean? It means I never have to work again.''
(Don McLean, on `American Pie', attrib.)
