mute crash / DDB
Steve at fivetrees
*nix forums addict
Joined: 21 May 2005
Posts: 82

Posted: Thu Jul 06, 2006 12:20 pm    Post subject: Re: mute crash / DDB

"jpd" <read_the_sig@do.not.spam.it.invalid> wrote in message
news:4h4b2rF1q2r04U1@individual.net...
Quote:
Begin <IradnQyJctWRcDHZnZ2dnUVZ8qednZ2d@pipex.net>
On 2006-07-06, Steve at fivetrees <steve@NOSPAMTAfivetrees.com> wrote:
These machines, and their predecessors, and *their* predecessors, have
been very reliable - apart from the odd event such as this every few
months.

I don't know if your boxes are on a UPS, but if it really is that
sporadic and apparently fairly independent of the hardware, it even
might be anomalies in the power. If you have logs of previous incidents,
how regular are they, really?

They are indeed on a UPS. It's the machine that's taking most of the load
(active webserver) that goes mute every once in a long while (in the old
days, it'd reboot rather than die, as previously noted).

Re regularity: somewhere around 4-6 months. My guess (as I've said) is that
it's a resource issue. Note that I've seen this issue (assuming it's the
same issue, which seems likely) over several generations of both OpenBSD and
hardware [1]. I wasn't too worried about it when the symptom was a reboot;
I'm slightly more concerned now that it freezes, since it knocks out the
webserver etc until I notice and/or start getting phone calls. I have
monitoring enabled via my colocation provider, but since this works on the
basis of pings, it doesn't help :(.
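
A ping-based monitor only shows that the kernel is still answering ICMP, which is exactly the state described above. As a rough sketch (not what the colocation provider runs), a check that insists on getting at least one byte back from the service itself would flag this kind of hang. The function name, timeout, and probe strings are illustrative assumptions:

```python
import socket


def service_responds(host, port, probe=b"", timeout=5.0):
    """Return True only if the service sends back at least one byte.

    A successful connect() is not treated as proof of life; the check
    waits for actual data from the daemon on the other end.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            if probe:
                conn.sendall(probe)      # e.g. a minimal HTTP request
            return len(conn.recv(1)) > 0  # empty read means the peer closed
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Example calls (hypothetical host name):
#   service_responds("www.example.com", 22)   # sshd sends its banner first
#   service_responds("www.example.com", 80, b"HEAD / HTTP/1.0\r\n\r\n")
```

A plain connect test is deliberately avoided here: the kernel can complete the TCP handshake for a listening socket even when the daemon behind it has stopped running, so only a reply from the service counts.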

I note recent discussion on the misc@ list re "3.9 freeze" - which exactly
describes what I'm seeing - i.e. completely dead but still responds to
pings.

[1] Except 2.6, on which I managed to get around 480 days of uptime. I'm a
bit more proactive on patches and controlled reboots these days ;). Back
then I ran a custom kernel; I've double-checked for significant differences,
but I'll repeat the exercise.

Steve
http://www.fivetrees.com

jpd
*nix forums Guru
Joined: 22 Feb 2005
Posts: 877

Posted: Thu Jul 06, 2006 11:39 am    Post subject: Re: mute crash / DDB

Begin <IradnQyJctWRcDHZnZ2dnUVZ8qednZ2d@pipex.net>
On 2006-07-06, Steve at fivetrees <steve@NOSPAMTAfivetrees.com> wrote:
Quote:
These machines, and their predecessors, and *their* predecessors, have
been very reliable - apart from the odd event such as this every few
months.

I don't know if your boxes are on a UPS, but if it really is that
sporadic and apparently fairly independent of the hardware, it even
might be anomalies in the power. If you have logs of previous incidents,
how regular are they, really?


--
j p d (at) d s b (dot) t u d e l f t (dot) n l .
This message was originally posted on Usenet in plain text.
Any other representation, additions, or changes do not have my
consent and may be a violation of international copyright law.

Steve at fivetrees
*nix forums addict
Joined: 21 May 2005
Posts: 82

Posted: Thu Jul 06, 2006 10:59 am    Post subject: Re: mute crash / DDB

"DoN. Nichols" <dnichols@d-and-d.com> wrote in message
news:e8i7cu1fm4@news1.newsguy.com...
Quote:
According to Steve at fivetrees <steve@NOSPAMTAfivetrees.com>:
<jKILLSPAM.schipper@math.uu.nl> wrote in message
news:44aa6e7a$0$29097$dbd4b001@news.wanadoo.nl...
Steve at fivetrees <steve@nospamtafivetrees.com> wrote:

[ ... ]

Curious. What happened, then? Better still - how can I find out?

Mostly, waiting for the crash then fiddling around with ddb (which may
or may not help; it's not trivial). For colo'ed servers, this means a
serial line is a good idea.

Logically, yes. However my colocation supplier charges, quite reasonably, an
hourly fee for connection of a KVM (or presumably a serial port). If this
happens once every 3-6 months, this is probably not a viable solution ;).

You have two colo'd boxes right? At the same location? What
are the chances that you can request a null-modem cable between ttya on
the two machines? That way, whichever one goes down, you can use tip
from the other to check up on the down machine.

Ooo. What an excellent idea. (And why didn't I think of that??) I'll do
that.

Quote:
What is the hardware? I've been thinking of Sun hardware, where
you get the OpenBoot PROM on the serial port if everything else goes
down.

These are rack-mounted i386 machines.

Quote:
Asking for help is likely to provide you with good, but not very
specific, advice - 'upgrade'.

Yes, I realise. However: I've seen variations on this behaviour through
various releases of OBSD (see my earlier post). My best guess at the moment

[ ... ]

Also remote upgrading is decidedly scary ;).

I'll bet. How far away is the machine's location? *can* you
get physical access to do work on it?

They're in London; I'm on the South coast, about 70 miles away. So it's not
impossible.

But I remain unconvinced that an upgrade is the solution. These machines,
and their predecessors, and *their* predecessors, have been very reliable -
apart from the odd event such as this every few months. Having said that, I
will need to upgrade at some point - but again I'll probably upgrade the
machines too.

Thanks a great deal for the good thoughts.

Steve
(still glowing from the spamd installation success - see other thread ;))

http://www.fivetrees.com

DoN. Nichols
*nix forums beginner
Joined: 20 Feb 2005
Posts: 33

Posted: Thu Jul 06, 2006 5:34 am    Post subject: Re: mute crash / DDB

According to Steve at fivetrees <steve@NOSPAMTAfivetrees.com>:
Quote:
<jKILLSPAM.schipper@math.uu.nl> wrote in message
news:44aa6e7a$0$29097$dbd4b001@news.wanadoo.nl...
Steve at fivetrees <steve@nospamtafivetrees.com> wrote:

[ ... ]

Quote:
Curious. What happened, then? Better still - how can I find out?

Mostly, waiting for the crash then fiddling around with ddb (which may
or may not help; it's not trivial). For colo'ed servers, this means a
serial line is a good idea.

Logically, yes. However my colocation supplier charges, quite reasonably, an
hourly fee for connection of a KVM (or presumably a serial port). If this
happens once every 3-6 months, this is probably not a viable solution ;).

You have two colo'd boxes right? At the same location? What
are the chances that you can request a null-modem cable between ttya on
the two machines? That way, whichever one goes down, you can use tip
from the other to check up on the down machine.

What is the hardware? I've been thinking of Sun hardware, where
you get the OpenBoot PROM on the serial port if everything else goes
down.

[ ... ]

Quote:
Asking for help is likely to provide you with good, but not very
specific, advice - 'upgrade'.

Yes, I realise. However: I've seen variations on this behaviour through
various releases of OBSD (see my earlier post). My best guess at the moment

[ ... ]

Quote:
Also remote upgrading is decidedly scary ;).

I'll bet. How far away is the machine's location? *can* you
get physical access to do work on it?

Quote:
So far I've upgraded the
machines *and* the OS every 2 years or so, i.e. fresh installs on fresh
machines, plenty of testing, and then a simple change to the nameserver and
reverse DNS to use the new machines. These are fairly busy machines; I
daren't take them offline.

That can be a problem, indeed.

Good Luck,
DoN.
--
Email: <dnichols@d-and-d.com> | Voice (all times): (703) 938-4564
(too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
--- Black Holes are where God is dividing by zero ---

jKILLSPAM.schipper@math.u
*nix forums Guru Wannabe
Joined: 13 Nov 2005
Posts: 202

Posted: Tue Jul 04, 2006 4:40 pm    Post subject: Re: mute crash / DDB

Steve at fivetrees <steve@nospamtafivetrees.com> wrote:
Quote:
<jKILLSPAM.schipper@math.uu.nl> wrote in message
news:44aa6e7a$0$29097$dbd4b001@news.wanadoo.nl...
Steve at fivetrees <steve@nospamtafivetrees.com> wrote:
"MEOW" <mews@localhost.daemonium.com> wrote in message
news:slrneakbft.7qb.mews@localhost.daemonium.com...
On 2006-07-04, Steve at fivetrees <steve@NOSPAMTAfivetrees.com> wrote:
So: by keeping the standard kernel, have I replaced "very occasionally
reboot" with "very occasionally die to a DDB prompt, without
rebooting"?

Steve
(PS: clearly the real solution is to open up more resources via sysctl. But
I'd like to be sure I understand what's happening.)

The kernel does not respond to ICMP echo requests when in DDB.

Curious. What happened, then? Better still - how can I find out?

Mostly, waiting for the crash then fiddling around with ddb (which may
or may not help; it's not trivial). For colo'ed servers, this means a
serial line is a good idea.

Logically, yes. However my colocation supplier charges, quite reasonably, an
hourly fee for connection of a KVM (or presumably a serial port). If this
happens once every 3-6 months, this is probably not a viable solution ;).
(Unless I can find a reasonable cause and force the failure on demand. I
have test machines here, but I've been unable to replicate the failure on
them - probably I'm not able to load them up enough.)

Since you mentioned a failover box, I thought you might be able to wire
these two together. That would be quite a bit cheaper, I suppose.

Quote:
Asking for help is likely to provide you with good, but not very
specific, advice - 'upgrade'.

Yes, I realise. However: I've seen variations on this behaviour through
various releases of OBSD (see my earlier post). My best guess at the moment
is that a combination of time elapsed since booting and peak loading
conspire, once in a blue moon, to cause a condition where the OS runs out of
resources. Therefore I suspect an upgrade, by itself, is unlikely to solve
the issue. Before jumping into sysctl, I'd like to understand *which*
resources... or at least understand the nature of the failure, whatever the
cause.

Okay, that's not a bad idea.

Quote:
Also remote upgrading is decidedly scary ;). So far I've upgraded the
machines *and* the OS every 2 years or so, i.e. fresh installs on fresh
machines, plenty of testing, and then a simple change to the nameserver and
reverse DNS to use the new machines. These are fairly busy machines; I
daren't take them offline.

If, and only if, you can get proper failover working, upgrading is easy.

Otherwise, it's still easy, but as good as the upgrade process is
nowadays, I wouldn't try it without taking the box offline.

(Which is not to say it's not possible; but it's not necessarily good
for the heart...)

Joachim

Steve at fivetrees
*nix forums addict
Joined: 21 May 2005
Posts: 82

Posted: Tue Jul 04, 2006 1:49 pm    Post subject: Re: mute crash / DDB

<jKILLSPAM.schipper@math.uu.nl> wrote in message
news:44aa6e7a$0$29097$dbd4b001@news.wanadoo.nl...
Quote:
Steve at fivetrees <steve@nospamtafivetrees.com> wrote:
"MEOW" <mews@localhost.daemonium.com> wrote in message
news:slrneakbft.7qb.mews@localhost.daemonium.com...
On 2006-07-04, Steve at fivetrees <steve@NOSPAMTAfivetrees.com> wrote:
So: by keeping the standard kernel, have I replaced "very occasionally
reboot" with "very occasionally die to a DDB prompt, without
rebooting"?

Steve
(PS: clearly the real solution is to open up more resources via sysctl. But
I'd like to be sure I understand what's happening.)

The kernel does not respond to ICMP echo requests when in DDB.

Curious. What happened, then? Better still - how can I find out?

Mostly, waiting for the crash then fiddling around with ddb (which may
or may not help; it's not trivial). For colo'ed servers, this means a
serial line is a good idea.

Logically, yes. However my colocation supplier charges, quite reasonably, an
hourly fee for connection of a KVM (or presumably a serial port). If this
happens once every 3-6 months, this is probably not a viable solution ;).
(Unless I can find a reasonable cause and force the failure on demand. I
have test machines here, but I've been unable to replicate the failure on
them - probably I'm not able to load them up enough.)

Quote:
Asking for help is likely to provide you with good, but not very
specific, advice - 'upgrade'.

Yes, I realise. However: I've seen variations on this behaviour through
various releases of OBSD (see my earlier post). My best guess at the moment
is that a combination of time elapsed since booting and peak loading
conspire, once in a blue moon, to cause a condition where the OS runs out of
resources. Therefore I suspect an upgrade, by itself, is unlikely to solve
the issue. Before jumping into sysctl, I'd like to understand *which*
resources... or at least understand the nature of the failure, whatever the
cause.

Also remote upgrading is decidedly scary ;). So far I've upgraded the
machines *and* the OS every 2 years or so, i.e. fresh installs on fresh
machines, plenty of testing, and then a simple change to the nameserver and
reverse DNS to use the new machines. These are fairly busy machines; I
daren't take them offline.

Ah well. Thanks for the input.

Steve
http://www.fivetrees.com

jKILLSPAM.schipper@math.u
*nix forums Guru Wannabe
Joined: 13 Nov 2005
Posts: 202

Posted: Tue Jul 04, 2006 1:34 pm    Post subject: Re: mute crash / DDB

Steve at fivetrees <steve@nospamtafivetrees.com> wrote:
Quote:
"MEOW" <mews@localhost.daemonium.com> wrote in message
news:slrneakbft.7qb.mews@localhost.daemonium.com...
On 2006-07-04, Steve at fivetrees <steve@NOSPAMTAfivetrees.com> wrote:
So: by keeping the standard kernel, have I replaced "very occasionally
reboot" with "very occasionally die to a DDB prompt, without rebooting"?

Steve
(PS: clearly the real solution is to open up more resources via sysctl. But
I'd like to be sure I understand what's happening.)

The kernel does not respond to ICMP echo requests when in DDB.

Curious. What happened, then? Better still - how can I find out?

Mostly, waiting for the crash then fiddling around with ddb (which may
or may not help; it's not trivial). For colo'ed servers, this means a
serial line is a good idea.

Asking for help is likely to provide you with good, but not very
specific, advice - 'upgrade'.

Joachim

Steve at fivetrees
*nix forums addict
Joined: 21 May 2005
Posts: 82

Posted: Tue Jul 04, 2006 12:14 pm    Post subject: Re: mute crash / DDB

"MEOW" <mews@localhost.daemonium.com> wrote in message
news:slrneakbft.7qb.mews@localhost.daemonium.com...
Quote:
On 2006-07-04, Steve at fivetrees <steve@NOSPAMTAfivetrees.com> wrote:
So: by keeping the standard kernel, have I replaced "very occasionally
reboot" with "very occasionally die to a DDB prompt, without rebooting"?

Steve
(PS: clearly the real solution is to open up more resources via sysctl. But
I'd like to be sure I understand what's happening.)

The kernel does not respond to ICMP echo requests when in DDB.

Curious. What happened, then? Better still - how can I find out?

Quote:
The sysctl.conf
file says if ddb.panic is 0 then the OS will not drop to DDB but will rather
reboot.

That line is commented out by default. Perhaps I should enable those lines.
However, I'm now not confident I've identified the failure scenario.
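
For reference, whether ddb.panic is actually being set by a given sysctl.conf can be checked mechanically, since a commented-out line sets nothing. The sketch below is only an illustration: the helper name is made up, and the sample text is a stand-in rather than a copy of the shipped file.

```python
def effective_setting(conf_text, name):
    """Return the last uncommented value assigned to `name`, or None.

    Everything after a '#' on a line is treated as a comment, so a
    line such as '#ddb.panic=0' does not count as a setting.
    """
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            value = val.strip()               # later lines override earlier ones
    return value


# Stand-in file contents (an assumption, not the real default file).
SAMPLE = "#ddb.panic=0\t# do not drop into ddb on a panic\nnet.inet.ip.forwarding=1\n"
```

If the helper returns None, the file is not setting the value, and running `sysctl ddb.panic` on the machine shows what the kernel is actually using.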

Thanks for the response.

Steve
http://www.fivetrees.com

MEOW
*nix forums beginner
Joined: 04 Jul 2006
Posts: 1

Posted: Tue Jul 04, 2006 8:59 am    Post subject: Re: mute crash / DDB

On 2006-07-04, Steve at fivetrees <steve@NOSPAMTAfivetrees.com> wrote:
Quote:
So: by keeping the standard kernel, have I replaced "very occasionally
reboot" with "very occasionally die to a DDB prompt, without rebooting"?

Steve
(PS: clearly the real solution is to open up more resources via sysctl. But
I'd like to be sure I understand what's happening.)

The kernel does not respond to ICMP echo requests when in DDB. The sysctl.conf
file says if ddb.panic is 0 then the OS will not drop to DDB but will rather
reboot.

Steve at fivetrees
*nix forums addict
Joined: 21 May 2005
Posts: 82

Posted: Tue Jul 04, 2006 1:39 am    Post subject: mute crash / DDB

Tonight, one of my colo'ed (remote) webservers (OpenBSD 3.7) died. It
responded to a ping, but all other services were down. No SSH, no HTTP,
nothing.

My other (mirrored) server was fine, so I requested an automated power cycle
of the dead one - a few minutes later it came back up fine. I checked the
logs - nothing. All the logs showed normal traffic until a certain time,
then nothing - only the rebooting process. (I did a lot of checking of
system files etc after this; all was fine.)

Now - if memory serves, the only other time I've seen this was with a local
server which had run out of Samba file handles - it panicked (IIRC) and
dropped to a DDB prompt. In that case, the only evidence I could find (IIRC)
was in the Samba logs. Again it responded to a ping, and nothing else.

When I started out webhosting, I had help from an OpenBSD committer. He
commented out the DDB kernel lines. (This was back in the days of OpenBSD
2.6.) More recently I'd read so many times that custom kernels are uncool
and unnecessary, I've omitted this step and stuck with GENERIC.

In the past (with DDB commented out), one of my servers (through various OS
versions) would reboot itself once every 3-6 months or so. Again it seemed
to have been a "help - I've run out of resources" problem. But it survived -
it rebooted by itself and all was well.

So: by keeping the standard kernel, have I replaced "very occasionally
reboot" with "very occasionally die to a DDB prompt, without rebooting"?

Steve
(PS: clearly the real solution is to open up more resources via sysctl. But
I'd like to be sure I understand what's happening.)

http://www.fivetrees.com
All times are GMT.