David Mathog *nix forums Guru Wannabe
Joined: 23 Feb 2005
Posts: 145
Posted: Tue Jun 21, 2005 3:47 pm
Post subject: what does "link beat lost" indicate?

What does it indicate when an interface shows pairs of
messages like these
Jun 21 10:09:20 myserver ifplugd(eth1)[1607]: Link beat lost.
Jun 21 10:09:21 myserver ifplugd(eth1)[1607]: Link beat detected.
when under heavy load?
In all cases these "outages" last exactly 1 second (or at
least the lost/detected are logged one second apart).
This happens when the interface is heavily loaded either
transmitting or receiving. (Or probably both at once, but
I've not actually tested that.)
For instance, today it was triggered several times by a ufsdump
from a Solaris 8 box to the linux box. So far this has generated
4 of these events over about 50 minutes.
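One way to check whether every outage really lasts one second is to pair up the lost/detected messages in the log. The sketch below is only an illustration: it assumes the syslog line format shown above, English month abbreviations, and the year 2005 (syslog does not record the year). The names `SAMPLE_LOG` and `link_outages` are made up for this example.

```python
import re
from datetime import datetime

# Sample lines copied from the post above. In practice the text would be
# read from the system log file instead.
SAMPLE_LOG = """\
Jun 21 10:09:20 myserver ifplugd(eth1)[1607]: Link beat lost.
Jun 21 10:09:21 myserver ifplugd(eth1)[1607]: Link beat detected.
"""

# Assumed line layout: "<Mon> <day> <time> <host> ifplugd(<iface>)[<pid>]: Link beat <state>."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ "
    r"ifplugd\((?P<iface>[^)]+)\)\[\d+\]: Link beat (?P<state>lost|detected)\."
)


def link_outages(log_text, year=2005):
    """Return a list of (interface, seconds) for each lost -> detected pair."""
    lost_at = {}   # interface -> timestamp of the pending "lost" message
    outages = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        # The year is an assumption; syslog timestamps do not include it.
        when = datetime.strptime(f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S")
        iface = match["iface"]
        if match["state"] == "lost":
            lost_at[iface] = when
        elif iface in lost_at:
            gap = when - lost_at.pop(iface)
            outages.append((iface, gap.total_seconds()))
    return outages


print(link_outages(SAMPLE_LOG))
```

Since the log has one-second resolution, a reported gap of 1.0 only means the two messages fell in adjacent seconds; the real outage could be anywhere from just over zero to just under two seconds.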
Thanks,
David Mathog
mathog@caltech.edu

Darren *nix forums addict
Joined: 01 Mar 2005
Posts: 84
Posted: Wed Jun 22, 2005 6:24 am
Post subject: Re: what does "link beat lost" indicate?

David Mathog wrote:
| Quote: | What does it indicate when an interface shows pairs of
messages like these
Jun 21 10:09:20 myserver ifplugd(eth1)[1607]: Link beat lost.
Jun 21 10:09:21 myserver ifplugd(eth1)[1607]: Link beat detected.
when under heavy load?
In all cases these "outages" last exactly 1 second (or at
least the lost/detected are logged one second apart).
This happens when the interface is heavily loaded either
transmitting or receiving. (Or probably both at once, but
I've not actually tested that.)
For instance, today it was triggered several times by a ufsdump
from a Solaris 8 box to the linux box. So far this has generated
4 of these events over about 50 minutes.
Thanks,
David Mathog
mathog@caltech.edu
|
The ifplugd daemon is detecting a cable disconnection. I would say you
should check your cable/plug integrity, and since your problem is not so
random, you may also need to look at your Ethernet card; some components may
be failing under heavy traffic.
--
Peace

David Mathog *nix forums Guru Wannabe
Joined: 23 Feb 2005
Posts: 145
Posted: Wed Jun 22, 2005 2:05 pm
Post subject: Re: what does "link beat lost" indicate?

Darren wrote:
| Quote: | David Mathog wrote:
What does it indicate when an interface shows pairs of
messages like these
Jun 21 10:09:20 myserver ifplugd(eth1)[1607]: Link beat lost.
Jun 21 10:09:21 myserver ifplugd(eth1)[1607]: Link beat detected.
when under heavy load?
The ifplugd daemon is detecting a cable disconnection. I would say you
should check your cable/plug integrity, and since your problem is not so
random, you may also need to look at your Ethernet card; some components may
be failing under heavy traffic.
|
The cables are all fine. How do I distinguish between an iffy
switch (one possible culprit) and some bug in the Linux driver and/or
TCP/IP stack and/or ifplugd (all equally likely)? For instance,
if the interface is very heavily loaded I worry that ifplugd might
not be able to access it as often as it wants, and so throws a "link
beat lost" message. ifconfig shows for this interface:
eth1      Link encap:Ethernet  HWaddr 00:E0:81:22:2F:E7
          inet addr:192.168.1.220  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::2e0:81ff:fe22:2fe7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:85160803 errors:0 dropped:0 overruns:8356 frame:0
          TX packets:63192569 errors:0 dropped:0 overruns:0 carrier:1
          collisions:0 txqueuelen:1000
          RX bytes:684742770 (653.0 Mb)  TX bytes:390586093 (372.4 Mb)
          Interrupt:19 Base address:0x2480
which looks reasonably healthy to me, with no errors and no dropped.
ifplugd is running at prio 16 (in top), as are atalkd, smbd, and just
about every other network daemon. Could this just be the result
of a busy system plus a busy network? Perhaps I should just crank
the ifplugd poll time up from 1 to 5 seconds and then these messages
will go away?
I suppose I could also borrow another switch and see if it does the same
thing under load. That wouldn't be as definitive as I'd like though
since this bug, wherever it is, could be timing dependent, and that
might vary in very subtle ways between switches.
Can you think of a network diagnostic tool, preferably one that
runs on DOS or the lightest possible linux, to separate out these
variables?
Thanks,
David Mathog
mathog@caltech.edu

De Kameel *nix forums addict
Joined: 02 Mar 2005
Posts: 84
Posted: Wed Jun 22, 2005 6:38 pm
Post subject: Re: what does "link beat lost" indicate?

David Mathog wrote:
| Quote: | The cables are all fine. How do I distinguish between an iffy
switch (one possible culprit) and some bug in the Linux driver and/or
TCP/IP stack and/or ifplugd (all equally likely)?
|
I have a system with a Tulip NIC in it. Everything had been OK, until I
tried going to a newer Mandrake (8.x or 9.0, yes old story): during the
install I got the "link beat lost". Cause of the problem: the newer
Mandrake used another driver for the Tulip NIC.
So, yes, it could be a driver thing.
I never solved this and stayed on the old Mandrake.
De Kameel

Moe Trin *nix forums Guru
Joined: 20 Feb 2005
Posts: 972
Posted: Thu Jun 23, 2005 10:07 pm
Post subject: Re: what does "link beat lost" indicate?

In the Usenet newsgroup alt.os.linux.mandrake, in article
<d9c291$grh$1@naig.caltech.edu>, David Mathog wrote:
| Quote: | For instance, if the interface is very heavily loaded I worry that
ifplugd might not be able to access it as often as it wants, and so
throws a "link beat lost" message. ifconfig shows for this interface:
|
Close - it definitely is a load problem on your computer.
| Quote: | eth1 Link encap:Ethernet HWaddr 00:E0:81:22:2F:E7
|
[compton ~]$ etherwhois 00:E0:81
00-E0-81 (hex) TYAN COMPUTER CORP.
00E081 (base 16) TYAN COMPUTER CORP.
1753 S. MAIN STREET
MILPITAS CA 95035
UNITED STATES
00E081 Tyan Computer Corp. Onboard Intel 82558 10/100
00E081 Tyan Computer Corp. also reported as a 3C982 (3c59x.c)
[compton ~]$
Which driver is being used? Look at the boot messages, or run 'lsmod'.
| Quote: | UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:85160803 errors:0 dropped:0 overruns:8356 frame:0
|
You are doing something that isn't leaving enough time for the system
to get the packets out of the network card's receive buffer before they
get overwritten by the next ones. _NORMALLY_ this is a DMA problem or
IRQ saturation.
| Quote: | which looks reasonably healthy to me, with no errors and no dropped.
|
8356 / 85160803 is a small number, but it _should_ be zero.
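To put that ratio in perspective, the counters can be pulled straight out of the RX line. This is just a sketch that assumes the old-style `name:value` ifconfig layout quoted above; `RX_LINE` and `rx_counters` are illustrative names, not part of any tool.

```python
import re

# RX line copied from the ifconfig output quoted above.
RX_LINE = "RX packets:85160803 errors:0 dropped:0 overruns:8356 frame:0"


def rx_counters(line):
    """Turn the 'name:value' pairs of an ifconfig RX/TX line into a dict."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


counters = rx_counters(RX_LINE)
overrun_ratio = counters["overruns"] / counters["packets"]

print(f"overruns per received packet: {overrun_ratio:.6%}")
print(f"about 1 overrun per {counters['packets'] // counters['overruns']} packets")
```

That works out to roughly one overrun per ten thousand packets: small, but on a healthy interface the counter stays at zero.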
| Quote: | Interrupt:19 Base address:0x2480
|
IRQ 19 is _normally_ a low priority interrupt.
| Quote: | ifplugd is running at prio 16 (in top), as are atalkd, smbd, and just
about every other network daemon. Could this just be the result
of a busy system plus a busy network?
|
Do you need all the extra network stuff? Saw a similar problem recently
where the system was getting the snot beat out of it, and AN improvement
was had by tweaking kernel parameters. Try a repost over in the main line
group 'comp.os.linux.networking'.
Old guy

Moe Trin *nix forums Guru
Joined: 20 Feb 2005
Posts: 972
Posted: Sat Jun 25, 2005 9:21 pm
Post subject: Re: what does "link beat lost" indicate?

In the Usenet newsgroup comp.os.linux.networking, in article
<d9hccs$apm$1@naig.caltech.edu>, David Mathog wrote:
| Quote: | Well, ideally. The compute nodes, which are Tyan S2466MPX boards
also show a small number of RX overruns when heavily loaded. But
they don't show the link beat lost messages.
|
Just for giggles, what kind of loads are you running? It might be that
the head node is more heavily loaded than the compute nodes.
| Quote: | While both of these are SMP boards, only the head node actually has
two processors installed. Could it be an SMP issue?
|
I don't know why it would be, but I have limited experience with SMP.
Interestingly,
| Quote: | cat /proc/interrupts (on the head node, which has this problem)
           CPU0       CPU1
  0:    5279498 1294858489   IO-APIC-edge  timer
 19:   62326629   78396949  IO-APIC-level  ohci_hcd, eth1
|
The interrupts are (more or less) evenly split,
| Quote: | 20: 21834581 17790232 IO-APIC-level aic7xxx
21: 10723291 12359846 IO-APIC-level aic7xxx
|
and you're throwing a lot of bits at the SCSI chains. I still wonder if
this isn't a DMA issue.
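For anyone who wants to put numbers on "more or less evenly split", a rough sketch like the one below computes the per-CPU share of each IRQ. It assumes the two-CPU /proc/interrupts layout quoted above; `SAMPLE`, `parse_interrupts` and `cpu_shares` are names invented for this example.

```python
# Lines copied from the quoted /proc/interrupts output (head node).
SAMPLE = """\
           CPU0       CPU1
  0:    5279498 1294858489   IO-APIC-edge  timer
 19:   62326629   78396949  IO-APIC-level  ohci_hcd, eth1
 20:   21834581   17790232  IO-APIC-level  aic7xxx
 21:   10723291   12359846  IO-APIC-level  aic7xxx
"""


def parse_interrupts(text):
    """Map IRQ label -> (list of per-CPU counts, description)."""
    lines = text.splitlines()
    cpu_count = len(lines[0].split())       # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < cpu_count:
            continue                        # row without per-CPU counts
        counts = [int(n) for n in fields[:cpu_count]]
        table[label.strip()] = (counts, " ".join(fields[cpu_count:]))
    return table


def cpu_shares(counts):
    """Fraction of an IRQ's interrupts handled by each CPU."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


irqs = parse_interrupts(SAMPLE)
for label, (counts, desc) in irqs.items():
    shares = ", ".join(f"{s:.0%}" for s in cpu_shares(counts))
    print(f"IRQ {label:>3} ({desc}): {shares}")
```

On the quoted figures, IRQ 19 comes out at roughly 44% on CPU0 and 56% on CPU1, while the timer on IRQ 0 is handled almost entirely by CPU1.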
| Quote: | now on one of the compute nodes (which don't have "link beat lost")
|
yeah, but notice that the eth0 is dominating the IRQs - wayyyy beyond the
IRQ 0 timer.
| Quote: | So the head node is sharing that interrupt with, umm, the USB
controller. There are no USB devices attached to this system.
|
and that should negate the sharing as a problem, unless there is an
electrically noisy location (and I'd expect that to show as errors
elsewhere).
| Quote: | Looks like I'll need to go into the BIOS to change the interrupts
so that 19 isn't shared, assuming that is even possible. Have
to wait until the server can be shut down to try this.
|
Look at the DMA setup first.
| Quote: | Do you need all the extra network stuff?
Yes.
|
I was just checking - AppleTalk and Samba in addition to regular TCP/IP?
| Quote: | Try a repost over in the main line group 'comp.os.linux.networking'.
Cross posted this response there.
|
Hopefully, we'll see additional responses.
You're quite welcome - wish I was being more helpful.
Old guy