niXforums Forum Index
what does "link beat lost" indicate?
David Mathog
*nix forums Guru Wannabe


Joined: 23 Feb 2005
Posts: 145

Posted: Tue Jun 21, 2005 3:47 pm    Post subject: what does "link beat lost" indicate?

What does it indicate when an interface shows pairs of
messages like these

Jun 21 10:09:20 myserver ifplugd(eth1)[1607]: Link beat lost.
Jun 21 10:09:21 myserver ifplugd(eth1)[1607]: Link beat detected.

when under heavy load?

In all cases these "outages" last exactly 1 second (or at
least the lost/detected are logged one second apart).

This happens when the interface is heavily loaded either
transmitting or receiving. (Or probably both at once, but
I've not actually tested that.)

For instance, today it was triggered several times by a ufsdump
from a Solaris 8 box to the Linux box. So far this has generated
four of these events over about 50 minutes.

Thanks,

David Mathog
mathog@caltech.edu
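
To check whether every outage really lasts one second, the lost/detected pairs can be matched up from the log. The following is only a minimal sketch: the function name `link_outages`, the regular expression, and the `year` parameter are assumptions made for illustration (syslog timestamps carry no year), and it assumes an English locale for the month abbreviation.

```python
from datetime import datetime
import re

# Matches the ifplugd syslog lines quoted in the post above.
# Syslog timestamps omit the year, so the caller supplies one (an assumption).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ "
    r"ifplugd\((?P<iface>[^)]+)\)\[\d+\]: Link beat (?P<state>lost|detected)\."
)

def link_outages(lines, year=2005):
    """Return (iface, lost_time, seconds_down) for each lost/detected pair."""
    pending = {}   # iface -> time at which the link beat was lost
    outages = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not an ifplugd link-beat message
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        iface = m["iface"]
        if m["state"] == "lost":
            pending[iface] = when
        elif iface in pending:
            lost = pending.pop(iface)
            outages.append((iface, lost, (when - lost).total_seconds()))
    return outages

sample = [
    "Jun 21 10:09:20 myserver ifplugd(eth1)[1607]: Link beat lost.",
    "Jun 21 10:09:21 myserver ifplugd(eth1)[1607]: Link beat detected.",
]
for iface, lost, secs in link_outages(sample):
    print(iface, lost.isoformat(), secs)
```

Fed the whole log instead of the two sample lines, this would show whether the gaps are always one second, which would point at the polling interval rather than at a real loss of carrier.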
Darren
*nix forums addict


Joined: 01 Mar 2005
Posts: 84

Posted: Wed Jun 22, 2005 6:24 am    Post subject: Re: what does "link beat lost" indicate?

David Mathog wrote:

Quote:
What does it indicate when an interface shows pairs of
messages like these

Jun 21 10:09:20 myserver ifplugd(eth1)[1607]: Link beat lost.
Jun 21 10:09:21 myserver ifplugd(eth1)[1607]: Link beat detected.

when under heavy load?

In all cases these "outages" last exactly 1 second (or at
least the lost/detected are logged one second apart).

This happens when the interface is heavily loaded either
transmitting or receiving. (Or probably both at once, but
I've not actually tested that.)

For instance, today it was triggered several times by a ufsdump
from a Solaris 8 box to the linux box. So far this has generated
4 of these events over about 50 minutes.

Thanks,

David Mathog
mathog@caltech.edu

The ifplugd daemon is detecting a cable disconnection. I would say you
should check your cable/plug integrity, and since your problem is not so
random, you may also need to look at your Ethernet card; some components may
be failing under heavy traffic.
--
Peace
David Mathog
*nix forums Guru Wannabe


Joined: 23 Feb 2005
Posts: 145

Posted: Wed Jun 22, 2005 2:05 pm    Post subject: Re: what does "link beat lost" indicate?

Darren wrote:
Quote:
David Mathog wrote:

What does it indicate when an interface shows pairs of
messages like these

Jun 21 10:09:20 myserver ifplugd(eth1)[1607]: Link beat lost.
Jun 21 10:09:21 myserver ifplugd(eth1)[1607]: Link beat detected.

when under heavy load?

The ifplugd daemon is detecting a cable disconnection. I would say you
should check your cable/plug integrity, and since your problem is not so
random, you may also need to look at your Ethernet card; some components may
be failing under heavy traffic.

The cables are all fine. How do I distinguish between an iffy
switch (one possible culprit) and some bug in the Linux driver and/or
TCP/IP stack and/or ifplugd (all equally likely)? For instance,
if the interface is very heavily loaded, I worry that ifplugd might
not be able to access it as often as it wants, and so throws a "link
beat lost" message. ifconfig shows for this interface:

eth1      Link encap:Ethernet  HWaddr 00:E0:81:22:2F:E7
          inet addr:192.168.1.220  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::2e0:81ff:fe22:2fe7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:85160803 errors:0 dropped:0 overruns:8356 frame:0
          TX packets:63192569 errors:0 dropped:0 overruns:0 carrier:1
          collisions:0 txqueuelen:1000
          RX bytes:684742770 (653.0 Mb)  TX bytes:390586093 (372.4 Mb)
          Interrupt:19 Base address:0x2480

which looks reasonably healthy to me, with no errors and no dropped packets.
ifplugd is running at prio 16 (in top), as are atalkd, smbd and just
about every other network daemon. Could this just be the result
of a busy system plus a busy network? Perhaps I should just crank
the ifplugd poll time up from 1 to 5 seconds and then these messages
will go away?


I suppose I could also borrow another switch and see if it does the same
thing under load. That wouldn't be as definitive as I'd like though
since this bug, wherever it is, could be timing dependent, and that
might vary in very subtle ways between switches.

Can you think of a network diagnostic tool, preferably one that
runs on DOS or the lightest possible linux, to separate out these
variables?

Thanks,

David Mathog
mathog@caltech.edu
De Kameel
*nix forums addict


Joined: 02 Mar 2005
Posts: 84

Posted: Wed Jun 22, 2005 6:38 pm    Post subject: Re: what does "link beat lost" indicate?

David Mathog wrote:

Quote:
The cables are all fine.  How do I distinguish between an iffy
switch (one possible culprit) and some bug in the linux driver and or
tcp/ip stack and/or ifplugd (all equally likely)?

I have a system with a Tulip NIC in it. Everything had been OK, until I
tried going to a newer Mandrake (8.x or 9.0, yes old story): during the
install I got the "link beat lost". Cause of the problem: the newer
Mandrake used another driver for the Tulip NIC.

So, yes, it could be a driver thing.

I never solved this and stayed on the old Mandrake.

De Kameel
Moe Trin
*nix forums Guru


Joined: 20 Feb 2005
Posts: 972

Posted: Thu Jun 23, 2005 10:07 pm    Post subject: Re: what does "link beat lost" indicate?

In the Usenet newsgroup alt.os.linux.mandrake, in article
<d9c291$grh$1@naig.caltech.edu>, David Mathog wrote:

Quote:
For instance, if the interface is very heavily loaded I worry that
ifplugd might not be able to access it as often as it wants, and so
throws a "link beat lost" message. ifconfig shows for this interface:

Close - it definitely is a load problem on your computer.

Quote:
eth1 Link encap:Ethernet HWaddr 00:E0:81:22:2F:E7

[compton ~]$ etherwhois 00:E0:81
00-E0-81 (hex) TYAN COMPUTER CORP.
00E081 (base 16) TYAN COMPUTER CORP.
1753 S. MAIN STREET
MILPITAS CA 95035
UNITED STATES
00E081 Tyan Computer Corp. Onboard Intel 82558 10/100
00E081 Tyan Computer Corp. also reported as a 3C982 (3c59x.c)
[compton ~]$

Which driver is being used? Look at the boot messages, or 'lsmod'
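
Besides the boot messages and 'lsmod', on kernels that expose sysfs the driver bound to an interface can usually be read from the `device/driver` symlink under `/sys/class/net`. A minimal sketch, assuming sysfs is mounted at `/sys`; the helper name `nic_driver` and its `sysfs_root` parameter are made up for illustration:

```python
import os

def nic_driver(iface, sysfs_root="/sys"):
    """Return the kernel driver name bound to iface, or None if unknown."""
    # For PCI NICs, <sysfs>/class/net/<iface>/device/driver is a symlink
    # that points at the driver's directory, e.g. .../bus/pci/drivers/<name>.
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    if not os.path.exists(link):
        return None  # no such interface, or a virtual device without a driver
    return os.path.basename(os.path.realpath(link))

# Interface name taken from the thread; prints None where eth1 does not exist.
print(nic_driver("eth1"))
```

The name returned is what should then be matched against the 'lsmod' output.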

Quote:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:85160803 errors:0 dropped:0 overruns:8356 frame:0

You are doing something that isn't leaving enough time for the system
to get the packets off the network stack before they get overwritten
by the next one. _NORMALLY_ this can be a DMA problem, or an IRQ
saturation.

Quote:
which looks reasonably healthy to me, with no errors and no dropped.

8356 / 85160803 is a small number, but it _should_ be zero.
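
To put a number on that ratio, the RX counters can be pulled out of the ifconfig text and divided. This is a rough sketch that assumes the classic net-tools output format quoted above; `rx_counters` is a hypothetical helper name.

```python
import re

def rx_counters(ifconfig_text):
    """Extract the RX counters from classic net-tools ifconfig output."""
    m = re.search(
        r"RX packets:(\d+)\s+errors:(\d+)\s+dropped:(\d+)"
        r"\s+overruns:(\d+)\s+frame:(\d+)",
        ifconfig_text,
    )
    if m is None:
        raise ValueError("no RX counter line found")
    names = ("packets", "errors", "dropped", "overruns", "frame")
    return dict(zip(names, map(int, m.groups())))

# The RX line quoted from the original post.
sample = "RX packets:85160803 errors:0 dropped:0 overruns:8356 frame:0"
rx = rx_counters(sample)
ratio = rx["overruns"] / rx["packets"]
print(f"overruns per packet: {ratio:.2e}")
```

That works out to roughly one overrun per ten thousand received packets. Sampling the counter before and after a heavy transfer would show whether the overruns coincide with the "link beat lost" events.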

Quote:
Interrupt:19 Base address:0x2480

IRQ 19 is _normally_ a low priority interrupt.

Quote:
ifplugd is running at prio 16 (in top) as is atalkd, smbd and just
about every other network daemon, could this just be the result
of a busy system plus busy network?

Do you need all the extra network stuff? Saw a similar problem recently
where the system was getting the snot beat out of it, and AN improvement
was had by tweaking kernel parameters. Try a repost over in the main line
group 'comp.os.linux.networking'.

Old guy
Moe Trin
*nix forums Guru


Joined: 20 Feb 2005
Posts: 972

Posted: Sat Jun 25, 2005 9:21 pm    Post subject: Re: what does "link beat lost" indicate?

In the Usenet newsgroup comp.os.linux.networking, in article
<d9hccs$apm$1@naig.caltech.edu>, David Mathog wrote:

Quote:
Well, ideally. The compute nodes, which are Tyan S2466MPX boards
also show a small number of RX overruns when heavily loaded. But
they don't show the link beat lost messages.

Just for giggles, what kind of loads are you running? It might be that
the head node is more heavily loaded than the compute nodes.

Quote:
While both of these are SMP boards, only the head node actually has
two processors installed. Could it be an SMP issue?

I don't know why it would be, but I have limited experience with SMP.
Interestingly,

Quote:
cat /proc/interrupts (on the head node, which has this problem)
CPU0 CPU1
0: 5279498 1294858489 IO-APIC-edge timer

19: 62326629 78396949 IO-APIC-level ohci_hcd, eth1

The interrupts are (more or less) evenly split,

Quote:
20: 21834581 17790232 IO-APIC-level aic7xxx
21: 10723291 12359846 IO-APIC-level aic7xxx

and you're throwing a lot of bits at the SCSI chains. I still wonder if
this isn't a DMA issue.

Quote:
now on one of the compute nodes (which don't have "link beat lost")

yeah, but notice that the eth0 is dominating the IRQs - wayyyy beyond the
IRQ 0 timer.

Quote:
So the head node is sharing that interrupt with, umm, the USB
controller. There are no USB devices attached to this system.

and that should negate the sharing as a problem, unless there is an
electrically noisy location (and I'd expect that to show as errors
elsewhere).
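
For comparing the head node with the compute nodes, the per-CPU split and the shared lines can be tabulated from `/proc/interrupts`. A minimal sketch, assuming the layout quoted above (a CPU header line first, then one count per CPU, the interrupt type, and a comma-separated device list); `parse_interrupts` is a hypothetical helper, and the sample holds only the lines quoted in this thread.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq: (counts, kind, devices)}."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: one "CPUn" token per CPU
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip non-numeric rows such as NMI or ERR
        fields = rest.split()
        counts = [int(n) for n in fields[:ncpus]]
        kind = fields[ncpus]
        devices = [d.strip() for d in " ".join(fields[ncpus + 1:]).split(",")]
        table[int(label)] = (counts, kind, devices)
    return table

sample = """\
           CPU0       CPU1
  0:    5279498 1294858489    IO-APIC-edge  timer
 19:   62326629   78396949   IO-APIC-level  ohci_hcd, eth1
 20:   21834581   17790232   IO-APIC-level  aic7xxx
 21:   10723291   12359846   IO-APIC-level  aic7xxx
"""
for irq, (counts, kind, devices) in parse_interrupts(sample).items():
    share = counts[0] / sum(counts)
    shared = "shared" if len(devices) > 1 else "exclusive"
    print(f"IRQ {irq}: CPU0 share {share:.0%}, {shared}: {', '.join(devices)}")
```

Running this on the real file on each node (`open("/proc/interrupts").read()`) would make shared lines such as IRQ 19 stand out.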

Quote:
Looks like I'll need to go into the BIOS to change the interrupts
so that 19 isn't shared, assuming that is even possible. Have
to wait until the server can be shut down to try this.

Look at the DMA setup first.

Quote:
Do you need all the extra network stuff?

Yes.

I was just checking - AppleTalk and Samba in addition to regular TCP/IP?

Quote:
Try a repost over in the main line group 'comp.os.linux.networking'.

Cross posted this response there.

Hopefully, we'll see additional responses.

Quote:
Thanks,

You're quite welcome - wish I was being more helpful.

Old guy