Tru64 v5.1: AdvFS file domain panic
Uwe Lienig
Posted: Tue Jun 06, 2006 2:50 pm

Dear managers,

Today I had a serious AdvFS domain panic that caused the total loss of one
domain. The explanation is rather long, since I want to describe in detail what
I have done.

But first, the necessary system details:

system:    AS 1200 5/533, 2 CPUs
           (number of CPUs and memory size changed during troubleshooting)
harddisks: dsk0: RZ1DF-CB, 9 GByte
           dsk1: DRHS36V, 36 GByte
           dsk2: sgtst336704lc, 36 GByte
           dsk3: sgtst336704lc, 36 GByte
           dsk4: ST336705, 36 GByte
           dsk5: OXYGENRAID, RAID5 array, 8x160 GByte, 1 TByte net
OS:        Tru64 UNIX V5.1, patch level 5 at the time of the AdvFS panic, now patch level 6
           AdvFS license installed

The message log from the last successful boot is as follows:

Jun 6 12:44:45 muxs0et0 vmunix: Alpha boot: available memory from 0x1110000 to 0x2fffc000
Jun 6 12:44:45 muxs0et0 vmunix: Compaq Tru64 UNIX V5.1 (Rev. 732); Tue Jun 6 12:42:27 CEST 2006
Jun 6 12:44:45 muxs0et0 vmunix: physical memory = 512.00 megabytes.
Jun 6 12:44:45 muxs0et0 vmunix: available memory = 490.97 megabytes.
Jun 6 12:44:45 muxs0et0 vmunix: using 1930 buffers containing 15.07 megabytes of memory
Jun 6 12:44:45 muxs0et0 vmunix: Master cpu at slot 0
Jun 6 12:44:45 muxs0et0 vmunix: Starting secondary cpu 1
Jun 6 12:44:45 muxs0et0 vmunix: Firmware revision: 6.0
Jun 6 12:44:45 muxs0et0 vmunix: PALcode: UNIX version 1.23
Jun 6 12:44:45 muxs0et0 vmunix: AlphaServer 1200 5/533 4MB
Jun 6 12:44:45 muxs0et0 vmunix: pci1 (primary bus:1) at mcbus0 slot 5
Jun 6 12:44:45 muxs0et0 vmunix: Loading SIOP: script c0000000, reg 7feef00, data c000a000
Jun 6 12:44:45 muxs0et0 vmunix: scsi0 at psiop0 slot 0 rad 0
Jun 6 12:44:45 muxs0et0 vmunix: isp0 at pci1 slot 2
Jun 6 12:44:45 muxs0et0 vmunix: isp0: QLOGIC ISP1040B/V2

History
==========

A while back I had a system crash with the following error:

Apr 6 14:39:44 muxs0et0 vmunix:
Apr 6 14:39:45 muxs0et0 vmunix: idx_create_index_file: bmtr_put_rec failed
Apr 6 14:39:45 muxs0et0 vmunix: AdvFS Domain Panic; Domain raid_pdmn Id 0x3e3af2e6.00095d85
Apr 6 14:39:45 muxs0et0 vmunix: An AdvFS domain panic has occurred due to either a metadata write error or an internal inconsistency. This domain is being rendered inaccessible.
Apr 6 14:39:45 muxs0et0 vmunix: Please refer to guidelines in AdvFS Guide to File System Administration regarding what steps to take to recover this domain.
Apr 6 14:59:24 muxs0et0 vmunix: NFS server: stale file handle fs(2869,368282) file 2 gen 32769
Apr 6 14:59:24 muxs0et0 vmunix: RFS3_FSSTAT, client address = 141.56.22.41, errno 5
Apr 6 15:00:33 muxs0et0 vmunix: AdvFS I/O error:
Apr 6 15:00:34 muxs0et0 vmunix: A read failure occurred - the AdvFS domain is inaccessible (paniced)
Apr 6 15:00:34 muxs0et0 vmunix: Domain#Fileset: raid_pdmn#projekte
Apr 6 15:00:34 muxs0et0 vmunix: Mounted on: /Projekte
Apr 6 15:00:34 muxs0et0 vmunix: Volume: /dev/disk/dsk5d
Apr 6 15:00:34 muxs0et0 vmunix: Tag: 0x00000001.8001
Apr 6 15:00:34 muxs0et0 vmunix: Page: 50371
Apr 6 15:00:34 muxs0et0 vmunix: Block: 119461568
Apr 6 15:00:34 muxs0et0 vmunix: Block count: 16
Apr 6 15:00:34 muxs0et0 vmunix: Type of operation: Read
Apr 6 15:00:34 muxs0et0 vmunix: Error: 5
Apr 6 15:00:34 muxs0et0 vmunix: EEI: 0x300
Apr 6 15:01:43 muxs0et0 vmunix: AdvFS I/O error:
Apr 6 15:01:43 muxs0et0 vmunix: A read failure occurred - the AdvFS domain is inaccessible (paniced)
Apr 6 15:01:43 muxs0et0 vmunix: Domain#Fileset: raid_pdmn#projekte
Apr 6 15:01:43 muxs0et0 vmunix: Mounted on: /Projekte
Apr 6 15:01:43 muxs0et0 vmunix: Volume: /dev/disk/dsk5d
Apr 6 15:01:43 muxs0et0 vmunix: Tag: 0x00000004.8001
Apr 6 15:01:43 muxs0et0 vmunix: Page: 0
Apr 6 15:01:43 muxs0et0 vmunix: Block: 182107584
Apr 6 15:01:43 muxs0et0 vmunix: Block count: 16
Apr 6 15:01:43 muxs0et0 vmunix: Type of operation: Read
Apr 6 15:01:43 muxs0et0 vmunix: Error: 5
Apr 6 15:01:43 muxs0et0 vmunix: EEI: 0x300
Apr 6 15:01:43 muxs0et0 vmunix: To obtain the name of the file on which
Apr 6 15:01:43 muxs0et0 vmunix: the error occurred, type the command:
Apr 6 15:01:43 muxs0et0 vmunix: /sbin/advfs/tag2name /Projekte/.tags/4
Apr 6 15:06:18 muxs0et0 vmunix: panic (cpu 0): kernel memory fault
Apr 6 15:06:18 muxs0et0 vmunix: syncing disks... 85 device string for dump = SCSI 1 2 0 0 0 0 0.
Apr 6 15:06:18 muxs0et0 vmunix: DUMP.prom: dev SCSI 1 2 0 0 0 0 0, block 524288
Apr 6 15:06:18 muxs0et0 vmunix: device string for dump = SCSI 1 2 0 0 0 0 0.
Apr 6 15:06:18 muxs0et0 vmunix: DUMP.prom: dev SCSI 1 2 0 0 0 0 0, block 524288
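
For anyone who wants to compare the failing tags and blocks across several crashes, here is a minimal Python sketch that collects the "AdvFS I/O error:" records from a syslog excerpt like the one above. The function name, the field list and the sample text are my own illustration (the sample is shortened from the log above); this is not an AdvFS tool, and it assumes the record layout seen in this excerpt.

```python
import re

# Syslog prefix as seen in the excerpt, e.g. "Apr 6 15:00:34 muxs0et0 vmunix: ".
PREFIX = re.compile(r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ vmunix:\s?")

# Field names copied from the I/O error records in the excerpt.
FIELDS = ("Domain#Fileset", "Mounted on", "Volume", "Tag", "Page",
          "Block", "Block count", "Type of operation", "Error", "EEI")

def parse_advfs_io_errors(log_text):
    """Return one dict per 'AdvFS I/O error:' record found in log_text."""
    records, current = [], None
    for line in log_text.splitlines():
        msg = PREFIX.sub("", line).strip()
        if msg == "AdvFS I/O error:":
            current = {}              # start a new record
            records.append(current)
            continue
        if current is None:
            continue                  # nothing to attach this line to yet
        key, sep, value = msg.partition(":")
        if sep and key.strip() in FIELDS:
            current[key.strip()] = value.strip()
    return records

# Shortened sample taken from the log above.
SAMPLE = """\
Apr 6 15:00:33 muxs0et0 vmunix: AdvFS I/O error:
Apr 6 15:00:34 muxs0et0 vmunix: Domain#Fileset: raid_pdmn#projekte
Apr 6 15:00:34 muxs0et0 vmunix: Volume: /dev/disk/dsk5d
Apr 6 15:00:34 muxs0et0 vmunix: Tag: 0x00000001.8001
Apr 6 15:00:34 muxs0et0 vmunix: Block: 119461568
Apr 6 15:00:34 muxs0et0 vmunix: Error: 5
Apr 6 15:01:43 muxs0et0 vmunix: AdvFS I/O error:
Apr 6 15:01:43 muxs0et0 vmunix: Volume: /dev/disk/dsk5d
Apr 6 15:01:43 muxs0et0 vmunix: Tag: 0x00000004.8001
Apr 6 15:01:43 muxs0et0 vmunix: Block: 182107584
Apr 6 15:01:43 muxs0et0 vmunix: Error: 5
"""

records = parse_advfs_io_errors(SAMPLE)
for rec in records:
    print(rec["Volume"], rec["Tag"], rec["Block"], rec["Error"])
```

Both records point at the same volume (dsk5d) with error 5, which is the usual errno value for an I/O error.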


The domain resides on the RAID array, which has been running for about 2 years
without any problems. The array is divided into 8 partitions with the following
layout (comments removed):

# /dev/rdisk/dsk5c:
type: SCSI
disk: OXYGENRA
label:
flags: dynamic_geometry
bytes/sector: 512
sectors/track: 255
tracks/cylinder: 255
sectors/cylinder: 65025
cylinders: 38955
sectors/unit: 2147483647
rpm: 5411
interleave: 1
trackskew: 14
cylinderskew: 23
headswitch: 0 # milliseconds
track-to-track seek: 0 # milliseconds
drivedata: 0

8 partitions:
#         size      offset  fstype  [fsize bsize cpg]
a:   335544320           0  unused       0     0
b:   335544320   335544320  AdvFS
c:  2147483647           0  unused       0     0
d:   335544320   671088640  AdvFS
e:   335544320  1006632960  AdvFS        0     0
f:   335544320  1342177280  AdvFS        0     0
g:   335544320  1677721600  unused       0     0
h:   134217727  2013265920  AdvFS

The domain raid_pdmn consisted of the partitions 'd', 'e' and 'f' of the RAID
array (dsk5). Each partition is 160 GB, so the whole domain was 480 GB.
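
The sizes can be checked from the disklabel figures. A small Python sketch, assuming the 512 bytes/sector from the label and counting GB as 2**30 bytes (the helper name is only for illustration):

```python
SECTOR_BYTES = 512   # "bytes/sector: 512" in the disklabel
GIB = 2 ** 30        # assumption: "GB" here means 2**30 bytes

def sectors_to_gib(sectors):
    """Convert a disklabel size in sectors to GiB."""
    return sectors * SECTOR_BYTES / GIB

# Sizes (in sectors) of the three partitions that made up raid_pdmn.
domain_parts = {"d": 335544320, "e": 335544320, "f": 335544320}

part_sizes = {name: sectors_to_gib(size) for name, size in domain_parts.items()}
total = sum(part_sizes.values())

for name, gib in part_sizes.items():
    print(f"dsk5{name}: {gib:.0f} GiB")
print(f"raid_pdmn: {total:.0f} GiB")
```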

After that crash I rebooted the system and everything worked without any
trouble. On Friday, June 2nd, the system went down again. The syslog entries
were:

Jun 2 14:38:07 muxs0et0 vmunix:
Jun 2 14:38:07 muxs0et0 vmunix: idx_create_index_file: bmtr_put_rec failed
Jun 2 14:38:07 muxs0et0 vmunix: AdvFS Domain Panic; Domain raid_pdmn Id 0x3e3af2e6.00095d85
Jun 2 14:38:07 muxs0et0 vmunix: An AdvFS domain panic has occurred due to either a metadata write error or an internal inconsistency. This domain is being rendered inaccessible.
Jun 2 14:38:07 muxs0et0 vmunix: Please refer to guidelines in AdvFS Guide to File System Administration regarding what steps to take to recover this domain.

After that I reseated every memory module, cleaned the dust out of the system,
and so on. After restarting, the power-up tests failed with:

IOD0 failed power-up selft test
IOD1 failed power-up selft test

Removing one CPU and populating only memory bank 0 with 256 MB (yes, I used both
memory cards) immediately showed CPU MEM test errors. After I installed memory
that tested without errors, the system came up again but kept crashing with
AdvFS errors. Running fixfdmn left the domain raid_pdmn unusable: nearly every
directory in the root directory of this file domain was removed. I tried to
delete the fileset, and the system crashed again! After that I had to remove the
file domain by hand (removing the entry in /etc/fdmns and setting the disklabel
fstype of dsk5{d,e,f} to unused). Then I recreated raid_pdmn with:

mkfdmn /dev/disk/dsk5d raid_pdmn
addvol /dev/disk/dsk5e raid_pdmn
addvol /dev/disk/dsk5f raid_pdmn
mkfset raid_pdmn projekte

Fortunately I'm running TIVOLI. After the domain was recreated I started
restoring everything from backup. But even though the backup is stored on a RAID
system on another machine (no tape), the restore will take a considerable amount
of time.

Right after the restore began, the system panicked again. Even the newly created
domain produced errors after a few MB had been transferred from backup. I was
stumped! This domain has to come back online immediately! All our projects
depend on this file domain!

OK, I had a look at the latest patch kit I had, downloaded in Oct 2003. It was
PK-06 for v5.1. Yes, I know I should upgrade to v5.1B, but that takes more time.
So I decided to install PK-06, and I changed the domain layout to contain only
one partition, as follows:

dsk5c
8 partitions:
#         size      offset  fstype  [fsize bsize cpg]
a:   335544320           0  unused       0     0
b:   335544320   335544320  AdvFS
c:  2147483647           0  unused       0     0
d:  1006632960   671088640  AdvFS
e:           0           0  unused       0     0
f:           0           0  unused       0     0
g:   335544320  1677721600  unused       0     0
h:   134217727  2013265920  AdvFS

raid_pdmn now consists only of dsk5d, which is now 480 GB.
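
As a sanity check on a hand-edited label like this, the following Python sketch verifies that the partitions do not overlap and that the enlarged 'd' ends exactly where 'g' begins. The numbers are copied from the table above; 'c' (the whole disk) and the zero-size 'e' and 'f' are left out on purpose, and the function name is only illustrative.

```python
# (size, offset) in sectors, copied from the new dsk5 label.
# 'c' spans the whole disk and 'e'/'f' are empty, so they are omitted.
NEW_LAYOUT = {
    "a": (335544320, 0),
    "b": (335544320, 335544320),
    "d": (1006632960, 671088640),
    "g": (335544320, 1677721600),
    "h": (134217727, 2013265920),
}

def find_overlaps(layout):
    """Return pairs of neighbouring partitions whose sector ranges overlap."""
    spans = sorted((off, off + size, name) for name, (size, off) in layout.items())
    overlaps = []
    for (_, end1, name1), (start2, _, name2) in zip(spans, spans[1:]):
        if start2 < end1:
            overlaps.append((name1, name2))
    return overlaps

overlaps = find_overlaps(NEW_LAYOUT)
d_size, d_offset = NEW_LAYOUT["d"]
d_end = d_offset + d_size
g_start = NEW_LAYOUT["g"][1]

print("overlaps:", overlaps)
print("d ends at sector", d_end, "and g starts at sector", g_start)
print("d size in GiB:", d_size * 512 / 2 ** 30)
```

With the values above there are no overlaps, and 'd' covers exactly the range that the old 'd', 'e' and 'f' occupied together.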

After all these changes I'm still very suspicious about the reliability of the
domain that failed. I have no idea why the domain in question panicked, nor do I
know what caused the various panics. I'm not sure whether a hardware error was
involved.

Right now the system is restoring the data from backup. The restore has been
running for 3 hours. I'm hoping everything will be restored without problems or
data corruption, but I'm not really sure. And I'd like to know what caused the
panic. Is it a known error?

Last but not least, I have 6 memory modules lying around that I'm not sure are
OK. How can I test the memory? System downtime can easily be arranged, but it is
limited (say 2 or 3 hours) unless it is at the weekend. Where can I get new
modules for not too much money? Am I right that the AS1200 uses PC-100 SDRAM
with parity?

OK. Thank you to everyone who read to the end. It has become rather long. I hope
I didn't forget any useful information. Don't hesitate to ask me.

I'd like to know whether the AS1200 will keep working in the future as it has
for the past 6 years. I love these Alpha systems, but now I'm anxious about the
stability of my AS1200.

Any hint is welcome. Many thanks in advance.

Best regards
--


Uwe Lienig
----------
fon: (+49 351) 462 2780
fax: (+49 351) 462 3476
mailto:uwe.lienig@fif.mw.htw-dresden.de

Forschungsinstitut Fahrzeugtechnik
<http://www.fif.mw.htw-dresden.de>
parcels: Gutzkowstr. 22, 01069 Dresden
letters: PF 12 07 01, 01008 Dresden

Hochschule für Technik und Wirtschaft Dresden (FH)
Friedrich-List-Platz 1, 01069 Dresden