Discussion:
xfs kernel panic with raid5+lvm+sata
KORN Andras
2005-05-11 10:39:11 UTC
Good morning,

I have 4 SATA disks here, on which I built a software raid5 (/dev/md2).
I made /dev/md2 an LVM PV and then created an LVM VG from it (that PV is
the only one in it).

I then created a few LVs in the VG, put xfs on them, and started (or rather
tried) to migrate a lot of data onto them with rsync from an LVM VG made up
of 4 IDE disks. But after only a few dozen megabytes the result was always
an oops, a panic and a reboot.

I tried JFS instead of xfs, and that works. From this I conclude that the
problem is with xfs.

I tried both the 2.6.11.7 and the 2.6.12-rc4 kernel, with the same result.
Unfortunately the oops is not preserved in the log and is not sent to
netconsole either, and it scrolls off the serial terminal because it is too
long (unfortunately I cannot, with reasonable effort, plug the serial cable
into anything that has scrollback).

I do have some of it, but it does not look meaningful to me:

2005-05-08_21:51:14.09327 192.168.0.4: kern.warn: ------------[ cut here ]------------
2005-05-08_21:51:14.14017 192.168.0.4: kernel BUG at kernel/sched.c:2634!
2005-05-08_21:51:14.18145 192.168.0.4: invalid operand: 0000 [#1]
2005-05-08_21:51:14.21347 192.168.0.4: PREEMPT
2005-05-08_21:51:14.22262 192.168.0.4:
2005-05-08_21:51:14.22490 192.168.0.4: Modules linked in:

[I have condensed this part a little for brevity]

raid1 raid0 raid5 xor nfsd lockd sunrpc sg sr_mod police sch_ingress cls_u32
sch_sfq ipt_REDIRECT ipt_MASQUERADE ipt_state ipt_limit ipt_REJECT ipt_LOG
ip_nat_ftp ip_conntrack_ftp iptable_filter iptable_mangle iptable_nat
ip_conntrack ip_tables ppp_async crc_ccitt bsd_comp ppp_deflate zlib_inflate
zlib_deflate ppp_generic slhc netconsole bridge it87 ds1621 i2c_savage4
i2c_algo_bit via686a ip_queue lp dm_mod sch_htb tun autofs4 ne2k_pci 8390
tulip crc32 parport_pc parport w83781d eeprom i2c_sensor i2c_isa i2c_viapro
i2c_core sd_mod uhci_hcd ehci_hcd ide_cd cdrom md

2005-05-08_21:51:14.89867 192.168.0.4:
2005-05-08_21:51:14.90096 192.168.0.4: CPU: 0
2005-05-08_21:51:14.91356 192.168.0.4: EIP: 0060:[<c0119668>] Not tainted VLI
2005-05-08_21:51:14.96616 192.168.0.4: EFLAGS: 00010086 (2.6.11.7-hellgate-skas3-v8-rc2)
2005-05-08_21:51:15.02793 192.168.0.4: EIP is at add_preempt_count+0x28/0x40
2005-05-08_21:51:15.07254 192.168.0.4: eax: b4fb8035 ebx: c0119668 ecx: 00000001 edx: edecc000
2005-05-08_21:51:15.14459 192.168.0.4: esi: 00000000 edi: 00000086 ebp: edecc070 esp: edecc070
2005-05-08_21:51:15.21664 192.168.0.4: ds: 007b es: 007b ss: 0068
2005-05-08_21:51:15.25326 192.168.0.4: Unable to handle kernel NULL pointer dereference
2005-05-08_21:51:15.30824 192.168.0.4: at virtual address 00000080
2005-05-08_21:51:15.34259 192.168.0.4: printing eip:
2005-05-08_21:51:15.36098 192.168.0.4: c011608d
2005-05-08_21:51:15.37229 192.168.0.4: *pde = 00000000
2005-05-08_21:51:15.39173 192.168.0.4: Oops: 0000 [#2]
2005-05-08_21:51:15.41118 192.168.0.4: PREEMPT
2005-05-08_21:51:15.42032 192.168.0.4:
2005-05-08_21:51:15.42261 192.168.0.4: Modules linked in:
[same as above]
2005-05-08_21:51:16.09857 192.168.0.4: CPU: 0
2005-05-08_21:51:16.11115 192.168.0.4: EIP: 0060:[<c011608d>] Not tainted VLI
2005-05-08_21:51:16.16378 192.168.0.4: EFLAGS: 00010002 (2.6.11.7-hellgate-skas3-v8-rc2)
2005-05-08_21:51:16.22554 192.168.0.4: EIP is at do_page_fault+0xbd/0x63d
2005-05-08_21:51:16.26681 192.168.0.4: eax: edecb000 ebx: edecc070 ecx: edecb0dc edx: 00000000
2005-05-08_21:51:16.33891 192.168.0.4: esi: edecc03c edi: c0115fd0 ebp: edecb188 esp: edecb0c0
2005-05-08_21:51:16.41092 192.168.0.4: ds: 007b es: 007b ss: 0068
[ I certainly had no process with this name :) ]
2005-05-08_21:51:16.44759 192.168.0.4: Process /Qgi5dGzIsYbL5gQNLwqiDhGIkmV0QTuCuad16LF4cM6i64vHD4ioRGoRWHo1opQ1NrBgGOwHS0c
2005-05-08_21:51:16.54713 192.168.0.4: An4s+LXzdd3HErLlLvipq9fN13ccSsuUu+Kmr9uOhbqxZBWv2BRgMWDFBAmHavGtkAtQBU1PItIJ
2005-05-08_21:51:16.63753 192.168.0.4: SJ9SAYJCSYbAdubeE00QuAJmAADcGJDI7kGq1N7UikSUISEoaDNZgOzSkmdmaQSQRAVmClFAQJqr
2005-05-08_21:51:16.72799 192.168.0.4: FG0FIobvZ0dIgyJKAY7cwCsEE0ELJJJAFa/BMgy4F8EhQBhgMkY/i5hlRXdW5rVJizAMSwY4XIQA
2005-05-08_21:51:16.81834 192.168.0.4: wxXlG4lOAWmBvHDEB8HhgZE+EAsmaQG/O3CZG6Octz7ZFo4BOCrPh173zlyXGSUu+E1eve+cuS4y
2005-05-08_21:51:16.90867 192.168.0.4: Sl3wmrzyWlpO7KH/F+VKw4oMNgrGEMNclp+GAFBKQxibM5eTXINOEgS2QcIoS+WkjprWTDP8hMgH
2005-05-08_21:51:16.99901 192.168.0.4: GRUqgGbkrGokGk1aC6sKKAU00JloSN1KKEiAAvL9bQYIdW2lEFkFXDg9KQGuEoJoRRUbaT93WgnK
2005-05-08_21:51:17.08937 192.168.0.4: EONcoC3EWA6vKsSuO4ZYRsx30Boa2ypNQzKrSWru5bmGCHInyNXvHqkL5kCRr5QJqmt5mHh8h6OA
2005-05-08_21:51:17.17973 192.168.0.4: UeVWe3x863Uq9XmSVJfFTWuPnW6lXq8ySpL4qa1726mW7IFZ0PvzEU/thhpodASjUypQtGo1YyAT
2005-05-08_21:51:17.27008 192.168.0.4: BINzdgEoMkBWspkESwtJWsjcSSG7wwXXQRXMiRzjCoE91AX6tQYTGgrR4qkCSKiFiZkjCgVRNWml
2005-05-08_21:51:17.36048 192.168.0.4: 8mYLN7Mw6g0wSQTd2aQwXNxU5GofihvPHuSI8y6YY5SCq2EAAAAAggAACF0EQ78BAHMBATUAAAAA
2005-05-08_21:51:17.45079 192.168.0.4: CM8FAACfwAEARz4SrHuCMfStgk9bkkIqwpcAlrLtYbBZkgHkAjDgxmec688cnT50KLX0YGkGUi8f
2005-05-08_21:51:17.54115 192.168.0.4: UP1bTLeLE0cS4DolNENGw2B7fXqtovUrclXL4TV9eq2i9StyVcvhNXmFIoppEXCsqf0eNFNIqtkC
2005-05-08_21:51:17.63154 192.168.0.4: q
2005-05-08_21:51:17.63263 192.168.0.4:
2005-05-08_21:51:17.63492 192.168.0.4: Stack:
2005-05-08_21:51:17.64293 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.65322 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.66352 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.67381 192.168.0.4: 00000080
2005-05-08_21:51:17.68410 192.168.0.4: 00000000
2005-05-08_21:51:17.69440 192.168.0.4: 00000000
2005-05-08_21:51:17.70469 192.168.0.4: edecb190
2005-05-08_21:51:17.71499 192.168.0.4: edecb190
2005-05-08_21:51:17.72528 192.168.0.4:
2005-05-08_21:51:17.72756 192.168.0.4:
2005-05-08_21:51:17.73557 192.168.0.4: c03ef830
2005-05-08_21:51:17.74587 192.168.0.4: 00000000
2005-05-08_21:51:17.75616 192.168.0.4: 0000000e
2005-05-08_21:51:17.76648 192.168.0.4: 0000000b
2005-05-08_21:51:17.77675 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.78704 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.79739 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.80765 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.81797 192.168.0.4:
2005-05-08_21:51:17.82021 192.168.0.4:
2005-05-08_21:51:17.82822 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.83851 192.168.0.4: 00030001
2005-05-08_21:51:17.84881 192.168.0.4: a56b6b6b
2005-05-08_21:51:17.85910 192.168.0.4: 5a2cf071
2005-05-08_21:51:17.86940 192.168.0.4: c013849e
2005-05-08_21:51:17.87968 192.168.0.4: 170fc2a5
2005-05-08_21:51:17.88998 192.168.0.4: 0000000f
2005-05-08_21:51:17.90027 192.168.0.4: 00000000
2005-05-08_21:51:17.91056 192.168.0.4:
2005-05-08_21:51:17.91296 192.168.0.4: Call Trace:
2005-05-08_21:51:17.92773 192.168.0.4: =======================
2005-05-08_21:51:17.95747 192.168.0.4: Unable to handle kernel NULL pointer dereference
2005-05-08_21:51:18.01237 192.168.0.4: at virtual address 00000030
2005-05-08_21:51:18.04668 192.168.0.4: printing eip:
2005-05-08_21:51:18.06498 192.168.0.4: c0103f8a
2005-05-08_21:51:18.07648 192.168.0.4: *pde = 00000000
2005-05-08_21:51:18.09586 192.168.0.4: Recursive die() failure, output suppressed
2005-05-08_21:51:18.14618 192.168.0.4:
2005-05-08_21:51:18.14734 192.168.0.4: <0>Kernel panic - not syncing: Fatal exception in interrupt

That was one of them. There was another one that looked preempt-related, so
I built a new kernel without preempt. That one oopsed too, but nothing of it
was left in the log.

The machine, by the way, looks like this:

0000:00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0269
0000:00:00.1 Host bridge: VIA Technologies, Inc.: Unknown device 1269
0000:00:00.2 Host bridge: VIA Technologies, Inc.: Unknown device 2269
0000:00:00.3 Host bridge: VIA Technologies, Inc.: Unknown device 3269
0000:00:00.4 Host bridge: VIA Technologies, Inc.: Unknown device 4269
0000:00:00.7 Host bridge: VIA Technologies, Inc.: Unknown device 7269
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
0000:00:09.0 RAID bus controller: Silicon Image, Inc. (formerly CMD Technology Inc) SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
0000:00:0a.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
0000:00:0b.0 RAID bus controller: Triones Technologies, Inc. HPT374 (rev 07)
0000:00:0b.1 RAID bus controller: Triones Technologies, Inc. HPT374 (rev 07)
0000:00:0c.0 Network controller: AVM Audiovisuelles MKTG & Computer System GmbH A1 ISDN [Fritz] (rev 02)
0000:00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
0000:00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
0000:00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [K8T800 South]
0000:00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
0000:00:13.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:00:14.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
0000:01:00.0 VGA compatible controller: S3 Inc. Savage 4 (rev 04)

Disks:

[4294692.658000] SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
[4294693.137000] SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
[4294693.614000] SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
[4294694.098000] SCSI device sdd: 488397168 512-byte hdwr sectors (250059 MB)
(the raid5 is on these)

[4294703.358000] SCSI device sde: 78177792 512-byte hdwr sectors (40027 MB)
(this one is on USB and is part of the other LVM VG)

[4294681.885000] hdi: SAMSUNG SP1614N, ATA DISK drive
[4294682.203000] hdj: ST3120023A, ATA DISK drive
[4294684.116000] hda: QUANTUM FIREBALL EL5.1A, ATA DISK drive
[4294684.443000] hdb: SAMSUNG SP1614N, ATA DISK drive
[4294685.667000] hdd: MATSHITADVD-ROM SR-8587, ATAPI CD/DVD-ROM drive
[4294687.473000] hdi: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
[4294687.769000] hdj: 234441648 sectors (120034 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
[4294688.076000] hda: 10018890 sectors (5129 MB) w/418KiB Cache, CHS=10602/15/63, UDMA(33)
[4294688.643000] hdb: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
[4294688.886000] hdd: ATAPI 48X DVD-ROM drive, 256kB Cache, UDMA(33)
(Except for hda and hdd, these are all in the other VG as well)

/proc/mdstat:
Personalities : [raid1] [raid5]
md2 : active raid5 sda2[0] sdd2[3] sdc2[2] sdb2[1]
731840832 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
[===========>.........] resync = 58.9% (143859328/243946944) finish=34.3min speed=48576K/sec
md1 : active raid1 sdb1[0] sdd1[1]
248896 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdc1[1]
248896 blocks [2/2] [UU]

unused devices: <none>
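
As a sanity check, the resync figures in the md2 entry above can be reproduced from the block counters alone. This is a minimal sketch, assuming the counters are in 1 KiB blocks and the speed is in KiB per second; the function name is made up for illustration, and truncating the percentage to tenths is an assumption chosen to match the displayed value.

```python
def resync_status(done_kib, total_kib, speed_kib_per_s):
    """Return (percent done truncated to tenths, minutes remaining)."""
    per_mille = done_kib * 1000 // total_kib   # integer math, truncates
    remaining_kib = total_kib - done_kib
    minutes_left = remaining_kib / speed_kib_per_s / 60
    return per_mille / 10, minutes_left

# Values copied from the md2 resync line above.
percent, minutes = resync_status(143859328, 243946944, 48576)
print(f"resync = {percent:.1f}%, finish = {minutes:.1f}min")
# prints "resync = 58.9%, finish = 34.3min"
```

The numbers agree with the kernel's own estimate, so the array itself appears to be resyncing at a steady rate.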

mdadm --detail:

/dev/md2:
Version : 00.90.01
Creation Time : Sun May 8 21:36:18 2005
Raid Level : raid5
Array Size : 731840832 (697.94 GiB 749.41 GB)
Device Size : 243946944 (232.65 GiB 249.80 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 2
Persistence : Superblock is persistent

Update Time : Wed May 11 11:37:39 2005
State : active, resyncing
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 64K

Rebuild Status : 59% complete

UUID : 8ac805d9:cbd60f15:f86ab2db:bd0c9e90
Events : 0.2123

Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/scsi/host0/bus0/target0/lun0/part2
1 8 18 1 active sync /dev/scsi/host1/bus0/target0/lun0/part2
2 8 34 2 active sync /dev/scsi/host2/bus0/target0/lun0/part2
3 8 50 3 active sync /dev/scsi/host3/bus0/target0/lun0/part2

I created the xfs with the options -i size=512 -d unwritten=0 -l version=2
(I also experimented with the options that align it to the raid stripe
size, but it crashed that way too).
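
The exact stripe-alignment options are not quoted in the post. Assuming they were the su/sw suboptions of mkfs.xfs -d, the values for this array would follow from the mdadm output above: the stripe unit equals the md chunk size, and the stripe width is the number of data disks (all disks minus one for raid5). The sketch below derives them and also cross-checks the reported array size; the function name and dictionary keys are hypothetical.

```python
def raid5_geometry(n_disks, chunk_kib, device_kib):
    """Derive usable size and XFS stripe hints for an md raid5 array."""
    data_disks = n_disks - 1          # one disk's worth of space holds parity
    return {
        "array_kib": data_disks * device_kib,
        "su": f"{chunk_kib}k",        # stripe unit = md chunk size
        "sw": data_disks,             # stripe width, counted in stripe units
    }

# Raid Devices, Chunk Size and Device Size as reported by mdadm --detail.
geo = raid5_geometry(n_disks=4, chunk_kib=64, device_kib=243946944)
print(geo["array_kib"])                       # prints 731840832
print(f"-d su={geo['su']},sw={geo['sw']}")    # prints "-d su=64k,sw=3"
```

The computed array size matches the Array Size line reported by mdadm, so the geometry inputs are at least self-consistent.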

On another machine, which differs from mine in almost every essential
respect, the same problem does not occur (there it is possible to rsync onto
an xfs LV created on a 4-disk raid5).

Questions:

- do you happen to know a workaround (other than not using xfs, raid, sata
and lvm :)?

- which subsystem (or combination of subsystems) might be causing the
problem?

- who would be worth sending a bug report to? What other information would
be needed to track down the cause? I have only a very limited amount of time
available for further experiments...

- has anyone else seen this?

- is it perhaps only a matter of time/load before it starts doing this with
jfs too?

Guy
--
Andras Korn <korn at chardonnay.math.bme.hu>
<http://chardonnay.math.bme.hu/~korn/> QOTD:
Thank you for holding your breath while I smoke.
KELEMEN Peter
2005-05-11 11:07:25 UTC
2005-05-08_21:51:16.44759 192.168.0.4: Process /Qgi5dGzIsYbL5[...]
[...]
2005-05-08_21:51:17.63154 192.168.0.4: q
2005-05-08_21:51:17.64293 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.65322 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.66352 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.67381 192.168.0.4: 00000080
2005-05-08_21:51:17.68410 192.168.0.4: 00000000
2005-05-08_21:51:17.69440 192.168.0.4: 00000000
2005-05-08_21:51:17.70469 192.168.0.4: edecb190
2005-05-08_21:51:17.71499 192.168.0.4: edecb190
[...]
2005-05-08_21:51:17.95747 192.168.0.4: Unable to handle kernel NULL pointer dereference
2005-05-08_21:51:18.01237 192.168.0.4: at virtual address 00000030
2005-05-08_21:51:18.06498 192.168.0.4: c0103f8a
2005-05-08_21:51:18.07648 192.168.0.4: *pde = 00000000
2005-05-08_21:51:18.09586 192.168.0.4: Recursive die() failure, output suppressed
Uh-oh, the memory is completely corrupt.
0000:00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
I assume you are using libata.
- which subsystem (or combination of subsystems) might be causing the problem?
That is quite hard to say; from the XFS point of view it would be good to
know whether the phenomenon also occurs with the lvm layer left out.
- who would be worth sending a bug report to?
I would start with the XFS bugzilla; they answer quickly and usually point
you in the right direction, although judging by the stack trace there is
some really nasty problem somewhere.
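
For reference, several of the words in the quoted stack dump look like slab allocator debugging patterns rather than ordinary stack contents. This is a minimal sketch, assuming the kernel was built with slab debugging enabled and that the constants below are the ones defined in the 2.6-era mm/slab.c (worth verifying against the actual tree); the classify helper is hypothetical.

```python
# Assumed values from the 2.6-era mm/slab.c; verify against your source tree.
RED_INACTIVE = 0x5A2CF071   # red zone of an object that is not in use
RED_ACTIVE = 0x170FC2A5     # red zone of an object that is in use
POISON_FREE = 0x6B          # fill byte for freed objects
POISON_END = 0xA5           # last byte of a poisoned object

def classify(word_hex):
    """Label one 32-bit word from an i386 (little-endian) stack dump."""
    word = int(word_hex, 16)
    if word == RED_INACTIVE:
        return "red zone (inactive)"
    if word == RED_ACTIVE:
        return "red zone (active)"
    raw = word.to_bytes(4, "little")          # byte order as laid out in memory
    if all(b == POISON_FREE for b in raw):
        return "freed-object poison"
    if raw[-1] == POISON_END and all(b == POISON_FREE for b in raw[:-1]):
        return "end of freed-object poison"
    return "other"

# Words copied from the stack dump in the first post.
for w in ["6b6b6b6b", "a56b6b6b", "5a2cf071", "c013849e", "170fc2a5"]:
    print(w, classify(w))
```

If these assumptions hold, the dump is reading freed slab memory where a valid kernel stack should be, which fits the picture of badly corrupted memory.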

Fuji^
--
.+'''+. .+'''+. .+'''+. .+'''+. .+''
Kelemen Péter / \ / \ ***@cern.ch
.+' `+...+' `+...+' `+...+' `+...+'
KORN Andras
2005-05-11 11:22:31 UTC
Post by KELEMEN Peter
Uh-oh, the memory is completely corrupt.
Yes. Since then I have managed to provoke another one where there is ~1k of
garbage in the process name...
Post by KELEMEN Peter
Post by KORN Andras
0000:00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
I assume you are using libata.
Yes.
Post by KELEMEN Peter
Post by KORN Andras
- which subsystem (or combination of subsystems) might be causing the problem?
That is quite hard to say; from the XFS point of view it would be good to
know whether the phenomenon also occurs with the lvm layer left out.
I will try that next.

Guy
--
Andras Korn <korn at chardonnay.math.bme.hu>
<http://chardonnay.math.bme.hu/~korn/> QOTD:
The pen is mightier than the sword, and considerably easier to write with.
KORN Andras
2005-05-11 11:52:37 UTC
Post by KELEMEN Peter
Post by KORN Andras
- which subsystem (or combination of subsystems) might be causing the problem?
That is quite hard to say; from the XFS point of view it would be good to
know whether the phenomenon also occurs with the lvm layer left out.
I can report, with moderate joy, that yes, it does.

2005-05-11_11:38:45.15778 192.168.0.4: [4297417.123000] Oops: 0000 [#1]
2005-05-11_11:38:45.17813 192.168.0.4: [4297417.123000] Modules linked in:

jfs nls_base police sch_ingress cls_u32 sch_sfq nfsd lockd sunrpc
ipt_REDIRECT sg sr_mod ipt_MASQUERADE ipt_state ipt_limit ipt_REJECT ipt_LOG
ip_nat_ftp ip_conntrack_ftp iptable_filter iptable_mangle iptable_nat
ip_conntrack ip_tables ppp_async crc_ccitt bsd_comp ppp_deflate zlib_inflate
zlib_deflate ppp_generic slhc netconsole bridge it87 ds1621 lm92 i2c_savage4
i2c_algo_bit via686a ip_queue lp dm_mod sch_htb autofs4 ne2k_pci 8390 tulip
parport_pc parport w83781d eeprom i2c_sensor i2c_isa i2c_viapro i2c_core
raid5 xor raid1 md

2005-05-11_11:38:45.83022 192.168.0.4:
2005-05-11_11:38:45.83253 192.168.0.4: [4297417.123000] CPU: 0
2005-05-11_11:38:45.86466 192.168.0.4: [4297417.123000] EIP: 0060:[<c01042da>] Not tainted VLI
2005-05-11_11:38:45.93681 192.168.0.4: [4297417.123000] EFLAGS: 00010087 (2.6.12-rc4--hellgate)
2005-05-11_11:38:46.00669 192.168.0.4: [4297417.123000] EIP is at show_trace+0x7a/0xb0
2005-05-11_11:38:46.06284 192.168.0.4: [4297417.123000] eax: fffffffd ebx: f1c2c000 ecx: ffffffff edx: 0000a2f3
2005-05-11_11:38:46.15531 192.168.0.4: [4297417.123000] esi: f1c2c000 edi: fffff000 ebp: f1c2c214 esp: f1c2c1fc
2005-05-11_11:38:46.24687 192.168.0.4: [4297417.123000] ds: 007b es: 007b ss: 0068
2005-05-11_11:38:46.30323 192.168.0.4: [4297417.123000] Unable to handle kernel paging request
2005-05-11_11:38:46.36624 192.168.0.4: at virtual address 6d7468c4
2005-05-11_11:38:46.40065 192.168.0.4: [4297417.123000] printing eip:
2005-05-11_11:38:46.43839 192.168.0.4: [4297417.123000] c01153dd
2005-05-11_11:38:46.46942 192.168.0.4: [4297417.123000] *pde = 00000000
2005-05-11_11:38:46.50837 192.168.0.4: [4297417.123000] Oops: 0000 [#2]
2005-05-11_11:38:46.54735 192.168.0.4: [4297417.123000] Modules linked in:

[see above]

2005-05-11_11:38:47.19928 192.168.0.4:
2005-05-11_11:38:47.20159 192.168.0.4: [4297417.123000] CPU: 0
2005-05-11_11:38:47.23372 192.168.0.4: [4297417.123000] EIP: 0060:[<c01153dd>] Not tainted VLI
2005-05-11_11:38:47.30588 192.168.0.4: [4297417.123000] EFLAGS: 00010002 (2.6.12-rc4--hellgate)
2005-05-11_11:38:47.37577 192.168.0.4: [4297417.123000] EIP is at do_page_fault+0xbd/0x639
2005-05-11_11:38:47.43648 192.168.0.4: [4297417.123000] eax: f1c1a000 ebx: f1c2c1fc ecx: f1c1a124 edx: 6d74683c
2005-05-11_11:38:47.52813 192.168.0.4: [4297417.123000] esi: f1c2c1c8 edi: c0115320 ebp: f1c1a1d0 esp: f1c1a108
2005-05-11_11:38:47.61968 192.168.0.4: [4297417.123000] ds: 007b es: 007b ss: 0068
2005-05-11_11:38:47.67588 192.168.0.4: [4297417.123000] Unable to handle kernel paging request
2005-05-11_11:38:47.73884 192.168.0.4: at virtual address 098b5394
2005-05-11_11:38:47.77324 192.168.0.4: [4297417.123000] printing eip:
2005-05-11_11:38:47.81109 192.168.0.4: [4297417.123000] c01153dd
2005-05-11_11:38:47.84206 192.168.0.4: [4297417.123000] *pde = 00000000
2005-05-11_11:38:47.88103 192.168.0.4: [4297417.123000] Recursive die() failure, output suppressed
2005-05-11_11:38:47.95112 192.168.0.4: [4297417.123000]
2005-05-11_11:38:47.97176 192.168.0.4: <0>Kernel panic - not syncing: Fatal exception in interrupt
2005-05-11_11:38:48.04162 192.168.0.4: [4297417.123000]
2005-05-11_11:38:48.06220 192.168.0.4: <0>Rebooting in 60 seconds..

Incidentally, I also tend to get messages like this:

APIC error on CPU0: 40(40)

and

APIC error on CPU0: 00(40)

I read that this can be caused by an undersized power supply, which would be
plausible, since there are 9 hard disks in the machine. So I now run half of
the disks from a second PSU; both are 400 W units and neither draws more than
120 W from the 230 V mains.

I also tried booting with the noapic acpi=off parameters; that way these
messages do not appear, but it still crashes. With jfs it still does not.

Guy
--
Andras Korn <korn at chardonnay.math.bme.hu>
<http://chardonnay.math.bme.hu/~korn/> QOTD:
Ever notice that the word "therapist" breaks down into "the rapist"?
KELEMEN Peter
2005-05-11 12:01:07 UTC
Post by KORN Andras
I can report, with moderate joy, that yes, it does.
Well, a clear case for the XFS bugzilla. The test was worth that much.
:-)
Post by KORN Andras
I also tried booting with the noapic acpi=off parameters; that way these
messages do not appear, but it still crashes. With jfs it still does not.
Measure how it performs with JFS; if it is comparable, switch over if your
manager is breathing down your neck. Otherwise it would be good to find out
what makes XFS fall over.

Fuji^
--
.+'''+. .+'''+. .+'''+. .+'''+. .+''
Kelemen Péter / \ / \ ***@cern.ch
.+' `+...+' `+...+' `+...+' `+...+'
KORN Andras
2005-05-11 12:22:19 UTC
Post by KELEMEN Peter
Post by KORN Andras
I can report, with moderate joy, that yes, it does.
Well, a clear case for the XFS bugzilla. The test was worth that much.
An interesting detail: in the meantime I managed to produce a crash where, in
the oops dump, a recognizable fragment of one of the rsynced files appeared
after the process name.

The only remaining question is how data being moved by rsync ends up there
of all places. :)

Guy
--
Andras Korn <korn at chardonnay.math.bme.hu>
<http://chardonnay.math.bme.hu/~korn/> QOTD:
Crime, Sex, Alcohol, Drugs... God, I love Congress!
KELEMEN Peter
2005-05-11 12:25:33 UTC
Post by KORN Andras
An interesting detail: in the meantime I managed to produce a crash where,
in the oops dump, a recognizable fragment of one of the rsynced files
appeared after the process name. The only remaining question is how data
being moved by rsync ends up there of all places.
:)
Yes, I am afraid the problem will turn out not to be XFS-specific, but XFS
triggers it. It would not be the first time. :-)

Fuji^
--
.+'''+. .+'''+. .+'''+. .+'''+. .+''
Kelemen Péter / \ / \ ***@cern.ch
.+' `+...+' `+...+' `+...+' `+...+'