KORN Andras
2005-05-11 10:39:11 UTC
Good morning,
I have 4 SATA disks here, on which I built a software raid5 (/dev/md2).
I turned /dev/md2 into an LVM PV and then created an LVM VG from it
(that PV is the only one in it).
Then I created a few LVs in the VG, put xfs on them, and started (or rather
tried) to migrate a lot of data onto them with rsync from an LVM VG made up
of 4 IDE disks. But after a few dozen megabytes it always ended in an oops,
a panic and a reboot.
I tried JFS instead of xfs, and that works. From this I conclude that the
problem is with xfs.
I tried both a 2.6.11.7 and a 2.6.12-rc4 kernel; the result is the same.
Unfortunately the oops is not preserved in the log and is not sent to
netconsole either, and it scrolls off the serial terminal because it is too
long (unfortunately I cannot, with reasonable effort, plug the serial cable
into anything that has scrollback).
I do have some of it, though, but it does not look meaningful to me:
2005-05-08_21:51:14.09327 192.168.0.4: kern.warn: ------------[ cut here ]------------
2005-05-08_21:51:14.14017 192.168.0.4: kernel BUG at kernel/sched.c:2634!
2005-05-08_21:51:14.18145 192.168.0.4: invalid operand: 0000 [#1]
2005-05-08_21:51:14.21347 192.168.0.4: PREEMPT
2005-05-08_21:51:14.22262 192.168.0.4:
2005-05-08_21:51:14.22490 192.168.0.4: Modules linked in:
[I have condensed this part a bit for brevity]
raid1 raid0 raid5 xor nfsd lockd sunrpc sg sr_mod police sch_ingress cls_u32
sch_sfq ipt_REDIRECT ipt_MASQUERADE ipt_state ipt_limit ipt_REJECT ipt_LOG
ip_nat_ftp ip_conntrack_ftp iptable_filter iptable_mangle iptable_nat
ip_conntrack ip_tables ppp_async crc_ccitt bsd_comp ppp_deflate zlib_inflate
zlib_deflate ppp_generic slhc netconsole bridge it87 ds1621 i2c_savage4
i2c_algo_bit via686a ip_queue lp dm_mod sch_htb tun autofs4 ne2k_pci 8390
tulip crc32 parport_pc parport w83781d eeprom i2c_sensor i2c_isa i2c_viapro
i2c_core sd_mod uhci_hcd ehci_hcd ide_cd cdrom md
2005-05-08_21:51:14.89867 192.168.0.4:
2005-05-08_21:51:14.90096 192.168.0.4: CPU: 0
2005-05-08_21:51:14.91356 192.168.0.4: EIP: 0060:[<c0119668>] Not tainted VLI
2005-05-08_21:51:14.96616 192.168.0.4: EFLAGS: 00010086 (2.6.11.7-hellgate-skas3-v8-rc2)
2005-05-08_21:51:15.02793 192.168.0.4: EIP is at add_preempt_count+0x28/0x40
2005-05-08_21:51:15.07254 192.168.0.4: eax: b4fb8035 ebx: c0119668 ecx: 00000001 edx: edecc000
2005-05-08_21:51:15.14459 192.168.0.4: esi: 00000000 edi: 00000086 ebp: edecc070 esp: edecc070
2005-05-08_21:51:15.21664 192.168.0.4: ds: 007b es: 007b ss: 0068
2005-05-08_21:51:15.25326 192.168.0.4: Unable to handle kernel NULL pointer dereference
2005-05-08_21:51:15.30824 192.168.0.4: at virtual address 00000080
2005-05-08_21:51:15.34259 192.168.0.4: printing eip:
2005-05-08_21:51:15.36098 192.168.0.4: c011608d
2005-05-08_21:51:15.37229 192.168.0.4: *pde = 00000000
2005-05-08_21:51:15.39173 192.168.0.4: Oops: 0000 [#2]
2005-05-08_21:51:15.41118 192.168.0.4: PREEMPT
2005-05-08_21:51:15.42032 192.168.0.4:
2005-05-08_21:51:15.42261 192.168.0.4: Modules linked in:
[as above]
2005-05-08_21:51:16.09857 192.168.0.4: CPU: 0
2005-05-08_21:51:16.11115 192.168.0.4: EIP: 0060:[<c011608d>] Not tainted VLI
2005-05-08_21:51:16.16378 192.168.0.4: EFLAGS: 00010002 (2.6.11.7-hellgate-skas3-v8-rc2)
2005-05-08_21:51:16.22554 192.168.0.4: EIP is at do_page_fault+0xbd/0x63d
2005-05-08_21:51:16.26681 192.168.0.4: eax: edecb000 ebx: edecc070 ecx: edecb0dc edx: 00000000
2005-05-08_21:51:16.33891 192.168.0.4: esi: edecc03c edi: c0115fd0 ebp: edecb188 esp: edecb0c0
2005-05-08_21:51:16.41092 192.168.0.4: ds: 007b es: 007b ss: 0068
[ I certainly had no process with a name like this :) ]
2005-05-08_21:51:16.44759 192.168.0.4: Process /Qgi5dGzIsYbL5gQNLwqiDhGIkmV0QTuCuad16LF4cM6i64vHD4ioRGoRWHo1opQ1NrBgGOwHS0c
2005-05-08_21:51:16.54713 192.168.0.4: An4s+LXzdd3HErLlLvipq9fN13ccSsuUu+Kmr9uOhbqxZBWv2BRgMWDFBAmHavGtkAtQBU1PItIJ
2005-05-08_21:51:16.63753 192.168.0.4: SJ9SAYJCSYbAdubeE00QuAJmAADcGJDI7kGq1N7UikSUISEoaDNZgOzSkmdmaQSQRAVmClFAQJqr
2005-05-08_21:51:16.72799 192.168.0.4: FG0FIobvZ0dIgyJKAY7cwCsEE0ELJJJAFa/BMgy4F8EhQBhgMkY/i5hlRXdW5rVJizAMSwY4XIQA
2005-05-08_21:51:16.81834 192.168.0.4: wxXlG4lOAWmBvHDEB8HhgZE+EAsmaQG/O3CZG6Octz7ZFo4BOCrPh173zlyXGSUu+E1eve+cuS4y
2005-05-08_21:51:16.90867 192.168.0.4: Sl3wmrzyWlpO7KH/F+VKw4oMNgrGEMNclp+GAFBKQxibM5eTXINOEgS2QcIoS+WkjprWTDP8hMgH
2005-05-08_21:51:16.99901 192.168.0.4: GRUqgGbkrGokGk1aC6sKKAU00JloSN1KKEiAAvL9bQYIdW2lEFkFXDg9KQGuEoJoRRUbaT93WgnK
2005-05-08_21:51:17.08937 192.168.0.4: EONcoC3EWA6vKsSuO4ZYRsx30Boa2ypNQzKrSWru5bmGCHInyNXvHqkL5kCRr5QJqmt5mHh8h6OA
2005-05-08_21:51:17.17973 192.168.0.4: UeVWe3x863Uq9XmSVJfFTWuPnW6lXq8ySpL4qa1726mW7IFZ0PvzEU/thhpodASjUypQtGo1YyAT
2005-05-08_21:51:17.27008 192.168.0.4: BINzdgEoMkBWspkESwtJWsjcSSG7wwXXQRXMiRzjCoE91AX6tQYTGgrR4qkCSKiFiZkjCgVRNWml
2005-05-08_21:51:17.36048 192.168.0.4: 8mYLN7Mw6g0wSQTd2aQwXNxU5GofihvPHuSI8y6YY5SCq2EAAAAAggAACF0EQ78BAHMBATUAAAAA
2005-05-08_21:51:17.45079 192.168.0.4: CM8FAACfwAEARz4SrHuCMfStgk9bkkIqwpcAlrLtYbBZkgHkAjDgxmec688cnT50KLX0YGkGUi8f
2005-05-08_21:51:17.54115 192.168.0.4: UP1bTLeLE0cS4DolNENGw2B7fXqtovUrclXL4TV9eq2i9StyVcvhNXmFIoppEXCsqf0eNFNIqtkC
2005-05-08_21:51:17.63154 192.168.0.4: q
2005-05-08_21:51:17.63263 192.168.0.4:
2005-05-08_21:51:17.63492 192.168.0.4: Stack:
2005-05-08_21:51:17.64293 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.65322 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.66352 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.67381 192.168.0.4: 00000080
2005-05-08_21:51:17.68410 192.168.0.4: 00000000
2005-05-08_21:51:17.69440 192.168.0.4: 00000000
2005-05-08_21:51:17.70469 192.168.0.4: edecb190
2005-05-08_21:51:17.71499 192.168.0.4: edecb190
2005-05-08_21:51:17.72528 192.168.0.4:
2005-05-08_21:51:17.72756 192.168.0.4:
2005-05-08_21:51:17.73557 192.168.0.4: c03ef830
2005-05-08_21:51:17.74587 192.168.0.4: 00000000
2005-05-08_21:51:17.75616 192.168.0.4: 0000000e
2005-05-08_21:51:17.76648 192.168.0.4: 0000000b
2005-05-08_21:51:17.77675 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.78704 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.79739 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.80765 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.81797 192.168.0.4:
2005-05-08_21:51:17.82021 192.168.0.4:
2005-05-08_21:51:17.82822 192.168.0.4: 6b6b6b6b
2005-05-08_21:51:17.83851 192.168.0.4: 00030001
2005-05-08_21:51:17.84881 192.168.0.4: a56b6b6b
2005-05-08_21:51:17.85910 192.168.0.4: 5a2cf071
2005-05-08_21:51:17.86940 192.168.0.4: c013849e
2005-05-08_21:51:17.87968 192.168.0.4: 170fc2a5
2005-05-08_21:51:17.88998 192.168.0.4: 0000000f
2005-05-08_21:51:17.90027 192.168.0.4: 00000000
2005-05-08_21:51:17.91056 192.168.0.4:
2005-05-08_21:51:17.91296 192.168.0.4: Call Trace:
2005-05-08_21:51:17.92773 192.168.0.4: =======================
2005-05-08_21:51:17.95747 192.168.0.4: Unable to handle kernel NULL pointer dereference
2005-05-08_21:51:18.01237 192.168.0.4: at virtual address 00000030
2005-05-08_21:51:18.04668 192.168.0.4: printing eip:
2005-05-08_21:51:18.06498 192.168.0.4: c0103f8a
2005-05-08_21:51:18.07648 192.168.0.4: *pde = 00000000
2005-05-08_21:51:18.09586 192.168.0.4: Recursive die() failure, output suppressed
2005-05-08_21:51:18.14618 192.168.0.4:
2005-05-08_21:51:18.14734 192.168.0.4: <0>Kernel panic - not syncing: Fatal exception in interrupt
That was one of them. There was another one that looked preempt-related, so
I built a new kernel without preempt. That one oopsed too, but nothing of it
was left in the log.
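A note on the stack dump above: several of its words match the slab debugging
constants of 2.6-era kernels (0x6b is the poison byte for freed objects, 0xa5
marks the end of the poisoned area, and 0x5a2cf071 / 0x170fc2a5 are the
inactive/active red-zone markers). That would be consistent with the kernel
touching memory that had already been freed, although it proves nothing by
itself. Below is a minimal Python sketch for picking such words out of a dump.
The helper name classify_word is made up for illustration, and the constants
are assumed to match mm/slab.c in the kernel actually in use, so they should
be checked against that tree:

```python
# Slab debugging constants as found in 2.6-era mm/slab.c (assumption: verify
# against the kernel source actually in use).
POISON_FREE = 0x6B          # byte written over freed objects
POISON_END = 0xA5           # last byte of a poisoned object
RED_INACTIVE = 0x5A2CF071   # red zone of a free object
RED_ACTIVE = 0x170FC2A5     # red zone of an allocated object


def classify_word(word: str) -> str:
    """Label one 32-bit hex word from an oops stack dump (hypothetical helper)."""
    value = int(word, 16)
    if value == RED_INACTIVE:
        return "red zone (inactive)"
    if value == RED_ACTIVE:
        return "red zone (active)"
    raw = value.to_bytes(4, "little")
    if all(b == POISON_FREE for b in raw):
        return "freed-object poison"
    if POISON_END in raw and set(raw) <= {POISON_FREE, POISON_END}:
        return "end of freed-object poison"
    return "other"


# A few words copied from the stack dump above.
sample = ["6b6b6b6b", "00000080", "a56b6b6b", "5a2cf071", "170fc2a5"]
labels = {w: classify_word(w) for w in sample}
for w, label in labels.items():
    print(w, label)
```

Words labelled "other" are ordinary values (addresses, counters and so on) and
say nothing about slab state either way.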
The machine, by the way, is this:
0000:00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0269
0000:00:00.1 Host bridge: VIA Technologies, Inc.: Unknown device 1269
0000:00:00.2 Host bridge: VIA Technologies, Inc.: Unknown device 2269
0000:00:00.3 Host bridge: VIA Technologies, Inc.: Unknown device 3269
0000:00:00.4 Host bridge: VIA Technologies, Inc.: Unknown device 4269
0000:00:00.7 Host bridge: VIA Technologies, Inc.: Unknown device 7269
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
0000:00:09.0 RAID bus controller: Silicon Image, Inc. (formerly CMD Technology Inc) SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
0000:00:0a.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
0000:00:0b.0 RAID bus controller: Triones Technologies, Inc. HPT374 (rev 07)
0000:00:0b.1 RAID bus controller: Triones Technologies, Inc. HPT374 (rev 07)
0000:00:0c.0 Network controller: AVM Audiovisuelles MKTG & Computer System GmbH A1 ISDN [Fritz] (rev 02)
0000:00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
0000:00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
0000:00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [K8T800 South]
0000:00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
0000:00:13.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:00:14.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
0000:01:00.0 VGA compatible controller: S3 Inc. Savage 4 (rev 04)
Disks:
[4294692.658000] SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
[4294693.137000] SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
[4294693.614000] SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
[4294694.098000] SCSI device sdd: 488397168 512-byte hdwr sectors (250059 MB)
(the raid5 is on these)
[4294703.358000] SCSI device sde: 78177792 512-byte hdwr sectors (40027 MB)
(this one is on USB and is part of the other LVM VG)
[4294681.885000] hdi: SAMSUNG SP1614N, ATA DISK drive
[4294682.203000] hdj: ST3120023A, ATA DISK drive
[4294684.116000] hda: QUANTUM FIREBALL EL5.1A, ATA DISK drive
[4294684.443000] hdb: SAMSUNG SP1614N, ATA DISK drive
[4294685.667000] hdd: MATSHITADVD-ROM SR-8587, ATAPI CD/DVD-ROM drive
[4294687.473000] hdi: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
[4294687.769000] hdj: 234441648 sectors (120034 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
[4294688.076000] hda: 10018890 sectors (5129 MB) w/418KiB Cache, CHS=10602/15/63, UDMA(33)
[4294688.643000] hdb: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
[4294688.886000] hdd: ATAPI 48X DVD-ROM drive, 256kB Cache, UDMA(33)
(Apart from hda and hdd, these are all part of the other VG as well)
/proc/mdstat:
Personalities : [raid1] [raid5]
md2 : active raid5 sda2[0] sdd2[3] sdc2[2] sdb2[1]
731840832 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
[===========>.........] resync = 58.9% (143859328/243946944) finish=34.3min speed=48576K/sec
md1 : active raid1 sdb1[0] sdd1[1]
248896 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdc1[1]
248896 blocks [2/2] [UU]
unused devices: <none>
mdadm --detail:
/dev/md2:
Version : 00.90.01
Creation Time : Sun May 8 21:36:18 2005
Raid Level : raid5
Array Size : 731840832 (697.94 GiB 749.41 GB)
Device Size : 243946944 (232.65 GiB 249.80 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Wed May 11 11:37:39 2005
State : active, resyncing
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 59% complete
UUID : 8ac805d9:cbd60f15:f86ab2db:bd0c9e90
Events : 0.2123
Number   Major   Minor   RaidDevice   State
   0       8       2         0        active sync   /dev/scsi/host0/bus0/target0/lun0/part2
   1       8      18         1        active sync   /dev/scsi/host1/bus0/target0/lun0/part2
   2       8      34         2        active sync   /dev/scsi/host2/bus0/target0/lun0/part2
   3       8      50         3        active sync   /dev/scsi/host3/bus0/target0/lun0/part2
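The sizes reported above can be cross-checked with a little arithmetic. The
following short Python sketch uses only figures taken from the kernel log,
/proc/mdstat and mdadm --detail (the variable names are chosen here for
illustration). It assumes the usual raid5 layout, where one member's worth of
space goes to parity, and that the resync counter is relative to a single
member:

```python
KIB = 1024

raid_devices = 4
device_size_kib = 243946944      # "Device Size" from mdadm --detail
resynced_kib = 143859328         # resync position from /proc/mdstat

# raid5 spends one member's worth of space on parity
array_size_kib = (raid_devices - 1) * device_size_kib
array_size_gib = array_size_kib / KIB / KIB
array_size_gb = array_size_kib * KIB / 1e9

# resync progress in tenths of a percent, rounded down
resync_permille = resynced_kib * 1000 // device_size_kib

# raw disk size from the kernel's 512-byte sector count
disk_bytes = 488397168 * 512

print(array_size_kib)
print(f"{array_size_gib:.2f} GiB, {array_size_gb:.2f} GB")
print(f"{resync_permille / 10:.1f}%")
print(disk_bytes // 10**6, "MB")
```

The results agree with the "Array Size", resync percentage and per-disk MB
figures shown above, so the array geometry itself looks consistent.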
I created the xfs with the options -i size=512 -d unwritten=0 -l version=2
(I also experimented with the options that align it to the raid stripe size,
but it crashed that way too).
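For reference, the stripe-alignment values for this array follow from its
geometry: a 64 KiB chunk and 4 raid5 members, of which 3 hold data in each
stripe. The small Python sketch below derives them. It assumes the mkfs.xfs
-d sunit=/swidth= options, which take their values in 512-byte sectors; the
helper name is hypothetical, and this is not necessarily the exact option
set that was tried:

```python
def xfs_stripe_options(chunk_kib: int, raid_devices: int,
                       parity_devices: int = 1) -> str:
    """Build the -d stripe options for mkfs.xfs (values in 512-byte sectors)."""
    data_devices = raid_devices - parity_devices
    sunit = chunk_kib * 1024 // 512      # one chunk, in sectors
    swidth = sunit * data_devices        # one full data stripe, in sectors
    return f"-d sunit={sunit},swidth={swidth}"


options = xfs_stripe_options(chunk_kib=64, raid_devices=4)
print(options)
```

With LVM between md and the filesystem, the start of an LV is not necessarily
aligned to a stripe boundary, so these values are only part of the picture.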
On another machine, which differs from mine in almost every relevant
respect, the same problem does not occur (there it is possible to rsync to
an xfs LV created on a 4-disk raid5).
Questions:
- do you happen to know a workaround (other than not using xfs, raid, sata
and lvm :)?
- which subsystem (or combination of subsystems) might be causing the
problem?
- who would be the right party to send a bug report to? What other
information would be needed to track down the cause? I only have a very
limited amount of time available for further experiments...
- has anyone else seen this?
- is it perhaps only a matter of time/load before it starts doing this with
jfs too?
Guy
--
Andras Korn <korn at chardonnay.math.bme.hu>
<http://chardonnay.math.bme.hu/~korn/> QOTD:
Thank you for holding your breath while I smoke.