SATA port being reset every minute. Is it a faulty cable or the port multiplier card (Silicon Image)?


My Ubuntu server has nine Silicon Image "5-to-1 SATA Port Multiplier" cards. I am getting the following messages in syslog, repeated roughly every minute. As I understand it, one of the cards has a problem and its port is being reset. The whole reset process takes about 4-5 seconds.

Can anyone tell me what can be done about these errors? Do I have to replace the card? Or is the cable faulty? I could simply replace the cable, but with this particular server design (which is custom-built), swapping the cable would take a lot of effort (almost as much as replacing the card itself).

Someone told me the culprit could be one of the hard drives connected to this particular card (taking too long to spin up or something like that). Is that true?

Also, running smartctl on all the hard drives behind this card shows a high UDMA_CRC_Error_Count.
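UDMA_CRC_Error_Count (SMART attribute 199) is normally a lifetime counter of CRC errors on the link between drive and host, so a high value on its own says less than whether it keeps climbing while the resets happen. A minimal sketch, assuming the usual tabular `smartctl -A` layout with RAW_VALUE in the 10th column; the function names and the two snapshots are hypothetical, made up for illustration rather than taken from this server. In practice the text would come from running `smartctl -A` on each drive (e.g. `/dev/sdX`, a placeholder) twice, some minutes apart.

```python
import re
from typing import Dict, Optional


def parse_smart_attributes(text: str) -> Dict[str, int]:
    """Map ATTRIBUTE_NAME -> leading integer of RAW_VALUE from `smartctl -A` text."""
    attrs: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # RAW_VALUE is the 10th column (index 9).
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                attrs[fields[1]] = int(match.group())
    return attrs


def crc_error_growth(before: str, after: str) -> Optional[int]:
    """Increase of UDMA_CRC_Error_Count between two snapshots, or None if missing."""
    name = "UDMA_CRC_Error_Count"
    old = parse_smart_attributes(before).get(name)
    new = parse_smart_attributes(after).get(name)
    if old is None or new is None:
        return None
    return new - old


# Hypothetical snapshots for illustration only (not real output from this server).
SNAPSHOT_1 = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       120
"""
SNAPSHOT_2 = SNAPSHOT_1.replace(" 120\n", " 135\n")

if __name__ == "__main__":
    print(crc_error_growth(SNAPSHOT_1, SNAPSHOT_2))
```

A counter that grows on every drive behind the same card, in step with the resets, points at what those drives share (cable, connectors, multiplier, power) rather than at one disk.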

Sep 28 20:54:26 zapdb1 kernel: [56523.744913] ata15.00: failed to read SCR 1 (Emask=0x40)
Sep 28 20:54:26 zapdb1 kernel: [56523.744921] ata15.00: failed to read SCR 0 (Emask=0x40)
Sep 28 20:54:26 zapdb1 kernel: [56523.744924] ata15.01: failed to read SCR 1 (Emask=0x40)
Sep 28 20:54:26 zapdb1 kernel: [56523.744929] ata15.01: failed to read SCR 0 (Emask=0x40)
Sep 28 20:54:26 zapdb1 kernel: [56523.744932] ata15.02: failed to read SCR 1 (Emask=0x40)
Sep 28 20:54:26 zapdb1 kernel: [56523.744936] ata15.02: failed to read SCR 0 (Emask=0x40)
Sep 28 20:54:26 zapdb1 kernel: [56523.744939] ata15.03: failed to read SCR 1 (Emask=0x40)
Sep 28 20:54:26 zapdb1 kernel: [56523.744943] ata15.03: failed to read SCR 0 (Emask=0x40)
Sep 28 20:54:26 zapdb1 kernel: [56523.744946] ata15.04: failed to read SCR 1 (Emask=0x40)
Sep 28 20:54:26 zapdb1 kernel: [56523.744950] ata15.04: failed to read SCR 0 (Emask=0x40)
Sep 28 20:54:26 zapdb1 kernel: [56523.744953] ata15.05: failed to read SCR 1 (Emask=0x40)
Sep 28 20:54:26 zapdb1 kernel: [56523.744960] ata15.15: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 28 20:54:26 zapdb1 kernel: [56523.745009] ata15.15: irq_stat 0x00060002, PMP DMA CS errata
Sep 28 20:54:26 zapdb1 kernel: [56523.745040] ata15.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 28 20:54:26 zapdb1 kernel: [56523.745088] ata15.00: failed command: READ DMA EXT
Sep 28 20:54:26 zapdb1 kernel: [56523.745120] ata15.00: cmd 25/00:00:80:9e:91/00:04:52:00:00/e0 tag 1 dma 524288 in
Sep 28 20:54:26 zapdb1 kernel: [56523.745122]          res 86/15:06:06:00:00/00:00:c0:12:86/00 Emask 0x2 (HSM violation)
Sep 28 20:54:26 zapdb1 kernel: [56523.745212] ata15.00: status: { Busy }
Sep 28 20:54:26 zapdb1 kernel: [56523.745237] ata15.00: error: { IDNF ABRT }
Sep 28 20:54:26 zapdb1 kernel: [56523.745265] ata15.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 28 20:54:26 zapdb1 kernel: [56523.745311] ata15.01: irq_stat 0x00060002, device error via D2H FIS
Sep 28 20:54:26 zapdb1 kernel: [56523.745341] ata15.01: failed command: READ DMA EXT
Sep 28 20:54:26 zapdb1 kernel: [56523.745373] ata15.01: cmd 25/00:00:78:9e:91/00:04:52:00:00/e0 tag 2 dma 524288 in
Sep 28 20:54:26 zapdb1 kernel: [56523.745375]          res 51/84:61:17:a0:91/00:02:52:00:00/02 Emask 0x10 (ATA bus error)
Sep 28 20:54:26 zapdb1 kernel: [56523.745466] ata15.01: status: { DRDY ERR }
Sep 28 20:54:26 zapdb1 kernel: [56523.745492] ata15.01: error: { ICRC ABRT }
Sep 28 20:54:26 zapdb1 kernel: [56523.745519] ata15.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 28 20:54:26 zapdb1 kernel: [56523.745565] ata15.02: failed command: READ DMA EXT
Sep 28 20:54:26 zapdb1 kernel: [56523.745597] ata15.02: cmd 25/00:00:78:9e:91/00:04:52:00:00/e0 tag 0 dma 524288 in
Sep 28 20:54:26 zapdb1 kernel: [56523.745599]          res 86/15:06:06:00:00/00:00:00:01:86/00 Emask 0x2 (HSM violation)
Sep 28 20:54:26 zapdb1 kernel: [56523.745689] ata15.02: status: { Busy }
Sep 28 20:54:26 zapdb1 kernel: [56523.745714] ata15.02: error: { IDNF ABRT }
Sep 28 20:54:26 zapdb1 kernel: [56523.745741] ata15.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 28 20:54:26 zapdb1 kernel: [56523.745788] ata15.03: failed command: READ DMA EXT
Sep 28 20:54:26 zapdb1 kernel: [56523.745819] ata15.03: cmd 25/00:d8:98:9a:91/00:03:52:00:00/e0 tag 4 dma 503808 in
Sep 28 20:54:26 zapdb1 kernel: [56523.745821]          res 86/15:06:06:00:00/00:00:00:47:86/00 Emask 0x2 (HSM violation)
Sep 28 20:54:26 zapdb1 kernel: [56523.745911] ata15.03: status: { Busy }
Sep 28 20:54:26 zapdb1 kernel: [56523.745936] ata15.03: error: { IDNF ABRT }
Sep 28 20:54:26 zapdb1 kernel: [56523.745963] ata15.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 28 20:54:26 zapdb1 kernel: [56523.746010] ata15.04: failed command: READ DMA EXT
Sep 28 20:54:26 zapdb1 kernel: [56523.746041] ata15.04: cmd 25/00:d8:98:9a:91/00:03:52:00:00/e0 tag 3 dma 503808 in
Sep 28 20:54:26 zapdb1 kernel: [56523.746043]          res 86/15:06:06:00:00/00:00:80:37:86/00 Emask 0x2 (HSM violation)
Sep 28 20:54:26 zapdb1 kernel: [56523.746133] ata15.04: status: { Busy }
Sep 28 20:54:26 zapdb1 kernel: [56523.746158] ata15.04: error: { IDNF ABRT }
Sep 28 20:54:26 zapdb1 kernel: [56523.746185] ata15.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 28 20:54:26 zapdb1 kernel: [56523.746234] ata15.15: hard resetting link
Sep 28 20:54:26 zapdb1 kernel: [56523.746237] ata15: controller in dubious state, performing PORT_RST
Sep 28 20:54:29 zapdb1 kernel: [56525.973515] ata15.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Sep 28 20:54:29 zapdb1 kernel: [56525.974240] ata15.00: hard resetting link
Sep 28 20:54:29 zapdb1 kernel: [56526.293478] ata15.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep 28 20:54:29 zapdb1 kernel: [56526.293625] ata15.01: hard resetting link
Sep 28 20:54:29 zapdb1 kernel: [56526.613082] ata15.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 28 20:54:29 zapdb1 kernel: [56526.613131] ata15.02: hard resetting link
Sep 28 20:54:30 zapdb1 kernel: [56526.932262] ata15.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 28 20:54:30 zapdb1 kernel: [56526.932304] ata15.03: hard resetting link
Sep 28 20:54:30 zapdb1 kernel: [56527.252366] ata15.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 28 20:54:30 zapdb1 kernel: [56527.252417] ata15.04: hard resetting link
Sep 28 20:54:30 zapdb1 kernel: [56527.572270] ata15.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 28 20:54:30 zapdb1 kernel: [56527.572346] ata15.05: hard resetting link
Sep 28 20:54:31 zapdb1 kernel: [56527.891317] ata15.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
Sep 28 20:54:31 zapdb1 kernel: [56527.894471] ata15.00: configured for UDMA/33
Sep 28 20:54:31 zapdb1 kernel: [56527.897816] ata15.01: configured for UDMA/33
Sep 28 20:54:31 zapdb1 kernel: [56527.901165] ata15.02: configured for UDMA/33
Sep 28 20:54:31 zapdb1 kernel: [56527.904446] ata15.03: configured for UDMA/33
Sep 28 20:54:31 zapdb1 kernel: [56527.907628] ata15.04: configured for UDMA/33
Sep 28 20:54:31 zapdb1 kernel: [56527.908096] ata15: EH complete
Richard Whitman
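To see which drives behind the multiplier actually report link-level errors, and how long one recovery cycle takes, the log can be summarized per device. A rough sketch, assuming the kernel log format shown above and a log that covers a single error-handling cycle; the function name `summarize` and the labels `HSM` and `BUS` are hypothetical shorthand. The sample is an excerpt of the log in the question.

```python
import re
from collections import defaultdict
from typing import Dict, Optional, Set

# "[56523.745088] ata15.00: msg"; continuation lines ("res ...") carry no device.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(?:(ata\d+(?:\.\d+)?):\s+)?(.*)$")
SPEED_RE = re.compile(r"SATA link up (\d+\.\d+) Gbps")
ERROR_RE = re.compile(r"error: \{ ([A-Z ]+) \}")


def summarize(log: str) -> dict:
    """Collect error flags and link speed per device, plus reset duration.

    Assumes one error-handling cycle: first "hard resetting link" to last "EH complete".
    """
    errors: Dict[str, Set[str]] = defaultdict(set)
    link_gbps: Dict[str, float] = {}
    reset_start: Optional[float] = None
    reset_end: Optional[float] = None
    last_dev: Optional[str] = None
    for line in log.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        ts, dev, msg = float(match.group(1)), match.group(2), match.group(3).strip()
        if dev is None:
            dev = last_dev  # "res ..." lines belong to the previous device
            if dev is None:
                continue
        else:
            last_dev = dev
        if "hard resetting link" in msg and reset_start is None:
            reset_start = ts
        if "EH complete" in msg:
            reset_end = ts
        speed = SPEED_RE.search(msg)
        if speed:
            link_gbps[dev] = float(speed.group(1))
        err = ERROR_RE.search(msg)
        if err:
            errors[dev].update(err.group(1).split())
        if "(HSM violation)" in msg:
            errors[dev].add("HSM")
        if "(ATA bus error)" in msg:
            errors[dev].add("BUS")
    duration = None
    if reset_start is not None and reset_end is not None:
        duration = reset_end - reset_start
    return {
        "errors": {d: sorted(f) for d, f in errors.items()},
        "link_gbps": link_gbps,
        "reset_seconds": duration,
    }


SAMPLE = """\
Sep 28 20:54:26 zapdb1 kernel: [56523.745088] ata15.00: failed command: READ DMA EXT
Sep 28 20:54:26 zapdb1 kernel: [56523.745122]          res 86/15:06:06:00:00/00:00:c0:12:86/00 Emask 0x2 (HSM violation)
Sep 28 20:54:26 zapdb1 kernel: [56523.745237] ata15.00: error: { IDNF ABRT }
Sep 28 20:54:26 zapdb1 kernel: [56523.745341] ata15.01: failed command: READ DMA EXT
Sep 28 20:54:26 zapdb1 kernel: [56523.745375]          res 51/84:61:17:a0:91/00:02:52:00:00/02 Emask 0x10 (ATA bus error)
Sep 28 20:54:26 zapdb1 kernel: [56523.745492] ata15.01: error: { ICRC ABRT }
Sep 28 20:54:26 zapdb1 kernel: [56523.746234] ata15.15: hard resetting link
Sep 28 20:54:29 zapdb1 kernel: [56526.293478] ata15.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep 28 20:54:29 zapdb1 kernel: [56526.613082] ata15.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 28 20:54:31 zapdb1 kernel: [56527.908096] ata15: EH complete
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On this excerpt it reports ICRC (interface CRC) errors on ata15.01, a 1.5 Gbps link on ata15.00, and about 4.16 s from the first hard reset to `EH complete`, in line with the 4-5 seconds mentioned in the question. ICRC indicates a corrupted transfer between drive and controller, which narrows the search to the transport path (cable, connectors, multiplier, drive power) without singling out one component.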

Answers:


I've always found this to be a power problem: not enough power is being supplied to the drive.

d4v3y0rk
While building the server, I tried to check how much power goes to these cards (note: the drives are powered through the card, which is powered directly by the power supply unit). The server has 2 PSUs. PSU1 powers the motherboard and 4 cards; PSU2 powers the remaining 5 cards. I noticed that the cards powered by PSU2 had a slightly lower voltage on the 5 V pins (it was 4.2 V, I think). The voltage on the 12 V pins was 12.2 V. And this particular card is powered by PSU2.
Richard Whitman
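For scale, the ATX12V power supply design guide allows ±5% on the +5 V and +12 V rails, i.e. 4.75 to 5.25 V and 11.4 to 12.6 V. A small sketch comparing the readings in this comment against that band; the tolerance table is an assumption taken from that guideline, the function name `check_rail` is hypothetical, and a reading taken at the card also includes any voltage drop along the cabling under load.

```python
from typing import Tuple

# Assumed tolerance: ±5% on the +5 V and +12 V rails (ATX12V design guide).
TOLERANCE = {5.0: 0.05, 12.0: 0.05}


def check_rail(nominal: float, measured: float) -> Tuple[bool, float]:
    """Return (within tolerance?, deviation from nominal in percent)."""
    tol = TOLERANCE[nominal]
    low, high = nominal * (1 - tol), nominal * (1 + tol)
    deviation_pct = (measured - nominal) / nominal * 100
    return low <= measured <= high, deviation_pct


if __name__ == "__main__":
    # Readings quoted in the comment above (the 4.2 V figure is from memory).
    for nominal, measured in [(5.0, 4.2), (12.0, 12.2)]:
        ok, dev = check_rail(nominal, measured)
        status = "OK" if ok else "OUT OF SPEC"
        print(f"{nominal:>4} V rail: {measured} V, {dev:+.1f}%, {status}")
```

On these numbers the 12 V rail sits about 1.7% high, inside the band, while 4.2 V is 16% below nominal, well outside it. If that reading is accurate, the drives on this card, which take their 5 V through it, are running on a marginal supply, which would fit the power explanation in the answer.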