Exadata: Disk controller was hung. Cell was power cycled
Just another manic Monday.
I’ve moved my blog from https://insanedba.blogspot.com to https://dincosman.com
Please update your bookmarks and follow/subscribe at the new address for all the latest updates and content. More up-to-date content of this post may be available there.
After a great weekend, we came to the office and performed our daily health checks like every Monday. One of the storage servers (cells) of our Exadata X2-2 (X4270 M2) had lost 11 of its 34 ASM disks. Luckily, all databases were still up despite the losses.
Let's examine what happened to the cell server. When I checked the mailbox, I saw an alert mail from the problematic cell stating "Disk controller was hung. Cell was power cycled." Apparently the cell's disk controller had stopped responding (perhaps a bug or a load peak) and forced the server to reboot. But a reboot alone does not normally end in disk losses.
I started by checking the cell's physical disk status.
CellCLI> list physicaldisk
         20:0       XXXXXX      normal
         20:1       XXXXXX      normal
         20:2       XXXXXX      normal
         20:3       XXXXXX      failed          ---> failed disk
         20:4       XXXXXX      normal
         20:5       XXXXXX      normal
         20:6       XXXXXX      normal
         20:7       XXXXXX      import failure  ---> import failure
         20:8       XXXXXX      normal
         20:9       XXXXXX      normal
         20:10      XXXXXX      normal
         20:11      XXXXXX      normal
         FLASH_1_0  1111M00AAA  normal
         FLASH_1_1  1111M00AAA  normal
         FLASH_1_2  1111M00AAA  normal
         FLASH_1_3  1111M00AAA  normal
         FLASH_2_0  1111M00AAA  normal
         FLASH_2_1  1111M00AAA  normal
         FLASH_2_2  1111M00AAA  normal
         FLASH_2_3  1111M00AAA  normal
         FLASH_4_0  1111M00AAA  normal
         FLASH_4_1  1111M00AAA  normal
         FLASH_4_2  1111M00AAA  normal
         FLASH_4_3  1111M00AAA  normal
         FLASH_5_0  1111M00AAA  not present     ---> FMODs of the failed flash disk
         FLASH_5_1  1111M00AAA  not present     ---> FMODs of the failed flash disk
         FLASH_5_2  1111M00AAA  not present     ---> FMODs of the failed flash disk
         FLASH_5_3  1111M00AAA  not present     ---> FMODs of the failed flash disk
What I got from the output: we had one flash disk failure and one hard disk failure (disk 3), plus one more hard disk in "import failure" status (disk 7). But that did not explain 11 failed ASM disks; according to this output it should have been 6. There had to be something more.
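A quick way to spot the unhealthy disks in such a listing is to filter out every row whose status is not "normal". The snippet below is a minimal sketch, not the documented procedure: on a real cell you would pipe `cellcli -e list physicaldisk` into the same filter; the inline sample data is illustrative, with multi-word statuses collapsed to one token.

```shell
#!/bin/sh
# Print every physical disk whose status (last field) is not "normal".
# Assumes one disk per line; multi-word statuses like "import failure"
# are collapsed to a single token in this sample.
flag_bad_disks() {
  awk '$NF != "normal" { print $1, $NF }'
}

# Illustrative sample resembling the CellCLI listing above.
printf '%s\n' \
  '20:3       XXXXXX      failed' \
  '20:4       XXXXXX      normal' \
  '20:7       XXXXXX      import_failure' \
  'FLASH_5_0  1111M00AAA  not_present' \
  | flag_bad_disks
```

This prints only the three suspect rows, which makes the unhealthy disks easy to pick out of a 28-line listing.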
I continued with checking grid disks.
CellCLI> list griddisk attributes name,size,offset,asmdeactivationoutcome,status
         DATA_CD_00_exacel13  423G      32M          Yes  cacheContentLost
         DATA_CD_01_exacel13  423G      32M          Yes  active
         DATA_CD_02_exacel13  423G      32M          Yes  active
         DATA_CD_03_exacel13  423G      32M          Yes  not present
         DATA_CD_04_exacel13  423G      32M          Yes  active
         DATA_CD_05_exacel13  423G      32M          Yes  active
         DATA_CD_06_exacel13  423G      32M          Yes  active
         DATA_CD_07_exacel13  423G      32M          Yes  not present
         DATA_CD_08_exacel13  423G      32M          Yes  active
         DATA_CD_09_exacel13  423G      32M          Yes  active
         DATA_CD_10_exacel13  423G      32M          Yes  active
         DATA_CD_11_exacel13  423G      32M          Yes  cacheContentLost
         MORE_CD_02_exacel13  29.125G   528.734375G  Yes  active
         MORE_CD_03_exacel13  29.125G   528.734375G  Yes  not present
         MORE_CD_04_exacel13  29.125G   528.734375G  Yes  active
         MORE_CD_05_exacel13  29.125G   528.734375G  Yes  active
         MORE_CD_06_exacel13  29.125G   528.734375G  Yes  active
         MORE_CD_07_exacel13  29.125G   528.734375G  Yes  not present
         MORE_CD_08_exacel13  29.125G   528.734375G  Yes  active
         MORE_CD_09_exacel13  29.125G   528.734375G  Yes  active
         MORE_CD_10_exacel13  29.125G   528.734375G  Yes  active
         MORE_CD_11_exacel13  29.125G   528.734375G  Yes  cacheContentLost
         RECO_CD_00_exacel13  105.6875G 423.046875G  Yes  cacheContentLost
         RECO_CD_01_exacel13  105.6875G 423.046875G  Yes  active
         RECO_CD_02_exacel13  105.6875G 423.046875G  Yes  active
         RECO_CD_03_exacel13  105.6875G 423.046875G  Yes  not present
         RECO_CD_04_exacel13  105.6875G 423.046875G  Yes  active
         RECO_CD_05_exacel13  105.6875G 423.046875G  Yes  active
         RECO_CD_06_exacel13  105.6875G 423.046875G  Yes  active
         RECO_CD_07_exacel13  105.6875G 423.046875G  Yes  not present
         RECO_CD_08_exacel13  105.6875G 423.046875G  Yes  active
         RECO_CD_09_exacel13  105.6875G 423.046875G  Yes  active
         RECO_CD_10_exacel13  105.6875G 423.046875G  Yes  active
         RECO_CD_11_exacel13  105.6875G 423.046875G  Yes  cacheContentLost
Five grid disks belonging to two physical disks (disk 0 and disk 11) were in "cacheContentLost" status. I searched Oracle Support for grid disks in that state. Doc ID 2346075.1 matched our problem; the document was clear and explained the steps to recover grid disks stuck in the cacheContentLost state.
When write-back flash cache is enabled on the storage cells, a flash disk failure means the grid disks cached by the failed flash disk may be holding stale data. If the failure occurs while the Exadata storage software is running, a resilvering operation starts and resynchronizes the stale blocks from the other storage servers.
But if the flash disk fails while the storage software is not running, or during a reboot of the cell, resilvering is not started and the affected grid disks are marked 'cacheContentLost'. They stay offline to prevent the databases from reading the stale data.
In our case, the disk controller hung and the server was power cycled. The flash disk failed during the reboot phase, so the grid disks got stuck in "cacheContentLost". Our team checked the gv$asm_operation view for ongoing rebalance operations, but there were no rows: the ASM disks backed by those grid disks had already been dropped, since the disk repair time had already passed.
We decided to recreate the grid disks in the "cacheContentLost" state to make them visible to ASM again.
CellCLI> drop griddisk DATA_CD_00_exacel13
CellCLI> drop griddisk RECO_CD_00_exacel13
CellCLI> create griddisk DATA_CD_00_exacel13 CELLDISK=CD_00_exacel13,size=423G,offset=32M
CellCLI> create griddisk RECO_CD_00_exacel13 CELLDISK=CD_00_exacel13,size=105.6875G,offset=423.046875G
SYS@+ASM1> alter diskgroup DATA add disk 'o/192.168.31.21/DATA_CD_00_exacel13' rebalance power 10;
SYS@+ASM1> alter diskgroup RECO add disk 'o/192.168.31.21/RECO_CD_00_exacel13' rebalance power 10;
CellCLI> drop griddisk DATA_CD_11_exacel13
CellCLI> drop griddisk RECO_CD_11_exacel13
CellCLI> drop griddisk MORE_CD_11_exacel13
CellCLI> create griddisk DATA_CD_11_exacel13 CELLDISK=CD_11_exacel13,size=423G,offset=32M
CellCLI> create griddisk RECO_CD_11_exacel13 CELLDISK=CD_11_exacel13,size=105.6875G,offset=423.046875G
CellCLI> create griddisk MORE_CD_11_exacel13 CELLDISK=CD_11_exacel13,size=29.125G,offset=528.734375G
SYS@+ASM1> alter diskgroup DATA add disk 'o/192.168.31.21/DATA_CD_11_exacel13' rebalance power 10;
SYS@+ASM1> alter diskgroup RECO add disk 'o/192.168.31.21/RECO_CD_11_exacel13' rebalance power 10;
SYS@+ASM1> alter diskgroup MORE add disk 'o/192.168.31.21/MORE_CD_11_exacel13' rebalance power 10;
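The drop/create/add sequence is mechanical once you know each grid disk's size and offset, so it can be generated straight from the griddisk listing. The sketch below is a hypothetical helper, not the documented procedure: it assumes the `DISKGROUP_CD_NN_cellname` naming convention seen above and this cell's interconnect IP, and it only prints the statements for review rather than executing anything.

```shell
#!/bin/sh
# Generate recovery statements for grid disks in "cacheContentLost" state.
# Input lines: <griddisk> <size> <offset> <status>, as in the listing above.
# The cell IP (192.168.31.21) is this cell's interconnect address.
gen_recovery() {
  awk -v ip='192.168.31.21' '
    $4 == "cacheContentLost" {
      gd = $1; size = $2; off = $3
      cd = gd; sub(/^[A-Z]+_/, "", cd)   # DATA_CD_00_cell -> CD_00_cell
      dg = gd; sub(/_.*/, "", dg)        # leading token is the diskgroup
      printf "CellCLI> drop griddisk %s\n", gd
      printf "CellCLI> create griddisk %s CELLDISK=%s,size=%s,offset=%s\n", gd, cd, size, off
      printf "SQL> alter diskgroup %s add disk '\''o/%s/%s'\'' rebalance power 10;\n", dg, ip, gd
    }'
}

# Illustrative sample: only the cacheContentLost row produces output.
printf '%s\n' \
  'DATA_CD_00_exacel13 423G 32M cacheContentLost' \
  'DATA_CD_01_exacel13 423G 32M active' \
  | gen_recovery
```

Printing the statements instead of running them lets you eyeball the sizes and offsets against the original listing before touching anything on the cell.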
After executing those commands, we were left with one hard disk in import failure, one failed hard disk, and one failed flash disk (4 FMODs). We opened SRs for the failed flash disk and hard disk and replaced them with the spares we had on hand. Normally, no additional steps are required to re-create the cell disks or grid disks after a flash or hard disk replacement.
Now let's continue with our case. Only one problematic hard disk was left, the one in import failure status, and it accounted for the remaining three missing grid disks and ASM disks. We executed the commands below to check the hard disk information.
[root@exacel13 trace]# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | egrep 'Slot Number|Firmware state'
...
Slot Number: 7
Firmware state: Unconfigured(good), Spun Up
...
The disk's foreign state did not look good, so I tried to clear it. The commands below clear the foreign configuration and reconfigure RAID 0 on that hard disk.
[root@exacel13 trace]# /opt/MegaRAID/MegaCli/MegaCli64 -CfgForeign -Clear -a0
Foreign configuration 0 is cleared on controller 0.
[root@exacel13 trace]# /opt/MegaRAID/MegaCli/MegaCli64 -CfgLdAdd -R0 [20:7] WB NORA Direct NoCachedBadBBU -strpsz1024 -a0
Adapter 0: Created VD 14
Adapter 0: Configured the Adapter!!
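When scanning many slots it helps to pair each "Slot Number" line with its following "Firmware state" line and report only the suspect ones. A minimal sketch under stated assumptions: on the server you would pipe the `MegaCli64 -pdlist -a0` output into this filter, and treating "Online, Spun Up" as the only healthy state is an assumption about this controller; the inline sample is illustrative.

```shell
#!/bin/sh
# Pair each "Slot Number" line with the "Firmware state" line that follows
# it, and report every slot not in the "Online, Spun Up" state.
check_states() {
  awk -F': ' '
    /^Slot Number/    { slot = $2 }
    /^Firmware state/ { if ($2 != "Online, Spun Up") print "slot " slot ": " $2 }'
}

# Illustrative sample resembling the MegaCli output above.
printf '%s\n' \
  'Slot Number: 6' \
  'Firmware state: Online, Spun Up' \
  'Slot Number: 7' \
  'Firmware state: Unconfigured(good), Spun Up' \
  | check_states   # -> slot 7: Unconfigured(good), Spun Up
```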
The grid disks on that physical disk were still in the "not present" state, so we decided to go for the re-enable procedure. The commands are as follows.
CellCLI> alter physicaldisk 20:7 drop for replacement
Physical disk 20:7 was dropped for replacement.
CellCLI> alter physicaldisk 20:7 reenable
Physical disk 20:7 was reenabled.
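After the re-enable, it is worth confirming that every grid disk on the cell is back to "active" before calling it done. A small sketch of that check, assuming you pipe in `cellcli -e list griddisk attributes name,status` output (the inline sample stands in for real output, and the status is taken as the last whitespace-delimited field):

```shell
#!/bin/sh
# Count grid disks whose status (last field) is not "active"; 0 means
# all grid disks on the cell are back online from CellCLI's perspective.
count_inactive() {
  awk '$NF != "active" { n++ } END { print n + 0 }'
}

# Illustrative sample resembling "list griddisk attributes name,status".
printf '%s\n' \
  'DATA_CD_07_exacel13 active' \
  'RECO_CD_07_exacel13 active' \
  'MORE_CD_07_exacel13 active' \
  | count_inactive   # prints 0 when everything is active
```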
Now everything is back to normal. It really was a manic Monday. To avoid experiencing a similar situation again, we also decided to update our Exadata storage server image to the latest release. The issue has not recurred since.
Hope it helps.