Restore OCR from backup located in ASM diskgroup. (A Failure Story Part 2)

OCR lost. Where is OCR backup? On ASM diskgroup.


After a mirrored disk failure in normal redundancy mode, CRS was down. We could not bring the faulty disks or the affected disk group (DATA) back online, so we decided to restore the OCR and move the voting files from DATA to RECO.


We started CRS in exclusive mode and looked for the OCR backup locations. Unfortunately, we had no backups on the local file system. The OCR backups were on an ASM disk group, and that disk group could not be brought online.
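A minimal sketch of this step, not a transcript of our session. It assumes Grid Infrastructure 11.2.0.2 or later, commands issued as root on a single node with the Grid home binaries in PATH, and a hypothetical `run` helper (not an Oracle tool) that only prints each command unless `DRY_RUN=0` is exported.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper, not part of Oracle tooling:
# prints the command by default, executes it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

# Assumption: root shell on one node, Grid home binaries in PATH.
run crsctl stop crs -f             # stop whatever is left of the stack
run crsctl start crs -excl -nocrs  # exclusive mode, without starting CRSD
run ocrconfig -showbackup          # list the OCR backups the cluster knows about
```

Keeping the dry run as the default lets the sequence be reviewed before anything touches a damaged cluster.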

We mounted that disk group in restricted mode and tried to copy the latest OCR backup to a local directory with the commands below, but the copy failed.


We searched Oracle Support and found Doc ID 2569847.1, "How to Restore ASM Based OCR when OCR backup is located in ASM diskgroup". According to the document, the amdu utility was what we were looking for. We executed the commands below and extracted the latest OCR backup (file number 875) to our current working directory.
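A minimal sketch of such an amdu extraction, under stated assumptions: the backup lives in disk group DATA as ASM file number 875, the disk discovery string is the usual Exadata value `o/*/*`, and `run` is a hypothetical dry-run helper that prints the command unless `DRY_RUN=0` is exported. Adjust `DISKSTRING` to your own ASM discovery string.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper, not part of Oracle tooling.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

# Assumptions: Exadata-style discovery string, backup is file 875 in DATA.
DISKSTRING="${DISKSTRING:-o/*/*}"
DISKGROUP="${DISKGROUP:-DATA}"
FILE_NUMBER="${FILE_NUMBER:-875}"

# amdu reads the disks directly, so the disk group does not need to be mounted.
run amdu -diskstring "$DISKSTRING" -extract "${DISKGROUP}.${FILE_NUMBER}"
```

amdu normally writes the extracted file to the current directory with a name of the form `DATA_875.f`.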


We followed Doc ID 1062983.1, "How to Restore ASM Based OCR After Complete Loss of the CRS Diskgroup on Linux/Unix Systems". First, we stopped CRS and changed the OCR location from +DATA to another disk group that could be brought online (+RECO). Then we restored the OCR to this disk group and also replaced the existing voting file location. We used the commands below.
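A minimal sketch of the restore sequence, assuming the backup was extracted to `./DATA_875.f` (assumed file name), the target disk group is +RECO and is mounted, and the commands are issued as root on one node. The `run` helper is hypothetical and only prints each command unless `DRY_RUN=0` is exported. How the OCR location is switched is also an assumption here: on Linux it is read from `/etc/oracle/ocr.loc`, and that edit is left as a comment.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper, not part of Oracle tooling.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

BACKUP="${BACKUP:-./DATA_875.f}"  # assumed name of the file extracted with amdu
NEW_DG="${NEW_DG:-+RECO}"         # disk group that can still be mounted

run crsctl stop crs -f
# Assumption: the ocrconfig_loc entry in /etc/oracle/ocr.loc is pointed at
# $NEW_DG at this point (manual edit, not shown).
run crsctl start crs -excl -nocrs
run ocrconfig -restore "$BACKUP"       # write the backup into the new OCR location
run crsctl replace votedisk "$NEW_DG"  # move the voting files as well
run ocrcheck                           # verify OCR integrity and location
run crsctl query css votedisk          # verify the voting file location
```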


The ASM instance parameter file (spfile) was also on an offline disk group. We used the amdu command to copy it to local storage, created a pfile from it, and started the ASM instance on one node. We then dropped the +DATA disk group and tried to recreate it.


We could not recreate it with the above command because we had 2 faulty disks on different storage servers. A healthy full-rack Exadata X2-2 has 14 storage servers with 12 disks each, 168 disks in total. We had 166 disks; the 2 missing disks were on different servers. We did not want to wait for new disks to arrive, so we created the DATA disk group with 11 disks from each server. We will add the remaining disks to the disk group later. We then changed the OCR, voting file, and spfile locations to the +DATA disk group.
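The disk counts above can be checked with a few lines of shell arithmetic. The variable names are ours; the numbers come straight from the paragraph.

```shell
#!/bin/sh
SERVERS=14            # storage servers in a full-rack X2-2
DISKS_PER_SERVER=12   # disks per storage server
MISSING=2             # faulty disks, on two different servers
USED_PER_SERVER=11    # disks taken from each server for the new DATA group

total=$((SERVERS * DISKS_PER_SERVER))    # disks in a healthy rack
available=$((total - MISSING))           # disks we actually had
in_group=$((SERVERS * USED_PER_SERVER))  # disks in the recreated disk group
left_over=$((available - in_group))      # healthy disks still to be added

echo "total=$total available=$available in_group=$in_group left_over=$left_over"
# prints: total=168 available=166 in_group=154 left_over=12
```

Taking 11 disks from every server keeps the servers evenly sized and leaves 12 healthy disks, plus the 2 replacements, to be added afterwards.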


Everything went smoothly until we tried to start CRS on the other nodes. CRS started only on the first node. ASM would start on only one node at a time.
The cause was our own mistake: we had forgotten to restore the password file, which was also stored on the DATA disk group. We recreated it; the details are in the next post.

 
Hope it helps.
