Restore OCR from backup located in ASM diskgroup. (A Failure Story Part 2)

OCR lost. Where is OCR backup? On ASM diskgroup.

I’ve moved my blog from https://insanedba.blogspot.com to https://dincosman.com. Please update your bookmarks and follow/subscribe at the new address for all the latest updates and content. A more up-to-date version of this post may be available there.

After a mirrored-disk failure in a normal-redundancy disk group, CRS went down. We could not bring the faulty disks or the affected disk group (DATA) back online, so we decided to restore the OCR and move the voting files from DATA to RECO.


We started CRS in exclusive mode and searched for OCR backups. Unfortunately, we had no backups on the local file system. The OCR backups were in an ASM disk group, and that disk group could not be brought online.

We mounted that disk group in restricted mode and tried to copy the latest OCR backup to a local directory with the commands below, but the copy failed.

SYS@+ASM1> alter diskgroup data mount restricted force for recovery;
Diskgroup altered.
ASMCMD> cp backup00.ocr.866.1047360579 /tmp/ocrbackup/backup00.ocr
ASMCMD> ASMCMD-8012: cannot determine file type for file
ORA-15236: diskgroup DATA mounted in restricted mode
ORA-06512: at "SYS.X$DBMS_DISKGROUP", line 518
ORA-06512: at line 3 (DBD ERROR: OCIStmtExecute)

We searched Oracle Support and found "How to Restore ASM Based OCR when OCR backup is located in ASM diskgroup" (Doc ID 2569847.1). According to the document, the "amdu" utility was what we were looking for. We ran the commands below and extracted the latest OCR backup (file number 875) into a timestamped directory under our current working directory.

[oracle@exadb01 ~]$ amdu -diskstring 'o/*/DATA_*' -extract data.875
amdu_2020_08_05_14_04_32/
AMDU-00204: Disk N0115 is in currently mounted diskgroup DATA.
AMDU-00201: Disk N0115: 'o/192.168.31.12/DATA_CD_08_exacel04'
AMDU-00204: Disk N0143 is in currently mounted diskgroup DATA.
AMDU-00201: Disk N0143: 'o/192.168.31.13/DATA_CD_10_exacel05'
AMDU-00204: Disk N0100 is in currently mounted diskgroup DATA.
AMDU-00201: Disk N0100: 'o/192.168.31.11/DATA_CD_07_exacel03'
[oracle@exadb01 ~]$ cd amdu_2020_08_05_14_04_32/
[oracle@exadb01 amdu_2020_08_05_14_04_32]$ ll
total 4248
drwxr-xr-x 2 oracle oinstall 4096 Aug 5 14:04 .
drwx------ 8 oracle oinstall 4096 Aug 5 14:04 ..
-rw-r--r-- 1 oracle oinstall 4079616 Aug 5 14:04 DATA_875.f
-rw-r--r-- 1 oracle oinstall 261915 Aug 5 14:04 report.txt
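
The -extract argument used above has the form <diskgroup>.<file number>, and the file number is the second-to-last dot-separated field of a system-generated ASM file name (registry.253.1025972865 is file 253). Below is a minimal sketch of deriving that argument from a file name. The function name amdu_extract_arg is hypothetical, and it only prints the argument; it does not run amdu.

```bash
# Hypothetical helper: build the amdu -extract argument (<diskgroup>.<file number>)
# from a disk group name and an ASM file name such as registry.253.1025972865.
amdu_extract_arg() {
    dg=$1
    fname=${2##*/}     # drop any +DG/dir/ prefix
    rest=${fname%.*}   # strip the trailing .<incarnation>
    fnum=${rest##*.}   # what follows the last remaining dot is the file number
    case $fnum in
        ''|*[!0-9]*)
            echo "cannot parse file number from: $2" >&2
            return 1 ;;
    esac
    printf '%s.%s\n' "$dg" "$fnum"
}

amdu_extract_arg data +DATA/exa/ASMPARAMETERFILE/registry.253.1025972865
# prints data.253
```

The printed value can then be passed to amdu by hand, for example amdu -diskstring 'o/*/DATA_*' -extract data.253.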

We followed "How to Restore ASM Based OCR After Complete Loss of the CRS Diskgroup on Linux/Unix Systems" (Doc ID 1062983.1). First, we stopped CRS, started it in exclusive mode, and changed the OCR location from +DATA to a disk group that could still be brought online (+RECO). Then we restored the OCR to that disk group and moved the voting files there as well, using the commands below.

[root@exadb01 ~]# crsctl stop crs -f
[root@exadb01 ~]# crsctl start crs -excl -nocrs
[oracle@exadb01 ~]$ cat /etc/oracle/ocr.loc
#Device/file +DATA getting replaced by device +DATA/exa/OCRFILE/registry.255.1025972873
ocrconfig_loc=+DATA/exa/OCRFILE/registry.255.1025972873
[oracle@exadb01 ~]$ vi /etc/oracle/ocr.loc
ocrconfig_loc=+RECO ---> Restore location.
[root@exadb01 amdu_2020_08_05_14_04_32]# ocrconfig -restore DATA_875.f
[root@exadb01 amdu_2020_08_05_14_04_32]# cat /etc/oracle/ocr.loc
#Device/file +RECO getting replaced by device +RECO/exa/OCRFILE/registry.255.1047651703
[root@exadb01 amdu_2020_08_05_14_04_32]# crsctl replace votedisk +RECO
Successful addition of voting disk 4e8a91ba0ae64f7abfd7ea9306073fb7.
Successful addition of voting disk a5a7c14a9fc84ff5bf9cd5df4e542871.
Successful addition of voting disk d2fb498f1d3e4fc7bf9c3d2c862182c3.
Successful deletion of voting disk 451c697b49c14fc0bf12e649e6f5a9aa.
Successful deletion of voting disk 693fb97f37ef4f97bfecf646831a2ea4.
Successful deletion of voting disk 519a983e23da4fffbf3e580aa22dd9c9.
Successfully replaced voting disk group with +RECO.
CLSU-00107: operating system function: kgfnStmtExecute; failed with error data: 0; at location: kgfdvfDel01
CLSU-00101: operating system error message: Error 0
CLSU-00104: additional error information: ORA-15001: diskgroup "DATA" does not exist or is not mounted
CLSU-00104: additional error information: ORA-06512: at line 4
CLSU-00104: additional error information: ORA-06512: at "SYS.X$DBMS_DISKGROUP", line 562
CLSU-00104: additional error information: ORA-06512: at line 2
CRS-4266: Voting file(s) successfully replaced
ASMCMD> lsdg
State Type Rebal Sector Logical_Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL N 512 512 4096 4194304 4115712 4090576 29824 2030376 0 N MORE/
MOUNTED NORMAL N 512 512 4096 4194304 17965184 4483348 108224 2187562 0 ===> Y RECO/
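
The Voting_files column in the lsdg output above shows which disk group now holds the voting files. As a small sketch, the filter below pulls those disk group names out of lsdg output. The name voting_diskgroups is hypothetical, and it assumes the column layout shown above, where Voting_files is the second-to-last field and Name is the last.

```bash
# Hypothetical filter: read `asmcmd lsdg` output on stdin and print the names
# of disk groups whose Voting_files column is Y.
voting_diskgroups() {
    awk 'NR > 1 && NF >= 2 && $(NF-1) == "Y" {
        name = $NF
        sub(/\/$/, "", name)   # drop the trailing slash from the Name column
        print name
    }'
}

# Sample rows copied from the output above (without the ===> annotation).
voting_diskgroups <<'EOF'
State Type Rebal Sector Logical_Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL N 512 512 4096 4194304 4115712 4090576 29824 2030376 0 N MORE/
MOUNTED NORMAL N 512 512 4096 4194304 17965184 4483348 108224 2187562 0 Y RECO/
EOF
# prints RECO
```

On a live system the same filter could be fed with asmcmd lsdg | voting_diskgroups.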

The ASM instance's parameter file (spfile) was also in the offline disk group. We used amdu to extract it to local storage, created a pfile from it, and started the ASM instance on one node. We then dropped the +DATA disk group and tried to recreate it.

SYS@+ASM1> show parameter spfile;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string +DATA/exa/ASMPARAMETERFILE/registry.253.1025972865
[oracle@exadb01 ~]$ amdu -diskstring 'o/*/DATA_*' -extract data.253
amdu_2020_08_05_14_29_07/
[oracle@exadb01 ~]$ cd amdu_2020_08_05_14_29_07/
[oracle@exadb01 amdu_2020_08_05_14_29_07]$ ll
total 268
drwxr-xr-x 2 oracle oinstall 4096 Aug 5 14:29 .
drwx------ 9 oracle oinstall 4096 Aug 5 14:29 ..
-rw-r--r-- 1 oracle oinstall 3584 Aug 5 14:29 DATA_253.f
-rw-r--r-- 1 oracle oinstall 261446 Aug 5 14:29 report.txt
SYS@+ASM1> create pfile='/tmp/initasm.ora' from spfile='/home/oracle/amdu_2020_08_05_14_29_07/DATA_253.f';
File created.
SYS@+ASM1> create spfile='+RECO' from pfile='/tmp/initasm.ora';
File created.
SYS@+ASM1> startup;
SYS@+ASM1> drop diskgroup data force including contents;
Diskgroup dropped.
SYS@+ASM1> select DATABASE_COMPATIBILITY, COMPATIBILITY, NAME from v$asm_diskgroup;
DATABASE_COMPATIBILITY COMPATIBILITY NAME
------------------------- ------------------------------------------------------------ ------------------------------
11.2.0.4.0 19.0.0.0.0 MORE
11.2.0.4.0 19.0.0.0.0 RECO
SYS@+ASM1> create diskgroup data normal redundancy disk 'o/*/DATA*' ATTRIBUTE 'content.type' = 'DATA', 'AU_SIZE'='4M', 'cell.smart_scan_capable'='TRUE', 'compatible.rdbms'='11.2.0.4.0' , 'compatible.asm'='19.0.0.0.0';
ERROR at line 1:
ORA-15018: diskgroup cannot be created
ORA-15411: Failure groups in disk group DATA have different number of disks.
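
ORA-15411 complains that the failure groups do not all have the same number of disks. On Exadata each storage cell normally forms one failure group, so a quick per-cell count of the candidate disks shows where the imbalance is. The sketch below assumes grid disk paths of the form o/<cell ip>/<grid disk name>; the function name check_cell_disk_counts is hypothetical.

```bash
# Hypothetical check: read grid disk paths (o/<cell ip>/<grid disk>) on stdin,
# print "<count> <cell ip>" per cell, and return non-zero if the counts differ.
check_cell_disk_counts() {
    awk -F/ 'NF >= 3 { count[$2]++ }
             END { for (c in count) print count[c], c }' |
    sort -k2 |
    awk '{ print
           if (NR == 1) ref = $1
           else if ($1 != ref) uneven = 1 }
         END { exit uneven + 0 }'
}

# Illustrative input: one cell contributes two disks, the other three.
check_cell_disk_counts <<'EOF' || echo "cells have different disk counts"
o/192.168.31.18/DATA_CD_05_exacel10
o/192.168.31.18/DATA_CD_07_exacel10
o/192.168.31.17/DATA_CD_05_exacel09
o/192.168.31.17/DATA_CD_06_exacel09
o/192.168.31.17/DATA_CD_07_exacel09
EOF
```

The paths could come from a query on v$asm_disk, spooled to a file and piped into the function.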

We could not recreate it with the command above because we had two faulty disks on different storage servers. A healthy full-rack Exadata X2-2 has 14 storage servers with 12 disks each, 168 disks in total. We had 166 disks, with one disk missing on each of two servers. We did not want to wait for replacement disks to arrive, so we created the DATA disk group with 11 disks from each server and added the remaining healthy disks afterwards. We then moved the OCR, the voting files, and the spfile back to the +DATA disk group.
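
The arithmetic is 14 x 12 = 168 disks in a healthy rack, 166 available, 14 x 11 = 154 used for the initial create, and 166 - 154 = 12 left to add later. The long disk list in the statement that follows does not have to be typed by hand. The sketch below generates it under some assumptions: the IP and naming scheme are taken from that statement (exacel01 is 192.168.31.9 through exacel14 at 192.168.31.22), the two faulty disks in FAULTY are inferred from the disks absent from it, and list_data_disks is a hypothetical name.

```bash
# Sketch: print 11 DATA grid disk paths per cell for 14 cells, skipping any
# disk named in FAULTY. The FAULTY entries are inferred, not confirmed.
FAULTY="DATA_CD_06_exacel10 DATA_CD_08_exacel11"

list_data_disks() {
    cell=1
    while [ "$cell" -le 14 ]; do
        ip="192.168.31.$((cell + 8))"
        picked=0
        disk=0
        # Walk CD_00..CD_11 and stop once 11 healthy disks are chosen.
        while [ "$disk" -le 11 ] && [ "$picked" -lt 11 ]; do
            name=$(printf 'DATA_CD_%02d_exacel%02d' "$disk" "$cell")
            case " $FAULTY " in
                *" $name "*) ;;   # skip a faulty disk
                *) printf "'o/%s/%s'\n" "$ip" "$name"
                   picked=$((picked + 1)) ;;
            esac
            disk=$((disk + 1))
        done
        cell=$((cell + 1))
    done
}

list_data_disks | wc -l   # 154 paths: 14 cells x 11 disks
```

Joining the lines with commas gives the DISK clause. On the two cells with a faulty disk, CD_11 takes the place of the missing one, which matches the statement.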

SYS@+ASM1> create diskgroup data normal redundancy disk 'o/192.168.31.10/DATA_CD_00_exacel02',
'o/192.168.31.10/DATA_CD_01_exacel02',
'o/192.168.31.10/DATA_CD_02_exacel02',
'o/192.168.31.10/DATA_CD_03_exacel02',
'o/192.168.31.10/DATA_CD_04_exacel02',
'o/192.168.31.10/DATA_CD_05_exacel02',
'o/192.168.31.10/DATA_CD_06_exacel02',
'o/192.168.31.10/DATA_CD_07_exacel02',
'o/192.168.31.10/DATA_CD_08_exacel02',
'o/192.168.31.10/DATA_CD_09_exacel02',
'o/192.168.31.10/DATA_CD_10_exacel02',
'o/192.168.31.11/DATA_CD_00_exacel03',
'o/192.168.31.11/DATA_CD_01_exacel03',
'o/192.168.31.11/DATA_CD_02_exacel03',
'o/192.168.31.11/DATA_CD_03_exacel03',
'o/192.168.31.11/DATA_CD_04_exacel03',
'o/192.168.31.11/DATA_CD_05_exacel03',
'o/192.168.31.11/DATA_CD_06_exacel03',
'o/192.168.31.11/DATA_CD_07_exacel03',
'o/192.168.31.11/DATA_CD_08_exacel03',
'o/192.168.31.11/DATA_CD_09_exacel03',
'o/192.168.31.11/DATA_CD_10_exacel03',
'o/192.168.31.12/DATA_CD_00_exacel04',
'o/192.168.31.12/DATA_CD_01_exacel04',
'o/192.168.31.12/DATA_CD_02_exacel04',
'o/192.168.31.12/DATA_CD_03_exacel04',
'o/192.168.31.12/DATA_CD_04_exacel04',
'o/192.168.31.12/DATA_CD_05_exacel04',
'o/192.168.31.12/DATA_CD_06_exacel04',
'o/192.168.31.12/DATA_CD_07_exacel04',
'o/192.168.31.12/DATA_CD_08_exacel04',
'o/192.168.31.12/DATA_CD_09_exacel04',
'o/192.168.31.12/DATA_CD_10_exacel04',
'o/192.168.31.13/DATA_CD_00_exacel05',
'o/192.168.31.13/DATA_CD_01_exacel05',
'o/192.168.31.13/DATA_CD_02_exacel05',
'o/192.168.31.13/DATA_CD_03_exacel05',
'o/192.168.31.13/DATA_CD_04_exacel05',
'o/192.168.31.13/DATA_CD_05_exacel05',
'o/192.168.31.13/DATA_CD_06_exacel05',
'o/192.168.31.13/DATA_CD_07_exacel05',
'o/192.168.31.13/DATA_CD_08_exacel05',
'o/192.168.31.13/DATA_CD_09_exacel05',
'o/192.168.31.13/DATA_CD_10_exacel05',
'o/192.168.31.14/DATA_CD_00_exacel06',
'o/192.168.31.14/DATA_CD_01_exacel06',
'o/192.168.31.14/DATA_CD_02_exacel06',
'o/192.168.31.14/DATA_CD_03_exacel06',
'o/192.168.31.14/DATA_CD_04_exacel06',
'o/192.168.31.14/DATA_CD_05_exacel06',
'o/192.168.31.14/DATA_CD_06_exacel06',
'o/192.168.31.14/DATA_CD_07_exacel06',
'o/192.168.31.14/DATA_CD_08_exacel06',
'o/192.168.31.14/DATA_CD_09_exacel06',
'o/192.168.31.14/DATA_CD_10_exacel06',
'o/192.168.31.15/DATA_CD_00_exacel07',
'o/192.168.31.15/DATA_CD_01_exacel07',
'o/192.168.31.15/DATA_CD_02_exacel07',
'o/192.168.31.15/DATA_CD_03_exacel07',
'o/192.168.31.15/DATA_CD_04_exacel07',
'o/192.168.31.15/DATA_CD_05_exacel07',
'o/192.168.31.15/DATA_CD_06_exacel07',
'o/192.168.31.15/DATA_CD_07_exacel07',
'o/192.168.31.15/DATA_CD_08_exacel07',
'o/192.168.31.15/DATA_CD_09_exacel07',
'o/192.168.31.15/DATA_CD_10_exacel07',
'o/192.168.31.16/DATA_CD_00_exacel08',
'o/192.168.31.16/DATA_CD_01_exacel08',
'o/192.168.31.16/DATA_CD_02_exacel08',
'o/192.168.31.16/DATA_CD_03_exacel08',
'o/192.168.31.16/DATA_CD_04_exacel08',
'o/192.168.31.16/DATA_CD_05_exacel08',
'o/192.168.31.16/DATA_CD_06_exacel08',
'o/192.168.31.16/DATA_CD_07_exacel08',
'o/192.168.31.16/DATA_CD_08_exacel08',
'o/192.168.31.16/DATA_CD_09_exacel08',
'o/192.168.31.16/DATA_CD_10_exacel08',
'o/192.168.31.17/DATA_CD_00_exacel09',
'o/192.168.31.17/DATA_CD_01_exacel09',
'o/192.168.31.17/DATA_CD_02_exacel09',
'o/192.168.31.17/DATA_CD_03_exacel09',
'o/192.168.31.17/DATA_CD_04_exacel09',
'o/192.168.31.17/DATA_CD_05_exacel09',
'o/192.168.31.17/DATA_CD_06_exacel09',
'o/192.168.31.17/DATA_CD_07_exacel09',
'o/192.168.31.17/DATA_CD_08_exacel09',
'o/192.168.31.17/DATA_CD_09_exacel09',
'o/192.168.31.17/DATA_CD_10_exacel09',
'o/192.168.31.18/DATA_CD_00_exacel10',
'o/192.168.31.18/DATA_CD_01_exacel10',
'o/192.168.31.18/DATA_CD_02_exacel10',
'o/192.168.31.18/DATA_CD_03_exacel10',
'o/192.168.31.18/DATA_CD_04_exacel10',
'o/192.168.31.18/DATA_CD_05_exacel10',
'o/192.168.31.18/DATA_CD_07_exacel10',
'o/192.168.31.18/DATA_CD_08_exacel10',
'o/192.168.31.18/DATA_CD_09_exacel10',
'o/192.168.31.18/DATA_CD_10_exacel10',
'o/192.168.31.18/DATA_CD_11_exacel10',
'o/192.168.31.19/DATA_CD_00_exacel11',
'o/192.168.31.19/DATA_CD_01_exacel11',
'o/192.168.31.19/DATA_CD_02_exacel11',
'o/192.168.31.19/DATA_CD_03_exacel11',
'o/192.168.31.19/DATA_CD_04_exacel11',
'o/192.168.31.19/DATA_CD_05_exacel11',
'o/192.168.31.19/DATA_CD_06_exacel11',
'o/192.168.31.19/DATA_CD_07_exacel11',
'o/192.168.31.19/DATA_CD_09_exacel11',
'o/192.168.31.19/DATA_CD_10_exacel11',
'o/192.168.31.19/DATA_CD_11_exacel11',
'o/192.168.31.20/DATA_CD_00_exacel12',
'o/192.168.31.20/DATA_CD_01_exacel12',
'o/192.168.31.20/DATA_CD_02_exacel12',
'o/192.168.31.20/DATA_CD_03_exacel12',
'o/192.168.31.20/DATA_CD_04_exacel12',
'o/192.168.31.20/DATA_CD_05_exacel12',
'o/192.168.31.20/DATA_CD_06_exacel12',
'o/192.168.31.20/DATA_CD_07_exacel12',
'o/192.168.31.20/DATA_CD_08_exacel12',
'o/192.168.31.20/DATA_CD_09_exacel12',
'o/192.168.31.20/DATA_CD_10_exacel12',
'o/192.168.31.21/DATA_CD_00_exacel13',
'o/192.168.31.21/DATA_CD_01_exacel13',
'o/192.168.31.21/DATA_CD_02_exacel13',
'o/192.168.31.21/DATA_CD_03_exacel13',
'o/192.168.31.21/DATA_CD_04_exacel13',
'o/192.168.31.21/DATA_CD_05_exacel13',
'o/192.168.31.21/DATA_CD_06_exacel13',
'o/192.168.31.21/DATA_CD_07_exacel13',
'o/192.168.31.21/DATA_CD_08_exacel13',
'o/192.168.31.21/DATA_CD_09_exacel13',
'o/192.168.31.21/DATA_CD_10_exacel13',
'o/192.168.31.22/DATA_CD_00_exacel14',
'o/192.168.31.22/DATA_CD_01_exacel14',
'o/192.168.31.22/DATA_CD_02_exacel14',
'o/192.168.31.22/DATA_CD_03_exacel14',
'o/192.168.31.22/DATA_CD_04_exacel14',
'o/192.168.31.22/DATA_CD_05_exacel14',
'o/192.168.31.22/DATA_CD_06_exacel14',
'o/192.168.31.22/DATA_CD_07_exacel14',
'o/192.168.31.22/DATA_CD_08_exacel14',
'o/192.168.31.22/DATA_CD_09_exacel14',
'o/192.168.31.22/DATA_CD_10_exacel14',
'o/192.168.31.9/DATA_CD_00_exacel01',
'o/192.168.31.9/DATA_CD_01_exacel01',
'o/192.168.31.9/DATA_CD_02_exacel01',
'o/192.168.31.9/DATA_CD_03_exacel01',
'o/192.168.31.9/DATA_CD_04_exacel01',
'o/192.168.31.9/DATA_CD_05_exacel01',
'o/192.168.31.9/DATA_CD_06_exacel01',
'o/192.168.31.9/DATA_CD_07_exacel01',
'o/192.168.31.9/DATA_CD_08_exacel01',
'o/192.168.31.9/DATA_CD_09_exacel01',
'o/192.168.31.9/DATA_CD_10_exacel01' ATTRIBUTE 'content.type' = 'DATA', 'AU_SIZE'='4M', 'cell.smart_scan_capable'='TRUE', 'compatible.rdbms'='11.2.0.4.0' , 'compatible.asm'='19.0.0.0.0';
[root@exadb01 daily]# ocrconfig -add +DATA
[root@exadb01 daily]# ocrconfig -delete +RECO
[root@exadb01 daily]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 4
Total space (kbytes) : 491684
Used space (kbytes) : 86028
Available space (kbytes) : 405656
ID : 1379897650
Device/File Name : +DATA
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
[root@exadb01 daily]# cat /etc/oracle/ocr.loc
#Device/file +RECO/exa/OCRFILE/registry.255.1047651703 getting replaced by dev
ocrconfig_loc=+DATA/exa/OCRFILE/registry.255.1047659393
[root@exadb01 daily]# crsctl replace votedisk +DATA
Successful addition of voting disk dbcdc9efd9c04fdabf49052d131f4599.
Successful addition of voting disk cab255e606e94f07bf98bbee346929be.
Successful addition of voting disk 415bca1c343d4fd9bf1b41ae3c46f832.
Successful deletion of voting disk 4e8a91ba0ae64f7abfd7ea9306073fb7.
Successful deletion of voting disk a5a7c14a9fc84ff5bf9cd5df4e542871.
Successful deletion of voting disk d2fb498f1d3e4fc7bf9c3d2c862182c3.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced
SYS@+ASM1> create spfile='+DATA' from pfile='/tmp/initasm.ora';
File created.
SYS@+ASM1> select path from v$asm_disk where header_status<>'MEMBER';
o/192.168.31.10/DATA_CD_11_exacel02
o/192.168.31.11/DATA_CD_11_exacel03
o/192.168.31.12/DATA_CD_11_exacel04
o/192.168.31.13/DATA_CD_11_exacel05
o/192.168.31.14/DATA_CD_11_exacel06
o/192.168.31.15/DATA_CD_11_exacel07
o/192.168.31.16/DATA_CD_11_exacel08
o/192.168.31.17/DATA_CD_11_exacel09
o/192.168.31.20/DATA_CD_11_exacel12
o/192.168.31.21/DATA_CD_11_exacel13
o/192.168.31.22/DATA_CD_11_exacel14
SYS@+ASM1> alter diskgroup data add disk 'o/192.168.31.9/DATA_CD_11_exacel01',
'o/192.168.31.10/DATA_CD_11_exacel02',
'o/192.168.31.11/DATA_CD_11_exacel03',
'o/192.168.31.12/DATA_CD_11_exacel04',
'o/192.168.31.13/DATA_CD_11_exacel05',
'o/192.168.31.14/DATA_CD_11_exacel06',
'o/192.168.31.15/DATA_CD_11_exacel07',
'o/192.168.31.16/DATA_CD_11_exacel08',
'o/192.168.31.17/DATA_CD_11_exacel09',
'o/192.168.31.20/DATA_CD_11_exacel12',
'o/192.168.31.21/DATA_CD_11_exacel13',
'o/192.168.31.22/DATA_CD_11_exacel14';
Diskgroup altered.

Everything went smoothly until we tried to start CRS on the other nodes. CRS started only on the first node, and ASM would run on only one node at a time.
[root@exadb02 trace]# vi /u01/app/oracle/diag/crs/exadb02/crs/trace/alert.log
...
2020-08-06 11:24:10.239 [ORAROOTAGENT(278746)]CRS-5019: All OCR locations are on ASM disk groups [DATA], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/exadb02/crs/trace/ohasd_orarootagent_root.trc".
...
[root@exadb02 trace]# tail -f /u01/app/oracle/diag/crs/exadb02/crs/trace/ohasd_orarootagent_root.trc
...
2020-08-06 11:29:37.160 : USRTHRD:697435904: [ INFO] {0:5:3} [ora.storage] Error [kgfoAl06] in [kgfokge] at kgfo.c:3169
2020-08-06 11:29:37.160 : USRTHRD:697435904: [ INFO] {0:5:3} [ora.storage] ORA-01017: invalid username/password; logon denied
...
The error messages above pointed to the cause: we had forgotten to restore the ASM password file, which had also been stored in the DATA disk group. That was our mistake. We recreated it, as described in the next post.
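
When scanning long alert and trace files for a problem like this, it can help to reduce them to the distinct error codes first. The sketch below is one way to do that. The function name error_codes is hypothetical, and the set of prefixes (ORA, CRS, CLSU) is simply the ones that appear in this post.

```bash
# Hypothetical helper: read log text on stdin and print each distinct
# ORA-/CRS-/CLSU- error code once.
error_codes() {
    grep -oE '(ORA|CRS|CLSU)-[0-9]+' | sort -u
}

# Sample lines shortened from the alert log and trace excerpts above.
error_codes <<'EOF'
2020-08-06 11:24:10.239 [ORAROOTAGENT(278746)]CRS-5019: All OCR locations are on ASM disk groups [DATA], and none of these disk groups are mounted.
2020-08-06 11:29:37.160 : USRTHRD:697435904: [ INFO] {0:5:3} [ora.storage] Error [kgfoAl06] in [kgfokge] at kgfo.c:3169
2020-08-06 11:29:37.160 : USRTHRD:697435904: [ INFO] {0:5:3} [ora.storage] ORA-01017: invalid username/password; logon denied
EOF
```

For these sample lines it lists CRS-5019 and ORA-01017, and the second one is what points at the missing password file.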

 
Hope it helps.
