Posts

Showing posts from 2022

+ASM and +APX Instances are not registered with listener.

Listener does not currently know of SID given in connect descriptor. After upgrading Oracle Grid Infrastructure from 11.2.0.4 to 19c, we found that the +ASM and +APX (ASM Proxy) instances used for ADVM were not registered with the listener, and their OEM target status showed down. They were accessible only from the private network (192.168.*.*). We checked the configuration attributes of the ASM instance with the srvctl command. The easiest first step was to set it with the "srvctl modify" command. Oops, something went wrong: that was not possible. It looks like this command no longer works; we searched docs.oracle and checked Oracle Support but found no easy way out. We then checked the ASM proxy instance's local_listener parameter and found that it was not set. According to the "Real Application Clusters Installation Guide for Linux and UNIX", if you do not set LOCAL_LISTENER, then the Database Agent process automatically updates the database associated with the local listener in the Grid ho

Parse Error Warnings in database alert.log file

Too many parse errors: how many do you mean? SQL syntax errors are normal, and dealing with them is usually up to software developers. But sometimes they bother database administrators too. When a SQL statement is syntactically incorrect (e.g., a syntax error) or semantically incorrect (e.g., projection of a nonexistent column), its processing fails at the parsing stage and it is never executed. If that happens too often, it can hurt overall database performance. As of release 12.2, these failing statements are recorded in alert.log, as below, if they are called excessively. In the above example, the "select dual" statement fails with the error "ORA-00923: FROM keyword not found where expected". This is a syntax error that was observed 100 times within 4 minutes, so it was recorded. Now comes the question: how many failures does it take for a SQL statement to be written to the alert.log file? According to the Doc ID 16945190.8, By default the diagnostic w
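The idea of "N parse failures within a time window" can be illustrated with a small sliding-window check. This is only a sketch of the general technique, not Oracle's internal logic: the function name is hypothetical, and the default values (100 failures, 240 seconds) simply mirror the figures from the example in this excerpt rather than any documented database default.

```python
from collections import deque


def exceeds_threshold(timestamps, limit=100, window_seconds=240):
    """Return True if at least `limit` events fall within any span of
    `window_seconds` seconds.

    `timestamps` are event times in seconds. The defaults are illustrative
    assumptions taken from the example (100 failures within 4 minutes).
    """
    window = deque()
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop events that are older than the window relative to `ts`.
        while ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= limit:
            return True
    return False
```

For instance, 100 failures spaced 2 seconds apart would trip the check, while the same number spread 3 seconds apart would not fit inside a single 240-second window.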

How To Recreate The ASM Password File? (A Failure Story Part 3)

Is it possible to get credentials from CRS? In the last two posts, I talked about the catastrophic situation we experienced, which was triggered by mirrored disk failures. So far, we had restored the OCR config and recreated the lost ASM disk group that previously hosted the OCR, but we could not start CRS on more than one node. We got the error "CRS-5019: All OCR locations are on ASM disk groups [DATA], and none of these disk groups are mounted". We recreated the ASM password file. (We should have restored it from the offline disk group.) Recreating it was not enough: some internal users were missing. According to "Doc ID 2341753.1, The users used in Flex ASM", the CRSUSER__ASM_001 user is needed by crsd and should have the SYSASM privilege, so we gave CRS what it needed. We set the CRSUSER__ASM_001 user's password ourselves, but that was not the proper way. It is an internal user which is created at the grid installation part

Restore OCR from backup located in ASM diskgroup. (A Failure Story Part 2)

OCR lost. Where is the OCR backup? On an ASM disk group. After the mirrored disk failure in normal redundancy mode, CRS was down. We could not bring the faulty disks and one disk group (DATA) online again, so we decided to restore the OCR config and change the voting disk location from DATA to RECO. We started CRS in exclusive mode and searched for backup locations; unfortunately, we had no backups on the local file system. The OCR backups were on one of the ASM disk groups, and this disk group could not be brought online. We mounted that disk group in restricted mode and tried to copy the latest OCR backup to a local directory with the commands below, but did not succeed. We searched Oracle Support and found "Doc ID 2569847.1, How to Restore ASM Based OCR when OCR backup is located in ASM diskgroup." According to the document, the "amdu" command was the one we were looking for. We executed the commands below and restored the latest OCR backup (file number 875) to our current working directory. We followed, " Doc I

Mirrored Disk Failure in Normal Redundancy Mode. (A Failure Story Part 1)

IRON MAN WAS DOWN. Recently, at our DR (Disaster Recovery) site, we experienced two mirrored disk failures in normal redundancy mode, which ended with us recreating the Data Guard databases. I will try to explain our problem in detail. Our databases were down. CRS state was offline. ASM was down. Iron Man was down. We started diagnosing the issue by manually starting up the ASM instance on one node. We have 3 disk groups. Two of them mounted, but one disk group (+DATA) could not be mounted. This disk group (+DATA) held the OCR config and served as the voting disk. This is the command and output. We checked the alert logs of all ASM instances to establish the chronological order of events. Let's examine the findings. At 07:17:22, on exacel11, there was an attempt to offline disk 8. This was the first faulty disk. At 07:17:25, after the first faulty disk was offlined, the Exadata disk worker process (XDWK) tried to access the partner disks of these ASM disks. All subsequent IOs to faulty ASM disks will be dir

Exadata: Disk controller was hung. Cell was power cycled

Just another manic magic Monday. After a great weekend, we came to the office and performed our daily health checks, like every Monday. One of the storage servers (cells) of our Exadata X2-2 X4270 M2 had lost 11 of its 34 ASM disks. We got lucky: all databases were still up despite the losses. Let's examine what happened to our cell server. When I checked the mailbox, I saw an alert mail from the problematic cell stating "Disk controller was hung. Cell was power cycled". It looks like the cell's disk controller was not performing well (maybe a bug or a peak moment) and forced the server to reboot. But reboots do not normally end in disk losses. I started by checking the cell's physicaldisk status. What I got from the output was this: we had one flash disk failure and one hard disk failure (disk number 3), and one more hard disk was in import failure status (disk number 7). But that did not explain the failure of 11 ASM disks. According to the output, it should have been 6 ASM disks. There shoul

Bizarre tables: starting with MD*. Let's drop some.

What are these strange tables starting with MD*? Can I drop them? I will answer first: yes, you can drop some of them if your Oracle DB version is 12cR2 or later. But which ones? There were over 3000 tables with names starting with 'MD' in one of our production databases. I knew that those tables were related to spatial indexes, but that was a huge number. So I took a deep dive into spatial indexes. For each spatial index created, a table named like "MDRT_#" is also created. There is a one-to-one relationship between them (except for the partitioned ones). There are also tables named like "MDXT_#", "MDXT_#_BKTS", and "MDXT_#_MBR". According to Doc ID 1916251.1, these tables are created to support statistics on spatial indexes, and after the statistics analysis is finished, MDXT_#_BKTS and MDXT_#_MBR should be dropped automatically. But somehow, we had lots of them. Also, Doc ID 2029072.1 states that "Dropping these temporary tables ( MDX
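To get an overview of which kind of MD* table each name belongs to, the naming patterns mentioned above can be sketched as a small classifier. This is a minimal illustration under stated assumptions: the function name and labels are hypothetical, and "#" is assumed to stand for an alphanumeric identifier optionally followed by "$". It only groups names; it does not decide which tables are safe to drop.

```python
import re

# Assumption: "#" in the naming scheme is an alphanumeric identifier,
# optionally followed by "$". Adjust the patterns to the names you actually see.
_ID = r"[0-9A-Z]+\$?"
_PATTERNS = [
    ("MDXT_BKTS", re.compile(rf"^MDXT_{_ID}_BKTS$")),  # MDXT_#_BKTS
    ("MDXT_MBR", re.compile(rf"^MDXT_{_ID}_MBR$")),    # MDXT_#_MBR
    ("MDXT", re.compile(rf"^MDXT_{_ID}$")),            # MDXT_#
    ("MDRT", re.compile(rf"^MDRT_{_ID}$")),            # MDRT_#
]


def classify(table_name):
    """Return a label for a known MD* naming pattern, or None otherwise."""
    name = table_name.upper()
    for label, pattern in _PATTERNS:
        if pattern.match(name):
            return label
    return None


def count_by_kind(table_names):
    """Count how many table names fall into each known pattern."""
    counts = {}
    for name in table_names:
        kind = classify(name)
        if kind is not None:
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Feeding it the list of table names from the data dictionary gives a quick count per pattern, which helps to see whether the leftover statistics tables dominate.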

Node Eviction after Applying Release Update 19.13

Be CAREFUL! Before Applying Release Update. After applying Release Update 19.13 to our standby site cluster, which consists of two physical machines (not Oracle engineered systems), we started to experience node evictions and reboot problems. We immediately started searching for the root cause and followed some steps to make the nodes stable again before applying RU 19.13 to our production sites. * We chose a sample that reflects the problem: on Feb 22, at 13:35, the host machine (bltdb02) rebooted. * We looked at the lastgasp log files for details on why the node was rebooted. The Grid cssdagent or cssdmonitor can record node reboots here. * These files are in ASCII text format; there is nothing worthy of mention except the timestamps. The records in these files were related to another date.