Excellent post from Brent Ozar on what to do when a database gets corrupted

You are running DBCCs on your systems at proper intervals, right?  If not, you are keeping your resume up to date, right?  Having a database get corrupted and not being able to recover it can definitely be an RGE (resume generating event), i.e. your job just ended.

Brent goes over an excellent plan for how to address what happens if a DBCC reports an error, stressing the importance of backups and regularly running DBCCs.  If there’s one weakness in his plan, it’s that he does not say to never use the REPAIR_ALLOW_DATA_LOSS option.  This should always be the absolute last option as your data is going to fall in to the bit bucket and will never be seen again.

An excellent defense against database corruption is to use the WITH_CHECKSUM option on your backups.  My job for full backups looks like this:


DECLARE @backupSetId AS INT;

SELECT @backupSetId = POSITION FROM msdb..backupset WHERE database_name=N'BlahBlahBlahBlah' AND backup_set_id=(SELECT MAX(backup_set_id) FROM msdb..backupset WHERE database_name=N'BlahBlahBlahBlah' );

IF @backupSetId IS NULL BEGIN RAISERROR(N'Verify failed. Backup information for database ''BlahBlahBlahBlah'' not found.', 16, 1); END;


You see the CHECKSUM on the backup along with the RESTORE VERIFYONLY.  The code was generated by right clicking on the database, selecting Tasks, then Backup, plug in the parameters, and select Script.  I put it in a new query window as I may back up several databases in the same job.  Sometimes I’ll just do a find/replace for the other databases since my backup.  The Restore Verifyonly gives you some confidence that your backup is recoverable: NEVER assume that just because your backup ran that the database is restorable!  The ONLY way to know is to actually restore it to another file!  You don’t want to accidentally clobber your production that probably has newer data in it.

(I love backing up with compression!  If your CPU is not under undue strain, it will speed up your backups because your CPU is faster than disk I/O so the CPU is able to compress faster than your disks can take the writes, resulting in shorter backup times and smaller backup sets!)

Of course, the best backup methodology would be to do a full restore to another database after the backup, and then DBCC CHECKDB on that copy.

There is absolutely nothing wrong with belts and suspenders.  It may look silly, but if it preserves your job then it’s a good thing.

Another good thing to do is to look at the contents of msdb.dbo.suspect_pages table, hopefully it will always have zero records.  With older versions of SQL Server I’ve seen records appear in this before the DBCC noticed major problems.  And yet another is to make sure that you receive emails when alerts 823, 824, and 825.  Those can give you an early indication that something bad is looming over the horizon.

DBCCs are most commonly caused by a bug in SQL Server, thus it is a good idea to keep your server fully patched with Service Packs and Cumulative Updates, also caused by errors in your storage system.  If you’re suddenly seeing corruption across multiple databases, then your storage has a serious problem and you’re in for a long day.


One thought on “Excellent post from Brent Ozar on what to do when a database gets corrupted

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s