(One of the reasons for this blog is for me to document my processes for myself. These are the ways that I do certain things. In your career as a DBA, if that is your job, you’ll develop your own preferred techniques. If these are useful for you, bonus! Regardless, it gives me a source code repository: if I go somewhere to do some consulting and help them get their system maintaining itself properly, I can pull up posts like this and get them going quickly.)
The most critical thing as a SQL Server DBA is to ensure that your databases can be restored in the event of the loss of a server for whatever reason: disk crash, fire in the server room, tribble invasion, whatever. To do this, not only do you have to back up your databases, you also have to test restores! Create a database and restore the backups of your production DB to it. It’s the safest way to make sure that everything works. This test restore can be automated to run every night, but that’s outside the scope of what I want to talk about right now.
There are lots of places where problems can creep in, and this is just one part of how you’ll need to monitor systems. This is how I’ve done things for a number of years, and thus far it has served me well.
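A minimal sketch of a manual test restore, for illustration only. The database name SalesDB, the backup path, the restore folder, and the logical file names are all made up; RESTORE FILELISTONLY shows the real logical names stored in your backup.

```sql
-- Hypothetical names and paths throughout; substitute your own.

-- Step 1: list the logical file names contained in the backup.
RESTORE FILELISTONLY
FROM DISK = N'D:\Backups\SalesDB_Full.bak';

-- Step 2: restore under a different database name so production is untouched.
RESTORE DATABASE SalesDB_RestoreTest
FROM DISK = N'D:\Backups\SalesDB_Full.bak'
WITH MOVE N'SalesDB'     TO N'D:\RestoreTest\SalesDB_RestoreTest.mdf',
     MOVE N'SalesDB_log' TO N'D:\RestoreTest\SalesDB_RestoreTest.ldf',
     REPLACE,    -- overwrite the copy left by the previous test
     RECOVERY;   -- bring the restored copy online

-- Step 3: confirm the restored copy passes a consistency check.
DBCC CHECKDB (SalesDB_RestoreTest) WITH NO_INFOMSGS;
```

Restoring to a separate name and folder keeps the test from ever touching the production files, which is the point of the exercise.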
First and foremost, backups. Your requirements will vary depending on your Recovery Point Objectives (RPO, how much data loss is acceptable) and Recovery Time Objectives (RTO, how long before the system is available). If your normal business hours are Monday to Friday 8-5 with your largest database just a few gigabytes, you’ll have radically different requirements than someone with databases in the hundreds or thousands of gigabytes that require 24/7 uptime.
We’ll go with demonstrating a simple installation.
When your operation is 8-5 Mon-Fri, you have a huge maintenance window that you can get everything done in, so there’s no reason not to get everything done.
My personal method is to do my maintenance tasks at night. These include DBCC CheckDB, Update Statistics, and checking the suspect pages table. Note that I DO NOT USE MAINTENANCE PLANS. Gail Shaw recently described them as brain dead, doing far more than what is needed. I particularly don’t like the minuscule amount that they report, so you’re never clear what was done. So I do everything through SQL jobs. And if I have a SQL Express installation, since everything is based on .CMD and .SQL files, it’s not a big deal to run everything through Windows Task Scheduler.
Since I normally see no activity at night, that’s when I do my maintenance. I do a full backup of each database, followed by a transaction log backup. Those will be caught by the server backup later that night and written to tape, which will then be moved to another building the next work day. I do differential database backups at noon and transaction log backups throughout the day, typically every 10 minutes.
As a rule you don’t want to start multiple backups at the exact same time, so stagger the start times slightly. If you start the transaction log backups at 7:30 to capture people who come in early, fine. I have the first transaction log backup initialize the log backup file, then the appending t-log backups begin at 7:40. Have the noon differential fire at 12:02pm. Stop the t-log backups at 6pm and start your nightly maintenance at 7pm. You will develop a feel for how long each task takes based on the size of the database.
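The schedule above could be sketched with commands along these lines, one per scheduled job step. This is an illustration under assumptions, not the exact commands from my jobs: the database name SalesDB, the D:\Backups folder, and the file names are hypothetical.

```sql
-- Hypothetical database name and paths; each statement would be its own job step.

-- Nightly: full backup, overwriting the previous night's file.
BACKUP DATABASE SalesDB
TO DISK = N'D:\Backups\SalesDB_Full.bak'
WITH INIT;

-- Nightly, after the full backup: transaction log backup.
BACKUP LOG SalesDB
TO DISK = N'D:\Backups\SalesDB_LogNightly.trn'
WITH INIT;

-- 7:30: first log backup of the day initializes (overwrites) the daytime file.
BACKUP LOG SalesDB
TO DISK = N'D:\Backups\SalesDB_Log.trn'
WITH INIT;

-- 7:40 through 6pm, every 10 minutes: append to the same file.
BACKUP LOG SalesDB
TO DISK = N'D:\Backups\SalesDB_Log.trn'
WITH NOINIT;

-- 12:02pm: differential backup, offset so it does not start with a log backup.
BACKUP DATABASE SalesDB
TO DISK = N'D:\Backups\SalesDB_Diff.bak'
WITH DIFFERENTIAL, INIT;
```

INIT overwrites whatever is in the target file and NOINIT appends to it. Overwriting the daytime log file each morning is only safe if the previous day's copy has already been captured elsewhere, such as on the nightly tape.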
On to the job specifics.
My first step calls a file called NightlyDBCC.cmd, which is this:
osql -S"[servername]" -E -id:\dbccs\nightlydbcc.sql -od:\dbccs\dbccresult.txt -w200
osql -S"[servername]" -E -id:\dbccs\nightlysuspectpages.sql -od:\dbccs\suspectpagesresult.txt -w200
This .CMD file is two lines, each starting with osql -S.
The OSQL program is a SQL Server command line tool that lets you run T-SQL commands from the command line. The switches are case-sensitive, so type them exactly as shown. The ones that I use are:
- -S: the name of your SQL Server
- -E: use the current user’s credentials for authentication
- -i: the input file that contains the commands to be run
- -o: where to write the output
- -w: column width
Like most Windows utilities, you can get a full list of options by typing OSQL /? in a command prompt window.
The nightlydbcc.sql file contains the following:
exec sp_MSforeachdb @command1='print getdate() print ''? checkdb'' DBCC CHECKDB([?])'
The stored procedure, sp_MSforeachdb, is an undocumented system stored procedure that runs a command string against every database on a server. It accepts a question mark as a wildcard representing each database name. So the string being run might be better visualized as
For Every Database:
print getdate() --print the date for the output log
print '? checkdb' --print header record of the database name
DBCC CHECKDB(?) --run DBCC CHECKDB against every database
And the output is routed to dbccresult.txt. Inspect that file every morning and you’ll have a good idea if your system is clean or if something nasty is lurking inside waiting to leap out, like the chestburster from the movie Alien.
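Because sp_MSforeachdb is undocumented, some people prefer an explicit loop. The following is a sketch of that alternative, not what my nightlydbcc.sql contains. It also adds the NO_INFOMSGS and ALL_ERRORMSGS options so the output file holds little besides actual problems.

```sql
-- Sketch of an alternative to sp_MSforeachdb: loop over the online databases.
DECLARE @name sysname;
DECLARE @sql  nvarchar(600);

DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE state_desc = N'ONLINE';   -- skip databases that cannot be checked

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Header line for the output log: timestamp and database name.
    PRINT CONVERT(varchar(30), GETDATE(), 120) + ' ' + @name + ' checkdb';

    -- QUOTENAME brackets the name so spaces or odd characters do not break the command.
    SET @sql = N'DBCC CHECKDB (' + QUOTENAME(@name)
             + N') WITH NO_INFOMSGS, ALL_ERRORMSGS;';
    EXEC (@sql);

    FETCH NEXT FROM db_cursor INTO @name;
END

CLOSE db_cursor;
DEALLOCATE db_cursor;
```

With NO_INFOMSGS, a clean database produces only the header line, so anything else in dbccresult.txt deserves a closer look.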
NightlySuspectPages.sql has a little bit more going on.
SET NOCOUNT ON;
SELECT 'Run date:', GETDATE();
SET NOCOUNT OFF;
--Server and SQL uptime
SELECT DATEADD(SECOND, -1 * (ms_ticks / 1000), GETDATE()) AS ServerStartDate,
sqlserver_start_time AS SQLStartDate
FROM sys.dm_os_sys_info;
--Suspect pages
SELECT 'Suspect Pages' AS SuspectPages, *
FROM msdb.dbo.suspect_pages;
--Database status
SELECT GETDATE() AS RunTime, LEFT(name, 15) AS DBName,
LEFT(filename, 60) AS Filename, LEFT(status ,6) AS Status
FROM master.dbo.sysdatabases
WHERE dbid > 4
-- AND status <> 528
ORDER BY name;
The first block writes a time stamp of when the code ran into the output file. The second block shows when the server was last started and when SQL Server started up. They should be fairly close, but could vary if you’ve installed SQL service packs or updates that restarted the SQL services.
The third select looks for any pages in msdb.dbo.suspect_pages. These pages represent an I/O error, which could be a transient glitch or a warning of a problem in your disk system, whether that is a SAN or a local disk. Either way, it’s something to look at. Take a look at Books Online for more information: just enter suspect_pages into the search box. Hopefully, normally, there won’t be any records here and you can ignore it. But if something ever appears here, you have a problem!
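If rows do show up, the event_type column is just a number. A small sketch that decodes it, using the event type meanings documented for suspect_pages (the descriptive text is my own wording):

```sql
-- Sketch: list suspect pages with a readable description of each event type.
SELECT DB_NAME(sp.database_id) AS DBName,
       sp.file_id,
       sp.page_id,
       CASE sp.event_type
            WHEN 1 THEN '823 or 824 error (not checksum or torn page)'
            WHEN 2 THEN 'Bad checksum'
            WHEN 3 THEN 'Torn page'
            WHEN 4 THEN 'Restored'
            WHEN 5 THEN 'Repaired by DBCC'
            WHEN 7 THEN 'Deallocated by DBCC'
            ELSE 'Unknown'
       END AS EventDescription,
       sp.error_count,
       sp.last_update_date
FROM msdb.dbo.suspect_pages AS sp
ORDER BY sp.last_update_date DESC;
```

Event types 1 through 3 are the ones that describe a page that is currently bad. Types 4, 5, and 7 record that a page was later restored, repaired, or deallocated.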
And finally, the fourth select lists every user database on that instance (the WHERE clause skips the four system databases) and shows the Status field. It will be a number like 65536 or 65544. That number will change depending on what options are selected for the database, such as the recovery model. The thing that I watch for is if it changes: if someone changes a database from full recovery to simple, I want to know that!
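On SQL Server 2005 and later, the sys.databases catalog view spells out the same information in words, which saves decoding the number. A minimal sketch, offered as a possible companion to the status check rather than a replacement for it:

```sql
-- Sketch: show the recovery model and state of each user database by name.
SELECT name                AS DBName,
       recovery_model_desc AS RecoveryModel,   -- FULL, BULK_LOGGED, or SIMPLE
       state_desc          AS DatabaseState    -- ONLINE, OFFLINE, RESTORING, ...
FROM sys.databases
WHERE database_id > 4                          -- skip the four system databases
ORDER BY name;
```

This lines up with the numbers above: in the legacy status value, bit 8 is documented as "trunc. log on chkpt", which is the simple recovery setting, and 65544 is exactly 65536 plus 8.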
More on this soon.