Reasons Why I Don’t Like “The Cloud”

Part X of an ongoing series….

First off, the word itself.  The Cloud.  What is The Cloud?  It’s a server that you don’t own.  You can’t touch it; it’s in someone else’s data center.  It may or may not be virtual.  Amazon’s Cloud or Microsoft’s or Google’s are several data centers with racks and racks of servers.  They are physical, just not at your location.  And they’re accessed across the Internet.  We’ve been doing this for 30 years: it’s called a Wide-Area Network, just scaled up.  We had bi-coastal WANs before the World Wide Web came along.

So you’re paying for a server that you have no physical relation to.  Now, on the one hand, you’re also not responsible if something breaks.  You don’t pay the electricity bill for power and cooling or to keep the lights on, or off, as is more common in a lot of data centers.  On the other hand, the concept of backups becomes much more worrying for me because I have to trust them utterly that my machine is being backed up, and in the case of the server that I’m working on now, I don’t know when that VM backup happens.  I perform my due diligence with my SQL Server backups, but if I don’t know WHEN that VM backup takes place, then I don’t know what my recovery window is.  My full backups go off, along with the rest of my maintenance, at about 23:50.  If they back up the VM at 21:00 and I have to restore from that image, I have to know that, as far as the restored server is concerned, that night’s DBCCs, index maintenance, and whatever else never ran.  Ultimately I’m going to set up a form of log shipping where backups will be compressed, encrypted, and emailed to a repository, but I don’t have that in place right now.
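
To put rough numbers on that worry, here is a minimal Python sketch of the arithmetic. It assumes the VM image is taken at a known time (exactly what I don't have), that maintenance starts at 23:50 every night, and that a run which started before the image also finished before it. The function names and the sample dates are made up for illustration.

```python
from datetime import datetime, timedelta


def last_maintenance_in_image(image_time, maintenance_hhmm=(23, 50)):
    """Start time of the latest nightly maintenance run that began
    at or before the VM image was taken.

    Assumption: a run that started before the image also finished
    before it. A real check would compare finish times instead.
    """
    hour, minute = maintenance_hhmm
    candidate = image_time.replace(hour=hour, minute=minute,
                                   second=0, microsecond=0)
    if candidate > image_time:
        # Tonight's run had not started yet, so fall back to last night's.
        candidate -= timedelta(days=1)
    return candidate


def unprotected_window(image_time, failure_time):
    """Span of activity that restoring the VM image alone would lose."""
    return failure_time - image_time


if __name__ == "__main__":
    image = datetime(2024, 3, 5, 21, 0)    # hypothetical 21:00 VM backup
    failure = datetime(2024, 3, 6, 9, 0)   # hypothetical crash next morning
    print("Last maintenance captured:", last_maintenance_in_image(image))
    print("Lost if only the image is restored:",
          unprotected_window(image, failure))
```

With a 21:00 image, the most recent maintenance inside it is from the night before, so anything done at 23:50 that evening has to be recovered from the SQL Server backups or redone.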

But my big gripe is downtime.  We’ve had Microsoft working on our server trying to resolve a problem.  Yesterday my boss comes in and tells me to sign off that box so they could reboot it, so I do.  Then shortly after he says he can’t connect to it, so he’s wondering if they might have accidentally done a shutdown rather than a restart, so I open up the control panel for the VM.  And it’s unresponsive.

The entire data center was offline.  Our one server plus who knows how many others, all gone.  It was down for a good half an hour.

Let’s relate that to local hardware.  If my server crashes, I reboot it, put the database into administrator mode, run DBCCs to make sure all is well, then I open it up for the users.  If there’s a larger outage like a disk failure or the entire box goes up in flames, then we’ve got something that’s going to take longer to address.  If a backhoe eats our internet connection, then external users can’t access my system but internal users are fine, and the line will be fixed in a couple of days.

In ALL of these cases, I KNOW WHAT’S GOING ON and I can tell users and management what’s up and give them a SWAG (semi-wild assed guess) as to when normal operations will resume.  When a Cloud goes down?  No idea.  You get an email like this:

We are currently investigating reports of an incident affecting the Network  in the WKRP data center.  This incident will impact your ability to manage current assets, will impact your ability to generate new assets, and does impact availability of current assets.  We are in the process of engaging the appropriate teams to quickly mitigate and resolve this incident and will provide additional information as soon as it is available. …

Isn’t that nice and reassuring?

No, I’m not a fan.  EVERY CLOUD PROVIDER HAS OUTAGES.  Microsoft has had them, Amazon has had them, Google has had them.  Some have had serious security breaches (looking at you, AWS) where it was pretty easy to commandeer someone else’s virtual hosts.  Not good.

It’s hard to do security right.  We’d like to think that people like AWS have ‘Top People’ doing it, but they make mistakes just like us mere mortals.  There are no easy answers: if you have local servers, you’re going to have problems and outages.  If you throw everything into The Cloud, you’re still going to have problems and outages, and there won’t be a blessed thing that you can do about it.

So which is better?  Flip a coin, I don’t know.  But for my $0.002, I’d prefer a server that I can touch.

3 thoughts on “Reasons Why I Don’t Like “The Cloud”

  1. If the server backup is an image of the system, SQL Server will respond when database files are imaged. The SQL Server log should have messages about frozen IO. The msdb.dbo.backupset table should have a record for the backup, and the matching media row in msdb.dbo.backupmediafamily should show device_type = 7 (virtual device). The nice thing about this is that a full system restore will include a fully operational SQL Server with all the databases. The bad thing is that the system might have to freeze a little bit to get the snapshot established before the imaging starts.

    The value of the is_copy_only column in backupset is a critical bit of info about the imaging software. If is_copy_only is 1, the image will not affect the SQL backup chain. If it is 0, then the image is now part of the SQL backup chain. If it is part of the backup chain, who does a point in time restore of a single database? Can the imaging system restore a single database and apply diff or log restores? If the vendor does not do this, then why is a COPY_ONLY VSS snapshot not good enough for a server image?
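
The decision described in this comment can be sketched in a few lines of Python. This is only an illustration: it assumes the history rows have already been pulled out of msdb (joining backupset to backupmediafamily to pick up device_type), and the BackupRow class, the classify function, and the sample database names are hypothetical.

```python
from dataclasses import dataclass

# device_type value the comment associates with image-style backups
# written through a virtual device.
VIRTUAL_DEVICE = 7


@dataclass
class BackupRow:
    """One backup history row, already fetched from msdb (hypothetical shape)."""
    database_name: str
    device_type: int
    is_copy_only: bool


def classify(row):
    """Say what a history row implies for the SQL backup chain."""
    if row.device_type != VIRTUAL_DEVICE:
        return "not a virtual-device backup"
    if row.is_copy_only:
        return "image backup, copy-only: SQL backup chain untouched"
    return "image backup, part of the SQL backup chain"


if __name__ == "__main__":
    rows = [
        BackupRow("Sales", 2, False),   # ordinary backup to disk
        BackupRow("Sales", 7, True),    # image taken with COPY_ONLY
        BackupRow("Sales", 7, False),   # image that joins the chain
    ]
    for row in rows:
        print(row.database_name, row.device_type, "->", classify(row))
```

The third case is the one to watch for: if the image joins the chain, later differential restores depend on a backup that only the imaging vendor holds.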

    • Interestingly, I have no logged messages about the data center’s backup process because they backed up when my server was paused, so they have a solid point in time for whenever the backup ran. My server is only up during business hours and for a maintenance window at 23:50, after which it pauses again. And I can’t see their logs as to what was backed up.

      Recently our provider changed their backup methodology and they’re no longer backing up the VMs: each client is responsible for their servers. I’ve set up an appropriate backup policy, but again, my control is limited to the include/exclude directories, how often it runs (every 24 hours), and the retention period; I can’t set the start time. It runs, and after it finishes, it starts the timer for when to run again. And again, I can’t see their logs. I have to assume that if the frequency timer ends while the server is paused, the backup starts when the server powers up the next morning.

      This new backup process is more like a conventional backup and won’t back up databases while SQL Server is running, but it will capture my SQL Server database and transaction log backups. I’m not yet sure of the granularity of what I can restore, that’s something that I’ll be investigating.

      I do frequent enough log backups to satisfy any need for a point in time restore, but honestly, in the 20-some years that I’ve been doing this I suspect that I’ve done fewer than a dozen P-I-T restores.
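
The run-after-finish schedule described in this reply can be sketched as a small Python function. It leans on the same guess made above, that a timer expiring while the server is paused fires at the next power-up, and the uptime windows (08:00 to 18:00 for business hours, 23:50 to 00:30 for maintenance) are invented for the example since the provider publishes none of this.

```python
from datetime import datetime, timedelta


def next_backup_start(last_finish, interval, up_windows):
    """Earliest moment the next backup could start.

    last_finish: when the previous backup finished (the timer starts here)
    interval:    the configured frequency, e.g. 24 hours
    up_windows:  non-overlapping (start, end) periods when the server is up

    Assumption: a timer that expires while the server is paused fires
    as soon as the server next powers up.
    """
    due = last_finish + interval
    for start, end in sorted(up_windows):
        if due < start:
            return start   # expired while paused, so run at power-up
        if due < end:
            return due     # expired while the server was running
    return None            # no known uptime window after the due time


if __name__ == "__main__":
    windows = [
        (datetime(2024, 3, 5, 8, 0), datetime(2024, 3, 5, 18, 0)),
        (datetime(2024, 3, 5, 23, 50), datetime(2024, 3, 6, 0, 30)),
        (datetime(2024, 3, 6, 8, 0), datetime(2024, 3, 6, 18, 0)),
    ]
    day = timedelta(hours=24)
    # Finished at 08:40 yesterday: due in the middle of business hours.
    print(next_backup_start(datetime(2024, 3, 4, 8, 40), day, windows))
    # Finished at 19:00 yesterday: due while paused, waits for 23:50.
    print(next_backup_start(datetime(2024, 3, 4, 19, 0), day, windows))
```

Because the timer restarts only when a backup finishes, each run lands a little later than the one before it, by however long the backup took, until it falls off the end of an uptime window and gets pushed to the next one.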
