Hacker News

>If you are a CEO you should be asking this question: "How many people in this company can unilaterally destroy our entire business model?"

This is a question that the person in charge of backups needs to think about, too. I mean, rephrase it as "Is there any one person who can write to both production and backup copies of critical data?" but it means the same thing as what you said.
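One way to make that question concrete is a small audit over who holds write access where. This is only a sketch: the `access` mapping, the names in it, and the "production"/"backup" labels are made up for illustration, and a real audit would pull them from whatever actually grants access.

```python
from typing import Dict, Set


def single_points_of_destruction(
    write_access: Dict[str, Set[str]],
    production: str = "production",
    backup: str = "backup",
) -> Set[str]:
    """Return every principal who can write to BOTH production and backup."""
    return {
        who
        for who, targets in write_access.items()
        if production in targets and backup in targets
    }


# Hypothetical access list, for illustration only.
access = {
    "alice": {"production", "backup"},  # root everywhere: the problem case
    "bob": {"production"},
    "carol": {"backup"},
}

print(sorted(single_points_of_destruction(access)))
```

If the result is non-empty, one compromised or mistaken account can take out both copies.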

(And if the CTO, or whoever is in charge of backups, screws up this question? The 'perfect storm' means "all your data is gone". I dunno about you, but my plan for that involves bankruptcy court and a whole lot of personal shame. Someone coming in and stealing all the hardware? Not nearly as big of a deal, as long as I've still got the data. My own 'backup' house is not in order, well, for lots of reasons, mostly having to do with performance, so I live with this low-level fear every day.)

Seriously, think, for a moment. There's at least one kid with root on production /and/ access to the backups, right? At most small companies, that is all your 'root-level' sysadmins.

That's bad. What if his (or her) account credentials get compromised? (Or what if they go rogue? It happens. Not often, and usually when it does it's "But this is really best for the company." It's pretty rare that a SysAdmin actively and directly attempts to destroy a company.)

(SysAdmins going fully rogue is pretty rare, but I think it's still a good thought experiment. If there is no way for the user to destroy something when they are actively hostile, you /know/ they can't destroy it by accident. It's the only way to be sure.)

The point of backups, primarily, is to cover your ass when someone screws up. (RAID, on the other hand, is primarily to cover your ass when hardware fails.) RAID is not Backup and Backup is not RAID. You need to keep this in mind when designing your backup, and when designing your RAID.

(Yes, backup is also nice when the hardware failure gets so bad that RAID can't save you; but you know what? that's pretty goddamn rare, compared to 'someone fucked up.')

I mean, the worst case backup system would be a system that remotely writes all local data off site, without keeping snapshots or some way of reverting. That's not a backup at all; that's a RAID.
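A toy sketch of the difference, using plain dicts as stand-ins for real storage. The `Mirror` and `SnapshotBackup` classes are hypothetical and exist only to show the behavior described above.

```python
import copy


class Mirror:
    """Remote copy that always matches local state; there is no history."""

    def __init__(self):
        self.remote = {}

    def sync(self, local):
        self.remote = copy.deepcopy(local)


class SnapshotBackup:
    """Remote copy that keeps every synced state, so you can revert."""

    def __init__(self):
        self.snapshots = []

    def sync(self, local):
        self.snapshots.append(copy.deepcopy(local))

    def restore(self, index):
        return copy.deepcopy(self.snapshots[index])


local = {"customers.db": "all the data"}
mirror, backup = Mirror(), SnapshotBackup()
mirror.sync(local)
backup.sync(local)

local.clear()  # someone screws up
mirror.sync(local)
backup.sync(local)

print(mirror.remote)      # the mistake was faithfully replicated
print(backup.restore(0))  # the earlier state is still recoverable
```

The mirror ends up empty, because it replicated the mistake; the snapshot store still has the state from before the mistake.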

The best case backup is some sort of remote backup where you physically can't overwrite the goddamn thing for X days. Traditionally, this is done with off-site tape. You (or rather, your junior sysadmin monkey) write the backup to tape, then test the tape, then give the tape to the Iron Mountain truck to stick in a safe. (If your company has money; if not, the safe is under the owner's bed.)

I think that with modern snapshots, it would be interesting to create a 'cloud backup' service where you have a 'do not allow overwrite before date X' parameter, and it wouldn't be that hard to implement, but I don't know of anyone that does it. The hard part about doing it in house is that the person who manages the backup server couldn't have root on production and vice versa, or you defeat the point, so this is one case where outsourcing is very likely to be better than anything you could do yourself.
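A minimal sketch of what that 'do not allow overwrite before date X' parameter could look like. Everything here is hypothetical (the `RetentionLockedStore` class, the `retain_days` parameter, the in-memory dict); it is not any real service's API, and it only means something if the check runs on a server the production admins can't touch.

```python
from datetime import datetime, timedelta, timezone


class RetentionLockedStore:
    """Toy backup store: no overwrite or delete before an object's lock date."""

    def __init__(self):
        self._objects = {}  # name -> (data, locked_until)

    def _check_unlocked(self, name, now):
        entry = self._objects.get(name)
        if entry is not None and now < entry[1]:
            raise PermissionError(
                f"{name!r} is locked until {entry[1].isoformat()}"
            )

    def put(self, name, data, retain_days, now=None):
        now = now or datetime.now(timezone.utc)
        self._check_unlocked(name, now)
        self._objects[name] = (data, now + timedelta(days=retain_days))

    def delete(self, name, now=None):
        now = now or datetime.now(timezone.utc)
        self._check_unlocked(name, now)
        self._objects.pop(name, None)

    def get(self, name):
        return self._objects[name][0]


store = RetentionLockedStore()
day0 = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary example date
store.put("db-dump", b"good data", retain_days=30, now=day0)

try:
    # Five days later, someone (hostile or clumsy) tries to clobber it.
    store.put("db-dump", b"garbage", retain_days=30, now=day0 + timedelta(days=5))
except PermissionError as err:
    print("refused:", err)
```

The `now` argument is only there so the example is deterministic; a real service would use its own clock and never trust one supplied by the client.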



> If there is no way for the user to destroy something when they are actively hostile, you /know/ they can't destroy it by accident.

Which also means they can't fix something in case of a catastrophic event. "Recover a file deleted from ext3? Fix a borked NTFS partition? Salvage a crashed MySQL table? Sorry boss, no can do - my admin powers have been neutered so that I don't break something 'by accident, wink wink nudge nudge'." This is, ultimately, an issue of trust, not of artificial technical limitations.

> one case where outsourcing is very likely to be better than anything you could do yourself.

Hm. Your idea that "cloud is actually pixie dust magically solving all problems" seems to fail your very own test. Is there a way to prevent the outsourced admins from, um, destroying something when they are actively hostile? Nope, you've only added a layer of indirection.

(also, "rouge" is "#993366", not "sabotage")


>> If there is no way for the user to destroy something when they are actively hostile, you /know/ they can't destroy it by accident.

>Which also means they can't fix something in case of a catastrophic event. "Recover a file deleted from ext3? Fix a borked NTFS partition? Salvage a crashed MySQL table? Sorry boss, no can do - my admin powers have been neutered so that I don't break something 'by accident, wink wink nudge nudge'." This is, ultimately, an issue of trust, not of artificial technical limitations.

All of the problems you describe can be solved by spare hardware and read-only access to the backups. I mean, your SysAdmin needs control over the production environment, right? To do his or her job. But a sysadmin can function just fine without being able to overwrite backups. (Assuming there is someone else around to admin the backup server.)

fixing my spelling now.

Yes, it's about trust. But anyone who demands absolute trust is, well, at the very least an overconfident asshole. I mean, in a properly designed backup system (and I don't have anything at all like this at the moment) I would not have write-access to the backups, and I'm majority shareholder and lead sysadmin.

That's what I'm saying... backups are primarily there when someone screwed it up... in other words, when someone was trusted (or trusted themselves) too much.


Okay, now I think I understand you, and it seems we're actually in agreement - there is still absolute power, but it's not all concentrated in one user :)

(that rouge/rogue thing is my pet peeve)


>Hm. Your idea that "cloud is actually pixie dust magically solving all problems" seems to fail your very own test. Is there a way to prevent the outsourced admins from, um, destroying something when they are actively hostile? Nope, you've only added a layer of indirection.

The idea here is to make sure that the people with write-access to production don't have write-access to the backups and vice versa. The point is that now two people have to screw it up before I lose data.

Outsourcing has its place. You are an idiot if you outsource production and backups to the same people, though. This is why I think "the cloud" is a bad way of thinking about it. Linode and Rackspace are completely different companies... one of them screwing it up is not going to affect the other.


>> I think that with modern snapshots, it would be interesting to create a 'cloud backup' service where you have a 'do not allow overwrite before date X' parameter, and it wouldn't be that hard to implement, but I don't know of anyone that does it.

I test backups for F500 companies on a daily basis (IT Risk Consulting). This would really be missing the point: the business process around this problem is moving towards live mirrored replication. This allows much faster recall time, and also mitigates many risks of the conventional 'snapshot' method, whether through tapes, cloud, etc.


> I think that with modern snapshots, it would be interesting to create a 'cloud backup' service where you have a 'do not allow overwrite before date X' parameter, and it wouldn't be that hard to implement, but I don't know of anyone that does it.

Does Amazon Glacier offer this?


I think since it uses generated IDs for each archive, it's impossible to overwrite anything.
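To illustrate the idea (this is not Glacier's actual API, just a hypothetical `AppendOnlyArchive` with the same property): if the store picks the ID on every upload, a client has no way to name an existing archive as the target of a write.

```python
import uuid


class AppendOnlyArchive:
    """Toy archive store: the server generates the ID, so uploads never collide."""

    def __init__(self):
        self._archives = {}  # archive_id -> data

    def upload(self, data: bytes) -> str:
        archive_id = uuid.uuid4().hex  # caller cannot choose or reuse an ID
        self._archives[archive_id] = data
        return archive_id

    def retrieve(self, archive_id: str) -> bytes:
        return self._archives[archive_id]


archive = AppendOnlyArchive()
first = archive.upload(b"monday backup")
second = archive.upload(b"garbage meant to clobber monday")

print(first != second)
print(archive.retrieve(first))
```

This only covers overwriting; whether someone with credentials can delete an archive is a separate question the sketch doesn't address.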


From a sysadmin:

Redundancy = Reduce the number of component failures that can lead to system failure (RAID, live replication, hot standby).

Backup = Recover from an obvious failure or overwrite (Weekly full backups, daily differentials).

Archival = Recover from a non-obvious failure and/or malicious activity (WORM tapes, offsite backup).

A failsafe against malicious sysadmins is to split up the responsibilities. The guy handling backups isn't handling archival, etc...


I've said it elsewhere, but it bears repeating: RAID is about availability first and foremost. The fact that it happens to preserve your data in the case of one form of hardware failure is a side effect of its primary goal.



