
Not many companies explicitly prepare for the scenario where every single data storage unit in the company is effectively wiped and you have to redeploy from zero.

If you never bootstrap from zero (or simulate it), then your systems probably have cycles in their deployment dependencies. Your config pusher is deployed from Jenkins/Puppet/Ansible, but 2 years ago someone made Jenkins dependent on the config pusher for its own config. Now you cannot just deploy these systems in order; you have to replay the history from before that change.
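For illustration, here is a minimal sketch of how such a cycle could be caught mechanically, assuming you keep a map of "what must exist before this can be deployed". The system names and the `bootstrap_order` helper are hypothetical; `graphlib` is in the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical dependency map: each system lists what must already be
# running before it can be deployed.
DEPLOY_DEPS = {
    "jenkins": {"config_pusher"},   # the change made "2 years ago"
    "config_pusher": {"jenkins"},
    "sso": {"network"},
    "network": set(),
}


def bootstrap_order(deps):
    """Return (order, cycle); exactly one of the two is None."""
    try:
        # static_order() yields dependencies before their dependents.
        return list(TopologicalSorter(deps).static_order()), None
    except CycleError as err:
        # args[1] is the list of nodes forming the cycle; the first and
        # last entries are the same node.
        return None, err.args[1]


order, cycle = bootstrap_order(DEPLOY_DEPS)
if cycle is not None:
    print("cannot bootstrap from zero, cycle:", " -> ".join(cycle))
else:
    print("bootstrap order:", order)
```

Running something like this in CI whenever the dependency map changes would flag the Jenkins/config pusher loop on the day it is introduced, not during a recovery. It only helps if the map is kept honest, which is the hard part.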



Almost everything will have cycles in IT. People want and security requires some kind of SSO. Now SSO is a dependency for almost everything, including the administration of underlying systems that run SSO. Same for the network. Same for a lot of things.

Bootstrapping from zero will never be easy and will always take some time. I don't think you can prepare your way out of this, short of preparing a fully redundant, fully separate secondary infrastructure.


This is called a "break-glass procedure" in enterprise IT (as in "break glass in case of emergency"). It often consists of independent, normally unused admin accounts on key systems, with the access info locked away somewhere safe, e.g. a physical safe in a secure location.

Testing this reliably is difficult, though, and often these procedures and their documentation are outdated.


I agree that fully redundant & separate infrastructure is unrealistic. I'm also not saying you can be 100% prepared. My point is that you can improve your posture.

What you can do is to have a sandbox environment where you periodically do a full setup exercise from a prepper disk. Conceptually it's not that different from testing backup recovery (ok, most companies neglect this too, so maybe you have a point :) ).


Problem is, the value of proper recovery procedures and testing those in all aspects only becomes apparent to the bean-counters when things really break. But until they have been in that situation where nothing works for a month, it will always be too expensive, too cumbersome and too resource-hungry to do those preparations.

Which gives me an idea for an "Ask HN"... Edit: submitted https://news.ycombinator.com/item?id=44582994


Reminds me of compiler bootstrapping. If you only have the source code of, let's say, Rust and want to build it, you first need a Rust compiler of a previous version. And for that you need another compiler. Eventually you reach the time when rustc was written in OCaml. And you know what? Now you need to build an OCaml compiler!

The other way is to build a stripped-down version of rustc capable only of compiling the latest rustc, e.g. using https://github.com/dtolnay/bootstrap

Great post on bootstrapping and its problems: https://bootstrappable.org/

An actual project capable of bootstrapping a Linux system from scratch: https://github.com/fosslinux/live-bootstrap


This reminds me of troubles in a parallel universe.

The construction industry has products with typical lifetimes of 50+ years, in some cases several hundred. Computing and digitalization have been hot topics for the past several decades under various buzzwords ('digital twins' is probably the newest one). However, when I am unable to open construction design files made at the beginning of my career less than 30 years ago, due to obsolescence for various reasons, all those efforts seem to be for nothing beyond immediate needs. Good old outdated 2D drawings, seen as an unfeasible practice, might save the day in the future (... perhaps, assuming that current PDF files can still be opened some decades down the line, as that is the common 'digital paper' approach nowadays; actual physical paper is used less and less).


That happened to a company I am familiar with a year ago. The main storage cluster that everything depended on died. They recovered by deploying everything again from dev laptops.


Black start is a hard problem. Even Facebook apparently had to drill datacenter door locks open to get back up one time.


So how could a company handle this? Can they bootstrap from printed documentation or is that assumed to be wiped as well?


It's a model of a realistic scenario: hackers (like in the article), long-running ransomware that managed to corrupt lots of data, maybe a natural disaster. So by "wiping all data storage units" I meant the dynamic ones used in production. You can assume a static backup exists and contains a sensible set of sources and binaries, although obviously creating such a backup is part of the recovery plan.



