> this is also why I never like modifying a working system unless it's absolutely necessary.
this is safe, but it paints you into a corner over time, where you become paralyzed and can't improve anything. It needs better testing, so that changes are safe.
Exactly. It's analogous to the difference between big bang integration and continuous integration. The lesson there is: if something hurts, do it more often. Little steps let you know exactly what changed when something breaks.
A culture of many small changes means that you deal with smaller problems relatively quickly. The more you fall behind, the bigger the jump to where you should be, and it's not a linear relationship.
At one place where I work, we're on Node.js 0.10, which is several release versions behind. It's causing us a bunch of problems: while 0.10 technically isn't EOL'd yet, npm modules behave as if it is. But we've left it so long that the jump to current stable is a giant task, which we don't have the time for given other business requirements.
Tests indeed don't guarantee safety, but lots of small changes are easier to deal with than the occasional massive change. It's also the basic concept behind version control.
This is my experience as well with Node and shrinkwrap. I see people using shrinkwrap to avoid potential issues, but what ends up happening is that they get stuck on old versions of dependencies, and when there's a bug fix or new feature they need, it can be very difficult to upgrade. Instead, I prefer to always keep my dependencies up to date, especially across new major versions, to avoid exactly this problem.
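For anyone who hasn't used it: `npm shrinkwrap` pins the exact installed dependency tree into an `npm-shrinkwrap.json` file, which `npm install` then honors. A rough sketch of the cadence I mean (these are all standard npm subcommands; the particular rhythm is just my habit, not anything official):

```sh
# Pin the exact dependency tree that's known to work
# (written to npm-shrinkwrap.json, honored by `npm install`):
npm shrinkwrap

# Periodically check how far the pinned versions have
# drifted from what's published:
npm outdated

# Pull in updates (npm update stays within the semver ranges
# declared in package.json), run the tests, then re-pin:
npm update && npm test
npm shrinkwrap
```

The point isn't to avoid the shrinkwrap file; it's to regenerate it often, so each diff stays small and any breakage points at a handful of packages instead of a year's worth.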
Do you think that, in the case of the problem from this article, the Perl devs should have had a test checking that their update doesn't break someone's Emacs when it's used in client-server mode, with one side launched via a Perl script and the other by some other means, on a Linux system with the "capabilities" feature?
This story wasn't about trivial day-to-day developer bugs, but about the kind of problems that happen in really complex systems.
> this is safe, but it paints you into a corner over time, where you become paralyzed and can't improve anything.
As someone who went through a painful, long-delayed upgrade not too long ago, I definitely second this, although as a more generalized principle I think it'd be more accurate to say that there's a fine, eternal balancing act between "work" and "meta-work", and that this principle applies to far more areas of life than systems work. However much fun (or "fun") it may be, as mjd said there, most of us primarily have work to do using our tools ("tools" in the most generic sense here, including knowledge) rather than working on our tools. To some extent, a few days spent on tools/skills is a few days not spent applying them, and it's all too easy to sink so much time going down various rabbit holes that "actual work" loses out.

But of course, on the flip side, improving our tools and skill sets is key to realizing major boosts in long-term productivity, keeping up with changing standards, and so on. I remember a few years back at one workplace when a number of senior engineers (50s/60s) all finally bit the bullet and started working to get up to speed on the latest CAD developments. Or myself a decade back, when I decided I really needed to update my shell usage, read the full ZSH manual, and spent some time seeing how I could improve my speed in general. There were many significant projects going on, but then there always were, always something that "needs to be done next week!".

I personally find it can be a tough balancing act to weigh the savings from increased productivity down the road against the time needed to begin realizing them in the first place, particularly if "everything is working fine". I know that over the years I cumulatively lost plenty of time on manual involvement in tasks I could have automated, but each individual instance seemed trivial, and it was easy to default to just hacking something quick and getting on with the day rather than deciding it'd be worth spending the time to improve it for good.
Of course, that's all assuming there aren't any other barriers in the way. My extremely oddball pain point on one workstation was that I'd enthusiastically built a tower Mac Pro OS X system around ZEVO, a short-lived attempt to salvage Apple's old ZFS work and bring a fully functioning version to OS X. Despite a few niggles (some of which didn't matter to me, like being CLI-only), by the time it was getting ready to go it was fantastic, nicely integrated and all that. I was pumped; it was exactly what I'd wanted under OS X ever since I'd seen Sun's original presentation, and I hopped fully onboard.

But of course the company developing it promptly went under just as they were launching, was bought for its IP/people by GreenBytes (which was itself subsequently acquired by Oracle), and after a single bugfix release that was it. It only worked under 10.8 and not one version later, and there was no clear upgrade path (I really didn't want to revert that system back to pure HFS). So 10.8 was where I stayed until OpenZFS, and in turn O3X, came along to save the day, but by that point I was out of the habit of frequent upgrades there. Testing is definitely helpful (along with a nice rollback system) but sadly can't always save you; frequent upgrades definitely help keep key meta-knowledge fresh.
This was a really cool bug track-down article though, and inspiring.