> by filtering any "books" (rather, files) that are larger than 30 MiB we can reduce the total size of the collection from 51.50 TB to 18.91 TB, shaving a whopping 32.59 TB
Books greater than 30 MiB are all the textbooks.
You are killing the knowledge.
Also killing a lot of rare things.
If you want to do something amazing and small, OCR them.
As an example of a file greater than 30 MB: the other day I grabbed a Greg Bear short story that isn't available digitally; it was in a 90 MB copy of a 1983 issue of Analog Science Fiction and Fact.
Side note: de-duping is an incredibly hard project. How will you diff a mobi and an epub and then make a decision? Or decide between a mobi and a mobi?
Books also change with time. Even in the 90's, kids' books from the 60's had been 'edited'. These can be hidden gems to collectors. Cover art, too.
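To make the de-duping difficulty concrete: one plausible approach (not anything the archive actually does) is to compare books by their *extracted text* rather than by file bytes, since a mobi and an epub of the same book will never match byte-for-byte. The sketch below assumes the format-specific extraction (mobi/epub to plain text) has already happened elsewhere; the `normalize` and `likely_duplicates` names and the 0.95 threshold are purely illustrative.

```python
# Content-based duplicate heuristic: normalize extracted text and
# compare with a fuzzy similarity ratio. Format parsing is out of
# scope; inputs here are already plain text.
import difflib
import re

def normalize(text: str) -> str:
    # Collapse case, punctuation, and whitespace differences that
    # format conversion typically introduces, keeping only the words.
    return re.sub(r"\W+", " ", text.lower()).strip()

def likely_duplicates(a: str, b: str, threshold: float = 0.95) -> bool:
    # difflib.SequenceMatcher ratio is 1.0 for identical strings.
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold

# Same story, different conversion artifacts:
epub_text = "Chapter 1. It was a dark and stormy night..."
mobi_text = "CHAPTER 1\nIt was a dark and stormy night"
print(likely_duplicates(epub_text, mobi_text))  # True
```

Even this only flags *candidates*; it says nothing about which copy to keep, and it would happily merge a revised edition with the original, which is exactly the "edited over time" problem raised above.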