
No, too simple.

Sometimes you want to include data and sometimes you don't, for different reasons in different contexts. It's not a data handler's job to decide what data is or isn't included; it's the sender's job to decide what not to include and the receiver's job to decide what to ignore.

The simplest example is probably just the file path. tar or zip don't try to say whether or not a file in the container includes the full absolute path, a portion of the path, or no path.

The container should ideally be able to contain anything that any filesystem might have, or else it's not a generally useful tool, it's some annoying domain-specific specialized tool that one guy just luuuuuvs for his one use-case he thinks is obviously the most rational thing for anyone.

If you don't want to include something like a uid, say for security reasons not to disclose the internal workings of something on your end, then arrange not to include it when creating the archive, the same way you wouldn't necessarily include the full path to the same files. Or outside of a security concern like that, include all the data and let the recipient simply ignore any data that it doesn't support.
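As a rough sketch of "arrange not to include it when creating the archive", here is how a sender could do it with Python's standard tarfile module. The names `scrub` and `pack` are made up for illustration. Note that tar's header has no way to leave these fields out, so the sketch only resets them to neutral values; the stored path is whatever the sender passes as `arcname`.

```python
import tarfile

def scrub(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Reset ownership details the sender does not want to disclose."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info

def pack(paths, out):
    """Write (source_path, stored_path) pairs into a tar stream.

    The sender picks stored_path, so it decides how much of the
    original path (if any) ends up in the archive.
    """
    with tarfile.open(fileobj=out, mode="w") as tar:
        for source_path, stored_path in paths:
            tar.add(source_path, arcname=stored_path, filter=scrub)
```

The receiver still sees every other field (mode, mtime and so on) and can decide for itself what to ignore.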



Good argument, I've mostly come around to your view. The little "but" that I still see is that the current file formats don't let you omit fields you don't want to pass on, and most decoders don't let you omit fields you don't want to interpret/use while unpacking.

Even if a given decoder could, though, most users wouldn't know how to use that option, so if I zero out a timestamp I don't want to pass on, they'd get files dated 1970 or 1980. It would be better if the field could be omitted entirely (as in, if the header weren't fixed-length but extensible, like an IP packet). So I'd still like a "better" archiving format than the ones we have today (though I'm not familiar with the internals of every one of them, like 7z or the mentioned squashfs, so tell me if this already exists), but I agree such a format should just support everything ~every filesystem supports.


Oh sure, I was talking in generalities and an imaginary archiver, what should an archiver have, not any particular existing actual one.

OS and filesystem features differ all over the place, and there will be totally new filesystems and totally new metadata tomorrow. There is practically no common denominator, not even basic ASCII for the filename, let alone any other metadata.

So there should just be metadata fields where about the only thing actually part of the spec is the structure of a metadata field, not any particular keys or values or number or order of fields. The writer might or might not even include a field for, say, creation time, and the reader might or might not care about that. If the reader doesn't recognize some strange new xattr field that only got invented yesterday, no problem, because it does know what a field is, and how to consume and discard fields it doesn't care about.
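To make that concrete, here is a minimal sketch of such a field structure in Python. Everything in it is an assumption for illustration, not any existing format: the layout (a 2-byte key length and a 4-byte value length, then the key and value bytes) and the names `encode_fields` and `decode_fields` are invented. The only thing the reader has to know is how long a field is, which is enough to skip the ones it doesn't recognize.

```python
import struct

# Hypothetical field layout: key length (u16) and value length (u32),
# big-endian, followed by the key bytes (UTF-8) and the value bytes.
HEADER = struct.Struct(">HI")

def encode_fields(fields: dict[str, bytes]) -> bytes:
    """Serialize whatever fields the writer chose to include."""
    out = bytearray()
    for key, value in fields.items():
        key_bytes = key.encode("utf-8")
        out += HEADER.pack(len(key_bytes), len(value)) + key_bytes + value
    return bytes(out)

def decode_fields(blob: bytes, wanted: set[str]) -> dict[str, bytes]:
    """Keep the fields the reader cares about and skip the rest."""
    found = {}
    pos = 0
    while pos < len(blob):
        key_len, value_len = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        key = blob[pos:pos + key_len].decode("utf-8")
        pos += key_len
        if key in wanted:
            found[key] = blob[pos:pos + value_len]
        pos += value_len  # advance past the value whether or not it was kept
    return found
```

A writer could then emit `encode_fields({"name": b"your.conf", "user.newattr": b"\x01\x02"})`, and a reader calling `decode_fields(blob, {"name", "mtime"})` would get back only `name`: the unknown key is skipped and the absent `mtime` is simply not there.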

There would be a few fields that most readers and writers would all just recognize by convention, the usual basics like filename. Even the filename might not technically be a requirement, but maybe an RFC documents a short list of standard fields just to give everyone a reference point. But for instance it might be legal to have nothing but some UUIDs or something.

That's getting a bit weird but my main point was just that it's wrong to say an archiver shouldn't include timestamps or uids just because one use of archive files is to transfer files from a unix system to a windows system, or from a system with a "bob" user to a system with no "bob" user.


For tar, the relevant options are --preserve-permissions and --touch (don't extract file modified time).

For unzip, -D skips restoration of timestamps.

For unrar, -ai ignores file attributes, -ts restores the modification time.

There are similar arguments for omitting these when creating the archive; they set the field to a default or specified value, or omit it entirely, depending on the format.


Those are user controls, to allow the user on one end to decide what to put into the container, and there are others to allow the user at the other side to decide what to take out of the container, not limits of the container.

The comment I'm replying to suggested that since one use case results in metadata that is meaningless or mis-matched between sender and receiver, the format itself should not even have the ability to record that metadata.


Is "absolute path" a coherent concept when you are talking about 2 systems?


Is this question a coherent concept when it doesn't change anything when you substitute any other term like "full path" or "as much path as exists" or "any path"?


D:\etc\your.conf would like a word, they seem lost and confused.


It can be if you make assumptions about the basic structure of both systems. Some people rely on this behavior. It can be a good idea or a bad idea, depending on what you're doing.



