When drives are never replaced, yes. In the case where the chassis lasts longer than the drives (which I'd imagine is often the case), the extra cost in drives adds up.
The idea with having eBPF in the kernel is that we can limit the amount of trust given to a particular user-space task.
Accepting compiled stuff in the form of a kernel module requires root privileges and requires that the kernel essentially have complete trust in the code being loaded.
Loading eBPF eliminates the need to trust the process/user doing the loading to that level.
The BPF syscalls don't require CAP_SYS_ADMIN; only specific invocations do. You can set up a socket filter without CAP_SYS_ADMIN, and a device or XDP filter with just CAP_NET_ADMIN.
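To make the unprivileged case concrete, here's a minimal sketch (mine, not from the thread) of loading a trivial eBPF socket filter and attaching it to a socket without CAP_SYS_ADMIN. Whether this actually works unprivileged on a given machine depends on the kernel.unprivileged_bpf_disabled sysctl on modern kernels:

```c
/* Minimal sketch (not from the thread): load a trivial eBPF socket
 * filter and attach it without CAP_SYS_ADMIN. On modern kernels this
 * is gated by the kernel.unprivileged_bpf_disabled sysctl. The
 * program just returns -1, i.e. "accept the whole packet". */
#include <linux/bpf.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SO_ATTACH_BPF
#define SO_ATTACH_BPF 50    /* in case the libc headers predate it */
#endif

int main(void)
{
    /* r0 = -1; exit  -- accept every packet */
    struct bpf_insn insns[] = {
        { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = -1 },
        { .code = BPF_JMP | BPF_EXIT },
    };
    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr.insn_cnt  = sizeof(insns) / sizeof(insns[0]);
    attr.insns     = (__u64)(unsigned long)insns;
    attr.license   = (__u64)(unsigned long)"GPL";

    int prog = syscall(SYS_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    if (prog < 0) { perror("BPF_PROG_LOAD"); return 1; }

    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog, sizeof(prog)) < 0) {
        perror("SO_ATTACH_BPF");
        return 1;
    }
    puts("socket filter attached without CAP_SYS_ADMIN");
    return 0;
}
```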
Sure, but how common is that case? How common are multi-tenant Linux systems with untrusted users that are given those specific permissions? Do you want untrusted users sniffing other users' packets?
`alloca` is a simple adjustment of the stack pointer, so a single instruction, presuming it isn't folded into the normal bump of the stack pointer that allocates the fixed-size local variables. There isn't really much cost to doing a dynamic stack allocation rather than a fixed one. Variable-length arrays (VLAs) allow the same thing but can be slightly more portable.
Normal C caveats do apply here though: alloca is not part of standard C (nor, in fact, of POSIX; it's an extension that is widely implemented). VLAs are mandatory in C99 and an optional feature from C11 onward. Neither is required to actually use the stack for storage.
Not sure if there are any platforms supported by curl which would prevent its use of VLAs or alloca.
tl;dr - alloca costs, history, and why it is problematic
Alloca is somewhat more expensive on x86/x64 than a single instruction.
[0] shows the code generation for four functions that generate and sum an iota array. I used -O1 to make the differences more apparent.
iota_sum_alloca and iota_sum_vla generate similar code. They both require a frame pointer (RBP) and code to preserve the 16-byte alignment of the stack frame.
iota_sum_const_alloca and iota_sum_array generate identical code. Clang recognizes that alloca is invoked with a constant argument.
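Since [0] is just a link, here's a rough reconstruction of what the four functions could look like (my sketch; the actual code behind [0] may differ). Each fills an array with 0..n-1 and sums it:

```c
/* My reconstruction of the four functions discussed above; the
 * actual code behind [0] may differ. Compiled with -O1 as noted. */
#include <alloca.h>
#include <stddef.h>

int iota_sum_alloca(size_t n)
{
    int *a = alloca(n * sizeof(int));   /* runtime-sized alloca */
    int sum = 0;
    for (size_t i = 0; i < n; i++) a[i] = (int)i;
    for (size_t i = 0; i < n; i++) sum += a[i];
    return sum;
}

int iota_sum_vla(size_t n)
{
    int a[n];                           /* variable-length array */
    int sum = 0;
    for (size_t i = 0; i < n; i++) a[i] = (int)i;
    for (size_t i = 0; i < n; i++) sum += a[i];
    return sum;
}

enum { N = 64 };                        /* arbitrary constant size */

int iota_sum_const_alloca(void)
{
    int *a = alloca(N * sizeof(int));   /* constant-sized alloca */
    int sum = 0;
    for (size_t i = 0; i < N; i++) a[i] = (int)i;
    for (size_t i = 0; i < N; i++) sum += a[i];
    return sum;
}

int iota_sum_array(void)
{
    int a[N];                           /* plain fixed-size array */
    int sum = 0;
    for (size_t i = 0; i < N; i++) a[i] = (int)i;
    for (size_t i = 0; i < N; i++) sum += a[i];
    return sum;
}
```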
History of Alloca
Alloca was originally written for Unix V7 [1]. Doug Gwyn wrote a public domain implementation [2] in the early 80s for porting existing programs. The FSF used Gwyn's alloca implementation in GDB, Emacs, and other programs, which helped to spread the idea.
Problems of Alloca
[3] is a comp.compilers thread that discusses some of the issues with alloca. Linus does not want either VLAs or alloca in the Linux kernel [4].
For those actually curious about the implementation on Solaris/illumos, here's a quick rundown (from looking at current illumos source):
- comm_page (usr/src/uts/i86pc/ml/comm_page.s) is literally a page in kernel memory with specific variables that is mapped (usr/src/uts/intel/ia32/os/comm_page_util.c) as user|read-only when passed to userspace (the kernel's own mapping is normal data, AFAICT)
- the mapped comm_page is inserted into the aux vector at AT_SUN_COMMPAGE (usr/src/uts/common/exec/elf/elf.c)
- libc scans the auxv for this entry, and stashes the pointer it contains (usr/src/lib/libc/port/threads/thr.c)
- When clock_gettime is called, it looks at the values in the COMMPAGE (structure is in usr/src/uts/i86pc/sys/comm_page.h, probing in usr/src/lib/commpage/common/cp_main.c) to determine if TSC can be used.
- If TSC is usable, libc uses the information there (a bunch of values) to derive the time from TSC (monotonic or realtime)
Within the kernel, the comm_page variables are treated like normal variables and are used/updated by the kernel's internal timekeeping.
Essentially, rather than having the kernel provide the entry point (and thus, as in the Linux vDSO case, keep knowledge of the internal data structures in kernel-provided code), here libc provides the code and reads the exported data structure from the kernel.
So it isn't reading the time from this memory page, it's using TSC. In the case of CLOCK_REALTIME, corrections that are applied to TSC are read from this memory page (comm_page).
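For illustration, a sketch of the general pattern only (field names are invented; this is not the actual illumos code): userspace reads the TSC itself and applies correction values the kernel keeps updated in the shared page:

```c
/* Sketch of the pattern only; the struct layout and field names are
 * invented for illustration. The kernel keeps the scale and offset
 * fields up to date in a read-only shared page; libc reads the TSC
 * itself and applies those corrections. */
#include <stdint.h>

struct fake_comm_page {                /* hypothetical layout */
    volatile uint64_t tsc_scale;       /* ns per tick, 32.32 fixed point */
    volatile uint64_t tsc_offset_ns;   /* correction added after scaling */
};

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Time in nanoseconds, computed entirely in userspace: no syscall;
 * the shared page is only read for the correction values.
 * (__uint128_t is a gcc/clang extension, used to avoid overflow.) */
static uint64_t fast_gettime_ns(const struct fake_comm_page *cp)
{
    uint64_t ticks = rdtsc();
    return (uint64_t)(((__uint128_t)ticks * cp->tsc_scale) >> 32)
           + cp->tsc_offset_ns;
}
```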
This summary only applies to Illumos. The Solaris implementation diverged significantly around build 167 (2011), long after the last OpenSolaris build Illumos was based on (build 147), and changed significantly again in 2015.
I believe Circonus contributed an alternate implementation that does some of the same things as Solaris in 2016:
With that said, you are correct that whether or not it will read from a memory page instead depends on which interfaces you are using (e.g. get_hrusec()) and other subtle details.
So the only things I'm seeing in the linked circonus code that differ from illumos:
1. no use of a kernel-supplied page; it determines skew/etc. itself in userspace
2. stores information at a per-CPU level, and tries to execute cpuid on the same CPU as rdtsc.
I'm presuming you're talking about #2 (and #1 is just due to the linked item being a library without kernel integration)? Perhaps with some more kernel support, so that the actual CPU rdtsc ran on can be reliably determined?
This still doesn't clarify the part about a "shared page in which the time is updated" and read from. That statement appears to imply TSC is not (necessarily) used (otherwise I'd categorize it under "uses values from the memory page to fix up TSC", like Illumos' current implementation). I'm still not sure how that can be done reasonably.
Is there just a 1-microsecond timer running whenever a user task is executing, bumping the value? Wouldn't that be quite a bit of overhead? Or some HW trick? I mean, you could generate a fault on every read and have the kernel populate the current data, but that seems just as bad as a syscall.
I end up using it when fixing up local changes which had broken due to upstream modifications.
It's generally not seen as very good to send things upstream that conflict with existing changes, and including merges (from master, primarily) in upstream submissions is frowned upon.
When I was using mercurial, I ended up using the mq extension [1] to get a similar workflow. I actually prefer mq's workflow to rebasing in git (it simplifies some things when maintaining a set of changes), but the equivalent programs for git are lacking (I've tried guilt and quilt).
The intent of TFA is clearer, but it's still false. AirBnB and Uber didn't get some legal limitation of liability; they just started doing something new and asserted that they weren't liable. Turns out some jurisdictions agree, and others don't. But nobody thought that Common Carrier or the CDA exemptions applied to cars or apartments.
distcc is not a cache. It doesn't keep the output of the compiler around after it builds things. It only distributes compiler invocations across machines (handling some details of preprocessing in certain modes, to make that work) and then returns the output to the requester of the compilation task.
At that point, the requester could cache that output, should they want to.
There is a per-user limit on the number of inotify watches available (max_user_watches), and the default value is 8192.
The limit exists because there is a ~1KiB kernel memory overhead per watch (though there should really be a way for them to take part in normal per-process memory accounting).
If one wants to watch a directory tree, one needs an inotify watch handle per subdirectory in that tree.
On large trees (or if more than one process is using inotify), that limit can be exceeded.
Since lots of folks are looking for recursive watches, they aren't happy with needing to allocate & manage a bunch of handles for what they see as a single item.
That said, I'm not sure the way the kernel thinks about fs notifications internally would allow a single-handle recursive watch at the moment.
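A minimal sketch of the resulting per-subdirectory bookkeeping (assuming nftw() is available; error handling trimmed):

```c
/* Minimal sketch of the "one watch per subdirectory" pattern;
 * error handling trimmed. nftw() walks the tree and we add an
 * inotify watch for every directory it reports. */
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/stat.h>

static int ifd;

static int add_watch(const char *path, const struct stat *sb,
                     int type, struct FTW *ftwbuf)
{
    (void)sb; (void)ftwbuf;
    /* Each directory consumes one watch against max_user_watches. */
    if (type == FTW_D &&
        inotify_add_watch(ifd, path, IN_CREATE | IN_MODIFY | IN_DELETE) < 0)
        perror(path);      /* ENOSPC here means the limit was hit */
    return 0;              /* keep walking */
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s dir\n", argv[0]); return 1; }
    ifd = inotify_init1(0);
    if (ifd < 0) { perror("inotify_init1"); return 1; }
    return nftw(argv[1], add_watch, 32, FTW_PHYS);
}
```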
In any case, the amount of info one can obtain by using fuse (or any fs like nfs or 9p) to intercept filesystem accesses is a bit larger. At the very least, one can (in most cases) directly observe the ranges of the file that were modified (though that's not quite so important for tup, afaik). There also aren't any queue overruns (which can happen in inotify) because one will just slow the filesystem operations down instead (whether this is desirable or not depends on the application).
regarding "out of tree": I'm not quite sure about your explanation here (just looks like a list of source files), but presuming you mean "creates output files in a seperate directory from source", it doesn't really have complete support for that. You can use "variants" to place output files in a subdirectory of the source tree, though.
> "some way for tup to manage discovering the files to build"
Well, no. It's not a "convention" build tool like Rust's `cargo`, where you just place things in the default locations and it figures it out.
You can use the `run ./script args` mechanism in tup to run your own script that emits tup rules, though.
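As a hedged illustration (the rule generator can be any executable; these scripts are more commonly shell than C), here's a toy generator whose stdout tup parses as rules. With a Tupfile containing a line like `run ./gen-rules`, tup executes the program and treats each printed `:`-rule as if it were written in the Tupfile:

```c
/* Toy rule generator for tup's `run` directive (a sketch; such
 * scripts are more commonly shell). Invoked from a Tupfile as:
 *
 *     run ./gen-rules
 *
 * tup executes the program and parses its stdout as rules. */
#include <stdio.h>

int main(void)
{
    /* One :-rule per source file we want compiled; %f/%o/%B are
     * tup's input/output/basename placeholders. */
    const char *srcs[] = { "foo.c", "bar.c" };
    for (unsigned i = 0; i < sizeof(srcs) / sizeof(srcs[0]); i++)
        printf(": %s |> gcc -c %%f -o %%o |> %%B.o\n", srcs[i]);
    return 0;
}
```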