This is the very first time I have heard about fork-without-exec as dangerous, but it is not the first time I have heard about having to be careful about libraries and hidden threads.
The thing is that any thread you launch is a big piece of machinery and your attempt to hide it is inevitably going to be a leaky abstraction.
> The thing is that any thread you launch is a big piece of machinery and your attempt to hide it is inevitably going to be a leaky abstraction.
That's not true at all. Threads aren't that complicated, and there's lots of ways of using them that aren't complicated and don't leak anything. As a trivial example, if I want to process a bunch of data, it might be faster to process it in parallel, and I can use a thread to do that without causing any observable difference in behavior (beyond potentially turning a single-threaded program into a multithreaded one). And if I'm on a platform like OS X or iOS, I can use higher-level threading APIs like libdispatch that make it even simpler.
This is exactly the kind of reasoning that makes library writers stumble into using abstractions that turn out leaky. The whole point of the OP is to show one (of many) mechanisms through which it does cause an observable change in behaviour.
And your example of a big calculation in a library is particularly bad, because the user can threadify that herself if she wants to.
Now it's true those five lines would become a single function call if your library forced the user to use a particular dispatching library. But I think that would be poor separation of concerns.
It's leaky in one extremely minor way that affects almost no programs whatsoever. It's extremely rare for anyone to do fork-without-exec, and anyone doing that already has a large number of restrictions they must be aware of.
And no, the user can't threadify that themselves. Your sample code is putting the library call on a thread. That's not at all what I was talking about. I was talking about a library using multiple threads (or, more likely, some abstraction over a thread pool that handles scheduling chunks of work on threads for you, such as libdispatch on Apple platforms) in order to process a bunch of data in parallel, data that only the library knows about. The user makes a single call into the library, the library processes a bunch of data in parallel, and then returns to the user. From the user's perspective it's a synchronous call, but it runs in a fraction of the time it would if the library wasn't allowed to use threading internally.
You can have your cake and eat it too -- there's no throwing away of performance gains. Libraries can use threads, but not directly or "silently". They should take threads as parameters. Then the application knows when they're being used and the application programmer can reason about it.
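One way to read "take threads as parameters", as a minimal Python sketch: the library function below never creates threads itself and instead accepts any `concurrent.futures.Executor` from the caller. The names `checksum_chunks` and `_checksum` are hypothetical, made up for illustration only.

```python
import zlib
from concurrent.futures import Executor, ThreadPoolExecutor


def _checksum(chunk: bytes) -> int:
    """Pure, single-threaded unit of work (hypothetical library internals)."""
    return zlib.crc32(chunk)


def checksum_chunks(chunks: list[bytes], executor: Executor) -> list[int]:
    """Library entry point: all parallelism comes from the caller's executor."""
    return list(executor.map(_checksum, chunks))


# Application side: one pool, owned by the application, that can be handed
# to every library that accepts an executor.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = checksum_chunks([b"alpha", b"beta", b"gamma"], pool)
```

From the caller's point of view this is still one synchronous call, but the application decides how many threads exist and who owns them.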
compilers and runtimes can spawn threads implicitly when required (think of openmp, cilk+, or even autopar). Are you saying that libraries shouldn't use these facilities?
If you're spawning threads implicitly then you're no longer a library, you're a framework. Sometimes a framework is a good fit, but frameworks don't compose (if you try to use two frameworks in the same application you're gonna have a bad time) so there is a cost. But in any case, better a framework that's explicitly a framework than a "library" which can't actually be used freely in general-purpose code.
No. His definition is correct. If you make the call, it's a library, that it might use threads during that call does not make it a framework. Libraries don't mean you're in full control of everything, they only mean you invoke all interactions and get results back rather than you subclass something and plug a part into a framework. If you have to subclass something, it's a framework.
I've used OpenMP a little bit, and I would probably avoid it for libraries.
I might use it to start out an application, since it requires very little code for dumb parallelism. But if I were making a real library, I would try to abstract the concurrency policy away so the user can control it.
This should be trivial by layering the library. Just provide a bunch of single-threaded functions, and then some OpenMP wrappers. Then the user can call the single-threaded functions if she wants to do something different.
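A rough sketch of that layering, written in Python with a thread pool standing in for OpenMP (that substitution is mine, not the commenter's). All function names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize_row(row: list[float]) -> list[float]:
    """Core layer: plain single-threaded function, no concurrency policy."""
    total = sum(row)
    return [value / total for value in row]


def normalize_rows(rows: list[list[float]]) -> list[list[float]]:
    """Core layer: sequential driver built only on normalize_row."""
    return [normalize_row(row) for row in rows]


def normalize_rows_parallel(
    rows: list[list[float]], max_workers: int = 4
) -> list[list[float]]:
    """Optional wrapper layer: picks a policy (its own short-lived pool)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(normalize_row, rows))
```

A user who wants a different threading model ignores the wrapper and calls `normalize_row` from whatever scheduler the application already has.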
What I don't want is OpenMP baked deeply into the program logic. I think it shares the problem I was talking about. If you have 10 different libraries using OpenMP in an app, and the app is ITSELF multi-threaded, it seems like you have a mess of threads that you have no control over.
What is dangerous is that the interactions between threads and processes aren't part of the POSIX standard, so their behavior is OS-specific across UNIX implementations.
Hence why fork() is just kind-of implemented on their POSIX compatibility layers.
In UNIX it is only an issue, because IEEE doesn't want to set in stone in POSIX how threads, signals and processes should interact, due to the way threads were added to UNIX.
> A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources.
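The quoted behavior can be observed from Python on a Unix-like system. This is only a sketch: `threads_seen_by_child` is a made-up helper, and `threading.active_count()` reports Python's own bookkeeping of threads, which is reset in the child to mirror the fact that only the forking thread is replicated.

```python
import os
import threading
import warnings


def threads_seen_by_child() -> tuple[int, int]:
    """Return (thread count in parent, thread count in the forked child)."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait)  # idle second thread
    worker.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Newer Python versions warn when fork() is called with threads alive.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists here.
        try:
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)

    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_count = int(reader.read())
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return parent_count, child_count
```

The worker thread simply does not exist in the child, so anything it was holding (a lock, a half-updated structure) stays in whatever state it was in at the moment of the fork.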
I would advise people that lack experience writing portable UNIX code to read "Advanced Programming in the UNIX Environment" by W. Richard Stevens and Stephen A. Rago.
Just to get a glimpse of what POSIX says and what each UNIX actually does.
A thread belongs to a process so, for example, whatever the "second" thread does is the same as what a process does.
POSIX defines a few mechanisms for IPC.
What is optional and what is mandatory in pthreads doesn't concern me as I don't even like pthreads. (will still read what you said, later)
CSP-style concurrency over pipes (be they named or not), shared memory and synchronization over it: POSIX defines mechanisms that work as expected across UNIX platforms.
I wonder how many (standards compliant) programs work differently between linux, *bsd, solaris, etc.
>Wrong, you can fork() and only the thread issuing the fork () call survives the fork(), for example.
Thinking about it, it makes perfect sense.
Threads are processes that share the same memory space. When a thread is created (via pthread_whatever() or clone()) it starts executing some code that is practically in a $RANDOM place. So how would one go about copying the state of a process that has threads ? Freeze all the threads and copy everything ? That way the thread, that is probably doing some job at the moment, would end up being cloned without it knowing. So you would get two processes (threads are processes) that are doing exactly the same thing. Until something external changes their (internal) state, that is.
Non-determinism should always be avoided if possible.
This still does not say that one couldn't write portable, threaded, UNIX compliant code. Nor that those undefined behaviors from the specs would be fixable by rewriting the spec (by adding atomic fork+exec, that seems to me to be the context of this whole thing).
Libraries should not be disallowed from spawning threads. Threads are inherently the right way to do concurrency for more complex tasks. fork() is a tremendously complex call and has huge implications on all things in the OS.
For many years now I have had the rule that new code must not use fork(), and I am living a happier life. It's a bad call and it has no use if you have good threading support in the language. There are many programming languages that do not support fork() any more or heavily restrict its use.
I would add, in many/most cases using fork() without calling exec() just doesn't make sense. Even at the system call level you have clone(), which is how actual threading is implemented under the covers. And really, if you need a thread, just get a thread. I see extremely few or no advantages to doing fork() instead of spawning a new thread normally using whatever interface is provided (except in the case of exec() where fork() or vfork() is necessary).
Yap. Erlang 19 (latest version) switched to using a smaller spawner executable and spawns all OS processes from there by forking. So it forks something restricted and small not the whole VM.
This is not a bad strategy, especially because between the moment you fork() and the child calls exec(), you have a lot of memory regions shared in copy-on-write (because it would be too costly to actually duplicate the universe). Which means that busy threads writing to memory in the parent will trigger minor page faults, possibly impacting performance...
The "historic description" in the man page you cite disagrees: vfork(2) was added before BSD had copy-on-write, so the memory was actually copied and vfork(2) is essentially a crude hack to avoid that.
Unfortunately once copy-on-write was implemented, the crude hack was retained for backwards compatibility....
Thanks for pointing this out! Interesting that the code states that part of the concern is increasing memory usage. I don't quite understand why it would "duplicate the memory usage" since fork uses copy-on-write?
Anyway, I didn't explain it well, but this was what I was trying to get at with "fork a worker at the beginning of your program, before there can be other threads."
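A minimal sketch of the "fork a worker early" idea in Python, assuming a Unix-like system. `start_worker` is a hypothetical helper; the worker here just upper-cases each line it receives, standing in for whatever the real worker would do (for example, spawning subprocesses on request).

```python
import os


def start_worker():
    """Fork a helper process; meant to be called before any threads exist."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: serve requests until the parent closes its end of the pipe.
        os.close(to_child_w)
        os.close(to_parent_r)
        try:
            with os.fdopen(to_child_r) as requests, \
                    os.fdopen(to_parent_w, "w") as replies:
                for line in requests:
                    replies.write(line.upper())
                    replies.flush()
        finally:
            os._exit(0)
    # Parent: keep only the ends it needs.
    os.close(to_child_r)
    os.close(to_parent_w)
    return pid, os.fdopen(to_child_w, "w"), os.fdopen(to_parent_r)
```

Later, even after the parent has started threads, it only talks to the worker over the pipes and never needs to call fork() again itself.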
Copy-on-write doesn't work the same on every OS... Linux uses over-commit so it's not a big deal, but Solaris (and NT) don't over-commit so you actually need enough VM at the time when you call fork or fork will fail, so you may need to provide a lot of swap space on those systems to successfully call fork in large processes.
In some programs, fork(2) is the right way to do concurrency. It's simple (API-wise), and it makes IPC explicit, as opposed to the implicit IPC of threads. clone isn't POSIX, so you can forget it, if you don't feel like being behind systemd in the "screw any system that isn't Linux" line.
In places where you need the speed, threads are useful, but they're harder to use than forks.
Annoyingly, threads and forks don't work well together. zzzcpan already talked about how to fork threaded code. As for the problem of libraries using threads, if a library you're using is using threads, and it's documented (it probably is), and you didn't know about it, I'll pencil that in as your fault.
Other comments in this thread have dismissed fork(2) as a bad job entirely, but I don't think I agree. It's an effective way to do simple multiprocessing, and it's a lot simpler than threads in many contexts.
It sounds like if you use threads and fork it might be a disaster. It's true, but the author blames only fork and skips the second part of the problem, threads. This is not fair, because fork is much older than POSIX threads. In fact, POSIX threads were poorly designed for use with fork.
Fair point. My opinion is that in today's age of multi-core CPUs, shared memory concurrency is extremely useful for making efficient use of computing resources. As a result, I find threads to be unavoidable in most large systems I've dealt with recently.
Agreed that multi-threading is probably the best choice for CPU-intensive applications. But for databases or HTTP servers, multi-process + coroutines/fibers can be a better solution. For example, Redis is a few-thread application. It creates additional threads only for disk I/O, because unfortunately file descriptors for regular files in Linux/UNIX do not work in async mode.
Btw, Redis reminds me of one really awesome usage of fork. To make a snapshot of itself, Redis forks the main process. After that the child simply goes through the tuples and writes them to a file without any worries that somebody will modify a record. The copy-on-write mechanism simply prevents it.
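The snapshot trick can be imitated in a few lines of Python on a Unix-like system. This is a toy illustration of the idea, not how Redis implements it; `snapshot_via_fork` and `demo` are hypothetical names.

```python
import json
import os
import tempfile


def snapshot_via_fork(store: dict, path: str) -> int:
    """Fork a child that writes its frozen view of `store` to `path`."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            # The child sees the data exactly as it was at fork() time.
            with open(path, "w") as handle:
                json.dump(store, handle)
            status = 0
        finally:
            os._exit(status)
    return pid


def demo() -> dict:
    store = {"counter": 1}
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "snapshot.json")
        pid = snapshot_via_fork(store, path)
        store["counter"] = 2  # parent keeps mutating; the child never sees it
        os.waitpid(pid, 0)
        with open(path) as handle:
            return json.load(handle)
```

The parent's write after the fork lands in the parent's own copy of the page, so the snapshot on disk reflects the state at the moment of the fork.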
If you are in control of all the threads that an application is running, you can call fork() safely by making sure the threads are put in a safe state (no critical locks retained) when the fork() call happens. Also note that many things that should normally be unsafe, like having running threads calling malloc() while another thread is forking, are actually safe in the real world with certain implementations of fork, since there are pre/post fork "hooks" in the malloc implementation in order to fix the state of the child.
So if you control very closely the libs you link with and what they do, as well as the threads you use yourself, it is possible to use fork() in a reliable way.
glibc attempts this using the pthread_atfork() hook. It takes the global malloc (and possibly other) locks in the parent before forking, then both the parent and child release the locks before returning to the caller's code. Obviously, if you have locks in your own code then you may or may not need to hold these before calling fork() so that both parent and child can release them after the fork().
The Apple way of just throwing its hands in the air and abort()ing sounds either like giving up because the Apple devs think it is too hard, or like they feel the need to impose their way by crippling code that uses fork() without calling exec().
In my opinion, if you are going to write a library that uses threads and is supposed to be thread safe, then remember you need to handle the issue of fork(); don't be lazy, do the job properly, even if doing it right is hard and ugly.
You are completely correct! The thing that surprised me is how difficult that can be. For example, on Mac OS X if you want to read the system proxy settings, you magically get threads added to your program that you can't control.
The fork-exec code in my program currently does some things that are supposedly unsafe, but I can't figure what to use instead of initgroups() which is not async-signal-safe. I want to run the child as a different user for which I do initgroups+setgid+setuid between fork and exec. The only solution I see is to run getgrouplist() before the fork then in the child use setgroups() instead of initgroups(), but both of these functions are non-standard.
EDIT: Never mind, seems like initgroups() is also nonstandard but generally available on Unix-like systems.
Hmm, suppose we want to keep fork(). Then the only safe way to deal with this would be for critical sections to actually be transactions implemented at the OS level, so that after fork() in the child all transactions in flight get rolled back before returning from fork().
I see two implementation challenges with that suggestion:
1.) implementing that transaction mechanism as a kernel feature: When entering a CS mark all pages CoW, upon leaving the CS merge modified pages (problem: Whole pages are then mutually exclusive, dealing with this is the challenge)
2.) battling with user space implemented locks that use atomics.
-----
An immediate mitigation I see is, that fork() itself is a CS on _all_ the locks of a process. If we consider that only the standard locking mechanisms are used, then whenever a CS is entered (which includes the creation process of a locking primitive) it raises/posts a global fork-lock semaphore. And upon leaving that semaphore is lowered.
This still leaves the DIY-locking primitives problem open. But it should be more or less straightforward to add this to the system libc/pthread libraries' locking primitive implementation and fork() syscall wrappers.
Or did I miss something essential here? Talk is cheap, so if nobody has any obvious objections I'd actually go ahead implement it.
EDIT: Okay, one immediate problem I see is, that this would pose a challenge for calling fork inside a CS. Technically this is a situation where thread recursive locks would help, but as we all know, recursive locks are highly problematic.
Sounds like you are trying to re-invent Software Transactional Memory - was a nice idea 10 years ago, but after lots of research it is still not available in common runtimes...
Some more problems to consider:
1. how do you roll back IO that happens in a critical section?
2. your global semaphore will be highly contended and basically destroy scalability
re 2) the fork global lock satisfies every requirement for a multiple-readers/single-writer, with fork() being the only writer. Unless the program does a lot of fork()-ing it should hardly ever run into contention. The moment fork() waits on the lock, further attempts on read-lock are delayed until after fork() completes and the existing reads are waited for, before fork()-ing.
Even fork _with_ exec can be real trouble. This is one of my bugaboos at work: due to poor life choices and high pain tolerance, I own the infrastructure we use to spawn subprocesses (carefully.) For various reasons (security most notably) we have to do some very tricky things in and to the forked child before exec(), and pretty much all of this code is a disaster waiting to happen. Every so often I get feature requests for more stupid pet tricks people would like out of subprocesses, and they're always surprised by what their "simple" change would entail.
I'd like it if Linux had native support for posix_spawn, but even that would require a lot of extensions to be useful.
Don't get me started on the teams that want to break forking rules and thus ask me how to guarantee a process has no non-main threads. There are few ways you can make me more upset than by building software that breaks if someone else happens to call pthread_create and doesn't tell you.
I ran into this exact problem recently in a multithreaded Python app and spent two days trying to figure out wtf was going on. The multiprocessing spawn startup mode that was added in 3.4 solves this for most use cases at the expense of a small performance hit. For 2.x you are SOL however.
Part of my motivation for writing these things is that after I've wasted so much time, I'd love to help others not do the same. Hopefully the article shows up when you search for the right error message. It's also so I remember what the heck was happening.
Re: multiprocessing in 3.4: I think it is unfortunate that due to backwards compatibility, the "forkserver" mode can't become the default on Unix, since it would help avoid this particular issue.
I would like to see the reasons this comment was down-voted. It is true that fork(2) is simple and elegant design, but it involves a huge accidental complexity and hidden implications like in the original article. There is a reason that Linux has clone(2) underneath.
> It is true that fork(2) is simple and elegant design
What exactly is simple and elegant about it? Have you looked into how much tooling is necessary everywhere in unix to make it work? It's insane. It has a huge footprint and it does not provide standardized APIs to make it work for non covered cases (the best we have is pthread_atfork which is not portable).
What makes for simple? I really need to understand this. I implemented fork many years ago for a university operating system and nothing about either the implementation nor the usage is simple. I use fork on a daily basis and the POSIX version of it is insanely complex for everybody involved.
> I implemented fork many years ago for a university operating system and nothing about either the implementation nor the usage is simple
Me too. I found it simple: Create new child process, copy the address space, copy the instruction pointer, let it run.
If you want to improve performance via copy-on-write, it gets more complex to implement. If you combine it with other features like threads, mmap, limits, etc use becomes complex, but the mechanism itself is simple.
You might be able to design an easy variant of fork, which handles threads, mmap, limits, etc in a sane way, but that would not be simple anymore.
Fork is actually a good example, why simple and easy are different concepts and why it is really hard to achieve both.
> Me too. I found it simple: Create new child process, copy the address space, copy the instruction pointer, let it run.
Even in the total absence of threads (which you never have on a modern OS) you already fucked up. Because you did not account for file handles, signal handling, pid management etc.
fork() even on the simplest real world operating system is a massive pain for the OS.
Spawn-style APIs have to take a billion parameters that are mostly set to defaults, for things like working directory, environment variables, user ID, controlling terminal. Whereas fork+exec means you can express these things in a more compositional, buildery style: fork, change the two things you actually need to change, then exec.
Meh, no one ever remembers all those stateful hidden parameters to fork() for process creation. This results in painful bugs later on, possibly affecting security. Speaking from experience.
Python says explicit is better than implicit. I agree in this case.
I don't think this makes the case for elegance at all.
Global vars set wherever in a program are preferable to explicit params when creating a new process? And there's no way to have default params or anything?
> Global vars set wherever in a program are preferable to explicit params when creating a new process?
The outside world is always going to be implicit global mutable state; spawning an arbitrary process to do who-knows-what is inherently that kind of problem. If you could express it as an actual function you wouldn't need to run a separate process.
> And there's no way to have default params or anything?
Well not in C. You can sort-of do it by having a struct with fields that you fill in where all-bits-zero is the default but that's generally more trouble than it's worth.
> Well by that reasoning there's never any need for function purity since you can't eliminate global state on current computer designs.
If you care about purity you don't use spawn-like functionality at all.
> C easily allows you to have multiple functions. exec_with_current_state or something.
That doesn't help. You'd need to have one variant where you just want to change the working directory, one variant where you just wanted to set an environment variable, one variant where you want to do both, one variant where you want to run as a different UID and change working directory but not touch the environment variables...
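For comparison, this is roughly what a spawn-style API looks like in a language that does have default parameters. The sketch uses Python's `subprocess.run`; the wrapper `run_child` and the variable name `FORK_DEMO_GREETING` are made up for the example.

```python
import os
import subprocess
import sys

# Small child program: report its working directory and one variable.
CHILD_SCRIPT = (
    "import os; print(os.getcwd()); "
    "print(os.environ.get('FORK_DEMO_GREETING', 'unset'))"
)


def run_child(cwd=None, extra_env=None) -> list[str]:
    """Spawn a child, overriding only the settings the caller names."""
    env = None  # None means "inherit the parent's environment unchanged"
    if extra_env is not None:
        env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        cwd=cwd,  # None means "inherit the parent's working directory"
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Everything not mentioned keeps its inherited value, so one function covers "change the directory", "set a variable", or both, without a separate variant for each combination.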
Only if your app is multi-threaded. Unfortunately, many apps are multi-threaded "under the covers" due to libraries that spawn threads, so you need to be careful.
This is possible, but I find these restrictions to be hard to follow. As soon as you need to call a function, you now need to audit that function to determine that it only calls other async signal safe functions. When you come back to the code to fix a bug six months later, you need to remember these restrictions.
Do this when you must, but it is easy to screw up.
> How to use fork safely: 1. Only use fork to immediately call exec. 2. Fork a worker at the beginning of your program, before there can be other threads. 3. Only use fork in toy programs.
4. Stop writing broken multithreaded code altogether or if you must at least run an event loop per core/thread and use a wrapper for fork() to put the system into a fork()able state before forking. It's nice and reliable.
Multithreading by itself is just not a high-level concept to be used reliably by programs.
You are completely correct: it is in fact possible to use fork if you can carefully control the state of threads in your program. My point is that can be difficult in large software projects, particularly when random APIs like "get the system proxy settings" use threads without you having any ability to control them.
Fork/exec is only good if you are going to run a different program than the one you are currently executing.
If you want a series of worker threads that are effectively the same as the master/initial thread, the proper way to do this is clone(2), not fork or fork/exec.
When used properly, clone allows your group of threads to share a single PID and TGID (Thread Group IDentifier). This cuts down on the kernel resources your process(es) are using. They can natively share file descriptors and memory between each other (improving cache coherence). Also, their signals are handled globally for all threads at once, rather than each one managing its own signals as fork/exec would result in.
...And why would you want that? The advantage for fork(2) over clone is that you CAN'T share those data structures, making it harder to write code resulting in a deadlock: you have to share explicitly.
As for the kernel resources for the process, you'd be surprised how large RAM has gotten these days...
All your threads share the same virtual memory space. So you have a much higher hit rate in your TLB than without. It greatly lowers the chance that shared memory will leave L3 cache.
Cache hit rates are very important...
> The advantage for fork(2) over clone is that you CAN'T share those data structures, making it harder to write code resulting in a deadlock
Deadlocking has nothing to do with how RAM is accessed or how virtual memory is partitioned; it has to do with how you are managing your locking. Modern memory/instruction re-ordering is stupidly fast. If you are locking memory/data structure locations you are likely doing something wrong. Concurrent memory fences are around 3 orders of magnitude faster than locks.
> As for the kernel resources for the process, you'd be surprised how large RAM has gotten these days...
TLB/L2/L3 cache is still premium real estate. RAM size is awesome! I love waiting 100,000 cycles to get the page I requested.
Cache hit rates aren't as important as you seem to think. If you're not Google, and your app doesn't have hard/soft realtime constraints (videogames, flight control systems, etc.), your app will probably be fast enough. You'd be surprised how fast computers are nowadays...
> If you are locking memory/data structure locations you are likely doing something wrong.
Oh, concurrent memory fences! Well, that solves all my problems. Let's see. All I have to do is give up all hope of my code ever being portable to multiple architectures, and then dig around in asm to add support for them to my environment, which can range from mildly annoying (C), to near-impossible (JVM). Or I can use fork(2), and do what programmers have been doing for decades: trading performance for simplicity, and productivity.
> All I have to do is give up all hope of my code ever being portable to multiple architectures
You know nothing about modern compiler atomics.
Post-2011 compilers (LLVM, GCC, MSVC, ICC) standardized generic memory fences for C++ and C. These are fully portable, as the compiler itself determines how the layout of acquires/releases needs to be changed on the platform you are compiling to.
Atomics are supported on ARM, x64, MIPS64, SPARC64, POWER8, POWER9...
So they're very portable. The compiler even manages removing unneeded fences. Like if you add an acquire fence after a CAS load. The CAS load is an acquire fence on x64, but not on POWER8.
Okay, what about HLLs? Memory fences don't work if you aren't writing C. And as I've said, forks are an easier way to get these guarantees, albeit at a moderate perf cost.
Why are you concerned about system calls in a HLL?
Because in many HLLs, fork(2) is the best (or sometimes the only) option for concurrency. As in C, it is certainly the simplest.
Many things written in HLLs use fork. Unicorn is perhaps the most famous example.
Why are you concerned with concurrency guarantees in a HLL?
Because if I have multiple threads of execution, I want them to run in parallel if possible, and I want to minimize the risk of deadlocks. fork(2) does both.
You care about none of these, you care about your run time. This is why you are working in a HLL.
No. I care about them, as demonstrated above. And I don't know what you mean by care about my runtime. I would go so far as to claim that the above is an insult to me and everybody else who uses an HLL.
I care about getting stuff done. If C is the right tool for that, I'll use it. If an HLL is the right tool, I'll use it. And if syscalls are the right tool for the job, I'll use them, too.
In my opinion, it's libraries that secretly use threads under the covers (but still access state shared with the rest of the program like the malloc heap) that are what's dangerous.
guh. Those bugs aren't fun to hunt down. I don't think you really understand the importance of avoiding global or semi-global states until you have to do concurrent stuff, or when you have to start being serious about tests.
I dunno--under that definition every concurrent GC is "dangerous".
What libraries need to do GC? That is a language issue and not a library issue.
When you are embedding a language like Go in a C program, its runtime becomes a library. But that's the example that proves the rule: there are (or were) well-known problems with that. Libraries shouldn't be doing GC.
> What libraries need to do GC? That is a language issue and not a library issue.
Boehm GC disagrees.
Yes. Libraries should provide algorithms and data structures, and parameterize a threading interface if necessary (though I think it's usually not necessary).
Applications can then instantiate their own threads and pass them to the library (i.e. dependency injection.)
So if you have 10 different concurrent libraries in an application, then you don't end up with 10 different threading paradigms or 10 different thread pools. This is a matter of application architecture -- libraries shouldn't bake in a specific policy.
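A minimal sketch of the "pass the threads in" idea, in Python for brevity. The names `checksum`, `checksum_all` and `app_main` are made up for illustration, and `concurrent.futures.Executor` stands in for whatever threading interface a real library would parameterize on.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def checksum(chunk: bytes) -> int:
    """Hypothetical unit of library work: a toy checksum of one chunk."""
    return sum(chunk) % 65521


def checksum_all(chunks, executor: Executor):
    """Library entry point: runs on whatever executor the caller supplies.

    The library never creates threads itself, so the application decides
    how many workers exist and which libraries share them.
    """
    return list(executor.map(checksum, chunks))


def app_main():
    # The application owns the single pool and could hand the same
    # object to any other library that accepts an Executor.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return checksum_all([b"abc", b"hello", b""], pool)
```

With this shape, ten libraries can share one pool, or the application can pass a single-worker executor when it wants no real parallelism at all.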
true...
...this applies not just to threading, but other things that are not in the domain of the library such as memory allocation strategy or communication mechanism.
Libraries should be small, focused and decoupled. This is the way to proper re-usability.
Like left-pad?
jeez, no.... extremes of everything are terrible.
I suppose another rule should be that libraries should encapsulate a sufficient amount of complexity to make their existence worthwhile.... unlike left-pad.
That's a function deployed as a library, not a real library; the word library implies many functions.
There's nothing really wrong with packaging small isolated pieces of code. The debacle came from consumers' bad habit of randomly upgrading their dependencies without doing full regression testing (partly because the tooling encourages this terrible practice).
I know it's very different in JVM land, but many Akka libraries do something similar where they take an actor system, allowing the developer to share or separate the actor systems.
> Threads are inherently the right way to do concurrency for more complex tasks.
True, but they're more often used as a crutch where a select-style loop and/or state machine would be more suitable, leaving race conditions in their wake.
I completely disagree. The only part of this that seems dangerous is the fact that it can turn a single-threaded program into a multi-threaded one, and the only real problem there is it makes fork-without-exec unsafe. But fork-without-exec is basically unsafe anyway unless you can absolutely guarantee that nothing has ever spawned a thread. Meanwhile, preventing libraries from using threading under the covers basically means throwing away a lot of performance gains for no good reason.
This is the very first time I have heard about fork-without-exec as dangerous, but it is not the first time I have heard about having to be careful about libraries and hidden threads.
The thing is that any thread you launch is a big piece of machinery and your attempt to hide it is inevitably going to be a leaky abstraction.
> The thing is that any thread you launch is a big piece of machinery and your attempt to hide it is inevitably going to be a leaky abstraction.
That's not true at all. Threads aren't that complicated, and there's lots of ways of using them that aren't complicated and don't leak anything. As a trivial example, if I want to process a bunch of data, it might be faster to process it in parallel, and I can use a thread to do that without causing any observable difference in behavior (beyond potentially turning a single-threaded program into a multithreaded one). And if I'm on a platform like OS X or iOS, I can use higher-level threading APIs like libdispatch that make it even simpler.
This is exactly the kind of reasoning that makes library writers stumble into using abstractions that turn out leaky. The whole point of the OP is to show one (of many) mechanisms through which it does cause an observable change in behaviour.
And your example of a big calculation in a library is particularly bad, because the user can threadify that herself if she wants to.
Now it's true those five lines would become a single function call if your library forced the user to use a particular dispatching library. But I think that would be poor separation of concerns.
It's leaky in one extremely minor way that affects almost no programs whatsoever. It's extremely rare for anyone to do fork-without-exec, and anyone doing that already has a large number of restrictions they must be aware of.
And no, the user can't threadify that themselves. Your sample code is putting the library call on a thread. That's not at all what I was talking about. I was talking about a library using multiple threads (or, more likely, some abstraction over a thread pool that handles scheduling chunks of work on threads for you, such as libdispatch on Apple platforms) in order to process a bunch of data in parallel, data that only the library knows about. The user makes a single call into the library, the library processes a bunch of data in parallel, and then returns to the user. From the user's perspective it's a synchronous call, but it runs in a fraction of the time it would if the library wasn't allowed to use threading internally.
You can have your cake and eat it too -- there's no throwing away of performance gains. Libraries can use threads, but not directly or "silently". They should take threads as parameters. Then the application knows when they're being used and the application programmer can reason about it.
Yes, in theory. In practice, every library developer/maintainer uses different guidelines.
In practice, a lot of library developers/maintainers read this list, or know someone who does.
compilers and runtimes can spawn threads implicitly when required (think of openmp, cilk+, or even autopar). Are you saying that libraries shouldn't use these facilities?
If you're spawning threads implicitly then you're no longer a library, you're a framework. Sometimes a framework is a good fit, but frameworks don't compose (if you try to use two frameworks in the same application you're gonna have a bad time) so there is a cost. But in any case, better a framework that's explicitly a framework than a "library" which can't actually be used freely in general-purpose code.
would you consider a parallelized BLAS a framework?
Yes. Isn't that what I just said?
well, then you have a pretty non-standard definition of framework.
> If you're spawning threads implicitly then you're no longer a library, you're a framework.
That's not what makes a framework. It's all about inversion of control: you call a library, and a framework calls you.
http://stackoverflow.com/questions/3057526/framework-vs-tool...
If it's spawning threads, it's controlling the control flow. Either it's calling you, or the control flow is out of control.
No. His definition is correct. If you make the call, it's a library, that it might use threads during that call does not make it a framework. Libraries don't mean you're in full control of everything, they only mean you invoke all interactions and get results back rather than you subclass something and plug a part into a framework. If you have to subclass something, it's a framework.
I've used OpenMP a little bit, and I would probably avoid it for libraries.
I might use it to start out an application, since it requires very little code for dumb parallelism. But if I were making a real library, I would try to abstract the concurrency policy away so the user can control it.
This should be trivial by layering the library. Just provide a bunch of single-threaded functions, and then some OpenMP wrappers. Then user can call the single-threaded functions if she wants to do something different.
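The layering described above might look roughly like this. It is a sketch in Python rather than C with OpenMP: `ThreadPoolExecutor` stands in for the OpenMP layer, and `count_words` / `count_words_many` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text: str) -> int:
    """Single-threaded core: no threads, no global state."""
    return len(text.split())


def count_words_many(texts, max_workers=4):
    """Optional convenience wrapper that adds the 'dumb parallelism'.

    Callers who already have their own concurrency policy can ignore
    this function and call count_words() from their own threads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_words, texts))
```

The wrapper is the only place that knows about threads, so an application that is itself multi-threaded can skip it and keep control of how many threads exist.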
What I don't want is OpenMP baked deeply into the program logic. I think it shares the problem I was talking about. If you have 10 different libraries using OpenMP in an app, and the app is ITSELF multi-threaded, it seems like you have a mess of threads that you have no control over.
What is dangerous is that the interactions between threads and processes aren't part of POSIX standard and its behavior is OS specific across UNIX implementations.
This isn't an issue on non-UNIX OSes.
How many non-unixes do have fork_without_exec()?
Most non-UNIX OSes only have exec(), not fork().
Hence why fork() is just kind-of implemented on their POSIX compatibility layers.
In UNIX it is only an issue, because IEEE doesn't want to set in stone in POSIX how threads, signals and processes should interact, due to the way threads were added to UNIX.
Non-unix systems don't have exec; they have spawn.
Windows has exec.
Exec is substitution of current process with new image without closing fds or changing pid. Which winapi function does that?
_wexec and friends. It internally uses CreateProcess obviously.
Right, but it's not used in the sense of fork+exec. I should have said that Windows doesn't only have exec, it has spawn too.
Yeah, sure. I always think of spawn as a plain wrapper around exec/fork, hence why I wrote it like that, but you are right to correct me.
Pthreads are part of the POSIX standard. Hence the "p".
Then there is pipe() and probably some more similar ones.
Then you should read what POSIX says about pthreads, regarding mandatory and optional API support, especially the implementation semantics and UB.
For example, what happens to the interactions of signal handlers semantics and thread scheduling.
Or what happens to the threads when the process does a fork().
> Or what happens to the threads when the process does a fork().
You mean this? http://pubs.opengroup.org/onlinepubs/009695399/functions/for...
Granted, it's SUSv3.
I would advise people that lack experience writing portable UNIX code to read "Advanced Programming in the UNIX Environment" by W. Richard Stevens and Stephen A. Rago.
Just to get a glimpse of what POSIX says and what each UNIX actually does.
EDIT: Not talking about you, rather in general.
A thread belongs to a process so, for example, whatever the "second" thread does is the same as what a process does.
POSIX defines a few mechanisms for IPC.
What is optional and what is mandatory in pthreads doesn't concern me as I don't even like pthreads. (will still read what you said, later)
CSP-style concurrency over pipes (be they named or not), shared memory and synchronization over it. POSIX defines mechanisms that work as expected over UNIX platforms.
I wonder how many (standards compliant) programs work differently between linux, *bsd, solaris, etc.
> A thread belongs to a process so, for example, whatever the "second" thread does is the same as what a process does.
Wrong, you can fork() and only the thread issuing the fork() call survives the fork(), for example.
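The point that only the forking thread exists in the child can be checked with a small sketch. It is in Python for brevity and assumes a Unix system; the function name is made up, and `threading.active_count()` reports CPython's own bookkeeping, which it resets after fork to match what the OS did.

```python
import os
import threading
import warnings


def thread_counts_across_fork():
    """Return (threads seen by the parent, threads seen by the child)."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ warns when forking with live threads, which is
        # exactly the hazard under discussion; silenced for the demo.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        # Child: the worker thread was not copied, only this thread was.
        try:
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)

    os.close(write_fd)
    child_count = int(os.read(read_fd, 16))
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return parent_count, child_count
```

The parent reports at least two threads (the caller plus the worker), while the child reports one.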
There are quite a few other differences.
> POSIX defines mechanisms that work as expected over UNIX platforms.
Not really, there are a few semantic differences and corner cases, especially outside the GNU/Linux and *BSD ones.
>Wrong, you can fork() and only the thread issuing the fork () call survives the fork(), for example.
Thinking about it, it makes perfect sense.
Threads are processes that share the same memory space. When a thread is created (via pthread_whatever() or clone()) it starts executing some code that is practically in a $RANDOM place. So how would one go about copying the state of a process that has threads ? Freeze all the threads and copy everything ? That way the thread, that is probably doing some job at the moment, would end up being cloned without it knowing. So you would get two processes (threads are processes) that are doing exactly the same thing. Until something external changes their (internal) state, that is.
Non-determinism should always be avoided if possible.
This still does not say that one couldn't write portable, threaded, UNIX compliant code. Nor that those undefined behaviors from the specs would be fixable by rewriting the spec (by adding atomic fork+exec, that seems to me to be the context of this whole thing).
Libraries should not be disallowed from spawning threads. Threads are inherently the right way to do concurrency for more complex tasks. fork() is a tremendously complex call and has huge implications on all things in the OS.
For many years now I have had the rule that new code must not use fork() and I am living a happier life. It's a bad call and it has no use if you have good threading support in the language. There are many programming languages that do not support fork() any more or heavily restrict its use.
I would add, in many/most cases using fork() without calling exec() just doesn't make sense. Even at the system call level you have clone(), which is how actual threading is implemented under the covers. And really, if you need a thread, just get a thread. I see extremely few or no advantages to doing fork() instead of spawning a new thread normally using whatever interface is provided (except in the case of exec() where fork() or vfork() is necessary).
Yap. Erlang 19 (latest version) switched to using a smaller spawner executable and spawns all OS processes from there by forking. So it forks something restricted and small not the whole VM.
Here are the details of how it works:
https://github.com/erlang/otp/blob/a5256e5221aff30f6d2cc7fab...
They also claim a 3-5x speedup for launching external commands because of it. So there is a nice performance boost as well.
So basically you can add a 4th strategy -- fork a small program once at the start, then fork from there from then on.
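A rough sketch of that "small spawner forked early" strategy, in Python and assuming a Unix system. This is not how Erlang implements it; the `Spawner` class and its line-based JSON protocol are invented here purely to illustrate the shape.

```python
import json
import os
import subprocess


class Spawner:
    """Helper process forked early, before the program starts any threads.

    Later requests to run a command go over a pipe, so the large,
    possibly multi-threaded parent never calls fork() again.
    """

    def __init__(self):
        req_r, req_w = os.pipe()
        res_r, res_w = os.pipe()
        self._pid = os.fork()
        if self._pid == 0:
            os.close(req_w)
            os.close(res_r)
            self._serve(req_r, res_w)  # never returns
        os.close(req_r)
        os.close(res_w)
        self._req = os.fdopen(req_w, "w")
        self._res = os.fdopen(res_r, "r")

    @staticmethod
    def _serve(req_fd, res_fd):
        status = 1
        try:
            with os.fdopen(req_fd, "r") as requests, os.fdopen(res_fd, "w") as results:
                for line in requests:  # one JSON-encoded argv per line
                    try:
                        code = subprocess.call(json.loads(line))
                    except OSError:
                        code = 127  # command could not be started
                    results.write(f"{code}\n")
                    results.flush()
            status = 0
        finally:
            os._exit(status)

    def run(self, argv):
        """Ask the helper to run argv and return its exit code."""
        self._req.write(json.dumps(argv) + "\n")
        self._req.flush()
        return int(self._res.readline())

    def close(self):
        self._req.close()  # EOF tells the helper to exit
        self._res.close()
        os.waitpid(self._pid, 0)
```

A program would create one `Spawner()` at the very top of `main`, then call `spawner.run([...])` from any thread it likes (with its own locking around `run` if several threads share it).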
This is not a bad strategy, especially because between the moment you fork() and the child calls exec(), you have a lot of memory regions shared in copy-on-write (because it would be too costly to actually duplicate the universe). Which means that busy threads writing to memory in the parent will trigger minor page faults, possibly impacting performance...
yeah, that's why vfork has been introduced back in the days. See http://man7.org/linux/man-pages/man2/vfork.2.html
The "historic description" in the man page you cite disagrees: vfork(2) was added before BSD had copy-on-write, so the memory was actually copied and vfork(2) is essentially a crude hack to avoid that.
Unfortunately once copy-on-write was implemented, the crude hack was retained for backwards compatibility....
Thanks for pointing this out! Interesting that the code states that part of the concern is increasing memory usage. I don't quite understand why it would "duplicate the memory usage" since fork uses copy-on-write?
Anyway, I didn't explain it well, but this was what I was trying to get at with "fork a worker at the beginning of your program, before there can be other threads."
Copy-on-write doesn't work the same on every OS... Linux uses over-commit so it's not a big deal, but Solaris (and NT) don't over-commit so you actually need enough VM at the time when you call fork or fork will fail, so you may need to provide a lot of swap space on those systems to successfully call fork in large processes.
Ah thanks, that makes sense. Fun trade-offs: Either run out of memory when you call fork, or fight with the out-of-memory killer at a later time :)
In some programs, fork(2) is the right way to do concurrency. It's simple (API-wise), and it makes IPC explicit, as opposed to the implicit IPC of threads. clone isn't posix, so you can forget it, if you don't feel like being behind systemd in the "screw any system that isn't Linux" line.
In places where you need the speed, threads are useful, but they're harder to use than forks.
Annoyingly, threads and forks don't work well together. zzzcpan already talked about how to fork threaded code. As for the problem of libraries using threads, if a library you're using is using threads, and it's documented (it probably is), and you didn't know about it, I'll pencil that in as your fault.
Other comments in this thread have dismissed fork(2) as a bad job entirely, but I don't think I agree. It's an effective way to do simple multiprocessing, and it's a lot simpler than threads in many contexts.
It sounds like if you use threads and fork it might be a disaster. It's true, but the author blames only fork and skips the second part of the problem, threads. This is not fair, because fork is much older than POSIX threads. In fact, POSIX threads were poorly designed for use with fork.
Fair point. My opinion is that in today's age of multi-core CPUs, shared memory concurrency is extremely useful for making efficient use of computing resources. As a result, I find threads to be unavoidable in most large systems I've dealt with recently.
Agreed that multi-threading is probably the best choice for CPU-intensive applications. But for databases or HTTP servers, multi-process + coroutines/fibers can be a better solution. For example, Redis is a few-thread application. It creates additional threads only for disk I/O, because unfortunately file descriptors in Linux/UNIX do not work in async mode.
Btw, Redis reminds me of one really awesome usage of fork. To make a snapshot of itself, Redis forks the main process. After that the child simply goes through the tuples and writes them to a file without any worry that somebody will modify a record. The copy-on-write mechanism simply prevents it.
Redis save snapshot code: https://github.com/antirez/redis/blob/unstable/src/rdb.c#L99...
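The snapshot trick can be sketched in a few lines. This is a toy in Python, not Redis's actual code: `snapshot_while_mutating` is a made-up name, the "database" is a plain dict, and it assumes a single-threaded caller on a Unix system.

```python
import json
import os


def snapshot_while_mutating(data, path):
    """Write a point-in-time copy of `data` to `path` from a forked child.

    The child sees memory exactly as it was at fork(); writes made by the
    parent afterwards land in the parent's own copy of the pages.
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(data, f)
            status = 0
        finally:
            os._exit(status)

    # Parent keeps serving writes while the child is dumping.
    data["mutated_after_fork"] = True
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0
```

The file ends up holding the pre-fork contents even though the parent modified the dict before the child finished.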
If you are in control of all the threads that an application is running, you can call fork() safely by making sure the threads are put in a safe state (no critical locks retained) when the fork() call happens. Also note that many things that should normally be unsafe, like having running threads calling malloc() while another thread is forking, are actually safe in the real world using certain implementation of fork, since there are pre/post fork "hooks" in the malloc implementation in order to fix the state of the child.
So if you control very closely the libs you link with and what they do, as well as the threads you use yourself, it is possible to use fork() in a reliable way.
glibc attempts this using the pthread_atfork() hook. it takes the global malloc lock (and possibly others) in the parent before forking, then both the parent and child release the lock before returning to the caller's code. obviously if you have locks in your own code then you may or may not need to hold these before calling fork() so that both parent and child can release them after the fork().
the apple way of just throwing its hands in the air and abort()ing sounds either like giving up because the apple devs think it is too hard, or because they feel the need to impose their way by crippling code that uses fork() without calling exec()
in my opinion if you are going to write a library that uses threads and is supposed to be thread safe then remember you need to handle the issue of fork(); don't be lazy, do the job properly, even if doing it right is hard and ugly.
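The take-the-lock-before-fork pattern has a direct analogue in Python's `os.register_at_fork` (available since 3.7), which makes for a compact sketch. `registry_lock` is a hypothetical library-internal lock and the helper function exists only to demonstrate the effect; this is not glibc's implementation.

```python
import os
import threading
import time
import warnings

registry_lock = threading.Lock()  # hypothetical library-internal lock

# Same shape as pthread_atfork(prepare, parent, child): take the lock
# before forking, release it on both sides afterwards.
os.register_at_fork(
    before=registry_lock.acquire,
    after_in_parent=registry_lock.release,
    after_in_child=registry_lock.release,
)


def child_can_lock_after_fork():
    """Fork while another thread holds the lock; report if the child can take it."""
    held = threading.Event()

    def holder():
        with registry_lock:
            held.set()
            time.sleep(0.1)

    t = threading.Thread(target=holder)
    t.start()
    held.wait()

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ warns about fork() with live threads; silenced here.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()  # the 'before' hook blocks until holder() lets go
    if pid == 0:
        try:
            ok = registry_lock.acquire(timeout=1)
            os.write(write_fd, b"1" if ok else b"0")
        finally:
            os._exit(0)

    os.close(write_fd)
    result = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    t.join()
    return result == b"1"
```

Because the fork is delayed until the lock is free, the child never inherits it in a state owned by a thread that no longer exists.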
You are completely correct! The thing that surprised me is how difficult that can be. For example, on Mac OS X if you want to read the system proxy settings, you magically get threads added to your program that you can't control.
The fork-exec code in my program currently does some things that are supposedly unsafe, but I can't figure what to use instead of initgroups() which is not async-signal-safe. I want to run the child as a different user for which I do initgroups+setgid+setuid between fork and exec. The only solution I see is to run getgrouplist() before the fork then in the child use setgroups() instead of initgroups(), but both of these functions are non-standard.
EDIT: Never mind, seems like initgroups() is also nonstandard but generally available on Unix-like systems.
Hmm, consider we'd want to keep fork(), then the only safe way to deal with this would be, that critical sections were actually transactions implemented on the OS level and at after fork() in the child all transactions in flight get rolled back before returning from fork().
I see two implementation challenges with that suggestion:
1.) implementing that transaction mechanism as a kernel feature: When entering a CS mark all pages CoW, upon leaving the CS merge modified pages (problem: Whole pages are then mutual exclusive, dealing with this is the challenge)
2.) battling with user space implemented locks that use atomics.
-----
An immediate mitigation I see is, that fork() itself is a CS on _all_ the locks of a process. If we consider that only the standard locking mechanisms are used, then whenever a CS is entered (which includes the creation process of a locking primitive) it raises/posts a global fork-lock semaphore. And upon leaving that semaphore is lowered.
This still leaves the DIY-locking primitives problem open. But it should be more or less straightforward to add this to the system libc/pthread libraries' locking primitive implementation and fork() syscall wrappers.
Or did I miss something essential here? Talk is cheap, so if nobody has any obvious objections I'd actually go ahead implement it.
EDIT: Okay, one immediate problem I see is, that this would pose a challenge for calling fork inside a CS. Technically this is a situation where thread recursive locks would help, but as we all know, recursive locks are highly problematic.
Sounds like you are trying to re-invent Software Transactional Memory - was a nice idea 10 years ago, but after lots of research it is still not available in common runtimes...
Some more problems to consider:
1. how do you roll back IO that happens in a critical section?
2. your global semaphore will be highly contended and basically destroy scalability
re 1) good point, didn't think of that
re 2) the fork global lock satisfies every requirement for a multiple-readers/single-writer, with fork() being the only writer. Unless the program does a lot of fork()-ing it should hardly ever run into contention. The moment fork() waits on the lock, further attempts on read-lock are delayed until after fork() completes and the existing reads are waited for, before fork()-ing.
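The readers/writer idea from this subthread could be prototyped in user space before touching libc. Below is a sketch in Python under that assumption; `ForkGate` and `fork_and_wait` are invented names, it covers only code that cooperates by using the gate, and it does nothing about the IO rollback problem raised above.

```python
import os
import threading
from contextlib import contextmanager


class ForkGate:
    """Critical sections act as readers; fork() is the only writer."""

    def __init__(self):
        # A plain Lock (not the default RLock) so thread ownership is
        # irrelevant when the child releases it.
        self._cond = threading.Condition(threading.Lock())
        self._active_sections = 0
        self._pending_forks = 0

    @contextmanager
    def critical_section(self):
        with self._cond:
            # New sections are delayed while a fork is waiting or running.
            while self._pending_forks:
                self._cond.wait()
            self._active_sections += 1
        try:
            yield
        finally:
            with self._cond:
                self._active_sections -= 1
                self._cond.notify_all()

    def fork(self):
        with self._cond:
            self._pending_forks += 1
            try:
                # Wait for sections already in flight to drain.
                while self._active_sections:
                    self._cond.wait()
                # Fork while holding the gate's own lock, so the child
                # inherits it in a known state and releases it on return.
                return os.fork()
            finally:
                self._pending_forks -= 1
                self._cond.notify_all()


def fork_and_wait(gate):
    """Fork through the gate and return the child's exit code."""
    pid = gate.fork()
    if pid == 0:
        os._exit(0)  # child: no critical section was in flight at fork time
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

As the EDIT above anticipates, calling `gate.fork()` from inside a `critical_section()` on the same thread deadlocks in this sketch, since the fork waits for a section that cannot finish.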
Endorsed.
Even fork _with_ exec can be real trouble. This is one of my bugaboos at work: due to poor life choices and high pain tolerance, I own the infrastructure we use to spawn subprocesses (carefully.) For various reasons (security most notably) we have to do some very tricky things in and to the forked child before exec(), and pretty much all of this code is a disaster waiting to happen. Every so often I get feature requests for more stupid pet tricks people would like out of subprocesses, and they're always surprised by what their "simple" change would entail.
I'd like it if Linux had native support for posix_spawn, but even that would require a lot of extensions to be useful.
Don't get me started on the teams that want to break forking rules and thus ask me how to guarantee a process has no non-main threads. There are few ways you can make me more upset than by building software that breaks if someone else happens to call pthread_create and doesn't tell you.
I ran into this exact problem recently in a multithreaded Python app and spent two days trying to figure out wtf was going on. The multiprocessing spawn startup mode that was added in 3.4 solves this for most use cases at the expense of a small performance hit. For 2.x you are SOL however.
Part of my motivation for writing these things is that after I've wasted so much time, I'd love to help others not do the same. Hopefully the article shows up when you search for the right error message. It's also so I remember what the heck was happening.
Re: multiprocessing in 3.4: I think it is unfortunate that due to backwards compatibility, the "forkserver" mode can't become the default on Unix, since it would help avoid this particular issue.
Face the fact, fork() is fundamentally flawed!
I would like to see the reasons this comment was down-voted. It is true that fork(2) is a simple and elegant design, but it involves huge accidental complexity and hidden implications like those in the original article. There is a reason that Linux has clone(2) underneath.
Every method has limitations. If you want to call these flaws that's just semantics.
> It is true that fork(2) is simple and elegant design
What exactly is simple and elegant about it? Have you looked into how much tooling is necessary everywhere in unix to make it work? It's insane. It has a huge footprint and it does not provide standardized APIs to make it work for non covered cases (the best we have is pthread_atfork which is not portable).
Fork is simple, but not easy.
What makes for simple? I really need to understand this. I implemented fork many years ago for a university operating system and nothing about either the implementation nor the usage is simple. I use fork on a daily basis and the POSIX version of it is insanely complex for everybody involved.
> I implemented fork many years ago for a university operating system and nothing about either the implementation nor the usage is simple
Me too. I found it simple: Create new child process, copy the address space, copy the instruction pointer, let it run.
If you want to improve performance via copy-on-write, it gets more complex to implement. If you combine it with other features like threads, mmap, limits, etc use becomes complex, but the mechanism itself is simple.
You might be able to design an easy variant of fork, which handles threads, mmap, limits, etc in a sane way, but that would not be simple anymore.
Fork is actually a good example, why simple and easy are different concepts and why it is really hard to achieve both.
> Me too. I found it simple: Create new child process, copy the address space, copy the instruction pointer, let it run.
Even in the total absence of threads (which you never have on a modern OS) you already fucked up. Because you did not account for file handles, signal handling, pid management etc.
fork() even on the simplest real world operating system is a massive pain for the OS.
> What exactly is simple and elegant about it?
Spawn-style APIs have to take a billion parameters that are mostly set to defaults, for things like working directory, environment variables, user ID, controlling terminal. Whereas fork+exec means you can express these things in a more compositional, buildery style: fork, change the two things you actually need to change, then exec.
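The "fork, change what you need, then exec" style reads like this in a sketch. It is Python for brevity, the name `fork_exec` and its parameters are made up, and it assumes a single-threaded caller, since the child does more than async-signal-safe work before exec.

```python
import os


def fork_exec(argv, cwd=None, extra_env=None):
    """Run argv in a child, tweaking only the state the caller mentions.

    Everything else (uid, terminal, open descriptors, ...) is inherited
    implicitly from the parent, which is both the appeal and the risk.
    """
    pid = os.fork()
    if pid == 0:
        try:
            if cwd is not None:
                os.chdir(cwd)
            if extra_env:
                os.environ.update(extra_env)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if a step above failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Adding another knob (say, a umask change) is one more line between fork and exec rather than one more parameter on every call site.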
Meh, no one ever remembers all those stateful hidden parameters to fork() for process creation. This results in painful bugs later on, possibly affecting security. Speaking from experience.
Python says explicit is better than implicit. I agree in this case.
I don't think this makes the case for elegance at all.
Global vars set wherever in a program are preferable to explicit params when creating a new process? And there's no way to have default params or anything?
> Global vars set wherever in a program are preferable to explicit params when creating a new process?
The outside world is always going to be implicit global mutable state; spawning an arbitrary process to do who-knows-what is inherently that kind of problem. If you could express it as an actual function you wouldn't need to run a separate process.
> And there's no way to have default params or anything?
Well not in C. You can sort-of do it by having a struct with fields that you fill in where all-bits-zero is the default but that's generally more trouble than it's worth.
Well by that reasoning there's never any need for function purity since you can't eliminate global state on current computer designs.
C easily allows you to have multiple functions. exec_with_current_state or something. And a struct is a way, too.
> Well by that reasoning there's never any need for function purity since you can't eliminate global state on current computer designs.
If you care about purity you don't use spawn-like functionality at all.
> C easily allows you to have multiple functions. exec_with_current_state or something.
That doesn't help. You'd need to have one variant where you just want to change the working directory, one variant where you just wanted to set an environment variable, one variant where you want to do both, one variant where you want to run as a different UID and change working directory but not touch the environment variables...
You can set all the sensible defaults automatically. In fact, most spawn APIs do set sensible defaults.
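For comparison, a spawn-style call with defaults, sketched with Python's `subprocess` module. The wrapper name `spawn` and its two optional parameters are illustrative only; `subprocess.run` itself accepts many more.

```python
import os
import subprocess


def spawn(argv, cwd=None, extra_env=None):
    """Run argv with explicit, optional overrides and return its exit code.

    Parameters left as None fall back to the parent's current state,
    so the common case needs no arguments beyond argv.
    """
    env = {**os.environ, **extra_env} if extra_env else None
    return subprocess.run(argv, cwd=cwd, env=env, check=False).returncode
```

Here everything that differs from the parent is visible at the call site, at the cost of the API needing a named parameter for each thing a caller might want to change.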
Elegant in a 'clever trick' sense, not as good engineering.
fork() is not per se "dangerous" in such a context. You just have to be careful enough and only use async-signal-safe functions, as you would do in a signal handler (see https://www.securecoding.cert.org/confluence/display/c/SIG30...)
Typically calls such as malloc(), printf() etc. are strictly forbidden in the child after a fork().
Only if your app is multi-threaded. Unfortunately, many apps are multi-threaded "under the covers" due to libraries that spawn threads, so you need to be careful.
This post does the best job explaining the issue: http://www.linuxprogrammingblog.com/threads-and-fork-think-t...
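The failure mode behind that rule is a lock held by a thread that does not exist in the child. A small sketch in Python, assuming a Unix system, with a plain `threading.Lock` standing in for something like the malloc lock; the function name is made up.

```python
import os
import threading
import warnings


def child_sees_stale_lock():
    """Fork while another thread holds a lock; return True if the child got it."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        with lock:
            held.set()
            release.wait()

    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # the lock is definitely held from here on

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ warns about fork() with live threads; silenced here.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        # Child: the lock's memory was copied in the locked state, but
        # the holder thread was not, so nobody will ever release it.
        try:
            got = lock.acquire(timeout=0.5)
            os.write(write_fd, b"1" if got else b"0")
        finally:
            os._exit(0)

    os.close(write_fd)
    child_got_lock = os.read(read_fd, 1) == b"1"
    os.close(read_fd)
    os.waitpid(pid, 0)
    release.set()
    t.join()
    return child_got_lock
```

The timeout is only there so the demo terminates; a real child calling a function that takes such a lock without a timeout would simply hang.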
This is possible, but I find these restrictions to be hard to follow. As soon as you need to call a function, you now need to audit that function to determine that it only calls other async signal safe functions. When you come back to the code to fix a bug six months later, you need to remember these restrictions.
Do this when you must, but it is easy to screw up.
If there are specific issues with locks from dead child processes that deadlock, maybe the locks could be addressed specifically.
> How to use fork safely: 1. Only use fork to immediately call exec. 2. Fork a worker at the beginning of your program, before there can be other threads. 3. Only use fork in toy programs.
4. Stop writing broken multithreaded code altogether or if you must at least run an event loop per core/thread and use a wrapper for fork() to put the system into a fork()able state before forking. It's nice and reliable.
Multithreading by itself is just not a high-level concept to be used reliably by programs.
You are completely correct: it is in fact possible to use fork if you can carefully control the state of threads in your program. My point is that can be difficult in large software projects, particularly when random APIs like "get the system proxy settings" use threads without you having any ability to control them.
Fork/exec is only good if you are going to run a different program than the one you are currently executing.
If you want a series of worker threads that are effectively the same as the master/initial thread, the proper way to do this is clone(2), not fork or fork/exec.
When used properly, clone allows your group of threads to share a single PID and TGID (Thread Group IDentifier). This cuts down on the kernel resources your process(es) are using. They can natively share file descriptors and memory between each other (improving cache coherence). Also their signals are handled globally for all threads at once, not each thread managing its own signals like fork/exec will result in.
$ man 2 clone
man: No entry for clone in the manual.
...And why would you want that? The advantage of fork(2) over clone is that you CAN'T share those data structures, making it harder to write code that results in a deadlock: you have to share explicitly.
As for the kernel resources for the process, you'd be surprised how large RAM has gotten these days...
All your applications share the same virtual memory space. So you have a much higher hit rate in your TLB than without. It greatly lowers the chance that shared memory will leave L3 cache.
Cache hit rates are very important...
Deadlocking has nothing to do with how RAM is accessed or how virtual memory is partitioned, it has to do with how you are managing your locking. Modern memory/instruction re-ordering is stupidly fast. If you are locking memory/data structure locations you are likely doing something wrong. Concurrent memory fences are around 3 orders of magnitude faster than locks.
TLB/L2/L3 cache is still premium real-estate. RAM size is awesome! I love waiting 100,000 cycles to get the page I requested.
Cache hit rates aren't as important as you seem to think. If you're not google, and your app doesn't have hard/soft realtime constraints (videogames, flight control systems, etc.) your app will probably be fast enough. You'd be surprised how fast computers are nowadays...
Oh, concurrent memory fences! Well, that solves all my problems. Let's see. All I have to do is give up all hope of my code ever being portable to multiple architectures, and then dig around in asm to add support for them to my environment, which can range from mildly annoying (C), to near-impossible (JVM). Or I can use fork(2), and do what programmers have been doing for decades: trading performance for simplicity, and productivity.
That'll be a really hard decision.
You know nothing about modern compiler atomics.
Post-2011 compilers (LLVM, GCC, MSVC, ICC) standardized generic memory fences for C++ and C. These are fully portable, as the compiler itself determines how the layout of acquires/releases needs to be changed on the platform you are compiling to.
Atomics are supported on ARM, x64, MIPS64, SPARC64, POWER8, POWER9...
So they're very portable. The compiler even manages removing unneeded fences. Like if you add an acquire fence after a CAS load. The CAS load is an acquire fence on x64, but not on POWER8.
Okay, what about HLLs? Memory fences don't work if you aren't writing C. And as I've said, forks are an easier way to get these guarantees, albeit at a moderate perf cost.
Seeing as this is a conversation thread about raw system calls, I assumed we were working in C, not a HLL.
Are you just a professional contrarian?
Why are you concerned about system calls in a HLL?
Why are you concerned with concurrency guarantees in a HLL?
You care about none of these, you care about your run time. This is why you are working in a HLL.
No.
Because in many HLLs, fork(2) is the best (or sometimes the only) option for concurrency. As in C, it is certainly the simplest.
Many things written in HLLs use fork. Unicorn is perhaps the most famous example.
Because if I have multiple threads of execution, I want them to run in parallel if possible, and I want to minimize the risk of deadlocks. fork(2) does both.
No. I care about them, as demonstrated above. And I don't know what you mean by care about my runtime. I would go so far as to claim that the above is an insult to me and everybody else who uses an HLL.
I care about getting stuff done. If C is the right tool for that, I'll use it. If an HLL is the right tool, I'll use it. And if syscalls are the right tool for the job, I'll use them, too.
https://en.wikipedia.org/wiki/Thrashing_(computer_science)