Preemptive multi-tasking. We all kinda assume it's wonderful, but in the last half year or so, I've started to wonder what the tradeoffs are.
Let's design a hypothetical batch system. Assume you have one master task scheduler that's responsible for dispatching large numbers of tasks. The system you are running on is multi-core, so you definitely want threads of some sort to take advantage of the cores. Each task involves some amount of IO and some amount of processing. What kind of threading paradigm would be good for this situation? Here are a couple of possible solutions:
- Create one thread for each work item as it comes in and set it to run.
- Create a thread pool where num_threads = num_cpus+k and dispatch against the pool.
- Create a cooperative thread system that is serviced by num_threads = num_cpus
Each of these systems has its ups and downs.
The first one's strength is its simplicity. It works okay if tasks come in at a relatively constant rate that's lower than the completion rate. Telecom switches written in Erlang basically use this model to route SMS messages and get great throughput + uptime. This system does not do so hot if task generation is bursty and the bursts often surpass the rate at which the system can complete tasks. In that situation, depending on the resource allocation algorithm, you can get thrashing, and the average throughput goes down. Another thing that can cause this is if each task has a different CPU/IO pattern. In that case, even if the task creation rate is constant, it's possible that you will get "beating" effects if too many tasks are dormant at once but then suddenly wake up together because of IO or similar. This can again cause thrashing. Thrashing because of bursty IO and uneven task scheduling can often be seen in ordinary operating systems. Take the process-to-thread analogy and think of what happens when a bunch of torrents suddenly wake up, or when you open up 5 programs at once.
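A minimal sketch of the thread-per-item model in Python (the `handle` function and its sleep-based "IO" are stand-ins for real work, not anything from a particular system):

```python
import threading
import time

results = []
lock = threading.Lock()

def handle(task_id):
    # Simulate a task that waits on IO, then does a bit of processing.
    time.sleep(0.01)            # the "IO" portion
    value = task_id * task_id   # the "CPU" portion
    with lock:
        results.append(value)

# One thread per work item, started as each item arrives.
threads = [threading.Thread(target=handle, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

With 8 items this is fine; the trouble described above starts when a burst creates thousands of these threads at once and the scheduler has to juggle them all.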
The second strategy with thread pools largely prevents the thrashing described above. (Note: when I say threadpool here, I expect there to be N threads that are leased out to a task until the task completes. While the task hasn't completed, no one else may use its thread.) This system is also fairly simple (at least for the one using the pool) and is used quite often in network servers (e.g., webservers and RPC servers). The queuing that happens because only num_threads tasks are dispatched at once smooths out effects from bursty task generation. This model works great in many situations. The downside is that if your tasks are mostly IO and are of high latency, then the threadpool becomes very inefficient because it limits the parallelization of tasks. It can really only buffer k = num_threads - num_cpus IO tasks before it starts underutilizing processors. You can increase the throughput by increasing k, but the larger k is, the more chance you have of causing periodic thrashing due to beating effects in the IO responses.
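The pool variant is a one-liner with Python's standard library; here's a sketch where `K` is the slack factor from the list above (the value 2 is an arbitrary choice for illustration, and tuning it is exactly the hard part):

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

def handle(task_id):
    time.sleep(0.01)            # IO wait holds the leased thread the whole time
    return task_id * task_id    # CPU portion

K = 2  # extra threads to buffer IO waits (hypothetical tuning value)
pool = ThreadPoolExecutor(max_workers=(os.cpu_count() or 1) + K)

# Dispatch a burst of tasks; only max_workers run at once and the
# rest queue, which is what smooths out bursty task generation.
results = list(pool.map(handle, range(16)))
pool.shutdown()
print(results)
```

Note how each sleeping task pins its leased thread: with mostly-IO workloads, everything beyond K queued tasks just sits there while CPUs idle.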
The third strategy avoids most of the hazards above. Conceptually, you create a set of worker threads equal to the number of CPUs. However, you structure each task so that when it begins to wait on IO, it can put itself back into the queue and free up the thread to service a different task. Since num_threads = num_cpus, preemption becomes irrelevant and you've really just got a cooperative threading model that has M user threads serviced by N kernel threads. Because the system is not preemptive, you can queue up as many tasks as you want, have them all become "scheduleable" at once, and still not thrash, since context switches only really happen when a task can no longer make good use of the thread/CPU servicing it. This system is resilient to bursty task creation, uneven IO latencies, and widely varying task lengths; in all these situations, the CPU should be doing good work most of the time. The downside, however, is latency. In this setup, if one were to create N = num_threads long-running, CPU-only tasks, nothing else would get serviced until those tasks finished. This isn't acceptable for many systems. However, for a batch system executing background tasks (e.g., continuous builds or periodic report generation), it might be a good fit.
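Python's asyncio is the closest standard-library instance of this model, with the caveat that it multiplexes all tasks onto a single thread; the full num_cpus version described above would run one such loop per core. A sketch of the cooperative part:

```python
import asyncio

async def handle(task_id):
    # Awaiting IO hands the worker back to the scheduler instead of
    # blocking it, so another queued task can run in the gap.
    await asyncio.sleep(0.01)   # stands in for real async IO
    return task_id * task_id    # CPU portion runs without preemption

async def main():
    # Queue up far more tasks than CPUs. No thrashing: a context
    # switch only happens at an explicit await point.
    return await asyncio.gather(*(handle(i) for i in range(100)))

results = asyncio.run(main())
print(results[:5])
```

The latency hazard is visible here too: if one `handle` spun on the CPU for a minute without awaiting, the other 99 tasks would make zero progress until it finished.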
Basic conclusion: preemptive multi-threading is good for low latency, but is susceptible to thrashing. Threadpools can help control that, but may reduce throughput due to their artificial limits; to tune the pool size well, you want each task to have a fairly regular execution length and pattern. Cooperative threading is robust to thrashing and to high variance in task creation/dispatch time, but it doesn't give much in the way of latency guarantees.