
Why concurrency with async beats parallelism for scalability

Two terms that are often used interchangeably are concurrency and parallelism. It doesn’t help that in English, doing something concurrently means that you are doing more than one thing at a time. In software it’s a little more complicated. A good explanation of the difference that I have read goes as follows:

Concurrency
  1. You go for a run (task 1)
  2. You stop running to tie your shoe laces (task 2)
  3. You carry on running (task 1)

You can do concurrency with a single thread. In the above example, task 2 can be executed on a different thread, but it doesn’t have to be.
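To make the single-thread point concrete, here is a minimal console sketch (my own illustration of the analogy, with made-up RunAsync/TieShoesAsync methods). Both tasks are in flight at the same time, but no thread sits blocked while the delays run:

static async Task Main()
{
	var run = RunAsync();   // task 1: start the run
	await TieShoesAsync();  // task 2: tie laces while the run is paused at its delay
	await run;              // task 1: carry on running and finish
}

static async Task RunAsync()
{
	Console.WriteLine("Running...");
	await Task.Delay(TimeSpan.FromSeconds(2)); // the thread is released here
	Console.WriteLine("Carrying on running");
}

static async Task TieShoesAsync()
{
	Console.WriteLine("Tying shoe laces");
	await Task.Delay(TimeSpan.FromSeconds(1));
}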

Parallelism
  1. You go for a run (task 1)
  2. You also listen to music with your headphones (task 2)

You need separate threads for parallelism. They both run at the same time.
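Here is the same analogy as a minimal sketch with Parallel.Invoke (the same API used in the controller further down). The two delegates print different thread ids because they genuinely execute at the same time:

Parallel.Invoke(
	() => { Console.WriteLine($"Running on thread {Environment.CurrentManagedThreadId}"); Thread.Sleep(1000); },
	() => { Console.WriteLine($"Listening to music on thread {Environment.CurrentManagedThreadId}"); Thread.Sleep(1000); }
);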

One more point that will be important later: threads are expensive. Every thread you create makes your application work harder to manage it, and the more threads you use, the more context switching is needed. Past a certain number of active threads, this will start to slow your application down.


I’ve created a small project to show the different ways you can execute async/parallel code, and to investigate the performance implications of each under load.

There is a simple service that will double/halve a number in both a sync and an async way. It uses Task.Delay to simulate a remote call: the Double calculation takes 2 seconds and the Halve takes 1 second.

public class CalculationService
{

	public int Double(int num)
	{
		// Block the calling thread to simulate a slow synchronous call
		Task.Delay(TimeSpan.FromSeconds(2)).Wait();

		return num * 2;
	}

	public int Halve(int num)
	{
		Task.Delay(TimeSpan.FromSeconds(1)).Wait();

		return num / 2;
	}

	public async Task<int> DoubleAsync(int num)
	{
		// Await without blocking: the thread is released for the duration
		await Task.Delay(TimeSpan.FromSeconds(2));

		return num * 2;
	}

	public async Task<int> HalveAsync(int num)
	{
		await Task.Delay(TimeSpan.FromSeconds(1));

		return num / 2;
	}
}
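The service is constructor-injected into the controller, so it needs to be registered with the DI container first. A minimal sketch, assuming a standard ASP.NET Core Startup (the repo may wire this up slightly differently):

public void ConfigureServices(IServiceCollection services)
{
	services.AddControllers();

	// Singleton is fine here: CalculationService holds no per-request state
	services.AddSingleton<CalculationService>();
}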

I then exposed the following endpoints.


[Route("api/[controller]")]
[ApiController]
public class CalculationController : ControllerBase
{
	private readonly CalculationService calculationService;

	public CalculationController(CalculationService calculationService)
	{
		this.calculationService = calculationService;
	}

	[HttpGet]
	[Route("sync/{num}")]
	public IActionResult Sync(int num)
	{
		var dbl = calculationService.Double(num);

		var half = calculationService.Halve(num);

		return Ok(dbl + half);
	}

	[HttpGet]
	[Route("syncoverasync/{num}")]
	public IActionResult SyncOverAsync(int num)
	{
		// .Result blocks the request thread until the task completes
		var dbl = calculationService.DoubleAsync(num).Result;

		var half = calculationService.HalveAsync(num).Result;

		return Ok(dbl + half);
	}

	[HttpGet]
	[Route("parallel/{num}")]
	public IActionResult ParallelCalc(int num)
	{
		int dbl = 0;
		int half = 0;
		Parallel.Invoke(
			() => dbl = calculationService.Double(num),
			() => half = calculationService.Halve(num)
		);

		return Ok(dbl + half);
	}

	[HttpGet]
	[Route("async/{num}")]
	public async Task<IActionResult> Async(int num)
	{
		var dbl = await calculationService.DoubleAsync(num);
		var half = await calculationService.HalveAsync(num);

		return Ok(dbl + half);
	}

	[HttpGet]
	[Route("asyncwhenall/{num}")]
	public async Task<IActionResult> AsyncWhenAll(int num)
	{
		var dblTask = calculationService.DoubleAsync(num);
		var halfTask = calculationService.HalveAsync(num);

		var results = await Task.WhenAll(dblTask, halfTask);
		return Ok(results.Sum());
	}
}

These are the actions I will compare.

Sync

This is just a standard synchronous call.

SyncOverAsync

This is a synchronous call that blocks on the async methods with .Result. This is generally discouraged because it doesn’t take advantage of concurrency: it behaves much like the sync call but uses more resources, and in classic ASP.NET or UI applications blocking like this can even deadlock. When async first came out, this was often how older applications had to adopt it, because they couldn’t just be rewritten.

ParallelCalc

This will run both calculations in parallel and use their results when they are done.

Async

This uses the async methods from an async controller action. It looks very similar to the sync version.

AsyncWhenAll

This one is a little more advanced: both tasks are started first and the await is deferred until the results are needed, so the two calls run concurrently. It’s a little more complicated to read and to manage, but there are cases where it’s needed. You can think of it as concurrency with potentially a little bit of parallelism.


To run some load tests I will use JMeter, a popular free performance testing tool. It’s very powerful and easy to get started with. I saved my test plan in the repo so you can run it on your machine if you want.

For each endpoint I connected with 50 users each making 100 requests. These were the results.

Call           Total Time (s)   Median Response (s)   Throughput (req/s)   Threadpool Thread Count
Sync           279              3                     14.8                 53
SyncOverAsync  282              3                     14.6                 59
Parallel       204              2                     20.1                 108
Async          250              3                     16.6                 2
AsyncWhenAll   167              2                     24.8                 7
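A quick note on the last column: the thread counts were sampled while each test was running. One way to capture that yourself (my sketch, not necessarily how the repo does it) is to expose ThreadPool.ThreadCount, available since .NET Core 3.0, and poll it during the test:

[HttpGet]
[Route("threads")]
public IActionResult Threads()
{
	// The number of threadpool threads that currently exist
	return Ok(ThreadPool.ThreadCount);
}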

To be clear, this could look very different on your machine, but the results are still quite interesting. Taking the Sync call as the baseline, SyncOverAsync is actually slower. That makes sense: it’s still blocking the same thread, but now with the extra overhead of managing and waiting on tasks. You can also see that the tasks run sequentially, because the median response is 3 seconds, i.e. wait 2 seconds for Double and then 1 second for Halve. The threadpool thread count also roughly matches the number of concurrent users, which fits the theory that each request occupies a single thread at a time.

Comparing Parallel to the first two looks great if you only consider total time and median response. The two-second median shows that both calculations run at the same time. What is concerning is the threadpool thread count: it needs about two active threads per concurrent user, which will become a problem as the number of users grows.

Async is quicker than the first two but slower than Parallel. The reason is that it awaits the first call before making the second. What is really interesting is that it handles all 50 concurrent users with only a couple of threads.

AsyncWhenAll is the quickest by far. The median response time shows that it’s not waiting for the first call to finish before starting the second: the two delays overlap, so a request takes max(2, 1) = 2 seconds instead of 2 + 1 = 3. The threadpool thread count shows that it can do all of this with just a handful of threads.

To really show the difference in how these behave at scale, I reran the tests with 150 concurrent users each making 50 requests.

150 users and 50 requests

Call           Total Time (s)   Median Response (s)   Throughput (req/s)   Threadpool Thread Count
Parallel       292              2                     23.1                 275
Async          150              3                     16.6                 2
AsyncWhenAll   101              2                     74.1                 5

I excluded the first two since at this point it’s really a race between async and parallel. You can now see that even though Async doesn’t run anything in parallel, it starts outperforming the Parallel calls. The Parallel load test actually spent most of its time scaling up the thread pool: once the pool reaches its limit, the framework delays before injecting new threads.
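As an aside, if you know up front that a design needs many blocking threads, you can raise the pool’s minimum so threads are injected immediately up to that floor. A hedged sketch (the 300 is an arbitrary number for illustration; this trades memory for ramp-up time and doesn’t fix the underlying design):

// The pool injects threads without delay up to its minimum; beyond that it throttles.
// The second argument is the minimum for IO completion threads.
ThreadPool.GetMinThreads(out int workers, out int io);
ThreadPool.SetMinThreads(Math.Max(workers, 300), io);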

What’s really impressive is how few resources the async calls use, even at this load. Scaling your application is all about making the most of the resources available. A single call being faster means nothing if each concurrent user requires new resources.

Just for fun, I’m going to see how much more AsyncWhenAll can handle and up the load test to 2000 concurrent users, each making 50 requests.

Call           Total Time (s)   Median Response (s)   Throughput (req/s)   Threadpool Thread Count
AsyncWhenAll   108              2                     925.1                10

This is really impressive, even though admittedly not many production applications just do task delays. Going by the results of the previous tests, the parallel version would need 4000 threads to handle 2000 users.

The key takeaway is that in order to scale up your application, you need to factor in both the speed of a response and how hard your application has to work to return it. What makes async really powerful is that it releases the thread at each await to do other work. This allows much more work to be done at the same time, and so it allows your application to scale across many more users.