I’ve been spending a little time building out a serverless web application as a small holiday project. As this is just a side project, I’ve taken the opportunity to try out the new .NET Core based v2 runtime for Azure Functions and the new tooling and support in Visual Studio 2017.
As soon as I had an end-to-end vertical slice I wanted to run some load tests to ensure it would scale up reliably – the short version is that it didn’t. The .NET Core v2 runtime is still in preview (and you are warned not to use this environment for production workloads due to potential breaking changes), so you would hope that this will get fixed by general release. For now, though, there seem to be some serious shortcomings in the scalability and performance of this environment, rendering it fairly unusable.
I used the VSTS load testing system to hit a single URL, initially with a high volume of users for a few minutes. In isolation, i.e. if I run it from a browser with no other activity, this function runs in less than 100ms and normally around the 70ms mark. However, as the number of users increases, performance quickly takes a serious nosedive, with requests taking seconds to return, as can be seen below:
After things settled down a little (hitting a system like this from cold with high concurrency is going to cause some chop while things scale out), the average request time ranged from 3 to 9 seconds, and the anecdotal experience (me running it in a browser / Postman while the test was going on) was of highly variable performance. Some requests would take just a few hundred milliseconds while others would take over 20 seconds.
Worryingly, no matter how long the test was run, this never improved.
I began by looking at my code, assuming I’d made a silly mistake, but I couldn’t see anything and so boiled things down to a really simple test case, essentially the one that is created for you by the Visual Studio template:
public static IActionResult Run([HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = null)]HttpRequest req, TraceWriter log)
{
    log.Info("C# HTTP trigger function processed a request.");
    var result = new OkObjectResult("hello world");
    return result;
}
I expected this to scale and perform much better as it was so simple: return a hard-coded string. However, to my surprise this exhibited very similar issues:
The response time to return a string hovered around the 7 second mark, and the system never scaled sufficiently to deal with the volume, so a small percentage of requests failed.
Having run a fair few tests and racked up a lot of billable virtual user minutes on my credit card, I tweaked the test slightly at this point, moving to a 5 minute test length with step-up concurrent user growth. Running this on the same simple test gave me, again, poor results, with average response times of between 1.5 and 2 seconds for 100 concurrent users and a function that is as close to doing nothing as it gets. The response time is hidden by the page time in the performance chart below, as the two track almost exactly. The step up of users to a low volume eliminates the errors, as you’d expect.
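For anyone unfamiliar with step load patterns, the ramp can be sketched as a simple function of elapsed time. This is only an illustrative model, not how VSTS implements it: `StepLoad`, its parameter names, and the step sizes in the usage note are my own assumptions, and only the 100 user ceiling comes from the test described above.

```csharp
using System;

// Hypothetical helper: models a step load pattern for illustration only.
public static class StepLoad
{
    // Returns the number of concurrent virtual users at a given elapsed time.
    public static int UsersAt(TimeSpan elapsed, int initialUsers, int stepUsers,
                              TimeSpan stepDuration, int maxUsers)
    {
        if (stepDuration <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(stepDuration));
        if (elapsed < TimeSpan.Zero)
            elapsed = TimeSpan.Zero;

        // Whole steps completed so far; each one adds stepUsers more users.
        long completedSteps = elapsed.Ticks / stepDuration.Ticks;
        long users = initialUsers + completedSteps * stepUsers;

        // Never exceed the configured ceiling.
        return (int)Math.Min(users, maxUsers);
    }
}
```

With an assumed start of 10 users and 10 more added every 30 seconds, `StepLoad.UsersAt(TimeSpan.FromSeconds(95), 10, 10, TimeSpan.FromSeconds(30), 100)` gives 40 users, and the load reaches the 100 user cap well before the end of a 5 minute run.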
What these graphs don’t show is the variance around this average response time, which still ranged from a few hundred milliseconds up to around 15 seconds.
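One way to make that spread visible is to report percentiles alongside the average. Below is a minimal sketch using the nearest-rank method; `LatencyStats` is a hypothetical helper of mine, and it assumes you can export the raw per-request timings from your load test tool.

```csharp
using System;
using System.Linq;

// Hypothetical helper for summarising response times (in milliseconds).
public static class LatencyStats
{
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static double Percentile(double[] samplesMs, double p)
    {
        if (samplesMs == null || samplesMs.Length == 0)
            throw new ArgumentException("At least one sample is required.", nameof(samplesMs));
        if (p <= 0 || p > 100)
            throw new ArgumentOutOfRangeException(nameof(p));

        double[] sorted = samplesMs.OrderBy(x => x).ToArray();

        // 1-based rank into the sorted samples.
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[rank - 1];
    }
}
```

A single slow outlier barely moves the median but dominates the top percentiles, which is exactly the behaviour an average hides. Comparing `Percentile(samples, 50)` with `Percentile(samples, 95)` gives a quick feel for how variable the experience really is.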
At this point I was beginning to suspect the Functions 2.0 preview runtime might be the issue and so created myself a standard Functions 1.0 runtime and deployed this simple function as a CSX script:
using System.Net;

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    var response = req.CreateResponse();
    response.StatusCode = HttpStatusCode.OK;
    response.Content = new StringContent("hello world", System.Text.Encoding.UTF8, "text/plain");
    return response;
}
Running the same ramp up test as above shows that this function behaves much more as you’d expect with average response times in the 300ms to 400ms range when running at 100 concurrent users:
Intrigued, I ran a short 5 minute, 400 concurrent user test with no ramp-up. Again, the CSX based function behaved much more in line with what I think are reasonable expectations, taking a short time to scale up to deal with the sudden demand without generating errors, and eventually settling down to a response time similar to the test above:
Finally, I deployed a .NET 4.6 based function into a new 1.0 runtime Function app. I made a slight mistake when setting up this test and ramped it up to 200 users rather than 100, but it scales much more as you’d expect and holds a steady response time of around 150ms. Interestingly, this gives longer response times than .NET Core for single requests run in isolation: around 170ms for .NET 4.6 vs. 70ms for .NET Core.
At this point I felt confident that the issue I was seeing in my application was due to the v2 Function runtime and so made a quick change to target .NET 4.6 instead, spinning up a new v1 runtime and running my initial 400 concurrent user test again:
As the system scales up, giving no errors, this test eventually settles at an average response time of around 500ms, which is something I can move ahead with. I’d like to get it closer to 150ms and it will be interesting to see what I can tweak to achieve this on the consumption plan. However, I think I’m already starting to bump up against some of the other limits with Functions. Ironically, resolving that involves taking advantage of what is going on with the Functions runtime implementation and accepting that it’s a somewhat flawed serverless implementation as it stands today.
As a more general conclusion, the only real takeaway I have from the above, beyond the general point that it’s always worth doing some basic load testing even on what you assume to be simple code, is that the Azure Functions 2.0 runtime has some way to go before it comes out of preview. What’s running in Azure currently is suitable only for the most trivial of workloads – I wouldn’t feel able to run this even in a beta system today.
Something else I’d like to see from Azure Functions is a more aggressive approach to scaling up/out. For spiky workloads where low latency is important, there is a significant drag factor at the moment. While you can run on an App Service Plan and handle the scaling yourself, this kind of flies in the face of the core value proposition of serverless computing – I’m back to renting servers. A reserved throughput or Premium Consumption offering might make more sense.
I do plan on running these tests again once the runtime moves out of preview – I’m confident the issue will be fixed. After all, to be usable as a service, it basically has to be.