GraphQL with .NET Core (Part - XI: Data Loader)
Code samples used in this blog series have been updated to the latest version of .NET Core (5.0.4) and GraphQL-Dotnet (4.2.0). Follow this link to get the updated samples.
With the updated version (4.2.0) of GraphQL-Dotnet, the DataLoader feature is no longer a part of the core library. It's shipped as a stand-alone NuGet package,
Install-Package GraphQL.DataLoader -Version 4.2.0
Our GraphQL queries are not quite optimized. Take the Orders query from CustomerType, for example,
Here, we are getting all the orders from the data store. This is all fun and games as long as you stay in the scalar zone of OrderType, i.e. only query the scalar properties of OrderType. But what happens when you query one of the navigation properties? For example, the code in OrderType is as follows,
public OrderType(IRepository repository)
{
    Field(o => o.Tag);
    Field(o => o.CreatedAt);
    FieldAsync<CustomerType, Customer>("customer",
        resolve: ctx =>
        {
            return repository.GetCustomerById(ctx.Source.CustomerId);
        });
}
So, when you try to access the Customer field, you are effectively initiating a separate request to your repository to load the related customer for each individual order.
If you are using the dotnet-cli, you can see all the EF query logs in the console for a query such as,
query GetOrders {
  orders {
    tag
    createdAt
    customer {
      name
      billingAddress
    }
  }
}
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
Executed DbCommand (26ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
SELECT [o].[OrderId], [o].[CreatedAt], [o].[CustomerId], [o].[Tag]
FROM [Orders] AS [o]
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
Executed DbCommand (23ms) [Parameters=[@__p_0='?' (DbType = Int32)], CommandType='Text', CommandTimeout='30']
SELECT TOP(1) [c].[CustomerId], [c].[BillingAddress], [c].[Name]
FROM [Customers] AS [c]
WHERE [c].[CustomerId] = @__p_0
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
Executed DbCommand (1ms) [Parameters=[@__p_0='?' (DbType = Int32)], CommandType='Text', CommandTimeout='30']
SELECT TOP(1) [c].[CustomerId], [c].[BillingAddress], [c].[Name]
FROM [Customers] AS [c]
WHERE [c].[CustomerId] = @__p_0
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
Executed DbCommand (1ms) [Parameters=[@__p_0='?' (DbType = Int32)], CommandType='Text', CommandTimeout='30']
SELECT TOP(1) [c].[CustomerId], [c].[BillingAddress], [c].[Name]
FROM [Customers] AS [c]
WHERE [c].[CustomerId] = @__p_0
The logs make it clear: first we query for all the orders, and then, for each order, we query for its customer as well. Here, for 3 orders we have 1 + 3 = 4 queries (4 hits on the database in total). Now do the math and figure out how many times we will hit the database if we have N orders. We end up with a total of N + 1 queries, which is why this is called the N + 1 problem.
To overcome this problem, we introduce DataLoader in our solution. DataLoader adds support for batching and caching in your GraphQL queries.
Adding support for DataLoader needs some configuration up front. Register the IDataLoaderContextAccessor and DataLoaderDocumentListener with a singleton lifetime in your ConfigureServices method,
services.AddSingleton<IDataLoaderContextAccessor, DataLoaderContextAccessor>();
services.AddSingleton<DataLoaderDocumentListener>();
IDataLoaderContextAccessor will be injected later into the constructors of the graph types where a data loader is needed. But first, in the middleware, we have to add the DataLoaderDocumentListener to the list of listeners in the IDocumentExecuter's ExecutionOptions.
public async Task InvokeAsync(HttpContext httpContext, ISchema schema, IServiceProvider serviceProvider)
{
    if (httpContext.Request.Path.StartsWithSegments(_options.EndPoint) && string.Equals(httpContext.Request.Method, "POST", StringComparison.OrdinalIgnoreCase))
    {
        var request = await JsonSerializer
            .DeserializeAsync<GraphQLRequest>(
                httpContext.Request.Body,
                new JsonSerializerOptions
                {
                    PropertyNameCaseInsensitive = true
                });
        var result = await _executor
            .ExecuteAsync(doc =>
            {
                doc.Schema = schema;
                doc.Query = request.Query;
                doc.Inputs = request.Variables.ToInputs();
                doc.Listeners.Add(serviceProvider.GetRequiredService<DataLoaderDocumentListener>());
            }).ConfigureAwait(false);
        httpContext.Response.ContentType = "application/json";
        httpContext.Response.StatusCode = 200;
        await _writer.WriteAsync(httpContext.Response.Body, result);
    }
    else
    {
        await _next(httpContext);
    }
}
Next, add a new method to your repository which takes a list of customer ids and returns a dictionary of customers with their ids as keys.
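As a rough idea of the shape of such a method, here is a minimal sketch that runs on its own. The method name GetCustomersByIdAsync is the one used in this post; everything else is an assumption made for illustration: the InMemoryRepository class, the seed data, and the plain List standing in for the EF Core DbSet. In the real repository the same filter would presumably run against the Customers set of _applicationDbContext.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public string BillingAddress { get; set; }
}

// Hypothetical stand-in for the real repository; a List replaces the DbSet.
public class InMemoryRepository
{
    private readonly List<Customer> _customers = new List<Customer>
    {
        new Customer { CustomerId = 1, Name = "Alice", BillingAddress = "1 First St" },
        new Customer { CustomerId = 2, Name = "Bob", BillingAddress = "2 Second St" },
        new Customer { CustomerId = 3, Name = "Carol", BillingAddress = "3 Third St" }
    };

    // Takes a batch of ids and returns the matching customers keyed by id.
    public Task<IDictionary<int, Customer>> GetCustomersByIdAsync(
        IEnumerable<int> customerIds, CancellationToken cancellationToken)
    {
        // Copy the ids into a set so the input is enumerated only once.
        var ids = new HashSet<int>(customerIds);

        IDictionary<int, Customer> result = _customers
            .Where(c => ids.Contains(c.CustomerId))
            .ToDictionary(c => c.CustomerId);

        return Task.FromResult(result);
    }
}

public static class RepositorySketch
{
    public static async Task Main()
    {
        var repository = new InMemoryRepository();

        // Id 42 does not exist, so it is simply absent from the result.
        var customers = await repository.GetCustomersByIdAsync(
            new[] { 1, 3, 42 }, CancellationToken.None);

        foreach (var pair in customers)
        {
            Console.WriteLine(pair.Key + ": " + pair.Value.Name);
        }
    }
}
```

The point to notice is the return type: one call covers many ids, and the caller picks each customer out of the dictionary by key.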
You can replace the Customer field with the following,
The idea behind GetOrAddBatchLoader is that it waits until all the customer ids are queued, and fires off the GetCustomersByIdAsync method only once all the ids are collected. Once the dictionary of customers is returned for the passed-in ids, the customer that belongs to a particular order is returned from the field with some internal object mapping. This technique of queueing up ids is called batching. No matter how many orders there are, we always have a single request to load their related customers, i.e. we have at most 2 requests.
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
Executed DbCommand (26ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
SELECT [o].[OrderId], [o].[CreatedAt], [o].[CustomerId], [o].[Tag]
FROM [Orders] AS [o]
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
Executed DbCommand (5ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
SELECT [c].[CustomerId], [c].[BillingAddress], [c].[Name]
FROM [Customers] AS [c]
WHERE [c].[CustomerId] IN (1, 2, 3)
Notice the second query. See how it queries for all the customers with the incoming ids.
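To make the batching idea a bit more concrete, here is a hand-rolled sketch of the queue-then-dispatch pattern. This is not how GraphQL.DataLoader is implemented; TinyBatchLoader, BatchingDemo and FetchCustomerNames are hypothetical names made up for this example, and the fetch function fakes the database by building names from ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical, simplified loader; not the GraphQL.DataLoader implementation.
public class TinyBatchLoader<TKey, TValue>
{
    private readonly Func<IReadOnlyCollection<TKey>, Task<IDictionary<TKey, TValue>>> _fetch;
    private readonly Dictionary<TKey, TaskCompletionSource<TValue>> _pending =
        new Dictionary<TKey, TaskCompletionSource<TValue>>();

    public TinyBatchLoader(Func<IReadOnlyCollection<TKey>, Task<IDictionary<TKey, TValue>>> fetch)
    {
        _fetch = fetch;
    }

    // Queue a key and hand back a task that completes after DispatchAsync runs.
    // Asking for the same key twice reuses the same pending task.
    public Task<TValue> LoadAsync(TKey key)
    {
        if (!_pending.TryGetValue(key, out var source))
        {
            source = new TaskCompletionSource<TValue>();
            _pending[key] = source;
        }
        return source.Task;
    }

    // Fetch every queued key in one call, then complete the waiting tasks.
    public async Task DispatchAsync()
    {
        if (_pending.Count == 0) return;

        var batch = new Dictionary<TKey, TaskCompletionSource<TValue>>(_pending);
        _pending.Clear();

        var results = await _fetch(batch.Keys.ToList());
        foreach (var pair in batch)
        {
            // A key missing from the results yields default(TValue).
            results.TryGetValue(pair.Key, out var value);
            pair.Value.SetResult(value);
        }
    }
}

public static class BatchingDemo
{
    public static int FetchCalls;

    // Stand-in for the repository call; counts how often it is hit.
    public static Task<IDictionary<int, string>> FetchCustomerNames(IReadOnlyCollection<int> ids)
    {
        FetchCalls++;
        IDictionary<int, string> names = ids.ToDictionary(id => id, id => "Customer " + id);
        return Task.FromResult(names);
    }

    public static async Task Main()
    {
        var loader = new TinyBatchLoader<int, string>(FetchCustomerNames);

        // Four orders, two of which belong to customer 1.
        var orderCustomerIds = new[] { 1, 2, 1, 3 };
        var pending = orderCustomerIds.Select(id => loader.LoadAsync(id)).ToList();

        await loader.DispatchAsync();
        var names = await Task.WhenAll(pending);

        Console.WriteLine(string.Join(", ", names));
        Console.WriteLine("Fetch calls: " + FetchCalls);
    }
}
```

Running it prints the four customer names followed by "Fetch calls: 1": the ids are collected first, duplicates collapse into one key, and the fetch function is hit a single time for the whole batch.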
Similarly, for a collection navigation property, you have GetOrAddCollectionBatchLoader. Take the Orders field of the CustomerType for example. You add a new repository method as follows,
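public async Task<ILookup<int, Order>> GetOrdersByCustomerId(IEnumerable<int> customerIds, CancellationToken cancellationToken)
{
    // Load every order of the requested customers in a single query.
    var orders = await _applicationDbContext.Orders.Where(i => customerIds.Contains(i.CustomerId)).ToListAsync(cancellationToken);
    // Group the orders by customer id; one key can hold many orders.
    return orders.ToLookup(i => i.CustomerId);
}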
Notice that here we are returning an ILookup data structure instead of a dictionary. The main difference between them is that an ILookup can hold multiple values against a single key, whereas in a dictionary a single key maps to a single value.
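A small self-contained sketch of that difference, using plain LINQ on an in-memory list. The Order shape, the sample data and the LookupDemo class are assumptions made just for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public string Tag { get; set; }
}

public static class LookupDemo
{
    // Groups orders by customer id; one key may map to several orders.
    public static ILookup<int, Order> GroupByCustomer(IEnumerable<Order> orders)
    {
        return orders.ToLookup(o => o.CustomerId);
    }

    public static void Main()
    {
        var orders = new List<Order>
        {
            new Order { OrderId = 1, CustomerId = 1, Tag = "A" },
            new Order { OrderId = 2, CustomerId = 1, Tag = "B" },
            new Order { OrderId = 3, CustomerId = 2, Tag = "C" }
        };

        var byCustomer = GroupByCustomer(orders);
        Console.WriteLine(byCustomer[1].Count());  // 2
        Console.WriteLine(byCustomer[2].Count());  // 1
        Console.WriteLine(byCustomer[99].Any());   // False: unknown keys give an empty sequence

        try
        {
            // Customer 1 appears twice, so a dictionary keyed by CustomerId cannot be built.
            orders.ToDictionary(o => o.CustomerId);
        }
        catch (ArgumentException)
        {
            Console.WriteLine("ToDictionary rejected the duplicate key");
        }
    }
}
```

This is why a one-to-many field such as Orders pairs naturally with an ILookup, while a one-to-one field such as Customer works with a dictionary.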
Modify the Orders field inside the CustomerType as follows,
GetOrAddCollectionBatchLoader and GetOrAddBatchLoader both cache the values of the field for the lifetime of a GraphQL query. If you only want to use the caching feature and ignore batching, you can simply use GetOrAddLoader.
Caching is good for fields you request very frequently. So, you can add caching to the Items field of the GameStoreQuery as follows,
Using a data loader can resolve the issue of parallel query execution we tackled in the last post. We can go with the default DocumentExecuter instead of implementing our own SerialDocumentExecuter.