Tech tutorials Fundamentals: Entity Framework
By Insight Editor / 10 Jan 2017 , Updated on 16 May 2019 / Topics: Application development
By Insight Editor / 10 Jan 2017 , Updated on 16 May 2019 / Topics: Application development
Microsoft's Entity Framework (EF) is a powerful Object-Relational Mapper (ORM) that boosts developer productivity by automating much of the work associated with designing and interfacing with an application database. Although the framework is simple at first glance, no shortage of blog posts have been dedicated to describing its correct use — and for good reason.
Like any complex framework, EF can introduce difficult-to-diagnose bugs and performance problems due to its misuse, particularly as projects grow — either in scope or in scale. This article will discuss four guidelines for using EF effectively.
Lazy loading is a feature in Entity Framework that allows you to defer loading of connected entities (via foreign keys) until they’re accessed for the first time. There are two main requirements for lazy loading of a navigation property to work:
The following is an example of a DbContext with lazy-load-enabled navigation properties:
public class OrganizationDbContext : DbContext
{
public OrganizationDbContext()
: base("name=OrganizationDbContext")
{
//Uncomment this line to disable lazy loading
//Configuration.LazyLoadingEnabled = false;
}
public virtual DbSet<Organization> Organizations { get; set; }
public virtual DbSet<Department> Departments { get; set; }
public virtual DbSet<Employee> Employees { get; set; }
public virtual DbSet<Log> Logs { get; set; }
}
public class Organization
{
public int Id { get; set; }
public string Name { get; set; }
public virtual ICollection<Department> Departments { get; set; } //navigation property (note the 'virtual')
}
public class Department
{
public int Id { get; set; }
public string Name { get; set; }
[ForeignKey("Organization")]
public int OrganizationId { get; set; }
public virtual Organization Organization { get; set; } //navigation property (note the 'virtual')
public virtual ICollection<Employee> Employees { get; set; } //navigation property (note the 'virtual')
}
public class Employee
{
public int Id { get; set; }
public string Name { get; set; }
public bool Salaried { get; set; }
[ForeignKey("Department")]
public int DepartmentId { get; set; }
public virtual Department Department { get; set; } //navigation property (note the 'virtual')
}
public class Log
{
public int Id { get; set; }
public string Severity { get; set; }
public string Message { get; set; }
}
And here’s an example of lazy loading in action:
public class OrganizationRepository
{
private readonly OrganizationDbContext _context;
public OrganizationRepository(OrganizationDbContext context)
{
this._context = context;
}
public string GetDepartmentNameForEmployee(int id)
{
Employee emp = _context.Employees.Find(id); //SQL query executed here
return emp.Department.Name; //SQL query also executed here since 'Department' was accessed
}
}
In some scenarios, this feature grants our code a performance boost since it’s possible to conditionally avoid loading an entity’s contents from the database:
public class OrganizationRepository
{
private readonly OrganizationDbContext _context;
public OrganizationRepository(OrganizationDbContext context)
{
this._context = context;
}
public string GetDeparmentNameForSalariedEmployee(int id)
{
Employee emp = _context.Employees.Find(id);//1. SQL query executed here
if (emp.Salaried)
return emp.Department.Name;//2. SQL query executed here since 'Department' was accessed
else
throw new InvalidOperationException("Employee is not salaried");//query '2' above was skipped (hooray!)
}
}
However, incorrect use of this feature has serious performance consequences. The following code, while innocent at first glance, will actually execute a SQL query for each salaried employee in the database (plus the first query) — the infamous ‘N+1 Query’ problem. As the number of items in ‘salariedEmployees’ grows, the number of round trips to the database increases.
public class OrganizationRepository
{
private readonly OrganizationDbContext _context;
public OrganizationRepository(OrganizationDbContext context)
{
this._context = context;
}
public List<string> GetDepartmentsWithSalariedEmployees()
{
//get all salaried employees (1 query)
List<Employee> salariedEmployees = _context.Employees.Where(x => x.Salaried).ToList();
//return department names for these employees (1 query for each 'Department' access)
return salariedEmployees.Select(x => x.Department.Name).Distinct().ToList();
}
}
There are other problems, too. If a repository returns an entity that happens to outlive the DbContext from which it came, it’s possible that accessing a navigation property will throw a runtime exception. Usually this isn’t a problem in web applications since the lifetime of the DbContext is tied to the lifetime of the web request (using your favorite Dependency Injection (DI) container). However, in other types of applications (services, etc.), this can become a problem.
public static void Main()
{
Employee emp;
using(OrganizationDbContext context = new OrganizationDbContext())
{
OrganizationRepository respository = new OrganizationRepository(context);
emp = respository.GetEmployee(1);
}
var department = emp.Department; //explosion -- context is already disposed
}
Lastly, if an entity is eventually serialized as part of an API response, lazy loading will cause every property to be sequentially loaded from the database as part of the serialization (since serialization must touch every property). Usually this will lead to runtime errors since entities typically have circular references; in other cases, you’ll be left with performance problems down the road.
Note: As pointed out by my colleague, serialization of Common Language Runtime (CLR) objects with circular references can be accomplished using additional third-party libraries.
The above examples can all be written with lazy loading disabled. Here's what that looks like:
public class OrganizationRepositoryNonLazy
{
private readonly OrganizationDbContext _context;
public OrganizationRepositoryNonLazy(OrganizationDbContext context)
{
this._context = context;
}
public string GetDeparmentNameForSalariedEmployee(int id)
{
Employee emp = _context.Employees
.Include(x => x.Department) //Make sure EF brings back department as part of this query
.SingleOrDefault(x => x.Id == id);
if (emp.Salaried)
//lazy loading disabled so no nav-property query here
//Department is not null since we 'Include'd it above
return emp.Department.Name;
else
throw new InvalidOperationException("Employee is not salaried");
}
public string GetDepartmentNameForEmployee(int id)
{
Employee emp = _context.Employees
.Include(x => x.Department) //Make sure EF brings back department as part of this query
.SingleOrDefault(x => x.Id == id);
//lazy loading disabled so no nav-property query here
//Department is not null since we 'Include'd it above
return emp.Department.Name;
}
public List<string> GetDepartmentsWithSalariedEmployees()
{
//get all salaried employees (1 query)
List<Employee> salariedEmployees =
_context.Employees
.Include(x=>x.Department)//Make sure EF brings back department for each employee
.Where(x => x.Salaried).ToList();
//lazy loading disabled so no nav-property query here
//Department is not null since we 'Include'd it above
return salariedEmployees.Select(x => x.Department.Name).Distinct().ToList();
}
}
A lot of debate surrounds the usefulness of lazy loading. For most code, the additional cost incurred by disabling lazy loading is simply a few calls to ‘Include’ (as seen above). However, some applications depend heavily on this functionality. In one design pattern, the repository returns entities directly, and the calling code accesses the navigation properties to retrieve what’s needed from the database. In my opinion, this defeats the point of using a relational database management system since the join operator is completely avoided, and all data access is boiled down to single-table queries.
In addition, this pattern causes code that has nothing to do with the data access layer to directly cause database round trips. This kind of code is also (once again) less performant than loading the needed information upfront. Given the subtlety of the problems introduced by lazy loading and the limited upside, it’s best to disable it.
Using the same repository above, consider these two repository methods that return a Data Transfer Object (DTO) representation of an organization:
public class OrganizationSummary
{
public string Name { get; set; }
public List<DepartmentSummary> Departments { get; set; }
}
public class DepartmentSummary
{
public string Name { get; set; }
public int EmployeeCount { get; set; }
}
public class OrganizationRepository
{
private readonly OrganizationDbContext _context;
public OrganizationRepository(OrganizationDbContext context)
{
this._context = context;
}
//mapped after the fact
public OrganizationSummary GetOrganizationSummary(int id)
{
//note: the 'Include' line below is VERY important
//if lazy loading is on, 'Include' forces eager loading of Departments and Employees
//if lazy loading is off, Departments and Employees will not load without 'Include'
var org = _context.Organizations
.Include(x => x.Departments.SelectMany(y => y.Employees))
.Where(x => x.Id == id).SingleOrDefault();
return new OrganizationSummary()
{
Name = org.Name,
Departments = org.Departments.Select(y => new DepartmentSummary()
{
Name = y.Name,
EmployeeCount = y.Employees.Count //note that all employees are loaded into memory
}).ToList()
};
}
//using projection
public OrganizationSummary GetOrganizationSummary2(int id)
{
return _context.Organizations.Where(x => x.Id == id)
.Select(x => new OrganizationSummary()
{
Name = x.Name,
Departments = x.Departments.Select(y => new DepartmentSummary()
{
Name = y.Name,
EmployeeCount = y.Employees.Count //here, Employees do *not* get loaded into memory.
}).ToList()
}).SingleOrDefault();
}
}
The first loads the entity (as well as everything else we need via ‘Include’) in the first step, and then maps the result to the DTO, which is then returned. The second uses projection to map the query directly to the object.
There are two advantages to the projection approach. First, EF is smart enough to notice we’re using only the count of the ‘Employees’ collection in the projection (rather than the employees themselves). The generated SQL will not bring back properties for employees since we don’t need them. The performance gain due to this fact can be enormous — especially if departments have many employees. In contrast, the first technique loads all of the employees into memory (and, in fact, there’s no way to avoid doing so).
Secondly, the second version is easier to maintain. Suppose a new navigation property is added to department, and a corresponding property is added to DepartmentSummary. In the second version of this function, the projection simply needs to be modified to set the DTO’s property. In the first version, doing the same will result in a null reference exception (or an N+1 situation if lazy loading is enabled) if the programmer forgets to ‘Include’ the related entity.
The management of DbContext in the context of business transactions is a difficult topic. Mehdi El Gueddari has written the best post I've seen on the subject; I won't duplicate his efforts, but I'll try to summarize the important points below and present a way to avoid most of the mess.
We're assuming a typical architecture in the following discussion, where the DI container has set DbContext's lifetime to 'InstancePerRequest.' Note that in this architecture, the DbContext associated with the web request is injected into both Repository1 and Repository2.
In nonweb projects, such as services or console apps, there’s no web request — so configuring your DI to use 'InstancePerRequest' for your DbContexts will throw an exception. As a result, we must configure DbContext's lifetime to be singleton or transient. Singleton won't work since DbContext isn't thread-safe and the cache will quickly go stale (it's generally advised that DbContext's lifetime be kept short).
Configuring DbContext to be transient, or instance-per-dependency, will cause Repository1 and Repository2 to receive different instances of DbContext. This can cause all sorts of issues:
Your service classes shouldn't require a web context to function correctly. If a standalone Windows service or scheduled task eventually makes its way into the project, you’ll want to reuse the data access layer, but ignoring the above points will keep you from doing so. Even if you’re sure your project will never require a nonweb component, there are still problems.
In this scenario, all repositories work on the same instance of DbContext — the one associated with the web request. There are two main issues in this scenario:
For the first point above, the first solution you may consider involves having each repository SaveChanges after each of its public methods. The problem is that doing so prevents you from composing two or more repository methods in a single service call without losing atomicity (since SaveChanges is called more than once).
This greatly reduces the reusability of the repository's code. In addition, each repository shares the same DbContext, so calling SaveChanges in one repository can (rather unexpectedly) save changes made by another.
The second solution involves injecting the DbContext into the service class and calling SaveChanges from there instead of inside the repository. Although this solution allows you to compose repository methods while maintaining atomicity, it's still unsatisfactory from a clarity standpoint — the service object is calling SaveChanges on an object it didn't seem to make changes to.
Note that this solution makes the situation much, much worse in the nonweb context (where each service/repository gets its own instance of DbContext). In that context, the repositories aren't calling SaveChanges at all, and the service class is saving an empty set of changes.
For the second point above, suppose the controller calls service method A() that makes some database changes, and a runtime exception is thrown after DbContext has been modified but before SaveChanges is called — presumably, A() failed to update the database.
The controller catches and logs the exception, and calls a different service method B(). Once B() calls SaveChanges, A()'s partially complete changes are incorrectly committed to the database.
This bug is a consequence of the service class reusing the same instance of DbContext to perform different operations and is very difficult to detect — it only occurs in exceptional cases. In all other cases, A() is atomic.
The following points need to be addressed:
I've included an example architecture that accomplishes all of the above:
public class OrganizationSummary
public class ServiceClass
{
private readonly OrganizationDbContextFactory _contextFactory;
private readonly OrganizationRepositoryFactory _orgFactory;
private readonly LogRepositoryFactory _logFactory;
//factories for the DbContext and each needed repository.
//This enables us to explicitly control creation of a per-operation context that's used by all repositories
//without relying on 'Singleton' or 'Instance per Request'
public ServiceClass(OrganizationDbContextFactory contextFactory, OrganizationRepositoryFactory orgFactory, LogRepositoryFactory logFactory)
{
_contextFactory = contextFactory;
_orgFactory = orgFactory;
_logFactory = logFactory;
}
public void Save()
{
using(var context = _contextFactory.Create())
{
//each repository works on the same context, and only within this block
var logRepo = _logFactory.Create(context);
var orgRepo = _orgFactory.Create(context);
//neither of these calls SaveChanges, so they're safe to compose
orgRepo.SetEmployeeSalaried(1);
logRepo.LogMessage("INFO", "Employee 1 is now salaried");
//one and only one call to SaveChanges
//note that if an exception is thrown in this block,
//SaveChanges is not called and the DbContext is disposed, safely rejecting the in-progress changes
context.SaveChanges();
}
}
}
public class OrganizationRepository : IOrganizationRepository
{
private readonly OrganizationDbContext _context;
private readonly OtherDependencies _dependencies;
public OrganizationRepository(OrganizationDbContext context, OtherDependencies dependencies)
{
this._context = context;
this._dependencies = dependencies;
}
//note there is *no* call to SaveChanges, so this method is composable with others
public void SetEmployeeSalaried(int id)
{
var employee = _context.Employees.Find(id);
employee.Salaried = true;
}
}
public class LogRepository : ILogRepository
{
private readonly OrganizationDbContext _context;
private readonly OtherDependencies _dependencies;
public LogRepository(OrganizationDbContext context, OtherDependencies dependencies)
{
this._context = context;
this._dependencies = dependencies;
}
//note there is *no* call to SaveChanges, so this method is composable with others
public void LogMessage(string severity, string message)
{
Log log = new Log();
log.Severity = severity;
log.Message = message;
_context.Logs.Add(log);
}
}
//factories for mockability, partial injection into repositories
public class OrganizationDbContextFactory : IOrganizationDbContextFactory
{
public IOrganizationDbContext Create()
{
return new OrganizationDbContext();
}
}
public class OrganizationRepositoryFactory : IOrganizationRepositoryFactory
{
private readonly OtherDependencies _otherDependencies;
public OrganizationRepositoryFactory(OtherDependencies otherDependencies)
{
_otherDependencies = otherDependencies;
}
public IOrganizationRepository Create(OrganizationDbContext context)
{
return new OrganizationRepository(context, _otherDependencies);
}
}
public class LogRepositoryFactory : ILogRepositoryFactory
{
private readonly OtherDependencies _otherDependencies;
public LogRepositoryFactory(OtherDependencies otherDependencies)
{
_otherDependencies = otherDependencies;
}
public ILogRepository Create(OrganizationDbContext context)
{
return new LogRepository(context, _otherDependencies);
}
}
The service class creates the DbContext on a per-operation basis using a mockable DbContextFactory. It then passes the DbContext through the (also mockable) repository factories in order to create repositories that will work on that DbContext.
Since the service class explicitly controls the DbContext, it can take responsibility for calling SaveChanges once and only once, satisfying the atomicity requirement above. This also means the repositories don't have to call SaveChanges, allowing us to compose their methods as needed.
Each DbContext is local to its own method, resulting in no partial update bugs. All dependencies can be set to either 'Transient' or 'Singleton' in this code (doesn't matter), so it will work in both contexts, and there aren't any hard-coded dependencies.
Although this approach satisfies all of our objectives above, it isn't perfect. As it currently stands, there’s no way to combine multiple service methods into a single transaction since the creation of DbContext prompts the opening of a connection.
In addition, since we're explicitly disposing DbContext as soon as the service method has completed, the service method shouldn't return any entities if lazy loading is enabled. Trying to access a navigation property while the DbContext is disposed will throw a runtime exception (as discussed earlier).
Prior to EF 6, the only way to achieve atomicity across multiple calls to ‘SaveChanges’ was using TransactionScope:
public class OrganizationRepository
{
private readonly OrganizationDbContext _context;
public OrganizationRepository(OrganizationDbContext context)
{
this._context = context;
}
public void DoSomething()
{
//the following two calls to SaveChanges are wrapped in a transaction
using(TransactionScope scope = new TransactionScope())
{
_context.Employees.Find(1).Salaried = true;
_context.SaveChanges();
_context.Employees.Find(2).Salaried = false;
_context.SaveChanges();
scope.Complete();
}
}
}
This approach has a few problems. By design, TransactionScope can detect when a local transaction isn’t sufficient for atomicity and automatically escalates the transaction to a ‘distributed transaction.’ A distributed transaction is designed to span more than one machine and is very expensive. Unfortunately, it’s surprisingly easy to inadvertently write code that triggers this escalation.
In SQL Server 2005, very simple code (Update 4 in the link) can cause TransactionScope to escalate from a local transaction to a distributed transaction. This has been fixed in SQL Server 2008 and above.
Any time you have more than one DbContext (more than one database), and you accidentally update both within the same TransactionScope, you get an escalation — even if this wasn't explicitly your intention. I've done this personally when adding logging to an external database to existing service methods.
Since distributed transactions are uncommon, it's better to use DbContext.Database.BeginTransaction() (example below) which always creates a lightweight local transaction. Only switch to TransactionScope if you’re absolutely sure you need a distributed transaction.
public class OrganizationRepository
{
private readonly OrganizationDbContext _context;
public OrganizationRepository(OrganizationDbContext context)
{
this._context = context;
}
public void DoSomething()
{
//the following two calls to SaveChanges are wrapped in a transaction
using (var trans = _context.Database.BeginTransaction())
{
_context.Employees.Find(1).Salaried = true;
_context.SaveChanges();
_context.Employees.Find(2).Salaried = false;
_context.SaveChanges();
trans.Commit();
}
}
}
TransactionScopeAsyncFlowOption.Enabled isn't set, whereas BeginTransaction() doesn't have this problem.
When used improperly, Entity Framework can cause subtle defects that aren't detected until late in the product's lifecycle. Since many apps' central concern is data, it pays to carefully architect the code surrounding your data to be as reliable, reusable and performant as possible. With these guidelines in mind, you'll avoid many of the traps of Entity Framework while still leveraging its full potential.
Big thanks to Tom Straub, Jonathan Gardner and Adam Howard for lending their guidance and experience for this post.