A simple architecture to manage data seeds with Entity Framework Code First 5.0


Entity Framework Code First ships with many features that most project teams previously had to build by hand, such as keeping code and database schema synchronised during the development lifecycle using initializers (e.g. DbMigrations), and data seeding. To use the latter, the developer simply overrides the Seed method of the chosen initializer. The developer (or the architect) is still responsible for ensuring that the seed code is well written and easy to maintain. With dozens or even hundreds of entities and tables on most medium-sized projects, the developer needs an architecture that breaks down the complexity of seeding the whole database. This article focuses on one architecture that handles the complexity of the seed process without being hard to use. As is often the case, simpler is better.

One simple approach, which most teams arrive at naturally, is to divide the seeding process into many classes, each related to one type of entity or one functional set of entities. In this article we will focus on a very simple system that manages products, invoices and users. Each invoice is related to a single product (to keep it simple), and an invoice is made by a user. Our seeding code could consist of at least three classes: ProductSeed, InvoiceSeed and UserSeed. One more class, GlobalSeeder, is then responsible for executing each seed class in the right order. Not every order is correct: invoices cannot be seeded before products and users are in the database.

In our example, the suitable orders are:

  • ProductSeed, UserSeed, InvoiceSeed
  • UserSeed, ProductSeed, InvoiceSeed

If the developer wants to code GlobalSeeder by hand, the correct order has to be worked out manually. In such a simple system, the order is easy to find. But as the system grows, this order can become extremely difficult to determine and to maintain.

One way to handle this difficulty is to automate the process. In our architecture, we annotate each seed class to state explicitly that it depends on zero, one or many other seed classes. In our example, ProductSeed and UserSeed depend on nothing, but InvoiceSeed depends on both. A seed can run only once all the seeds it depends on have run. The GlobalSeeder is now only responsible for finding all the seeds and executing them in the correct order according to those attributes.

First, we need an interface for all the seed classes. It could be:

    interface ISeed
    {
        void SeedData(OurEFContext context);
    }
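The interface refers to OurEFContext, which this article does not list. As a minimal sketch, assuming the property names used by the seed classes further down (Name, ProductName, Product, Owner) and adding hypothetical Id keys that Code First picks up by convention, the model and context could look like this. It needs a reference to the EntityFramework package to compile.

```csharp
using System.Data.Entity;

namespace Model
{
    public class User
    {
        public int Id { get; set; }          // assumed key, not shown in the article
        public string Name { get; set; }
    }

    public class Product
    {
        public int Id { get; set; }          // assumed key, not shown in the article
        public string ProductName { get; set; }
    }

    public class Invoice
    {
        public int Id { get; set; }          // assumed key, not shown in the article
        public virtual Product Product { get; set; }  // each invoice relates to a single product
        public virtual User Owner { get; set; }       // the user who made the invoice
    }
}

public class OurEFContext : DbContext
{
    public DbSet<Model.User> Users { get; set; }
    public DbSet<Model.Product> Products { get; set; }
    public DbSet<Model.Invoice> Invoices { get; set; }
}
```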

We then create a new attribute that can be applied to classes to specify the dependency relationship.

    [AttributeUsage(AttributeTargets.Class, Inherited = false, AllowMultiple = true)]
    sealed class DependsOnAttribute : Attribute
    {
        public Type DependingType { get; private set; }

        public DependsOnAttribute(Type dependingType)
        {
            if (!(typeof(ISeed).IsAssignableFrom(dependingType)))
                throw new ArgumentException("dependingType should implement ISeed", "dependingType");

            this.DependingType = dependingType;
        }
    }

Then, we could write our seed classes as:

    class UserSeed : ISeed
    {
        public void SeedData(OurEFContext context)
        {
            context.Users.Add(new Model.User()
            {
                Name="John"
            });
            context.Users.Add(new Model.User()
            {
                Name = "Jack"
            });
            context.Users.Add(new Model.User()
            {
                Name = "Bill"
            });
        }
    }

    class ProductSeed : ISeed
    {
        public void SeedData(OurEFContext context)
        {
            context.Products.Add(new Model.Product()
            {
                ProductName = "Orange"
            });
            context.Products.Add(new Model.Product()
            {
                ProductName = "Banana"
            });
            context.Products.Add(new Model.Product()
            {
                ProductName = "Apple"
            });
        }
    }

    [DependsOn(typeof(UserSeed))]
    [DependsOn(typeof(ProductSeed))]
    class InvoiceSeed : ISeed
    {
        public void SeedData(OurEFContext context)
        {
            context.Invoices.Add(new Model.Invoice()
            {
                Product = context.Products.Local.FirstOrDefault(p => p.ProductName == "Banana"),
                Owner = context.Users.Local.FirstOrDefault(p => p.Name == "Jack")
            });
        }
    }

Now, using those attributes and the ISeed interface, we can easily write the GlobalSeeder class. Its algorithm is simple:

  1. Find all the seed classes (all the classes that implement ISeed)
  2. While there are still unprocessed seed classes:
  3. execute all the seeds whose list of unprocessed dependencies is empty
  4. mark all of them as processed

The code could look like this:

    using System;
    using System.Linq;
    using System.Reflection;

    class GlobalSeeder
    {
        public void SeedDatabase(OurEFContext context)
        {
            // Get all the concrete ISeed implementations in this assembly
            var seedTypes = typeof(GlobalSeeder).Assembly.GetTypes()
                .Where(t => typeof(ISeed).IsAssignableFrom(t) && t.IsClass && !t.IsAbstract);

            // A bit of LINQ to Objects to pair each seed type with the seeds it still waits for
            // (GetCustomAttributes<T>() is an extension method that requires .NET 4.5)
            var seeds =
                seedTypes.Select(st => new
                {
                    SeedType = st,
                    DependingSeeds = st.GetCustomAttributes<DependsOnAttribute>().Select(dst => dst.DependingType).ToList()
                }).ToList();

            // While there are still seeds to process
            while (seeds.Count > 0)
            {
                // Find all the seeds with no remaining dependency to process
                var orphanSeeds = seeds.Where(s => s.DependingSeeds.Count == 0).ToList();

                // No runnable seed left: a cycle, or a dependency that was not discovered.
                // Fail instead of looping forever.
                if (orphanSeeds.Count == 0)
                    throw new InvalidOperationException("Cyclic or missing seed dependency: "
                        + string.Join(", ", seeds.Select(s => s.SeedType.Name)));

                foreach (var orphanSeed in orphanSeeds)
                {
                    // Instantiate the current seed (requires a parameterless constructor)
                    ISeed seedInstance = (ISeed)Activator.CreateInstance(orphanSeed.SeedType);
                    // Execute the seed process
                    seedInstance.SeedData(context);

                    // Remove the processed seed from every seed that depends on it
                    foreach (var otherSeed in seeds)
                    {
                        otherSeed.DependingSeeds.Remove(orphanSeed.SeedType);
                    }
                    // Remove the processed seed from the "to be processed" list
                    seeds.Remove(orphanSeed);
                }
            }
            // Finally save all changes to the Entity Framework context
            context.SaveChanges();
        }
    }
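The ordering logic can also be exercised on its own, without a database. The following is only a sketch under stated assumptions: NeedsAttribute, the stub classes and SeedOrder.Resolve are hypothetical stand-ins introduced here so that the example runs with the base class library alone; they are not part of the article's code. It applies the same "run whatever has no pending dependency" loop and throws when no progress can be made, which happens with a cycle or with a dependency outside the given set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for DependsOnAttribute, free of any ISeed/EF reference.
[AttributeUsage(AttributeTargets.Class, Inherited = false, AllowMultiple = true)]
sealed class NeedsAttribute : Attribute
{
    public Type Dependency { get; private set; }
    public NeedsAttribute(Type dependency) { Dependency = dependency; }
}

// Stub types mirroring the article's three seeds.
class UserStub { }
class ProductStub { }

[Needs(typeof(UserStub))]
[Needs(typeof(ProductStub))]
class InvoiceStub { }

// Two stubs that depend on each other, to show the failure case.
[Needs(typeof(CycleB))] class CycleA { }
[Needs(typeof(CycleA))] class CycleB { }

static class SeedOrder
{
    // Returns the types in an order where each type comes after its dependencies.
    public static List<Type> Resolve(IEnumerable<Type> types)
    {
        // Map each type to the set of dependencies it is still waiting for.
        var pending = types.ToDictionary(
            t => t,
            t => new HashSet<Type>(
                t.GetCustomAttributes(typeof(NeedsAttribute), false)
                 .Cast<NeedsAttribute>()
                 .Select(a => a.Dependency)));

        var ordered = new List<Type>();
        while (pending.Count > 0)
        {
            var ready = pending.Where(p => p.Value.Count == 0).Select(p => p.Key).ToList();
            if (ready.Count == 0)
                throw new InvalidOperationException("Cyclic or unknown dependency among: "
                    + string.Join(", ", pending.Keys.Select(t => t.Name)));

            foreach (var type in ready)
            {
                ordered.Add(type);
                pending.Remove(type);
                // The type is done: nobody has to wait for it any more.
                foreach (var deps in pending.Values) deps.Remove(type);
            }
        }
        return ordered;
    }
}
```

Calling SeedOrder.Resolve(new[] { typeof(InvoiceStub), typeof(UserStub), typeof(ProductStub) }) places InvoiceStub last, whatever the relative order of the two other stubs, while passing CycleA and CycleB throws an InvalidOperationException.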

And our initializer:

    class EFTestInitalizer : DropCreateDatabaseAlways<OurEFContext>
    {
        protected override void Seed(OurEFContext context)
        {
            GlobalSeeder seeder = new GlobalSeeder();
            seeder.SeedDatabase(context);
            base.Seed(context);
        }
    }
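The initializer still has to be registered before the context is first used. A minimal sketch, assuming a console entry point (the Program class is hypothetical) and the OurEFContext and EFTestInitalizer classes from this article, could be:

```csharp
using System.Data.Entity;

class Program
{
    static void Main()
    {
        // Register the initializer before the context is first used.
        Database.SetInitializer(new EFTestInitalizer());

        using (var context = new OurEFContext())
        {
            // Force initialization now rather than on first use,
            // so the database is recreated and Seed runs at this point.
            context.Database.Initialize(force: true);
        }
    }
}
```

Since the initializer derives from DropCreateDatabaseAlways, this drops and recreates the database on each run, which suits tests and development rather than production data.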

Whenever new entities are added to the system, the developer just has to add the relevant “DependsOn” attributes at the top of the new seed class, and the code will be executed at the right time. Each seed class can be read on its own, as there are no more undocumented dependencies. And since we have a centralized way to manage seeds, we could easily add new custom attributes if new architectural requirements arose. We have built a seed architecture that is easy to use, easy to read and easy to maintain.

Don’t hesitate to comment on this solution if you have any improvements in mind, or even criticism of this process.
