in

 

Jimmy Bogard

Assistant to the assistant to the regional manager

June 2008 - Posts

  • Separation of Concerns by example: Part 3

    We made quite a bit of progress separating out the concerns in Part 2, but there are still some issues with our current design.  Other parts in this series include:

    To review, here's our CustomerFinder so far, after refactoring away from the static class and creating the specialized ICustomCache interface:

    public class CustomerFinder
    {
        private readonly ICustomCache _customCache;
    
        public CustomerFinder(ICustomCache customCache)
        {
            _customCache = customCache;
        }
    
        public List<Customer> FindAllCustomers(int startRowIndex, int maximumRows)
        {
            List<Customer> customers = null;
            string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;
    
            if (_customCache.Contains(key))
            {
                customers = _customCache.GetItem<List<Customer>>(key);
            }
            else
            {
                customers =
                    (
                        from
                            c in DataGateway.Context.Customers
                        orderby
                            c.CustomerID descending
                        select
                            c
                    ).Skip(startRowIndex).Take(maximumRows).ToList();
    
                if ((customers != null) && (customers.Count > 0))
                    _customCache.AddItemExpringTomorrow(key, customers);
            }
    
            return customers;
        }
    
    }
    

    Although we refactored away from the HttpContext.Current.Cache dependency, we still have one nagging issue: the DataGateway.Context dependency.  The DataGateway.Context class is just a wrapper around LINQ to SQL:

    public class DataGateway
    {
        public static NorthwindDataContext Context
        {
            get
            {
                return new NorthwindDataContext("");
            }
        }
    }
    

    If we created a unit test against CustomerFinder as-is, it would run, but only if I had access to the database.  Since the unit test calls out-of-process, it's no longer a unit test.  It's just a regular automated test, running in a unit test framework.

    Our goal is not only to increase testability, but readability and usability as well.  Encapsulation is great for information hiding, but not dependency hiding.  We'd like to be as explicit as possible, telling users of the CustomerFinder exactly what is needed to get it to work properly.  We want to develop not only for users of the application, but future maintainers of the system as well.

    We have quite a few options to extract the DataGateway dependency, each of which has its benefits and issues.

    Designing the dependency

    This dependency looks like a plain static dependency at first, but on closer inspection, it looks like our other HttpContext.Current.Cache dependency, where we're poking deep inside a static property to get the "real" dependency, the Customers property.

    Given the same arguments as the previous part in this series, it looks like we'll choose "create a specialized interface".  But what should the interface look like?  What should its responsibilities be?

    One really interesting thing to look at is paging.  Should our extracted dependency be responsible for paging, or should it just return everything?  What should the return type be, still List<T>?  We could make our Application layer responsible for paging, and the Data Access layer be paging-ignorant.  After all, paging is dependent on who is looking at the data, some parts of the application might care less about it.  Paging is something that matters strictly to the user interface.

    However, we don't want to sacrifice performance at the sake of our lofty goals.  What if there are something like 100K products, or a million?  That can be a typical number on some established systems.  We don't want to pull back a million records if only ten are going to be displayed.

    To balance these concerns, we'll need to take a closer look on the existing behavior to determine the right way to go.

    A peek behind the curtains

    First, I'd like to examine what happens in the existing system.  To do so, I created a simple test:

    [Test]
    public void Check_out_the_profiling()
    {
        var finder = new CustomerFinder(new FakeCache());
        List<Customer> customers = finder.FindAllCustomers(1, 10);
        customers.Count.ShouldBeGreaterThan(0);
    }
    
    private class FakeCache : ICustomCache
    {
        public void AddItemExpringTomorrow<T>(string key, T item)
        {
            
        }
    
        public bool Contains(string key)
        {
            return false;
        }
    
        public T GetItem<T>(string key)
        {
            return default(T);
        }
    }
    

    I just used a fake cache, that's always empty to force the CustomerFinder to go against the DataContext.  I purposefully get back the second page (it's zero-based indexed), to see if the Skip and Take are smart enough in LINQ to SQL.  Here's what came back from the profiler:

    exec sp_executesql N'SELECT [t1].[CustomerID], [t1].[CompanyName], [t1].[ContactName], [t1].[ContactTitle], [t1].[Address], [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[Phone], [t1].[Fax]
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY [t0].[CustomerID] DESC) AS [ROW_NUMBER], [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
        FROM [dbo].[Customers] AS [t0]
        ) AS [t1]
    WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
    ORDER BY [t1].[ROW_NUMBER]',N'@p0 int,@p1 int',@p0=1,@p1=10

    Now this is VERY interesting.  LINQ to SQL is smart enough to turn the Skip and Take into actual server-side paging.  Looking at the query, I wouldn't really want to write this query, but it works very well.  Which reminds me:

    Stop writing custom data access code.  This is a problem that has been solved.

    I've had to write custom dynamic paging logic, and it was not fun.  Paging gets very, very complicated when doing sub queries, arbitrary sorting, filtering and grouping.

    Anyway, it looks like paging works well on the database side, so we'd like to preserve that behavior.  But we'd also like the paging logic to be where we want, in the Application Layer.  One feature of LINQ will help us out on this side - deferred execution.  Remember, the SQL isn't actually executed until you iterate over the results, which happens with the .ToList() method call.  It will also happen in a foreach loop or the ToArray() method call.

    So it looks like we'll be able to have the best of both worlds.

    Extracting our dependency

    Although it was a long discussion, this is something that usually happens in our teams pretty regularly.  After extracting lots of dependencies, or doing TDD for some time, you start to get a good feel for how the behavior should be laid out.

    First, we'll need to Extract Method on the LINQ query part, except for the paging and ToList() part:

    public List<Customer> FindAllCustomers(int startRowIndex, int maximumRows)
    {
        List<Customer> customers = null;
        string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;
    
        if (_customCache.Contains(key))
        {
            customers = _customCache.GetItem<List<Customer>>(key);
        }
        else
        {
            customers =
                FindAllCustomers()
                    .Skip(startRowIndex)
                    .Take(maximumRows)
                    .ToList();
    
            if ((customers != null) && (customers.Count > 0))
                _customCache.AddItemExpringTomorrow(key, customers);
        }
    
        return customers;
    }
    
    private IEnumerable<Customer> FindAllCustomers()
    {
        return from
                   c in DataGateway.Context.Customers
               orderby
                   c.CustomerID descending
               select
                   c;
    }
    

    Our extracted method only returns IEnumerable<T>, and not List<T>.  Returning List<T> before paging means we'd lose the server-side paging goodness that LINQ to SQL provides.

    Another benefit to returning IEnumerable<T> is that it better communicates the intent of our code: we're giving you a read-only collection of customers, don't even try to change it.

    Next, we'll perform Extract Class on the FindAllCustomers method we just extracted to a new class, a CustomerRepository class.  Why the Repository name?  It comes from Martin Fowler's excellent Patterns of Enterprise Application Architecture book.  Here's the new CustomerRepository class:

    public class CustomerRepository
    {
        public IEnumerable<Customer> FindAllCustomers()
        {
            return from
                       c in DataGateway.Context.Customers
                   orderby
                       c.CustomerID descending
                   select
                       c;
        }
    }
    

    For those that read part 2, our next steps should seem familiar.  We'll now use Extract Interface, so that our CustomerFinder class only deals with the interface and not the concrete class.  This will allow us to swap out implementations for testing or debugging purposes.  Our new interface will be ICustomerRepository:

    public interface ICustomerRepository
    {
        IEnumerable<Customer> FindAllCustomers();
    }
    

    Of course, our CustomerFinder class no longer compiles, so we'll change the class to get the ICustomerRepository from the constructor through Constructor Injection.  The FindAllCustomers method that uses paging will use this instance instead:

    public class CustomerFinder
    {
        private readonly ICustomerRepository _customerRepository;
        private readonly ICustomCache _customCache;
    
        public CustomerFinder(ICustomerRepository customerRepository, ICustomCache customCache)
        {
            _customerRepository = customerRepository;
            _customCache = customCache;
        }
    
        public List<Customer> FindAllCustomers(int startRowIndex, int maximumRows)
        {
            List<Customer> customers = null;
            string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;
    
            if (_customCache.Contains(key))
            {
                customers = _customCache.GetItem<List<Customer>>(key);
            }
            else
            {
                customers =_customerRepository
                            .FindAllCustomers()
                            .Skip(startRowIndex)
                            .Take(maximumRows)
                            .ToList();
    
                if ((customers != null) && (customers.Count > 0))
                    _customCache.AddItemExpringTomorrow(key, customers);
            }
    
            return customers;
        }
    
    }
    

    One annoying thing left is that the return value of the FindAllCustomers betrays its intent: a read-only view of a single page of customers.  It makes zero sense that a user of this class will add new customers to a single page of customers.  We want to be as explicit as possible in our interface, so I'm going to change the return type to an array instead of a List<T>:

    public class CustomerFinder
    {
        private readonly ICustomerRepository _customerRepository;
        private readonly ICustomCache _customCache;
    
        public CustomerFinder(ICustomerRepository customerRepository, ICustomCache customCache)
        {
            _customerRepository = customerRepository;
            _customCache = customCache;
        }
    
        public Customer[] FindAllCustomers(int startRowIndex, int maximumRows)
        {
            Customer[] customers = null;
            string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;
    
            if (_customCache.Contains(key))
            {
                customers = _customCache.GetItem<Customer[]>(key);
            }
            else
            {
                customers =_customerRepository
                            .FindAllCustomers()
                            .Skip(startRowIndex)
                            .Take(maximumRows)
                            .ToArray();
    
                if ((customers != null) && (customers.Length > 0))
                    _customCache.AddItemExpringTomorrow(key, customers);
            }
    
            return customers;
        }
    
    }
    

    This a much clearer interface to those using the CustomerFinder, as they know exactly what the class depends on, and the Customer[] array indicates that we're looking for a read-only, immutable collection of Customers.

    Of course, our original CustomerManager class no longer compiles, so we'll need to fix the method by changing the return type and fixing the constructor call:

    public class CustomerManager
    {
        [DataObjectMethod(DataObjectMethodType.Select, false)]
        public static Customer[] GetCustomers(int startRowIndex, int maximumRows)
        {
            var finder = new CustomerFinder(new CustomerRepository(), new CustomCache());
            return finder.FindAllCustomers(startRowIndex, maximumRows);
        }
    }
    

    Our ASP.NET page, with its data source control, does not need to change.  It works just as well with arrays as it does with List<T>.  Executing our original profile test with the new CustomerRepository and checking the profiler confirms that...oh wait, server-side paging is broken.  Here's what the profiler says:

    SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
    FROM [dbo].[Customers] AS [t0]
    ORDER BY [t0].[CustomerID] DESC

    So what happened?  I'm still doing deferred query execution, but now it's forgotten how to do server side paging?  The culprit lies in our return type in ICustomerRepository.  The LINQ query actually returns IOrderedQueryable<T>, which ReSharper originally did for me before I changed it.  Bad me.

    So, I'll change ICustomerRepository.FindAllCustomers to return the IOrderedQueryable interface instead of IEnumerable:

    public interface ICustomerRepository
    {
        IOrderedQueryable<Customer> FindAllCustomers();
    }
    
    public class CustomerRepository : ICustomerRepository
    {
        public IOrderedQueryable<Customer> FindAllCustomers()
        {
            return from
                       c in DataGateway.Context.Customers
                   orderby
                       c.CustomerID descending
                   select
                       c;
        }
    }
    

    The LINQ query doesn't change, but my profiling confirms that the changes worked.  So was this a good change?  Yes.  In fact, I'd argue that the IOrderedQueryable better expresses the intent of what we're returning.  In effect, we're telling users of the ICustomerRepository that we're returning a smart collection, that you can do paging and all sorts of things on it in an efficient manner.

    And that's what we're really trying to do here, communicate the intent of our classes and interfaces to the users of these, without forcing them to look at the code in the class.  If another developer understands intent based solely on the names and signatures of our methods and types, we've done our job.

    Quick review

    To get rid of our dependency on the static DataContext class, we performed similar refactorings as we did in part 2:

    • Extract Method
    • Extract Class
    • Extract Interface

    We picked another specialized interface, ICustomerRepository, to house our data access code, borrowing the name from the pattern in Fowler's book.  We also saw that LINQ to SQL requires IOrderedQueryable to effectively take advantage of server-side paging.  In doing so, we created a very explicit interface, describing exactly what we intend to support (and nothing more).

    Our CustomerFinder no longer contains any opaque dependencies.  Which is good, because in the next part, we'll see how dependency inversion saves the day when a bug report comes in and we need to add some unit tests to ensure our bug doesn't get un-fixed.

  • When a space isn't a space

    I ran into a scenario recently where this test failed:

    [Test]
    public void You_have_to_be_kidding_me()
    {
        string a = "You have to be kidding me       ";
        string b = "You have to be kidding me       ";
    
        a.ShouldEqual(b);
    }
    

    It took me quite a while to determine the problem, as it originally came from a column in the database with natural keys.

    When a test like this fails, it's time to call it a day.  Any guesses as to the problem?

    Posted Jun 26 2008, 07:15 AM by bogardj with 8 comment(s)
    Filed under:
  • Entity Framework vote of no confidence

    As was announced initially (as far as I can tell) on Bil Simser's blog, some concerned citizens of the .NET world put out an online vote-of-no-confidence concerning the Entity Framework:

    Somewhere on the list you'll find my name.  I agree with the letter, and I encourage you to read the letter and sign if you also agree.  I also have two additional reasons I signed:

    • I got totally hosed by the initial MSDN documentation on data access, when I was getting started on programming
    • The overall direction and strategy towards data access hasn't seemed to change since then

    Using DataSets and typed DataSets, complete with all my logic inside stored procedures, seemed bad, but I didn't know any better.  Why should I?  The articles from the folks who created the framework described in great detail the correct way to use their framework.

    But people who write articles aren't always the same as the people who develop real-world applications.  Now, I don't pretend to know how the design process works, they can't be completely in a vacuum, but I'm not sure how much opportunity people developing frameworks get to build applications built on top of those frameworks.

    When you don't live in the house you built, you have to rely on the current inhabitants to tell you the house works.  Unless said inhabitants have only known houses where the toilet sits inside the kitchen, they'll probably ask for a closer fridge.

    There have already been quite a few reactions from bloggers (thank you Google):

    One article:

    A thread, the original announcement:

    Some MS responses:

    The goal of this letter is to inform customers, as well as create more dialog.  I believe this letter will be successful on both points.

  • Separation of Concerns by example: Part 2

    Separation of concerns is one of the fundamental tenets of good object-oriented design.  Anyone can throw a bunch of code in a method and call it a day, but that's not the most maintainable approach.  So far, we've looked at:

    In this part, I'd like to look at removing the dependency on HttpContext.  Here's what our classes look like thus far:

    public class CustomerManager
    {
        [DataObjectMethod(DataObjectMethodType.Select, false)]
        public static List<Customer> GetCustomers(int startRowIndex, int maximumRows)
        {
            var finder = new CustomerFinder();
            return new CustomerFinder().FindAllCustomers(startRowIndex, maximumRows);
        }
    }
    
    public class CustomerFinder
    {
        public List<Customer> FindAllCustomers(int startRowIndex, int maximumRows)
        {
            List<Customer> customers = null;
            string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;
    
            if (HttpContext.Current.Cache[key] != null)
            {
                customers = (List<Customer>) HttpContext.Current.Cache[key];
            }
            else
            {
                customers =
                    (
                        from
                            c in DataGateway.Context.Customers
                        orderby
                            c.CustomerID descending
                        select
                            c
                    ).Skip(startRowIndex).Take(maximumRows).ToList();
    
                if ((customers != null) && (customers.Count > 0))
                    HttpContext.Current.Cache.Insert(key, customers, null, DateTime.Now.AddDays(1), TimeSpan.Zero);
            }
    
            return customers;
        }
    }
    

    The CustomerManager class is our presentation-layer service, and is only a very thin wrapper on top of the domain and application services.  We want to push the important business logic down into the heart of the domain as much as possible, where it belongs.

    The goal is to write a single passing test for the CustomerFinder class.  Right now, that isn't possible because of the dependencies on HttpContext and DataGateway.

    For now, the immediate concern has to be HttpContext.  Having a dependency directly on Linq to SQL is bad, but at least the test can run without it.  It's not so simple to remove the HttpContext dependency, as we have quite a few choices on how to do so.

    Designing the dependency

    We know we're going to use constructor injection for this dependency.  What that means is that we'll use an interface to act as a facade, and the CustomerFinder will use that interface to do its work.

    Under test, we'll pass a fake instance to test the indirect inputs and outputs to the test, while at runtime, the "real" instance will be used.

    So the basic idea will be to hide the caching part behind an interface.  But what should we hide?  We have three options:

    • Create an IHttpContext interface to hide HttpContext.Current
    • Create an ICache interface to hide the HttpContext.Current.Cache property
    • Create a specialized interface to hide the entire cache calls

    Abstracting HttpContext.Current

    This is a huge rabbit hole we don't want to go down.  HttpContext is a very large class with a ton of methods and properties.  We could create an IHttpContext interface that only has one property for Cache, but that has its own issues, too.

    The Cache class is sealed, meaning we can't subclass it to create a fake instance.  So we'd have to create an ICache implementation anyway to get around this limitation.  And at this point, what is IHttpContext giving us other than a window to the Cache?  Nothing.

    Some MVC frameworks opt to hide the entire HttpContext behind an interface or an abstract class.  These still violate the Interface Segregation Principle, as implementers of the interface have to provide implementations of Request, Response, Session, Cookies, etc etc.

    This isn't a good choice, but what about a targeted Cache implementation?

    Abstracting Cache

    This looks like a better choice, as the Cache class is much smaller and lighter and HttpContext.  However, it still suffers from similar problems as the IHttpContext, where implementers have to know a great deal about hows and whys of Cache to get it to work properly.

    One other issue an ICache implementation won't solve is the non-intention-revealing interface the Cache.Insert method contains. It's difficult to read the code inside CustomerFinder to see what the caching strategy is.

    With that strategy out, we're down to our final option.

    Specialized interface

    Specialized interface entails creating an interface that only hides what we use, and providing use-specific names for the methods.  Instead of Cache.Insert, we might have Cache.InsertItemExpiringTomorrow.  It's a much more explicit interface, describing the intent of what our caching logic is doing.

    Creating the specialized interface

    As always, we'll first want to extract our methods out of the CustomerFinder class, providing good names for each of the methods.  We'll need to extract both the section that peeks inside Cache, retrieves an item from Cache, and inserts an item.  Along the way, we want to make sure we use the same names as what our interface will provide, which will save a rename step later.

    After the Extract Method refactorings, our class now looks like this:

    public class CustomerFinder
    {
        public List<Customer> FindAllCustomers(int startRowIndex, int maximumRows)
        {
            List<Customer> customers = null;
            string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;
    
            if (Contains(key))
            {
                customers = GetItem(key);
            }
            else
            {
                customers =
                    (
                        from
                            c in DataGateway.Context.Customers
                        orderby
                            c.CustomerID descending
                        select
                            c
                    ).Skip(startRowIndex).Take(maximumRows).ToList();
    
                if ((customers != null) && (customers.Count > 0))
                    AddItemExpringTomorrow(key, customers);
            }
    
            return customers;
        }
    
        private void AddItemExpringTomorrow(string key, List<Customer> customers)
        {
            HttpContext.Current.Cache.Insert(key, customers, null, DateTime.Now.AddDays(1), TimeSpan.Zero);
        }
    
        private List<Customer> GetItem(string key)
        {
            return (List<Customer>) HttpContext.Current.Cache[key];
        }
    
        private bool Contains(string key)
        {
            return HttpContext.Current.Cache[key] != null;
        }
    }
    

    But this isn't quite right, is it?  We have some problems already:

    • The Insert and Get methods are coupled to the List<Customer> type.  We're not making an ICustomersCache, so generics will help here.

    After fixing the changes above, our extracted methods now look like this:

    private void AddItemExpringTomorrow<T>(string key, T item)
    {
        HttpContext.Current.Cache.Insert(key, item, null, DateTime.Now.AddDays(1), TimeSpan.Zero);
    }
    
    private T GetItem<T>(string key)
    {
        return (T) HttpContext.Current.Cache[key];
    }
    
    private bool Contains(string key)
    {
        return HttpContext.Current.Cache[key] != null;
    }
    

    That's much better.  Now that we have a group of methods with our Cache logic we need, we can perform Extract Class to create our specialized Cache implementation.  All we'll need to do is create a CustomCache class and move these methods to that class:

    public class CustomCache
    {
        public void AddItemExpringTomorrow<T>(string key, T item)
        {
            HttpContext.Current.Cache.Insert(key, item, null, DateTime.Now.AddDays(1), TimeSpan.Zero);
        }
    
        public T GetItem<T>(string key)
        {
            return (T)HttpContext.Current.Cache[key];
        }
    
        public bool Contains(string key)
        {
            return HttpContext.Current.Cache[key] != null;
        }
    }
    

    Our CustomerFinder class isn't compiling, but we have one more step for the CustomCache class before it's ready for prime time.  We need to Extract Interface so that our CustomerFinder implementation has something less concrete to work with.  ReSharper lets us extract the interface easily, which leaves us with our final CustomCache implementation:

    public interface ICustomCache
    {
        void AddItemExpringTomorrow<T>(string key, T item);
        T GetItem<T>(string key);
        bool Contains(string key);
    }
    
    public class CustomCache : ICustomCache
    {
        public void AddItemExpringTomorrow<T>(string key, T item)
        {
            HttpContext.Current.Cache.Insert(key, item, null, DateTime.Now.AddDays(1), TimeSpan.Zero);
        }
    
        public T GetItem<T>(string key)
        {
            return (T)HttpContext.Current.Cache[key];
        }
    
        public bool Contains(string key)
        {
            return HttpContext.Current.Cache[key] != null;
        }
    }
    

    Now the ICustomCache is something that the CustomerFinder can deal with.  Since we're using constructor injection, we'll want to create a constructor that takes an ICustomCache implementation.  Our CustomerFinder will now look like:

    public class CustomerFinder
    {
        private readonly ICustomCache _customCache;
    
        public CustomerFinder(ICustomCache customCache)
        {
            _customCache = customCache;
        }
    
        public List<Customer> FindAllCustomers(int startRowIndex, int maximumRows)
        {
            List<Customer> customers = null;
            string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;
    
            if (_customCache.Contains(key))
            {
                customers = _customCache.GetItem<List<Customer>>(key);
            }
            else
            {
                customers =
                    (
                        from
                            c in DataGateway.Context.Customers
                        orderby
                            c.CustomerID descending
                        select
                            c
                    ).Skip(startRowIndex).Take(maximumRows).ToList();
    
                if ((customers != null) && (customers.Count > 0))
                    _customCache.AddItemExpringTomorrow(key, customers);
            }
    
            return customers;
        }
    
    }
    

    Our CustomerFinder now has no opaque dependency on the Cache or HttpContext classes.  We only deal with our wrapper, which has more explicit names about what we're trying to do.

    But now our CustomerManager class isn't compiling, as we removed the no-argument constructor for CustomerFinder.  In this case, I'll just allow the CustomerManager to instantiate the correct implementations of ICustomCache:

    public class CustomerManager
    {
        [DataObjectMethod(DataObjectMethodType.Select, false)]
        public static List<Customer> GetCustomers(int startRowIndex, int maximumRows)
        {
            var finder = new CustomerFinder(new CustomCache());
            return finder.FindAllCustomers(startRowIndex, maximumRows);
        }
    }
    

    We could have gone several ways on that one, from a creation method, to a factory, all the way to an IoC container like Spring or StructureMap.  We'll wait on any changes like that.

    Quick review

    Our CustomerFinder class had an opaque dependency on HttpContext.Current and Cache.  We looked at a few options at making that dependency explicit through constructor injection, settling on a specialized implementation that contained intention-revealing names for only the operations we support.  To do so, we performed a number of refactorings:

    Finally, we fixed the CustomerManager class to use our new constructor with the appropriate dependency.  The maintainability of the CustomerFinder class has improved significantly, as well as usability, as both users and maintainers of this class can see immediately the explicit dependencies this class requires.

    Next time, we'll look at getting rid of that pesky Linq to SQL dependency.

  • Separation of Concerns by example: Part 1

    In the prelude to this series, I looked at a snippet of code that took the kitchen sink approach to concerns.  Here's what we started out with:

    public class CustomerManager
    {
        [DataObjectMethod(DataObjectMethodType.Select, false)]
        public static List<Customer> GetCustomers(int startRowIndex, int maximumRows)
        {
            List<Customer> customers = null;
            string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;
    
            if (HttpContext.Current.Cache[key] != null)
            {
                customers = (List<Customer>) HttpContext.Current.Cache[key];
            }
            else
            {
                customers =
                    (
                        from
                            c in DataGateway.Context.Customers
                        orderby
                            c.CustomerID descending
                        select
                            c
                    ).Skip(startRowIndex).Take(maximumRows).ToList();
    
                if ((customers != null) && (customers.Count > 0))
                    HttpContext.Current.Cache.Insert(key, customers, null, DateTime.Now.AddDays(1), TimeSpan.Zero);
            }
    
            return customers;
        }
    }
    

    Before we can even think about separating the concerns out, we already have the design working against us.  I prefer a dependency inversion approach to separate concerns, as opposed to having the class or method instantiate them all at once.

    When we have a static method, how can we give the dependencies to the class?  I can think of some clever ways, but clever is rarely simple.  I'd like to err on the side of simple.

    Additionally, we don't want our domain services mixed in with presentation concerns, such as the DataObjectMethod attribute.  That is in place for the ObjectDataSource control, a presentation object.

    So, first order of business, let's make this a plain old class, instead of a static class.

    Refactoring away from static methods

    Now, presentation-specific classes are just fine.  If our presentation layer needs static methods and some crazy attributes, good for them.

    But I don't want those concerns bleeding down into my BLL domain layer.

    To fix this, first we'll need to extract method and get all of the contents out of that method (Ctrl+Alt+M ftw):

    public class CustomerManager
    {
        [DataObjectMethod(DataObjectMethodType.Select, false)]
        public static List<Customer> GetCustomers(int startRowIndex, int maximumRows)
        {
            return FindAllCustomers(startRowIndex, maximumRows);
        }
    
        private static List<Customer> FindAllCustomers(int startRowIndex, int maximumRows)
        {
            List<Customer> customers = null;
            string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;
    
            if (HttpContext.Current.Cache[key] != null)
            {
                customers = (List<Customer>) HttpContext.Current.Cache[key];
            }
            else
            {
                customers =
                    (
                        from
                            c in DataGateway.Context.Customers
                        orderby
                            c.CustomerID descending
                        select
                            c
                    ).Skip(startRowIndex).Take(maximumRows).ToList();
    
                if ((customers != null) && (customers.Count > 0))
                    HttpContext.Current.Cache.Insert(key, customers, null, DateTime.Now.AddDays(1), TimeSpan.Zero);
            }
    
            return customers;
        }
    }
    

    Now that we have a private method, we can use Move Method to move this method to a whole new class.  If this method used any other private (or public) methods, we'd have to deal with them one at a time.  For now, we'll have a new CustomerFinder application-level service class:

    public class CustomerFinder
    {
        public List<Customer> FindAllCustomers(int startRowIndex, int maximumRows)
        {
            List<Customer> customers = null;
            string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;
    
            if (HttpContext.Current.Cache[key] != null)
            {
                customers = (List<Customer>) HttpContext.Current.Cache[key];
            }
            else
            {
                customers =
                    (
                        from
                            c in DataGateway.Context.Customers
                        orderby
                            c.CustomerID descending
                        select
                            c
                    ).Skip(startRowIndex).Take(maximumRows).ToList();
    
                if ((customers != null) && (customers.Count > 0))
                    HttpContext.Current.Cache.Insert(key, customers, null, DateTime.Now.AddDays(1), TimeSpan.Zero);
            }
    
            return customers;
        }
    }
    

    Now, we don't have any tests in place (this is a full-blown legacy app), so we're just focusing on dependency-breaking techniques.  These techniques preserve existing behavior, which is what we need when we don't have a safety net of unit tests in place.

    The existing CustomerManager now needs to change, as it needs to use our new class.  No matter, we'll just instantiate a new CustomerFinder and use that:

    public class CustomerManager
    {
        [DataObjectMethod(DataObjectMethodType.Select, false)]
        public static List<Customer> GetCustomers(int startRowIndex, int maximumRows)
        {
            return new CustomerFinder().FindAllCustomers(startRowIndex, maximumRows);
        }
    }
    

    Now our presentation layer is just a very thin layer on top of our application and domain layer.  Which is good, as the presentation layer is the hardest layer to test (and the absolute slooooooowest).

    Quick review

    We want to change the CustomerManager, but it's a presentation layer class that we still need to service our user interface.  To get around this, we applied a series of refactorings to move the behavior to an application layer service, which we can now work against for future enhancements.  These refactorings included:

    We did these in a way such the existing behavior of CustomerManager remained unchanged, but its responsibilities were moved to the CustomerFinder service.

    Next time, we'll look at the CustomerFinder to see how we can remove some of the static dependencies (I think I saw a screencast somewhere on that one...)

  • Adios, Netflix profiles

    It was a good run.  But it looks like your creators are giving you the axe, judging by the email I received today:

    We wanted to let you know we will be eliminating Profiles, the feature that allowed you to set up separate DVD Queues under one account, effective September 1, 2008.

    Each additional Profile Queue will be unavailable after September 1, 2008. Before then, we recommend you consolidate any of your Profile Queues to your main account Queue or print them out.

    While it may be disappointing to see Profiles go away, this change will help us continue to improve the Netflix website for all our customers.

    My wife and I enjoyed you while it lasted, as it let us manage separate queues in one account.  My wife had here movies, and I had my (better) movies.  Although I'll get fewer Paltrow and Hugh Grant movies, we'll see a definite increase in Stallone and Bruce Willis movies.  So, not a total loss.

    I do appreciate the complete lack of any semblance of an explanation to your untimely demise.  But, these unnamed improvements surely must be worth the sacrifice, right?

    So adios Netflix profiles, we hardly knew ye.

    Posted Jun 18 2008, 10:01 PM by bogardj with 5 comment(s)
    Filed under:
  • Separation of Concerns - how not to do it

    In a recent article on layered LINQ applications in the latest ASP.NET PRO magazine (no link, you have to pay), the author proposes separating your application into three distinct layers:

    • User Interface (UI) layer
    • Business Logic Layer (BLL)
    • Data Access Layer (DAL)

    I certainly would have agreed, at least back in 2004 or so.  Many of the sample applications coming out of MSDN have a similar architecture, with namespaces like Northwind.DAL to reinforce that architecture.

    While separating out concerns into these layers is a good first step, it's only the first step.  Layered architecture and separation of concerns goes well beyond splitting your code into three classes.  Our code itself gives hints for when a class is concerned with too much going on.

    Too many concerns

    A snippet of the BLL class looked something like this:

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Linq;
    using System.Web;
    
    public class CustomerManager
    {
        [DataObjectMethod(DataObjectMethodType.Select, false)]
        public static List<Customer> GetCustomers(int startRowIndex, int maximumRows)
        {
            List<Customer> customers = null;
            string key = "Customers_Customers_" + startRowIndex + maximumRows;
    
            if (HttpContext.Current.Cache[key] != null)
            {
                customers = (List<Customer>) HttpContext.Current.Cache[key];
            }
            else
            {
                customers =
                    (
                        from
                            c in DataGateway.Context.Customers
                        orderby
                            c.CustomerID descending
                        select
                            c
                    ).Skip(startRowIndex).Take(maximumRows).ToList();
    
                if ((customers != null) && (customers.Count > 0))
                    HttpContext.Current.Cache.Insert(key, customers, null, DateTime.Now.AddDays(1), TimeSpan.Zero);
            }
    
            return customers;
        }
        
    }
    

    This CustomerManager class would also have methods for all of the CRUD operations, delegating to LINQ to SQL for the heavy lifting.  Already we have some inconsistencies and pain points with this class:

    • It manages CRUD operations, yet is situated in the BLL.  Is it data access or not?
    • The name is nebulous.  It Manages Customers, but how?  What aspects?  Persistence?  Caching?  Business Rules?  It might as well implement IKitchenSink.
    • Persistence and caching concerns are mixed together
    • Customer management and UI concerns are mixed together (i.e., the DataObjectMethod attribute)
    • Not an intention-revealing interface - List<Customer> allows me to add, remove and insert items to the list.  This should be a read-only affair.

    Outside of these issues, there are other problems we can see off-hand:

    • Potential for InvalidCastException - anyone can stick something in Cache with that key.  Use the "as" operator instead
    • The "customers" variable is never null.  No need to check it.
    • Several static dependencies - the DataGateway.Context and HttpContext.Current static properties
    • The whole CustomerManager class is impossible to unit test

    Or test in any way possible.  For me to add an automated test, the whole HttpContext needs to be up and running.  I'll get an immediate NullReferenceException as soon as any piece of code hits HttpContext.Current.

    Layering applications is a great step in the right direction to removing concerns from the ASP.NET codebehind.  But it's only the first step.  Next time, we'll see how we can split out the many concerns from this CustomerManager.

  • Some improved LINQ operators

    I ran across a couple of scenarios the other day that were made pretty difficult given the current LINQ query operators.  First, I needed to see if an item existed in a collection.  That's easy with the Contains method, when you want to find item that matches all the attributes you're looking for.

    Suppose I want only one attribute to match?  For example, I have a Person class:

    public class Person
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    What if I want to see if a collection of Persons contains someone with the last name "Smith"?  Contains only gives me two options:

    public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value);
    public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value, IEqualityComparer<TSource> comparer);
    

    That doesn't help, I have to implement some interface just to match against the LastName.  Typically, this is solved with one of two options:

    // Inefficient Contains replacement
    values
        .Where(person => person.LastName == "Smith")
        .Count()
        .ShouldBeGreaterThan(0);
    
    // Efficient, but ugly and hard to use
    values
        .Where(person => person.LastName == "Smith")
        .FirstOrDefault()
        .ShouldNotBeNull();
    

    The first example is inefficient because Count() iterates through all of the values found, where I only really care if one is found.  The second example works, but loses the intent of what I'm trying to find out.

    I also had the same types of problems with Distinct, where I'd like to find distinct elements, but only looking at a certain value.  I had to implement the same IEqualityComparer (very annoying).

    Better LINQ extensions

    Instead of implementing some crazy interface, I'd like to just give the Contains and Distinct query operators an expression of what to look for.  I'd like this test to pass:

     [Test]
     public void Better_enumerable_extensions()
     {
         var values = new[]
                          {
                              new Person {FirstName = "Bob", LastName = "Smith"},
                              new Person {FirstName = "Don", LastName = "Allen"},
                              new Person {FirstName = "Bob", LastName = "Sacamano"},
                              new Person {FirstName = "Chris", LastName = "Smith"},
                              new Person {FirstName = "George", LastName = "Allen"}
                          };
    
         values
             .Distinct(person => person.LastName)
             .Count()
             .ShouldEqual(3);
    
         values
             .Distinct(person => person.FirstName)
             .Count()
             .ShouldEqual(4);
    
         values
             .Contains("Smith", person => person.LastName)
             .ShouldBeTrue();
    
         values
             .Contains("Nixon", person => person.LastName)
             .ShouldBeFalse();
    }
    

    In the Distinct example, I pass in a lambda expression of the distinct attribute I'm looking for.  In the Contains example, I pass in the lambda expression, as well as the value I'm looking for.

    To do this, I'll need to create my extensions class with the new extension methods:

    public static class BetterEnumerableExtensions
    {
        public static IEnumerable<TSource> Distinct<TSource, TResult>(
            this IEnumerable<TSource> source, Func<TSource, TResult> comparer)
        {
            return source.Distinct(new DynamicComparer<TSource, TResult>(comparer));
        }
    
        public static bool Contains<TSource, TResult>(
            this IEnumerable<TSource> source, TResult value, Func<TSource, TResult> selector)
        {
            foreach (TSource sourceItem in source)
            {
                TResult sourceValue = selector(sourceItem);
                if (sourceValue.Equals(value))
                    return true;
            }
            return false;
        }
    }

    Yeah yeah, all those angle-brackets really start to get ugly.  The new Contains method takes in the selector method now, in the form of a Func delegate.  In the body, I just loop through the source items, evaluating the selector for each item.  If the source value matches the value I'm searching for, I return "true" immediately and stop looping.  Otherwise, I return false.

    The new Distinct method uses the existing Distinct, but now it's using a new DynamicComparer class:

    public class DynamicComparer<T, TResult> : IEqualityComparer<T>
    {
        private readonly Func<T, TResult> _selector;
    
        public DynamicComparer(Func<T, TResult> selector)
        {
            _selector = selector;
        }
    
        public bool Equals(T x, T y)
        {
            TResult result1 = _selector(x);
            TResult result2 = _selector(y);
            return result1.Equals(result2);
        }
    
        public int GetHashCode(T obj)
        {
            TResult result = _selector(obj);
            return result.GetHashCode();
        }
    }
    

    It has to do similar things as the Contains method, where I evaluate the items passed in against the selector method delegate passed in earlier.  In any case, the existing Distinct method works the way I want to, without me needing to re-implement its internal logic as I did with the Contains.

    I tried using the DynamicComparer with the Contains method, but it just worked out better re-implementing the logic.

    Intention-revealing interfaces == good

    The Where/Count or even the Where/FirstOrDefault ways of getting the Contains is just plain ugly.  By passing in a selector method, I can describe exactly what I'm looking for.  In the case of Distinct, having to create a custom IEqualityComparer just for that is unnecessary most of the time.  When I saw that initially, it just looked like more trouble than it was worth.  But with the new and improved extensions, I get a much cleaner implementation.

    Posted Jun 07 2008, 07:39 PM by bogardj with 13 comment(s)
    Filed under: ,
  • Forbidden Void type in C#

    I've had this come up a couple of times.  I'd really like to be able to do something like this:

    Func<bool, Void> whyNot = test => Console.WriteLine(test);

    This is equivalent to:

    Action<bool> okThisWorks = test => Console.WriteLine(test);

    Although an actual Void type exists in the .NET Framework.  There error is very specific:

    error CS0673: System.Void cannot be used from C# -- use typeof(void) to get the void type object

    It even calls out the C# language!  Having a first class, supported Void type helps out quite a bit in various Functional Programming scenarios I've run into, and I've always had to resort to using the Action delegates.

    I'm sure there's a good reason for the explicit forbidding here, much like the generic variance issues.  Not being on the C# team, I can't really think of any.  Anyone else have any insight here?

    Posted Jun 05 2008, 10:26 PM by bogardj with 3 comment(s)
    Filed under:
  • Designing primary keys

    When creating a primary key for a table, we have a few options:

    • Composite key
    • Natural key
    • Surrogate key

    Each has their own advantages and disadvantages, but by and large I always wind up going with the last option.

    Composite keys

    Composite keys are primary keys made up of multiple columns.  For example, consider a system with and Order and LineItem table:

    The LineItems table has an identity that is composed of two columns: the OrderID and the ProductID.  This makes sense in that an Order can only have one LineItem per Product.

    The problem happens when I need to make a table that has a foreign key relationship to the LineItems table:

    Blech! Now our composite key is invading every table that relates to it.  Order is a natural aggregate root, but what happens if we decide later down the line that a LineItem is a standalone concept?

    I've seen this a few times in databases I've created and some legacy ones.  You assume that a row's identity is defined by its parent, but later it begins to have a life of its own.

    Unless the table is a junction table (one that holds a many-to-many relationship), and it doesn't have and attributes of its own, a single primary key is the safest bet.

    Natural keys

    Ugh.

    Please don't continue this (worst) practice going forward.  Natural keys are keys with business uniqueness, such as Social Security Number, Tax ID, Driver's License Number, etc.

    Besides performance issues with data like these as keys, one thing to keep in mind is that an entity's identity can never change, from now until the end of time.

    Things like SSN's and such are likely entered by humans.  Ever mistype something?  If something needs to be unique, there are other ways besides a primary key to ensure uniqueness.

    Surrogate keys

    Surrogate keys are computer-generated keys that likely have no meaning, or might never be shown to the end user.  Typical types are seeded integers and GUIDs.

    Personally, I prefer GUIDs as they're guaranteed to only to be unique to the table, but globally unique (it's in the name for pete's sake!)  Combed GUIDs provide comparable performance to integers or longs, as well as the added benefit that they can be created by the application instead of the database.  GUIDs also play very nicely in replication scenarios.

    You can still use natural key information, such as SSN, to identify or search for a particular record, but a surrogate key ensures uniqueness and performance.

    Frankenstein keys

    Unfortunately I have found another kind of primary key: the Frankenstein key.  Here's a small taste:

    Hmmm...let's see, a CUSTOMER table, an ADDRESS_MASTER table, and one table that should join the two together, the CUST_ADDR_DTL.  But it doesn't have any foreign key relationships to the two tables it should have, it has a cryptic CUST_ADDR_NO column instead.

    Looking at the data reveals a mind-boggling design:

    CUST_NUM
    ----------
         89480
    
    ADDR_KEY
    ----------
        441839
    
    CUST_ADDR_NO
    --------------------
    89480         441839
    

    Wow.  Fixed-length column values aren't new, especially in databases ported from mainframes.  The last value, instead of having two columns in a composite key, just jams both foreign key values into one single Frankenstein column.

    Just about the ugliest thing I'd ever seen, until I saw a SELECT statement, with where clauses using string functions to parse out the individual values.

    Surrogate GUIDs by default

    It doesn't cost anything, and it's the easiest and simplest choice to make going forward.  Composite keys can still be represented as foreign keys and unique constraints, but it's tough to add identity later.

    Not to mention, domain models look pretty terrible and are quite brittle with composite keys.  As any system is so much more than data it contains, why not go with surrogate keys, which give so much flexibility to your design?

  • Failures of aimless large-scale refactorings

    At the recent Austin Code Camp, I heard a few stories after my Legacy Code talk about teams attacking their legacy code in prolonged refactoring binges that never seemed to end.  Never ending, because no one had a good idea of what success looked like.

    Once technical debt has built up to a certain point, some folks opt to declare bankruptcy and start over.  A place I came from recently did this fairly regularly, about once every two years.  It wasn't that a new technology would solve new business problems (although this was how IT sold the re-work), but the current codebase was completely unable to change at the rate the business needed.

    Starting over

    When we have a large legacy codebase, how should we turn it around?  How much resources do we allocate to re-sculpting the big ball of mud?  In a previous job, we did this by hijacking an entire release as an "engineering" release, rewriting the entire plumbing of the application.

    Huge disaster, to say the least.

    Needless to say, the refactoring itself was technically successful, but in the business side, a complete failure.  We paused adding business value for months while we tinkered.  The business absolutely resented us for this, and held it against us up until I left.

    I've personally never seen a developer-induced month-long refactoring session succeed, and the failures of these refactoring efforts do serious long-term harm on the credibility of the team.  The business never EVER likes to see basically a work stoppage to fix a mess that, in their eyes, are of the developers own doing.

    Risk and angles of attack

    Another angle of attack besides charging up the hill to the machine gun nest, armed with only a bayonet, is to take a more strategic aim at de-gunking your system.

    One approach that worked well for us was to only refactor in areas that needed to be changed.  Change is needed for many reasons, whether it's new features, bugs, performance problems, or others.

    By only refactoring areas that needed to change, we were able to mitigate the risk of performing large-scale refactorings by making small, targeted steps towards a brighter future.  We didn't always know what the end result would look like, but that was a good thing.

    If anything, developers are terrible guessers.  My design guesses are always always always wrong the first time.  It's a waste of time to argue relentlessly on the best way to refactor a certain area, as tools like ReSharper and methods laid out in Kerievsky, Fowler and Feathers' books make it fairly easy to change direction.

    A pact for continuous improvement

    To make sure that our codebase continuously improved, our team agreed that every change we made would improve our codebase.  Even something as small as eliminating a couple duplicate methods was a big improvement.  In a big ball of mud, we deleted code far more than we added, as it was rampant duplication that got us in to our original mess.

    Over time, small changes allowed us to see larger refactorings that became more apparent.  Since the large refactorings were driven from actual needs to change, we had confidence that we were moving in the right direction.

    The whole point of the refactorings was to eliminate duplication so that we could add value, q