GrabBag<T>

  • LINQ query operators: lose that foreach already!

    Now that .NET 3.5 is out with all its LINQ query operator goodness, I feel like going on a mean streak of trashing a lot of our (now) pointless foreach loops.  Some example operations include:

    • Transformations
    • Aggregations
    • Concatenations
    • Filtering

    As I mentioned in my last post, temporary list creation is a great pointer to opportunities for losing the foreach statements.  I keep the foreach statement when the readability and understandability of the code drops with the LINQ change, but otherwise, far fewer temporary objects are floating around.  Personally, the jury is still out for me on whether it's clearer to return "IEnumerable<LineItem>" over "LineItem[]", but the temporary array creation doesn't seem to have much of a point.

    Transformations

    Transformations are easy to spot. You'll create a new List<Something>, then loop through some other List<OtherThing> and create a Something from the OtherThing:

    public OrderSummary[] FindOrdersFor(int customerId)
    {
        IEnumerable<Order> orders = GetOrdersForCustomer(customerId);
        List<OrderSummary> orderSummaries = new List<OrderSummary>();
    
        foreach (Order order in orders)
        {
            orderSummaries.Add(new OrderSummary
                                {
                                    CustomerName = order.Customer.Name,
                                    DateSubmitted = order.DateSubmitted,
                                    OrderTotal = order.GetTotal()
                                });
        }
    
        return orderSummaries.ToArray();
    }
    

    Note the temporary list creation, just to return an array.  With LINQ query operators, I can use the Select method to do the same transformation in less code:

    public OrderSummary[] FindOrdersFor(int customerId)
    {
        return GetOrdersForCustomer(customerId)
            .Select(order => new OrderSummary
                                 {
                                     CustomerName = order.Customer.Name,
                                     DateSubmitted = order.DateSubmitted,
                                     OrderTotal = order.GetTotal()
                                 })
            .ToArray();
    }
    

    By chaining the methods together, it comes out much more readable.

    Aggregations

    Aggregations can be found when you're looping through some list for some kind of calculation.  For example, the GetTotal method on Order loops to build up the total based on each item's ItemPrice:

    public decimal GetTotal()
    {
        decimal total = 0.0m;
    
        foreach (LineItem lineItem in GetLineItems())
        {
            total += lineItem.ItemPrice;
        }
    
        return total;
    }
    

    Again, this can be greatly reduced using LINQ query operators and the Sum method:

    public decimal GetTotal()
    {
        return GetLineItems()
            .Sum(item => item.ItemPrice);
    }
    

    Not only is the code much smaller, but the intent is much easier to discern.  Sometimes a calculation can be tricky, and in those cases LINQ isn't bringing anything to the table.  As always, use good judgement and keep an eye on readability.

    Concatenations

    Oftentimes I need to combine many lists into one flattened list.  For example, suppose I need a list of OrderLineItem summary items, perhaps to display on a grid to the end user.  However, I want to display all order line items for all orders, which is difficult to build up manually:

    public LineItemSummary[] FindLineItemsFor(int customerId)
    {
        IEnumerable<Order> orders = GetOrdersForCustomer(customerId);
        List<LineItemSummary> lineItemSummaries = new List<LineItemSummary>();
    
        foreach (Order order in orders)
        {
            IEnumerable<LineItemSummary> tempSummaries = TransformLineItems(order, order.GetLineItems());
    
            lineItemSummaries.AddRange(tempSummaries);
        }
    
        return lineItemSummaries.ToArray();
    }
    

    Note the two temporary lists: one to hold the concatenated list, and the other to hold each result as we loop through.  With the SelectMany method, this becomes much shorter:

    public LineItemSummary[] FindLineItemsFor(int customerId)
    {
        return GetOrdersForCustomer(customerId)
            .SelectMany(order => TransformLineItems(order, order.GetLineItems()))
            .ToArray();
    }
    

    No temporary lists are created, and all of the LineItemSummary objects are concatenated correctly.  Nested foreach loops as well as the AddRange method are indicators that the SelectMany method could be used.
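
    The TransformLineItems helper isn't shown in this post.  Here's a minimal sketch of what it might look like, assuming a LineItemSummary with the same four properties used in the filtering example below.  The Customer, Order and LineItem shapes here are stand-ins I made up for illustration, and FindLineItemsFor takes the orders as a parameter rather than a customer ID so the sketch stays self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in domain types (assumed shapes, not the real ones).
public class Customer
{
    public string Name { get; set; }
}

public class LineItem
{
    public string ProductName { get; set; }
    public decimal ItemPrice { get; set; }
}

public class Order
{
    private readonly List<LineItem> _lineItems = new List<LineItem>();

    public Customer Customer { get; set; }
    public DateTime DateSubmitted { get; set; }

    public void AddLineItem(LineItem item) { _lineItems.Add(item); }
    public IEnumerable<LineItem> GetLineItems() { return _lineItems; }
}

public class LineItemSummary
{
    public string CustomerName { get; set; }
    public DateTime DateSubmitted { get; set; }
    public string ItemName { get; set; }
    public decimal ItemPrice { get; set; }
}

public static class LineItemQueries
{
    // Hypothetical helper: projects the line items of ONE order into summaries.
    // Select is lazy, so no intermediate list is built here.
    public static IEnumerable<LineItemSummary> TransformLineItems(
        Order order, IEnumerable<LineItem> lineItems)
    {
        return lineItems.Select(item => new LineItemSummary
        {
            CustomerName = order.Customer.Name,
            DateSubmitted = order.DateSubmitted,
            ItemName = item.ProductName,
            ItemPrice = item.ItemPrice
        });
    }

    // SelectMany flattens the per-order sequences into one sequence;
    // the only collection materialized is the final array.
    public static LineItemSummary[] FindLineItemsFor(IEnumerable<Order> orders)
    {
        return orders
            .SelectMany(order => TransformLineItems(order, order.GetLineItems()))
            .ToArray();
    }
}
```

    Because both Select and SelectMany are deferred, nothing is enumerated until ToArray runs at the end of the chain.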

    Filtering

    Filtering looks similar to transformations, except there's an "if" statement that controls adding to the temporary list.  Suppose we want only the expensive LineItemSummary items:

    public LineItemSummary[] FindExpensiveLineItemsFor(int customerId)
    {
        IEnumerable<Order> orders = GetOrdersForCustomer(customerId);
        List<LineItemSummary> lineItemSummaries = new List<LineItemSummary>();
    
        foreach (Order order in orders)
        {
            foreach (LineItem item in order.GetLineItems())
            {
                if (item.ItemPrice > 100.0m)
                    lineItemSummaries.Add(new LineItemSummary
                                              {
                                                  CustomerName = order.Customer.Name,
                                                  DateSubmitted = order.DateSubmitted,
                                                  ItemPrice = item.ItemPrice,
                                                  ItemName = item.ProductName
                                              });
            }
        }
    
        return lineItemSummaries.ToArray();
    }
    

    This example has both concatenation and filtering.  The filtering can be taken care of with the Where method, and we'll use the same technique as earlier with the SelectMany method:

    public LineItemSummary[] FindExpensiveLineItemsFor(int customerId)
    {
        return GetOrdersForCustomer(customerId)
            .SelectMany(order => TransformLineItems(order, order.GetLineItems()))
            .Where(item => item.ItemPrice > 100.0m)
            .ToArray();
    }
    

    By adding the Where method, we keep only the expensive line items.  The method chaining looks much better than the nested foreach loops coupled with the if statement, and it gets rid of the temporary list creation.

    Lose the foreach

    With the new LINQ query operators, any temporary list creation and foreach loop should be considered suspect.  By understanding the operations LINQ gives us, we can not only reduce the amount of code we create, but the end result matches the original intent far better.

    Not every foreach or temporary list should be removed, as sometimes long chains and large lambdas tend to muddy rather than clear the picture.  But for a great deal of scenarios, LINQ query operators can vastly improve the readability of transformation (Select), aggregation (Sum), concatenation (SelectMany) and filtering (Where) sections of your code.

    Posted May 09 2008, 06:11 PM by bogardj with 5 comment(s)
  • Enhancing mappers with LINQ

    The "big 3" higher-order functions in functional programming are Filter, Map and Reduce.  When looking at the new C# 3.0 LINQ query operators, we find that all three have equivalents:

    • Filter = Where
    • Map = Select
    • Reduce = Aggregate

    Whenever you find yourself needing one of these three higher-order functions, just translate them into the correct query operator.  "Select" doesn't have the same dictionary meaning as "Map", but the signature is exactly the same.
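
    To make the mapping concrete, here's a small sketch of all three operators over a list of prices.  The class and method names are mine, purely for illustration; only Where, Select and Aggregate come from LINQ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HigherOrderSketch
{
    // Filter = Where: keep only the prices above a threshold.
    public static IEnumerable<decimal> PricesOver(IEnumerable<decimal> prices, decimal threshold)
    {
        return prices.Where(price => price > threshold);
    }

    // Map = Select: turn each price into its price after a 10% discount.
    public static IEnumerable<decimal> Discounted(IEnumerable<decimal> prices)
    {
        return prices.Select(price => price * 0.9m);
    }

    // Reduce = Aggregate: fold the sequence into a single running total,
    // starting from a seed of zero.
    public static decimal Total(IEnumerable<decimal> prices)
    {
        return prices.Aggregate(0m, (total, price) => total + price);
    }
}
```

    For common reductions such as totals, the more specific Sum operator reads better than Aggregate, but Aggregate is the general-purpose fold.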

    The trick to knowing you can use these higher order functions is to look out for situations where you:

    1. Create a new collection
    2. Iterate through some other collection
    3. Add items from the other collection to the new collection

    Any time you see this general algorithm, there's a much terser syntax available with LINQ.

    Mapper pattern example

    For example, consider the Mapper pattern:

    public interface IMapper<TInput, TOutput>
    {
        TOutput Map(TInput input);
    }
    

    A common scenario to map is when I'm creating DTOs or message objects from Domain objects.  Serializing domain objects generally isn't a concern of the domain object, as DTOs tend to be flattened out somewhat.  I might have the following domain objects:

    public class Customer
    {
        public Guid Id  { get; set; }
        public string Name { get; set; }
    }
    public class SalesOrder
    {
        public Customer Customer { get; set; }
        public decimal Total { get; set; }
    }
    

    I'd like to send a Sales Order over the wire for display in some client application.  But suppose the Customer object contains dozens of properties, perhaps things like a billing address, a shipping address, and so on.  The service I'm creating only needs a summary of customer information, so I create a SalesOrderSummary message class:

    public class SalesOrderSummary
    {
        public string CustomerName { get; private set; }
        public Guid CustomerId { get; private set; }
        public decimal Total { get; private set; }
    
        // For serialization
        private SalesOrderSummary() { }
    
        public SalesOrderSummary(string customerName, Guid customerId, decimal total)
        {
            CustomerName = customerName;
            CustomerId = customerId;
            Total = total;
        }
    }

    The corresponding mapper would look like:

    public class SalesOrderSummaryMapper : IMapper<SalesOrder, SalesOrderSummary>
    {
        public SalesOrderSummary Map(SalesOrder input)
        {
            return new SalesOrderSummary(input.Customer.Name, input.Customer.Id, input.Total);
        }
    }
    

    Nothing too exciting so far, right?  Well, suppose now I need to return an array of SalesOrderSummary, perhaps for display in a grid.  In that case, I'll need to build up a list of SalesOrderSummary objects based on a list of SalesOrder objects:

    public SalesOrderSummary[] FindSalesOrdersByMonth(DateTime date)
    {
        // get the sales orders from the repository first
        IEnumerable<SalesOrder> salesOrders = GetSalesOrders();
    
        var salesOrderSummaries = new List<SalesOrderSummary>();
        var mapper = new SalesOrderSummaryMapper();
    
        foreach (var salesOrder in salesOrders)
        {
            salesOrderSummaries.Add(mapper.Map(salesOrder));
        }
    
        return salesOrderSummaries.ToArray();
    }
    

    This isn't too bad, but the creation of a second list just to build up an array seems rather pointless.  But by borrowing some ideas from JP, we can make this a lot easier.

    Using LINQ

    We can already see the higher order function we need, it's in the name of the mapper!  Instead of "Map", we'll use "Select" to do the transforming.  But since we have the interface, we can create an extension method to do the Map function:

    public static class MapperExtensions
    {
        public static IEnumerable<TOutput> MapAll<TInput, TOutput>(this IMapper<TInput, TOutput> mapper, 
            IEnumerable<TInput> input)
        {
            return input.Select(x => mapper.Map(x));
        }
    }
    

    This new MapAll function is the functional programming Map function: it takes an input list and returns a new IEnumerable with the mapped values.  Internally, the Select LINQ query operator will loop through our input, calling the lambda function we passed in (mapper.Map).  This is the exact same operation as in our original example, but our service method now becomes much smaller:

    public SalesOrderSummary[] FindSalesOrdersByMonth(DateTime date)
    {
        // get the sales orders from the repository first
        IEnumerable<SalesOrder> salesOrders = GetSalesOrders();
        var mapper = new SalesOrderSummaryMapper();
    
        return mapper.MapAll(salesOrders).ToArray();
    }
    

    Much nicer, our service method is reduced to just a handful of lines.  The nice thing about this syntax is that it removes all of the unnecessary cruft of a temporary list creation that clouded the intent of this method.

    So any time you find yourself creating a temporary list just to build up some filtered, mapped or reduced values, stop yourself.  There's a higher calling available with functional programming and LINQ.

  • Mike Cohn in town

    Mike Cohn, author of User Stories Applied and Agile Estimation and Planning, is speaking tomorrow night as a part of Agile Austin's Distinguished Speaker Series.  The topic is "Succeeding with Agile: A Guide to Transitioning", with the description:

    Transitioning to an agile development process is unlike most transitions an organization may make. Many transitions begin when a strong, visionary leader plants a stake in the ground and says, “Let’s take our organization there.” Other transitions start with a lone team thinking, “Who cares what management thinks, let’s do this.” The problem in transitioning to agile is that neither of these approaches alone is likely to lead to the long-term sustainable change required.

    In this session we will look at the primary reasons why agile transitions fail and how to overcome them. We will explore what is necessary for self-organizing teams to emerge and thrive within the transitioning organization. We will also [examine] the role self-organization must play during the transition itself. This session will include a description of the eight primary ways of getting started—including going All In, Start Small, and Impending Doom—and the advantages of each. You will leave knowing what approach will work best for you as well as what you must and must not do to succeed with agile.

    Transitioning to Agile is a very bumpy ride, as the rise in communication works to surface problems that were previously hidden and swept under the rug.  Mike's books have had a large influence on how I approach software, so I'm pretty excited about Mike coming to town.  The registration fee for non-Agile Austin members is $10, or in my terms, 20 tacos from Jack in the Box.

    For more information, check out the event page:

    http://mikecohnpresentation.eventbrite.com/

    I know it's not Hannah Montana, but it should be a nice event nonetheless.

    Posted May 07 2008, 07:26 AM by bogardj with 3 comment(s)
  • PabloTV: Eliminating static dependencies screencast

    Nature abhors a vacuum.  It turns out she also abhors static dependencies (I have my sources).  Static dependencies are the modern-day globals, often exposed through classes named "Helper".  I've certainly been guilty of overusing static dependencies in the past, with classes like "LoggingHelper", "SessionHelper", "DBHelper" and so on.

    The problem with static dependencies is that they are opaque to the extreme, enforcing a strong coupling that is impossible to see from users of the class.  To demonstrate techniques for eliminating static dependencies, Ray Houston and I created a short screencast:

    Eliminating static dependencies screencast

    Our screencast demonstrates using TDD along with ideas and techniques laid out in Michael Feathers' Working Effectively with Legacy Code and Joshua Kerievsky's Refactoring to Patterns.  It details how to make safe, responsible changes to an existing legacy codebase, while improving the design by breaking out dependencies to a static class.
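
    As a rough illustration of the general idea (not a transcript of the screencast), here's one common way to break a static dependency: extract an interface, wrap the static class in a thin adapter, and inject the interface through the constructor.  LoggingHelper stands in for any static "Helper" class; ILogWriter, LoggingHelperAdapter, OrderProcessor and RecordingLogWriter are hypothetical names I made up for this sketch:

```csharp
using System;
using System.Collections.Generic;

// The legacy static dependency: callers couple to it invisibly.
public static class LoggingHelper
{
    public static void Write(string message) { Console.WriteLine(message); }
}

// Step 1: extract an interface describing what callers actually need.
public interface ILogWriter
{
    void Write(string message);
}

// Step 2: a thin adapter that delegates to the existing static class,
// so production behavior doesn't change.
public class LoggingHelperAdapter : ILogWriter
{
    public void Write(string message) { LoggingHelper.Write(message); }
}

// Step 3: the class under change now declares its dependency openly.
public class OrderProcessor
{
    private readonly ILogWriter _log;

    public OrderProcessor(ILogWriter log) { _log = log; }

    // Keeps existing callers compiling while the codebase is migrated.
    public OrderProcessor() : this(new LoggingHelperAdapter()) { }

    public void Process(int orderId)
    {
        _log.Write("Processing order " + orderId);
    }
}

// A hand-rolled test double that records what was written.
public class RecordingLogWriter : ILogWriter
{
    public readonly List<string> Messages = new List<string>();

    public void Write(string message) { Messages.Add(message); }
}
```

    In a test, passing a RecordingLogWriter to OrderProcessor lets you assert on the messages without touching the console or the static class.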

    Hope you enjoy it!

  • A pointless exercise

    I caught this last night from Scott Hanselman on Twitter:

    http://www.betterwebapp.com/drupal/?q=screencasts

    It's a side-by-side comparison of the time to create a simple web app for:

    • Ruby
    • Perl
    • ASP.NET
    • Java

    The website also covers a few other frameworks, to determine which languages and frameworks are the fastest to develop against.  Not that it matters, but ASP.NET came out on top for the simple application profile, while Python/Django came out on top for the three-tier application profile.

    While viewing the screencast, all I could think was "Holy jeebus, is there a more pointless exercise than timing the creation of software?"  Creating software is easy (some would say too easy).  If our problem was "how fast can we whip out software", the issues of today's software developers would have been solved decades ago.

    The problem is that software maintenance dwarfs the cost of software creation, by a factor of at least 3 to 1.  So if we're trying to optimize software development, isn't it the most expensive aspect, maintenance, what we should focus on?  We use ReSharper to optimize responsible code creation (i.e., micro-codegen vs. macro-codegen), but it also helps with responsible code maintenance through automated refactorings.  Many other responsible engineering practices, such as those espoused by XP, aim to reduce the maintenance costs of software development.

    In the end, I couldn't care less how long it takes to slap out some sample application.  In six years of professional development, no one has ever paid me to write an application on the level of what was demonstrated.  Some managers will care how fast you sling code, but the smart ones care about:

    • Can the software you created be easily changed?
    • Can the software you created be easily tested?
    • Can the software you created be easily deployed?
    • Can the software you created be easily diagnosed for bugs?
    • Is the software you created correct?
    • Is the software you created what we actually need?

    I'd love to see those issues in a screencast, but who would want to watch a screencast that lasted for weeks or months?

    Posted May 05 2008, 07:40 AM by bogardj with 1 comment(s)
  • Developers or engineers?

    I've had quite a few job titles where I basically did the same function: Software Engineer, Software Developer, Technical Lead, and so on.  In some companies, a Software Developer is a completely different position than Software Engineer, and in others they're used interchangeably.

    The connotations and implications from each title are strikingly different.  The term "engineer" has a much different meaning than "developer", taken outside the context of software. A Developer might lay out the plan of a new subdivision, where its green areas are, what style homes should be built and what theme the neighborhood should adhere to.  The Engineer lays out the water and sewage lines, elevations, roads, utilities and other functions vital for sustainable human habitability.  A Developer creates the vision and designs the implementation, while the Engineer ensures that quality, regulatory and ethical standards have been met.

    That's not to say an Engineer isn't creative.  In a family full of Engineering graduates (including myself), no Engineer would have a job if they weren't creative.  Every problem is an implementation of a different pattern, but the details make each challenge completely different than the one before.

    Having worked with many types of real Engineers (i.e., they're doing the job they went to school for), I didn't see what they were doing was much different than me.  They were concerned with:

    • Safety
    • Reliability
    • Maintainability
    • Aesthetics
    • Usability

    As well as a host of other "ilities".  One critical difference between me as a Developer and them as Engineers was that Engineers are licensed by an independent group.

    In many fields and jurisdictions, including Civil Engineering in Texas, licensing is required by law when performing engineering services for the public.  Licensing isn't easy, either.  To become a licensed Professional Engineer (PE), you must:

    • Graduate from an accredited engineering program
    • Take an initial exam to become an Engineer-in-Training (EIT)
    • Have a minimum of 4 years experience as an EIT, preferably as an apprentice to another PE
    • Take a rigorous exam, which takes 6-8 hours to complete

    Licensing isn't a one-time affair; you have to renew your license, which has its own requirements including continuing education.  Even if not working in the public sector, licensing is considered a career necessity.

    But we have certifications, right?

    Anyone who has interviewed candidates for positions knows the sad state of software certifications.  At best, software certifications signify that you are certified to be an absolute beginner for that topic.  For some reason, software certifications still hold value, though all involved (managers, HR and candidates) know that the certifications aren't worth the bytes they were emailed with.

    Would licensing help our industry?  We live in a sad state of affairs when identities are not stolen because of sifting through your mail or stealing your wallet, but because of careless, ignorant developers.  In a Twitter conversation with an ex-colleague, Terry noted that software development lacked discipline, respect and accountability.  Can licensing help prevent the ridiculous failures, by both raising the bar and raising the stakes for individual failure?

    If a bridge fails, the engineers can be held liable if they are found to have cut corners, or even to have acted out of sheer ignorance.  Oversight in the form of inspectors is supposed to help, but in the end, the engineers are held accountable.

    I see Developers as code monkeys, slapping code out without regard to reliability and maintainability.  Engineers have a much longer horizon to look at; they must ensure the bridge they build lasts 100 years, not six months.  Shouldn't we hold ourselves up to the same standard?

    Posted Apr 30 2008, 07:50 PM by bogardj with 21 comment(s)
  • Understanding Mock Objects: an alternate solution

    In AzamSharp's recent post Understanding Mock Objects, he poses a problem of testing with volatile data.  His example extends on an article on AspAlliance, which exhibits the same problems with its solution.  Suppose I have an image service that returns images based on the time of day:

    public static class ImageOfTheDayService
    {
        public static string GetImage(DateTime dt)
        {
            int hour = dt.Hour;
    
            if (hour > 6 && hour < 21) return "sun.jpg";
    
            return "moon.jpg";
        }
    }
    

    The initial test uses the current date and time to perform the test:

    [Test]
    public void should_return_sun_image_when_it_is_day_time()
    {
        string imageName = ImageOfTheDayService.GetImage(DateTime.Now);
        Assert.AreEqual("sun.jpg", imageName); // expected value first, then actual
    }
    

    Since DateTime.Now is non-deterministic, this test will pass only some of the time, and will fail at other times.  The problem is that this test has a dependency on the system clock.  AzamSharp's solution to this problem was to create an interface to wrap DateTime:

    public interface IDateTime
    {
        int GetHour();
    }
    

    Now the ImageOfTheDayService uses IDateTime to determine the hour, and the test supplies a mock:

    [Test]
    public void MOCKING_should_return_night_image_when_it_is_night_time()
    {
        var mock = new Mock<IDateTime>();
        mock.Expect(e => e.GetHour()).Returns(21); // 9:00 PM
    
        Assert.AreEqual("moon.jpg", ImageOfTheDayService.GetImage(mock.Object));
    }
    

    I really don't like this solution, as the test had the external non-deterministic dependency, not the image service.

    Alternative solution

    Here's another solution that doesn't use mocks, and keeps the original DateTime parameter:

    [Test]
    public void should_return_night_image_when_it_is_night_time()
    {
        DateTime nightTime = new DateTime(2000, 1, 1, 0, 0, 0);
    
        string imageName = ImageOfTheDayService.GetImage(nightTime);
        Assert.AreEqual("moon.jpg", imageName); // expected value first, then actual
    }
    

    Note that I eliminated the dependency on the system clock by simply creating a DateTime that represents "night time".  Another test creates a "day time" DateTime, and perhaps more tests fill in edge cases.  Instead of creating an interface to wrap something that didn't need fixing, we used DateTime exactly how it was designed.  DateTime.Now is not the only way to create a DateTime object.
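
    Those edge cases are worth spelling out, since the hour comparison is strict on both ends.  Here's a sketch that repeats the service from above and checks the four boundary hours.  To keep it self-contained I've used a plain helper that throws instead of a test framework; ImageOfTheDayBoundaryChecks and its members are names I made up:

```csharp
using System;

// Same service as above, repeated so the sketch compiles on its own.
public static class ImageOfTheDayService
{
    public static string GetImage(DateTime dt)
    {
        int hour = dt.Hour;

        if (hour > 6 && hour < 21) return "sun.jpg";

        return "moon.jpg";
    }
}

public static class ImageOfTheDayBoundaryChecks
{
    // Builds a fixed, deterministic DateTime at the given hour.
    private static DateTime AtHour(int hour)
    {
        return new DateTime(2000, 1, 1, hour, 0, 0);
    }

    private static void Expect(string expected, int hour)
    {
        string actual = ImageOfTheDayService.GetImage(AtHour(hour));
        if (actual != expected)
            throw new Exception("Hour " + hour + ": expected " + expected + " but got " + actual);
    }

    public static void Run()
    {
        Expect("moon.jpg", 6);  // 6 is not > 6, so still night
        Expect("sun.jpg", 7);   // first daytime hour
        Expect("sun.jpg", 20);  // last daytime hour
        Expect("moon.jpg", 21); // 21 is not < 21, so night again
    }
}
```

    In a real test fixture each Expect call would be its own named test, so a failure points straight at the boundary that broke.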

    Solving a non-deterministic test with mocks only works when it's the component under test that has the non-deterministic dependencies.  In AzamSharp's example, it was the test, and not the component, that had the non-deterministic dependency.  Creating the DateTime using its constructor led to both a more readable test and a more cohesive interface for the image service.

    It's easy to believe everything is a nail if all you want to use is that shiny hammer.  Keep in mind the purpose of mocks: to verify the indirect inputs and outputs of a component, not to fix every erratic test under the sun.
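
    For contrast, here's a sketch of the case where a test double does earn its keep: a component that reads the clock itself, so the test can't simply pass in a DateTime.  IClock, SystemClock, FixedClock and HomePageBanner are all hypothetical names, and I've hand-rolled the stub rather than use a mocking library so the sketch stands alone:

```csharp
using System;

// Abstraction over "what time is it?" for components that ask on their own.
public interface IClock
{
    DateTime Now { get; }
}

// Production implementation: defers to the system clock.
public class SystemClock : IClock
{
    public DateTime Now { get { return DateTime.Now; } }
}

// Test stub: always reports the same instant.
public class FixedClock : IClock
{
    private readonly DateTime _now;

    public FixedClock(DateTime now) { _now = now; }

    public DateTime Now { get { return _now; } }
}

// Here the COMPONENT has the non-deterministic dependency, because it
// decides when to read the time; callers never hand it a DateTime.
public class HomePageBanner
{
    private readonly IClock _clock;

    public HomePageBanner(IClock clock) { _clock = clock; }

    public string CurrentImage()
    {
        int hour = _clock.Now.Hour;

        return (hour > 6 && hour < 21) ? "sun.jpg" : "moon.jpg";
    }
}
```

    A test constructs HomePageBanner with a FixedClock set to, say, 11 PM and asserts on "moon.jpg", while production code passes a SystemClock.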

  • Raising the bar

    Continuous improvement is absolutely essential for any serious software developer.  Personally, my drive for constant improvement is not so much the next shiny developer toy (though this happens occasionally), but the idea that there is always some way to deliver value to the customer better and cheaper than what I'm doing now.

    Driving personal improvement can be a great example to teammates, but for an entire team (or even teams you don't work with) to improve, you'll have to expand your horizons.  You'll need to maximize the value of your keystrokes, so to speak.

    So you sent an email to your teammates about a possible design improvement.  Why not blog about it?  So you read a new design or patterns book.  Why not give your team a presentation and summary?  So you refactored some gnarly code.  Why not tell the whole team about your improvements?

    Keeping self-improvement to yourself might not improve your situation as much as you might like.  Unless you're flying solo, other team members will eventually write code that you will despise.

    Driving local improvement

    Instead of deriding poor design, look towards collective improvement and raise your team's collective bar.  Some different bar-raising techniques include:

    • Creating a team blog
    • Starting a book club
    • Holding brown-bag sessions on patterns, practices, refactorings, and other topics
    • Holding pairing and mentoring sessions with junior developers

    If you're only improving yourself, your code will improve, but it won't impact the team's codebase as much as everyone's code improving.  And it's not just code either.  Communication and process improvements will provide at least as much value as coding improvements, if not more.  Body language can tell you if you're on the right track when running ideas past your domain experts.  If you can't recognize these signals, you're creating quite a bit of waste.  Driving awareness of, and action on, improvement areas is part of what's required of you to earn your pay.

    Driving community improvement

    Something I've made a personal commitment to do is to try and drive some community improvement.  I'm already at a great place that expects XP practices, and we can always expand and improve, but every shop I run into has serious dysfunctions.  I'd like to try to raise the bar through:

    • This blog and the literally tens of people that read it
    • Screencasts on shiny new objects values, principles and practices I think are important
    • Local book clubs
    • Skype/VNC sessions with interested folks (probably for a small fee)
    • Code camp sessions

    I have my selfish reasons for these things, as everyone else's code is terrible.  Except mine, of course.  You can hear angels singing when I apply ReSharper shortcuts.

    There are about an order of magnitude more bad developers than good ones, and another order of magnitude from good to great.  The onus is on the upper-level practitioners to mentor others and raise the collective bar of our community.

    Posted Apr 25 2008, 11:34 PM by bogardj with 2 comment(s)
  • Should you TDD when flying solo?

    A couple of weeks ago a question came up on the ALT.NET message board:

    Does TDD make sense when you're the only developer in your company?

    To me, this is akin to the following questions:

    • Is quality important?
    • Is maintainability important?
    • Is design important?

    Remember, TDD is all about design.  Unit tests are icing on the cake.  TDD shows me where I violate the Dependency Inversion Principle.  TDD shows me if my design makes sense from the client's perspective.  TDD encourages low coupling and high cohesion (but doesn't guarantee it).  TDD gives me immediate visibility into the pain points of my system.  TDD gives me confidence that my design is right.  TDD gives me confidence to refactor carefully or recklessly.

    Without TDD, I have zero confidence that my design is either what I intended or what is needed.  Without client code exercising concrete behavior for explicit contexts, I have no visibility into the "why" of the design.  Without TDD, I'm flying completely blind.  I can draw UML diagrams, sketch out code, even write little test applications.  But unless I can demonstrate behavior in specific contexts, I have no evidence that my design is right.

    Now pair programming solo, that requires some extreme dexterity...

    Posted Apr 22 2008, 08:39 PM by bogardj with 11 comment(s)
  • Auto-mocking container pitfalls?

    I'm taking a closer look at the auto-mocking container idea, specifically as we're including it in the upcoming release of NBehave.  I'm a little wary of prolonged use, but wanted to get some feedback (it's also on the ALT.NET message board).  Some pitfalls I thought of offhand were:

    • Can allow dependencies to get out of hand
    • Can hide a code smell where you have too many dependencies
    • Forces a reliance on an IoC container for all creation, inside and outside of tests

    Specifically the one I'm most worried about is making it too easy to add a bunch of dependencies and creating god-classes that have too much responsibility and coordinate too much.  If I hide that complication in a unit test, it might be perpetuating a bad design.

    On the other hand, plenty of design patterns exist to help hide that problem, including creation methods, factory methods, factories and abstract factories.  But I'm thinking I might rather have those abstract away the "what and how" than "just too many dependencies".

    Finally, AMC could be another sharp tool in the toolbox.  It can cut you if you're inexperienced, but sometimes it can make the job much easier.  Luckily, no mistake is ever more than an iteration's length away from reversal.  I'm going to proceed carefully, mindful of the pitfalls I think I might land in.

    Posted Apr 21 2008, 07:57 PM by bogardj with 3 comment(s)
  • Version control with Subversion: so easy my wife can do it

    Yes folks, it's true.  I have converted my wife to be a loyal Subversion user.  My wife is not technical, not by a long shot.  But the power of Subversion and the simplicity of TortoiseSVN made the convincing very easy.

    Ed. note: that's not my wife in the above screen.  Please don't get me in any more trouble than I already am.  Thanks.

    It wasn't even that hard to convince her, either.  She doesn't do any development, but she has lots of files that need versioning.  For example, this was her versioning system before Subversion:

      

    She would create many copies of a document, noting the date and sometimes a little comment describing the changes.  I noticed that she would even do complex branching, creating new forks of documents where a sweeping change needed to be introduced, but not necessarily affect the "master" copy just yet.

    The setup

    I mentioned earlier my wife isn't technical, but she's not a complete beginner around her laptop either.  She is comfortable:

    • Organizing documents in her "My Documents" folder
    • Creating, renaming and deleting files and folders
    • Not spilling Starbucks onto her laptop
    • Right-clicking to bring up the context menu

    To use Subversion, all you really need is the ability to organize and right-click.  If your significant other or tech-challenged friend doesn't know what a context menu is, you might have to start at square zero.

    I turned my wife onto source control by addressing some pretty common complaints:

    • Those she wrote these documents for changed their mind frequently, leading to a lot of wasted work and backtracking
    • She would forget to create a new copy occasionally, and closing Word means she lost her "undo" history and therefore her starting point
    • Managing the copies of documents is tedious and time-consuming, resulting in only creating new baseline documents or copies pretty rarely

    It took my wife some time to perfect her copy-rename strategy, as before then she would work off of one copy only to be thoroughly defeated when she needed to undo changes and she had closed Word.

    The pitch

    I've used source control for far more than just source code files.  I've never been a fan of SharePoint, which in my experience is rarely used for more than a glorified file share.  It can version documents, but the web interface for doing so is v-e-e-e-r-y clunky and primitive.  It takes far too many clicks to edit a document, and even then you're in an interface that wasn't meant for file and folder browsing (the web).

    Source control is perfect for situations where you want to save multiple revisions of a document and be able to restore older versions at any time.  I showed my wife one of our trunks from work, where we keep all project documents including source code in a single repository.

    I explained to her the concepts of a centralized change repository, and the idea of saving your document twice (once to save, once to commit the changes).

    But the easiest pitch was to just show it in action.  A simple demonstration showing real-world scenarios sealed the deal.

    Sealing the deal

    With my wife, I set up a local Subversion repository and trunk.  To eliminate the "magic", I walked through these steps with her:

    • Installing Subversion
    • Installing TortoiseSVN
    • Creating a local repository using TortoiseSVN
    • Checking out the new repository using TortoiseSVN

    For consistency's sake, I checked out the new repository to a folder in her "My Documents", calling it "Versioned Files".  A descriptive name helped her remember why this folder was special and had all of those new icons.  With the icons TortoiseSVN provides, my wife could easily see if she had changes she needed to commit to the repository.

    Next, we walked through a couple of scenarios she runs into frequently:

    • Making changes and setting a new "baseline" document
    • Reverting changes to a previous version

    In the first scenario, we walked through:

    • Creating documents and folders
    • Committing the new documents and folders to the repository using TortoiseSVN
    • Entering descriptive comments

    And in the second scenario, we walked through:

    • Viewing the log, and seeing the changes and comments
    • Reverting a change

    To make it easy to commit changes, I created a shortcut on her Desktop to the "Versioned Files" folder.  TortoiseSVN's context menu works with shortcuts, so she is able to commit and revert straight from her desktop.

    Finally, I let her drive some real-world scenarios to make sure she understood all of the concepts and details.  She's very happy with the results, and it has greatly simplified her document management.  Any file that she wants to keep a history of, she drops it into her folder and commits the file to the repository.

    Zero-friction toolset

    With any tool I use, in development or otherwise, the less friction it adds to my life, the more valuable it is to me.  Any client I talk to, I'll always recommend Subversion over any source control provider, simply because it introduces the least amount of friction of any source control provider I've used.  I don't have to open another application, open a web browser or open an IDE.  It's all right there in Explorer, in a simple and intuitive interface.

    And if the client still won't believe me, it's hard to argue against a source control provider so easy my wife can do it.  Unless they believe a complicated interface implies a more powerful tool, there's always this list of companies who are Subversion users.

    Next project: get my dad to use Subversion for his Fortran programs...

    Posted Apr 19 2008, 01:06 PM by bogardj with 10 comment(s)
  • Profiling a legacy app

    Approaching a legacy application can be a daunting task.  You may or may not have access to the original developers (if they even still work for the company), and the domain experts might not be able to commit to teaching you the software full time.  If you're lucky enough to have access to true domain experts, it's rare that they know the system from a technical standpoint, or are familiar with the entire system.  It's more likely they have intimate knowledge of one piece, and cursory knowledge of the rest.

    We've gone with an approach that's allowed us to glean quite a bit of information about the domain in a fairly short period of time.  It can be overwhelming trying to see where to start, especially if you're looking at a codebase with hundreds of thousands, if not over one million lines of code.

    Pitfalls

    Some pitfalls we wanted to avoid were:

    • Getting bogged down in code
    • Worrying/complaining about code quality
    • Searching for a pot o' gold (magical class that gives you complete insight into the system)

    The biggest one is the complaining.  Many very successful systems are built with duct tape and baling wire.  It's rather pointless to vilify an application or system that's netted a company millions of dollars.  Lack of structure or tests might be bogging down the company now, but it's put a lot of food on people's tables.

    When seeing code that I wouldn't necessarily write, I like to step back and ask myself, "how many meals did this code buy for a family?"  I've written my share of stinkers over the years, so who am I to point fingers?

    Getting started

    The absolute first step is to get access to:

    1. A working application
    2. Its outputs

    Without a working application, we're forced to comb through a codebase.  That's hardly productive, as without an original developer we have to make educated guesses about how it works and what's important.

    Also vital is insight into the outputs of the application, whether it be files, web service calls or a database.  We want to treat the application like a black box.  We're going to poke and prod the application, so it's necessary to see what pops out the other side.

    If the application writes to a database, get access to a profiler so you can watch traffic.  If it makes network calls, use a port sniffer or something like Fiddler to watch the traffic.  If it writes to files or to MSMQ, again, find a profiler so you can watch traffic.

    Collect some stats

    Chances are your legacy app writes to some kind of database.  The older the system, the more variety you'll see in design, or more alien the design will seem to you.  Modern databases haven't been around too long considering the timeline of mainframes, and these designs were much different than the databases you're probably used to.

    Don't be surprised if you don't find any referential integrity.  You might not be able to run a dependency analysis tool (like RedGate's excellent Dependency Tracker) and find anything connected.  If the application uses SQL Server, do yourself a favor and get one of RedGate's bundles.

    What worked well for us was not the diagrams from the legacy app, which can have more orphans than a Dickens novel, but a simple listing of tables with some important extra data.  We created another "Analysis" database with two tables:

    • TableInformation
    • ColumnInformation

    The TableInformation contained columns for Database, TableName, ColumnCount, RowCount, create/modify dates, and anything else we could glean from the metadata information.  RowCount is important as you can query TableInformation, sort by RowCount, and have a good idea of what the most important tables are.  Chances are a table with 10 million rows is fairly important.  Tables with zero rows can be crossed off the list immediately, as your database probably has tables that were created but never used.

    By seeing a list of all of the tables with their RowCount, you can get an idea of which tables are Transactional (lots of reads/writes), Lookup (written once, now just for lookup values, like states or country lists), or Unused (one or fewer rows).  The number of "important" tables is now a fraction of the original number of tables you were looking at.
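    The table statistics described above can be gathered with a short script.  This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for whatever the legacy database really is, and the classify function with its lookup_max_rows cutoff is a hypothetical rule of thumb, not something from our actual analysis database.

```python
import sqlite3

def collect_table_information(legacy):
    """Return (table_name, column_count, row_count) tuples, largest table first."""
    tables = [row[0] for row in legacy.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    stats = []
    for table in tables:
        # PRAGMA table_info returns one row per column in the table.
        column_count = len(legacy.execute(f'PRAGMA table_info("{table}")').fetchall())
        row_count = legacy.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        stats.append((table, column_count, row_count))
    # Sort by RowCount so the most heavily used tables come first.
    return sorted(stats, key=lambda s: s[2], reverse=True)

def classify(row_count, lookup_max_rows=500):
    """Rough bucket for a table; the cutoff is an assumption to tune per system."""
    if row_count <= 1:
        return "Unused"
    if row_count <= lookup_max_rows:
        return "Lookup"
    return "Transactional"
```

    Row count alone can't tell a big lookup table from a small transactional one, so treat the buckets as a first pass to review with the domain experts.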

    The ColumnInformation contained columns for Database, TableName, ColumnName, as well as data type information.  Collecting column information is extremely useful when your database doesn't have any relationships explicitly defined.  You can perform queries like "SELECT * FROM ColumnInformation WHERE ColumnName = 'ORDER_NUM'".  This can give you a great indication of what is related to what.
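    Filling ColumnInformation and running that kind of query might look like the sketch below.  Again SQLite stands in for the real database, the function names are made up for illustration, and the column is called DatabaseName here (rather than Database) only to stay clear of SQL keywords.

```python
import sqlite3

def build_column_information(legacy, analysis, database_name):
    """Copy column metadata from the legacy database into the analysis database."""
    analysis.execute(
        "CREATE TABLE IF NOT EXISTS ColumnInformation "
        "(DatabaseName TEXT, TableName TEXT, ColumnName TEXT, DataType TEXT)")
    tables = [row[0] for row in legacy.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, data_type, *_rest in legacy.execute(f'PRAGMA table_info("{table}")'):
            analysis.execute(
                "INSERT INTO ColumnInformation VALUES (?, ?, ?, ?)",
                (database_name, table, name, data_type))
    analysis.commit()

def tables_with_column(analysis, column_name):
    """List every table that has a column with the given name."""
    rows = analysis.execute(
        "SELECT TableName FROM ColumnInformation "
        "WHERE ColumnName = ? ORDER BY TableName", (column_name,))
    return [row[0] for row in rows]
```

    A matching name is only a hint.  Legacy schemas often spell the same concept several ways (ORDER_NUM, ORD_NO, and so on), so a LIKE search on fragments of the name is a natural next step.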

    Poke and watch

    Finally, with a running application and some base stats, we're ready to profile the application.  The basic idea is to perform a concrete operation and examine the traffic.  Pull up that Customer page, and with the profiler open, capture a slice of traffic related to your operation.  It helps if no one else is using the system at the time, as you don't want to collect false data.

    For each operation we find, we'll:

    • Start the profiler
    • Perform the operation
    • Stop the profiler
    • Archive/examine the profiler results

    By doing something like examining a product or looking up a (valid) customer, we can see not only what the main Entity table is, but also what the ancillary tables are.  Most SQL profilers (such as SQL Server's) allow you to copy the actual SQL script being used.  We can then paste this SQL script into our query tool to re-run the script to examine the data returned.
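    The start/perform/stop/archive loop can be imitated in a few lines.  This sketch is not SQL Server Profiler; it assumes a SQLite connection and uses the sqlite3 module's set_trace_callback hook to record each statement the connection runs.  The capture_sql helper is a made-up name for illustration.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def capture_sql(connection):
    """Record every SQL statement run on the connection inside the with-block."""
    statements = []
    connection.set_trace_callback(statements.append)  # start the "profiler"
    try:
        yield statements                              # perform the operation
    finally:
        connection.set_trace_callback(None)           # stop the "profiler"
```

    Wrapping one concrete operation in "with capture_sql(db) as captured:" leaves the statements for just that operation in captured, ready to be archived or pasted into a query tool.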

    Finally, as we're noting relationships between the tables, we create a completely new database that contains only the tables and relationships.  We can't add relationships to the existing database, as referential integrity probably wasn't enforced (maybe it was only suggested or encouraged).  This allowed us to create a very descriptive diagram that contained all of the tables and relationships of the legacy database, just without all of the other stuff that gets in the way.

    We keep the original names of the tables and columns too, as it lets us go back to our stats database and do additional queries.  As soon as we can determine what the main "Customer" table is (and its primary key or identifier column), we can query to see if any other tables reference it in some way.  Testing the connection through counts and joins lets us confirm the relationship.
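    One way to test a suspected relationship through counts and joins is sketched below.  The relationship_match function is hypothetical, SQLite again stands in for the real database, and the comparison only makes sense if the parent key is unique.

```python
import sqlite3

def relationship_match(db, child_table, child_column, parent_table, parent_key):
    """Return (matched, total): child rows that join to a parent vs. all non-null child rows."""
    total = db.execute(
        f'SELECT COUNT(*) FROM "{child_table}" '
        f'WHERE "{child_column}" IS NOT NULL').fetchone()[0]
    # Assumes parent_key is unique; duplicates in the parent would inflate this count.
    matched = db.execute(
        f'SELECT COUNT(*) FROM "{child_table}" c '
        f'JOIN "{parent_table}" p ON c."{child_column}" = p."{parent_key}"').fetchone()[0]
    return matched, total
```

    If nearly every child row matches, the relationship is probably real and the stragglers are orphans left behind by the missing referential integrity.  If almost nothing matches, the shared column name was likely a coincidence.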

    Don't get discouraged

    It can be easy to get bogged down and discouraged when a legacy app falls on your lap.  An application with millions of lines of code and hundreds, if not thousands of database tables can be completely overwhelming.

    But often the goal is not to understand the codebase, but the entities, relationships and business processes.  Focus on key scenarios with domain experts, and profile the results.  With any application, importance is not equally distributed to all features.  With some targeted analysis and heavy conversation with the domain experts, you'll be able to gain a deep insight into the business behind the legacy application.

  • Guidelines aren't rules

    I'm a huge fan of the Framework Design Guidelines book.  It provides great instruction on creating reusable libraries, based on Microsoft's design of the .NET Framework.

    But it's important to remember that guidelines aren't rules.  Guidelines are recommendations based on a set of perceived best practices.  Even a "DO" or "MUST" guideline can be broken in rare cases where following the guideline creates a negative user experience.  In these cases, you need a pretty overwhelming case against the guideline to not follow it.

    ASP.NET MVC Example

    Looking at Action Filters in ASP.NET MVC (Preview 2), I found an...interesting naming convention for creating action filters:

    public class AdminRoleAttribute : ActionFilterAttribute
    {
        public override void OnActionExecuting(FilterExecutingContext filterContext)
        {
            if (!filterContext.HttpContext.User.IsInRole("Administrator"))
                throw new ApplicationException("For cool dudes only.");
        }
    }
    

    In the ASP.NET MVC framework, we're required to create a custom attribute to create action filters.  Taking a look at the method I override, when exactly does this method get executed?  Is it before, after or during?  The present tense of the verb implies that it's during, but that's not possible.  I have to deduce that it gets executed before the action.

    The naming convention here is a little strange, with "OnActionExecuting" the method for "before" and "OnActionExecuted" for "after".  This naming style is normally used for event raising, where you'd create protected "On<EventName>" methods that would in turn raise the event.

    In the ActionFilterAttribute, there are no events being raised, so this guideline shouldn't apply.  Clearer names like, say, "BeforeAction" and "AfterAction" would be far more explicit about when these methods get called.

    Other frameworks

    Action filters are fairly common in MVC frameworks.  I wouldn't want to call out this design choice without looking at a few others, so let's check out what the other cool kids are doing.

    MonoRail

    To create a custom filter in MonoRail, you don't need to create a new attribute type.  In the ASP.NET MVC example, I'd decorate my controllers with the specific "AdminRole" attribute.  The filters in MonoRail are somewhat decoupled from when they're executed:

    public class AdminRoleFilter : IFilter
    {
        public bool Perform(ExecuteEnum exec, IRailsEngineContext context, Controller controller)
        {
            if (!context.CurrentUser.IsInRole("Administrator"))
            {
                controller.Flash.Add("For cool dudes only.");
                controller.Redirect("", "login", "login");
                return false;
            }
    
            return true;
        }
    }
    

    You then decorate your controller with the FilterAttribute, specifying when the filter gets executed:

        [Filter(ExecuteEnum.BeforeAction, typeof(AdminRoleFilter))]
        public class AdminController : SmartDispatcherController
    

    Note the nice pretty name, "BeforeAction".  Other values for the ExecuteEnum include "AfterAction", "Around", "AfterRendering", etc.  These names are very expressive for exactly when this action should get executed.  I don't get names like "OnViewRendered" or "OnActionExecutingOhAndByTheWayOnActionExecutedAlso".

    Non-.NET frameworks

    Rails has an equally expressive manner of declaring filters.  This makes sense as MonoRail was heavily influenced by Rails.  In Rails, you can specify the name of the methods to be executed:

      class BankController < ActionController::Base
        before_filter :audit
    
        private
          def audit
            # record the action and parameters in an audit log
          end
      end
    
      class VaultController < BankController
        before_filter :verify_credentials
    
        private
          def verify_credentials
            # make sure the user is allowed into the vault
          end
      end

    Rails provides many descriptive methods (which can be chained) to modify the filters for a particular controller: "before_filter", "after_filter", "around_filter" and others.  Note again the obviously named methods, where the time the filter executes is explicitly named "before" or "after".

    Finally, in Django, filter-like features are created through the Decorator pattern, where you explicitly wrap a controller action in another method that adds functionality:

    from django.contrib.auth.decorators import login_required
    
    def my_view(request):
        # ...
        pass
    my_view = login_required(my_view)

    Instead of specifying when the filter is executed, it's up to the implementer of the decorator to determine when to call the underlying action.  This gives quite a bit of flexibility to the designers of the decorator, but takes it away from the developer, as they can no longer specify when the filter should execute.
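    To make that trade-off concrete, here is a sketch of a home-grown decorator in plain Python.  It isn't Django's login_required; require_role, the dictionary-style request and the string responses are all invented for illustration.  The decorator author alone decides what runs before and after the wrapped action:

```python
from functools import wraps

def require_role(role):
    """Build a decorator that only runs the view for users holding the given role."""
    def decorator(view):
        @wraps(view)
        def wrapper(request, *args, **kwargs):
            # "Before" logic: the decorator author put the check here.
            if role not in request.get("roles", ()):
                return "403 Forbidden"
            response = view(request, *args, **kwargs)
            # "After" logic: also fixed by the decorator author.
            return response + " [audited]"
        return wrapper
    return decorator

@require_role("Administrator")
def admin_dashboard(request):
    return "200 OK"
```

    Whoever applies @require_role gets both the before and after behavior as a package, with no way to ask for only one of them.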

    Intention-revealing interfaces

    In the end, the difference between a good and not-so-good framework can be found in its success in creating Intention-revealing interfaces.  If you can look at a component or class and infer its responsibility and purpose without exercising the component, chances are good that you have an intention-revealing interface.

    In the ASP.NET MVC filters, I had to guess which one executed before the action, as the word "before" was nowhere to be seen.  When guidelines directly contradict a clear, intention-revealing interface, it's safe to choose to not follow the guideline.  If we had an ActionFilterAttribute that had only "BeforeFilter" and "AfterFilter", I don't think the FDG police will come knocking at anyone's door.

  • Dear software tool vendors, RE: I'm breaking up with you

    Dear software tool vendors,

    Reading Chad's ReSharper love letter reminded me we need to talk.

    I'm breaking up with you.

    Your solutions seemed so enticing.  It seemed my excitement had no bounds, as I waited longingly for each press release and blog post detailing your new features.  How did I ever live without that AJAX-y web grid?  My life was so empty before you.

    You solved problems I didn't even know I had.  I didn't know I needed three different types of XML models to describe my data.  Since you are the experts of the domain of your tool, I trusted you to guide me in picking the tool.

    Straightforward problems of yesterday seem impossible today, given the volume of features you said you were delivering.  Because of my zeal to use the latest and greatest, I would pressure the business to use your pre-pre-pre alpha versions instead.

    I had no reason to doubt you would deliver on the promises of ludicrous increases in productivity, as there is a never-ending supply of supporters of your tool in the blogosphere.

    But I got wise.

    These bloggers, although not paid employees for you, received plenty of other pecuniary gains from their myopic praise and support.  They spoke at conferences, got better jobs, received awards, all for being an expert in your tool.

    Familiarity with your tool gives them no authority in the domain of your tool.  It only gives them authority in the subject of your tool.  Unless I can talk reasonably about ORM and database mapping strategies, I can only be an expert in NHibernate.

    I, like many others, misplaced and gave undue authority to these experts.  I did not see what they and your company put out as how-to's, best practices and examples for what it was: pure marketing.

    And now I'm better for it.

    I won't get excited over V.Next.  I won't follow releases intently.  I won't badger my co-workers, "have you checked out the latest Floogle release?"

    Because tools come and go.  Design and architecture values and principles last.

    Unless you solve an existing problem and match my values, keep on walking.  I'm not answering the door; you can peddle your wares elsewhere.  Invent problems for some other sucker.

    And tool vendors that I do decide to use?  I'll only recommend you after putting you through a serious gauntlet.  But don't expect me to be loyal.  As soon as something solves my problems better than you, I'll switch in a heartbeat.

    Ultimately, all that matters is that I give my employers the best return on their investment I can.  That means good design and a maintainable ecosystem.

    Look, it's not me, it's you.  And it feels so good to say goodbye.

    We may meet again in the future, but only on my terms and only after I've chucked your marketing message out the window.

    Sincerely,

    A liberated developer

    Posted Apr 08 2008, 07:27 PM by bogardj with 12 comment(s)
  • Reacting to change

    When dealing with the possibility of change in requirements in the middle of development, I've generally seen three reactions:

    • Explicitly reject the possibility
    • Ignore it completely, hope it goes away
    • Accept and embrace it

    Of these three, only two are valid reactions.  There's a time for explicit rejection of new requirements, even in Agile and iterative development.  Unfortunately, most of the places I've encountered fall into the middle bucket, ignoring new requirements or actively subverting those expressing them.

    Explicit rejection

    There's nothing wrong with explicit rejection of new requirements.  In iterative development, once the stories have been selected for an iteration, no new stories can be added to that iteration.  Stories are often split during planning, and sometimes new stories are surfaced during an iteration.  Agile teams have to be disciplined not to take on new work during an iteration, as this can jeopardize finishing the original stories selected.

    By keeping iterations short (1-2 weeks), business owners never feel too bad about new requirements (and eventually stories) being surfaced.  It's a natural part of the development process.  As software is delivered incrementally, assumptions are challenged and directions can change.  But not during an iteration, as the team needs stable direction towards a defined goal.

    Explicit rejection also occurs in phase-based development.  I'm moving away from the term "waterfall", as it's beginning to be a loaded term.  Instead, think hand-offs, sign-offs, and phases (planning, development, testing, delivery, etc.).  Changes in requirements are explicitly rejected during the development phase, but the realities of software development tend to force compromises in this area.  This can lead to contentious relationships between analysts and developers, as each tries to maneuver and politic to get their way.

    The way I've seen this go down is usually developers will assign very high estimates on items surfaced late in the game, so that those signing the checks can't justify the new features getting added.  Then it becomes a trick of the analysts to ask for 100 features all with "Must Have" priority, with the full knowledge that only 10 will actually get delivered.

    Ignore and hope

    Explicit rejection in phase-based processes leads to ignoring and hoping changes don't come in.  There are plenty of ways to ignore change requests, such as ignoring emails and dismissing changes out-of-hand.  Even explicit rejection is a form of ignoring, as it ignores the reality of software development.  As soon as a user sees the software for the first time, all presumptions of usability are thrown out the window.

    Teams hope for no change requests by filling their plate as much as possible, so even the slightest deviation from the plan would cause delivery delays.  If change requests do come in, I've seen software teams keep track of them, for ammunition in later "blamestorming" meetings.

    We can hope change requests won't happen by attempting to design as long as possible up front.  Elaborate UML diagrams create the illusion that if we design everything before our fingers hit the keyboard, we won't have to worry about analysts asking for change.  They had their chance, right?

    Again, phase-based approaches lead to contentious (and even bilious) relationships between owners of each phase.  By assuming that all decisions can be made in the design phase, it's little more than hoping that changes won't get requested during development.

    Accept and embrace

    In iterative and incremental development, such as XP, changes are accepted and even embraced.  Because all assumptions are shattered the moment the user uses the software for the first time, your process should take this into account.  By allowing regular feedback, changes can be introduced, implemented and scrutinized in a very short time-frame.

    It's not all wine and roses of course, as feedback only surfaces problems; it doesn't solve them.  But instead of ignoring or rejecting changes, you have a regular mechanism for dealing with them.  Our team deals with changes by writing every task or story on a card.  If it's on a card, it can be estimated and prioritized.

    Since we use cards, a change in a story takes as much effort as ripping up the incorrect story card and writing a new one.  The less friction our process adds to documenting change requests, the more the business will be encouraged to explore new ideas.  Nothing's worse than a hulking software requirements tool that stifles imagination through the tedium of managing lists of requirements.

    Flexibility and Control

    Since change is inevitable, we want to introduce a system that allows us to introduce change, as well as garner feedback on the effectiveness of these changes.  But we don't want too much change, as this will introduce unnecessary chaos and churn.

    The trick is to gather and respond to enough feedback, but not so much that the team is overwhelmed and unable to deliver.  We want to be flexible enough to handle feedback from a variety of sources, but keep control over when and how we receive it.

    So how does a team perfect flexibility and control?  With more feedback of course!  Through regular Scrum retrospectives, a team can reflect on how they're delivering, making small tweaks along the way.  In the end, we want a system that can handle and respond to change to maximize the business' return on investment.

    Posted Apr 07 2008, 08:11 PM by bogardj with 3 comment(s)
Copyright Los Techies 2007. All rights reserved.