The Transparent Persistence Paradox

Introduction

In the beginning, code was written sequentially. Developers spotted repeated sections of code and moved them into sub-routines; code was then written procedurally. A technique then emerged for combining methods with the data they work on; object-oriented programming was born. Object-oriented programming is now widely regarded as one of the best methods of writing code.

A lot of these programs stored information, and the relational database has become the de facto standard for doing so (largely due to vendor lock-in, rather than suitability).

Developers recognised that object-oriented programming techniques have an impedance mismatch with relational database technology, and started to produce inventive methods of combining the two. Most (if not all) of the design patterns used to implement this are discussed in [Fowler]. This problem is complex, and many frameworks/tools have emerged to help developers handle it: Jaxor, NEO, Hibernate, NHibernate, ... (the full list is getting quite long now).

This text makes use of the following references:

[Fowler] - "Patterns of Enterprise Application Architecture", Addison-Wesley, 2003 (a summary of the main patterns is available online at http://www.martinfowler.com/eaaCatalog/)
[Richardson] - "Speeding Up J2EE Development and Increasing Reusability Using a Two Level Domain Model"

The Ideal Target

[Fowler] describes the three main ways of coding domain logic as Domain Model, Table Module, and Transaction Script. The Domain Model is the object-oriented way of coding your application, and is the method that this text discusses.

Once you have a Domain Model, you want to be able to persist your objects. Again there are different ways of doing this; for example Active Record makes each object responsible for persisting itself. My preference is to use a Data Mapper, and this is the view that most of the 3rd party frameworks take. The primary reason for using a Data Mapper is to keep persistence logic out of the Domain Model.

The ideal target then is a 'pure' object-oriented Domain Model - that is, the Domain Model is ignorant of the fact that the objects are persisted. Most of the work in developing an object-oriented persistence framework is in making the persistence transparent.

Sitting on top of the Domain Model is often a Service Layer, but this should not contain any business logic or you risk going down the road of an Anemic Domain Model. Instead the Service Layer performs application logic such as logging, transaction management, exception handling, etc. The Service Layer is aware of object persistence.

Consider a simple Domain Model consisting of a Plant object (as in a manufacturing plant), and a Part object. A Plant contains a list of the Parts that it deals with.


class Plant
{
    PartList    m_partList = new PartList();

    void AddPart(Part part)
    {
        m_partList.Add(part);
    }
}


class Part
{
    String  m_number;
    String  m_description;

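    // Accessors for number and description
    public String Number      { get { return m_number; }      set { m_number = value; } }
    public String Description { get { return m_description; } set { m_description = value; } }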
}

Incidentally, I am a C# programmer, so the examples are in C# - but you should hardly (if at all) notice the difference between this and Java. I do not know of a .NET or C# equivalent term, so until someone comes up with another name, the domain objects I refer to are POJOs!

The Service Layer function for adding a Part to a Plant might look like:


class PlantService
{
    void AddPart(   Plant   plant,
                    Part    part)
    {
        PersistenceLayer storage = new PersistenceLayer();
        storage.BeginSession();

        Plant storedPlant = storage.LoadPlant(plant);
        storedPlant.AddPart(part);

        storage.Commit();
    }
}

This code explicitly loads the persisted version of the Plant object to make changes to it. It is possible to have all of your Domain Model resident in memory, but this is rarely practical. Instead the persistence layer allows you to load up a stored object using a Data Mapper. Again, this is the approach taken by most of the 3rd party persistence frameworks.

I am omitting all the detail of the work that is done for you behind the scenes. Patterns like Unit of Work and Lazy Load described in [Fowler] would be used to magically load up connected parts of the Domain Model as they are accessed, and persist the changes to the objects. Equally, this code would look similar whether you use a homegrown persistence layer, or one of the developing frameworks mentioned above.
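To make the Lazy Load idea a little more concrete, here is a minimal sketch under some assumptions. LazyPartList and its loader delegate are hypothetical names invented for illustration (the delegate stands in for whatever Data Mapper query a real framework would run); this is not the API of any framework mentioned above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class Part
{
    public string Number { get; set; }
}

// Hypothetical Lazy Load list: the loader delegate stands in for a
// Data Mapper query and runs only when the list is first touched.
class LazyPartList : IEnumerable<Part>
{
    private readonly Func<List<Part>> m_loader;
    private List<Part> m_parts;   // null until first access

    public LazyPartList(Func<List<Part>> loader)
    {
        m_loader = loader;
    }

    public bool IsLoaded
    {
        get { return m_parts != null; }
    }

    // Every member goes through this property, so the first use of
    // the list (whatever it is) triggers the full load.
    private List<Part> Parts
    {
        get
        {
            if (m_parts == null)
            {
                m_parts = m_loader();   // the expensive database hit
            }
            return m_parts;
        }
    }

    public int Count
    {
        get { return Parts.Count; }
    }

    public void Add(Part part)
    {
        Parts.Add(part);
    }

    public IEnumerator<Part> GetEnumerator()
    {
        return Parts.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

A domain class holding such a list never sees the loader, which is what keeps the persistence transparent. Note that in this naive version even Add or Count pulls every Part into memory.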

This all looks simple so far, but let's add a simple business rule.

A Simple Business Rule

As [Fowler] states, "at its worst business logic can be very complex", but let's consider a very simple business rule. The simple rule is that a part number should be unique within a Plant.

I have chosen a very simple rule, but any arbitrary business rule that compares an object with all others in a range would have done (e.g., only 20% of parts in a Plant are allowed to be made of wood). However, a common application of this rule that occurs in Enterprise systems is that entities are named, and the names have to be unique in the system (often for presentation purposes).

So now our Plant code might look something like this:


class Plant
{
    PartList    m_partList = new PartList();

    PartList GetPartsWithNumber(String number)
    {
        PartList filteredList = new PartList();
        foreach (Part part in m_partList)
        {
            if (part.Number == number)
            {
                filteredList.Add(part);
            }
        }
        return filteredList;
    }

    void AddPart(Part part)
    {
        if (GetPartsWithNumber(part.Number).Count > 0)
        {
            throw new ApplicationException("Part is a duplicate");
        }

        m_partList.Add(part);
    }
}

That was fairly painless, and it works fine. Unfortunately, although the code is correct, there is an obvious scalability problem.

If we know that there may be a lot (hundreds, thousands, millions) of Parts in a Plant, then the above code might cause a huge performance problem. A typical persistence framework will use a Lazy Load to fetch the whole Parts list the first time it is accessed, and pulling every Part from the database just to check one part number is unacceptable.

There is an obvious way to solve this problem. Databases and persistence frameworks allow us to write customised code to aggregate the stored information. For example, we may be able to use the following SQL to determine how many Parts exist with the same part number:


SELECT COUNT(1)
FROM PARTS
WHERE PLANT_ID = '%1'
AND PART_NUMBER = '%2';

A persistence layer may also provide a more abstract object-based query that can be performed, without explicit SQL knowledge. But both of these solutions, while they work, are fundamentally flawed, since we started with the premise that the Domain Model must not be aware of persistence.

It seems as though we have a paradox. The obvious way to solve the problem is to access the database (or persistence layer) directly, but our initial design principle was that the Domain Model would not know anything about persistence. The Domain Model, however, seems to be exactly the layer that needs to know about persistence. Although the persistence is transparent, the developer needs to know exactly what is going on behind the scenes, and suddenly it is no longer transparent!

The following sections describe three solutions to this dilemma.

Solution 1 - Using "Separated Interface"

[Fowler] states "on occasion you may need the domain objects to invoke find methods on the Data Mapper", and goes on to say "you don't want to add a dependency from your domain objects to your Data Mapper. You can solve this dilemma by using Separated Interface."

[Richardson] demonstrates an implementation of the Separated Interface by way of a 'domain manager'.

The code for the GetPartsWithNumber function could then use a factory to get the appropriate implementation.


class Plant
    ...
    PartList GetPartsWithNumber(String number)
    {
        DomainManager domain = DomainManager.GetManager();
        return domain.GetPartsWithNumber(number);
    }

In the non-persistable version of events, the implementation of the domain.GetPartsWithNumber will look the same as the original. If the factory returns the persistable version, then the implementation might use a filter on the Data Mapper to only pull back the required objects.
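As a rough sketch of the non-persistable side, the following shows one way the pieces could fit together. The names IDomainManager, InMemoryDomainManager and SetManager are assumptions made for illustration and are not taken from [Fowler] or [Richardson]. The sketch also passes the Plant's own list to the manager as an extra argument, which the fragment above does not do, because an in-memory implementation needs something to search.

```csharp
using System;
using System.Collections.Generic;

class Part
{
    public string Number { get; set; }
}

// Separated Interface: declared alongside the Domain Model, so the
// domain code depends only on this abstraction.
interface IDomainManager
{
    List<Part> GetPartsWithNumber(List<Part> parts, string number);
}

// Non-persistable implementation: the same in-memory loop as before.
class InMemoryDomainManager : IDomainManager
{
    public List<Part> GetPartsWithNumber(List<Part> parts, string number)
    {
        List<Part> filtered = new List<Part>();
        foreach (Part part in parts)
        {
            if (part.Number == number)
            {
                filtered.Add(part);
            }
        }
        return filtered;
    }
}

// Hypothetical factory; a persistence layer would install its own
// implementation at start-up via SetManager.
static class DomainManager
{
    private static IDomainManager s_manager = new InMemoryDomainManager();

    public static IDomainManager GetManager()
    {
        return s_manager;
    }

    public static void SetManager(IDomainManager manager)
    {
        s_manager = manager;
    }
}

class Plant
{
    private readonly List<Part> m_partList = new List<Part>();

    public List<Part> GetPartsWithNumber(string number)
    {
        return DomainManager.GetManager().GetPartsWithNumber(m_partList, number);
    }

    public void AddPart(Part part)
    {
        if (GetPartsWithNumber(part.Number).Count > 0)
        {
            throw new ApplicationException("Part is a duplicate");
        }
        m_partList.Add(part);
    }
}
```

The Plant class compiles against the interface only; which implementation it gets at runtime is decided by whoever calls SetManager.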


class PersistableDomainManager
    ...
    PartList GetPartsWithNumber(String number)
    {
        return storage.LoadParts(new PartFilter("Number", FilterType.Equal, number));
    }

Solution 2 - Using Persistence Hooks

Persistence frameworks typically do some magic behind the scenes to make you think that your domain objects don't know anything about persistence. But in order to seamlessly Lazy Load your objects, they typically swap in a different object at runtime without you being aware. Some frameworks use things like Reflection to create objects dynamically at runtime, while others use code generation to generate the 'persistent version' of the domain objects.

In either case, it is possible to intercept the generated code and provide your own implementation. So the persistable version of the GetPartsWithNumber could have an implementation that uses the SQL above directly on the database, or a provided filter on the Data Mapper (as per the example in solution 1).


class PersistablePlant
    ...
    PartList GetPartsWithNumber(String number)
    {
        return storage.LoadParts(new PartFilter("Number", FilterType.Equal, number));
    }

It is worth pointing out that, although it looks to the Service Layer that we are using the 'pure' object-oriented Domain Model, the reality at runtime is that the real objects get invisibly replaced with the persistable ones.
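One way such a hook could work is plain subclassing, sketched below under stated assumptions. IPartStore and ListPartStore are hypothetical stand-ins for the Data Mapper (the latter just filters an in-memory list so the example is self-contained), and making the lookup virtual is my assumption about how a framework might leave room for the override.

```csharp
using System;
using System.Collections.Generic;

class Part
{
    public string Number { get; set; }
}

// 'Pure' domain class; the lookup is virtual so that a generated or
// hand-written subclass can replace it.
class Plant
{
    private readonly List<Part> m_partList = new List<Part>();

    public virtual List<Part> GetPartsWithNumber(string number)
    {
        return m_partList.FindAll(p => p.Number == number);
    }

    public void AddPart(Part part)
    {
        if (GetPartsWithNumber(part.Number).Count > 0)
        {
            throw new ApplicationException("Part is a duplicate");
        }
        m_partList.Add(part);
    }
}

// Hypothetical stand-in for the Data Mapper's filtered query.
interface IPartStore
{
    List<Part> LoadParts(string number);
}

// Fake store for the sketch: 'rows' plays the role of the PARTS table.
class ListPartStore : IPartStore
{
    private readonly List<Part> m_rows;

    public ListPartStore(List<Part> rows)
    {
        m_rows = rows;
    }

    public List<Part> LoadParts(string number)
    {
        return m_rows.FindAll(p => p.Number == number);
    }
}

// The hook: same public shape as Plant, but the lookup goes to storage.
class PersistablePlant : Plant
{
    private readonly IPartStore m_storage;

    public PersistablePlant(IPartStore storage)
    {
        m_storage = storage;
    }

    public override List<Part> GetPartsWithNumber(string number)
    {
        return m_storage.LoadParts(number);
    }
}
```

Code that holds a Plant reference cannot tell which class it has. One caveat of this simplified version: a Part added in memory but not yet committed is invisible to the store, so a real implementation would also have to consult the pending changes.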

I do not know the details of which frameworks support this - I am merely pointing out that it can be done. It could be a homegrown persistence framework you are working with, which gives you the freedom to implement this yourself.

Solution 3 - An Intelligent Lazy Load

It is possible to make the above code a little more generic, by refactoring the filtering of the list into the base class for all list objects. If all lists support basic filtering functionality, then the GetPartsWithNumber might look like the following:


class Plant
    ...
    PartList GetPartsWithNumber(String number)
    {
        return m_partList.Filter(new PartFilter("Number", FilterType.Equal, number));
    }

We can then make our Lazy Load a little more intelligent. In the same way that the m_partList member variable has been replaced with a 'persistable' version without us knowing, the same could be true of the filtered list.

It is then possible to implement the 'Count' method on the generic persistable list to perform the custom SQL without having to load all the Parts into memory.
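The idea can be sketched as follows, with the caveat that every name here is hypothetical: IPartStore stands in for the persistence layer, RecordingPartStore is a fake that only counts calls, and LazyFilteredPartList plays the role of the object that a persistable Filter call might return. It is a sketch of the technique, not the behaviour of any particular framework.

```csharp
using System;
using System.Collections.Generic;

class Part
{
    public string Number { get; set; }
}

// Hypothetical storage API: one cheap aggregate, one expensive load.
interface IPartStore
{
    int CountParts(string number);          // e.g. SELECT COUNT(1) ...
    List<Part> LoadParts(string number);    // e.g. SELECT ... all columns
}

// Fake store that records which kind of query was issued.
class RecordingPartStore : IPartStore
{
    private readonly List<Part> m_rows;
    public int CountCalls;
    public int LoadCalls;

    public RecordingPartStore(List<Part> rows)
    {
        m_rows = rows;
    }

    public int CountParts(string number)
    {
        CountCalls++;
        return m_rows.FindAll(p => p.Number == number).Count;
    }

    public List<Part> LoadParts(string number)
    {
        LoadCalls++;
        return m_rows.FindAll(p => p.Number == number);
    }
}

// Filtered list that defers work until the caller shows what it needs.
class LazyFilteredPartList
{
    private readonly IPartStore m_storage;
    private readonly string m_number;
    private List<Part> m_parts;   // stays null unless items are requested

    public LazyFilteredPartList(IPartStore storage, string number)
    {
        m_storage = storage;
        m_number = number;
    }

    // Count is answered by the aggregate query if nothing is loaded yet.
    public int Count
    {
        get
        {
            return m_parts != null ? m_parts.Count : m_storage.CountParts(m_number);
        }
    }

    // Touching the items themselves falls back to a real load.
    public List<Part> Items
    {
        get
        {
            if (m_parts == null)
            {
                m_parts = m_storage.LoadParts(m_number);
            }
            return m_parts;
        }
    }
}
```

With this shape, the duplicate check in AddPart only ever reads Count, so it would issue the aggregate query and never load a Part, while the domain code itself stays unchanged.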

Summary

I do not like the first 2 solutions, since they rely on the developer to produce two implementations of the Domain Model code. This gives them double the chance to introduce a bug. Moreover, I believe the maintainability of the system is reduced. If the business rule changes, then the developer might change one implementation, but be unaware of a second.

If a 3rd party framework is being used, and it doesn't support the 'hooks' described in solution 2, then solution 1 might be your only option.

I prefer solution 3 since the implementation of the domain logic is in one place. The key difference with solution 3 is that the persistence framework has the responsibility of determining that it only needs to find the number of parts with a specified part number. If the domain logic is only coded in one place, then the chance to introduce a bug is reduced and maintainability is increased.

I have expressed a preference for solution 3 above, but I do not know if this is well supported in the existing 3rd party frameworks. I would be very interested to hear how people writing real-world applications are solving the dilemma of transparent persistence in a Domain Model.

2 Comments:

At September 24, 2004 at 2:49 AM, Anonymous said...: Hmm, maybe you've managed to program around this problem, but what about if you want to find a domain object from another one, or to create a new unattached domain object from within a method of another one? The important thing here, imho, is where you draw the line between the service layer and the domain model layer ... you can easily put the duplicate check in the service layer and forget about the problem described ... Maybe the choice is between transparent persistence+anemic domain model or non-transparent persistence+richer domain model.

Br,
Deyan Petrov
At September 24, 2004 at 7:48 AM, Anonymous said...: While it doesn't speak to relational databases as an object store (directly) there's also the "pseudo-transparent" persistence of OODBMSs like, say, FastObjects. In this case, the developer writes pure domain classes, decorates them with relatively sparse metadata to describe persistence (I still consider this to be absent persistence logic) and then a post processor "rewrites" the classes at compile time to graft on the persistence logic. Transactions are provided by a service layer as described. Thus, at development time, the developer never needs to think about persistence logic to any degree, the service layer provides advanced querying capabilities, and even after the fact, you can still access all the members of the domain model using the original unprocessed semantics. The interesting part is that FastObjects does have an "alternate" back-end that will use a relational database as an object store.

The problem that I've seen with nearly every object store built on a relational database is that it pretty well ignores the whole reason you're using a relational database as an object store in the first place, that being that you have some lock in that prevents you from using some better solution. And yet, the whole idea of lock in is that you presumably have other software written against this datastore. And therein lies the problem. I know of no reasonably performant "RDBMS as Object Store" solution that isn't susceptible to problems if the data in the object store is changed "behind its back." So really the question that I was left with was "Why, if it doesn't really buy me anything, am I using O/R mapping?"

At any rate, I agree that persistence is much less useful if the persistence logic infects the domain model. My answer was to go to an OODBMS. I wish it hadn't had to be because it's been much more expensive cost-wise than the open source framework, but the speed of development and maturity of the Service layer has just blown me away.

Ian