Going Offline

26/06/2008

Recently there has been an increase in interest in building offline-capable web applications. I’ve faced the offline use case a few times in the past, so I thought I’d try to jot down some of my experiences from these excursions. I’ve dealt with offline use cases in the following domains:

In the data centre, enabling J2EE applications to deal with offline mainframes and other legacy data sources/sinks. These resources might be offline due to failure, scheduled maintenance, or because their capacity is consumed with running batch jobs.
In the branch network, where connectivity to the data centre is lost.
For mobile applications, where some degree of disconnectedness is more or less a given.

For many of us it’s hard to think of a world where our applications are not able to reach the data or systems they require to function. We live in a world where networks are almost omnipresent and their availability and bandwidth are ever increasing. But the reality is that connectivity is not always present, and Murphy’s law tells us that it will most likely be lost when we need it the most. Businesses must manage this risk, and in many cases that means that when building networked applications they must also build functionality that lets those applications function in the face of diminished or absent network connectivity. Given that most applications today are web applications, there is consequently an interest in enabling applications deployed on the web to function when offline.

It might be helpful to classify applications a little in terms of how they use remote resources.

Class A

Some applications only allow a user to retrieve data from a remote system; they never allow the user to modify the state of the remote system. Naturally these read-only applications are the easiest to enable to function while offline: it is a matter of pre-caching the remote data locally so it is still available if connectivity fails. The challenge here is to determine what subset of data on the remote system is appropriate to store locally. If the application contains sensitive data then an additional challenge is to apply proper authentication and authorization while offline, to ensure that data is only accessible to those with the correct authorizations. This can be a thorny problem: it may require the business rules for authentication and authorization to be available locally, which in turn requires the authorization metadata (users, credentials, roles, privileges) to be locally available.

Class A applications are quite rare; most applications enable their users to create and update data in the system. Once users are able to update data, the challenges of providing an application that functions offline increase. I would classify applications that allow users to update data into two subclasses: firstly (Class B) those which allow updates that have no side effects, and secondly (Class C) those which allow updates that do have side effects. An example of the former might be a simple HR database that allows users to change the values of fields in records, create new records and delete existing ones. An example of the latter might be an e-commerce system where changing a single field value might trigger a complex workflow that results in the value of many other fields in the system changing.
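Returning to the read-only Class A case for a moment, the pre-caching idea might be sketched roughly as follows. This is only an illustration under stated assumptions: `OfflineReadCache` and `fetch_remote` are hypothetical names, the remote call is assumed to raise `ConnectionError` when connectivity is lost, and the question of offline authorization is ignored entirely.

```python
from typing import Callable, Dict, Iterable, Optional


class OfflineReadCache:
    """Read-only access that falls back to a local copy when the remote is unreachable.

    Hypothetical sketch: fetch_remote stands in for whatever call reaches the
    remote system and is assumed to raise ConnectionError when offline.
    """

    def __init__(self, fetch_remote: Callable[[str], str]) -> None:
        self._fetch_remote = fetch_remote
        self._local: Dict[str, str] = {}

    def prefetch(self, keys: Iterable[str]) -> None:
        """Pre-cache the chosen subset of remote data while we are still online."""
        for key in keys:
            self._local[key] = self._fetch_remote(key)

    def get(self, key: str) -> Optional[str]:
        """Prefer fresh remote data; serve the cached copy if the remote call fails."""
        try:
            value = self._fetch_remote(key)
        except ConnectionError:
            # Offline: return the cached value, or None if it was never cached.
            return self._local.get(key)
        self._local[key] = value
        return value
```

Deciding which keys to pass to `prefetch` is exactly the "what subset of data" question raised above; the sketch does nothing to answer it.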

Class B

So to take a Class B application offline we just need to enable the state of data records cached locally to be changed, and additionally create a queue of requests representing these local changes to be sent to the server whenever connectivity is restored, right? Of course it’s not that simple; the challenges raised are manifold.

Firstly, since we are now allowing users to input data, we need to validate that input; all the business rules around data validation need to be available locally.

Secondly, when dealing with the creation of new records we may have to ensure that primary keys are globally unique. When connected to a central server, the server decides on the primary key and can thus guarantee uniqueness. When offline, disconnected nodes have to choose a key themselves, so how can they be sure they won’t choose the same key as another node? There are many ways to deal with this problem. Partitioning the key space is one solution. Another is for the local node to choose a key unique to that node and exchange it for a globally unique key generated by the server when the request is finally accepted by the server. Another is to use UUIDs.

Thirdly, we have to deal with concurrency. If a node makes a change to a record while offline and another connected node makes a change to the same record before the offline node reconnects, what happens to the offline change? Should the server attempt to merge the offline change or reject the change out of hand? How should the application inform the user that the change they made (likely some time ago) has been rejected? What UI paradigms do we provide to help users see their rejected changes and resolve the conflict?
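To make the key and concurrency questions a little more concrete, here is a minimal sketch that queues changes locally, uses UUIDs for new records, and has the server reject any change based on a stale version. All of the names (`Change`, `Server`, `OfflineClient`) are hypothetical, the server is an in-memory stand-in, and rejecting outright is just one of the policies discussed above; no merging is attempted.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Change:
    record_id: str
    base_version: int          # server version the client last saw (0 for a new record)
    fields: Dict[str, str]


@dataclass
class Server:
    """In-memory stand-in for the central server: record_id -> (version, fields)."""
    records: Dict[str, Tuple[int, Dict[str, str]]] = field(default_factory=dict)

    def apply(self, change: Change) -> bool:
        current_version, current = self.records.get(change.record_id, (0, {}))
        if current_version != change.base_version:
            return False       # another node changed the record first: reject
        merged = {**current, **change.fields}
        self.records[change.record_id] = (current_version + 1, merged)
        return True


@dataclass
class OfflineClient:
    outbox: List[Change] = field(default_factory=list)   # queued local changes

    def create(self, fields: Dict[str, str]) -> str:
        # A random UUID lets a disconnected node pick a key without coordination.
        record_id = str(uuid.uuid4())
        self.outbox.append(Change(record_id, 0, dict(fields)))
        return record_id

    def update(self, record_id: str, base_version: int, fields: Dict[str, str]) -> None:
        self.outbox.append(Change(record_id, base_version, dict(fields)))

    def sync(self, server: Server) -> List[Change]:
        """Replay queued changes in order; return those the server rejected."""
        rejected = [change for change in self.outbox if not server.apply(change)]
        self.outbox.clear()
        return rejected
```

The list returned by `sync` is where the user-interface questions begin: something has to show those rejected changes to the user and let them decide what to do next.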

Class C

To take a Class C application offline we are faced with the prospect of having to make a significant portion of the business logic found on the server available locally, so that the application continues to function correctly while offline. In addition, since there is likely to be only a subset of the data and business logic available (and of course no access to any legacy systems that the application might need to interact with), it is very likely that the business logic will need to be enhanced to deal with missing data and to otherwise take account of the system’s disconnectedness.

One key variable that can have a large impact on offline-enabled applications of this kind is time. Imagine a banking transaction that attracts interest, performed offline. What time do we calculate interest from: the time the customer performed the transaction, or the time the offline request was received by the remote server? Naturally the former would be the correct time, but if a system was only ever designed with online operation in mind then it’s likely that the latter is what will actually happen. The key takeaway is that offline enabling applications can (and likely will) require changes to the server as well as the client.
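The interest example can be sketched by having each request carry both timestamps, so the server can calculate from the time the customer acted rather than the time the request arrived. This is an illustration only: `Deposit`, `simple_interest` and the simple daily interest formula are assumptions made for the sketch, and it quietly assumes the client’s clock can be trusted, which a real system would have to verify.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class Deposit:
    amount: Decimal
    performed_at: datetime   # when the customer acted, recorded on the client
    received_at: datetime    # when the server finally received the queued request


def simple_interest(deposit: Deposit, as_of: datetime, annual_rate: Decimal) -> Decimal:
    """Simple (non-compounding) interest counted from performed_at, not received_at."""
    days = (as_of - deposit.performed_at).days
    interest = deposit.amount * annual_rate * days / Decimal(365)
    return interest.quantize(Decimal("0.01"))
```

A server designed only for online use would typically have just one timestamp, the one it assigns on receipt, which is why this kind of change reaches into the server’s data model as well as the client.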

Engineering Challenges

All of the challenges presented above are well understood and there are means to deal with them all, but one challenge I feel is under-appreciated is the impact that offline enabling an application has on the engineering life-cycle. Any feature that you enable to work offline will likely require double the effort. Your design will have to take into account how it can operate in the face of limited data and limited processing capability. This will likely lead to two largely separate code paths, one for online operation (which will run on the server) and one for offline operation (which will run on the client). Naturally you will need to test both these paths, so your testing effort is doubled. In fact it will more than double, because to get full coverage you’ll also need to test integration scenarios where some processing is done offline and some is done online. Every time during the lifetime of the application that a defect is uncovered in the feature, or an enhancement is added to it, the whole cycle will need to be repeated for both the online and offline paths.

One approach to mitigating this risk would be to move all of the business logic processing to the client and only have the server perform long-term storage of data and manage concurrency; this means the business logic does not have to be built twice. Effectively this turns a Class C application into a Class B application. However, this can only work when the server can trust the client not to send bogus or incorrect data.

Closing Thoughts

At a high level, offline enabling a web application seems a reasonably straightforward challenge: a matter of caching data locally and storing requests locally until the server can be reached sometime in the future. As with many programming problems, however, this is just the tip of the iceberg; beneath the surface lie many interesting and challenging problems to be conquered.

In the face of all these challenges it is easy to throw one’s hands up in despair and choose not to address any of these issues, instead working harder to reduce the risk of disconnectedness. In many scenarios that is the sane approach: it may well be cheaper to invest in better connectivity than to expend the engineering effort to make the application robust in the face of disconnectedness. On the other hand, there are also many cases where it is not possible to improve connectivity, and so the effort has to be expended to make the application handle disconnection. Of course there is also the possibility of switching to a non-computer-based process when connectivity is lost. A pen and paper may well be adequate to deal with a period of disconnection.

I have focused on the challenges that offline enabling an application presents, but I would not wish to give the impression that building an offline-enabled application is an especially onerous or even a futile endeavour. In fact I think we are just at the beginning of an age where web applications do more and more processing on the client, and chief amongst that will be the processing required to enable offline operation. Emerging technologies such as Gears and the work being done by the HTML5 and WebApps working groups will lower the bar to creating these kinds of applications and drive innovation in this space.