Asynchronous job execution

In software development we are often encouraged to decouple things in order to achieve greater flexibility in our code. So we decouple model objects from views, interfaces from their implementations, and so on.

Here, I would like to write about a different kind of decoupling: the front end from the back end. In many cases, projects depend on resources that are either unreliable or overloaded. For example, besides your “main” database, you may need to write data to an overloaded LDAP server, or you may need to send data to a resource over an unreliable network connection. In most of these cases, the operations are triggered by users through the user interface (a web or stand-alone application). It is totally unnecessary (and usually unacceptable) to have your users wait a few minutes for an operation to complete or, in the worst case, to have a dysfunctional system just because one resource is unavailable.

Queuing is one of the most common “patterns” in information technology, and it can be used to solve the problems described above in a natural way. You can, for example, identify potentially unreliable and slow operations and implement a kind of “job queuing engine” that queues those jobs and executes them in the background. Of course, this is only applicable to actions that do not need to return data to the user immediately, which in most cases means insert/update operations of various kinds.
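To make the idea concrete, here is a minimal, self-contained sketch of the pattern in Java (not JQR code, just the general technique): the caller submits a job and returns immediately, while a background worker drains the queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A minimal illustration of the "job queuing" idea: callers enqueue work and
// return immediately; a single background worker executes jobs one by one.
public class SimpleJobQueue {

    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    public SimpleJobQueue() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    // Blocks until a job is available, then runs it.
                    queue.take().run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                } catch (RuntimeException e) {
                    // A failed job must not kill the worker; a real engine
                    // would retry or move the job to a "failed" queue here.
                    System.err.println("Job failed: " + e.getMessage());
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Called from the user-facing code path; returns immediately.
    public void submit(Runnable job) {
        queue.add(job);
    }
}
```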

In the past, I often used the Quartz scheduler to emulate this functionality, which gave me a solution that worked but was not easy to scale and maintain. For one of my recent projects I decided to “start fresh” and create an open-source engine (JQR) that will serve this purpose in my future projects (and could be helpful to others with the same requirements). I based it on the ActiveMQ open-source message broker, since it provides a natural mechanism for queuing messages, has many of the utilities I needed built in, and proved to be stable in earlier projects. It also provides a great infrastructure to build upon and the possibility of creating a truly distributed job execution engine.
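For the curious, enqueuing a message with ActiveMQ through the plain JMS API looks roughly like this. This is a generic JMS sketch, not JQR’s internal code; the broker URL, queue name and message contents are made up for illustration.

```java
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class EnqueueExample {
    public static void main(String[] args) throws Exception {
        // Connect to a local ActiveMQ broker (default port).
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            // "jobs.incoming" is just an illustrative queue name.
            Queue queue = session.createQueue("jobs.incoming");
            MessageProducer producer = session.createProducer(queue);

            // The message body carries the job parameters (here, plain text).
            TextMessage message = session.createTextMessage("userId=42;action=syncLdap");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}
```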

The basic principle of JQR is quite simple. You implement your job by defining code for four basic events (see the sketch after the list):

  • When a job is received for queuing
  • When a job is executed (the main logic of the job)
  • When a job finishes its execution successfully
  • When a job fails for any reason

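In Java terms, such a job could be shaped roughly like the interface below. This is an illustrative sketch only; the actual JQR interface and method names may differ.

```java
import java.util.Map;

// Illustrative shape of a job with the four lifecycle events described above.
// The method names are assumptions, not the actual JQR API.
public interface Job {

    // Called when the job is received and placed on the queue.
    void onQueued(Map<String, String> parameters);

    // The main logic of the job; throwing an exception signals failure.
    void execute(Map<String, String> parameters) throws Exception;

    // Called after execute() completes successfully.
    void onSuccess(Map<String, String> parameters);

    // Called when the job fails for any reason.
    void onFailure(Map<String, String> parameters, Exception cause);
}
```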
A job implemented like this is then deployed to the system (under its name and group) and can be triggered from any appropriate client (implemented in a language of your choice). Since the job is queued and executed in the background, the client does not wait for it to finish; instead, it can proceed with its other activities. When the job finishes (successfully or not), it executes the appropriate actions and can notify the client of its result (by calling some URL via HTTP, for example). Of course, the appropriate parameters are transferred between the client and JQR, which provides the context for the job execution.
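As an illustration, a fire-and-forget trigger over HTTP could look like the sketch below. The endpoint, parameter names and callback mechanism are assumptions made for this example, not JQR’s actual REST interface (the project currently ships a PHP client for this purpose).

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Fire-and-forget trigger: POST the job name, group and parameters to the
// engine, then move on. The URL and parameter names are hypothetical.
public class TriggerJobExample {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/jqr/jobs");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

        // Parameters provide the execution context; the callback URL is where
        // the engine could report the final result.
        String body = "name=ldapSync&group=default"
                + "&callbackUrl=http://localhost:8080/myapp/job-result"
                + "&userId=42";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        // We only check that the engine accepted the job; we do not wait for it.
        System.out.println("Engine responded with HTTP " + conn.getResponseCode());
    }
}
```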

You can also define the number of retries (job execution failures) allowed before the job is declared failed and sent to the “failed jobs queue”. The usual scenario is to send a notification to an administrator, who inspects the reasons for the job failure and, when the right conditions are met, triggers the job again or deletes it if that seems appropriate.
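Conceptually, the retry handling can be pictured like this simplified sketch; the retry limit and queue names are assumptions, not JQR’s actual configuration.

```java
// Simplified view of the retry policy described above: a failed job goes back
// to the incoming queue until a configured retry limit is reached, then it is
// moved to the "failed jobs" queue for an administrator to inspect.
public class RetryDecision {

    private static final int MAX_RETRIES = 3;   // illustrative limit

    /** Returns the queue a job should be sent to after a failure. */
    public static String queueAfterFailure(int failuresSoFar) {
        return failuresSoFar < MAX_RETRIES ? "jobs.incoming" : "jobs.failed";
    }

    public static void main(String[] args) {
        // The first two failures re-queue the job; the third gives up.
        for (int failures = 1; failures <= 3; failures++) {
            System.out.println("After failure " + failures + ": "
                    + queueAfterFailure(failures));
        }
    }
}
```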

Basically, that is the whole philosophy of JQR, which could be summarized as: do this job for me and just let me know when you finish. It is not applicable to every problem, but there are a lot of problem domains where this approach is a natural fit (just as in real life).

I must say that the project is currently in its “infant” phase, meaning that it implements only the minimum requirements I need at the moment. It contains a PHP client that communicates with the engine over REST and a few predefined jobs that can be used “off the shelf”. But the most important thing for this kind of project is to be easy to configure, maintain and use, and I think it is headed in the right direction so far. It is easy to set up the basic configuration and start/stop the engine, and it is fairly easy to develop and deploy new jobs for it.

In the future, there are a lot of things that should be added to make it easier to use and more useful. Some of the planned features are:

  • full support for most scripting languages on both the client and server (job writing) sides. This would make it easier to create new jobs and maintain existing ones, and it would make the engine useful for projects written in different languages.
  • more protocols, both for job manipulation and for global management, which would ease integration into different projects. Of course, a full-featured console for managing failed jobs should be an integral part of the project.
  • access control for all operations
  • job scheduling (both dynamic and static), which would take this engine to the next level, providing a full job manipulation engine for most scripting languages (and available instantly)
  • … and much more.

I intended this post to be just a starter for a conversation on this usually overlooked topic. I want to hear whether other people have similar requirements and what their solutions were. What should a project like this have? All constructive comments are welcome.
