Dealing with large code bases

If you haven’t already, be sure to read Steve Yegge’s post on his experience of dealing with large code bases (it’s a rather long post, but an excellent read).

He explains the maintenance nightmare of his 500,000-line game written in Java and quite correctly concludes that code size is a crucial problem for big software projects.

My minority opinion is that a mountain of code is the worst thing that can befall a person, a team, a company. I believe that code weight wrecks projects and companies, that it forces rewrites after a certain size, and that smart teams will do everything in their power to keep their code base from becoming a mountain. Tools or no tools. That’s what I believe.

I know this seems like too much at first, but it’s actually quite common for projects developed over a period of several years, especially when team members (even juniors) change regularly, the development pace is fast, and resources for project management are limited (aren’t they always?).

Steve then goes on to explain how, from his point of view, refactoring doesn’t help much in this case, since it only enlarges the code base and actually makes things worse. I don’t see it that way, since better organized code is definitely more readable and easier to maintain. The other problem I often see with refactoring is a team’s reluctance to do it, for two reasons: laziness and fear. Both are psychological and related to the “if it ain’t broke, don’t fix it” approach. So the team sees a part of the system as “not ideal” but working, and usually dedicates its resources to “new features” and customer demands.

The “fear” part means that people (in general) are not eager to take responsibility for risky modifications that could break the whole system. I find this especially hard when the modifications involve the data layer (database schemas) and a huge amount of “live” data. This kind of “refactoring” imposes a big risk on the current system: you have one chance to get it right, it can be very time consuming, and if something goes wrong it’s hard to go back (if that is possible at all). Badly designed database schemas tend to live in systems for a long time, so be sure to put extra effort into making them as good as possible in the first place.

In the end, Steve decided to reimplement the whole application using Rhino (JavaScript for the JVM) and to try to keep the code base under 200,000 lines of code (which, by some theories, is the upper bound of what one developer can handle).

So taking for granted today that VMs are “good”, and acknowledging that my game is pretty heavily tied to the JVM – not just for the extensive libraries and monitoring tools, but also for more subtle architectural decisions like the threading and memory models – the rational answer to code bloat is to use another JVM language.

One nice thing about JVM languages is that Java programmers can learn them pretty fast, because you get all the libraries, monitoring tools and architectural decisions for free.

While I agree that JVM languages are one of the greatest advantages of the JVM as a development platform, I side with Ola Bini on this matter. He says in his post:

Now, if I would have tackled the same problem, I would never reimplement the whole application in Rhino – rather, it would be more interesting to try to find the obvious place where the system needs to be dynamic and split it there, keep those parts in Java and then implement the new functionality on top of the stable Java layer.

That’s exactly why I think dynamic languages, and the frameworks (APIs) for integrating them with Java, should be an important tool in every developer’s toolbox. There are places in your application where one programming approach makes more sense than the other, and the ability to combine them to get a better architecture for your project can only be a good thing. And of course, it would improve the maintainability of your projects as well, both in code size and in the ability to refactor and improve as you go.
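To make the idea a bit more concrete, here is a minimal sketch of a stable Java layer with one dynamic part plugged in through the standard javax.script API. The names PricingDemo, DiscountRule and the 10% discount script are made up for illustration; they come from neither Steve’s nor Ola’s post. Note also that a JavaScript engine was bundled with the JDK only from Java 6 to Java 14, so the sketch assumes one may be missing and falls back to plain Java in that case.

```java
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class PricingDemo {

    /** Stable Java layer: the contract the rest of the application codes against. */
    public interface DiscountRule {
        double apply(double price);
    }

    // Hypothetical business rule kept in a script; in a real system it would
    // more likely be loaded from a file so it can change without a rebuild.
    static final String SCRIPT = "function apply(price) { return price * 0.9; }";

    /** Returns the scripted rule if a JavaScript engine is available, else a Java fallback. */
    public static DiscountRule loadRule() {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (!(engine instanceof Invocable)) {
            // No usable engine on this JVM: same rule, written in plain Java.
            return price -> price * 0.9;
        }
        try {
            engine.eval(SCRIPT); // defines the top-level function apply(price)
        } catch (ScriptException e) {
            throw new IllegalStateException("Could not load the scripted rule", e);
        }
        final Invocable invocable = (Invocable) engine;
        return price -> {
            try {
                Object result = invocable.invokeFunction("apply", price);
                return ((Number) result).doubleValue();
            } catch (ScriptException | NoSuchMethodException e) {
                throw new IllegalStateException("Scripted rule failed", e);
            }
        };
    }

    public static void main(String[] args) {
        DiscountRule rule = loadRule();
        System.out.println("Discounted price: " + rule.apply(100.0));
    }
}
```

The rest of the application only ever sees DiscountRule, so the part that needs to be dynamic can stay small and the Java layer underneath it stays stable, which is roughly the split Ola describes.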
