Wednesday, August 25, 2010

Beware of Static Initializers (even in 3rd party code!)

As i was just releasing a small patch to our application that uses POI, i noticed a bunch of errors at startup. Of course, I immediately thought i changed more than I thought and began combing thru the code to see why i was getting this error. The stack trace was pointing to an error in the code of POI, which had a index out of range on something that clearly was not related to anything i had changed in our code in a LONG time. But why was this error happening and why now?

After searching the web and coming up empty on this error, I began looking more carefully and realized that the data structure at hand is one that is statically initialized on demand. What that means is that it is one of those static functions that when called, checks to see if the given data structure is built and if not, builds it then. Well, that’s nice but in our multithreaded application, we had two threads simultaneously trying to build an Excel spreadsheet so one of them got thrown out due to this static initialization that was probably in progress.

The code looks as follows:

public static List getBuiltinFormats()
    {
        if ( builtinFormats == null )
        {
            populateBuiltinFormats();
        }
        return builtinFormats;
    }

So we have a static function using the builtInFormats. The actual function populateBuiltinFormats is synchronized but that just means that it cannot be invoked simultaneously, but if is partially populated then the second thread will think it is complete!

Since this is third party code, the simplest solution is at startup of my main to force the initialization of this static data structure, so i invoke HSSFDataFormat.getBuiltinFormats(); and voila, the issue has been solved.

As a Spring evangelist, I would point out that if you use Spring for functionality like this, and then in an init function that Spring uses, you will have the data populated. You could also just have it statically created and not conditionally. Otherwise, you would need to build the internal data structure to a temporary variable and only assign it when complete. Of course, that will mean a second check in the function populateBuiltinFormats to see that if by the time you got in, the data is already built, or maybe only allow this function to run once.

Tuesday, August 17, 2010

Tips For Maintainable Code

Due to a need of late to maintain a lot of code (where of course not all of it has been written by me), I have been able to give a lot of thought to which tips I have read about in the past that strike me as the most important to have maintainable code. In many scenarios, companies simply rewrite their applications every few years and thus this issue may not be relevant but in my case where the lack of manpower makes this impossible, (and the application works so well there is no business justification for it), it has led me to think carefully how to set up a project and practices to have in place so that an application is maintainable and customizable in a reasonable way. Much of what i will be listing is based on the book Clean Code which I already blogged about previously here.

1. Keep functions short and simple – and if you need to add in more functionality, refactor it into a new function that clearly defines in its name what it does. In the book Clean Code, the insistence is that all functions do exactly one thing.

2. Building on number 1, use small functions as a way to comment “funny” things you need to do. Sometimes, we have small things we needed to do – set some parameter to some funny value or the like and the person who makes this change may or may not comment this, since at the time it was so obvious and it may not even make the comment for the check in, since many files were checked in simultaneously, but if you get in the habit of creating a small function, such “nullOutParameterSoThatRequestHandlesCorrectly”, this comment is forever part of the code and will not get lost. It also means that when someone comes along later and sees it he can be sure it is there on purpose and not something wonder – it seems to me that removing this line is ok but i am not sure…

3. Comprehensive Unit Tests. This is a great way to document your code since we can see the kind of input you expect to receive for a given function and how you expect to handle it. As has been written about extensively on the ‘net, when we see these unit tests, we have greater confidence to make changes since we have a baseline to evaluate our changes. One other advantage of the unit tests, is that if you added a new class with dependencies, and you are using Dependency Injection, like Spring, even if the code never made it to production we have a working example to build off for later use. This actually happened to me, where code was written and not released, but lacking an example of how to set up this class in my Spring application context led to my lack of confidence in using this class.

4. Regression Tests. While Unit Tests are imperative, So are regression Tests. These are automated end to end tests so that we have a baseline of how the application works, in dailsychaining multiple functions together, what data was passed from beginning to end and what was received. I have used TestNG to make up regression tests due to the amazing features of paramterized testing, enabling to set up use cases that run multiple times with different inputs. (I hope to write more about this in an upcoming blog)

5. use the Source Control and DONT save pruned code. Remove it. Don’t worry about it since you can get it from the source control. It just creates confusion.

Remember, that YOU may be the one left to maintain the code so its in your best interest to get it right when developing it…