As I was releasing a small patch to our application, which uses POI, I noticed a bunch of errors at startup. Of course, I immediately assumed I had changed more than I thought and began combing through the code to see why I was getting this error. The stack trace pointed into POI's own code, which had hit an index out of range on something clearly unrelated to anything I had changed in our code in a LONG time. So why was this error happening, and why now?
After searching the web and coming up empty, I looked more carefully and realized that the data structure involved is statically initialized on demand. That is, a static function checks on each call whether the data structure has been built and, if not, builds it then. That's nice, but in our multithreaded application two threads were trying to build an Excel spreadsheet at the same time, so one of them blew up, probably because it hit this static initialization while it was still in progress.
The code looks as follows:
    public static List getBuiltinFormats()
    {
        if ( builtinFormats == null )
        {
            populateBuiltinFormats();
        }
        return builtinFormats;
    }
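So we have a static function that lazily initializes builtinFormats. The function populateBuiltinFormats is itself synchronized, but that only means two threads cannot run it at the same time. The null check in getBuiltinFormats is not synchronized, so if the list has been assigned but is only partially populated, a second thread will see a non-null builtinFormats and think it is complete!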
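To make the failure mode concrete, here is a minimal sketch of the same pattern. I am assuming that the populate function assigns the static field first and fills it afterwards; this is not POI's actual code. The class LazyInitRace, its three format strings, and the two latches are all made up for illustration. The latches just pause the builder halfway so that the bad interleaving happens every time instead of once in a blue moon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the library class; not POI's real code.
class LazyInitRace {
    private static List<String> formats;

    // Illustration-only hooks that freeze populate() halfway through.
    private static final CountDownLatch halfBuilt = new CountDownLatch(1);
    private static final CountDownLatch resume = new CountDownLatch(1);

    static List<String> getFormats() {
        if (formats == null) {        // this check is NOT synchronized
            populate();
        }
        return formats;
    }

    private static synchronized void populate() {
        formats = new ArrayList<>();  // assumption: field is assigned before it is filled
        formats.add("General");
        halfBuilt.countDown();        // the list is now visible but incomplete
        await(resume);                // wait until the second caller has looked
        formats.add("0");
        formats.add("0.00");
    }

    // Returns the list size a second caller sees while the first is still building.
    static int observePartialSize() {
        Thread builder = new Thread(LazyInitRace::getFormats);
        builder.start();
        await(halfBuilt);
        int seen = getFormats().size();  // non-null field, so populate() is skipped
        resume.countDown();
        try {
            builder.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The first call to LazyInitRace.observePartialSize() returns 1, even though the finished list has three entries: the second caller was handed the list while it was still being built.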
Since this is third-party code, the simplest solution is to force the initialization of this static data structure at startup, in my main, before any worker threads are started. So I invoke HSSFDataFormat.getBuiltinFormats(); and voila, the issue is solved.
As a Spring evangelist, I would point out that if you use Spring to manage functionality like this, you can populate the data in an init method that Spring calls, so it is ready before anything uses it. You could also just create it statically and unconditionally. Otherwise, you would need to build the internal data structure in a temporary variable and assign it to the static field only when it is complete. Of course, that means a second check inside populateBuiltinFormats to see whether the data was already built by the time you got in, or some other way of ensuring the function runs only once.
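Here is a minimal sketch of that last option, again with a made-up class (SafeFormats) and made-up contents rather than POI's real code. The list is built in a local variable and assigned to the static field only when complete, and the populate function checks again once it holds the lock. I have also marked the field volatile, which is my own addition: without it, the Java memory model does not guarantee that a thread skipping the lock sees the finished list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of safe lazy initialization; not POI's real code.
class SafeFormats {
    // volatile: once assigned, the fully built list is visible to every thread
    private static volatile List<String> formats;

    static List<String> getFormats() {
        List<String> local = formats;   // read the field once
        if (local == null) {
            local = populate();
        }
        return local;
    }

    private static synchronized List<String> populate() {
        // Second check: another thread may have finished while we waited for the lock.
        if (formats != null) {
            return formats;
        }
        List<String> tmp = new ArrayList<>();   // build off to the side
        tmp.add("General");
        tmp.add("0");
        tmp.add("0.00");
        // Publish only when complete, and read-only so nobody can modify it later.
        formats = Collections.unmodifiableList(tmp);
        return formats;
    }
}
```

With this shape a caller either gets null from the field and goes through the lock, or gets the finished list; there is no window in which a half-built list is reachable. If lazy loading is not actually needed, initializing the field in a static initializer is simpler still.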