Thursday, November 10, 2011

GHCN-M v3.1.0 – showing the value of engaging with software engineers


NCDC have just released version 3.1.0 of the Global Historical Climatology Network Monthly (GHCN-M), their global Land Surface Air Temperature product documented in the dataset paper. The release is detailed in a tech note and does two things.

Firstly, it incorporates an array processing algorithm that significantly speeds up processing. This will enable NCDC to process the much larger databank holdings, upon the databank's first version release in early-to-mid 2012, to form a yet more comprehensive estimate of the evolution of global Land Surface Air Temperature.

Secondly, and the focus of this post, it incorporates a set of five process bug fixes. Four of the bugs were discovered in the homogenization algorithm as a result of an effort undertaken by Daniel Rothenberg, sponsored by the Google Summer of Code and mentored by the Climate Code Foundation. The final bug was discovered by carefully checking for similar bugs, which essentially related to array compression / non-compression for missing values when passing arrays between routines. That bugs exist in several thousand lines of code is hardly surprising. In fact, it would have been far more surprising had no bugs been found. Daniel visited NCDC as part of his project, the bugs were discussed at length with relevant NCDC staff, and fixes have subsequently been implemented, extensively validated, and their impacts on the analysis documented.
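To make this class of bug concrete, the following is a minimal, purely illustrative Python sketch. It is not taken from the GHCN-M code: the function names, the sample series, and the use of -9999 as the missing-value sentinel are all assumptions chosen for illustration. It shows how a routine that receives a compressed array (missing values removed) can return the wrong position if it treats that array as though it were the full, uncompressed series.

```python
# Illustrative only: hypothetical names and data, not the GHCN-M implementation.
MISSING = -9999  # assumed sentinel marking a missing monthly value


def compress(series):
    """Drop missing values; return the kept values and their original indices."""
    kept = [(i, v) for i, v in enumerate(series) if v != MISSING]
    indices = [i for i, _ in kept]
    values = [v for _, v in kept]
    return values, indices


def position_of_max_buggy(values):
    """Bug: returns a position in the compressed array as if it indexed the full series."""
    return values.index(max(values))


def position_of_max_fixed(values, indices):
    """Fix: map the compressed position back to the original, uncompressed index."""
    return indices[values.index(max(values))]


series = [1.2, MISSING, 0.8, MISSING, 2.5, 1.9]
values, indices = compress(series)

buggy = position_of_max_buggy(values)           # position within the compressed array
fixed = position_of_max_fixed(values, indices)  # position within the original series

print(series[buggy], series[fixed])
```

In this sketch the buggy routine points at the wrong element of the original series because the two missing entries shifted every later position. Carrying the original indices alongside the compressed values is one way, among several, to keep the two representations consistent.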

The bottom-line impact on the global mean trend is a difference of less than 0.002 K/decade, which is below the typically quoted global mean estimate precision of 2 decimal places and two orders of magnitude less than the reported centennial-scale global-mean Land Surface Air Temperature warming rate from this dataset. Equally, global annual means show negligible differences. Differences at the station level are almost always below 0.2 K/decade, with effectively zero mean change. So, whilst the bug fixes were important from both a science and a process perspective, they do not significantly alter our current understanding of changes in climate at the largest spatial scales and longest timescales.

What this does provide is an example of the very real potential value of openness and transparency, of code replication, and of working in positive partnership to resolve the issues that arise. Daniel aims to continue working on his port of the algorithm to Python, and it will be of great interest to see what other benefits may accrue.

NCDC have released the old (v3.0.0) and new (v3.1.0) versions of the homogenized data (in frozen form) and other relevant metadata (with ongoing additions) at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/archives/.