averaging gridded NetCDF files

MattiasKNMI
averaging gridded NetCDF files

Hey!

I want to take the mean of the daily-mean bias stored in gridded NetCDF files to produce monthly or yearly means.

I tried the following for the example of yearly mean. The daily mean files are numbered per day and are all in the same directory.

cis eval bias:dailymean_* 'numpy.mean(bias)' 1 -o yearlymean.nc

It reads the correct files and chooses the right product, but gives the following error:

ERROR - An error occurred retrieving data using the product NetCDF_Gridded. Check that this is the correct product plugin for your chosen data. Exception was InvalidVariableError: failed to merge into a single cube.

It seems that CIS has trouble merging the identical cubes from the different files, and I don't understand why this is the case.

Thanks in advance!
Mattias

duncanwp
Re: averaging gridded NetCDF files

Hi Mattias!

Hmm, OK. I think what might be going on here is that the Python library we use for handling gridded datasets (Iris) is a bit reluctant to merge datasets with differing attributes. This can lead to some unexpected behaviour: http://scitools.org.uk/iris/docs/latest/userguide/merge_and_concat.html#....

For a couple of the models I work with I've created specific plugins which remove attributes which might cause this to happen (the 'history' attribute in particular). So this might be a solution.

Something as simple as this should work:


class IgnoreHistoryPlugin(NetCDF_Gridded):
    """A simple plugin which implements a callback to pass to iris when reading multiple files to allow correct merging"""

    @staticmethod
    def load_multiple_files_callback(cube, field, filename):
        cube.attributes.pop('history')
        return cube

Otherwise you could use one of the NCO tools to remove the offending attributes.

I hope that helps, let me know if not!

duncanwp
Re: averaging gridded NetCDF files

OK, the form has eaten the whitespace in my code, but hopefully it's readable!

MattiasKNMI

It is readable!

I just checked the documentation on plugins, because I have not had to make any yet. I am a bit confused, though, about exactly how to implement the code you gave me. I understand what the code does, but not how I can use it from the command line.

When I make a py file with this code within the dir I'm working from and then fill in product=IgnoreHistoryPlugin, how does CIS know where to look?

Mattias

duncanwp

Great, you're nearly there. You just need to set the CIS_PLUGIN_HOME environment variable as described here: http://cis.readthedocs.io/en/stable/plugin_development.html#using-and-te.... I'll try and make that section a bit more prominent in the future!

I've just realised you'll probably have to import the built-in plugin by adding this at the top of your Python file:
"from cis.data_io.products import NetCDF_Gridded"

CIS should be on your path when the plugin is read, wherever you put the file.

Let me know how you get on.
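
Putting the class from the earlier reply together with the import mentioned here, a minimal sketch of the whole plugin file might look like the following. The file name is hypothetical. The try/except fallback to a plain base class is an assumption added only so the callback can be exercised on a machine without CIS installed, and pop('history', None) is a small change from the original snippet so that a file without a history attribute does not raise a KeyError.

```python
# ignore_history_plugin.py  (hypothetical file name; it would live in the
# folder that CIS_PLUGIN_HOME points to)
try:
    from cis.data_io.products import NetCDF_Gridded
except ImportError:
    # Assumption for dry runs only: without CIS installed, fall back to a
    # plain base class so the callback below can still be tested.
    NetCDF_Gridded = object


class IgnoreHistoryPlugin(NetCDF_Gridded):
    """Plugin whose load callback drops 'history' so the files can be merged."""

    @staticmethod
    def load_multiple_files_callback(cube, field, filename):
        # Passing a default avoids a KeyError when 'history' is absent.
        cube.attributes.pop('history', None)
        return cube
```

The product name given on the command line is then the class name, i.e. product=IgnoreHistoryPlugin, not the name of the file.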

MattiasKNMI

I tried two things to fix the problem and unfortunately neither worked.

1) I made a plugin in Python with the code you gave me.
In my bashrc I added the statement export CIS_PLUGIN_HOME=[location of plugin].
I then ran cis eval bias:dailymean_*:product=IgnoreHistoryPlugin 'numpy.mean(bias)' 1 -o yearlymean.nc on the command line.
I got the following error: Product cannot be found for given file. (followed by all the available products)
So it still does not find the plugin, even though I gave it a path to it.

2) I tried removing the history attribute manually with ncatted. This did the trick of removing the value of history, but when I ran the eval code again cis raised an error about the time variable:

An error occurred retrieving data using the product NetCDF_Gridded. Check that this is the correct product plugin for your chosen data. Exception was InvalidVariableError: failed to merge into a single cube.
Coordinates in cube.dim_coords differ: time..
2016-06-07 11:50:10,172 - ERROR - An error occurred retrieving data using the product NetCDF_Gridded. Check that this is the correct product plugin for your chosen data. Exception was InvalidVariableError: failed to merge into a single cube.
Coordinates in cube.dim_coords differ: time.. - check cis.log for details

I don't understand why it tells me that the dimensions are different. All of the files were created by aggregating the ungridded data into gridded data with a total collapse of the time variable.

Thanks for helping me out with this
Mattias

duncanwp

Hi Mattias, thanks for the update, and sorry you're having this difficulty!

1) That's a little odd, does CIS list your product as one of the available ones? Are you sure you're using the name of the class as your product name - not the name of the file it's in?

2) Again, that does sound unusual. Have you looked at the attributes of the time variable in the NetCDF file using ncdump or similar? They must all have the same attributes across the files. If CIS made them from an aggregation then this should be the case...

My only other suggestion is that you create the monthly and yearly means directly from the ungridded data by specifying a time period. That might be easier?

Hope that helps!
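
As a sketch of the check described in point 2, the helper below compares attribute dictionaries from several files and reports which attribute names disagree. The function name, file names and values are hypothetical, and it assumes the attributes have already been read into plain dicts (for example by hand from ncdump -h output).

```python
def differing_attributes(attrs_by_file):
    """Return {attribute name: {file: value}} for attributes that differ.

    attrs_by_file maps a file name to a dict of attribute name -> value.
    An attribute that is missing from some file counts as differing.
    """
    missing = object()  # sentinel, distinct from any real attribute value
    names = set()
    for attrs in attrs_by_file.values():
        names.update(attrs)

    report = {}
    for name in sorted(names):
        values = {f: attrs.get(name, missing) for f, attrs in attrs_by_file.items()}
        first = next(iter(values.values()))
        if any(v is not first and v != first for v in values.values()):
            report[name] = {
                f: ("<missing>" if v is missing else v) for f, v in values.items()
            }
    return report


# Hypothetical attribute values for the time variable of two daily files.
example = {
    "dailymean_001.nc": {"units": "days since 1600-01-01", "calendar": "gregorian"},
    "dailymean_002.nc": {"units": "days since 1600-01-01", "calendar": "standard"},
}
print(differing_attributes(example))
```

An empty result means the attributes themselves agree, in which case the difference must lie elsewhere, for example in the coordinate values.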

MattiasKNMI

1) No, it doesn't show it. I wrote the class statement like you suggested: class IgnoreHistoryPlugin(NetCDF_Gridded):

2) The attributes of time and their values are the same, but I don't see the cube.dim_coords anywhere in ncdump.

I forgot to mention that aggregating directly from the ungridded data is what I tried in the first instance. This doesn't work either. If, for instance, I want to make a mean from the first 9 days, cis gives the following error:
ERROR - An error occurred retrieving data using the product cis. Check that this is the correct product plugin for your chosen data. Exception was IOError: Too many open files. - check cis.log for details
So it seems that cis can't handle trying to aggregate many files all at once. This is the reason I went for trying to first make the daily means and then work my way further.

duncanwp

1) OK, and the CIS_PLUGIN_HOME environment variable is definitely pointing to the folder which contains the plugin - not the plugin itself?

Ah OK - the "Too many open files" error is actually not a CIS issue, it's an operating system limit. Depending on which platform you're on, you should be able to increase the limit; see e.g. https://easyengine.io/tutorials/linux/increase-open-files-limit/ or http://krypted.com/mac-os-x/maximum-files-in-mac-os-x/
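
The limit in question can also be inspected from Python itself. This is a minimal sketch, assuming a Unix-like platform where the standard-library resource module is available; the helper names and the reserved allowance of 16 descriptors are illustrative guesses, not part of CIS.

```python
import resource  # Unix-only standard-library module


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_within_limit(n_files, soft_limit, reserved=16):
    """True if n_files can be open at once while leaving `reserved`
    descriptors for stdin/stdout, log files and the like."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return n_files + reserved <= soft_limit


soft, hard = open_file_limits()
print("soft limit:", soft, "hard limit:", hard)
```

The soft limit is the one that is enforced; an unprivileged process can raise it only as far as the hard limit.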

MattiasKNMI

It is pointing to the folder and not to the plugin Python file.
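
A quick way to double-check the folder-versus-file distinction is to look at the variable from Python. This is a minimal sketch; the function name is hypothetical, and it assumes only that plugins are ordinary .py files placed in the folder named by CIS_PLUGIN_HOME.

```python
import os


def describe_plugin_home(environ=os.environ):
    """Report what CIS_PLUGIN_HOME points to and which .py files sit there."""
    path = environ.get("CIS_PLUGIN_HOME")
    if not path:
        return "CIS_PLUGIN_HOME is not set"
    if os.path.isfile(path):
        return "CIS_PLUGIN_HOME points to a file, not a folder: " + path
    if not os.path.isdir(path):
        return "CIS_PLUGIN_HOME does not exist: " + path
    modules = sorted(f for f in os.listdir(path) if f.endswith(".py"))
    return "folder %s contains: %s" % (path, ", ".join(modules) or "no .py files")


print(describe_plugin_home())
```

Note that a variable exported in ~/.bashrc only takes effect in shells started afterwards, or after running source ~/.bashrc in the current one.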

Oh, I didn't think of this. I'm sorry... I have to ask the IT guys to have the limit raised. I really hope this will work

MattiasKNMI

Little update from the IT department here: the limit on the number of files that can be open at the same time is 4096. This is a hard limit, so they won't change it.

Also, I was wondering why it is necessary for CIS to have all 120000 files open at the same time for the operation. Isn't it possible for CIS to read in 1 file, close it, and do this for all the files?

duncanwp

Hmm OK.

This is actually a design decision: because we only read the data when it is actually needed, we keep a handle to each file open. This allows users to read only parts of a file at a time, which is useful for really large NetCDF files.

I'll have a think about how we might change this when CIS is given thousands of files rather than a few very large ones. In the meantime, are you able to split the command into a number of smaller ones?

Otherwise we'll try and get your plugin working: Could you email me a copy of your plugin file and a screenshot of the environment variable and the directory it's in? Just so I can rule things out! (duncan.watson-parris at physics.ox.ac.uk)

Thanks for persevering!

MattiasKNMI

I have sent you an email with details.

What do you mean by splitting the command into smaller ones?
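
One possible reading of "splitting the command into smaller ones": process the files in batches that each stay under the open-file limit, keep a mean and a sample count per batch, then combine the batch means weighted by their counts. This is a minimal sketch in plain Python; the function names, the batch size and the file names are hypothetical, and the step that actually computes each batch mean (with CIS or another tool) is deliberately left out.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def combine_means(partials):
    """Combine (mean, count) pairs into the mean over all samples."""
    total = sum(count for _, count in partials)
    if total == 0:
        raise ValueError("no samples to combine")
    return sum(mean * count for mean, count in partials) / total


# Hypothetical file names: 10 files processed at most 4 at a time.
files = ["dailymean_%03d.nc" % day for day in range(1, 11)]
for group in batches(files, 4):
    print(len(group), "files:", group[0], "...", group[-1])
```

Weighting by count matters: averaging the batch means directly is only correct if every batch holds the same number of samples. For gridded data with missing values, the count would have to be kept per grid cell.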

MattiasKNMI

I worked my way around the problem by using CDO. So I solved the original problem, but unfortunately not yet with CIS.
