.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "source/auto_examples/plotting/plot_groups.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_source_auto_examples_plotting_plot_groups.py:


Working with netCDF groups
--------------------------

This example shows how to work with netCDF groups.
Xarray's standard open calls do not read every group of a
netCDF file at once, but the groups can be read with a few
independent calls.

Author: Ken Kehoe

.. GENERATED FROM PYTHON SOURCE LINES 12-123



.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /source/auto_examples/plotting/images/sphx_glr_plot_groups_001.png
         :alt: ESRL ML scattering_coefficient on 20200101
         :srcset: /source/auto_examples/plotting/images/sphx_glr_plot_groups_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /source/auto_examples/plotting/images/sphx_glr_plot_groups_002.png
         :alt: ESRL ML Scatter Coefficient from 2020-01-01 to 2020-01-04
         :srcset: /source/auto_examples/plotting/images/sphx_glr_plot_groups_002.png
         :class: sphx-glr-multi-img



.. code-block:: Python


    import xarray as xr
    import matplotlib.pyplot as plt
    import numpy as np
    from arm_test_data import DATASETS
    from act.io.arm import read_arm_netcdf
    from act.plotting import TimeSeriesDisplay

    # This data file has a fairly complicated group organization. Each group
    # needs to be treated as a different netCDF file for reading. We can read each group
    # independently and merge into a single Dataset to use with standard Xarray or ACT methods.

    # Top Level:
    #    data
    #
    # Groups:
    #   - data
    #       - light_absorption
    #           - instrument
    #       - particle_concentration
    #           - instrument
    #       - light_scattering
    #           - instrument

    filename = DATASETS.fetch(
        'ESRL-GMD-AEROSOL_v1.0_HOUR_MLO_s20200101T000000Z_e20210101T000000Z_c20210214T053835Z.nc'
    )

    # We start by reading the location information from the top level of the netCDF file.
    # This is a standard Xarray call without a group keyword. Only the top level data is read.
    # The returned Dataset will contain all top level variables and top level global attributes.
    # We use the direct Xarray call to read the data because there is no time dimension at the
    # top level, so we don't need ACT to do anything for us.
    ds_location = xr.open_mfdataset(filename)

    # Second, we read the 'data' group. This has a time dimension so we can use ACT to manage
    # correctly reading and formatting that information. We need to specify the 'data' group
    # to be read. This will read the 'data' group but not its sub-groups.
    ds_data = read_arm_netcdf(filename, group='data')

    # Third, we read the 'light_scattering' sub-group. We can read sub-groups
    # by using standard Linux directory path notation. Notice the 'light_scattering' group is a
    # sub-group of 'data'.
    ds_ls = xr.open_mfdataset(filename, group='data/light_scattering')

    # We can also read the third-level group 'instrument'. It contains only a small amount of
    # information about the instrument used to collect the 'light_scattering' data and has no
    # dimensions, only scalar values. It is read here for illustration and is not merged below.
    ds_instrument = xr.open_mfdataset(filename, group='data/light_scattering/instrument')

    # Since the dimensionality aligns we can merge the three Datasets into a single Dataset.
    ds = xr.merge([ds_data, ds_ls, ds_location])

    # Since the data file contains data for a full year we can subset the Dataset to be easier
    # to work with and process faster.
    ds = ds.sel(time=slice('2020-01-01T00:00:00', '2020-01-04T23:59:59'))

    # The data is written with dimensionality in reverse order from what ACT expects. We need to
    # reverse the dimensionality order.
    ds = ds.transpose()

    # Since we want to plot the data from only one of the cut sizes, we can subset the Dataset
    # to only have data where the cut_size variable is set to a single value. This will reduce
    # the dimensionality of the variables from 3 to 2 or from 2 to 1. Take note that
    # the value of cut size is not the um value, it is the index. The index values
    # are 0 or 1, where 0 = 1 um and 1 = 10 um. We are subsetting for less than 10 um.
    ds = ds.isel(cut_size=1)

    # netCDF has a default _FillValue that is used to indicate when the values are missing. There
    # is a known issue with float precision that does not correctly set the _FillValue to NaN
    # when reading. We need to set the obviously incorrect values to NaN for the data variables
    # that can have this problem.
    for var_name in ds.data_vars:
        try:
            data = ds[var_name].values
            data[ds[var_name].values >= 9e36] = np.nan
            ds[var_name].values = data
        except np._core._exceptions._UFuncNoLoopError:
            # Non-numeric variables (for example strings) cannot be compared; skip them.
            pass

    # This is a quirk of reading the data. If we want to plot the day/night background correctly
    # we need to delete these global attributes.
    del ds.attrs['_file_dates']
    del ds.attrs['_file_times']

    # Since the data variable has a second dimension of wavelength and we want to plot each one as a
    # line plot, we will pass force_line_plot=True to force the second dimension to be removed from
    # the time series plot. We set the labels shown on the plot by extracting the values
    # from the wavelength dimension and passing them into the plotting call.
    labels = [f"{int(wl)} {ds['wavelength'].attrs['units']}" for wl in ds['wavelength'].values]
    display = TimeSeriesDisplay({'ESRL ML': ds})
    display.plot(
        'scattering_coefficient', day_night_background=True, force_line_plot=True, labels=labels
    )
    plt.show()

    # A second option is to extract the wavelength dimension from the variable and create a new
    # variable to plot. We are selecting the wavelength by index, so a value of 1 = 550 nm. We need
    # to provide a new variable name that is not already in use and that correctly describes the data.
    ds['scattering_coefficient_450'] = ds['scattering_coefficient'].isel(indexers={'wavelength': 0})
    ds['scattering_coefficient_550'] = ds['scattering_coefficient'].isel(indexers={'wavelength': 1})
    ds['scattering_coefficient_700'] = ds['scattering_coefficient'].isel(indexers={'wavelength': 2})

    display = TimeSeriesDisplay({'ESRL ML': ds})
    title = 'ESRL ML Scatter Coefficient from 2020-01-01 to 2020-01-04'
    display.plot('scattering_coefficient_450', label=labels[0], day_night_background=True)
    display.plot('scattering_coefficient_550', label=labels[1])
    display.plot('scattering_coefficient_700', label=labels[2], set_title=title)
    plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.093 seconds)


.. _sphx_glr_download_source_auto_examples_plotting_plot_groups.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_groups.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_groups.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_groups.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
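
The fill-value masking and index-based selection steps in the example above can be tried without the data file. The following is a minimal sketch, assuming a small in-memory Dataset with made-up values standing in for the merged file. The helper ``mask_fill_values`` and the ``station`` variable are hypothetical and are not part of ACT or Xarray. Instead of catching a NumPy exception, the sketch checks each variable's dtype and masks only floating-point variables with ``DataArray.where``.

```python
import numpy as np
import xarray as xr

FILL_THRESHOLD = 9e36  # same cutoff the example uses to spot unmasked fill values


def mask_fill_values(ds, threshold=FILL_THRESHOLD):
    """Return a copy of ds with float values >= threshold replaced by NaN.

    Hypothetical helper for illustration; not part of ACT or Xarray.
    """
    out = ds.copy()
    for name in list(out.data_vars):
        # Only floating-point variables can hold NaN; skip strings, integers, etc.
        if np.issubdtype(out[name].dtype, np.floating):
            # where() keeps values that satisfy the condition and sets the rest to NaN.
            out[name] = out[name].where(out[name] < threshold)
    return out


# Synthetic stand-in for the merged Dataset (all values are made up).
times = np.array(['2020-01-01T00', '2020-01-01T01'], dtype='datetime64[ns]')
scat = np.array(
    [
        [[1.0, 2.0], [3.0, 9.96921e36], [5.0, 6.0]],
        [[7.0, 8.0], [9.0, 10.0], [9.96921e36, 12.0]],
    ]
)  # shape: (time=2, wavelength=3, cut_size=2)
ds_demo = xr.Dataset(
    {
        'scattering_coefficient': (('time', 'wavelength', 'cut_size'), scat),
        'station': xr.DataArray('MLO'),  # non-numeric variable that must be skipped
    },
    coords={'time': times, 'wavelength': [450.0, 550.0, 700.0]},
)

masked = mask_fill_values(ds_demo)

# Select by index, as in the example: cut_size index 1 and wavelength index 1.
series_550 = masked['scattering_coefficient'].isel(cut_size=1, wavelength=1)
print(series_550.values)
```

Checking the dtype avoids relying on a private NumPy exception class, at the cost of skipping integer variables, which cannot store NaN anyway.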