Create a dataset to mimic ARM file formats

This example shows how to create an empty dataset from an ARM Data Object Design (DOD). Users can then populate it and write files that mimic ARM files, which makes the data easier to use across the community.

Author: Adam Theisen

<xarray.Dataset> Size: 3MB
Dimensions:                            (time: 1440, drop_diameter: 50)
Coordinates:
  * time                               (time) int64 12kB 0 1 2 ... 1438 1439
  * drop_diameter                      (drop_diameter) int64 400B 0 1 ... 48 49
Data variables: (12/36)
    base_time                          (time) float64 12kB -9.999e+03 ... -9....
    time_offset                        (time) float64 12kB -9.999e+03 ... -9....
    num_drops                          (time, drop_diameter) float64 576kB -9...
    qc_num_drops                       (time, drop_diameter) float64 576kB -9...
    num_density                        (time, drop_diameter) float64 576kB -9...
    qc_num_density                     (time, drop_diameter) float64 576kB -9...
    ...                                 ...
    moment5                            (time) float64 12kB -9.999e+03 ... -9....
    moment6                            (time) float64 12kB -9.999e+03 ... -9....
    radar_reflectivity                 (time) float64 12kB -9.999e+03 ... -9....
    lat                                (time) float64 12kB -9.999e+03 ... -9....
    lon                                (time) float64 12kB -9.999e+03 ... -9....
    alt                                (time) float64 12kB -9.999e+03 ... -9....
Attributes: (12/23)
    command_line:
    process_version:
    dod_version:
    site_id:
    facility_id:
    input_source:
    ...                      ...
    qc_bit_3_description:    Value is greater than the valid_max.
    qc_bit_3_assessment:     Bad
    qc_bit_4_description:    Difference between current and previous values e...
    qc_bit_4_assessment:     Indeterminate
    datastream:
    history:
{'long_name': 'Number of drops per bin', 'units': 'unitless', 'valid_min': '0', 'missing_value': '-9999'}
{'command_line': '', 'process_version': '', 'dod_version': '', 'site_id': '', 'facility_id': '', 'input_source': '', 'resolution_description': 'The resolution field attributes refer to the number of significant digits relative to the decimal point that should be used in calculations. Using fewer digits might result in greater uncertainty. Using a larger number of digits should have no effect and thus is unnecessary. However, analysis based on differences in values with a larger number of significant digits than indicated could lead to erroneous results or misleading scientific conclusions.\n\nresolution for lat = 0.001\nresolution for lon = 0.001\nresolution for alt = 1', 'sampling_interval': '1 minute', 'serial_number': '', 'bin_width': '0.2 mm', 'qc_standards_version': '1.0', 'qc_method': 'Standard Mentor QC', 'qc_comment': "The QC field values are a bit packed representation of true/false values for the tests that may have been performed. A QC value of zero means that none of the tests performed on the value failed.\n\nThe QC field values make use of the internal binary format to store the results of the individual QC tests. This allows the representation of multiple QC states in a single value. If the test associated with a particular bit fails the bit is turned on. Turning on the bit equates to adding the integer value of the failed test to the current value of the field. The QC field's value can be interpreted by applying bit logic using bitwise operators, or by examining the QC value's integer representation. A QC field's integer representation is the sum of the individual integer values of the failed tests. 
The bit and integer equivalents for the first 5 bits are listed below:\n\nbit_1 = 00000001 = 0x01 = 2^0 = 1\nbit_2 = 00000010 = 0x02 = 2^1 = 2\nbit_3 = 00000100 = 0x04 = 2^2 = 4\nbit_4 = 00001000 = 0x08 = 2^3 = 8\nbit_5 = 00010000 = 0x10 = 2^4 = 16", 'qc_bit_1_description': 'Value is equal to missing_value.', 'qc_bit_1_assessment': 'Bad', 'qc_bit_2_description': 'Value is less than the valid_min.', 'qc_bit_2_assessment': 'Bad', 'qc_bit_3_description': 'Value is greater than the valid_max.', 'qc_bit_3_assessment': 'Bad', 'qc_bit_4_description': 'Difference between current and previous values exceeds valid_delta.', 'qc_bit_4_assessment': 'Indeterminate', 'datastream': '', 'history': ''}
command_line python  plot_create_arm_ds.py
process_version 1.2.3
history Processed with Jupyter Workbench
random 1234253sdgfadf
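
The qc_comment attribute printed above describes the QC fields as bit-packed: each failed test turns on one bit, and the stored integer is the sum of the values of the failed tests. As a minimal sketch of that bit logic in plain Python (the helper names `decode_qc` and `encode_qc` are hypothetical and not part of ACT; the bit descriptions are copied from the qc_bit_N attributes above):

```python
# Bit number -> (description, assessment), copied from the qc_bit_N_*
# global attributes printed above.
QC_BITS = {
    1: ('Value is equal to missing_value.', 'Bad'),
    2: ('Value is less than the valid_min.', 'Bad'),
    3: ('Value is greater than the valid_max.', 'Bad'),
    4: ('Difference between current and previous values exceeds valid_delta.',
        'Indeterminate'),
}


def decode_qc(qc_value):
    """Return the bit numbers of the tests that failed for one QC value."""
    qc_value = int(qc_value)
    # Bit N carries the integer value 2**(N - 1), as listed in qc_comment.
    return [bit for bit in sorted(QC_BITS) if qc_value & (1 << (bit - 1))]


def encode_qc(failed_bits):
    """Return the integer QC value for a collection of failed bit numbers."""
    value = 0
    for bit in failed_bits:
        value |= 1 << (bit - 1)
    return value


# A QC value of 0 means no test failed; 6 = 2 + 4 means bits 2 and 3 failed.
for qc in (0, 1, 6, 10):
    failed = decode_qc(qc)
    print(qc, failed, [QC_BITS[bit][1] for bit in failed])
```

The same masks can be applied to a whole QC array with bitwise operators; the scalar version is shown only to keep the arithmetic easy to follow.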

import act

# Create an empty dataset using an ARM DOD
ds = act.io.arm.create_ds_from_arm_dod('vdis.b1', {'time': 1440}, scalar_fill_dim='time')

# Print out the xarray dataset to see that it holds only fill values (-9999)
print(ds)

# The dataset can be populated in a number of ways,
# and the choice is best left to the user.
# If one has an existing dataset, mapping its variable
# names onto the DOD names is often the easiest way

# Let's look at some variable attributes.
# These can be updated, but it is up to the
# user to ensure the matching tests are applied
# and set appropriately in the corresponding QC variable
print(ds['num_drops'].attrs)

# Next, let's print out the global attributes
print(ds.attrs)

# Add new attributes, or append to existing ones,
# using a dictionary
atts = {
    'command_line': 'python  plot_create_arm_ds.py',
    'process_version': '1.2.3',
    'history': 'Processed with Jupyter Workbench',
    'random': '1234253sdgfadf',
}
for a in atts:
    if a in ds.attrs:
        ds.attrs[a] += atts[a]
    else:
        ds.attrs[a] = atts[a]
    # Print out the attribute
    print(a, ds.attrs[a])

# Write data out to netcdf
ds.to_netcdf('./sgpvdisX1.b1.20230101.000000.nc')

# If one wants to clean up the dataset to better match CF standards
# the following can be done as well
ds.write.write_netcdf(cf_compliant=True, path='./sgpvdisX1.b1.20230101.000000.cf')
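
The attribute loop in the script appends with `+=`, which joins the new text directly onto whatever is already stored. That works here because the DOD attributes start out empty, but an attribute such as history that already holds text would run together with the new entry. A minimal sketch of an alternative that inserts a separator, using plain dictionaries (the helper name `merge_attrs`, the newline separator, and the sample values are assumptions for illustration, not part of ACT):

```python
def merge_attrs(existing, updates, sep='\n'):
    """Return a copy of ``existing`` with ``updates`` added or appended.

    Assumes all attribute values are strings. Empty or missing attributes
    are set directly; non-empty ones get ``sep`` before the new text.
    """
    merged = dict(existing)
    for name, value in updates.items():
        current = merged.get(name, '')
        merged[name] = f'{current}{sep}{value}' if current else value
    return merged


# Hypothetical starting attributes: one empty, one already populated.
attrs = {'command_line': '', 'history': 'created by ingest'}
updates = {
    'command_line': 'python plot_create_arm_ds.py',
    'history': 'Processed with Jupyter Workbench',
    'random': '1234253sdgfadf',
}

merged = merge_attrs(attrs, updates)
for name, value in merged.items():
    print(name, repr(value))
```

Because `ds.attrs` behaves like a dictionary, the result could be written back with `ds.attrs.update(merge_attrs(ds.attrs, atts))`. Whether a newline, a semicolon, or another separator suits the history attribute is a choice for the user.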

Total running time of the script: (0 minutes 1.894 seconds)
