Note
Go to the end to download the full example code.
Working with embedded quality control variablesΒΆ
This is an example of how to use existing or create new quality control varibles. All the tests are located in act/qc/qctests.py file but called under the qcfilter method.
result = {'test_number': 1, 'test_meaning': 'Value is set to missing_value.', 'test_assessment': 'Bad', 'qc_variable_name': 'qc_inst_up_long_dome_resist', 'variable_name': 'inst_up_long_dome_resist'}
Data type = <class 'numpy.ma.core.MaskedArray'>
data = [-- 7.877699851989746 7.896500110626221 ... -- -- --]
data filled with masked array fill_value = [ nan 7.8777 7.8965 ... nan nan nan]
Data type = <class 'numpy.ndarray'>
data = [ nan 7.8777 7.8965 ... nan nan nan]
QC Variable = <xarray.DataArray 'qc_inst_up_long_dome_resist' (time: 4320)> Size: 17kB
array([1, 0, 0, ..., 2, 2, 2], dtype=int32)
Coordinates:
* time (time) datetime64[ns] 35kB 2019-06-01 ... 2019-06-01T23:59:40
Attributes:
long_name: Quality check results on field: Instantaneous Upwellin...
units: 1
flag_masks: [1, 2, 4, 16]
flag_meanings: ['Value is set to missing_value.', 'Data value less th...
flag_assessments: ['Bad', 'Bad', 'Suspect', 'Suspect']
standard_name: quality_flag
fail_min: 7.8
warn_max: 12.0
mask : test
-----------
1 : [1]
2 : [2]
4 : [3]
16 : [5]
Normal numpy array data values: [ nan 7.8777 7.8965 ... 7.6705 7.6892 7.6892]
Mask associated with values: [False False False ... False False False]
At least one less than test set = True
At least one difference test set = False
from arm_test_data import DATASETS
import numpy as np
from act.io.arm import read_arm_netcdf
from act.qc.qcfilter import parse_bit
# Read a data file that does not have any embedded quality control
# variables. This data comes from the example dataset within ACT.
# Can also read data that has existing quality control variables
# and add, manipulate or use those variables the same.
filename_irt = DATASETS.fetch('sgpirt25m20sC1.a0.20190601.000000.cdf')
ds = read_arm_netcdf(filename_irt)
# The name of the data variable we wish to work with
var_name = 'inst_up_long_dome_resist'
# Since there is no embedded quality control varible one will be
# created for us.
# We can start with adding where the data are set to missing value.
# First we will change the first value to NaN to simulate where
# a missing value exist in the data file.
data = ds[var_name].values
data[0] = np.nan
ds[var_name].values = data
# Add a test for where the data are set to missing value.
# Since a quality control variable does not exist in the file
# one will be created as part of adding this test.
result = ds.qcfilter.add_missing_value_test(var_name)
# The returned dictionary will contain the information added to the
# quality control varible for direct use now. Or the information
# can be looked up later for use.
print('\nresult =', result)
# We can add a second test where data is less than a specified value.
result = ds.qcfilter.add_less_test(var_name, 7.8)
# Next we add a test to indicate where a value is greater than
# or equal to a specified number. We also set the assessement
# to a user defined word. The default assessment is "Bad".
result = ds.qcfilter.add_greater_equal_test(var_name, 12, test_assessment='Suspect')
# We can now get the data as a numpy masked array with a mask set
# where the third test we added (greater than or equal to) using
# the result dictionary to get the test number created for us.
data = ds.qcfilter.get_masked_data(var_name, rm_tests=result['test_number'])
print('\nData type =', type(data))
# Or we can get the masked array for all tests that use the assessment
# set to "Bad".
data = ds.qcfilter.get_masked_data(var_name, rm_assessments=['Bad'])
# If we prefer to mask all data for both Bad or Suspect we can list
# as many assessments as needed
data = ds.qcfilter.get_masked_data(var_name, rm_assessments=['Suspect', 'Bad'])
print('\ndata =', data)
# We can convert the masked array into numpy array and choose the fill value.
data = data.filled(fill_value=np.nan)
print('\ndata filled with masked array fill_value =', data)
# We can create our own test by creating an array of indexes of where
# we want the test to be set and call the method to create our own test.
# We can allow the method to pick the test number (next available)
# or set the test number we wan to use. This example uses test number
# 5 to demonstrate how not all tests need to be used in order.
data = ds.qcfilter.get_masked_data(var_name)
diff = np.diff(data)
max_difference = 0.04
data = np.ma.masked_greater(diff, max_difference)
index = np.where(data.mask is True)[0]
result = ds.qcfilter.add_test(
var_name,
index=index,
test_meaning=f'Difference is greater than {max_difference}',
test_assessment='Suspect',
test_number=5,
)
# If we prefer to work with numpy arrays directly we can return the
# data array converted to a numpy array with masked values set
# to NaN. Here we are requesting both Suspect and Bad data be masked.
data = ds.qcfilter.get_masked_data(
var_name, rm_assessments=['Suspect', 'Bad'], return_nan_array=True
)
print('\nData type =', type(data))
print('data =', data)
# We can see how the quality control data is stored and what assessments,
# or test descriptions are set. Some of the tests have also added attributes to
# store the test limit values.
qc_variable = ds[result['qc_variable_name']]
print('\nQC Variable =', qc_variable)
# The test numbers are not the flag_masks numbers. The flag masks numbers
# are bit-paked numbers used to store what bit is set. To see the test
# numbers we can unpack the bits.
print('\nmask : test')
print('-' * 11)
for mask in qc_variable.attrs['flag_masks']:
print(mask, ' : ', parse_bit(mask))
# We can also just use the get_masked_data() method to get data
# the same as using ".values" method on the xarray dataset. If we don't
# request any tests or assessments to mask the returned masked array
# will not have any mask set. The returned value is a numpy masked array
# where the raw numpy array is accessable with .data property.
data = ds.qcfilter.get_masked_data(var_name)
print('\nNormal numpy array data values:', data.data)
print('Mask associated with values:', data.mask)
# We can use the get_masked_data() method to return a masked array
# where the test is set in the quality control varialbe, and use the
# masked array method to see if any of the values have the test set.
data = ds.qcfilter.get_masked_data(var_name, rm_tests=3)
print('\nAt least one less than test set =', data.mask.any())
data = ds.qcfilter.get_masked_data(var_name, rm_tests=4)
print('At least one difference test set =', data.mask.any())
Total running time of the script: (0 minutes 0.048 seconds)