Correlated Dataset

The Core Scientific Dataset Model (CSDM) also support multiple dependent variables that share the same coordinate grid. We call the dependent variables from these datasets as correlated datasets.

In this section, we go over a few examples.

Meteorological, 2D{1,1,2,1,1} dataset

The following dataset is obtained from NOAA/NCEP Global Forecast System (GFS) Atmospheric Model and subsequently converted to the CSD model file-format. The dataset consists of two spatial dimensions describing the geographical coordinates of the earth surface and five dependent variables with 1) surface temperature, 2) air temperature at 2 m, 3) relative humidity, 4) air pressure at sea level as the four scalar quantity_type dependent variable, and 5) wind velocity as the two-component vector, quantity type dependent variable.

Let’s import the csdmpy module and load this dataset.

>>> import csdmpy as cp

>>> filename = 'Test Files/correlatedDataset/forecast/NCEI.csdfe'
>>> multi_dataset = cp.load(filename)

Let’s get the tuple of dimension and dependent variable objects from multi_dataset instance.

>>> x = multi_dataset.dimensions
>>> y = multi_dataset.dependent_variables

The dataset contains two dimension objects representing the longitude and latitude of the earth’s surface. The respective dimensions are labeled as

>>> x[0].label
'longitude'

>>> x[1].label
'latitude'

There are a total of five dependent variables stored in this dataset. The first dependent variable is the surface air temperature. The data structure of this dependent variable is

>>> print(y[0].data_structure)
{
  "type": "internal",
  "description": "The label 'tmpsfc' is the standard attribute name for 'surface air temperature'.",
  "name": "Surface temperature",
  "unit": "K",
  "quantity_name": "temperature",
  "numeric_type": "float64",
  "quantity_type": "scalar",
  "component_labels": [
    "tmpsfc - surface air temperature"
  ],
  "components": [
    [
      "292.8152160644531, 293.0152282714844, ..., 301.8152160644531, 303.8152160644531"
    ]
  ]
}

If you have followed all previous examples, the above data structure should be self-explanatory. The following snippet plots a dependent variable of scalar quantity_type.

Tip

Plotting a scalar intensity plot

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from mpl_toolkits.axes_grid1 import make_axes_locatable

>>> def plot_scalar(yx):
...     fig, ax = plt.subplots(1,1, figsize=(6,3))
...
...     # Set the extents of the image plot.
...     extent = [x[0].coordinates[0].value, x[0].coordinates[-1].value,
...               x[1].coordinates[0].value, x[1].coordinates[-1].value]
...
...     # Add the image plot.
...     im = ax.imshow(yx.components[0], origin='lower', extent=extent,
...                    cmap='coolwarm')
...
...     # Add a colorbar.
...     divider = make_axes_locatable(ax)
...     cax = divider.append_axes("right", size="5%", pad=0.05)
...     cbar = fig.colorbar(im, cax)
...     cbar.ax.set_ylabel(yx.axis_label[0])
...
...     # Set up the axes label and figure title.
...     ax.set_xlabel(x[0].axis_label)
...     ax.set_ylabel(x[1].axis_label)
...     ax.set_title(yx.name)
...
...     # Set up the grid lines.
...     ax.grid(color='k', linestyle='--', linewidth=0.5)
...
...     plt.tight_layout(pad=0, w_pad=0, h_pad=0)
...     plt.show()

Now to plot the data from the dependent variable.

>>> plot_scalar(y[0])
../../_images/NCEI.csdfeSurfacetemperature.png

Similarly, other dependent variables with their respective plots are

>>> y[1].name
'Air temperature at 2m'
>>> plot_scalar(y[1])
../../_images/NCEI.csdfeAirtemperatureat2m.png
>>> y[3].name
'Relative humidity'
>>> plot_scalar(y[3])
../../_images/NCEI.csdfeRelativehumidity.png
>>> y[4].name
'Air pressure at sea level'
>>> plot_scalar(y[4])
../../_images/NCEI.csdfeAirpressureatsealevel.png

Notice, we didn’t plot the dependent variable at index 2. This is because this particular dependent variable is a vector dataset representing the wind velocity.

>>> y[2].quantity_type
'vector_2'
>>> y[2].name
'Wind velocity'

To visualize the vector data, we use matplotlib streamline plot.

Tip

Plotting a vector quiver plot

>>> def plot_vector(yx):
...     fig, ax = plt.subplots(1,1, figsize=(6,3))
...     X, Y = np.meshgrid(x[0].coordinates, x[1].coordinates)
...     magnitude = np.sqrt(yx.components[0]**2 + yx.components[1]**2)
...
...     cf = ax.quiver(x[0].coordinates, x[1].coordinates,
...                    yx.components[0], yx.components[1],
...                    magnitude, pivot ='middle', cmap='inferno')
...     divider = make_axes_locatable(ax)
...     cax = divider.append_axes("right", size="5%", pad=0.05)
...     cbar = fig.colorbar(cf, cax)
...     cbar.ax.set_ylabel(yx.name+' / '+str(yx.unit))
...
...     ax.set_xlim([x[0].coordinates[0].value, x[0].coordinates[-1].value])
...     ax.set_ylim([x[1].coordinates[0].value, x[1].coordinates[-1].value])
...
...     # Set axes labels and figure title.
...     ax.set_xlabel(x[0].axis_label)
...     ax.set_ylabel(x[1].axis_label)
...     ax.set_title(yx.name)
...
...     # Set grid lines.
...     ax.grid(color='gray', linestyle='--', linewidth=0.5)
...
...     plt.tight_layout(pad=0, w_pad=0, h_pad=0)
...     plt.show()
>>> plot_vector(y[2])
../../_images/NCEI.csdfeWindvelocity.png

Astronomy, 2D{1,1,1} dataset (Creating image composition)

More often the image in astronomy is a composition of datasets measured at a different wavelength over an area of the sky. Here, we show an example of an image composition using the data from the Eagle Nebula. Import the csdmpy model and load the dataset.

>>> import csdmpy as cp
>>> import matplotlib.pyplot as plt

>>> filename = 'Test Files/EagleNebula/eagleNebula.csdfe'
>>> eagle_nebula = cp.load(filename)

Let’s get the tuple of dimension and dependent variable objects from eagle_nebula instance.

>>> x = eagle_nebula.dimensions
>>> y = eagle_nebula.dependent_variables

Before we create an image composition, let’s take a look at the individual dependent variables from the dataset. The three dependent variables correspond to signal acquisition at 502 nm, 656 nm, and 673 nm, respectively. This information is also listed in the name attribute of the respective dependent variable instances,

>>> y[0].name
'Eagle Nebula acquired @ 502 nm'
>>> y[1].name
'Eagle Nebula acquired @ 656 nm'
>>> y[2].name
'Eagle Nebula acquired @ 673 nm'

Tip

A script to plot an intensity plot.

>>> import matplotlib.pyplot as plt
>>> from mpl_toolkits.axes_grid1 import make_axes_locatable
>>> from matplotlib.colors import LogNorm

>>> def plot_scalar(yx):
...     plt.figure(figsize=(6,4.5))
...
...     # Set the extents of the image plot.
...     extent = [x[0].coordinates[0].value, x[0].coordinates[-1].value,
...               x[1].coordinates[0].value, x[1].coordinates[-1].value]
...
...     # Add the image plot.
...     y0 = yx.components[0]
...     y0 = y0/y0.max()
...     im = plt.imshow(y0, origin='lower', extent=extent, cmap='bone', vmax=0.1)
...
...     # Add a colorbar.
...     divider = make_axes_locatable(plt.gca())
...     cax = divider.append_axes("right", size="5%", pad=0.05)
...     cbar = plt.gca().figure.colorbar(im, cax)
...     cbar.ax.set_ylabel(yx.axis_label[0])
...
...     # Set up the axes label and figure title.
...     plt.xlabel(x[0].axis_label)
...     plt.ylabel(x[1].axis_label)
...     plt.title(yx.name)
...
...     # Set up the grid lines.
...     plt.grid(color='k', linestyle='--', linewidth=0.5)
...
...     plt.tight_layout(pad=0, w_pad=0, h_pad=0)
...     plt.show()

Let’s plot the dependent variables, first dependent variable,

>>> plot_scalar(y[0])
../../_images/eagleNebula.csdfeEagleNebulaacquired@502nm.png

second dependent variable, and

>>> plot_scalar(y[1])
../../_images/eagleNebula.csdfeEagleNebulaacquired@656nm.png

the third dependent variable.

>>> plot_scalar(y[2])
../../_images/eagleNebula.csdfeEagleNebulaacquired@673nm.png

Image composition

In our image composition, we will assign the dependent variable at index 0 as the blue channel, at index 1 as the green channel, and index 2 as the red channel of an RGB image. First, create an empty array to hold the RGB dataset.

>>> shape = y[0].components[0].shape + (3,)
>>> image = np.empty(shape, dtype=np.float64)

Here, image is a variable we use for storing the composition. Let’s add the respective dependent variables to the designated color channel in the image array,

>>> image[...,0] = y[2].components[0]/y[2].components[0].max() # red channel
>>> image[...,1] = y[1].components[0]/y[1].components[0].max() # green channel
>>> image[...,2] = y[0].components[0]/y[0].components[0].max() # blue channel

If you follow the above figures, the component intensity from y[1] and, therefore, the green channel dominates the other two. If we plot the image data, the image will be saturated with green intensity. To attain a color-balanced image, we arbitrarily scale the intensities from the three channels. You may choose any scaling factor. Each scaling factor will produce a different composition. In this example, we use the following,

>>> image[...,0] = image[...,0]*65.0 # red channel
>>> image[...,1] = image[...,1]*7.5  # green channel
>>> image[...,2] = image[...,2]*75.0 # blue channel

Now to plot this composition.

>>> def image_composition():
...     # Set the extents of the image plot.
...     extent = [x[0].coordinates[0].value, x[0].coordinates[-1].value,
...               x[1].coordinates[0].value, x[1].coordinates[-1].value]
...
...     # add figure
...     plt.figure(figsize=(5,4.5))
...     plt.imshow(image, origin='lower', extent=extent)
...
...     plt.xlabel(x[0].axis_label)
...     plt.ylabel(x[1].axis_label)
...     plt.title('composition')
...
...     plt.tight_layout(pad=0, w_pad=0, h_pad=0)
...     plt.show()
>>> image_composition()
../../_images/eagleNebula.csdfecomposition.png