API Reference¶

`calculate_doublet_performance_stochastic(reservoir_properties, utc_properties=None, rng_seed=None, p_values=[50.0], chunk_size=None, print_execution_duration=False, mask_value=0.0)` ¶

Perform a ThermoGIS Stochastic Doublet performance simulation.

This function computes doublet performance metrics across all dimensions of the input dataset. The input can be scalar, 1D, or 2D gridded data.

If no temperature values are provided, they are estimated from a gradient defined in utc_properties. If a mask is provided, any non-NaN values in the mask datarray will result in zeroing the output values at those locations.

The ThermoGIS methodology works by simulating doublet performance across percentiles of transmissivity, which in turn is calculated using probability distributions of thickness and permeability. The output results will always contain the dimension p_value; although by default only a P50 simulation is done.

Parameters:

Name	Type	Description	Default
`reservoir_properties`	`Dataset`	An xarray Dataset containing the required input variables: - thickness_mean - thickness_sd - porosity - ntg - depth - ln_permeability_mean - ln_permeability_sd Optional variables: - temperature : If not provided, temperature is estimated using the depth and a temperature gradient from `input_params`. - mask : If provided, all non-NaN values will result in setting corresponding output values to mask_value.	required
`utc_properties`	`JClass`	A Java class specifying the properties of the doublet being simulated	`None`
`rng_seed`	`int`	Random seed used for stochastic components of the simulation.	`None`
`p_values`	`list of float`	List of probability values (e.g., [0.1, 0.5, 0.9]) for the performance evaluation. If not provided, the default value of P50 (0.5) is used.	`[50.0]`
`chunk_size`	`int`	None by default, if set to an integer then chunking of the reservoir properties occurs. The chunk size is used to split up the number of simulations into "chunks" which can be processed in parallel using the dask framework. Chunk size involves trade-offs: smaller chunks = more parallelism, but more overhead, while larger chunks = less overhead, but can lead to memory pressure. The optimal chunk size is dependent on the hardware being used to run the simulation. The user should test to find the optimal chunk size.	`None`
`print_execution_duration`	`bool`	False by default, If set to True print the time in seconds it took to simulate across all reservoir properties	`False`
`mask_value`	`float`	0.0 by default, Any cell that results in a failed simulation or corresponds to a non-nan value in the mask parameter will be assigned the mask value	`0.0`

Returns:

Name	Type	Description
`output_data`	`Dataset`	An xarray Dataset with the same spatial dimensions as `input_data`, plus an added `p_value` dimension. Contains the following output variables: - power - heat_pump_power - capex - opex - utc - npv - hprod - cop - cophp - pres - flow_rate - welld - inj_temp - prd_temp - temperature - thickness - depth - permeability - transmissivity - transmissivity_with_ntg

Source code in src/pythermogis/doublet_simulation/stochastic_doublet.py

def calculate_doublet_performance_stochastic(
        reservoir_properties: xr.Dataset,
        utc_properties = None,
        rng_seed = None,
        p_values: List[float] = [50.0],
        chunk_size: int = None,
        print_execution_duration = False,
        mask_value: float = 0.0,
) -> xr.Dataset:
    """
    Perform a ThermoGIS Stochastic Doublet performance simulation.

    This function computes doublet performance metrics across all dimensions of the input dataset.
    The input can be scalar, 1D, or 2D gridded data.

    If no temperature values are provided, they are estimated from a gradient defined in `utc_properties`.
    If a mask is provided, any non-NaN values in the mask datarray will result in zeroing the output values at those locations.

    The ThermoGIS methodology works by simulating doublet performance across percentiles of transmissivity, which in turn is calculated using probability distributions of thickness and permeability.
    The output results will always contain the dimension p_value; although by default only a P50 simulation is done.

    Parameters
    ----------
    reservoir_properties : xr.Dataset
        An xarray Dataset containing the required input variables:
        - thickness_mean
        - thickness_sd
        - porosity
        - ntg
        - depth
        - ln_permeability_mean
        - ln_permeability_sd

        Optional variables:
        - temperature : If not provided, temperature is estimated using the depth and a temperature gradient from `input_params`.
        - mask : If provided, all non-NaN values will result in setting corresponding output values to mask_value.

    utc_properties : JClass
        A Java class specifying the properties of the doublet being simulated

    rng_seed : int
        Random seed used for stochastic components of the simulation.

    p_values : list of float, optional
        List of probability values (e.g., [0.1, 0.5, 0.9]) for the performance evaluation.
        If not provided, the default value of P50 (0.5) is used.

    chunk_size : int
        None by default, if set to an integer then chunking of the reservoir properties occurs.
        The chunk size is used to split up the number of simulations into "chunks" which can be processed in parallel using the dask framework.
        Chunk size involves trade-offs: smaller chunks = more parallelism, but more overhead, while larger chunks = less overhead, but can lead to memory pressure.
        The optimal chunk size is dependent on the hardware being used to run the simulation. The user should test to find the optimal chunk size.

    print_execution_duration : bool
        False by default, If set to True print the time in seconds it took to simulate across all reservoir properties

    mask_value : float
        0.0 by default, Any cell that results in a failed simulation or corresponds to a
        non-nan value in the mask parameter will be assigned the mask value

    Returns
    -------
    output_data : xr.Dataset
        An xarray Dataset with the same spatial dimensions as `input_data`, plus an added `p_value` dimension.
        Contains the following output variables:
        - power
        - heat_pump_power
        - capex
        - opex
        - utc
        - npv
        - hprod
        - cop
        - cophp
        - pres
        - flow_rate
        - welld
        - inj_temp
        - prd_temp
        - temperature
        - thickness
        - depth
        - permeability
        - transmissivity
        - transmissivity_with_ntg
    """
    if print_execution_duration: start = timeit.default_timer()

    # Check that all essential variables are provided
    validate_input_data(reservoir_properties)

    if utc_properties is None: # Instantiate utc_properties if none is provided
        utc_properties = instantiate_utc_properties_builder().build()

    # convert p_values list to a xarray DataArray; needed to ensure the dimensionality of the calculations
    p_values = xr.DataArray(
        data=p_values,
        dims=["p_value"],
        coords=dict(
            p_value=(["p_value"], p_values),
        ))

    # Generate temperature values from gradient if no temperature provided
    if "temperature" not in reservoir_properties:
        reservoir_properties["temperature"] = calculate_temperature_from_gradient(reservoir_properties.depth, reservoir_properties.thickness_mean, utc_properties.tempGradient(), utc_properties.surfaceTemperature())

    # If no mask grid is provided, then provide a dummy mask grid with only nan
    if "mask" not in reservoir_properties:
        reservoir_properties["mask"] = np.nan

    # if chunk_size is specified, then chunk reservoir_properties, (see descripting of chunk_size)
    if chunk_size is not None:
        reservoir_properties = auto_chunk_dataset(reservoir_properties, chunk_size)

    # Setup output_data dataset
    output_data = reservoir_properties["temperature"].copy().to_dataset()
    output_data = output_data.expand_dims({"p_value": p_values})

    # Calculate Thickness, Permeability and Transmissivity for each P-value
    output_data["thickness"], output_data["permeability"], output_data["transmissivity"] = xr.apply_ufunc(generate_thickness_permeability_transmissivity_for_pvalues,
                                                                                                          reservoir_properties.thickness_mean,
                                                                                                          reservoir_properties.thickness_sd,
                                                                                                          reservoir_properties.ln_permeability_mean,
                                                                                                          reservoir_properties.ln_permeability_sd,
                                                                                                          p_values,
                                                                                                          input_core_dims=[[], [], [], [], ["p_value"]],
                                                                                                          output_core_dims=[["p_value"], ["p_value"], ["p_value"]],
                                                                                                          vectorize=True,
                                                                                                          dask="parallelized",
                                                                                                          )

    output_data = simulate_doublet(output_data, reservoir_properties, rng_seed, utc_properties, mask_value=mask_value)
    if chunk_size is not None: output_data.load() # If chunking has occurred then the data must be de-chunked
    if print_execution_duration: print(f"Doublet simulation took {timeit.default_timer() - start:.1f} seconds")
    return output_data

`validate_input_data(input_data)` ¶

Ensure that the input_data Dataset contains the minimum required variables. Otherwise raise an informative error.