Skip to content

API Reference

calculate_doublet_performance_stochastic(reservoir_properties, utc_properties=None, rng_seed=None, p_values=[50.0], chunk_size=None, print_execution_duration=False, mask_value=0.0)

Perform a ThermoGIS Stochastic Doublet performance simulation.

This function computes doublet performance metrics across all dimensions of the input dataset. The input can be scalar, 1D, or 2D gridded data.

If no temperature values are provided, they are estimated from a gradient defined in utc_properties. If a mask is provided, any non-NaN values in the mask datarray will result in zeroing the output values at those locations.

The ThermoGIS methodology works by simulating doublet performance across percentiles of transmissivity, which in turn is calculated using probability distributions of thickness and permeability. The output results will always contain the dimension p_value; although by default only a P50 simulation is done.

Parameters:

Name Type Description Default
reservoir_properties Dataset

An xarray Dataset containing the required input variables: - thickness_mean - thickness_sd - porosity - ntg - depth - ln_permeability_mean - ln_permeability_sd

Optional variables: - temperature : If not provided, temperature is estimated using the depth and a temperature gradient from input_params. - mask : If provided, all non-NaN values will result in setting corresponding output values to mask_value.

required
utc_properties JClass

A Java class specifying the properties of the doublet being simulated

None
rng_seed int

Random seed used for stochastic components of the simulation.

None
p_values list of float

List of probability values (e.g., [0.1, 0.5, 0.9]) for the performance evaluation. If not provided, the default value of P50 (0.5) is used.

[50.0]
chunk_size int

None by default, if set to an integer then chunking of the reservoir properties occurs. The chunk size is used to split up the number of simulations into "chunks" which can be processed in parallel using the dask framework. Chunk size involves trade-offs: smaller chunks = more parallelism, but more overhead, while larger chunks = less overhead, but can lead to memory pressure. The optimal chunk size is dependent on the hardware being used to run the simulation. The user should test to find the optimal chunk size.

None
print_execution_duration bool

False by default, If set to True print the time in seconds it took to simulate across all reservoir properties

False
mask_value float

0.0 by default, Any cell that results in a failed simulation or corresponds to a non-nan value in the mask parameter will be assigned the mask value

0.0

Returns:

Name Type Description
output_data Dataset

An xarray Dataset with the same spatial dimensions as input_data, plus an added p_value dimension. Contains the following output variables: - power - heat_pump_power - capex - opex - utc - npv - hprod - cop - cophp - pres - flow_rate - welld - inj_temp - prd_temp - temperature - thickness - depth - permeability - transmissivity - transmissivity_with_ntg

Source code in src/pythermogis/doublet_simulation/stochastic_doublet.py
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
def calculate_doublet_performance_stochastic(
        reservoir_properties: xr.Dataset,
        utc_properties = None,
        rng_seed = None,
        p_values: List[float] = [50.0],
        chunk_size: int = None,
        print_execution_duration = False,
        mask_value: float = 0.0,
) -> xr.Dataset:
    """
    Perform a ThermoGIS Stochastic Doublet performance simulation.

    This function computes doublet performance metrics across all dimensions of the input dataset.
    The input can be scalar, 1D, or 2D gridded data.

    If no temperature values are provided, they are estimated from a gradient defined in `utc_properties`.
    If a mask is provided, any non-NaN values in the mask datarray will result in zeroing the output values at those locations.

    The ThermoGIS methodology works by simulating doublet performance across percentiles of transmissivity, which in turn is calculated using probability distributions of thickness and permeability.
    The output results will always contain the dimension p_value; although by default only a P50 simulation is done.

    Parameters
    ----------
    reservoir_properties : xr.Dataset
        An xarray Dataset containing the required input variables:
        - thickness_mean
        - thickness_sd
        - porosity
        - ntg
        - depth
        - ln_permeability_mean
        - ln_permeability_sd

        Optional variables:
        - temperature : If not provided, temperature is estimated using the depth and a temperature gradient from `input_params`.
        - mask : If provided, all non-NaN values will result in setting corresponding output values to mask_value.

    utc_properties : JClass
        A Java class specifying the properties of the doublet being simulated

    rng_seed : int
        Random seed used for stochastic components of the simulation.

    p_values : list of float, optional
        List of probability values (e.g., [0.1, 0.5, 0.9]) for the performance evaluation.
        If not provided, the default value of P50 (0.5) is used.

    chunk_size : int
        None by default, if set to an integer then chunking of the reservoir properties occurs.
        The chunk size is used to split up the number of simulations into "chunks" which can be processed in parallel using the dask framework.
        Chunk size involves trade-offs: smaller chunks = more parallelism, but more overhead, while larger chunks = less overhead, but can lead to memory pressure.
        The optimal chunk size is dependent on the hardware being used to run the simulation. The user should test to find the optimal chunk size.

    print_execution_duration : bool
        False by default, If set to True print the time in seconds it took to simulate across all reservoir properties

    mask_value : float
        0.0 by default, Any cell that results in a failed simulation or corresponds to a
        non-nan value in the mask parameter will be assigned the mask value

    Returns
    -------
    output_data : xr.Dataset
        An xarray Dataset with the same spatial dimensions as `input_data`, plus an added `p_value` dimension.
        Contains the following output variables:
        - power
        - heat_pump_power
        - capex
        - opex
        - utc
        - npv
        - hprod
        - cop
        - cophp
        - pres
        - flow_rate
        - welld
        - inj_temp
        - prd_temp
        - temperature
        - thickness
        - depth
        - permeability
        - transmissivity
        - transmissivity_with_ntg
    """
    if print_execution_duration: start = timeit.default_timer()

    # Check that all essential variables are provided
    validate_input_data(reservoir_properties)

    if utc_properties is None: # Instantiate utc_properties if none is provided
        utc_properties = instantiate_utc_properties_builder().build()

    # convert p_values list to a xarray DataArray; needed to ensure the dimensionality of the calculations
    p_values = xr.DataArray(
        data=p_values,
        dims=["p_value"],
        coords=dict(
            p_value=(["p_value"], p_values),
        ))

    # Generate temperature values from gradient if no temperature provided
    if "temperature" not in reservoir_properties:
        reservoir_properties["temperature"] = calculate_temperature_from_gradient(reservoir_properties.depth, reservoir_properties.thickness_mean, utc_properties.tempGradient(), utc_properties.surfaceTemperature())

    # If no mask grid is provided, then provide a dummy mask grid with only nan
    if "mask" not in reservoir_properties:
        reservoir_properties["mask"] = np.nan

    # if chunk_size is specified, then chunk reservoir_properties, (see descripting of chunk_size)
    if chunk_size is not None:
        reservoir_properties = auto_chunk_dataset(reservoir_properties, chunk_size)

    # Setup output_data dataset
    output_data = reservoir_properties["temperature"].copy().to_dataset()
    output_data = output_data.expand_dims({"p_value": p_values})

    # Calculate Thickness, Permeability and Transmissivity for each P-value
    output_data["thickness"], output_data["permeability"], output_data["transmissivity"] = xr.apply_ufunc(generate_thickness_permeability_transmissivity_for_pvalues,
                                                                                                          reservoir_properties.thickness_mean,
                                                                                                          reservoir_properties.thickness_sd,
                                                                                                          reservoir_properties.ln_permeability_mean,
                                                                                                          reservoir_properties.ln_permeability_sd,
                                                                                                          p_values,
                                                                                                          input_core_dims=[[], [], [], [], ["p_value"]],
                                                                                                          output_core_dims=[["p_value"], ["p_value"], ["p_value"]],
                                                                                                          vectorize=True,
                                                                                                          dask="parallelized",
                                                                                                          )

    output_data = simulate_doublet(output_data, reservoir_properties, rng_seed, utc_properties, mask_value=mask_value)
    if chunk_size is not None: output_data.load() # If chunking has occurred then the data must be de-chunked
    if print_execution_duration: print(f"Doublet simulation took {timeit.default_timer() - start:.1f} seconds")
    return output_data

validate_input_data(input_data)

Ensure that the input_data Dataset contains the minimum required variables. Otherwise raise an informative error.

Parameters:

Name Type Description Default
input_data Dataset

Input dataset that must contain the required ThermoGIS variables.

required

Returns:

Type Description
None
Source code in src/pythermogis/doublet_simulation/stochastic_doublet.py
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
def validate_input_data(input_data: xr.Dataset):
    """
    Ensure that the input_data Dataset contains the minimum required variables. Otherwise raise an informative error.

    Parameters
    ----------
    input_data : xr.Dataset
        Input dataset that must contain the required ThermoGIS variables.

    Returns
    -------
    None
    """
    missing_variables = []
    for variable in ["thickness_mean", "thickness_sd", "porosity", "ntg", "depth", "ln_permeability_mean", "ln_permeability_sd"]:
        if variable not in input_data:
            missing_variables.append(variable)
    if len(missing_variables) > 0:
        raise ValueError(f"provided input Dataset does not contain the following required variables: {missing_variables}")

    if (input_data["thickness_mean"] < 0.0).any():
        raise ValueError(f"provided input Dataset contains a negative thickness_mean value. thickness_mean must always be >= 0.0 in [m]")
    if (input_data["thickness_sd"] < 0.0).any():
        raise ValueError(f"provided input Dataset contains a negative thickness_sd value. thickness_sd must always be >= 0.0 in [m]")
    if (input_data["porosity"] > 1.0).any() or (input_data["porosity"] < 0.0).any():
        raise ValueError(f"provided input Dataset contains a porosity value > 1.0. porosity must always be between 0.0 and 1.0 (100% porosity = 1.0)")
    if (input_data["ntg"] > 1.0).any() or (input_data["ntg"] < 0.0).any():
        raise ValueError(f"provided input Dataset contains a ntg value > 1.0. ntg must always be between 0.0 and 1.0 (100% ntg = 1.0)")