Tutorial 04 – Using Customized Data to Run Pywr-DRB
In addition to built-in inflow and diversion datasets, Pywr-DRB allows users to supply their own customized input data. This tutorial explains how to integrate external inflow or diversion files using the model’s path configuration system.
You will learn:
How dataset paths are managed in Pywr-DRB using the PathNavigator
How to register and use custom folders for flow and diversion data
What files must be present for a valid custom input folder
Step 1 – Understanding the Path Structure
Pywr-DRB uses a centralized PathNavigator object to manage file paths for input datasets, including flows, diversions, observations, and operational constants.
The PathNavigator stores all dataset directories in a structured configuration that Pywr-DRB references when building or running a model.
You can inspect the current path configuration using:
import pywrdrb
from pprint import pprint
pn_config = pywrdrb.get_pn_config()
pprint(pn_config)
{'flows/nhmv10': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nhmv10',
'flows/nhmv10_withObsScaled': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nhmv10_withObsScaled',
'flows/nwmv21': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nwmv21',
'flows/nwmv21_withObsScaled': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nwmv21_withObsScaled',
'flows/pub_nhmv10_BC_withObsScaled': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\pub_nhmv10_BC_withObsScaled',
'flows/wrf1960s_calib_nlcd2016': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\wrf1960s_calib_nlcd2016',
'flows/wrf2050s_calib_nlcd2016': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\wrf2050s_calib_nlcd2016',
'flows/wrfaorc_calib_nlcd2016': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\wrfaorc_calib_nlcd2016',
'flows/wrfaorc_withObsScaled': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\wrfaorc_withObsScaled'}
The printed dictionary shows the currently registered dataset paths. Each entry uses a key such as flows/nhmv10 or diversions/nwmv21 to identify the type and name of the dataset, and maps that key to a local directory.
From this configuration, you can see that Pywr-DRB includes several built-in inflow and diversion datasets:
Inflow types: nhmv10, nhmv10_withObsScaled, nwmv21, nwmv21_withObsScaled, wrf1960s_calib_nlcd2016, wrf2050s_calib_nlcd2016, and wrfaorc_calib_nlcd2016
Diversion types: matching diversion data folders exist for each of the inflow types listed above
Each dataset folder is stored under a prefix (flows/ or diversions/) to distinguish the type of input data. These paths are referenced during model building when you specify inflow_type and diversion_type.
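To see at a glance which dataset names are available under each prefix, the configuration keys can be grouped by their prefix. The sketch below is a minimal illustration that relies only on the "prefix/name" key structure shown above. The helper name group_datasets and the sample dictionary are hypothetical and not part of Pywr-DRB; in a real session you would pass the dictionary returned by pywrdrb.get_pn_config() instead of the sample.

```python
from collections import defaultdict


def group_datasets(pn_config):
    """Group path-configuration keys of the form 'prefix/name' by prefix."""
    groups = defaultdict(list)
    for key in pn_config:
        # Split on the first slash only; keys without a slash are skipped.
        prefix, sep, name = key.partition("/")
        if sep:
            groups[prefix].append(name)
    return {prefix: sorted(names) for prefix, names in groups.items()}


# Hand-written sample mimicking the structure printed above.
# The paths are placeholders, not real Pywr-DRB locations.
sample_config = {
    "flows/nhmv10": "/path/to/flows/nhmv10",
    "flows/nwmv21": "/path/to/flows/nwmv21",
    "flows/nhmv10_withObsScaled": "/path/to/flows/nhmv10_withObsScaled",
}

available = group_datasets(sample_config)
print(available["flows"])
```

The names listed under "flows" are the values that can be passed as inflow_type when building a model.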
Step 2 – Registering a Custom Dataset
If you want to use your own inflow data stored in a folder like C:/my_data, you can add it to the path configuration and register it with a custom name (e.g., "my_data"):
pn_config = pywrdrb.get_pn_config()
pn_config["flows/my_data"] = "C:/my_data"
pywrdrb.load_pn_config(pn_config)
Once registered, you can build the model using your custom data by passing "my_data" as the inflow_type:
mb = pywrdrb.ModelBuilder(
    inflow_type="my_data",
    start_date="1983-10-01",
    end_date="1985-12-31",
)
mb.make_model()
print("Model created successfully using custom inflow type.")
If the dataset is not properly registered or does not include the required files, ModelBuilder will raise an error.
Note:
At present, all simulations use the same extrapolated diversion data. The diversion_type option is not yet active and should be omitted from the ModelBuilder call.
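Because a misregistered folder only surfaces as an error when the model is built, it can help to guard the registration step itself. The following is a minimal sketch, not part of Pywr-DRB: the helper add_flow_dataset is a hypothetical name, and it operates on a plain dictionary with the same "flows/<name>" key layout as the one returned by pywrdrb.get_pn_config().

```python
from pathlib import Path


def add_flow_dataset(pn_config, name, folder):
    """Return a copy of pn_config with 'flows/<name>' pointing at folder."""
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"Custom flow folder not found: {folder}")
    key = f"flows/{name}"
    if key in pn_config:
        raise KeyError(f"{key!r} is already registered; choose another name.")
    updated = dict(pn_config)  # leave the caller's dictionary untouched
    updated[key] = str(folder)
    return updated


# Intended use in a session where pywrdrb is installed
# (the folder path is only an example):
#   pn_config = add_flow_dataset(pywrdrb.get_pn_config(), "my_data", "C:/my_data")
#   pywrdrb.load_pn_config(pn_config)
```

Refusing to overwrite an existing key is a design choice in this sketch; it keeps a custom folder from silently replacing one of the built-in datasets.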
Step 3 – Required Files for Custom Inflow Data
Your custom inflow folder must include the following three files:
catchment_inflow_mgd.csv
gage_flow_mgd.csv
predicted_inflows_mgd.csv
Each file should:
Be a CSV file with a datetime column
Include columns for relevant nodes or locations (e.g., reservoirs or catchments)
Use consistent formatting and overlapping date ranges across all files
To preview the expected format, you can inspect the structure of the built-in nhmv10 inflow folder:
import pandas as pd
pn = pywrdrb.get_pn_object()
files = pn.flows.nhmv10.list()
print("Required files:", files)
for file in files:
    df = pd.read_csv(pn.flows.nhmv10.get(file))
    print(f"\nPreview: {file}")
    print(df.iloc[:5, :5])  # first 5 rows and 5 columns
Required files: ['catchment_inflow_mgd.csv', 'gage_flow_mgd.csv', 'predicted_inflows_mgd.csv']
Preview: catchment_inflow_mgd.csv
datetime cannonsville pepacton neversink wallenpaupack
0 1980-10-01 130.379373 89.325193 60.718368 33.438627
1 1980-10-02 320.876434 234.425929 58.710650 155.951484
2 1980-10-03 384.188487 261.300230 71.301161 196.969214
3 1980-10-04 345.220715 265.843778 115.661384 229.965044
4 1980-10-05 322.869093 257.040294 71.100223 208.087549
Preview: gage_flow_mgd.csv
datetime cannonsville pepacton neversink wallenpaupack
0 1980-10-01 130.379373 89.325193 60.718368 33.438627
1 1980-10-02 320.876434 234.425929 58.710650 155.951484
2 1980-10-03 384.188487 261.300230 71.301161 196.969214
3 1980-10-04 345.220715 265.843778 115.661384 229.965044
4 1980-10-05 322.869093 257.040294 71.100223 208.087549
Preview: predicted_inflows_mgd.csv
datetime delMontague_lag1_regression_disagg \
0 1983-10-01 229.021075
1 1983-10-02 437.812448
2 1983-10-03 1091.191061
3 1983-10-04 1367.417389
4 1983-10-05 1046.802593
delMontague_lag2_regression_disagg delTrenton_lag1_regression_disagg \
0 407.304641 34.783055
1 846.733954 330.528839
2 1608.726450 1597.578624
3 1845.161086 2856.923949
4 1512.261956 1945.907752
delTrenton_lag2_regression_disagg
0 312.558400
1 654.031017
2 2743.665878
3 4238.030173
4 2924.458661
Make sure your files use the same column names and structure so they can be correctly interpreted by the model.
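Before registering a folder, the requirements listed in this step can be checked with a short script. The sketch below uses only the Python standard library. The function check_flow_folder is a hypothetical helper, not a Pywr-DRB API, and it assumes dates are written as YYYY-MM-DD, as in the previews above. It verifies that the three files exist, that each has a datetime column, and that their date ranges overlap. It does not check that the node column names match the built-in datasets.

```python
import csv
from pathlib import Path

REQUIRED_FILES = (
    "catchment_inflow_mgd.csv",
    "gage_flow_mgd.csv",
    "predicted_inflows_mgd.csv",
)


def check_flow_folder(folder):
    """Return a list of problems found in a custom inflow folder (empty if none)."""
    folder = Path(folder)
    problems = []
    ranges = {}
    for name in REQUIRED_FILES:
        path = folder / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
            continue
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            if "datetime" not in (reader.fieldnames or []):
                problems.append(f"{name}: no 'datetime' column")
                continue
            dates = [row["datetime"] for row in reader]
        if not dates:
            problems.append(f"{name}: no data rows")
            continue
        # Assumption: ISO dates (YYYY-MM-DD), which sort correctly as strings.
        ranges[name] = (min(dates), max(dates))
    # Compare date ranges only when all three files could be read.
    if len(ranges) == len(REQUIRED_FILES):
        start = max(first for first, _ in ranges.values())
        end = min(last for _, last in ranges.values())
        if start > end:
            problems.append("date ranges of the three files do not overlap")
    return problems
```

Calling check_flow_folder("C:/my_data") returns an empty list when these basic checks pass; any returned messages point to what needs fixing before the folder is registered.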