Tutorial 04 – Using Customized Data to Run Pywr-DRB

In addition to built-in inflow and diversion datasets, Pywr-DRB allows users to supply their own customized input data. This tutorial explains how to integrate external inflow or diversion files using the model’s path configuration system.

You will learn:

  • How dataset paths are managed in Pywr-DRB using the PathNavigator

  • How to register and use custom folders for flow and diversion data

  • What files must be present for a valid custom input folder

Step 1 – Understanding the Path Structure

Pywr-DRB uses a centralized PathNavigator object to manage file paths for input datasets, including flows, diversions, observations, and operational constants.

The PathNavigator stores all dataset directories in a structured configuration that Pywr-DRB references when building or running a model.

You can inspect the current path configuration using:

import pywrdrb
from pprint import pprint

pn_config = pywrdrb.get_pn_config()
pprint(pn_config)
{'flows/nhmv10': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nhmv10',
 'flows/nhmv10_withObsScaled': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nhmv10_withObsScaled',
 'flows/nwmv21': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nwmv21',
 'flows/nwmv21_withObsScaled': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nwmv21_withObsScaled',
 'flows/pub_nhmv10_BC_withObsScaled': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\pub_nhmv10_BC_withObsScaled',
 'flows/wrf1960s_calib_nlcd2016': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\wrf1960s_calib_nlcd2016',
 'flows/wrf2050s_calib_nlcd2016': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\wrf2050s_calib_nlcd2016',
 'flows/wrfaorc_calib_nlcd2016': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\wrfaorc_calib_nlcd2016',
 'flows/wrfaorc_withObsScaled': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\wrfaorc_withObsScaled'}

The printed dictionary shows the currently registered dataset paths. Each entry uses a key such as flows/nhmv10 to identify the type and name of the dataset, and maps it to the corresponding local directory.

From this configuration, you can see the built-in inflow datasets included with Pywr-DRB:

  • Inflow types: nhmv10, nhmv10_withObsScaled, nwmv21, nwmv21_withObsScaled, pub_nhmv10_BC_withObsScaled, wrf1960s_calib_nlcd2016, wrf2050s_calib_nlcd2016, wrfaorc_calib_nlcd2016, and wrfaorc_withObsScaled

  • Diversion types: diversion datasets are keyed with the diversions/ prefix (for example, diversions/nwmv21), although no such entries appear in the output shown above

Each dataset folder is stored under a prefix (flows/ or diversions/) to distinguish the type of input data. These paths are looked up during model building, when you specify inflow_type (and diversion_type, once that option is active).
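
The prefix/name key convention can be illustrated with a small helper. This is only a sketch: it works on any dictionary shaped like the one returned by get_pn_config(), but the helper name group_dataset_keys and the sample paths are hypothetical and are not part of Pywr-DRB.

```python
from collections import defaultdict


def group_dataset_keys(pn_config):
    """Group 'prefix/name' keys by prefix (hypothetical helper, not a Pywr-DRB API)."""
    grouped = defaultdict(list)
    for key in pn_config:
        # Split on the first "/" only: "flows/nhmv10" -> ("flows", "nhmv10")
        prefix, _, name = key.partition("/")
        grouped[prefix].append(name)
    return {prefix: sorted(names) for prefix, names in grouped.items()}


# Sample dictionary with made-up paths, shaped like the output above
sample_config = {
    "flows/nhmv10": "/path/to/data/flows/nhmv10",
    "flows/nwmv21": "/path/to/data/flows/nwmv21",
    "flows/my_data": "/path/to/my_data",
}
datasets = group_dataset_keys(sample_config)
print(datasets)
```

Passing the real dictionary from pywrdrb.get_pn_config() instead of sample_config should list every dataset name registered under each prefix.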


Step 2 – Registering a Custom Dataset

If you want to use your own inflow data stored in a folder like C:/my_data, you can add it to the path configuration and register it with a custom name (e.g., "my_data"):

pn_config = pywrdrb.get_pn_config()
pn_config["flows/my_data"] = "C:/my_data"
pywrdrb.load_pn_config(pn_config)

Once registered, you can build the model using your custom data by passing "my_data" as the inflow_type:

mb = pywrdrb.ModelBuilder(
    inflow_type="my_data",  # name registered above under "flows/my_data"
    start_date="1983-10-01",
    end_date="1985-12-31",
)
mb.make_model()

print("Model created successfully using custom inflow type.")

If the dataset is not properly registered or does not include the required files, ModelBuilder will raise an error.
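
Because a missing file only surfaces when the model is built, it can help to check the folder before registering it. The following is a minimal sketch using only the standard library; the helper name find_missing_inflow_files is hypothetical (it is not part of Pywr-DRB), and the file names are the ones listed in Step 3.

```python
from pathlib import Path

# File names listed in Step 3 of this tutorial
REQUIRED_INFLOW_FILES = (
    "catchment_inflow_mgd.csv",
    "gage_flow_mgd.csv",
    "predicted_inflows_mgd.csv",
)


def find_missing_inflow_files(folder):
    """Return the required file names that are absent from `folder`.

    Hypothetical helper for illustration; it is not part of Pywr-DRB.
    """
    folder = Path(folder)
    return [name for name in REQUIRED_INFLOW_FILES if not (folder / name).is_file()]


# Example: report anything missing from the custom folder used above
missing = find_missing_inflow_files("C:/my_data")
if missing:
    print("Missing files:", missing)
```

An empty list means all three files are present. It does not say anything about whether their contents are valid.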

Note:
At present, all simulations use the same extrapolated diversion data.
The diversion_type option is not yet active and should be omitted from the ModelBuilder call.

Step 3 – Required Files for Custom Inflow Data

Your custom inflow folder must include the following three files:

  • catchment_inflow_mgd.csv

  • gage_flow_mgd.csv

  • predicted_inflows_mgd.csv

Each file should:

  • Be a CSV file with a datetime column

  • Include columns for relevant nodes or locations (e.g., reservoirs or catchments)

  • Use consistent formatting and overlapping date ranges across all files
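
The overlapping-date requirement can be checked with a short script. This is a sketch under two assumptions: each file has a column named datetime, and its values start with an ISO date (YYYY-MM-DD). The function names date_range and common_date_range are hypothetical and are not Pywr-DRB functions.

```python
import csv
from datetime import date


def date_range(csv_path, column="datetime"):
    """Return the (first, last) date found in `column` of a CSV file."""
    with open(csv_path, newline="") as f:
        # Keep only the first 10 characters so "1980-10-01 00:00:00" also parses
        dates = [date.fromisoformat(row[column][:10]) for row in csv.DictReader(f)]
    return min(dates), max(dates)


def common_date_range(csv_paths):
    """Return the (start, end) period covered by every file, or None if there is no overlap."""
    ranges = [date_range(path) for path in csv_paths]
    start = max(r[0] for r in ranges)  # latest first date
    end = min(r[1] for r in ranges)    # earliest last date
    return (start, end) if start <= end else None
```

Calling common_date_range on the three files in your custom folder returns the period they all cover. A None result means at least one file does not overlap with the others.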

To preview the expected format, you can inspect the structure of the built-in nhmv10 inflow folder:

import pandas as pd

pn = pywrdrb.get_pn_object()
files = pn.flows.nhmv10.list()

print("Required files:", files)
for file in files:
    df = pd.read_csv(pn.flows.nhmv10.get(file))
    print(f"\nPreview: {file}")
    print(df.iloc[:5, :5])  # first 5 rows and 5 columns
Required files: ['catchment_inflow_mgd.csv', 'gage_flow_mgd.csv', 'predicted_inflows_mgd.csv']

Preview: catchment_inflow_mgd.csv
     datetime  cannonsville    pepacton   neversink  wallenpaupack
0  1980-10-01    130.379373   89.325193   60.718368      33.438627
1  1980-10-02    320.876434  234.425929   58.710650     155.951484
2  1980-10-03    384.188487  261.300230   71.301161     196.969214
3  1980-10-04    345.220715  265.843778  115.661384     229.965044
4  1980-10-05    322.869093  257.040294   71.100223     208.087549

Preview: gage_flow_mgd.csv
     datetime  cannonsville    pepacton   neversink  wallenpaupack
0  1980-10-01    130.379373   89.325193   60.718368      33.438627
1  1980-10-02    320.876434  234.425929   58.710650     155.951484
2  1980-10-03    384.188487  261.300230   71.301161     196.969214
3  1980-10-04    345.220715  265.843778  115.661384     229.965044
4  1980-10-05    322.869093  257.040294   71.100223     208.087549

Preview: predicted_inflows_mgd.csv
     datetime  delMontague_lag1_regression_disagg  \
0  1983-10-01                          229.021075   
1  1983-10-02                          437.812448   
2  1983-10-03                         1091.191061   
3  1983-10-04                         1367.417389   
4  1983-10-05                         1046.802593   

   delMontague_lag2_regression_disagg  delTrenton_lag1_regression_disagg  \
0                          407.304641                          34.783055   
1                          846.733954                         330.528839   
2                         1608.726450                        1597.578624   
3                         1845.161086                        2856.923949   
4                         1512.261956                        1945.907752   

   delTrenton_lag2_regression_disagg  
0                         312.558400  
1                         654.031017  
2                        2743.665878  
3                        4238.030173  
4                        2924.458661  

Make sure your files use the same column names and structure so they can be correctly interpreted by the model.
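
One way to check column names is to compare the header row of each custom file against the matching built-in file. The sketch below uses only the standard library; read_header and compare_columns are hypothetical helper names, not part of Pywr-DRB.

```python
import csv


def read_header(csv_path):
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="") as f:
        return next(csv.reader(f))


def compare_columns(custom_csv, reference_csv):
    """Report columns missing from, or extra in, `custom_csv` relative to `reference_csv`."""
    custom = set(read_header(custom_csv))
    reference = set(read_header(reference_csv))
    return {
        "missing": sorted(reference - custom),
        "extra": sorted(custom - reference),
    }
```

The reference path for each file can be taken from pn.flows.nhmv10.get(file), as in the preview code above. An empty "missing" list means every reference column is present in the custom file.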

More About the Global PathNavigator Object Used in Pywr-DRB

You can get the global PathNavigator object used in Pywr-DRB by calling pywrdrb.get_pn_object().

This pn object holds all directory and path information, so you can use it to locate specific files used by Pywr-DRB on disk.

Additional pn operations are described in the PathNavigator documentation. However, use pn only to explore file and folder locations. It is not designed for modifying paths unless you fully understand what you are doing.

pn = pywrdrb.get_pn_object()
print(f"The root directory of the pywrdrb: {pn.get()}")
The root directory of the pywrdrb: C:\Users\CL\Documents\GitHub\Pywr-DRB\src\pywrdrb\data

You can also scan and print the folder structure:

pn.scan(max_depth=2)  # scan the directory structure up to 2 levels deep
pn.tree()