Introduction to Pywr-DRB#

Overview:#

If you want to learn how to use the Pywr-DRB water resource model, you are in the right place.

This page is designed to introduce you to the Pywr-DRB code base, help you set up your environment, and show you how to access and begin interacting with a Pywr-DRB model instance.

Tutorial content:#

  1. Getting Started

  2. Explanation of the Pywr-DRB code base

    • input_data

    • pywrdrb

  3. Interacting with a Pywr-DRB model instance

    • Constructing and loading a pywrdrb model

    • Nodes

    • Parameters

  4. Running a Pywr-DRB simulation


1.0 Getting Started#

The Pywr-DRB GitHub organization page contains three repositories at the time of writing:

| Repo | Description |
| --- | --- |
| Pywr-DRB | This repo contains all of the code needed to construct and run the Pywr-DRB model. |
| DRB-Historic-Reconstruction | Used to generate historic streamflow reconstructions from 1945-2022. Reconstructions are exported to the Pywr-DRB/input_data folder to be used for simulation. |
| Input-Data-Retrieval | Contains workflows for retrieving data from the USGS NWIS (for observed flows) and the NHMv1.0 and NWMv2.1 modeled flows. Data for Pywr-DRB-relevant locations are retrieved and exported to the Pywr-DRB/input_data folder to be used for simulation. This process does not need to be repeated unless new or different datapoints are needed. |

For now, these tutorials will only require the Pywr-DRB repository code. Start by cloning the GitHub repository onto your machine.

To clone the most recent version:

git clone https://github.com/Pywr-DRB/Pywr-DRB.git

To get Pywr-DRB version 1.01, used to replicate the results in Hamilton, Amestoy & Reed (Under Review), clone the diagnostic_paper branch of the Pywr-DRB repository from GitHub:

git clone -b diagnostic_paper https://github.com/Pywr-DRB/Pywr-DRB.git

Create a virtual environment where you can install dependencies.

For Windows:

py -m pip install --upgrade pip
py -m pip install virtualenv
py -m virtualenv venv
venv\Scripts\activate
py -m pip install -r requirements.txt

For Linux/macOS:

python3 -m pip install --upgrade pip virtualenv
python3 -m virtualenv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt

Now we are ready!



2.0 Explanation of the Pywr-DRB code base#

Now that you have the code base available, it will be helpful to take a moment to familiarize yourself with what is inside. The sections below highlight some key folders, their contents, and how they fit into the broader Pywr-DRB model workflow.

For now, these sections focus on the two most important folders in the Pywr-DRB repo:

  • Pywr-DRB/input_data

  • Pywr-DRB/pywrdrb

2.1 Input Data#

Pywr-DRB is able to run simulations using different sets of streamflow input data, including:

  • The National Hydrologic Model version 1.0 (NHM; "nhmv10")

  • The National Water Model version 2.1 (NWM; "nwmv21")

  • Hybrid datasets that combine observed, scaled-observed, and model (NHM/NWM) streamflows based on model location and data availability.

    • Hybrid-NHM (hNHM)

    • Hybrid-NWM (hNWM)

Each dataset is given a unique identifying name (e.g., "nhmv10") which is used at multiple points in the Pywr-DRB workflow.

The current input datasets which come with the Pywr-DRB repository are:

  • "nhmv10"

  • "nwmv21"

  • "nhmv10_withObsScaled"

  • "nwmv21_withObsScaled"

2.1.1 Necessary files for simulation:#

For each of the four datasets mentioned above, the necessary input files are included in the Pywr-DRB repository. It’s worth pointing out which input files matter. If you want to run a simulation based on a specific streamflow scenario/dataset called <inflow_type>, then you need to make sure you have the following files:

Streamflow data:

  • catchment_inflow_<inflow_type>.csv

    • These are catchment inflow timeseries for each of the main Pywr-DRB nodes. Data are at a daily timescale, in millions of gallons per day (MGD).

  • gage_flow_<inflow_type>.csv

    • These are total streamflow timeseries at each of the main Pywr-DRB nodes. This data reflects pre-management streamflow conditions across the network. Units are MGD.

  • predicted_inflows_diversions_<inflow_type>.csv

    • These are predicted N-day ahead streamflow conditions at Montague and Trenton, where N ranges from 2-4. These predictions are used in simulated FFMP operations at the NYC reservoirs, which must plan their releases up to four days in advance to meet downstream flow targets.

Consumption and transbasin diversions:

  • sw_avg_wateruse_Pywr-DRB_Catchments.csv

    • Water diversion timeseries for each of the catchments in Pywr-DRB.

  • deliveryNJ_DRCanal_extrapolated.csv

    • Daily NJ diversion data, extrapolated further back in time based on recent data.

  • deliveryNYC_ORDM_extrapolated.csv

    • Daily NYC diversion data, extrapolated further back in time based on recent data.
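The file-naming convention above can be sketched as a small helper. Note that required_input_filenames is a hypothetical function written purely for illustration; it is not part of the pywrdrb package:

```python
def required_input_filenames(inflow_type):
    """Build the list of input files needed to simulate a given dataset.

    Illustrative helper only (not part of pywrdrb); follows the naming
    pattern of the files described above.
    """
    # Streamflow files are specific to the chosen dataset
    streamflow_files = [
        f"catchment_inflow_{inflow_type}.csv",
        f"gage_flow_{inflow_type}.csv",
        f"predicted_inflows_diversions_{inflow_type}.csv",
    ]
    # Consumption and transbasin diversion files are shared across datasets
    shared_files = [
        "sw_avg_wateruse_Pywr-DRB_Catchments.csv",
        "deliveryNJ_DRCanal_extrapolated.csv",
        "deliveryNYC_ORDM_extrapolated.csv",
    ]
    return streamflow_files + shared_files

print(required_input_filenames("nhmv10"))
```

You could loop over the four dataset names listed in Section 2.1 and check that each file exists in Pywr-DRB/input_data before running a simulation.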


2.2 pywrdrb#

This folder (Pywr-DRB/pywrdrb) is where all the code for the model lives.

There are several submodules (folders within pywrdrb) that are important to be familiar with; they are described below.

2.2.1 pywrdrb.model_data#

Go to the folder Pywr-DRB/pywrdrb/model_data/.

The file drb_model_full_<inflow_type>.json contains all of the structural information defining the model. Essentially, this is a dictionary containing lists of nodes, edges, and parameters. Together, this information is used by Pywr to construct the linear program used to simulate operations.

The model data structure as described in the pywr documentation is:

{
    "metadata": {},
    "timestepper": {},
    "solver": {},
    "nodes": [],
    "edges": [],
    "parameters": {}
}
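A minimal sketch of what such a model dictionary looks like in Python. This is a toy example (not an actual Pywr-DRB model), round-tripped through JSON the same way pywr would read the file from disk:

```python
import json

# A toy pywr-style model dictionary: nodes/edges are lists, parameters a dict
toy_model = {
    "metadata": {"title": "toy model", "minimum_version": "0.1"},
    "timestepper": {"start": "1983-10-01", "end": "1983-10-05", "timestep": 1},
    "solver": {"name": "glpk"},
    "nodes": [
        {"name": "inflow", "type": "catchment", "flow": 10.0},
        {"name": "outlet", "type": "output"},
    ],
    "edges": [["inflow", "outlet"]],
    "parameters": {},
}

# Serialize and parse, as pywr would when loading the JSON file
parsed = json.loads(json.dumps(toy_model))
print(sorted(parsed.keys()))
```

The real drb_model_full_<inflow_type>.json has the same top-level structure, just with far more nodes, edges, and parameters.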

In a later tutorial we will go into more detail on how these .json files are created.

The pywrdrb.model_data folder also contains other drb_model_<*>.csv files with extra information that is accessed at the start of a Pywr simulation. Some examples include:

  • drb_model_istarf_conus.csv

    • This file contains the STARFIT (aka ISTARF-CONUS) parameters developed by Turner et al. (2021), which are used to simulate reservoir operations at the non-NYC reservoirs.

  • drb_model_dailyProfiles.csv

    • Daily values for the different Flexible Flow Management Program (FFMP) operations classifications which are based on NYC storage level values. This is loaded by Pywr at the start of the simulation and stored in a DataFrame to be used during simulation.

2.2.2 pywrdrb.parameters#

Parameters are simply Python classes which are used in a Pywr simulation. They are used to track different variables during the simulation and perform specific operations. Pywr by default has a set of built-in Parameters which can be used to perform basic operations.

However, in many cases we need a custom parameter which will implement a custom function during simulation. These custom parameters are located in pywrdrb.parameters.

Some characteristics of Parameters are:

  • Parameters are loaded at the start of the model simulation

  • Parameters are written as class objects

  • There can be multiple different instances of a specific Parameter in the Pywr-DRB model

  • Parameters can be linked to other parameters or nodes in the model and access data from that parameter or node during each timestep

  • Parameters store data as attributes, and access that data every timestep

  • Parameters can return a value (output) every timestep
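The characteristics above can be illustrated with a bare-bones sketch of the custom-parameter pattern. To keep this self-contained, the Parameter base class below is a stand-in that mimics the role of pywr.parameters.Parameter (pywr itself is not imported), and ScaledInflow/ConstantInflow are made-up examples, not real pywrdrb parameters:

```python
class Parameter:
    """Stand-in for pywr.parameters.Parameter; illustrative only."""
    def setup(self):
        # Real pywr parameters are set up at the start of the simulation
        pass

class ConstantInflow(Parameter):
    """A made-up parameter returning a fixed flow every timestep."""
    def __init__(self, flow):
        self.flow = flow  # data stored as an attribute

    def value(self, timestep, scenario_index):
        return self.flow

class ScaledInflow(Parameter):
    """A made-up parameter linked to another parameter, scaling its value."""
    def __init__(self, source, factor):
        self.source = source  # link to another parameter in the model
        self.factor = factor

    def value(self, timestep, scenario_index):
        # Called every timestep; accesses the linked parameter's value
        # and returns a single output value
        return self.factor * self.source.value(timestep, scenario_index)

base = ConstantInflow(100.0)
scaled = ScaledInflow(base, factor=0.5)  # multiple instances are possible
print(scaled.value(timestep=0, scenario_index=0))
```

Real pywr custom parameters additionally implement a load() classmethod and are registered with pywr so they can be referenced by name in the model JSON file.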

Specifically, in pywrdrb we have:

  • pywrdrb.parameters.ffmp

    • These parameters are used to implement the FFMP at NYC reservoirs. There is a lot packed in here, and it will be good to return to this later on.

  • pywrdrb.parameters.starfit

    • This contains the STARFITReservoirRelease parameter, which is used to calculate the STARFIT-based reservoir release each day for the non-NYC reservoirs.

  • pywrdrb.parameters.lower_basin_ffmp

    • These parameters are used to determine when and how much water from the lower basin reservoirs (Beltzville, Blue Marsh, Nockamixon) should be released to help meet the downstream flow targets. This parameter communicates with the pywrdrb.parameters.ffmp parameters in order to make this decision.

  • pywrdrb.parameters.general

    • Currently, this only contains a single LaggedReservoirRelease parameter.

  • pywrdrb.parameters.inflow_ensemble

    • This contains parameters which are used to handle ensemble simulations in parallel. We won’t run any ensembles yet, so don’t worry about this for now.

Later in this tutorial we will load a pywrdrb model and identify some parameters.

2.2.3 pywrdrb.pre#

The pywrdrb.pre module contains different functions used to prepare model input data. When you clone the Pywr-DRB repository, it will contain several pre-processed datasets.

The pywrdrb.pre module contains:

  • disaggregate_DRBC_demands.py

    • Used to disaggregate demand data provided by the DRB Commission (DRBC). The demands are mapped to the Pywr-DRB catchment areas.

  • extrapolate_NYC_NJ_diversions.py

    • Used to extend limited historic diversion data further back in time. Regressions are constructed to predict monthly diversion demands based on streamflow conditions. Then K-Nearest Neighbors (KNN) timeseries sampling is used for temporal disaggregation from monthly to daily timeseries.

  • predict_inflows_diversions.py

    • Contains models for predicting N-day ahead inflows and diversions across the Pywr-DRB network. These predictions are used in the FFMP operations, where NYC is interested in predicting up to the 4-day ahead flow at Trenton to plan their releases accordingly. The 3- and 2-day ahead predictions are also made.

  • prep_input_data_functions.py

    • Contains several functions used in the pre-processing workflow. One example is subtract_upstream_catchment_inflows(), which transforms total streamflow into marginal catchment inflow timeseries. These marginal inflows are used as inputs for each node in Pywr-DRB.
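The idea behind subtracting upstream inflows can be sketched with a toy example. This is a simplified illustration of the concept (ignoring details such as streamflow travel time), not the actual pywrdrb implementation:

```python
import pandas as pd

def subtract_upstream_inflows_sketch(gage_flow, upstream_nodes):
    """Toy version of the marginal-inflow calculation.

    gage_flow: DataFrame of total streamflow at each node (columns = nodes).
    upstream_nodes: dict mapping each node to its immediately upstream nodes.
    Returns the marginal (incremental) catchment inflow at each node.
    """
    marginal = gage_flow.copy()
    for node, upstream in upstream_nodes.items():
        for up in upstream:
            # Remove flow already accounted for at the upstream node
            marginal[node] = marginal[node] - gage_flow[up]
    return marginal

# Two nodes: total flow at 'down' includes everything passing through 'up'
flows = pd.DataFrame({"up": [10.0, 12.0], "down": [25.0, 30.0]})
marginal = subtract_upstream_inflows_sketch(flows, {"down": ["up"]})
print(marginal["down"].tolist())
```

The marginal inflow at 'down' is its total flow minus the upstream total, i.e., only the water generated within that node's own catchment.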

Later, as you consider preparing new input scenarios, it will be necessary to understand these processing steps. These preprocessing steps are explained in detail in the supplemental information for Hamilton, Amestoy, & Reed (2024).

2.2.4 pywrdrb.post#

The pywrdrb.post submodule contains different scripts used for post-processing simulation results.

The main function used here is pywrdrb.post.get_pywr_results() which is designed to extract different variables of interest from the output file.

get_pywr_results(output_dir,
                 model,
                 results_set='all',
                 scenario=0,
                 datetime_index=None)

In get_pywr_results, the results_set argument specifies what type of data you want to retrieve. For example, results_set='major_flow' will return the total flow at major nodes, while results_set='res_release' will return reservoir release data.

2.2.5 pywrdrb.plotting#

This module contains different plotting functions. We won’t use any of these plots yet, but keep in mind that there is a common place to store these.


Activity: Code flowchart#

ACTIVITY: Let’s pause here and take a minute to explore the Pywr-DRB code base. Specifically, go through the repository and make a flowchart diagram which shows the relationships and key content for the various sub-folders in the repository.

You might consider using a flowchart software such as draw.io or doing this by hand.

Don’t get caught up in nitty-gritty details, as your understanding of the repo will change with time.

Send Trevor a version of this flowchart once you are done.


3.0 Interacting with a Pywr-DRB model instance#

Before running any of this code, you may need to modify the sys.path to make sure it can access the pywrdrb folder. Assuming that this tutorial is stored in the Pywr-DRB/notebooks/ folder, you will need to run:

import sys

path_to_pywrdrb = '../'
sys.path.append(path_to_pywrdrb)

3.1 Loading a Pywr model#

When loading a model with Pywr, we need to provide a json file which defines the nodes, edges, and parameters of the model (see the section 2.2.1 pywrdrb.model_data of this tutorial).

To load the model, we use the pywr.model.Model class which takes the json filename as an input.

The following code is used to specify a streamflow dataset, set the model file which we want to load, and load it using pywr.model:

from pywr.model import Model 

# import our custom parameters, since pywr will need them to construct the model
from pywrdrb.parameters import *

# import the make_model function to generate a new JSON model file
from pywrdrb import ModelBuilder

# Options: "nhmv10", "nwmv21", "nhmv10_withObsScaled", "nwmv21_withObsScaled" 
inflow_type = 'nhmv10'   

# Simulation start and end dates
from pywrdrb.utils.dates import model_date_ranges
start_date, end_date = model_date_ranges[inflow_type]

# We use the dataset name to specify the file name
model_filename = f'drb_model_full_{inflow_type}.json'
model_filename = f'{path_to_pywrdrb}/pywrdrb/model_data/{model_filename}'

# Make a new model JSON file
mb = ModelBuilder(inflow_type, start_date, end_date) # Optional "options" argument is available
mb.make_model()
mb.write_model(model_filename)

Now, you might not have noticed, but the model file drb_model_full_{inflow_type}.json was just replaced with a new version.

### load the pywrdrb model
model = Model.load(model_filename)
Initialized STARFITReservoirRelease for reservoir: nockamixon
Initialized STARFITReservoirRelease for reservoir: blueMarsh
Initialized STARFITReservoirRelease for reservoir: beltzvilleCombined
Initialized STARFITReservoirRelease for reservoir: greenLane
Initialized STARFITReservoirRelease for reservoir: stillCreek
Initialized STARFITReservoirRelease for reservoir: ontelaunee
Initialized STARFITReservoirRelease for reservoir: assunpink
Initialized STARFITReservoirRelease for reservoir: hopatcong
Initialized STARFITReservoirRelease for reservoir: merrillCreek
Initialized STARFITReservoirRelease for reservoir: fewalter
Initialized STARFITReservoirRelease for reservoir: mongaupeCombined
Initialized STARFITReservoirRelease for reservoir: shoholaMarsh
Initialized STARFITReservoirRelease for reservoir: prompton
Initialized STARFITReservoirRelease for reservoir: wallenpaupack

3.2 Nodes#

Nodes are the primary features in the Pywr-DRB model, and are used to represent reservoirs, USGS gauges, catchment inflow points, and other things.

Take a minute to check out the pywr documentation on node classes.

While pywr does allow for custom nodes, we are not currently using any of these in pywrdrb.

The code below allows you to make a list of the model nodes. Run the code and count the number of nodes in the model.

# Make a list of all model nodes
model_nodes = [n for n in model.nodes if n.name]

print(f'There are {len(model_nodes)} nodes in the model.')
There are 177 nodes in the model.

3.3 Parameters#

We can do the same thing for the model parameters:

### Read model parameter names into a list
model_parameters = [p for p in model.parameters if p.name]
model_parameter_names = [p.name for p in model_parameters]

print(f'There are {len(model_parameters)} parameters in the model')
There are 394 parameters in the model

4.0 Running a Pywr-DRB simulation#

Once we have loaded the model, we are almost ready to run a simulation.

First, we need to initialize a pywr.recorders.TablesRecorder, which will store simulation data during the model run. The TablesRecorder automatically creates an HDF5 file where it stores simulation data.

The recorder accepts as input:

  • The model object

  • The output_filename

  • A list of model parameters

The code below is used to initialize the TablesRecorder and run the simulation!

This should take 3-5 minutes to complete the full simulation.

(You will likely see many warnings pop up; don’t worry about those unless the simulation actually stops.)

# The pywr.recorders.TablesRecorder class is used store simulation results
# the simulation data is stored in an hdf5 file which is accessed during the simulation
from pywr.recorders import TablesRecorder

# there are a few naming-convention warnings from pywr; we can ignore them
import warnings
warnings.filterwarnings("ignore")

output_filename = f'drb_output_{inflow_type}.hdf5'
output_filename = f'../output_data/{output_filename}'

### Add a storage recorder
TablesRecorder(model = model, 
			   h5file = output_filename, 
			   parameters = model_parameters)

### Run the model
stats = model.run()
Assigning STARFIT parameters for wallenpaupack
Assigning STARFIT parameters for prompton
Assigning STARFIT parameters for shoholaMarsh
Assigning STARFIT parameters for mongaupeCombined
Assigning STARFIT parameters for fewalter
Assigning STARFIT parameters for merrillCreek
Assigning STARFIT parameters for hopatcong
Assigning STARFIT parameters for assunpink
Assigning STARFIT parameters for ontelaunee
Assigning STARFIT parameters for stillCreek
Assigning STARFIT parameters for greenLane
Assigning STARFIT parameters for nockamixon
Assigning STARFIT parameters for blueMarsh
Assigning STARFIT parameters for beltzvilleCombined

Congrats, you’ve just completed your first simulation using pywrdrb!


5.0 Accessing Pywr-DRB output data#

The data will be output to an HDF5 file in the Pywr-DRB/output_data/ folder.

These HDF5 files can be a little tricky if you don’t have experience with them. We have made a function which makes it easy to load specific types of variables from these output files.

The pywrdrb.Output class provides an easy method for loading and accessing Pywr-DRB model results.

The pywrdrb.Output Class#

The Output class consolidates the entire data retrieval process into a single load() method, automatically validating inputs, handling scenarios, and storing results inside the Output object.

Output.load() uses pywrdrb.post.get_pywrdrb_results() to fetch data for all specified models and results_sets. It manages datetime indexing and scenarios, and then stores the results as attributes within the class.

After using Output.load(), results are stored in the class as a nested dictionary structure:

Output.results_set[model][scenario] -> pd.DataFrame

This will return a pd.DataFrame that contains the simulation data with a datetime index.

Let’s use the Output class to load multiple different sets of results from the simulation run above:

from pywrdrb import Output

models = [inflow_type]
results_sets = ['major_flow', 'res_storage', 'mrf_target', 'res_release']

output = Output(models, 
                results_sets = results_sets,
                print_status = True)
output.load()

## Access the data using format: 
# output.results_set[model][scenario]

output.major_flow['nhmv10'][0].head()
Loading major_flow data for nhmv10
Loading res_storage data for nhmv10
Loading mrf_target data for nhmv10
Loading res_release data for nhmv10
01417000 01425000 01433500 01436000 01447800 01449800 01463620 01470960 delDRCanal delLordville delMontague delTrenton outletAssunpink outletSchuylkill
1983-10-01 245.248604 192.037241 702.126510 216.100859 1296.612713 30.641998 77.324137 981.138761 872.861166 626.737443 1050.155248 804.943656 110.874962 329.747828
1983-10-02 157.896171 150.911779 716.300505 60.483957 1298.492457 32.867673 77.732775 982.026219 1837.957440 514.753462 2386.712148 1772.532316 132.337899 688.332292
1983-10-03 227.762061 193.806146 706.181702 203.568972 1294.352344 30.133752 77.890367 978.970950 4459.303140 640.682999 2799.830204 4394.501111 114.075338 1740.349575
1983-10-04 283.875990 232.867018 695.766889 164.629266 1294.308205 28.921843 77.668647 978.663561 5584.875812 715.024366 2772.808094 5519.450687 106.791078 1683.772339
1983-10-05 334.835463 279.267193 692.402041 202.008757 1294.264190 28.795100 77.063649 978.558148 5798.400400 780.455696 2777.388751 5733.598372 103.601430 1604.624740

You should see a DataFrame with multiple columns corresponding to each of the reservoirs in the model.

Now you can get into the fun of looking at results and some data visualization!


Summary of Training and Activities#

You’ve just made it to the end of the first Pywr-DRB training. Wooo!

Just to recap, in this training we considered:

  1. Getting Started by cloning the repository and creating your virtual environment

  2. Explanation of the Pywr-DRB code base with a focus on key folders and files.

  3. Interacting with a Pywr-DRB model instance

  4. Running a Pywr-DRB simulation

To make the most of this training, I recommend that you complete the activities from this training, including the code flowchart activity from Section 2.