Introduction to Pywr-DRB#

Overview:#

If you want to learn how to use the Pywr-DRB water resource model, you are in the right place.

This page is designed to introduce you to the Pywr-DRB code base, help you set up your environment, and show you how to access and begin interacting with a Pywr-DRB model instance.

Tutorial content:#

  1. Getting Started

  2. Explanation of the Pywr-DRB code base

    • input_data

    • pywrdrb

  3. Interacting with a Pywr-DRB model instance

    • Constructing and loading a pywrdrb model

    • Nodes

    • Parameters

  4. Running a Pywr-DRB simulation


1.0 Getting Started#

The Pywr-DRB GitHub organization page contains three repositories at the time of writing:

| Repo | Description |
| --- | --- |
| Pywr-DRB | This repo contains all of the code needed to construct and run the Pywr-DRB model. |
| DRB-Historic-Reconstruction | Used to generate historic streamflow reconstructions from 1945-2022. Reconstructions are exported to the Pywr-DRB/input_data folder to be used for simulation. |
| Input-Data-Retrieval | Contains workflows for retrieving data from the USGS NWIS (for observed flows) and the NHMv1.0 and NWMv2.1 modeled flows. Data for Pywr-DRB-relevant locations are retrieved and exported to the Pywr-DRB/input_data folder to be used for simulation. This process does not need to be repeated unless new or different datapoints are needed. |

For now, these tutorials will only require the Pywr-DRB repository code. Start by cloning the GitHub repository onto your machine.

To clone the most recent version:

git clone https://github.com/Pywr-DRB/Pywr-DRB.git

To get Pywr-DRB version 1.01, used to replicate the results in Hamilton, Amestoy & Reed (Under Review), clone the diagnostic_paper branch of the Pywr-DRB repository from GitHub:

git clone -b diagnostic_paper https://github.com/Pywr-DRB/Pywr-DRB.git

Create a virtual environment where you can install dependencies.

For Windows:

py -m pip install --upgrade pip
py -m pip install virtualenv
py -m virtualenv venv
venv\Scripts\activate
py -m pip install -r requirements.txt

For Linux/macOS:

python3 -m pip install --upgrade pip virtualenv
python3 -m virtualenv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt

Now we are ready!



2.0 Explanation of the Pywr-DRB code base#

Now that you have the code base available, it will be helpful to take a moment to familiarize yourself with what is inside. The sections below highlight some key folders, their contents, and how they fit into the broader Pywr-DRB model workflow.

For now, these sections focus on the two most important folders in the Pywr-DRB repo:

  • Pywr-DRB/input_data

  • Pywr-DRB/pywrdrb

2.1 Input Data#

Pywr-DRB is able to run simulations using different sets of streamflow input data, including:

  • The National Hydrologic Model version 1.0 (NHM; "nhmv10")

  • The National Water Model version 2.1 (NWM; "nwmv21")

  • Hybrid datasets that combine observed, scaled-observed, and model (NHM/NWM) streamflows based on model location and data availability.

    • Hybrid-NHM (hNHM)

    • Hybrid-NWM (hNWM)

Each dataset is given a unique identifying name (e.g., "nhmv10") which is used at multiple points in the Pywr-DRB workflow.

The current input datasets which come with the Pywr-DRB repository are:

  • "nhmv10"

  • "nwmv21"

  • "nhmv10_withObsScaled"

  • "nwmv21_withObsScaled"

2.1.1 Necessary files for simulation:#

For each of the four datasets mentioned above, the necessary input files are included in the Pywr-DRB repository. It’s worth pointing out which input files matter. If you want to run a simulation based on a specific streamflow scenario/dataset called <inflow_type>, then you need to make sure you have the following files:

Streamflow data:

  • catchment_inflow_<inflow_type>.csv

    • These are catchment inflow timeseries for each of the main Pywr-DRB nodes. Data are at a daily timescale, in millions of gallons per day (MGD).

  • gage_flow_<inflow_type>.csv

    • These are total streamflow timeseries at each of the main Pywr-DRB nodes. This data reflects pre-management streamflow conditions across the network. Units are MGD.

  • predicted_inflows_diversions_<inflow_type>.csv

    • These are predicted N-day ahead streamflow conditions at Montague and Trenton, where N ranges from 2-4. These predictions are used in simulated FFMP operations at the NYC reservoirs, which must plan their releases up to four days in advance to meet downstream flow targets.

Consumption and transbasin diversions:

  • sw_avg_wateruse_Pywr-DRB_Catchments.csv

    • Water diversion timeseries for each of the catchments in Pywr-DRB.

  • deliveryNJ_DRCanal_extrapolated.csv

    • Daily NJ diversion data, extrapolated further back in time based on recent data.

  • deliveryNYC_ORDM_extrapolated.csv

    • Daily NYC diversion data, extrapolated further back in time based on recent data.
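The file-naming convention above can be sketched as a small helper. Note that required_input_filenames is a hypothetical function written purely for illustration; it is not part of the pywrdrb package:

```python
def required_input_filenames(inflow_type):
    """Build the list of input files needed to simulate a given dataset.

    Illustrative helper only (not part of pywrdrb); follows the naming
    pattern of the files described above.
    """
    # Streamflow files are specific to the chosen dataset
    streamflow_files = [
        f"catchment_inflow_{inflow_type}.csv",
        f"gage_flow_{inflow_type}.csv",
        f"predicted_inflows_diversions_{inflow_type}.csv",
    ]
    # Consumption and transbasin diversion files are shared across datasets
    shared_files = [
        "sw_avg_wateruse_Pywr-DRB_Catchments.csv",
        "deliveryNJ_DRCanal_extrapolated.csv",
        "deliveryNYC_ORDM_extrapolated.csv",
    ]
    return streamflow_files + shared_files

print(required_input_filenames("nhmv10"))
```

You could loop over the four dataset names listed in Section 2.1 and check that each file exists in Pywr-DRB/input_data before running a simulation.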


2.2 pywrdrb#

This folder (Pywr-DRB/pywrdrb) is where all the code for the model lives.

There are several submodules (folders within pywrdrb) that are important to be familiar with; they are described below.

2.2.1 pywrdrb.model_data#

Go to the folder Pywr-DRB/pywrdrb/model_data/.

The file drb_model_full_<inflow_type>.json contains all of the structural information defining the model. Essentially, this is a dictionary containing lists of nodes, edges, and parameters. Together, this information is used by Pywr to construct the linear program used to simulate operations.

The model data structure as described in the pywr documentation is:

{
    "metadata": {},
    "timestepper": {},
    "solver": {},
    "nodes": [],
    "edges": [],
    "parameters": {}
}
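A minimal sketch of what such a model dictionary looks like in Python. This is a toy example (not an actual Pywr-DRB model), round-tripped through JSON the same way pywr would read the file from disk:

```python
import json

# A toy pywr-style model dictionary: nodes/edges are lists, parameters a dict
toy_model = {
    "metadata": {"title": "toy model", "minimum_version": "0.1"},
    "timestepper": {"start": "1983-10-01", "end": "1983-10-05", "timestep": 1},
    "solver": {"name": "glpk"},
    "nodes": [
        {"name": "inflow", "type": "catchment", "flow": 10.0},
        {"name": "outlet", "type": "output"},
    ],
    "edges": [["inflow", "outlet"]],
    "parameters": {},
}

# Serialize and parse, as pywr would when loading the JSON file
parsed = json.loads(json.dumps(toy_model))
print(sorted(parsed.keys()))
```

The real drb_model_full_<inflow_type>.json has the same top-level structure, just with far more nodes, edges, and parameters.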

In a later tutorial we will go into more detail on how these .json files are created.

The pywrdrb.model_data folder also contains other drb_model_<*>.csv files with extra information that is accessed at the start of a Pywr simulation. Some examples include:

  • drb_model_istarf_conus.csv

    • This file contains the STARFIT (aka ISTARF-CONUS) parameters developed by Turner et al. (2021), which are used to simulate reservoir operations at the non-NYC reservoirs.

  • drb_model_dailyProfiles.csv

    • Daily values for the different Flexible Flow Management Program (FFMP) operations classifications which are based on NYC storage level values. This is loaded by Pywr at the start of the simulation and stored in a DataFrame to be used during simulation.

2.2.2 pywrdrb.parameters#

Parameters are simply Python classes which are used in a Pywr simulation. They are used to track different variables during the simulation and perform specific operations. Pywr by default has a set of built-in Parameters which can be used to perform basic operations.

However, in many cases we need a custom parameter which will implement a custom function during simulation. These custom parameters are located in pywrdrb.parameters.

Some characteristics of Parameters are:

  • Parameters are loaded at the start of the model simulation

  • Parameters are written as class objects

  • There can be multiple different instances of a specific Parameter in the Pywr-DRB model

  • Parameters can be linked to other parameters or nodes in the model and access data from that parameter or node during each timestep

  • Parameters store data as attributes, and access that data every timestep

  • Parameters can return a value (output) every timestep
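The characteristics above can be illustrated with a bare-bones sketch of the custom-parameter pattern. To keep this self-contained, the Parameter base class below is a stand-in that mimics the role of pywr.parameters.Parameter (pywr itself is not imported), and ScaledInflow/ConstantInflow are made-up examples, not real pywrdrb parameters:

```python
class Parameter:
    """Stand-in for pywr.parameters.Parameter; illustrative only."""
    def setup(self):
        # Real pywr parameters are set up at the start of the simulation
        pass

class ConstantInflow(Parameter):
    """A made-up parameter returning a fixed flow every timestep."""
    def __init__(self, flow):
        self.flow = flow  # data stored as an attribute

    def value(self, timestep, scenario_index):
        return self.flow

class ScaledInflow(Parameter):
    """A made-up parameter linked to another parameter, scaling its value."""
    def __init__(self, source, factor):
        self.source = source  # link to another parameter in the model
        self.factor = factor

    def value(self, timestep, scenario_index):
        # Called every timestep; accesses the linked parameter's value
        # and returns a single output value
        return self.factor * self.source.value(timestep, scenario_index)

base = ConstantInflow(100.0)
scaled = ScaledInflow(base, factor=0.5)  # multiple instances are possible
print(scaled.value(timestep=0, scenario_index=0))
```

Real pywr custom parameters additionally implement a load() classmethod and are registered with pywr so they can be referenced by name in the model JSON file.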

Specifically, in pywrdrb we have:

  • pywrdrb.parameters.ffmp

    • These parameters are used to implement the FFMP at NYC reservoirs. There is a lot packed in here, and it will be good to return to this later on.

  • pywrdrb.parameters.starfit

    • This contains the STARFITReservoirRelease parameter, which is used to calculate the STARFIT-based reservoir release each day for the non-NYC reservoirs.

  • pywrdrb.parameters.lower_basin_ffmp

    • These parameters are used to determine when and how much water from the lower basin reservoirs (Beltzville, Blue Marsh, Nockamixon) should be released to help meet the downstream flow targets. This parameter communicates with the pywrdrb.parameters.ffmp parameters in order to make this decision.

  • pywrdrb.parameters.general

    • Currently, this only contains a single LaggedReservoirRelease parameter.

  • pywrdrb.parameters.inflow_ensemble

    • This contains parameters which are used to handle ensemble simulations in parallel. We won’t run any ensembles yet, so don’t worry about this for now.

Later in this tutorial we will load a pywrdrb model and identify some parameters.

2.2.3 pywrdrb.pre#

The pywrdrb.pre module contains different functions used to prepare model input data. When you clone the Pywr-DRB repository, it will contain several pre-processed datasets.

The pywrdrb.pre module contains:

  • disaggregate_DRBC_demands.py

    • Used to disaggregate demand data provided by the DRB Commission (DRBC). The demands are mapped to the Pywr-DRB catchment areas.

  • extrapolate_NYC_NJ_diversions.py

    • Used to extend limited historic diversion data further back in time. Regressions are constructed to predict monthly diversion demands based on streamflow conditions. Then K-Nearest Neighbors (KNN) timeseries sampling is used for temporal disaggregation from monthly to daily timeseries.

  • predict_inflows_diversions.py

    • Contains models for predicting N-day ahead inflows and diversions across the Pywr-DRB network. These predictions are used in the FFMP operations, where NYC is interested in predicting up to the 4-day ahead flow at Trenton to plan their releases accordingly. The 3- and 2-day ahead predictions are also made.

  • prep_input_data_functions.py

    • Contains several functions used in the pre-processing workflow. One example is subtract_upstream_catchment_inflows(), which transforms total streamflow into marginal catchment inflow timeseries. These marginal inflows are used as inputs for each node in Pywr-DRB.
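The idea behind subtracting upstream inflows can be sketched with a toy example. This is a simplified illustration of the concept (ignoring details such as streamflow travel time), not the actual pywrdrb implementation:

```python
import pandas as pd

def subtract_upstream_inflows_sketch(gage_flow, upstream_nodes):
    """Toy version of the marginal-inflow calculation.

    gage_flow: DataFrame of total streamflow at each node (columns = nodes).
    upstream_nodes: dict mapping each node to its immediately upstream nodes.
    Returns the marginal (incremental) catchment inflow at each node.
    """
    marginal = gage_flow.copy()
    for node, upstream in upstream_nodes.items():
        for up in upstream:
            # Remove flow already accounted for at the upstream node
            marginal[node] = marginal[node] - gage_flow[up]
    return marginal

# Two nodes: total flow at 'down' includes everything passing through 'up'
flows = pd.DataFrame({"up": [10.0, 12.0], "down": [25.0, 30.0]})
marginal = subtract_upstream_inflows_sketch(flows, {"down": ["up"]})
print(marginal["down"].tolist())
```

The marginal inflow at 'down' is its total flow minus the upstream total, i.e., only the water generated within that node's own catchment.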

Later, as you consider preparing new input scenarios, it will be necessary to understand these processing steps. These preprocessing steps are explained in detail in the supplemental information for Hamilton, Amestoy, & Reed (2024).

2.2.4 pywrdrb.post#

The pywrdrb.post submodule contains different scripts used for post-processing simulation results.

The main function used here is pywrdrb.post.get_pywr_results() which is designed to extract different variables of interest from the output file.

get_pywr_results(output_dir,
                 model,
                 results_set='all',
                 scenario=0,
                 datetime_index=None)

In get_pywr_results, the results_set argument specifies what type of data you want to retrieve. For example, results_set='major_flow' will return the total flow at major nodes, while results_set='res_release' will return reservoir release data.

2.2.5 pywrdrb.plotting#

This module contains different plotting functions. We won’t use any of these plots yet, but keep in mind that there is a common place to store these.


Activity: Code flowchart#

ACTIVITY: Let’s pause here and take a minute to explore the Pywr-DRB code base. Specifically, go through the repository and make a flowchart diagram which shows the relationships and key content for the various sub-folders in the repository.

You might consider using a flowchart software such as draw.io or doing this by hand.

Don’t get caught up in nitty-gritty details, as your understanding of the repo will change with time.

Send Trevor a version of this flowchart once you are done.


3.0 Interacting with a Pywr-DRB model instance#

Before running any of this code, you may need to modify the sys.path to make sure it can access the pywrdrb folder. Assuming that this tutorial is stored in the Pywr-DRB/notebooks/ folder, you will need to run:

import sys

path_to_pywrdrb = '../'
sys.path.append(path_to_pywrdrb)

3.1 Loading a Pywr model#

When loading a model with Pywr, we need to provide a json file which defines the nodes, edges, and parameters of the model (see the section 2.2.1 pywrdrb.model_data of this tutorial).

To load the model, we use the pywr.model.Model class which takes the json filename as an input.

The following code is used to specify a streamflow dataset, set the model file which we want to load, and load it using pywr.model:

from pywr.model import Model 

# import our custom parameters, since pywr will need them to construct the model
from pywrdrb.parameters import *

# import the make_model function to generate a new JSON model file
from pywrdrb import ModelBuilder

# Options: "nhmv10", "nwmv21", "nhmv10_withObsScaled", "nwmv21_withObsScaled" 
inflow_type = 'nhmv10'   

# Simulation start and end dates
from pywrdrb.utils.dates import model_date_ranges
start_date, end_date = model_date_ranges[inflow_type]

# We use the dataset name to specify the file name
model_filename = f'drb_model_full_{inflow_type}.json'
model_filename = f'{path_to_pywrdrb}/pywrdrb/model_data/{model_filename}'

# Make a new model JSON file
mb = ModelBuilder(inflow_type, start_date, end_date) # Optional "options" argument is available
mb.make_model()
mb.write_model(model_filename)

Now, you might not have noticed, but the model file drb_model_full_{inflow_type}.json was just replaced with a new version.

### load the pywrdrb model
model = Model.load(model_filename)
Initialized STARFITReservoirRelease for reservoir: nockamixon
Initialized STARFITReservoirRelease for reservoir: blueMarsh
Initialized STARFITReservoirRelease for reservoir: beltzvilleCombined
Initialized STARFITReservoirRelease for reservoir: greenLane
Initialized STARFITReservoirRelease for reservoir: stillCreek
Initialized STARFITReservoirRelease for reservoir: ontelaunee
Initialized STARFITReservoirRelease for reservoir: assunpink
Initialized STARFITReservoirRelease for reservoir: hopatcong
Initialized STARFITReservoirRelease for reservoir: merrillCreek
Initialized STARFITReservoirRelease for reservoir: fewalter
Initialized STARFITReservoirRelease for reservoir: mongaupeCombined
Initialized STARFITReservoirRelease for reservoir: shoholaMarsh
Initialized STARFITReservoirRelease for reservoir: prompton
Initialized STARFITReservoirRelease for reservoir: wallenpaupack

3.2 Nodes#

Nodes are the primary features in the Pywr-DRB model, and are used to represent reservoirs, USGS gauges, catchment inflow points, and other things.

Take a minute to check out the pywr documentation on node classes.

While pywr does allow for custom nodes, we are not currently using any of these in pywrdrb.

The code below allows you to make a list of the model nodes. Run the code and count the number of nodes in the model.

# Make a list of all model nodes
model_nodes = [n for n in model.nodes if n.name]

print(f'There are {len(model_nodes)} nodes in the model.')
There are 177 nodes in the model.

3.3 Parameters#

We can do the same thing for the model parameters:

### Read model parameter names into a list
model_parameters = [p for p in model.parameters if p.name]
model_parameter_names = [p.name for p in model_parameters]

print(f'There are {len(model_parameters)} parameters in the model')
There are 394 parameters in the model

4.0 Running a Pywr-DRB simulation#

Once we have loaded the model, we are almost ready to run a simulation.

First, we need to initialize a pywr.recorders.TablesRecorder, which will store simulation data during the model run. The TablesRecorder automatically creates an HDF5 file where it stores simulation data.

The recorder accepts as input:

  • The model object

  • The output_filename

  • A list of model parameters

The code below is used to initialize the TablesRecorder and run the simulation!

This should take 3-5 minutes to complete the full simulation.

(You will likely see many warnings pop up; don’t worry about those unless the simulation actually stops.)

# The pywr.recorders.TablesRecorder class is used store simulation results
# the simulation data is stored in an hdf5 file which is accessed during the simulation
from pywr.recorders import TablesRecorder

# there are a few naming-convention warnings from pywr; we can ignore them
import warnings
warnings.filterwarnings("ignore")

output_filename = f'drb_output_{inflow_type}.hdf5'
output_filename = f'../output_data/{output_filename}'

### Add a storage recorder
TablesRecorder(model = model, 
			   h5file = output_filename, 
			   parameters = model_parameters)

### Run the model
stats = model.run()
Assigning STARFIT parameters for wallenpaupack
Assigning STARFIT parameters for prompton
Assigning STARFIT parameters for shoholaMarsh
Assigning STARFIT parameters for mongaupeCombined
Assigning STARFIT parameters for fewalter
Assigning STARFIT parameters for merrillCreek
Assigning STARFIT parameters for hopatcong
Assigning STARFIT parameters for assunpink
Assigning STARFIT parameters for ontelaunee
Assigning STARFIT parameters for stillCreek
Assigning STARFIT parameters for greenLane
Assigning STARFIT parameters for nockamixon
Assigning STARFIT parameters for blueMarsh
Assigning STARFIT parameters for beltzvilleCombined

Congrats, you’ve just completed your first simulation using pywrdrb!


5.0 Accessing Pywr-DRB output data#

The data will be output to an HDF5 file in the Pywr-DRB/output_data/ folder.

These HDF5 files can be a little tricky if you don’t have experience with them. We have made a function which makes it easy to load specific types of variables from these output files.

The pywrdrb.Output class provides an easy method for loading and accessing Pywr-DRB model results.

The pywrdrb.Output Class#

The Output class consolidates the entire data retrieval process into a single load() method, automatically validating inputs, handling scenarios, and storing results inside the Output object.

Output.load() uses pywrdrb.post.get_pywrdrb_results() to fetch data for all specified models and results_sets. It manages datetime indexing and scenarios, and then stores the results as attributes within the class.

After using Output.load(), results are stored in the class as a nested dictionary structure:

Output.results_set[model][scenario] -> pd.DataFrame

This will return a pd.DataFrame that contains the simulation data with a datetime index.

Let’s use the Output class to load multiple different sets of results from the simulation run above:

from pywrdrb import Output

models = [inflow_type]
results_sets = ['major_flow', 'res_storage', 'mrf_target', 'res_release']

output = Output(models, 
                results_sets = results_sets,
                print_status = True)
output.load()

## Access the data using format: 
# output.results_set[model][scenario]

output.major_flow['nhmv10'][0].head()
Loading major_flow data for nhmv10
Loading res_storage data for nhmv10
Loading mrf_target data for nhmv10
Loading res_release data for nhmv10
01417000 01425000 01433500 01436000 01447800 01449800 01463620 01470960 delDRCanal delLordville delMontague delTrenton outletAssunpink outletSchuylkill
1983-10-01 245.248604 192.037241 702.126510 216.100859 1296.612713 30.641998 77.324137 981.138761 872.861166 626.737443 1050.155248 804.943656 110.874962 329.747828
1983-10-02 157.896171 150.911779 716.300505 60.483957 1298.492457 32.867673 77.732775 982.026219 1837.957440 514.753462 2386.712148 1772.532316 132.337899 688.332292
1983-10-03 227.762061 193.806146 706.181702 203.568972 1294.352344 30.133752 77.890367 978.970950 4459.303140 640.682999 2799.830204 4394.501111 114.075338 1740.349575
1983-10-04 283.875990 232.867018 695.766889 164.629266 1294.308205 28.921843 77.668647 978.663561 5584.875812 715.024366 2772.808094 5519.450687 106.791078 1683.772339
1983-10-05 334.835463 279.267193 692.402041 202.008757 1294.264190 28.795100 77.063649 978.558148 5798.400400 780.455696 2777.388751 5733.598372 103.601430 1604.624740

You should see a DataFrame with multiple columns corresponding to each of the reservoirs in the model.

Now you can get into the fun of looking at results and some data visualization!


Summary of Training and Activities#

You’ve just made it to the end of the first Pywr-DRB training. Wooo!

Just to recap, in this training we considered:

  1. Getting Started by cloning the repository and creating your virtual environment

  2. Explanation of the Pywr-DRB code base with a focus on key folders and files.

  3. Interacting with a Pywr-DRB model instance

  4. Running a Pywr-DRB simulation

To make the most of this training, I recommend that you complete the activities from this training, including the code flowchart activity from Section 2.