Introduction to Pywr-DRB#
Overview:#
If you want to learn how to use the Pywr-DRB water resource model, you are in the right place.
This page is designed to introduce you to the Pywr-DRB code base, help you set up your environment, and show you how to access and begin interacting with a Pywr-DRB model instance.
Tutorial content:#
Getting Started
Explanation of the Pywr-DRB code base
input_data
pywrdrb
Interacting with a Pywr-DRB model instance
Constructing and loading a pywrdrb model
Nodes
Parameters
Running a Pywr-DRB simulation
1.0 Getting Started#
The Pywr-DRB GitHub organization page contains three repositories at the time of writing:
Repo | Description
---|---
Pywr-DRB | The main model repository; contains all of the code needed to run Pywr-DRB simulations.
(streamflow reconstruction repo) | Used to generate historic streamflow reconstructions from 1945-2022. Reconstructions are exported for use as Pywr-DRB input data.
(input data retrieval repo) | Contains workflows for retrieving data from the USGS NWIS (for observed flows) and the NHMv1.0 and NWMv2.1 modeled flows. Data for Pywr-DRB-relevant locations are retrieved and exported for use as Pywr-DRB input data.
For now, these tutorials will only require the Pywr-DRB repository code. Start by cloning the GitHub repository onto your machine.
To clone the most recent version:
git clone https://github.com/Pywr-DRB/Pywr-DRB.git
To get Pywr-DRB version 1.01, used to replicate the results in Hamilton, Amestoy & Reed (Under Review), clone the diagnostic_paper branch of the Pywr-DRB repository from GitHub:
git clone -b diagnostic_paper https://github.com/Pywr-DRB/Pywr-DRB.git
Create a virtual environment where you can install dependencies.
For Windows:
py -m pip install --upgrade pip
py -m pip install virtualenv
py -m virtualenv venv
venv\Scripts\activate
py -m pip install -r requirements.txt
On Linux/macOS, use python3 in place of py and activate the environment with source venv/bin/activate instead.
Now we are ready!
2.0 Explanation of the Pywr-DRB code base#
Now that you have the code base available, it will be helpful to take a moment to familiarize yourself with what is inside. The sections below highlight some key folders, their contents, and how they fit into the broader Pywr-DRB model workflow.
For now, these sections focus on the two most important folders in the Pywr-DRB repo:
Pywr-DRB/input_data
Pywr-DRB/pywrdrb
2.1 Input Data#
Pywr-DRB is able to run simulations using different sets of streamflow input data. It is currently set up to use multiple different datasets, including:
The National Hydrologic Model version 1.0 (NHM; "nhmv10")
The National Water Model version 2.1 (NWM; "nwmv21")
Hybrid datasets that combine observed, scaled-observed, and modeled (NHM/NWM) streamflows based on model location and data availability:
Hybrid-NHM (hNHM)
Hybrid-NWM (hNWM)
Each dataset is given a unique identifying name (e.g., "nhmv10"
) which is used at multiple points in the Pywr-DRB workflow.
The current input datasets which come with the Pywr-DRB repository are:
"nhmv10"
"nwmv21"
"nhmv10_withObsScaled"
"nwmv21_withObsScaled"
2.1.1 Necessary files for simulation:#
For each of the four datasets mentioned above, the necessary input files are included in the Pywr-DRB repository. It's worth pointing out the important input files. If you want to run a simulation based on a specific streamflow scenario/dataset called <inflow_type>, then you need to make sure you have the following files:
Streamflow data:
catchment_inflow_<inflow_type>.csv
These are catchment inflow timeseries for each of the main Pywr-DRB nodes. Data are at a daily timescale, in millions of gallons per day (MGD).
gage_flow_<inflow_type>.csv
These are total streamflow timeseries at each of the main Pywr-DRB nodes. This data reflects pre-management streamflow conditions across the network. Units are MGD.
predicted_inflows_diversions_<inflow_type>.csv
These are predicted N-day ahead streamflow conditions at Montague and Trenton, where N ranges from 2 to 4. These predictions are used in simulated FFMP operations at the NYC reservoirs, which must plan their releases to meet downstream flow targets despite being roughly four days of travel time upstream.
Consumption and transbasin diversions:
sw_avg_wateruse_Pywr-DRB_Catchments.csv
Water diversion timeseries for each of the catchments in Pywr-DRB.
deliveryNJ_DRCanal_extrapolated.csv
Daily NJ diversion data, extrapolated further back in time based on recent data.
deliveryNYC_ORDM_extrapolated.csv
Daily NYC diversion data, extrapolated further back in time based on recent data.
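For example, here is a minimal sketch (not part of the official Pywr-DRB workflow) showing how you might peek at one of these input files using pandas. It assumes you are running from the repository root and that the first column of the CSV holds the dates:
import pandas as pd
# Inspect the catchment inflow timeseries for one dataset (path assumes repo-root working directory)
inflow_type = 'nhmv10'
catchment_inflows = pd.read_csv(
    f'input_data/catchment_inflow_{inflow_type}.csv',
    index_col=0,
    parse_dates=True,
)
print(catchment_inflows.shape)   # (number of days, number of nodes)
print(catchment_inflows.head())  # daily inflows in MGD for each Pywr-DRB node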
2.2 pywrdrb#
This folder (Pywr-DRB/pywrdrb) is where all the code for the model lives.
There are several submodules (folders within pywrdrb
) which are also important to be familiar with, and are described below.
2.2.1 pywrdrb.model_data#
Go to the folder Pywr-DRB/pywrdrb/model_data/.
The file drb_model_full_<inflow_type>.json contains all of the structural information defining the model. Essentially, this is a dictionary containing lists of nodes, edges, and parameters. Together, this information is used by Pywr to construct the linear program which is used to simulate operations.
The model data structure as described in the pywr
documentation is:
{
    "metadata": {},
    "timestepper": {},
    "solver": {},
    "nodes": [],
    "edges": [],
    "parameters": {}
}
In a later tutorial we will go into more detail on how these .json files are created.
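As a quick illustration, you can also inspect one of these files directly with Python's json module. This sketch assumes you are running from the repository root and that a pre-built model file already exists for your chosen dataset:
import json
# Load the pre-built model file and look at its top-level structure
with open('pywrdrb/model_data/drb_model_full_nhmv10.json', 'r') as f:
    model_data = json.load(f)
print(list(model_data.keys()))
print(f"{len(model_data['nodes'])} nodes and {len(model_data['edges'])} edges")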
The pywrdrb.model_data folder also contains other drb_model_<*>.csv files which hold extra information that is accessed at the start of a Pywr simulation. Some examples include:
drb_model_istarf_conus.csv
This file contains the STARFIT (aka ISTARF-CONUS) parameters developed by Turner et al. (2021), which are used to simulate reservoir operations at the non-NYC reservoirs.
drb_model_dailyProfiles.csv
Daily values for the different Flexible Flow Management Program (FFMP) operations classifications which are based on NYC storage level values. This is loaded by Pywr at the start of the simulation and stored in a DataFrame to be used during simulation.
2.2.2 pywrdrb.parameters#
Parameters are simply Python classes which are used in a Pywr simulation. They are used to track different variables during the simulation and perform specific operations. Pywr by default has a set of built-in Parameters which can be used to do basic operations.
However, in many cases we need a custom parameter which will implement a custom function during simulation. These custom parameters are located in pywrdrb.parameters.
Some characteristics of Parameters are:
Parameters are loaded at the start of the model simulation
Parameters are written as class objects
There can be multiple different instances of a specific Parameter in the Pywr-DRB model
Parameters can be linked to other parameters or nodes in the model and access data from that parameter or node during each timestep
Parameters store data as attributes, and access that data every timestep
Parameters can return a value (output) every timestep
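To make this concrete, below is a minimal sketch of what a custom pywr Parameter looks like. The ScaledInflow class here is a hypothetical example (it is not one of the actual pywrdrb parameters); it simply returns another parameter's value multiplied by a constant each timestep:
from pywr.parameters import Parameter, load_parameter

class ScaledInflow(Parameter):
    """Hypothetical example: return another parameter's value times a constant scale."""
    def __init__(self, model, scale, source_parameter, **kwargs):
        super().__init__(model, **kwargs)
        self.scale = scale
        self.source_parameter = source_parameter
        self.children.add(source_parameter)  # make sure the source parameter is evaluated first

    def value(self, timestep, scenario_index):
        # called once per timestep, per scenario
        return self.scale * self.source_parameter.get_value(scenario_index)

    @classmethod
    def load(cls, model, data):
        # allows the parameter to be built from an entry in the model JSON
        source = load_parameter(model, data.pop("source"))
        scale = data.pop("scale")
        return cls(model, scale, source, **data)

ScaledInflow.register()  # register so Model.load() can find the parameter by its "type" name
The custom parameters in pywrdrb.parameters follow this same pattern, just with more complex logic inside value().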
Specifically, in pywrdrb
we have:
pywrdrb.parameters.ffmp
These parameters are used to implement the FFMP at NYC reservoirs. There is a lot packed in here, and it will be good to return to this later on.
pywrdrb.parameters.starfit
This contains the STARFITReservoirRelease parameter, which is used to calculate the STARFIT-based reservoir releases each day for non-NYC reservoirs.
pywrdrb.parameters.lower_basin_ffmp
These parameters are used to determine when and how much water from the lower basin reservoirs (Beltzville, Blue Marsh, Nockamixon) should be released to help meet the downstream flow targets. This parameter communicates with the
pywrdrb.parameters.ffmp
parameters in order to make this decision.
pywrdrb.parameters.general
Currently, this only contains a single
LaggedReservoirRelease
parameter.
pywrdrb.parameters.inflow_ensemble
This contains parameters which are used to handle ensemble simulations in parallel. We won’t run any ensembles yet, so don’t worry about this for now.
Later in this tutorial we will load a pywrdrb
model and identify some parameters.
2.2.3 pywrdrb.pre
#
The pywrdrb.pre
module contains different functions used to prepare model input data. When you clone the Pywr-DRB repository, it will contain several pre-processed datasets.
The pywrdrb.pre
module contains:
disaggregate_DRBC_demands.py
Used to disaggregate demand data provided by the Delaware River Basin Commission (DRBC). The demands are mapped to the Pywr-DRB catchment areas.
extrapolate_NYC_NJ_diversions.py
Used to extend limited historic diversion data further back in time. Regressions are constructed which predict monthly diversion demands dependent on streamflow conditions. Then K-Nearest Neighbors (KNN) timeseries sampling is used for temporal disaggregation from monthly to daily timeseries.
predict_inflows_diversions.py
Contains models for predicting N-day ahead inflows and diversions across the Pywr-DRB network. These predictions are used in the FFMP operations, where NYC is interested in predicting up to the 4-day ahead flow at Trenton to plan their releases accordingly. The 3- and 2-day ahead predictions are also made.
prep_input_data_functions.py
Contains several functions used in the pre-processing workflow. One example is subtract_upstream_catchment_inflows(), which transforms total streamflow into marginal catchment inflow timeseries. These marginal inflows are used as inputs for each node in Pywr-DRB. A simplified illustration of this idea is sketched below.
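The sketch below is only an illustration of the marginal-inflow idea (it is not the actual pywrdrb implementation). It assumes a DataFrame of total flows with one column per node, plus a dictionary mapping each node to its immediate upstream nodes:
import pandas as pd

def marginal_catchment_inflows(total_flow: pd.DataFrame, upstream_map: dict) -> pd.DataFrame:
    # marginal inflow at a node = total flow at the node minus total flow at its immediate upstream nodes
    marginal = total_flow.copy()
    for node, upstream_nodes in upstream_map.items():
        for upstream in upstream_nodes:
            marginal[node] = marginal[node] - total_flow[upstream]
    # clip small negative values that can arise from routing and timing differences
    return marginal.clip(lower=0.0)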
Later, as you consider preparing new input scenarios, it will be necessary to understand these processing steps. These preprocessing steps are explained in detail in the supplemental information for Hamilton, Amestoy, & Reed (2024).
2.2.4 pywrdrb.post#
The pywrdrb.post
submodule contains different scripts used for post-processing simulation results.
The main function used here is pywrdrb.post.get_pywr_results()
which is designed to extract different variables of interest from the output file.
get_pywr_results(output_dir, model, results_set='all', scenario=0, datetime_index=None)
In get_pywr_results
, the results_set
argument specifies what type of data you want to retrieve. For example results_set = 'major_flow'
will return the total flow at major nodes while results_set = 'res_release'
will return reservoir release data.
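For example, here is a minimal usage sketch based on the signature above. Note that the exact return format may differ between Pywr-DRB versions (some versions also return the datetime index alongside the DataFrame):
from pywrdrb.post import get_pywr_results

# Retrieve reservoir releases from a previously generated output file
output_dir = '../output_data/'
reservoir_releases = get_pywr_results(output_dir, 'nhmv10', results_set='res_release', scenario=0)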
2.2.5 pywrdrb.plotting#
This module contains different plotting functions. We won't use any of these plots yet, but keep in mind that this is the common place to store them.
Activity: Code flowchart#
ACTIVITY: Let's pause here and take a minute to explore the Pywr-DRB code base. Specifically, go through the repository and make a flowchart diagram which shows the relationships and key content for the various sub-folders in the repository.
You might consider using flowchart software such as draw.io, or doing this by hand.
Don’t get caught up in nitty-gritty details, as your understanding of the repo will change with time.
Send Trevor a version of this flowchart once you are done.
3.0 Interacting with a Pywr-DRB model instance#
Before running any of this code, you may need to modify sys.path to make sure Python can find the pywrdrb folder. Assuming that this tutorial is stored in the Pywr-DRB/notebooks/ folder, you will need to run:
import sys
path_to_pywrdrb = '../'
sys.path.append(path_to_pywrdrb)
3.1 Loading a Pywr model#
When loading a model with Pywr, we need to provide a json
file which defines the nodes, edges, and parameters of the model (see the section 2.2.1 pywrdrb.model_data
of this tutorial).
To load the model, we use the pywr.model.Model
class which takes the json
filename as an input.
The following code is used to specify a streamflow dataset, build a new model JSON file, and set the model file which we will then load using pywr.model:
from pywr.model import Model
# import our custom parameters, since pywr will need them to construct the model
from pywrdrb.parameters import *
# import the ModelBuilder class, used to generate a new JSON model file
from pywrdrb import ModelBuilder
# Options: "nhmv10", "nwmv21", "nhmv10_withObsScaled", "nwmv21_withObsScaled"
inflow_type = 'nhmv10'
# Simulation start and end dates
from pywrdrb.utils.dates import model_date_ranges
start_date, end_date = model_date_ranges[inflow_type]
# We use the dataset name to specify the file name
model_filename = f'drb_model_full_{inflow_type}.json'
model_filename = f'{path_to_pywrdrb}/pywrdrb/model_data/{model_filename}'
# Make a new model JSON file
mb = ModelBuilder(inflow_type, start_date, end_date) # Optional "options" argument is available
mb.make_model()
mb.write_model(model_filename)
Now, you might not have noticed, but the model file drb_model_full_{inflow_type}.json was just replaced with a new version generated by ModelBuilder.
### load the pywrdrb model
model = Model.load(model_filename)
Initialized STARFITReservoirRelease for reservoir: nockamixon
Initialized STARFITReservoirRelease for reservoir: blueMarsh
Initialized STARFITReservoirRelease for reservoir: beltzvilleCombined
Initialized STARFITReservoirRelease for reservoir: greenLane
Initialized STARFITReservoirRelease for reservoir: stillCreek
Initialized STARFITReservoirRelease for reservoir: ontelaunee
Initialized STARFITReservoirRelease for reservoir: assunpink
Initialized STARFITReservoirRelease for reservoir: hopatcong
Initialized STARFITReservoirRelease for reservoir: merrillCreek
Initialized STARFITReservoirRelease for reservoir: fewalter
Initialized STARFITReservoirRelease for reservoir: mongaupeCombined
Initialized STARFITReservoirRelease for reservoir: shoholaMarsh
Initialized STARFITReservoirRelease for reservoir: prompton
Initialized STARFITReservoirRelease for reservoir: wallenpaupack
3.2 Nodes#
Nodes are the primary features in the Pywr-DRB model, and are used to represent reservoirs, USGS gauges, catchment inflow points, and other things.
Take a minute to check out the pywr
documentation on node classes.
While pywr
does allow for custom nodes, we are not currently using any of these in pywrdrb
.
The code below allows you to make a list of the model nodes. Run the code and count the number of nodes in the model.
# Make a list of all model nodes
model_nodes = [n for n in model.nodes if n.name]
print(f'There are {len(model_nodes)} nodes in the model.')
There are 177 nodes in the model.
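If you are curious what kinds of nodes these are, an optional quick check is to count the nodes by their pywr class:
from collections import Counter

# Count the model nodes by their pywr node class
node_types = Counter(type(node).__name__ for node in model.nodes)
for node_type, count in sorted(node_types.items()):
    print(f'{node_type}: {count}')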
3.3 Parameters#
We can do the same thing for the model parameters:
### Read model parameter names into a list
model_parameters = [p for p in model.parameters if p.name]
model_parameter_names = [p.name for p in model_parameters]
print(f'There are {len(model_parameters)} parameters in the model')
There are 394 parameters in the model
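Since there are several hundred parameters, it can help to filter the names. For example, this optional snippet lists the parameter names associated with a single reservoir (here, Beltzville):
# List all parameter names that mention a particular reservoir
beltzville_parameters = [name for name in model_parameter_names if 'beltzville' in name.lower()]
print(beltzville_parameters)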
4.0 Running a Pywr-DRB simulation#
Once we have loaded the model, we are almost ready to run a simulation.
First, we need to initialize a pywr.recorders.TablesRecorder which will store simulation data during the model run. The TablesRecorder will automatically create an hdf5 file where it will store simulation data.
The recorder accepts as input:
The model object
The output_filename
A list of model parameters
The code below is used to initialize the TablesRecorder and run the simulation!
This should take 3-5 minutes to complete the full simulation.
(You will likely see many warnings pop up; don't worry about those unless the simulation actually stops.)
# The pywr.recorders.TablesRecorder class is used to store simulation results
# the simulation data is stored in an hdf5 file which is written during the simulation
from pywr.recorders import TablesRecorder
# there are a few naming convention warnings from pywr; we can ignore them
import warnings
warnings.filterwarnings("ignore")
output_filename = f'drb_output_{inflow_type}.hdf5'
output_filename = f'../output_data/{output_filename}'
### Add a storage recorder
TablesRecorder(model = model,
h5file = output_filename,
parameters = model_parameters)
### Run the model
stats = model.run()
Assigning STARFIT parameters for wallenpaupack
Assigning STARFIT parameters for prompton
Assigning STARFIT parameters for shoholaMarsh
Assigning STARFIT parameters for mongaupeCombined
Assigning STARFIT parameters for fewalter
Assigning STARFIT parameters for merrillCreek
Assigning STARFIT parameters for hopatcong
Assigning STARFIT parameters for assunpink
Assigning STARFIT parameters for ontelaunee
Assigning STARFIT parameters for stillCreek
Assigning STARFIT parameters for greenLane
Assigning STARFIT parameters for nockamixon
Assigning STARFIT parameters for blueMarsh
Assigning STARFIT parameters for beltzvilleCombined
Congrats, you’ve just completed your first simulation using pywrdrb
!
5.0 Accessing Pywr-DRB output data#
The data will be output to an HDF5 file in the Pywr-DRB/output_data/ folder.
These HDF5 files can be a little tricky if you don’t have experience with them. We have made a function which makes it easy to load specific types of variables from these output files.
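If you are curious, you can peek at the raw structure of the file using PyTables (the library the TablesRecorder uses to write it). This is purely optional, since the Output class described below handles all of this for you, and it assumes the output_filename variable from the previous section is still defined:
import tables

# Open the output file in read-only mode and print a few of the stored arrays
with tables.open_file(output_filename, mode='r') as f:
    for node in f.list_nodes('/')[:10]:
        print(node)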
The pywrdrb.Output
class provides an easy method for loading and accessing Pywr-DRB model results.
The pywrdrb.Output Class#
The Output class consolidates this entire data retrieval process into a single load() method, automatically validating inputs, handling scenarios, and storing results inside the Output object.
Output.load()
uses pywrdrb.post.get_pywrdrb_results()
to fetch data for all specified models
and results_sets
. It manages datetime indexing and scenarios, and then stores the results as attributes within the class.
After using Output.load()
, results are stored in the class as a nested dictionary structure:
Output.results_set[model][scenario] -> pd.DataFrame
This will return a pd.DataFrame that contains the simulation data with a datetime index.
Let’s use the Output
class to load multiple different sets of results from the simulation run above:
from pywrdrb import Output
models = [inflow_type]
results_sets = ['major_flow', 'res_storage', 'mrf_target', 'res_release']
output = Output(models,
results_sets = results_sets,
print_status = True)
output.load()
## Access the data using format:
# output.results_set[model][scenario]
output.major_flow['nhmv10'][0].head()
Loading major_flow data for nhmv10
Loading res_storage data for nhmv10
Loading mrf_target data for nhmv10
Loading res_release data for nhmv10
Date | 01417000 | 01425000 | 01433500 | 01436000 | 01447800 | 01449800 | 01463620 | 01470960 | delDRCanal | delLordville | delMontague | delTrenton | outletAssunpink | outletSchuylkill
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1983-10-01 | 245.248604 | 192.037241 | 702.126510 | 216.100859 | 1296.612713 | 30.641998 | 77.324137 | 981.138761 | 872.861166 | 626.737443 | 1050.155248 | 804.943656 | 110.874962 | 329.747828 |
1983-10-02 | 157.896171 | 150.911779 | 716.300505 | 60.483957 | 1298.492457 | 32.867673 | 77.732775 | 982.026219 | 1837.957440 | 514.753462 | 2386.712148 | 1772.532316 | 132.337899 | 688.332292 |
1983-10-03 | 227.762061 | 193.806146 | 706.181702 | 203.568972 | 1294.352344 | 30.133752 | 77.890367 | 978.970950 | 4459.303140 | 640.682999 | 2799.830204 | 4394.501111 | 114.075338 | 1740.349575 |
1983-10-04 | 283.875990 | 232.867018 | 695.766889 | 164.629266 | 1294.308205 | 28.921843 | 77.668647 | 978.663561 | 5584.875812 | 715.024366 | 2772.808094 | 5519.450687 | 106.791078 | 1683.772339 |
1983-10-05 | 334.835463 | 279.267193 | 692.402041 | 202.008757 | 1294.264190 | 28.795100 | 77.063649 | 978.558148 | 5798.400400 | 780.455696 | 2777.388751 | 5733.598372 | 103.601430 | 1604.624740 |
You should see a DataFrame with multiple columns corresponding to each of the major flow nodes in the model.
Now you can get into the fun of looking at results and some data visualization!
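For example, here is a small optional sketch that plots the simulated flows at Montague and Trenton from the DataFrame loaded above:
import matplotlib.pyplot as plt

# Plot simulated daily flows at two key downstream flow target locations
df = output.major_flow['nhmv10'][0]
ax = df[['delMontague', 'delTrenton']].plot(figsize=(10, 4))
ax.set_ylabel('Flow (MGD)')
ax.set_title('Simulated flow at Montague and Trenton')
plt.show()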
Summary of Training and Activities#
You’ve just made it to the end of the first Pywr-DRB training. Wooo!
Just to recap, in this training we considered:
Getting Started by cloning the repository and creating your virtual environment
Explanation of the Pywr-DRB code base with a focus on key folders and files.
To make the most of this training, I recommend that you complete the activities from this training, including:
Make a flow chart of the key elements of the Pywr-DRB code base
Run an instance of the Pywr-DRB model
Load and visualize some of the output data