pywrdrb.pre.ExtrapolatedDiversionPreprocessor

class pywrdrb.pre.ExtrapolatedDiversionPreprocessor(loc)

Class for extrapolating NYC and NJ diversion data based on streamflow regressions.

The class implements a workflow that extrapolates historical diversions into time periods with no available data, using seasonal regressions on streamflow. The extrapolated diversions are organized into daily time series and saved to CSV files.

load()

Load historical diversion and streamflow data from data/observations/_raw.

get_quarter(m)

Return the quarter (season) of the year for a given month.
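
The month-to-season mapping can be sketched as below. This is only an illustration under the assumption that the seasons are the DJF, MAM, JJA, SON groups listed for the quarters attribute; the helper name `month_to_season` is hypothetical, and the actual return value of get_quarter(m) is not specified here.

```python
def month_to_season(m: int) -> str:
    """Map a month number (1-12) to a season label (hypothetical helper)."""
    if m not in range(1, 13):
        raise ValueError(f"month must be in 1..12, got {m}")
    if m in (12, 1, 2):
        return "DJF"  # December, January, February
    if m in (3, 4, 5):
        return "MAM"  # March, April, May
    if m in (6, 7, 8):
        return "JJA"  # June, July, August
    return "SON"      # September, October, November
```

Note that December is grouped with the following January and February, so a season can span two calendar years.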

get_overlapping_timespan(df1, df2)

Find the maximum overlapping timespan between two DataFrames.
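
The overlap logic can be sketched on plain sequences of dates, which keeps the example free of dependencies. With DataFrames, the same comparison would presumably be made on the two date indexes. The function name `overlapping_span` and its return convention are assumptions for illustration, not the method's documented behavior.

```python
from datetime import date


def overlapping_span(dates_a, dates_b):
    """Return (start, end) of the period covered by both date sequences.

    Returns None when the two sequences do not overlap at all.
    """
    start = max(min(dates_a), min(dates_b))  # the later of the two first dates
    end = min(max(dates_a), max(dates_b))    # the earlier of the two last dates
    return (start, end) if start <= end else None


diversion_dates = [date(2000, 1, 1), date(2005, 12, 31)]
flow_dates = [date(2003, 1, 1), date(2010, 1, 1)]
span = overlapping_span(diversion_dates, flow_dates)
```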

train_regressions(df_m)

Train seasonal regression models for diversion prediction.
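
As a rough illustration of fitting one regression per season, the sketch below groups monthly mean records by season and fits an ordinary least squares line of diversion against flow for each group. It is written with the standard library only. The regression library the class uses, any transformation applied to the variables, and the structure of df_m are not stated here, so the record layout `(month, flow, diversion)` and the names `fit_line` and `fit_seasonal_lines` are assumptions.

```python
from collections import defaultdict

# Month number -> season label, matching the DJF/MAM/JJA/SON grouping.
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_seasonal_lines(records):
    """Fit one (slope, intercept) pair per season.

    records: iterable of (month, flow, diversion) monthly means (assumed layout).
    """
    grouped = defaultdict(lambda: ([], []))
    for month, flow, diversion in records:
        flows, diversions = grouped[SEASON_OF_MONTH[month]]
        flows.append(flow)
        diversions.append(diversion)
    return {season: fit_line(xs, ys) for season, (xs, ys) in grouped.items()}
```

Keeping a separate fit per season lets the flow-diversion relationship differ between, for example, winter and summer, which a single pooled regression would average out.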

get_random_prediction_sample(lrms, lrrs, quarter, x)

Generate a random prediction sample from the regression distribution.
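
One simple way to draw a random prediction from a fitted line is to add normally distributed noise, scaled by the residual standard deviation, to the point prediction. The sketch below does exactly that and nothing more: it ignores uncertainty in the fitted coefficients, and the distribution the method actually samples from is not specified here. The function `sample_prediction` and its parameters are hypothetical and do not mirror the real signature.

```python
import random


def sample_prediction(slope, intercept, resid_std, x, rng=None):
    """Draw one noisy prediction around the fitted line at predictor value x.

    The noise is Normal(0, resid_std); this is an assumed, simplified model.
    """
    rng = random if rng is None else rng
    point_prediction = intercept + slope * x
    return point_prediction + rng.gauss(0.0, resid_std)


# A seeded generator makes the draws reproducible.
rng = random.Random(42)
draws = [sample_prediction(2.0, 3.0, 1.0, 4.0, rng) for _ in range(2000)]
```

Sampling rather than using the point prediction keeps realistic variability in the extrapolated series instead of collapsing it onto the regression line.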

process()

Run the full extrapolation workflow: load data, train models, predict diversions.

save()

Save the extrapolated diversion data to data/diversions/.

loc

Location indicator, either “nyc” or “nj”.

Type:

str

quarters

Seasons used for different regression models (DJF, MAM, JJA, SON).

Type:

tuple

lrms

Dictionary of linear regression models for each season.

Type:

dict

lrrs

Dictionary of fitted linear regression results for each season.

Type:

dict

diversion

DataFrame containing the historical diversion data.

Type:

pd.DataFrame

flow

DataFrame containing the historical streamflow data.

Type:

pd.DataFrame

df

DataFrame of daily states combining diversion and flow data.

Type:

pd.DataFrame

df_m

DataFrame of monthly mean states.

Type:

pd.DataFrame

df_long

DataFrame containing the full time series data for extrapolation.

Type:

pd.DataFrame

df_long_m

DataFrame containing monthly mean data for the full time series.

Type:

pd.DataFrame

processed_data

Dictionary to store the processed extrapolated diversion data.

Type:

dict

Example Usage
-------------
>>> from pywrdrb.pre import ExtrapolatedDiversionPreprocessor
>>> processor = ExtrapolatedDiversionPreprocessor(loc='nyc')
>>> hist_diversions, hist_flows = processor.load()
>>> processor.process()
>>> processor.save()
[out] Saved extrapolated diversion data to <path>src\pywrdrb\data\diversions\

__init__(loc)

Initialize the ExtrapolatedDiversionPreprocessor.

Parameters:

loc (str) – Location indicator, must be either “nyc” or “nj”.

Raises:

ValueError – If the location parameter is not “nyc” or “nj”.
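
The check described above amounts to a membership test on the two accepted values. A minimal sketch follows, in which `check_loc` and `VALID_LOCS` are hypothetical names and the error message wording is assumed rather than taken from the library:

```python
VALID_LOCS = ("nyc", "nj")


def check_loc(loc):
    """Return loc unchanged if it is an accepted location, else raise ValueError."""
    if loc not in VALID_LOCS:
        raise ValueError(f"loc must be one of {VALID_LOCS}, got {loc!r}")
    return loc


try:
    check_loc("pa")
except ValueError as err:
    message = str(err)  # the caller can report or log the problem here
```

Callers that build the location string from user input may want to wrap the constructor call in a similar try/except.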

Methods

__init__(loc)

Initialize the ExtrapolatedDiversionPreprocessor.

get_overlapping_timespan(df1, df2)

Find the maximum overlapping timespan between two DataFrames.

get_quarter(m)

Return the quarter (season) of the year for a given month.

get_random_prediction_sample(lrms, lrrs, ...)

Generate a random prediction sample from the regression distribution.

load()

Load historical diversion and streamflow data.

plot([kind])

Create plots of the extrapolation process.

plot_diversions()

Plot the historical and extrapolated diversion time series.

plot_regressions()

Plot the seasonal regression models and data points.

process()

Process the loaded data to extrapolate diversions.

save()

Save the processed extrapolated diversion data to CSV.

train_regressions(df_m)

Train seasonal regression models for diversion prediction.