pywrdrb.pre.ExtrapolatedDiversionPreprocessor
- class pywrdrb.pre.ExtrapolatedDiversionPreprocessor(loc)
Class for extrapolating NYC and NJ diversion data based on streamflow regressions.
The class implements a workflow to extrapolate historical diversions into time periods where data is not available, using seasonal flow regressions. The diversion data is organized into daily time series and saved to CSV files.
- load()
Load historical diversion and streamflow data from data/observations/_raw.
- get_quarter(m)
Return the quarter (season) of the year for a given month.
- get_overlapping_timespan(df1, df2)
Find the maximum overlapping timespan between two DataFrames.
- train_regressions(df_m)
Train seasonal regression models for diversion prediction.
- get_random_prediction_sample(lrms, lrrs, quarter, x)
Generate a random prediction sample from the regression distribution.
- process()
Run the full extrapolation workflow: load data, train models, predict diversions.
- save()
Save the extrapolated diversion data to data/diversions/.
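The seasonal regression idea behind train_regressions and get_random_prediction_sample can be illustrated with a small standalone sketch. Everything in it is an assumption made for illustration and not the package's implementation: the month-to-season mapping, the plain least-squares fit of diversion on flow, the normally distributed residual noise, and the helper names fit_line, train_seasonal_fits and random_prediction_sample are all hypothetical. The real class works on pandas DataFrames and may transform the data or use a different regression library.

```python
import random
from statistics import fmean, pstdev

QUARTERS = ("DJF", "MAM", "JJA", "SON")


def get_quarter(m):
    """Map a month number (1-12) to a season label (assumed mapping)."""
    if m in (12, 1, 2):
        return "DJF"
    if m in (3, 4, 5):
        return "MAM"
    if m in (6, 7, 8):
        return "JJA"
    return "SON"


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_seasonal_fits(records):
    """Fit one line per season from a list of (month, flow, diversion) tuples."""
    fits = {}
    for q in QUARTERS:
        xs = [flow for month, flow, _ in records if get_quarter(month) == q]
        ys = [div for month, _, div in records if get_quarter(month) == q]
        slope, intercept = fit_line(xs, ys)
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Store the residual spread so predictions can be sampled, not just computed.
        fits[q] = (slope, intercept, pstdev(residuals))
    return fits


def random_prediction_sample(fits, quarter, x, rng=random):
    """Draw one diversion value around the fitted line for the given season."""
    slope, intercept, resid_std = fits[quarter]
    return rng.gauss(intercept + slope * x, resid_std)


# Synthetic monthly data: diversion = 2 * flow + 1 exactly, so residual spread is ~0.
records = [(m, float(f), 2.0 * f + 1.0) for m in range(1, 13) for f in (10, 20, 30)]
fits = train_seasonal_fits(records)
sample = random_prediction_sample(fits, get_quarter(7), 15.0)
```

Sampling around the fitted line, instead of returning the line's value directly, keeps some of the scatter seen in the historical record in the extrapolated series.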
- loc
Location indicator, either “nyc” or “nj”.
- Type:
str
- quarters
Seasons used for different regression models (DJF, MAM, JJA, SON).
- Type:
tuple
- lrms
Dictionary of linear regression models for each season.
- Type:
dict
- lrrs
Dictionary of fitted linear regression results for each season.
- Type:
dict
- diversion
DataFrame containing the historical diversion data.
- Type:
pd.DataFrame
- flow
DataFrame containing the historical streamflow data.
- Type:
pd.DataFrame
- df
DataFrame of daily states combining diversion and flow data.
- Type:
pd.DataFrame
- df_m
DataFrame of monthly mean states.
- Type:
pd.DataFrame
- df_long
DataFrame containing the full time series data for extrapolation.
- Type:
pd.DataFrame
- df_long_m
DataFrame containing monthly mean data for the full time series.
- Type:
pd.DataFrame
- processed_data
Dictionary to store the processed extrapolated diversion data.
- Type:
dict
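The relationship between the daily attributes (df, df_long) and their monthly-mean counterparts (df_m, df_long_m) can be pictured with the following sketch. It uses only the standard library and a plain dict of dates, which is an assumption made to keep the example self-contained; the class itself stores pandas DataFrames, and the helper name monthly_means is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import fmean


def monthly_means(daily):
    """Average daily values by calendar month.

    daily maps datetime.date to a value; the result maps (year, month) to the mean.
    """
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[(day.year, day.month)].append(value)
    return {key: fmean(values) for key, values in sorted(buckets.items())}


# Synthetic daily series covering January and February 2020 (60 days).
daily = {date(2020, 1, 1) + timedelta(days=i): float(i) for i in range(60)}
monthly = monthly_means(daily)
```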
Example Usage
>>> from pywrdrb.pre import ExtrapolatedDiversionPreprocessor
>>> processor = ExtrapolatedDiversionPreprocessor(loc='nyc')
>>> hist_diversions, hist_flows = processor.load()
>>> processor.process()
>>> processor.save()
Saved extrapolated diversion data to <path>src\pywrdrb\data\diversions\
- __init__(loc)
Initialize the ExtrapolatedDiversionPreprocessor.
- Parameters:
loc (str) – Location indicator, must be either “nyc” or “nj”.
- Raises:
ValueError – If the location parameter is not “nyc” or “nj”.
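The documented argument check can be sketched as follows. PreprocessorSketch and VALID_LOCS are hypothetical names used only for this illustration; the actual error message and any additional setup performed by the real constructor are not shown in this reference.

```python
VALID_LOCS = ("nyc", "nj")


class PreprocessorSketch:
    """Hypothetical stand-in showing only the documented argument check."""

    def __init__(self, loc):
        if loc not in VALID_LOCS:
            raise ValueError(f"loc must be one of {VALID_LOCS}, got {loc!r}")
        self.loc = loc


ok = PreprocessorSketch("nj")

# Any other location string is rejected with ValueError.
try:
    PreprocessorSketch("philadelphia")
    rejected = False
except ValueError:
    rejected = True
```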
Methods
__init__(loc) – Initialize the ExtrapolatedDiversionPreprocessor.
get_overlapping_timespan(df1, df2) – Find the maximum overlapping timespan between two DataFrames.
get_quarter(m) – Return the quarter (season) of the year for a given month.
get_random_prediction_sample(lrms, lrrs, ...) – Generate a random prediction sample from the regression distribution.
load() – Load historical diversion and streamflow data.
plot([kind]) – Create plots of the extrapolation process.
plot_diversions() – Plot the historical and extrapolated diversion time series.
plot_regressions() – Plot the seasonal regression models and data points.
process() – Process the loaded data to extrapolate diversions.
save() – Save the processed extrapolated diversion data to CSV.
train_regressions(df_m) – Train seasonal regression models for diversion prediction.
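As a rough illustration of what get_overlapping_timespan computes, the sketch below finds the shared date range of two records: the later of the two start dates and the earlier of the two end dates. It assumes plain lists of datetime.date values and a hypothetical helper name, overlapping_timespan; the real method takes DataFrames, and its return type is not specified in this reference. The dates are made up for the example.

```python
from datetime import date


def overlapping_timespan(dates1, dates2):
    """Return (start, end) of the shared date range, or None if there is none."""
    start = max(min(dates1), min(dates2))
    end = min(max(dates1), max(dates2))
    return (start, end) if start <= end else None


# Illustrative record extents only, not the actual data coverage.
diversion_dates = [date(1950, 1, 1), date(2020, 12, 31)]
flow_dates = [date(1945, 1, 1), date(2010, 6, 30)]
span = overlapping_timespan(diversion_dates, flow_dates)
```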