diff --git a/.buildinfo b/.buildinfo index c5b5ab2..972c2e3 100644 --- a/.buildinfo +++ b/.buildinfo @@ -1,4 +1,4 @@ # Sphinx build info version 1 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. -config: db223b98a23d8b34e758885381c3cf4e +config: 8a787254773681215798eb2e50dff41d tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/_downloads/1e8236fb3d31a317b1c34ff861150124/similarity_examples-5.hires.png b/_downloads/1e8236fb3d31a317b1c34ff861150124/similarity_examples-5.hires.png new file mode 100644 index 0000000..58c5142 Binary files /dev/null and b/_downloads/1e8236fb3d31a317b1c34ff861150124/similarity_examples-5.hires.png differ diff --git a/_downloads/40423e11d6e99b58c33bb0d76dda903f/similarity_examples-4.hires.png b/_downloads/40423e11d6e99b58c33bb0d76dda903f/similarity_examples-4.hires.png new file mode 100644 index 0000000..1230c5b Binary files /dev/null and b/_downloads/40423e11d6e99b58c33bb0d76dda903f/similarity_examples-4.hires.png differ diff --git a/_downloads/49384fb54eb1dfe3c84bac7d3e069bfc/similarity_examples-5.png b/_downloads/49384fb54eb1dfe3c84bac7d3e069bfc/similarity_examples-5.png new file mode 100644 index 0000000..5739548 Binary files /dev/null and b/_downloads/49384fb54eb1dfe3c84bac7d3e069bfc/similarity_examples-5.png differ diff --git a/_downloads/673f52eb5229b209fe0b5faf997910af/flow_duration_curve_examples-5.hires.png b/_downloads/673f52eb5229b209fe0b5faf997910af/flow_duration_curve_examples-5.hires.png index 0fcf6a8..acb8d86 100644 Binary files a/_downloads/673f52eb5229b209fe0b5faf997910af/flow_duration_curve_examples-5.hires.png and b/_downloads/673f52eb5229b209fe0b5faf997910af/flow_duration_curve_examples-5.hires.png differ diff --git a/_downloads/72ef893d3fadbacea7b6adae1e4280aa/index-6.hires.png b/_downloads/72ef893d3fadbacea7b6adae1e4280aa/index-6.hires.png new file mode 100644 index 0000000..a473077 Binary files /dev/null and 
b/_downloads/72ef893d3fadbacea7b6adae1e4280aa/index-6.hires.png differ diff --git a/_downloads/77bff4448d915acc8b9cce71410ee204/similarity_examples-2.png b/_downloads/77bff4448d915acc8b9cce71410ee204/similarity_examples-2.png new file mode 100644 index 0000000..728feab Binary files /dev/null and b/_downloads/77bff4448d915acc8b9cce71410ee204/similarity_examples-2.png differ diff --git a/_downloads/869827d4a39dc05a6ff889cd05729475/index-6.png b/_downloads/869827d4a39dc05a6ff889cd05729475/index-6.png new file mode 100644 index 0000000..ed78d1e Binary files /dev/null and b/_downloads/869827d4a39dc05a6ff889cd05729475/index-6.png differ diff --git a/_downloads/b3620d145d83b447a4fc98b4bb4ffeaf/similarity_examples-3.hires.png b/_downloads/b3620d145d83b447a4fc98b4bb4ffeaf/similarity_examples-3.hires.png new file mode 100644 index 0000000..a2c4fd6 Binary files /dev/null and b/_downloads/b3620d145d83b447a4fc98b4bb4ffeaf/similarity_examples-3.hires.png differ diff --git a/_downloads/c2ab381d3f41e6c05566af9eb27f5932/similarity_examples-2.hires.png b/_downloads/c2ab381d3f41e6c05566af9eb27f5932/similarity_examples-2.hires.png new file mode 100644 index 0000000..af17916 Binary files /dev/null and b/_downloads/c2ab381d3f41e6c05566af9eb27f5932/similarity_examples-2.hires.png differ diff --git a/_downloads/d5581bd2b32570239e1161370cb8362e/similarity_examples-4.png b/_downloads/d5581bd2b32570239e1161370cb8362e/similarity_examples-4.png new file mode 100644 index 0000000..518e904 Binary files /dev/null and b/_downloads/d5581bd2b32570239e1161370cb8362e/similarity_examples-4.png differ diff --git a/_downloads/e0cbf90d607cb28567961a67b21cda1a/similarity_examples-3.png b/_downloads/e0cbf90d607cb28567961a67b21cda1a/similarity_examples-3.png new file mode 100644 index 0000000..990509e Binary files /dev/null and b/_downloads/e0cbf90d607cb28567961a67b21cda1a/similarity_examples-3.png differ diff --git a/_downloads/e6c156a9724701eb308f2075973eb9f2/flow_duration_curve_examples-5.png 
b/_downloads/e6c156a9724701eb308f2075973eb9f2/flow_duration_curve_examples-5.png index a548025..e1bd6de 100644 Binary files a/_downloads/e6c156a9724701eb308f2075973eb9f2/flow_duration_curve_examples-5.png and b/_downloads/e6c156a9724701eb308f2075973eb9f2/flow_duration_curve_examples-5.png differ diff --git a/_images/flow_duration_curve_examples-5.png b/_images/flow_duration_curve_examples-5.png index a548025..e1bd6de 100644 Binary files a/_images/flow_duration_curve_examples-5.png and b/_images/flow_duration_curve_examples-5.png differ diff --git a/_images/hyswap.png b/_images/hyswap.png index 65cf2e2..4864a1d 100644 Binary files a/_images/hyswap.png and b/_images/hyswap.png differ diff --git a/_images/index-6.png b/_images/index-6.png new file mode 100644 index 0000000..ed78d1e Binary files /dev/null and b/_images/index-6.png differ diff --git a/_images/similarity_examples-2.png b/_images/similarity_examples-2.png new file mode 100644 index 0000000..728feab Binary files /dev/null and b/_images/similarity_examples-2.png differ diff --git a/_images/similarity_examples-3.png b/_images/similarity_examples-3.png new file mode 100644 index 0000000..990509e Binary files /dev/null and b/_images/similarity_examples-3.png differ diff --git a/_images/similarity_examples-4.png b/_images/similarity_examples-4.png new file mode 100644 index 0000000..518e904 Binary files /dev/null and b/_images/similarity_examples-4.png differ diff --git a/_images/similarity_examples-5.png b/_images/similarity_examples-5.png new file mode 100644 index 0000000..5739548 Binary files /dev/null and b/_images/similarity_examples-5.png differ diff --git a/_modules/hyswap/cumulative.html b/_modules/hyswap/cumulative.html index 3f3bd28..3267d7c 100644 --- a/_modules/hyswap/cumulative.html +++ b/_modules/hyswap/cumulative.html @@ -2,18 +2,18 @@ - + - hyswap.cumulative — hyswap 0.1.dev1+gc646d0d documentation + hyswap.cumulative — hyswap 0.1.dev1+gdcc7916 documentation - + - + - + @@ -31,7 +31,7 @@

Navigation

  • modules |
  • - + @@ -50,7 +50,9 @@

    Source code for hyswap.cumulative

     from hyswap.utils import define_year_doy_columns
     
     
    -
    [docs]def calculate_daily_cumulative_values(df, data_column_name, +
    +[docs] +def calculate_daily_cumulative_values(df, data_column_name, date_column_name=None, year_type='calendar'): """Calculate daily cumulative values. @@ -116,7 +118,10 @@

    Source code for hyswap.cumulative

         return cdf
    -
    [docs]def _tidy_cumulative_dataframe(cdf, year_type): + +
    +[docs] +def _tidy_cumulative_dataframe(cdf, year_type): """Tidy a cumulative dataframe. Parameters @@ -162,6 +167,7 @@

    Source code for hyswap.cumulative

         # set date to index
         cdf2 = cdf2.set_index("date")
         return cdf2
    +
    @@ -193,14 +199,14 @@

    Navigation

  • modules |
  • - + \ No newline at end of file diff --git a/_modules/hyswap/exceedance.html b/_modules/hyswap/exceedance.html index b32755f..1c1921a 100644 --- a/_modules/hyswap/exceedance.html +++ b/_modules/hyswap/exceedance.html @@ -2,18 +2,18 @@ - + - hyswap.exceedance — hyswap 0.1.dev1+gc646d0d documentation + hyswap.exceedance — hyswap 0.1.dev1+gdcc7916 documentation - + - + - + @@ -31,7 +31,7 @@

    Navigation

  • modules |
  • - + @@ -50,7 +50,9 @@

    Source code for hyswap.exceedance

     from scipy import stats
     
     
    -
    [docs]def calculate_exceedance_probability_from_distribution(x, dist, +
    +[docs] +def calculate_exceedance_probability_from_distribution(x, dist, *args, **kwargs): """ Calculate the exceedance probability of a value relative to a distribution. @@ -118,7 +120,10 @@

    Source code for hyswap.exceedance

                              "'weibull', or 'exponential'.")
    -
    [docs]def calculate_exceedance_probability_from_values(x, values_to_compare): + +
    +[docs] +def calculate_exceedance_probability_from_values(x, values_to_compare): """ Calculate the exceedance probability of a value compared to several values. @@ -175,7 +180,10 @@

    Source code for hyswap.exceedance

         return np.sum(values_to_compare >= x) / len(values_to_compare)
    -
    [docs]def calculate_exceedance_probability_from_distribution_multiple(values, dist, + +
    +[docs] +def calculate_exceedance_probability_from_distribution_multiple(values, dist, *args, **kwargs): """ @@ -230,7 +238,10 @@

    Source code for hyswap.exceedance

             x, dist, *args, **kwargs) for x in values])
    -
    [docs]def calculate_exceedance_probability_from_values_multiple(values, + +
    +[docs] +def calculate_exceedance_probability_from_values_multiple(values, values_to_compare): """ Calculate the exceedance probability of multiple values vs a set of values. @@ -275,6 +286,7 @@

    Source code for hyswap.exceedance

         """
         return np.array([calculate_exceedance_probability_from_values(
             x, values_to_compare) for x in values])
    +
    @@ -306,14 +318,14 @@

    Navigation

  • modules |
  • - + \ No newline at end of file diff --git a/_modules/hyswap/percentiles.html b/_modules/hyswap/percentiles.html index c6cfeae..65cabc7 100644 --- a/_modules/hyswap/percentiles.html +++ b/_modules/hyswap/percentiles.html @@ -2,18 +2,18 @@ - + - hyswap.percentiles — hyswap 0.1.dev1+gc646d0d documentation + hyswap.percentiles — hyswap 0.1.dev1+gdcc7916 documentation - + - + - + @@ -31,7 +31,7 @@

    Navigation

  • modules |
  • - + @@ -54,7 +54,9 @@

    Source code for hyswap.percentiles

     from hyswap.utils import rolling_average
     
     
    -
    [docs]def calculate_fixed_percentile_thresholds( +
    +[docs] +def calculate_fixed_percentile_thresholds( data, percentiles=np.array((0, 5, 10, 25, 75, 90, 95, 100)), method='weibull', @@ -105,7 +107,10 @@

    Source code for hyswap.percentiles

         return np.percentile(data, percentiles, method=method, **kwargs)
    -
    [docs]def calculate_variable_percentile_thresholds_by_day( + +
    +[docs] +def calculate_variable_percentile_thresholds_by_day( df, data_column_name, percentiles=[0, 5, 10, 25, 75, 90, 95, 100], @@ -236,6 +241,7 @@

    Source code for hyswap.percentiles

     
         # return percentiles by day of year
         return percentiles_by_day
    +
    @@ -267,14 +273,14 @@

    Navigation

  • modules |
  • - + \ No newline at end of file diff --git a/_modules/hyswap/plots.html b/_modules/hyswap/plots.html index 59877c0..49b44ee 100644 --- a/_modules/hyswap/plots.html +++ b/_modules/hyswap/plots.html @@ -2,18 +2,18 @@ - + - hyswap.plots — hyswap 0.1.dev1+gc646d0d documentation + hyswap.plots — hyswap 0.1.dev1+gdcc7916 documentation - + - + - + @@ -31,7 +31,7 @@

    Navigation

  • modules |
  • - + @@ -51,7 +51,9 @@

    Source code for hyswap.plots

     from hyswap.percentiles import calculate_variable_percentile_thresholds_by_day
     
     
    -
    [docs]def plot_flow_duration_curve( +
    +[docs] +def plot_flow_duration_curve( values, exceedance_probabilities, observations=None, observation_probabilities=None, ax=None, title='Flow Duration Curve', @@ -165,7 +167,10 @@

    Source code for hyswap.plots

         return ax
    -
    [docs]def plot_raster_hydrograph(df_formatted, ax=None, + +
    +[docs] +def plot_raster_hydrograph(df_formatted, ax=None, title='Raster Hydrograph', xlab='Month', ylab='Year', cbarlab='Discharge, in Cubic Feet per Second', @@ -277,7 +282,10 @@

    Source code for hyswap.plots

         return ax
    -
    [docs]def plot_duration_hydrograph(percentiles_by_day, df, data_col, doy_col, + +
    +[docs] +def plot_duration_hydrograph(percentiles_by_day, df, data_col, doy_col, pct_list=[0, 5, 10, 25, 75, 90, 95, 100], data_label=None, ax=None, title="Duration Hydrograph", @@ -409,7 +417,10 @@

    Source code for hyswap.plots

         return ax
    -
    [docs]def plot_cumulative_hydrograph(cumulative_percentiles, target_years, + +
    +[docs] +def plot_cumulative_hydrograph(cumulative_percentiles, target_years, year_type='calendar', envelope_pct=[25, 75], max_pct=False, min_pct=False, @@ -554,7 +565,10 @@

    Source code for hyswap.plots

         return ax
    -
    [docs]def plot_hydrograph(df, data_col, + +
    +[docs] +def plot_hydrograph(df, data_col, date_col=None, start_date=None, end_date=None, @@ -658,6 +672,103 @@

    Source code for hyswap.plots

         ax.set_yticks(yticks[1:-1], labels=yticklabels[1:-1])
         # return
         return ax
    + + + +
    +[docs] +def plot_similarity_heatmap(sim_matrix, n_obs=None, cmap='inferno', + show_values=False, ax=None, + title='Similarity Matrix'): + """Plot a similarity matrix as a heatmap. + + Parameters + ---------- + sim_matrix : pandas.DataFrame + Similarity matrix to plot. Must be square. Can be the output of + :meth:`hyswap.similarity.calculate_correlations`, + :meth:`hyswap.similarity.calculate_wasserstein_distance`, + :meth:`hyswap.similarity.calculate_energy_distance`, or any other + square matrix represented as a pandas DataFrame. + + n_obs : int, optional + Number of observations used to compute the matrix. If provided, it is + appended to the plot title as ``(n=<n_obs>)``. Default is None. + + cmap : str, optional + Colormap to use. Default is 'inferno'. + + show_values : bool, optional + Whether to show the values of the matrix on the plot. Default is False. + + ax : matplotlib.axes.Axes, optional + Axes object to plot on. If not provided, a new figure and axes will be + created. + + title : str, optional + Title for the plot. Default is 'Similarity Matrix'. + + Returns + ------- + matplotlib.axes.Axes + Axes object containing the plot. + + Examples + -------- + Calculate the correlation matrix between two sites and plot it as a + heatmap. + + .. plot:: + :include-source: + + >>> df, _ = dataretrieval.nwis.get_dv(site='06892350', + ... parameterCd='00060', + ... start='2010-01-01', + ... end='2021-12-31') + >>> df2, _ = dataretrieval.nwis.get_dv(site='06892000', + ... parameterCd='00060', + ... start='2010-01-01', + ... end='2021-12-31') + >>> corr_matrix, n_obs = hyswap.similarity.calculate_correlations( + ... [df, df2], '00060_Mean') + >>> ax = hyswap.plots.plot_similarity_heatmap(corr_matrix, + ... 
show_values=True) + >>> plt.show() + """ + # Create axes if not provided + if ax is None: + _, ax = plt.subplots() + # plot heatmap using matplotlib, scaling colors to the data range + vmin = sim_matrix.min().min() + vmax = sim_matrix.max().max() + im = ax.imshow(sim_matrix, cmap=cmap, + vmin=vmin, + vmax=vmax) + # show values if desired + if show_values: + for i in range(sim_matrix.shape[0]): + for j in range(sim_matrix.shape[1]): + # if below halfway point, make text white + if sim_matrix.iloc[i, j] < (vmax - vmin) / 2 + vmin: + ax.text(j, i, f'{sim_matrix.iloc[i, j]:.2f}', + ha="center", va="center", color="w") + # otherwise, make text black + else: + ax.text(j, i, f'{sim_matrix.iloc[i, j]:.2f}', + ha="center", va="center", color="k") + # set labels + if n_obs is not None: + title = f'{title} (n={n_obs})' + ax.set_title(title) + ax.set_xlabel('Site') + ax.set_ylabel('Site') + # set ticks at center of each cell (x follows columns, y follows rows) + ax.set_xticks(np.arange(sim_matrix.shape[1])) + ax.set_yticks(np.arange(sim_matrix.shape[0])) + # set tick labels + ax.set_xticklabels(sim_matrix.columns) + ax.set_yticklabels(sim_matrix.index) + # add colorbar + plt.colorbar(im, ax=ax) + # return + return ax
    +
    @@ -689,14 +800,14 @@

    Navigation

  • modules |
  • - + \ No newline at end of file diff --git a/_modules/hyswap/rasterhydrograph.html b/_modules/hyswap/rasterhydrograph.html index 1501cf2..a3de42a 100644 --- a/_modules/hyswap/rasterhydrograph.html +++ b/_modules/hyswap/rasterhydrograph.html @@ -2,18 +2,18 @@ - + - hyswap.rasterhydrograph — hyswap 0.1.dev1+gc646d0d documentation + hyswap.rasterhydrograph — hyswap 0.1.dev1+gdcc7916 documentation - + - + - + @@ -31,7 +31,7 @@

    Navigation

  • modules |
  • - + @@ -51,7 +51,9 @@

    Source code for hyswap.rasterhydrograph

     from hyswap.utils import set_data_type
     
     
    -
    [docs]def format_data(df, data_column_name, date_column_name=None, +
    +[docs] +def format_data(df, data_column_name, date_column_name=None, data_type='daily', year_type='calendar', begin_year=None, end_year=None, **kwargs): """ @@ -181,7 +183,10 @@

    Source code for hyswap.rasterhydrograph

         return df_out
    -
    [docs]def _check_inputs(df, data_column_name, date_column_name, + +
    +[docs] +def _check_inputs(df, data_column_name, date_column_name, data_type, year_type, begin_year, end_year): """Private function to check inputs for the format_data function. @@ -274,7 +279,10 @@

    Source code for hyswap.rasterhydrograph

         return df
    -
    [docs]def _calculate_date_range(df, year_type, begin_year, end_year): + +
    +[docs] +def _calculate_date_range(df, year_type, begin_year, end_year): """Private function to calculate the date range and set the index. Parameters @@ -319,6 +327,7 @@

    Source code for hyswap.rasterhydrograph

         date_range = pd.date_range(begin_date, end_date)
     
         return date_range
    +
    @@ -350,14 +359,14 @@

    Navigation

  • modules |
  • - + \ No newline at end of file diff --git a/_modules/hyswap/runoff.html b/_modules/hyswap/runoff.html index e66002b..dc5ecf2 100644 --- a/_modules/hyswap/runoff.html +++ b/_modules/hyswap/runoff.html @@ -2,18 +2,18 @@ - + - hyswap.runoff — hyswap 0.1.dev1+gc646d0d documentation + hyswap.runoff — hyswap 0.1.dev1+gdcc7916 documentation - + - + - + @@ -31,7 +31,7 @@

    Navigation

  • modules |
  • - + @@ -47,7 +47,9 @@

    Source code for hyswap.runoff

     import pandas as pd
     
     
    -
    [docs]def convert_cfs_to_runoff(cfs, drainage_area, frequency="annual"): +
    +[docs] +def convert_cfs_to_runoff(cfs, drainage_area, frequency="annual"): """Convert cfs to runoff values for some drainage area. Parameters @@ -97,7 +99,10 @@

    Source code for hyswap.runoff

         return mmf
    -
    [docs]def streamflow_to_runoff(df, data_col, drainage_area, frequency="annual"): + +
    +[docs] +def streamflow_to_runoff(df, data_col, drainage_area, frequency="annual"): """Convert streamflow to runoff for a given drainage area. For a given gage/dataframe, convert streamflow to runoff using the @@ -143,7 +148,10 @@

    Source code for hyswap.runoff

         return df
    -
    [docs]def calculate_geometric_runoff(geom_id, df_list, weights_matrix, + +
    +[docs] +def calculate_geometric_runoff(geom_id, df_list, weights_matrix, start_date=None, end_date=None, data_col='runoff'): """Function to calculate the runoff for a specified geometry. @@ -214,7 +222,10 @@

    Source code for hyswap.runoff

         return weighted_runoff
    -
    [docs]def _get_date_range(df_list, start_date, end_date): + +
    +[docs] +def _get_date_range(df_list, start_date, end_date): """Get date range for runoff calculation. This is an internal function used by the :obj:`calculate_geometric_runoff` @@ -265,7 +276,10 @@

    Source code for hyswap.runoff

         return date_range
    -
    [docs]def identify_sites_from_weights(geom_id, weights_matrix): + +
    +[docs] +def identify_sites_from_weights(geom_id, weights_matrix): """Identify sites for a specified geometry. Function to identify sites with non-zero weights for a given @@ -293,7 +307,10 @@

    Source code for hyswap.runoff

         return site_list
    -
    [docs]def calculate_multiple_geometric_runoff( + +
    +[docs] +def calculate_multiple_geometric_runoff( geom_id_list, df_list, weights_matrix, start_date=None, end_date=None, data_col='runoff'): """Calculate runoff for multiple geometries at once. @@ -343,6 +360,7 @@

    Source code for hyswap.runoff

             # add runoff to results_df
             results_df[geom_id] = runoff.to_frame()
         return results_df
    +
    @@ -374,14 +392,14 @@

    Navigation

  • modules |
  • - + \ No newline at end of file diff --git a/_modules/hyswap/similarity.html b/_modules/hyswap/similarity.html new file mode 100644 index 0000000..8668f0c --- /dev/null +++ b/_modules/hyswap/similarity.html @@ -0,0 +1,315 @@ + + + + + + + + + hyswap.similarity — hyswap 0.1.dev1+gdcc7916 documentation + + + + + + + + + + + + + + + +
    +
    +
    +
    + +

    Source code for hyswap.similarity

    +"""Similarity measures for hyswap."""
    +
    +import numpy as np
    +import pandas as pd
    +from scipy import stats
    +from hyswap.utils import filter_to_common_time
    +
    +
    +
    +[docs] +def calculate_correlations(df_list, data_column_name, df_names=None): + """Calculate Pearson correlations between dataframes in df_list. + + This function is designed to calculate the Pearson correlation + coefficients between dataframes in df_list. The dataframes in df_list are + expected to have the same columns. The correlation coefficients are + calculated using the `numpy.corrcoef` function. + + Parameters + ---------- + df_list : list + List of dataframes. The dataframes are expected to have the same + columns. Likely inputs are the output of a function like + dataretrieval.nwis.get_dv() or similar. + + data_column_name : str + Name of the column to use for the correlation calculation. + + df_names : list, optional + List of names for the dataframes in df_list. If provided, the names + will be used to label the rows and columns of the output array. If + not provided, the column "site_no" will be used if available; otherwise, + the index of the dataframe in the list will be used. + + Returns + ------- + correlations : pandas.DataFrame + Dataframe of correlation coefficients. The rows and columns are + labeled with the names of the dataframes in df_list as provided + by the df_names argument. + + n_obs : int + Number of observations used to calculate the correlations. + + Examples + -------- + Calculate correlations between two synthetic dataframes. + + .. 
doctest:: + + >>> df1 = pd.DataFrame({'a': np.arange(10), 'b': np.arange(10)}) + >>> df2 = pd.DataFrame({'a': -1*np.arange(10), 'b': np.arange(10)}) + >>> results, n_obs = similarity.calculate_correlations([df1, df2], 'a') + >>> results + 0 1 + 0 1.0 -1.0 + 1 -1.0 1.0 + """ + # handle the names of the dataframes + df_names = _name_handling(df_list, df_names) + # preprocess dataframe list so they have the same index/times + df_list, n_obs = filter_to_common_time(df_list) + # calculate correlations between all pairs of dataframes in the list + correlations = np.empty((len(df_list), len(df_list))) + for i, df1 in enumerate(df_list): + for j, df2 in enumerate(df_list): + correlations[i, j] = np.corrcoef( + df1[data_column_name], df2[data_column_name])[0, 1] + # turn the correlations into a dataframe + correlations = pd.DataFrame( + correlations, index=df_names, columns=df_names) + return correlations, n_obs
    + + + +
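The pairwise loop in `calculate_correlations` can be sketched outside of hyswap with plain NumPy and pandas. This is only an illustration under stated assumptions: the helper name `pairwise_corr` and the labels `site_1` and `site_2` are hypothetical, and the frames are assumed to be aligned already, so the common-time filtering step is omitted.

```python
import numpy as np
import pandas as pd


def pairwise_corr(df_list, column, names):
    """Hypothetical helper: Pearson correlation for every pair of frames."""
    n = len(df_list)
    out = np.empty((n, n))
    for i, a in enumerate(df_list):
        for j, b in enumerate(df_list):
            # np.corrcoef returns a 2x2 matrix; [0, 1] is the cross term
            out[i, j] = np.corrcoef(a[column], b[column])[0, 1]
    # label rows and columns so the result reads like a site-by-site table
    return pd.DataFrame(out, index=names, columns=names)


# synthetic data mirroring the doctest above: one rising, one falling series
df1 = pd.DataFrame({"a": np.arange(10)})
df2 = pd.DataFrame({"a": -1 * np.arange(10)})
corr = pairwise_corr([df1, df2], "a", ["site_1", "site_2"])
```

The result is symmetric with ones on the diagonal, so a library that cared about speed could fill only the upper triangle; the full double loop is kept here because it matches the source shown in this diff.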
    +[docs] +def calculate_wasserstein_distance(df_list, data_column_name, df_names=None): + """Calculate Wasserstein distance between dataframes in df_list. + + This function is designed to calculate the Wasserstein distance between + dataframes in df_list. The dataframes in df_list are expected to have the + same columns. The Wasserstein distance is calculated using the + `scipy.stats.wasserstein_distance` function. + + Parameters + ---------- + df_list : list + List of dataframes. The dataframes are expected to have the same + columns. Likely inputs are the output of a function like + dataretrieval.nwis.get_dv() or similar. + + data_column_name : str + Name of the column to use for the Wasserstein distance calculation. + + df_names : list, optional + List of names for the dataframes in df_list. If provided, the names + will be used to label the rows and columns of the output array. If + not provided, the column "site_no" will be used if available; otherwise, + the index of the dataframe in the list will be used. + + Returns + ------- + wasserstein_distances : pandas.DataFrame + Dataframe of Wasserstein distances. The rows and columns are + labeled with the names of the dataframes in df_list as provided + by the df_names argument. + + n_obs : int + Number of observations used to calculate the Wasserstein distances. + + Examples + -------- + Calculate Wasserstein distances between two synthetic dataframes. + + .. doctest:: + + >>> df1 = pd.DataFrame({'a': np.arange(10), 'b': np.arange(10)}) + >>> df2 = pd.DataFrame({'a': -1*np.arange(10), 'b': np.arange(10)}) + >>> results, n_obs = similarity.calculate_wasserstein_distance( + ... 
[df1, df2], 'a') + >>> results + 0 1 + 0 0.0 9.0 + 1 9.0 0.0 + """ + # handle the names of the dataframes + df_names = _name_handling(df_list, df_names) + # preprocess dataframe list so they have the same index/times + df_list, n_obs = filter_to_common_time(df_list) + # calculate distances between all pairs of dataframes in the list + wasserstein_distances = np.empty((len(df_list), len(df_list))) + for i, df1 in enumerate(df_list): + for j, df2 in enumerate(df_list): + wasserstein_distances[i, j] = stats.wasserstein_distance( + df1[data_column_name], df2[data_column_name]) + # handle the names of the dataframes + df_names = _name_handling(df_list, df_names) + # turn the distances into a dataframe + wasserstein_distances = pd.DataFrame( + wasserstein_distances, index=df_names, columns=df_names) + return wasserstein_distances, n_obs
    + + + +
    +[docs] +def calculate_energy_distance(df_list, data_column_name, df_names=None): + """Calculate energy distance between dataframes in df_list. + + This function is designed to calculate the energy distance between + dataframes in df_list. The dataframes in df_list are expected to have the + same columns. The energy distance is calculated using the + `scipy.stats.energy_distance` function. + + Parameters + ---------- + df_list : list + List of dataframes. The dataframes are expected to have the same + columns. Likely inputs are the output of a function like + dataretrieval.nwis.get_dv() or similar. + + data_column_name : str + Name of the column to use for the energy distance calculation. + + df_names : list, optional + List of names for the dataframes in df_list. If provided, the names + will be used to label the rows and columns of the output array. If + not provided, the column "site_no" will be used if available; otherwise, + the index of the dataframe in the list will be used. + + Returns + ------- + energy_distances : pandas.DataFrame + Dataframe of energy distances. The rows and columns are + labeled with the names of the dataframes in df_list as provided + by the df_names argument. + + n_obs : int + Number of observations used to calculate the energy distance. + + Examples + -------- + Calculate energy distances between two synthetic dataframes. + + .. doctest:: + + >>> df1 = pd.DataFrame({'a': np.arange(10), 'b': np.arange(10)}) + >>> df2 = pd.DataFrame({'a': -1*np.arange(10), 'b': np.arange(10)}) + >>> results, n_obs = similarity.calculate_energy_distance( + ... 
[df1, df2], 'a') + >>> results + 0 1 + 0 0.000000 3.376389 + 1 3.376389 0.000000 + """ + # handle the names of the dataframes + df_names = _name_handling(df_list, df_names) + # preprocess dataframe list so they have the same index/times + df_list, n_obs = filter_to_common_time(df_list) + # calculate distances between all pairs of dataframes in the list + energy_distances = np.empty((len(df_list), len(df_list))) + for i, df1 in enumerate(df_list): + for j, df2 in enumerate(df_list): + energy_distances[i, j] = stats.energy_distance( + df1[data_column_name], df2[data_column_name]) + # handle the names of the dataframes + df_names = _name_handling(df_list, df_names) + # turn the distances into a dataframe + energy_distances = pd.DataFrame( + energy_distances, index=df_names, columns=df_names) + return energy_distances, n_obs
    + + + +
    +[docs] +def _name_handling(df_list, df_names): + """Private function to handle the names of the dataframes.""" + if df_names is None: + df_names = [] + for i, df in enumerate(df_list): + if 'site_no' in df.columns: + df_names.append(df['site_no'].iloc[0]) + else: + df_names.append(str(i)) + return df_names
    + +
    + +
    +
    +
    +
    + +
    +
    + + + + \ No newline at end of file diff --git a/_modules/hyswap/utils.html b/_modules/hyswap/utils.html index d9f800b..b39112d 100644 --- a/_modules/hyswap/utils.html +++ b/_modules/hyswap/utils.html @@ -2,18 +2,18 @@ - + - hyswap.utils — hyswap 0.1.dev1+gc646d0d documentation + hyswap.utils — hyswap 0.1.dev1+gdcc7916 documentation - + - + - + @@ -31,7 +31,7 @@

    Navigation

  • modules |
  • - + @@ -47,7 +47,9 @@

    Source code for hyswap.utils

     import pandas as pd
     
     
    -
    [docs]def filter_approved_data(data, filter_column=None): +
    +[docs] +def filter_approved_data(data, filter_column=None): """Filter a dataframe to only return approved "A" (or "A, e") data. Parameters @@ -89,7 +91,10 @@

    Source code for hyswap.utils

                          (data[filter_column] == "A, e"))]
    -
    [docs]def rolling_average(df, data_column_name, data_type, **kwargs): + +
    +[docs] +def rolling_average(df, data_column_name, data_type, **kwargs): """Calculate a rolling average for a dataframe. Default behavior right-aligns the window used for the rolling average, @@ -122,7 +127,10 @@

    Source code for hyswap.utils

         return df_out
    -
    [docs]def filter_data_by_time(df, value, data_column_name, date_column_name=None, + +
    +[docs] +def filter_data_by_time(df, value, data_column_name, date_column_name=None, time_interval='day', leading_values=0, trailing_values=0): """Filter data by some time interval. @@ -224,7 +232,10 @@

    Source code for hyswap.utils

         return dff
    -
    [docs]def calculate_metadata(data): + +
    +[docs] +def calculate_metadata(data): """Calculate metadata for a series of data. Parameters @@ -263,7 +274,10 @@

    Source code for hyswap.utils

         return meta
    -
    [docs]def define_year_doy_columns(df, date_column_name=None, year_type='calendar', + +
    +[docs] +def define_year_doy_columns(df, date_column_name=None, year_type='calendar', clip_leap_day=False): """Function to add year, day of year, and month-day columns to a DataFrame. @@ -343,7 +357,10 @@

    Source code for hyswap.utils

         return df
    -
    [docs]def leap_year_adjustment(df, year_type='calendar'): + +
    +[docs] +def leap_year_adjustment(df, year_type='calendar'): """Function to adjust leap year days in a DataFrame. Adjust for a leap year by removing February 29 from the DataFrame and @@ -385,7 +402,10 @@

    Source code for hyswap.utils

         return df
    -
    [docs]def munge_nwis_stats(df, source_pct_col=None, target_pct_col=None, + +
    +[docs] +def munge_nwis_stats(df, source_pct_col=None, target_pct_col=None, year_type='calendar'): """Function to munge and reformat NWIS statistics data. @@ -394,6 +414,8 @@

    Source code for hyswap.utils

         be used on Python dataretrieval dataframe returns for the nwis.get_stats()
         function for "daily" data, a single site, and a single parameter code.
     
    +    Parameters
    +    ----------
         df : pandas.DataFrame
             DataFrame containing NWIS statistics data retrieved from the statistics
             web service. Assumed to come in as a dataframe retrieved with a
    @@ -477,7 +499,10 @@ 

    Source code for hyswap.utils

         return df_slim
    -
    [docs]def calculate_summary_statistics(df, data_col="00060_Mean"): + +
    +[docs] +def calculate_summary_statistics(df, data_col="00060_Mean"): """ Calculate summary statistics for a site. @@ -550,7 +575,76 @@

    Source code for hyswap.utils

         return summary_df
    -
    [docs]def set_data_type(data_type): + +
    +[docs] +def filter_to_common_time(df_list): + """Filter a list of dataframes to common times based on index. + + This function takes a list of dataframes and filters them to only include + the common times based on the index of the dataframes. This is necessary + before comparing the timeseries and calculating statistics between two or + more timeseries. + + Parameters + ---------- + df_list : list + List of pandas.DataFrame objects to filter to common times. + DataFrames assumed to have date-time information in the index. + Expect input to be the output from a function like + dataretrieval.nwis.get_dv() or similar. + + Returns + ------- + df_list : list + List of pandas.DataFrame objects filtered to common times. + n_obs : int + Number of observations in the common time period. + + Examples + -------- + Get some NWIS data. + + .. doctest:: + + >>> df1, md1 = dataretrieval.nwis.get_dv( + ... "03586500", parameterCd="00060", + ... start="2018-12-15", end="2019-01-07") + >>> df2, md2 = dataretrieval.nwis.get_dv( + ... "01646500", parameterCd="00060", + ... start="2019-01-01", end="2019-01-14") + >>> type(df1) + <class 'pandas.core.frame.DataFrame'> + >>> type(df2) + <class 'pandas.core.frame.DataFrame'> + + Filter the dataframes to common times. + + .. doctest:: + + >>> df_list, n_obs = utils.filter_to_common_time([df1, df2]) + >>> df_list[0].shape + (7, 3) + >>> df_list[1].shape + (7, 3) + """ + # get the common index + common_index = df_list[0].index + for df in df_list: + common_index = common_index.intersection(df.index) + # filter the dataframes to the common index + for i, df in enumerate(df_list): + df_list[i] = df.loc[common_index] + # get the number of observations + n_obs = len(common_index) + # return the list of dataframes + return df_list, n_obs
    + + + +
    +[docs] +def set_data_type(data_type): """Function to set the data type for rolling averages. Parameters @@ -579,6 +673,7 @@

    Source code for hyswap.utils

             data_type = '28D'
     
         return data_type
    +
    @@ -610,14 +705,14 @@

    Navigation

  • modules |
  • - +
    \ No newline at end of file diff --git a/_modules/index.html b/_modules/index.html index d5a2926..8ec03ec 100644 --- a/_modules/index.html +++ b/_modules/index.html @@ -2,18 +2,18 @@ - + - Overview: module code — hyswap 0.1.dev1+gc646d0d documentation + Overview: module code — hyswap 0.1.dev1+gdcc7916 documentation - + - + - + @@ -31,7 +31,7 @@

    Navigation

  • modules |
  • - +
    @@ -48,6 +48,7 @@

    All modules for which code is available

  • hyswap.plots
  • hyswap.rasterhydrograph
  • hyswap.runoff
  • +
  • hyswap.similarity
  • hyswap.utils
  • @@ -80,13 +81,13 @@

    Navigation

  • modules |
  • - +
    \ No newline at end of file diff --git a/_sources/examples/index.rst.txt b/_sources/examples/index.rst.txt index d3f9797..716ab3c 100644 --- a/_sources/examples/index.rst.txt +++ b/_sources/examples/index.rst.txt @@ -34,4 +34,12 @@ Cumulative Hydrograph Examples .. toctree:: :maxdepth: 2 - cumulative_hydrograph_examples \ No newline at end of file + cumulative_hydrograph_examples + +Similarity Examples +------------------- + +.. toctree:: + :maxdepth: 2 + + similarity_examples diff --git a/_sources/examples/similarity_examples.rst.txt b/_sources/examples/similarity_examples.rst.txt new file mode 100644 index 0000000..0c58f3e --- /dev/null +++ b/_sources/examples/similarity_examples.rst.txt @@ -0,0 +1,158 @@ + +Similarity Measures +------------------- + +These examples showcase the usage of the functions in the `similarity` module, with heatmap visualizations via the :obj:`hyswap.plots.plot_similarity_heatmap` function. +Sometimes it is helpful to compare the relationships between a set of stations and their respective measurements. +The `similarity` functions packaged in `hyswap` handle some of the data clean-up for you by ensuring the time-series of observations being compared cover the same times, and by removing any missing data. +This ensures that your results are not skewed by missing data or gaps in one of the time-series. + + +Correlations Between 5 Stations +******************************* + +The following example shows the correlations between streamflow at 5 stations (07374525, 07374000, 07289000, 07032000, 07024175) along the Mississippi River, listed from downstream to upstream. +First we have to fetch the streamflow data for these stations; to do this we will use the `dataretrieval` package to access the NWIS database. + +.. 
plot:: + :context: reset + :include-source: + + # get the data from these 5 sites + site_list = ["07374525", "07374000", "07289000", "07032000", "07024175"] + + # fetch some streamflow data from NWIS as a list of dataframes + df_list = [] + for site in site_list: + df, _ = dataretrieval.nwis.get_dv(site, start="2012-01-01", + end="2022-12-31", + parameterCd='00060') + df_list.append(df) + +Once we've collected the streamflow data, we will calculate the pair-wise correlations between the stations using the :obj:`hyswap.similarity.calculate_correlations` function and then plot the results using :obj:`hyswap.plots.plot_similarity_heatmap`. + +.. plot:: + :context: + :include-source: + + # calculate correlations + results, n_obs = hyswap.similarity.calculate_correlations(df_list, "00060_Mean") + + # make plot + ax = hyswap.plots.plot_similarity_heatmap( + results, n_obs=n_obs, + title="Pearson Correlation Coefficients for Streamflow\n" + + "Between 5 Sites Along the Mississippi River") + + # show the plot + plt.tight_layout() + plt.show() + +If we'd like, we can display the specific values of the correlations by setting the `show_values` argument to `True` in the :obj:`hyswap.plots.plot_similarity_heatmap` function. + +.. 
plot:: + :context: reset + :include-source: + + # get the data from these 5 sites + site_list = ["07374525", "07374000", "07289000", "07032000", "07024175"] + + # fetch some streamflow data from NWIS as a list of dataframes + df_list = [] + for site in site_list: + df, _ = dataretrieval.nwis.get_dv(site, start="2012-01-01", + end="2022-12-31", + parameterCd='00060') + df_list.append(df) + + # calculate correlations + results, n_obs = hyswap.similarity.calculate_correlations(df_list, "00060_Mean") + + # make plot + ax = hyswap.plots.plot_similarity_heatmap( + results, n_obs=n_obs, + title="Pearson Correlation Coefficients for Streamflow\n" + + "Between 5 Sites Along the Mississippi River", + show_values=True) + + # show the plot + plt.tight_layout() + plt.show() + + +Wasserstein Distances Between 5 Stations +**************************************** + +In this example we compare the same 5 time-series as the previous example, but instead of calculating correlations, we calculate the `Wasserstein Distance `_ between each pairing of time-series. +The Wasserstein Distance is a measure of the distance between two probability distributions, in this case the probability distributions of the streamflow values at each station. +Specifically in `hyswap`, we utilize the `scipy.stats.wasserstein_distance()` function, and are therefore specifically calculating the "first" Wasserstein Distance between two time-series. + +.. _wasserstein_doc: https://en.wikipedia.org/wiki/Wasserstein_metric + +.. 
plot:: + :context: reset + :include-source: + + # get the data from these 5 sites + site_list = ["07374525", "07374000", "07289000", "07032000", "07024175"] + + # fetch some streamflow data from NWIS as a list of dataframes + df_list = [] + for site in site_list: + df, _ = dataretrieval.nwis.get_dv(site, start="2012-01-01", + end="2022-12-31", + parameterCd='00060') + df_list.append(df) + + # calculate Wasserstein Distances + results, n_obs = hyswap.similarity.calculate_wasserstein_distance(df_list, "00060_Mean") + + # make plot + ax = hyswap.plots.plot_similarity_heatmap( + results, n_obs=n_obs, + title="Wasserstein Distances for Streamflow\n" + + "Between 5 Sites Along the Mississippi River", + show_values=True) + + # show the plot + plt.tight_layout() + plt.show() + + +Energy Distances Between 5 Stations +*********************************** + +In this example we compare the same 5 time-series as the previous example, but this time using another distance measure, the so-called "Energy Distance" between two time-series. +The `Energy Distance `_ is a statistical distance between two probability distributions, in this case the probability distributions of the streamflow values at each station. +Specifically in `hyswap`, we utilize the `scipy.stats.energy_distance()` function. + +.. _energy_dist: https://en.wikipedia.org/wiki/Energy_distance + +.. 
plot:: + :context: reset + :include-source: + + # get the data from these 5 sites + site_list = ["07374525", "07374000", "07289000", "07032000", "07024175"] + + # fetch some streamflow data from NWIS as a list of dataframes + df_list = [] + for site in site_list: + df, _ = dataretrieval.nwis.get_dv(site, start="2012-01-01", + end="2022-12-31", + parameterCd='00060') + df_list.append(df) + + # calculate Energy Distances + results, n_obs = hyswap.similarity.calculate_energy_distance(df_list, "00060_Mean") + + # make plot + ax = hyswap.plots.plot_similarity_heatmap( + results, n_obs=n_obs, + title="Energy Distances for Streamflow\n" + + "Between 5 Sites Along the Mississippi River", + show_values=True) + + # show the plot + plt.tight_layout() + plt.show() diff --git a/_sources/reference/index.rst.txt b/_sources/reference/index.rst.txt index 0ae5053..4f2b093 100644 --- a/_sources/reference/index.rst.txt +++ b/_sources/reference/index.rst.txt @@ -62,3 +62,10 @@ Runoff Calculation Functions .. automodule:: hyswap.runoff :members: :special-members: + +Similarity Functions +-------------------- + +.. 
automodule:: hyswap.similarity + :members: + :special-members: diff --git a/_static/basic.css b/_static/basic.css index 2a1ca75..c8079f4 100644 --- a/_static/basic.css +++ b/_static/basic.css @@ -237,6 +237,10 @@ a.headerlink { visibility: hidden; } +a:visited { + color: #551A8B; +} + h1:hover > a.headerlink, h2:hover > a.headerlink, h3:hover > a.headerlink, diff --git a/_static/bizstyle.css b/_static/bizstyle.css index 5e46037..8f1ce71 100644 --- a/_static/bizstyle.css +++ b/_static/bizstyle.css @@ -172,6 +172,10 @@ a:hover { text-decoration: underline; } +a:visited { + color: #551a8b; +} + div.body a { text-decoration: underline; } diff --git a/_static/bizstyle.js b/_static/bizstyle.js index ce40ff6..40af1ab 100644 --- a/_static/bizstyle.js +++ b/_static/bizstyle.js @@ -23,7 +23,7 @@ const initialiseBizStyle = () => { } window.addEventListener("resize", - () => (document.querySelector("li.nav-item-0 a").innerText = (window.innerWidth <= 776) ? "Top" : "hyswap 0.1.dev1+gc646d0d documentation") + () => (document.querySelector("li.nav-item-0 a").innerText = (window.innerWidth <= 776) ? 
"Top" : "hyswap 0.1.dev1+gdcc7916 documentation") ) if (document.readyState !== "loading") initialiseBizStyle() diff --git a/_static/documentation_options.js b/_static/documentation_options.js index 3579915..244331e 100644 --- a/_static/documentation_options.js +++ b/_static/documentation_options.js @@ -1,6 +1,5 @@ -var DOCUMENTATION_OPTIONS = { - URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'), - VERSION: '0.1.dev1+gc646d0d', +const DOCUMENTATION_OPTIONS = { + VERSION: '0.1.dev1+gdcc7916', LANGUAGE: 'en', COLLAPSE_INDEX: false, BUILDER: 'html', diff --git a/_static/searchtools.js b/_static/searchtools.js index 97d56a7..7918c3f 100644 --- a/_static/searchtools.js +++ b/_static/searchtools.js @@ -57,12 +57,12 @@ const _removeChildren = (element) => { const _escapeRegExp = (string) => string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string -const _displayItem = (item, searchTerms) => { +const _displayItem = (item, searchTerms, highlightTerms) => { const docBuilder = DOCUMENTATION_OPTIONS.BUILDER; - const docUrlRoot = DOCUMENTATION_OPTIONS.URL_ROOT; const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX; const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX; const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY; + const contentRoot = document.documentElement.dataset.content_root; const [docName, title, anchor, descr, score, _filename] = item; @@ -75,20 +75,24 @@ const _displayItem = (item, searchTerms) => { if (dirname.match(/\/index\/$/)) dirname = dirname.substring(0, dirname.length - 6); else if (dirname === "index/") dirname = ""; - requestUrl = docUrlRoot + dirname; + requestUrl = contentRoot + dirname; linkUrl = requestUrl; } else { // normal html builders - requestUrl = docUrlRoot + docName + docFileSuffix; + requestUrl = contentRoot + docName + docFileSuffix; linkUrl = docName + docLinkSuffix; } let linkEl = listItem.appendChild(document.createElement("a")); linkEl.href = 
linkUrl + anchor; linkEl.dataset.score = score; linkEl.innerHTML = title; - if (descr) + if (descr) { listItem.appendChild(document.createElement("span")).innerHTML = " (" + descr + ")"; + // highlight search terms in the description + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + } else if (showSearchSummary) fetch(requestUrl) .then((responseData) => responseData.text()) @@ -97,6 +101,9 @@ const _displayItem = (item, searchTerms) => { listItem.appendChild( Search.makeSearchSummary(data, searchTerms) ); + // highlight search terms in the summary + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); }); Search.output.appendChild(listItem); }; @@ -115,14 +122,15 @@ const _finishSearch = (resultCount) => { const _displayNextItem = ( results, resultCount, - searchTerms + searchTerms, + highlightTerms, ) => { // results left, load the summary and display it // this is intended to be dynamic (don't sub resultsCount) if (results.length) { - _displayItem(results.pop(), searchTerms); + _displayItem(results.pop(), searchTerms, highlightTerms); setTimeout( - () => _displayNextItem(results, resultCount, searchTerms), + () => _displayNextItem(results, resultCount, searchTerms, highlightTerms), 5 ); } @@ -360,7 +368,7 @@ const Search = { // console.info("search results:", Search.lastresults); // print the results - _displayNextItem(results, results.length, searchTerms); + _displayNextItem(results, results.length, searchTerms, highlightTerms); }, /** diff --git a/_static/sphinx_highlight.js b/_static/sphinx_highlight.js index aae669d..8a96c69 100644 --- a/_static/sphinx_highlight.js +++ b/_static/sphinx_highlight.js @@ -29,14 +29,19 @@ const _highlight = (node, addItems, text, className) => { } span.appendChild(document.createTextNode(val.substr(pos, text.length))); + const rest = 
document.createTextNode(val.substr(pos + text.length)); parent.insertBefore( span, parent.insertBefore( - document.createTextNode(val.substr(pos + text.length)), + rest, node.nextSibling ) ); node.nodeValue = val.substr(0, pos); + /* There may be more occurrences of search term in this node. So call this + * function recursively on the remaining fragment. + */ + _highlight(rest, addItems, text, className); if (isInSVG) { const rect = document.createElementNS( @@ -140,5 +145,10 @@ const SphinxHighlight = { }, }; -_ready(SphinxHighlight.highlightSearchWords); -_ready(SphinxHighlight.initEscapeListener); +_ready(() => { + /* Do not call highlightSearchWords() when we are on the search page. + * It will highlight words from the *previous* search query. + */ + if (typeof Search === "undefined") SphinxHighlight.highlightSearchWords(); + SphinxHighlight.initEscapeListener(); +}); diff --git a/examples/cumulative_hydrograph_examples.html b/examples/cumulative_hydrograph_examples.html index de3231e..98786f9 100644 --- a/examples/cumulative_hydrograph_examples.html +++ b/examples/cumulative_hydrograph_examples.html @@ -2,23 +2,23 @@ - + - Cumulative Streamflow Hydrographs — hyswap 0.1.dev1+gc646d0d documentation + Cumulative Streamflow Hydrographs — hyswap 0.1.dev1+gdcc7916 documentation - + - + - + - + + + + +
    +
    +
    +
    + +
    +

    Similarity Measures

    +

    These examples showcase the usage of the functions in the similarity module, with heatmap visualizations via the hyswap.plots.plot_similarity_heatmap function. +Sometimes it is helpful to compare the relationships between a set of stations and their respective measurements. +The similarity functions packaged in hyswap handle some of the data clean-up for you by ensuring the time-series of observations being compared cover the same times, and by removing any missing data. +This ensures that your results are not skewed by missing data or gaps in one of the time-series.

    +
    +

    Correlations Between 5 Stations

    +

    The following example shows the correlations between streamflow at 5 stations (07374525, 07374000, 07289000, 07032000, 07024175) along the Mississippi River, listed from downstream to upstream. +First we have to fetch the streamflow data for these stations; to do this we will use the dataretrieval package to access the NWIS database.

    +
    # get the data from these 5 sites
    +site_list = ["07374525", "07374000", "07289000", "07032000", "07024175"]
    +
    +# fetch some streamflow data from NWIS as a list of dataframes
    +df_list = []
    +for site in site_list:
    +    df, _ = dataretrieval.nwis.get_dv(site, start="2012-01-01",
    +                                      end="2022-12-31",
    +                                      parameterCd='00060')
    +    df_list.append(df)
    +
    +
    +

    Once we’ve collected the streamflow data, we will calculate the pair-wise correlations between the stations using the hyswap.similarity.calculate_correlations function and then plot the results using hyswap.plots.plot_similarity_heatmap.

    +
    # calculate correlations
    +results, n_obs = hyswap.similarity.calculate_correlations(df_list, "00060_Mean")
    +
    +# make plot
    +ax = hyswap.plots.plot_similarity_heatmap(
    +    results, n_obs=n_obs,
    +    title="Pearson Correlation Coefficients for Streamflow\n" +
    +          "Between 5 Sites Along the Mississippi River")
    +
    +# show the plot
    +plt.tight_layout()
    +plt.show()
    +
    +
    +

    (png, hires.png)

    +
    +../_images/similarity_examples-2.png +
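For readers without network access, the idea of a pair-wise Pearson correlation matrix can be sketched with plain pandas. This is an illustration on synthetic data only, not the `hyswap.similarity.calculate_correlations` implementation, whose internals are not shown here; the site names and values are hypothetical.

```python
import pandas as pd

# synthetic "streamflow" at three hypothetical sites on a shared daily index
idx = pd.date_range("2022-01-01", periods=5, freq="D")
flows = pd.DataFrame({
    "site_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "site_b": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with site_a
    "site_c": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with site_a
}, index=idx)

# drop rows with any missing value so every pair uses the same observations
clean = flows.dropna()

# symmetric matrix of Pearson correlation coefficients, one row/column per site
corr = clean.corr(method="pearson")
n_obs = len(clean)
print(corr.round(2))
```

The result is a square, symmetric matrix with ones on the diagonal, which is the kind of input a heatmap is well suited to display.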
    +

    If we’d like, we can display the specific values of the correlations by setting the show_values argument to True in the hyswap.plots.plot_similarity_heatmap function.

    +
    # get the data from these 5 sites
    +site_list = ["07374525", "07374000", "07289000", "07032000", "07024175"]
    +
    +# fetch some streamflow data from NWIS as a list of dataframes
    +df_list = []
    +for site in site_list:
    +    df, _ = dataretrieval.nwis.get_dv(site, start="2012-01-01",
    +                                      end="2022-12-31",
    +                                      parameterCd='00060')
    +    df_list.append(df)
    +
    +# calculate correlations
    +results, n_obs = hyswap.similarity.calculate_correlations(df_list, "00060_Mean")
    +
    +# make plot
    +ax = hyswap.plots.plot_similarity_heatmap(
    +    results, n_obs=n_obs,
    +    title="Pearson Correlation Coefficients for Streamflow\n" +
    +          "Between 5 Sites Along the Mississippi River",
    +    show_values=True)
    +
    +# show the plot
    +plt.tight_layout()
    +plt.show()
    +
    +
    +

    (png, hires.png)

    +
    +../_images/similarity_examples-3.png +
    +
    +
    +

    Wasserstein Distances Between 5 Stations

    +

    In this example we compare the same 5 time-series as the previous example, but instead of calculating correlations, we calculate the Wasserstein Distance between each pairing of time-series. +The Wasserstein Distance is a measure of the distance between two probability distributions, in this case the probability distributions of the streamflow values at each station. +Specifically in hyswap, we utilize the scipy.stats.wasserstein_distance() function, and are therefore specifically calculating the “first” Wasserstein Distance between two time-series.

    +
    # get the data from these 5 sites
    +site_list = ["07374525", "07374000", "07289000", "07032000", "07024175"]
    +
    +# fetch some streamflow data from NWIS as a list of dataframes
    +df_list = []
    +for site in site_list:
    +    df, _ = dataretrieval.nwis.get_dv(site, start="2012-01-01",
    +                                      end="2022-12-31",
    +                                      parameterCd='00060')
    +    df_list.append(df)
    +
    +# calculate Wasserstein Distances
    +results, n_obs = hyswap.similarity.calculate_wasserstein_distance(df_list, "00060_Mean")
    +
    +# make plot
    +ax = hyswap.plots.plot_similarity_heatmap(
    +    results, n_obs=n_obs,
    +    title="Wasserstein Distances for Streamflow\n" +
    +          "Between 5 Sites Along the Mississippi River",
    +    show_values=True)
    +
    +# show the plot
    +plt.tight_layout()
    +plt.show()
    +
    +
    +

    (png, hires.png)

    +
    +../_images/similarity_examples-4.png +
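To build some intuition for the values in the heatmap, `scipy.stats.wasserstein_distance()` can be called directly on two small samples. This is a minimal sketch with made-up numbers; it shows the SciPy function named above, not how `hyswap` wraps it.

```python
from scipy.stats import wasserstein_distance

# two small hypothetical samples; b is a shifted up by 5 units
a = [0.0, 1.0, 3.0]
b = [5.0, 6.0, 8.0]

# distance between the two empirical distributions
d_ab = wasserstein_distance(a, b)

# a sample compared with itself has zero distance
d_aa = wasserstein_distance(a, a)

# the order of the values does not matter: only their distribution is compared
d_shuffled = wasserstein_distance([3.0, 0.0, 1.0], b)

print(d_ab, d_aa, d_shuffled)
```

Because the measure compares distributions of values rather than values at matching times, reordering a sample leaves the distance unchanged. The result is also in the units of the data, so sites with very different flow magnitudes will be far apart even if their flows rise and fall together.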
    +
    +
    +

    Energy Distances Between 5 Stations

    +

    In this example we compare the same 5 time-series as the previous example, but this time using another distance measure, the so-called “Energy Distance” between two time-series. +The Energy Distance is a statistical distance between two probability distributions, in this case the probability distributions of the streamflow values at each station. +Specifically in hyswap, we utilize the scipy.stats.energy_distance() function.

    +
    # get the data from these 5 sites
    +site_list = ["07374525", "07374000", "07289000", "07032000", "07024175"]
    +
    +# fetch some streamflow data from NWIS as a list of dataframes
    +df_list = []
    +for site in site_list:
    +    df, _ = dataretrieval.nwis.get_dv(site, start="2012-01-01",
    +                                      end="2022-12-31",
    +                                      parameterCd='00060')
    +    df_list.append(df)
    +
    +# calculate Energy Distances
    +results, n_obs = hyswap.similarity.calculate_energy_distance(df_list, "00060_Mean")
    +
    +# make plot
    +ax = hyswap.plots.plot_similarity_heatmap(
    +    results, n_obs=n_obs,
    +    title="Energy Distances for Streamflow\n" +
    +          "Between 5 Sites Along the Mississippi River",
    +    show_values=True)
    +
    +# show the plot
    +plt.tight_layout()
    +plt.show()
    +
    +
    +

    (png, hires.png)

    +
    +../_images/similarity_examples-5.png +
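As with the Wasserstein example, `scipy.stats.energy_distance()` can be explored on small made-up samples. The sketch below only demonstrates the SciPy function; the way `hyswap` prepares its inputs is not shown here.

```python
from scipy.stats import energy_distance

# two single-valued hypothetical samples, 2 units apart
d_point = energy_distance([0.0], [2.0])

# two small hypothetical samples; b is a shifted up by 5 units
a = [0.0, 1.0, 3.0]
b = [5.0, 6.0, 8.0]
d_ab = energy_distance(a, b)

# a sample compared with itself has zero distance
d_aa = energy_distance(a, a)

print(d_point, d_ab, d_aa)
```

Like the Wasserstein distance, the energy distance is zero for identical samples and grows as the two distributions move apart.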
    +
    +
    + + +
    +
    +
    +
    + +
    +
    + + + + \ No newline at end of file diff --git a/examples/streamflow_duration_hydrograph_examples.html b/examples/streamflow_duration_hydrograph_examples.html index cf5ee3c..3712ec6 100644 --- a/examples/streamflow_duration_hydrograph_examples.html +++ b/examples/streamflow_duration_hydrograph_examples.html @@ -2,19 +2,19 @@ - + - Streamflow Duration Hydrographs — hyswap 0.1.dev1+gc646d0d documentation + Streamflow Duration Hydrographs — hyswap 0.1.dev1+gdcc7916 documentation - + - + - + @@ -40,7 +40,7 @@

    Navigation

  • previous |
  • - + @@ -52,11 +52,11 @@

    Navigation

    -

    Streamflow Duration Hydrographs

    +

    Streamflow Duration Hydrographs

    These examples show how a streamflow hydrograph can be constructed by fetching historical streamflow data from NWIS using dataretrieval, and then calculating daily percentiles of streamflow for each day of the year. The resulting hydrographs show the streamflow values for all of 2022 plotted on top of the historical percentiles which are shown as shaded regions.

    -

    Calculating Percentiles Using hyswap

    +

    Calculating Percentiles Using hyswap

    First, we will fetch streamflow data for a single gage from NWIS using the dataretrieval package.

    df, _ = dataretrieval.nwis.get_dv("03586500",
                                       parameterCd="00060",
    @@ -95,7 +95,7 @@ 

    Calculating Percentiles Using hyswap

    -

    Fetching Percentiles from the NWIS Statistics Service

    +

    Fetching Percentiles from the NWIS Statistics Service

    You don’t have to compute the percentiles using hyswap. If you’d rather use the NWIS web service daily percentiles, you can use those values instead. We provide a convenience utility function to help make this possible, hyswap.utils.munge_nwis_stats. @@ -143,7 +143,7 @@

    Fetching Percentiles from the NWIS Statistics Service

    -

    Plotting by Water Year

    +

    Plotting by Water Year

    The examples above show how to plot the percentiles by day of year using the calendar year. In this example, we will plot the percentiles by day of water year, as water years are commonly used by hydrologists. The only change this requires from above is specifying the type of year we are planning to use when calculating the daily percentile thresholds.
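For reference, the mapping from a calendar date to a water year and a day of water year can be sketched with the standard library. This assumes the common USGS convention that a water year runs from October 1 through September 30 and is labeled by the year in which it ends; the helper names are hypothetical and are not part of `hyswap`.

```python
from datetime import date


def water_year(d: date) -> int:
    """Return the water year for a date, assuming an October 1 start."""
    return d.year + 1 if d.month >= 10 else d.year


def day_of_water_year(d: date) -> int:
    """Return the 1-based day number within the water year."""
    start_year = d.year if d.month >= 10 else d.year - 1
    return (d - date(start_year, 10, 1)).days + 1


# October 1, 2021 is the first day of water year 2022
print(water_year(date(2021, 10, 1)), day_of_water_year(date(2021, 10, 1)))

# September 30, 2022 is the last day of water year 2022
print(water_year(date(2022, 9, 30)), day_of_water_year(date(2022, 9, 30)))
```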

    @@ -182,7 +182,7 @@

    Plotting by Water Year

    -

    Plotting by Climate Year

    +

    Plotting by Climate Year

    The examples above show how to plot the percentiles by day of year using the calendar year. In this example, we will plot the percentiles by day of climate year. The only change this requires from above is specifying the type of year we are planning to use when calculating the daily percentile thresholds.

    @@ -221,7 +221,7 @@

    Plotting by Climate Year -

    Plotting Custom Set of Percentile Thresholds

    +

    Plotting Custom Set of Percentile Thresholds

    In this example we will calculate and plot a unique set of daily percentile thresholds. We will also specify the colors to be used for the percentile envelopes.

    # fetch historic data from NWIS
    @@ -261,7 +261,7 @@ 

    Plotting Custom Set of Percentile Thresholds -

    Rolling Averages for Historic Daily Percentile Calculations

    +

    Rolling Averages for Historic Daily Percentile Calculations

    In this example, rather than calculating historic daily percentile values based solely on the past values from that day of the year, we will calculate the historic daily percentile values based on rolling averages of the past values around that day. Under the hood this uses the pandas.DataFrame.rolling() method to calculate the rolling average, with the default parameters. To show the effect of this, we will plot the historic daily percentile values for the daily (default) rolling average, 7-day rolling average, and the 28-day rolling average.
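The effect of a time-based rolling window can be seen on a tiny synthetic series. This is a sketch of `pandas` rolling behavior only; the `"7D"` window string is analogous to the `'28D'` value set in `set_data_type`, but exactly how `hyswap` passes it to `rolling()` is an assumption here.

```python
import pandas as pd

# ten days of hypothetical daily flow values: 1.0, 2.0, ..., 10.0
idx = pd.date_range("2022-01-01", periods=10, freq="D")
flow = pd.Series([float(v) for v in range(1, 11)], index=idx)

# 7-day rolling mean; with an offset window, partial windows at the
# start of the series are averaged over the days that are available
roll_7d = flow.rolling("7D").mean()

print(roll_7d)
```

The first value equals the first observation, and from the seventh day onward each value is the mean of that day and the six days before it, which is why rolling averages smooth out day-to-day variability in the percentile envelopes.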

    @@ -331,7 +331,7 @@

    Rolling Averages for Historic Daily Percentile Calculations -

    Customizing Fill Areas

    +

    Customizing Fill Areas

    In this example we will customize the fill areas between the percentile thresholds by passing keyword arguments to the hyswap.plots.plot_duration_hydrograph function that are then passed through to the matplotlib.axes.Axes.fill_between() function. Specifically we will set the alpha argument to 1.0 to make the fill areas opaque (the default value is 0.5 for some transparency).

    # fetch historic data from NWIS
    @@ -440,14 +440,14 @@ 

    Navigation

  • previous |
  • - +
    \ No newline at end of file diff --git a/genindex.html b/genindex.html index ee9deda..041a2da 100644 --- a/genindex.html +++ b/genindex.html @@ -2,18 +2,18 @@ - + - Index — hyswap 0.1.dev1+gc646d0d documentation + Index — hyswap 0.1.dev1+gdcc7916 documentation - + - + - + @@ -31,7 +31,7 @@

    Navigation

  • modules |
  • - +
    @@ -68,6 +68,8 @@

    _