Aggregate fleet data by subplant, not plant #395

Merged
merged 26 commits into from
Nov 28, 2024

grgmiller
Collaborator

@grgmiller grgmiller commented Nov 20, 2024

Purpose

This PR is in response to #392, which explains:

If a plant has multiple generators that burn multiple fuels, all of the generation and emissions for this plant are assigned the plant primary fuel type when aggregating to the fleet level or other higher level aggregations.

For example, in 2022, the Meramec plant overall burned more natural gas than coal, although only units 1 and 2 burned natural gas, while units 3 and 4 burned coal. However, since this plant was categorized as natural gas, all of the coal emissions also got lumped in with natural gas, which can throw off the fleet totals.

Going forward, we should probably try and publish subplant-level data (at least annually) as well as perform all fleet-level aggregations based on subplant-level data, rather than aggregating the already aggregated plant-level data.

This PR does three main things:

  • Exports new subplant-level, annual and monthly results as part of the results/plant_data/ outputs
  • Uses subplants (rather than plants) as the basis for identifying fleet composition
  • Updates code to ensure complete fuel mapping to all subplants (see below)
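
The difference between plant-level and subplant-level fleet assignment can be illustrated with a minimal pandas sketch. This is not the OGE implementation: the numbers are made up (they are not actual Meramec data), and every column name other than `plant_id_eia` and `subplant_primary_fuel` is a hypothetical stand-in.

```python
import pandas as pd

# Illustrative subplant-level annual data for one mixed-fuel plant.
# All values are invented for this sketch.
subplants = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1],
        "subplant_id": [1, 2, 3, 4],
        "subplant_primary_fuel": ["natural_gas", "natural_gas", "coal", "coal"],
        "fuel_consumed_mmbtu": [300.0, 300.0, 200.0, 200.0],
        "net_generation_mwh": [30.0, 30.0, 20.0, 20.0],
        "co2_mass_lb": [36000.0, 36000.0, 42000.0, 42000.0],
    }
)
data_cols = ["net_generation_mwh", "co2_mass_lb"]

# Plant-level approach: the whole plant takes the fuel with the most consumption.
fuel_totals = subplants.groupby(
    ["plant_id_eia", "subplant_primary_fuel"], as_index=False
)["fuel_consumed_mmbtu"].sum()
plant_fuel = (
    fuel_totals.sort_values("fuel_consumed_mmbtu", ascending=False)
    .drop_duplicates("plant_id_eia")
    .set_index("plant_id_eia")["subplant_primary_fuel"]
)
by_plant = (
    subplants.assign(fleet=subplants["plant_id_eia"].map(plant_fuel))
    .groupby("fleet")[data_cols]
    .sum()
)

# Subplant-level approach: each subplant keeps its own primary fuel.
by_subplant = subplants.groupby("subplant_primary_fuel")[data_cols].sum()

for table in (by_plant, by_subplant):
    table["co2_rate_lb_per_mwh"] = table["co2_mass_lb"] / table["net_generation_mwh"]

print(by_plant)
print(by_subplant)
```

With these invented numbers, the plant-level grouping reports a single natural gas fleet that absorbs all of the coal emissions, while the subplant-level grouping keeps separate natural gas and coal fleets with distinct emission rates.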

Closes CAR-4709, CAR-4708

What the code is doing

Specific changes in this PR include:

  • Export monthly and annual subplant-level data as part of the results. We actually use this data elsewhere so it would be useful to have this exported.
  • Re-arrange the order of the data pipeline so that all monthly/annual results (plant data and power sector data) are exported before turning to the hourly imputation and data exports. This makes the pipeline a bit cleaner for the historical data years as well.
  • Update the generated averages function. We previously exported an output table of generation averages, which represented a US-average, fleet-level generated emission rate. We use this in the pipeline as a backstop emission rate for non-US balancing areas. To make this clearer, I updated this function to export a "US" file to results/power_sector_data.
  • Remove the option to export shaped fleet data as part of plant-level results. OGE originally did not export hourly data for all EIA plants. We changed this in "Add hourly data for all individual plants" (#246) but retained the option to export data in the old way. This PR removes that option, so data can only be exported in the new way.
  • Update the documentation to reflect the updated pipeline order, and update step numbers in pipeline
  • When calculating residual fleet profiles, aggregate CEMS data by capacity-based subplant primary fuels rather than fuel-based primary fuels. This is likely better aligned with how the data would be reported to EIA-930.
  • Shaping EIA data: Shape subplant-level data instead of plant-level data.
  • When shaping EIA data, assign profiles based on the capacity-based subplant primary fuel.
  • Update and clarify add_missing_cems_profiles(). In our existing pipeline, when shaping data, we use CEMS profiles as one of the backstop hourly profiles. However, I noticed that we were using the CEMS profiles calculated for the residual profiles rather than the CEMS profiles calculated specifically for backstop purposes. This PR fixes this.
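
As a rough illustration of what a capacity-based subplant primary fuel means, the sketch below picks, for each subplant, the fuel with the largest total generator capacity. It is an assumption-laden example rather than the OGE code: the table layout and the columns `energy_source_code` and `capacity_mw` are hypothetical, and the data are invented.

```python
import pandas as pd

# Invented generator-level data; column names other than plant_id_eia,
# generator_id, and subplant_primary_fuel are hypothetical.
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2],
        "subplant_id": [1, 1, 2, 1],
        "generator_id": ["A", "B", "C", "A"],
        "energy_source_code": ["NG", "DFO", "BIT", "NG"],
        "capacity_mw": [100.0, 20.0, 250.0, 50.0],
    }
)

# Total capacity of each fuel within each subplant.
capacity_by_fuel = gens.groupby(
    ["plant_id_eia", "subplant_id", "energy_source_code"], as_index=False
)["capacity_mw"].sum()

# Row label of the largest-capacity fuel within each subplant.
idx = capacity_by_fuel.groupby(["plant_id_eia", "subplant_id"])["capacity_mw"].idxmax()

subplant_primary_fuel = (
    capacity_by_fuel.loc[idx, ["plant_id_eia", "subplant_id", "energy_source_code"]]
    .rename(columns={"energy_source_code": "subplant_primary_fuel"})
    .reset_index(drop=True)
)
print(subplant_primary_fuel)
```

Because this approach relies only on capacity, it can also assign a fuel to generators that have no reported fuel consumption yet.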

Ensuring complete subplant_primary_fuel coverage

Because this PR bases all aggregations on the subplant level, testing raised a UserWarning indicating that primary fuel category mappings were not available for all subplants. (This was not a problem when we were using plant-level data.)

Digging into this issue, I found that there were two types of issues preventing a complete mapping:

  1. The issue: Sometimes a generator changes its plant_id_eia or generator_id over time while the EPA plant and unit identifiers remain the same. For example, when a generator is repowered, EIA assigns it a new generator ID: at plant 10298, generator "GEN1" was renamed "GT1" after a repower in 2014. In addition, generators at an existing plant sometimes switch ownership and are assigned a new plant ID. For example, at plant 1571, three generators switched to a new plant ID (65285) in 2021. Two of those generators retired that year, but the third remained under the new plant code through 2022 and then switched back to plant 1571 in 2023. The fix: I added these mappings to the epa_eia_crosswalk_manual.csv file. Because the mappings are time-dependent, I also added new columns indicating the start and end year for which each mapping is valid (if no year is specified, the mapping is assumed to be valid for all time). This also required some tweaks to the code that converts the CEMS IDs to EIA IDs: because we only ever load a single year of CEMS data at a time, we can filter the mappings to only those that are valid for that year.
  2. The issue: CEMS reports data for generators that are still "proposed" in EIA. Previously, when constructing the primary fuel table (which is used to assign fleet identities to subplants), we were only determining primary fuels for generators that were currently operational according to EIA. The fix: We now add proposed generators that are in the late stage of their development (currently under construction) to the primary fuel table. Because these generators do not yet have any reported fuel data in EIA, the fuel type is determined by the capacity-based fuel reported in EIA-860.
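
The year filter on the manual crosswalk described in item 1 could look roughly like the sketch below. The column names `start_year` and `end_year`, the helper name `filter_crosswalk_to_year`, and the unit IDs are assumptions made for illustration; the actual column names in epa_eia_crosswalk_manual.csv may differ.

```python
import pandas as pd


def filter_crosswalk_to_year(crosswalk: pd.DataFrame, year: int) -> pd.DataFrame:
    """Keep only manual mappings that are valid for `year`.

    A missing start or end year is treated as open-ended, so a row with
    neither value is valid for all time. (Hypothetical helper.)
    """
    starts_ok = crosswalk["start_year"].isna() | (crosswalk["start_year"] <= year)
    ends_ok = crosswalk["end_year"].isna() | (crosswalk["end_year"] >= year)
    return crosswalk[starts_ok & ends_ok]


# Illustrative rows: one mapping valid only for 2021-2022, one valid for all time.
# Unit IDs and the second plant ID are invented.
manual = pd.DataFrame(
    {
        "plant_id_epa": [1571, 999],
        "emissions_unit_id_epa": ["U1", "U2"],
        "plant_id_eia": [65285, 998],
        "start_year": [2021, None],
        "end_year": [2022, None],
    }
)

print(filter_crosswalk_to_year(manual, 2022))  # both rows
print(filter_crosswalk_to_year(manual, 2023))  # only the all-time row
```

Filtering before the merge keeps the mapping one-to-one for the year being loaded, which matters when the same EPA unit maps to different EIA plants in different years.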

This also revealed a bug in the gross-to-net generation (GTN) conversion, where retired generators that reported 0 generation to CEMS were getting net generation applied incorrectly. I fixed this by dropping these subplants from the GTN conversion calculation so that they fall back to the default backstop ratio, which should still result in zero net generation.
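
A simplified sketch of the idea behind this fix: subplants with zero gross generation are excluded when calculating gross-to-net ratios, receive a default ratio instead, and therefore still end up with zero net generation. The column names, the `DEFAULT_GTN_RATIO` value, and the data are hypothetical, and the real conversion logic is more involved than this.

```python
import pandas as pd

DEFAULT_GTN_RATIO = 0.95  # hypothetical backstop value, for illustration only

# Invented annual totals per subplant; subplant 2 is retired and reports zero.
totals = pd.DataFrame(
    {
        "subplant_id": [1, 2],
        "gross_generation_mwh": [1000.0, 0.0],
        "net_generation_mwh": [750.0, 0.0],
    }
)

# Calculate ratios only for subplants that actually generated.
nonzero = totals[totals["gross_generation_mwh"] > 0].copy()
nonzero["gtn_ratio"] = nonzero["net_generation_mwh"] / nonzero["gross_generation_mwh"]

# Subplants dropped above have no ratio after the merge, so use the backstop.
converted = totals[["subplant_id", "gross_generation_mwh"]].merge(
    nonzero[["subplant_id", "gtn_ratio"]], on="subplant_id", how="left"
)
converted["gtn_ratio"] = converted["gtn_ratio"].fillna(DEFAULT_GTN_RATIO)
converted["net_generation_mwh"] = (
    converted["gross_generation_mwh"] * converted["gtn_ratio"]
)
print(converted)
```

Multiplying zero gross generation by any finite ratio yields zero, which is why falling back to the default ratio is safe for these subplants.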

Impacts

These changes will likely have wide-ranging impacts on the results:

  • Aggregating fleets by subplants will change the fleet average emission rates
  • Aggregating fleets by subplant may also change the profiles used to shape hourly data
  • Changing some of the backstop profiles used for shaping may also change hourly profiles for EIA data

Testing

Have not yet tested:

  • Run the pipeline for 2022 without errors
  • Compare new outputs to old outputs

Where to look

Suggested order to review files:

  1. data_pipeline
  2. data_cleaning
  3. helpers
  4. impute_hourly_profiles
  5. output_data
  6. validation, column checks, consumed

Usage Example/Visuals

N/A

Review estimate

30-45 minutes (happy to walk through any of this)

Future work

As I was working on this, I noticed several opportunities to reorganize which modules certain functions live in, or to create new modules to shorten some of our existing ones. I didn't implement this here because it would make it harder to review line-item changes to existing functions. In the future, we could:

  • Create an "aggregation" module that is separate from outputs and hourly imputation
  • Move the "calculate and export hourly plant data" to a different module?

Checklist

  • Update the documentation to reflect changes made in this PR
  • Format all updated python files using black
  • Clear outputs from all notebooks modified
  • Add docstrings and type hints to any new functions created

@grgmiller grgmiller requested a review from rouille November 22, 2024 23:12
@grgmiller grgmiller marked this pull request as ready for review November 22, 2024 23:13
@@ -287,85 +287,59 @@ def output_data_quality_metrics(
)


def output_plant_data(
df: pd.DataFrame,
def write_plant_data_to_results(
Collaborator Author

Updated to specifically output either monthly and annual plant-level data or monthly and annual subplant-level data.

@@ -398,10 +372,10 @@ def convert_results(df: pd.DataFrame) -> pd.DataFrame:
return converted


-def write_generated_averages(
+def write_national_fleet_averages(
Collaborator Author

Updated to make clearer and write "US" file to results rather than outputs

@@ -372,6 +330,36 @@
},
}

DATA_COLUMNS = [
Collaborator Author

Moved from data_cleaning

@@ -143,7 +143,11 @@ def load_cems_ids() -> pd.DataFrame:
filters=[["year", "==", year]],
columns=["plant_id_epa", "plant_id_eia", "emissions_unit_id_epa"],
).drop_duplicates()
cems_id_year = apply_dtypes(cems_id_year)
# update the plant_id_eia column using manual matches
cems_id_year = update_epa_to_eia_map(cems_id_year, year)
Collaborator Author

This was moved up to apply dtypes and EIA mappings on a year-by-year basis, rather than at the end.

Collaborator Author

I added a utility function to remove certain files locally that I never use (like all of the metric unit files) to save space on my computer.

Collaborator

@rouille rouille left a comment

Looks good

@grgmiller grgmiller merged commit ddbf126 into development Nov 28, 2024
2 checks passed
@grgmiller grgmiller deleted the greg/subplant_agg branch November 28, 2024 21:38
Development

Successfully merging this pull request may close these issues.

Aggregate data to fuel types using subplant-level rather than plant-level data
2 participants