Aggregate fleet data by subplant, not plant #395

Merged
merged 26 commits into from
Nov 28, 2024
Changes from 18 commits
13fdfe3
output subplant data
grgmiller Nov 20, 2024
f23b908
export fleet data
grgmiller Nov 20, 2024
516fd11
aggregate cems based on capacity fuels
grgmiller Nov 21, 2024
69e117e
shape EIA based on subplant primary fuel
grgmiller Nov 21, 2024
e7d3ee0
clean up cems backstop profiles
grgmiller Nov 22, 2024
f9df46c
update step numbers and docs
grgmiller Nov 22, 2024
66cac73
add US fleet average results
grgmiller Nov 22, 2024
2e5fe51
add notebook to remove unnecessary files
grgmiller Nov 22, 2024
a5c646a
move function to fix circular import
grgmiller Nov 22, 2024
2f0fb07
add new pudl columns to dtypes
grgmiller Nov 22, 2024
1040577
fix formatting
grgmiller Nov 22, 2024
d1349ff
add more helpful warning for missing fleet keys
grgmiller Nov 23, 2024
768be7d
fix missing primary fuels for CEMS
grgmiller Nov 23, 2024
0c2a89c
fix issue with UC generators not being added
grgmiller Nov 23, 2024
6c892f7
add years to epa-eia crosswalk
grgmiller Nov 23, 2024
55d749f
fix bugs
grgmiller Nov 23, 2024
c27fd00
update broken import
grgmiller Nov 23, 2024
9da20da
update notebooks
grgmiller Nov 23, 2024
8c3ab72
fix bug with gtn
grgmiller Nov 24, 2024
807e648
fix dropped primary fuel cols
grgmiller Nov 26, 2024
570aa76
respond to comments
grgmiller Nov 26, 2024
bb2ebbd
fix memory error
grgmiller Nov 27, 2024
18d9b06
remove shaped plant ids
grgmiller Nov 28, 2024
134f3c0
reduce memory use during grouping
grgmiller Nov 28, 2024
4f1fc4b
final bug fixes
grgmiller Nov 28, 2024
f208f86
Merge branch 'development' into greg/subplant_agg
grgmiller Nov 28, 2024
2 changes: 1 addition & 1 deletion docs/docs/Data Validation/Comparing Data to eGRID.md
@@ -1,7 +1,7 @@
---
stoplight-id: egrid_comparison
---
Although the OGE methodology is based on the EPA's eGRID methodology, there are some important differences. Thus, if comparing OGEI data to eGRID data, it is important to keep the following differences in mind:
Although the OGE methodology is based on the EPA's eGRID methodology, there are some important differences. Thus, if comparing OGE data to eGRID data, it is important to keep the following differences in mind:


<table>
@@ -2,18 +2,18 @@
stoplight-id: shaping_fleet_data
---

## Shaping EIA-only without plant-specific profiles
## Shaping EIA-only without generator-specific profiles

One of the primary innovations of the Open Grid Emissions Initiative is its approach to assigning an hourly profile to monthly generation and fuel data reported in EIA-923. This is accomplished using hourly regional generation fleet data reported in EIA-930 (also known as the “Hourly Electric Grid Monitor”). For each regional balancing authority, EIA-930 reports the hourly net generation from all generators of each fuel category (e.g. coal, natural gas, hydro, solar, etc) in that region (which we will refer to as a “fleet”). Since we know the total net generation profile of each fleet, as well as the net generation profile reported to CEMS, we can calculate a residual profile that should theoretically reflect the aggregate profile of all generators in a fleet that do not report to CEMS. This fleet-specific profile can then be used to shape the monthly total data reported to EIA-923. Although this method still has several issues (which will be discussed later), we believe this to be the best currently available method for imputing these shapes, since it is based on observed data.

> Definition: a group of all plants in a balancing authority that have the same fuel category (e.g. coal plants in MISO) are referred to as a "fleet"
> Definition: a group of all subplants in a balancing authority that have the same fuel category (e.g. coal subplants in MISO) are referred to as a "fleet"


## Calculating a residual profile

At its most basic, calculating a residual hourly profile that reflects the generation profile of the part of each fleet that does not report to CEMS involves subtracting the hourly CEMS net generation profile for that fleet from the hourly total EIA-930 profile for that fleet.
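
As a minimal sketch of the subtraction described above (not the pipeline's actual code), the residual can be computed hour by hour. The function name and the sample values are hypothetical, and both inputs are assumed to already be aligned to the same hours for a single fleet:

```python
def residual_profile(eia930_hourly, cems_hourly):
    """Subtract the hourly CEMS fleet profile from the hourly EIA-930 fleet profile.

    Both inputs are lists of net generation (MWh) for the same fleet and hours.
    """
    if len(eia930_hourly) != len(cems_hourly):
        raise ValueError("profiles must cover the same hours")
    return [total - cems for total, cems in zip(eia930_hourly, cems_hourly)]


# Hypothetical four-hour example for one fleet
eia930 = [100.0, 120.0, 90.0, 80.0]
cems = [70.0, 80.0, 95.0, 60.0]

residual = residual_profile(eia930, cems)

# Hours where CEMS exceeds the EIA-930 total produce a negative residual
negative_hours = [hour for hour, value in enumerate(residual) if value < 0]
```

Hours with a negative residual are the cases where the CEMS profile is larger than the EIA-930 fleet total.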

To prepare the EIA-930 data for this calculation, it is cleaned and reconciled using a process described elsewhere in this documentation. To prepare the CEMS data for this calculation, first all previously-shaped data (CEMS, partial CEMS subplant, and partial CEMS plant) is added together, and aggregated by balancing authority and fuel category. Each plant may be assigned one of 41 unique primary fuels, but EIA-930 only reports generation totals for 8 broader fuel categories (solar, wind, hydro, nuclear, natural gas, coal, petroleum, and other), so each specific energy source type is mapped to one of these broader categories for aggregation.
To prepare the EIA-930 data for this calculation, it is cleaned and reconciled using a process described elsewhere in this documentation. To prepare the CEMS data for this calculation, first all previously-shaped data (CEMS, partial CEMS subplant, and partial CEMS plant) is added together, and aggregated by balancing authority and fuel category. Each subplant may be assigned one of 41 unique primary fuels, but EIA-930 only reports generation totals for 8 broader fuel categories (solar, wind, hydro, nuclear, natural gas, coal, petroleum, and other), so each specific energy source type is mapped to one of these broader categories for aggregation.
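
The mapping and aggregation step might look roughly like the sketch below. The dictionary is a partial, illustrative mapping of EIA energy source codes and is an assumption, not the table the pipeline uses. The function name, the record layout, and the choice to send unmapped codes to "other" are also hypothetical:

```python
from collections import defaultdict

# Partial, illustrative mapping from EIA energy source codes to the
# 8 EIA-930 fuel categories (assumption: not the pipeline's actual table)
FUEL_CATEGORY_EIA930 = {
    "BIT": "coal",
    "SUB": "coal",
    "LIG": "coal",
    "NG": "natural_gas",
    "DFO": "petroleum",
    "RFO": "petroleum",
    "NUC": "nuclear",
    "WAT": "hydro",
    "SUN": "solar",
    "WND": "wind",
}


def aggregate_to_fleet(records):
    """Sum net generation by (ba_code, fuel_category).

    records: iterable of (ba_code, energy_source_code, net_generation_mwh).
    Codes missing from the mapping fall into "other" in this sketch.
    """
    totals = defaultdict(float)
    for ba_code, energy_source_code, net_generation_mwh in records:
        category = FUEL_CATEGORY_EIA930.get(energy_source_code, "other")
        totals[(ba_code, category)] += net_generation_mwh
    return dict(totals)


# Hypothetical records for a single hour
records = [
    ("MISO", "BIT", 100.0),
    ("MISO", "SUB", 50.0),
    ("MISO", "NG", 30.0),
    ("PJM", "WND", 20.0),
]
fleet_totals = aggregate_to_fleet(records)
```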

In some cases, the CEMS profile for a fleet will be larger than the EIA-930 fleet total. This may be a result of several inconsistencies in the way that generation data is reported to EIA-930 versus EIA-860 and EIA-923. These issues are discussed in more depth below, but include instances when generation is reported to a different balancing authority, is categorized under a different fuel category, or not reported to EIA-930 because the plant it is associated with is connected to the distribution grid, rather than the transmission grid.

@@ -35,7 +35,7 @@ If no data is available from neighboring BAs, we instead use a national average

If neither a good-quality residual profile nor a shifted residual profile is available for a given fleet, we first fall back to using the total EIA-930 fleet profile to estimate the profile. If a complete EIA-930 fleet profile is not available, we fall back to using the fleet profile for those plants that report to CEMS. Both of these approaches assume that plants of a certain fuel type in a given region generally operate in similar patterns (which may not always be the case). The EIA-930 profile is preferred to the CEMS profile because we assume that the average profile of the entire fleet better represents any single member of the fleet than the profile of a non-random sample of the fleet (CEMS data only represents relatively large generators > 25 MW).

If all other attempts at imputing a reasonable hourly profile fail, we assign a flat hourly profile to the data (which is functionally the same as using a monthly average value. For certain fuels, like geothermal, nuclear, biomass, and waste, which tend to run as baseload resources, this may be a reasonable assumption.
If all other attempts at imputing a reasonable hourly profile fail, we assign a flat hourly profile to the data (which is functionally the same as using a monthly average value). For certain fuels, like geothermal, nuclear, biomass, and waste, which tend to run as baseload resources, this may be a reasonable assumption.
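
The order of preference named in this passage can be sketched as a simple first-usable-candidate search. This is an illustration under assumptions, not the pipeline's implementation: the function name, the method labels, and the completeness check (a value for every hour) are hypothetical, and fallbacks described elsewhere in this document are not modeled:

```python
def choose_profile(candidates, hours):
    """Return (method, profile) for the first usable candidate, else a flat profile.

    candidates: list of (method_name, profile_or_None) in order of preference.
    A profile counts as usable here only if it has a value for every hour.
    """
    for method, profile in candidates:
        if (
            profile is not None
            and len(profile) == hours
            and all(value is not None for value in profile)
        ):
            return method, profile
    # Last resort: equal weight in every hour
    return "flat", [1.0] * hours


# Hypothetical three-hour example
candidates = [
    ("residual", None),                    # not available
    ("shifted_residual", None),            # not available
    ("eia930_total", [50.0, None, 70.0]),  # incomplete, so skipped
    ("cems", [5.0, 6.0, 7.0]),
]
method, profile = choose_profile(candidates, hours=3)
```

With these inputs the CEMS fleet profile is selected. If no candidate is usable, every hour receives equal weight.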


## Known issues with using EIA-930 data for hourly profiles
@@ -70,16 +70,12 @@ The primary fuel codes assigned to each plant in the OGE pipeline may not match
Our understanding is that the data published in EIA-930 only reflects plants that have metered telemetry that communicates with the grid operator. In many cases, this may exclude plants that are connected to distribution grids, rather than directly interconnected at the transmission level. Thus, it is possible that the set of plants reported in EIA-930 may not reflect the full set of plants that report to EIA-923 (which includes plants that are connected to distribution grids as well).

## Shaping the Monthly data
Once an hourly profile for each fleet-month has been calculated, it is converted to a percentage of the monthly total value. These percentages are then multiplied by each monthly total value to get the hourly profile for each fleet. This approach ensures that the shaped hourly values, when aggregated back to the monthly level, will equal the total reported monthly value that was shaped.
Once an hourly profile for each fleet-month has been calculated, it is converted to a percentage of the monthly total value. These percentages are then multiplied by each monthly total value to get the hourly profile for each subplant. This approach ensures that the shaped hourly values, when aggregated back to the monthly level, will equal the total reported monthly value that was shaped.
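
A minimal sketch of this normalize-then-multiply step follows. The function name and the sample values are hypothetical, and the profile is assumed to have a nonzero sum:

```python
import math


def shape_monthly_total(hourly_profile, monthly_total):
    """Distribute a monthly total across hours in proportion to an hourly profile."""
    profile_sum = sum(hourly_profile)
    if profile_sum == 0:
        raise ValueError("profile sums to zero; cannot compute hourly shares")
    return [monthly_total * value / profile_sum for value in hourly_profile]


# Hypothetical four-hour "month" for one fleet profile
fleet_profile = [10.0, 30.0, 40.0, 20.0]
shaped = shape_monthly_total(fleet_profile, monthly_total=500.0)

# The shaped hourly values add back up to the reported monthly total
total_preserved = math.isclose(sum(shaped), 500.0)
```

Because each hourly share is the profile value divided by the profile sum, the shares add up to one, which is why the monthly total is preserved.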

Currently, we only report hourly data for these EIA-only plants at the fleet level, rather than the individual plant level. We do this for several reasons:
1. Although the hourly CEMS data represents approximately 90% of all electricity-related emissions, the EIA data that is being shaped accounts for approximately 80% of all subplant operating hours in the dataset, meaning that there is a large number of small plants that need to be shaped. Thus, creating an hourly value for each of these plants would result in a huge dataset that many computers do not have the ability to store in their memory (RAM). Thus, shaping fleet-level data helps keep the size of this dataset manageable.
2. Because the method used to shape the data relies on fleet-level observations (rather than plant-specific imputation), we feel that providing a plant-level hourly value may create a sense of false precision in the end result.

Because aggregating the plant-level data to the fleet level removes the `plant_id_eia` identifier, we create a synthetic `shaped_plant_id` to represent these aggregated plants. These synthetic ids are 6-digit identifiers that follow the format `9BBBFF` where `BBB` is the three-digit ba number identified in [this table](https://github.com/singularity-energy/open-grid-emissions/blob/main/data/manual/ba_reference.csv), and `FF` represents a two-digit number unique to each fuel category, and defined [here](https://github.com/singularity-energy/open-grid-emissions/blob/afb3ddec0dc93003c21f655b90300c17344107f8/src/impute_hourly_profiles.py#L11). A mapping of `plant_id_eia` to `shaped_plant_id` for each plant can be found in the `plant_static_attributes` table that is included in the dataset.
When exporting hourly, plant-level data, we export hourly data for each individual plant. Because the full dataset is too large to hold in memory, it is created and exported one BA at a time. When calculating hourly, fleet-level outputs, we first aggregate the EIA-923 data to the fleet level before assigning a shape.

## Future Work, Known Issues, and Open Questions
- Infer missing hourly profiles for hydro generation ([details](https://github.com/singularity-energy/open-grid-emissions/issues/37)
- Infer hourly profiles for energy storage charge and discharge ([details](https://github.com/singularity-energy/open-grid-emissions/issues/59)
- Should we model hourly shapes for missing peaker or load following generation? ([details](https://github.com/singularity-energy/open-grid-emissions/issues/96)
- Improve imputation of missing wind and solar generation profiles ([details](https://github.com/singularity-energy/open-grid-emissions/issues/171)
- Infer missing hourly profiles for hydro generation ([details](https://github.com/singularity-energy/open-grid-emissions/issues/37))
- Infer hourly profiles for energy storage charge and discharge ([details](https://github.com/singularity-energy/open-grid-emissions/issues/59))
- Should we model hourly shapes for missing peaker or load following generation? ([details](https://github.com/singularity-energy/open-grid-emissions/issues/96))
- Improve imputation of missing wind and solar generation profiles ([details](https://github.com/singularity-energy/open-grid-emissions/issues/171))
@@ -1,34 +1,34 @@
---
stoplight-id: ba_aggregation
---

## Aggregating plant data to balancing authority

In addition to plant-level data, OGEI includes data aggregated to the balancing authority level.

As explained in the [eGRID technical support documentation](https://www.epa.gov/system/files/documents/2022-01/egrid2020_technical_guide.pdf):


> A balancing authority is a portion of an integrated power grid for which a single dispatcher has operational control of all electric generators. A balancing authority is the responsible entity that integrates resource plans ahead of time, maintains demand and resource balance within a BA area, and supports interconnection frequency in real time. The balancing authority dispatches generators in order to meet an area’s needs and can also control load to maintain the load-generation balance.

Balancing authorities are assigned to each plant based on plant-level data reported in EIA-860. If a plant is missing a reported BA code, we attempt to infer the ba based on the reported balancing authority name, the reported utility name, and the reported transmission or distribution system owner name, in that order. This mapping is based on [this table](https://github.com/singularity-energy/open-grid-emissions/blob/main/data/manual/utility_name_ba_code_map.csv) which associates the full balancing authority name with the ba_code.

For plants that do not belong to a specific BA, we assign a miscellaneous BA code based on the two-letter state code of the state where the plant is located. For example, many plants in Alaska will be assigned the code “AKMS” and many plants in Hawaii will be assigned a code of “HIMS.” For any plants that are associated with a balancing authority that is no longer active, we also replace the retired BA code with a state-based miscellaneous code.

## Commercial vs physical balancing authorities

According to the [EIA-930 instructions](https://www.eia.gov/survey/form/eia_930/instructions.pdf), there are two primary ways that an individual generator can be assigned to a balancing authority:
- The “physical” definition would assign a generator to a balancing authority based on whether that generator is “physically embedded within the tie line boundary of [a] balancing authority.”
- The “commercial” definition would assign a plant to the balancing authority that owns, operates, and/or dispatches that generator.

In general, it seems that the “commercial” definition of balancing authority is used, whereas EIA-930 “is attempting to represent electric system operations in as purely a physical way as possible. Ownership and dispatch are irrelevant to the determination of what is associated with a balancing authority.”

More specifically, the EIA-930 instructions state the following requirements for reporting data:

> Generators physically embedded within the tie line boundary of your balancing authority, but owned, operated, or dispatched by another balancing authority, are to be included, for the purposes of the EIA-930, in your reporting of net generation. The transmission connection of that plant to your system is not considered to be a tie line boundary.

In EIA Form 860, information is collected both about a plant’s balancing authority, and about who owns the transmission lines that a plant is connected to. The [instructions](https://www.eia.gov/survey/form/eia_860/instructions.pdf) for Form 860 note that “A balancing authority manages supply, demand, and interchanges within an electrically defined area. It may or may not be the same as the Owner of Transmission/Distribution Facilities.”

It would thus seem that the “balancing authority” reported in Form 860 would represent the plant’s “commercial” balancing authority, while the “transmission owner” would represent the plant’s “physical” balancing authority.

Thus, in the OGE dataset, all references to `ba_code` represent commercial balancing authorities. However, we also assign each plant a `ba_code_physical` based on the reported “Transmission or Distribution System Owner” in Form 860, Schedule 2. However, `ba_code_physical` is not yet used in the data pipeline until we can better understand how it relates to the EIA-930 data.
---
stoplight-id: ba_aggregation
---
## Aggregating plant data to balancing authority
In addition to plant-level data, OGE includes data aggregated to the balancing authority level.
As explained in the [eGRID technical support documentation](https://www.epa.gov/system/files/documents/2022-01/egrid2020_technical_guide.pdf):
> A balancing authority is a portion of an integrated power grid for which a single dispatcher has operational control of all electric generators. A balancing authority is the responsible entity that integrates resource plans ahead of time, maintains demand and resource balance within a BA area, and supports interconnection frequency in real time. The balancing authority dispatches generators in order to meet an area’s needs and can also control load to maintain the load-generation balance.
Balancing authorities are assigned to each plant based on plant-level data reported in EIA-860. If a plant is missing a reported BA code, we attempt to infer the ba based on the reported balancing authority name, the reported utility name, and the reported transmission or distribution system owner name, in that order. This mapping is based on [this table](https://github.com/singularity-energy/open-grid-emissions/blob/main/data/manual/utility_name_ba_code_map.csv) which associates the full balancing authority name with the ba_code.
For plants that do not belong to a specific BA, we assign a miscellaneous BA code based on the two-letter state code of the state where the plant is located. For example, many plants in Alaska will be assigned the code “AKMS” and many plants in Hawaii will be assigned a code of “HIMS.” For any plants that are associated with a balancing authority that is no longer active, we also replace the retired BA code with a state-based miscellaneous code.
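
The state-based fallback can be sketched as follows. The function name, its parameters, and the retired code used in the example are hypothetical. The name-based inference step described earlier is not modeled here:

```python
def assign_ba_code(reported_ba_code, state, retired_ba_codes=frozenset()):
    """Return the reported BA code, or a state-based miscellaneous code.

    A plant with no BA code, or with a code in retired_ba_codes, receives the
    two-letter state code followed by "MS" (for example "AKMS").
    """
    if not reported_ba_code or reported_ba_code in retired_ba_codes:
        return f"{state.upper()}MS"
    return reported_ba_code


# "XXXX" is a made-up retired code used only for illustration
alaska_code = assign_ba_code(None, "AK")
hawaii_code = assign_ba_code("", "hi")
active_code = assign_ba_code("MISO", "MN")
retired_code = assign_ba_code("XXXX", "TX", retired_ba_codes=frozenset({"XXXX"}))
```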
## Commercial vs physical balancing authorities
According to the [EIA-930 instructions](https://www.eia.gov/survey/form/eia_930/instructions.pdf), there are two primary ways that an individual generator can be assigned to a balancing authority:
- The “physical” definition would assign a generator to a balancing authority based on whether that generator is “physically embedded within the tie line boundary of [a] balancing authority.”
- The “commercial” definition would assign a plant to the balancing authority that owns, operates, and/or dispatches that generator.
In general, it seems that the “commercial” definition of balancing authority is used, whereas EIA-930 “is attempting to represent electric system operations in as purely a physical way as possible. Ownership and dispatch are irrelevant to the determination of what is associated with a balancing authority.”
More specifically, the EIA-930 instructions state the following requirements for reporting data:
> Generators physically embedded within the tie line boundary of your balancing authority, but owned, operated, or dispatched by another balancing authority, are to be included, for the purposes of the EIA-930, in your reporting of net generation. The transmission connection of that plant to your system is not considered to be a tie line boundary.
In EIA Form 860, information is collected both about a plant’s balancing authority, and about who owns the transmission lines that a plant is connected to. The [instructions](https://www.eia.gov/survey/form/eia_860/instructions.pdf) for Form 860 note that “A balancing authority manages supply, demand, and interchanges within an electrically defined area. It may or may not be the same as the Owner of Transmission/Distribution Facilities.”
It would thus seem that the “balancing authority” reported in Form 860 would represent the plant’s “commercial” balancing authority, while the “transmission owner” would represent the plant’s “physical” balancing authority.
Thus, in the OGE dataset, all references to `ba_code` represent commercial balancing authorities. However, we also assign each plant a `ba_code_physical` based on the reported “Transmission or Distribution System Owner” in Form 860, Schedule 2. However, `ba_code_physical` is not yet used in the data pipeline until we can better understand how it relates to the EIA-930 data.