Aggregate fleet data by subplant, not plant #395
@@ -287,85 +287,59 @@ def output_data_quality_metrics(
)


def output_plant_data(
df: pd.DataFrame,
def write_plant_data_to_results(
Updated to specifically output monthly and annual plant-level data, or monthly or annual subplant-level data.
@@ -398,10 +372,10 @@ def convert_results(df: pd.DataFrame) -> pd.DataFrame:
return converted


def write_generated_averages(
def write_national_fleet_averages(
Renamed to make the purpose clearer, and updated to write the "US" file to results rather than outputs.
@@ -372,6 +330,36 @@
},
}

DATA_COLUMNS = [
Moved from data_cleaning
@@ -143,7 +143,11 @@ def load_cems_ids() -> pd.DataFrame:
filters=[["year", "==", year]],
columns=["plant_id_epa", "plant_id_eia", "emissions_unit_id_epa"],
).drop_duplicates()
cems_id_year = apply_dtypes(cems_id_year)
# update the plant_id_eia column using manual matches
cems_id_year = update_epa_to_eia_map(cems_id_year, year)
This was moved up to apply dtypes and EIA mappings on a year-by-year basis, rather than at the end.
notebooks/manual_data/zip_data.ipynb
I added a utility function to remove certain files locally that I never use (like all of the metric unit files) to save space on my computer.
Looks good
Purpose
This PR is in response to #392, which explains:
This PR does two main things:
results/plant_data/
outputs
Closes CAR-4709, CAR-4708
What the code is doing
Specific changes in this PR include:
results/power_sector_data
Ensuring complete subplant_primary_fuel coverage
Now that this PR bases all aggregations on the subplant level, I ran into an issue when testing where a UserWarning was raised because primary fuel category mappings were not available for all subplants (this was not previously a problem when we were using plant-level data).
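A minimal sketch of the kind of coverage check that would raise such a warning, for illustration only. The function name `find_unmapped_subplants`, the `subplant_id` column, and the sample data are assumptions and are not taken from this PR; only `plant_id_eia` and `subplant_primary_fuel` appear in the source.

```python
import warnings

import pandas as pd


def find_unmapped_subplants(
    subplants: pd.DataFrame, primary_fuel: pd.DataFrame
) -> pd.DataFrame:
    """Return subplants with no primary fuel mapping, warning if any exist.

    Hypothetical helper: the key columns are assumed, not the project's schema.
    """
    keys = ["plant_id_eia", "subplant_id"]
    # Left merge keeps every subplant; unmatched rows get NaN for the fuel column
    merged = subplants[keys].drop_duplicates().merge(
        primary_fuel[keys + ["subplant_primary_fuel"]], how="left", on=keys
    )
    missing = merged[merged["subplant_primary_fuel"].isna()]
    if not missing.empty:
        warnings.warn(
            f"{len(missing)} subplant(s) have no primary fuel mapping", UserWarning
        )
    return missing[keys]


# Made-up example data: plant 2, subplant 1 has no primary fuel assigned
subplants = pd.DataFrame({"plant_id_eia": [1, 1, 2], "subplant_id": [1, 2, 1]})
primary_fuel = pd.DataFrame(
    {
        "plant_id_eia": [1, 1],
        "subplant_id": [1, 2],
        "subplant_primary_fuel": ["NG", "SUB"],
    }
)
unmapped = find_unmapped_subplants(subplants, primary_fuel)
```

Returning the unmapped keys alongside the warning makes it easier to trace each gap back to its cause, which is how the categories of missing mappings below could be identified.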
Digging into this issue, I found that there were three types of issues preventing a complete mapping:
epa_eia_crosswalk_manual.csv
file, but since these mappings are time-dependent, I also had to add new columns to indicate the start and end year for which each mapping is valid (if no year is specified, the mapping is assumed to be valid for all time). This also involved some tweaks to the code where we convert the CEMS IDs to EIA IDs. Because we only ever load a single year of CEMS data at a time, we can filter the mappings to only those that are valid for that year.
This also revealed a bug in the gross-to-net generation (GTN) conversion, where retired generators that reported 0 generation to CEMS were getting net generation applied incorrectly. I fixed this by dropping these subplants from the GTN conversion calculation so that they fall back to the default backstop ratio, which should still result in zero net generation.
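The year-dependent filtering of manual crosswalk mappings described above could look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: the column names `start_year` and `end_year`, the function name `filter_crosswalk_to_year`, and the sample rows are all hypothetical.

```python
import pandas as pd


def filter_crosswalk_to_year(crosswalk: pd.DataFrame, year: int) -> pd.DataFrame:
    """Keep only the manual mappings that are valid for `year`.

    Assumed convention: a missing start or end year means the mapping is
    open-ended on that side, so a row with neither is valid for all time.
    """
    starts_ok = crosswalk["start_year"].isna() | (crosswalk["start_year"] <= year)
    ends_ok = crosswalk["end_year"].isna() | (crosswalk["end_year"] >= year)
    # The validity columns are no longer needed once the filter is applied
    return crosswalk[starts_ok & ends_ok].drop(columns=["start_year", "end_year"])


nan = float("nan")
# Made-up example: unit A maps to EIA plant 10 through 2019 and to plant 11
# from 2020 onward; unit B has a single mapping with no year bounds
crosswalk = pd.DataFrame(
    {
        "plant_id_epa": [1, 1, 2],
        "emissions_unit_id_epa": ["A", "A", "B"],
        "plant_id_eia": [10, 11, 20],
        "start_year": [nan, 2020, nan],
        "end_year": [2019, nan, nan],
    }
)

crosswalk_2019 = filter_crosswalk_to_year(crosswalk, 2019)
crosswalk_2021 = filter_crosswalk_to_year(crosswalk, 2021)
```

Because only one year of CEMS data is loaded at a time, filtering up front leaves at most one manual mapping per unit, so the subsequent ID update does not need to be aware of years at all.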
Impacts
These changes would likely have wide-ranging impacts on the results:
Testing
Have not yet tested:
Where to look
Suggested order to review files:
Usage Example/Visuals
N/A
Review estimate
30-45 minutes (happy to walk through any of this)
Future work
As I was working on this, I noticed several opportunities to reorganize which modules certain functions live in, or to create new modules to shorten some of our existing ones. I didn't implement this because it would make it harder to review line-item changes to existing functions. In the future, we could:
Checklist
black