Power scheduling is an NP-hard optimization problem: it is high-dimensional and combinatorial, with non-convex, non-smooth, and discontinuous properties, and is subject to multiple constraints over multiple periods.
- Two sequential tasks:
  - Unit Commitment
  - Load Dispatch:
    - Economic Load Dispatch
    - Environmental Load Dispatch
Power scheduling aims to determine an optimal load dispatch schedule that simultaneously minimizes conflicting objectives, particularly economic costs and environmental emissions.
`pymops` is an open-source Python package developed for solving single- to tri-objective optimization in power scheduling problems. The package is built on a novel multi-agent simulation environment, where power-generating units are represented as agents. The agents are heterogeneous, each with multiple conflicting objectives. The scheduling dynamics are simulated using Markov Decision Processes (MDPs), which are used to train a deep reinforcement learning model for solving the optimization problem.
The general multi-objective function is formulated by combining different conflicting objectives via a hybrid approach that uses both weighting hyperparameters and unit-specific cost-to-emission conversion factors:
- Economic cost functions: generation costs of the committed units (optionally including valve point effects).
- Environmental emission functions: pollutant emissions (e.g., CO2 and SO2) of the committed units.
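As a sketch of the bi-objective case (the symbols and functional forms below are illustrative assumptions, not necessarily the exact ones used by `pymops`; see the cited article for the authoritative formulation), the hybrid objective can be written with a weighting hyperparameter `w` and unit-specific cost-to-emission conversion factors `h_i`:

```latex
% Illustrative hybrid objective for the bi-objective (cost vs. emission) case
\min \sum_{t=1}^{T} \sum_{i=1}^{N} u_{i,t}
     \Big[ \, w \, C_i(P_{i,t}) + (1 - w)\, h_i \, E_i(P_{i,t}) \Big]

% u_{i,t} : ON/OFF commitment status of unit i at period t
% P_{i,t} : power output of unit i at period t

% A commonly used fuel-cost function (with valve-point effects):
C_i(P_{i,t}) = a_i + b_i P_{i,t} + c_i P_{i,t}^{2}
             + \big| d_i \sin\big( e_i (P_i^{\min} - P_{i,t}) \big) \big|

% A commonly used emission function:
E_i(P_{i,t}) = \alpha_i + \beta_i P_{i,t} + \gamma_i P_{i,t}^{2}
             + \eta_i \exp( \delta_i P_{i,t} )
```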
The scheduling problem is subject to the following constraints:
- Minimum and maximum power capacities
- Maximum ramp-down and ramp-up rates
- Minimum operating (online/offline) durations
- Power supply and demand balance
- Minimum available (spinning) reserve
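For reference, these constraints are commonly specified as follows (a sketch with illustrative notation; the exact specifications used by `pymops` are given in the cited article):

```latex
% Minimum and maximum power capacities
u_{i,t} P_i^{\min} \;\le\; P_{i,t} \;\le\; u_{i,t} P_i^{\max}

% Maximum ramp-up and ramp-down rates
P_{i,t} - P_{i,t-1} \le UR_i, \qquad P_{i,t-1} - P_{i,t} \le DR_i

% Minimum online/offline durations (a unit may change status only after being
% ON, respectively OFF, for at least its minimum duration)
T_{i,t}^{\mathrm{on}} \ge T_i^{\mathrm{on,min}}, \qquad
T_{i,t}^{\mathrm{off}} \ge T_i^{\mathrm{off,min}}

% Power supply and demand balance at each period t
\sum_{i=1}^{N} u_{i,t} P_{i,t} = D_t

% Minimum available (spinning) reserve, with reserve proportion SR
\sum_{i=1}^{N} u_{i,t} P_i^{\max} \;\ge\; (1 + SR)\, D_t
```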
The MARL framework is structured as follows:
- Planning Horizon: The scheduling horizon is an hourly divided day.
- Timestep/Period: Each hour of the day is considered a timestep.
- Episode: One complete cycle of determining the unit commitments and load dispatches for a day.
- Simulation Environment: A custom MARL simulation environment, structurally similar to OpenAI Gym.
  - Mono-objective to tri-objective scheduling problems (cost, CO2 and SO2).
  - Ramp rate constraints and valve point effects are taken into account.
- Agents: The generating units are represented as multiple agents.
  - The agents are heterogeneous (each has different generating-unit-specific characteristics).
  - Each agent has multiple conflicting objectives.
  - The agents are cooperative RL agents:
    - Agents collaborate to satisfy the demand at each period/timestep.
    - Agents also strive to minimize the multi-objective function over the entire planning horizon.
- State Space: Consists of the timestep, the units' minimum and maximum capacities, their operating (online/offline) durations, and the demand to be satisfied.
- Action Space: The commitment statuses (ON/OFF) of all agents.
- Transition Function: The probability of transitioning from the current state to the next state (no specific formula).
  - Decisions of agents that violate any constraint are automatically corrected by the environment.
  - The environment also makes adjustments for both excess and shortage of power supply.
- Reward Function: Agents receive a common reward, which is the inverse of the average of the normalized values of all objectives (see the sketch below).
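A minimal sketch of such a reward, assuming min-max normalization of each objective (the function name and normalization scheme are illustrative, not the package's internal implementation):

```python
import numpy as np

def common_reward(objective_values, obj_mins, obj_maxs):
    """Sketch of a common reward: inverse of the mean of normalized objectives.

    objective_values, obj_mins, obj_maxs hold the current, minimum, and maximum
    values of each objective (e.g., cost, CO2, SO2). The normalization scheme
    here is an assumption for illustration only.
    """
    values = np.asarray(objective_values, dtype=float)
    mins = np.asarray(obj_mins, dtype=float)
    maxs = np.asarray(obj_maxs, dtype=float)

    normalized = (values - mins) / (maxs - mins + 1e-12)  # scale each objective to [0, 1]
    return 1.0 / (np.mean(normalized) + 1e-12)            # inverse of the average

# Example: lower (better) normalized objectives yield a larger common reward.
print(common_reward([1.2e6, 3.0e5], obj_mins=[1.0e6, 2.5e5], obj_maxs=[2.0e6, 5.0e5]))
```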
The MOPS dynamics can be simulated as MDPs defined by a 4-tuple of state space, action space, transition function, and reward function:
- The MDPs are the input for the deep RL model.
- The deep RL model predicts the decisions (actions) of the agents.
- The predicted agents' actions are the input for the transition function in the environment (a minimal interaction-loop sketch follows below).
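A minimal sketch of this loop, using the environment API shown later in this README; `random_policy` is a hypothetical stand-in for the trained deep RL model:

```python
import numpy as np

# `env` is assumed to be a SimEnv instance, constructed as shown later in this README.

def random_policy(flat_state, n_units):
    """Hypothetical stand-in for the trained deep RL model's action prediction."""
    return np.random.randint(0, 2, size=n_units)  # ON/OFF decision for every agent

def run_episode(env, n_units):
    """Roll out one episode (one scheduling day) through the environment."""
    flat_state, dict_state = env.reset()  # initial state at the first timestep
    done = False
    total_reward = 0.0
    while not done:
        action_vec = random_policy(flat_state, n_units)
        # The transition function consumes the predicted actions, corrects any
        # constraint violations, and returns the next state and the common reward.
        flat_state, reward, done, dict_state, dispatch_info = env.step(action_vec)
        total_reward += reward
    return total_reward
```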
The simulation environment can be installed using `pip`:
```
pip install pymops
```
Alternatively, it can be cloned from the GitHub repo and installed:
```
git clone https://github.com/awolseid/pymops.git
cd pymops
pip install .
```
```python
import pymops
from pymops.environ import SimEnv
```
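The environment constructor below expects a unit-profile DataFrame and a demand-profile DataFrame. A minimal sketch of preparing them with pandas (the file names are hypothetical placeholders; consult the package documentation or example data for the expected columns):

```python
import pandas as pd

# Hypothetical CSV files; the required column layout is defined by the package.
default_supply_df = pd.read_csv("units_profile.csv")   # generating units' characteristics
default_demand_df = pd.read_csv("hourly_demand.csv")   # hourly demand for the scheduling day
```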
```python
env = SimEnv(
    supply_df  = default_supply_df,  # units' profile dataframe
    demand_df  = default_demand_df,  # demand profile dataframe
    SR         = 0.0,                # proportion of spinning reserve, in [0, 1]
    RR         = "Yes",              # ramp rates: "yes" or default "no" (=None)
    VPE        = None,               # valve point effects: "yes" or default "no" (=None)
    n_objs     = None,               # objectives: "tri" for tri-objective, or default "bi" (=None) for bi-objective
    w          = None,               # weight: a value in [0, 1] for bi-objective, a list (e.g. [0.2, 0.3, 0.5]) for tri-objective
    duplicates = None                # number of duplicates: duplicate units and adjust demands proportionally
)
```
```python
# Reset the environment to the initial state of a new episode
initial_flat_state, initial_dict_state = env.reset()
```
```python
# Retrieve the current state of the environment
flat_state, dict_state = env.get_current_state()
```
```python
import numpy as np

# Commitment statuses (ON/OFF) of all agents for the current timestep
action_vec = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 0])
flat_next_state, reward, done, next_state_dict, dispatch_info = env.step(action_vec)
```
```python
from pymops.define_dqn import DQNet
from pymops.madqn import DQNAgents
from pymops.replaymemory import ReplayMemory
from pymops.schedules import get_schedules
```
```python
# Define the deep Q-network model for the environment
model_0 = DQNet(env, 64)
print(model_0)
```
```python
# Instantiate the multi-agent DQN agents with exploration and learning hyperparameters
RL_agents = DQNAgents(
    environ       = env,
    model         = model_0,
    epsilon_max   = 1.0,
    epsilon_min   = 0.1,
    epsilon_decay = 0.99,
    lr            = 0.001
)
```
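These hyperparameters suggest an epsilon-greedy exploration schedule. A minimal sketch of how such a schedule typically evolves over episodes (an assumption for illustration; the exact rule used by `DQNAgents` may differ):

```python
# Illustrative multiplicative epsilon decay across training episodes
epsilon_max, epsilon_min, epsilon_decay = 1.0, 0.1, 0.99

epsilon = epsilon_max
for episode in range(500):
    # Probability of choosing a random commitment action instead of the greedy one
    epsilon = max(epsilon_min, epsilon * epsilon_decay)
```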
```python
# Create a replay memory buffer for experience replay
memory = ReplayMemory(environ = env, buffer_size = 64)
```
```python
# Train the agents and collect the training results
training_results_df = RL_agents.train(memory = memory, batch_size = 64, num_episodes = 500)
```
```python
# Extract the optimal schedules and the corresponding cost and emission values
cost, emis, CO2, SO2, schedules_df = get_schedules(environ = env, trained_agents = RL_agents)
schedules_df
```
For any questions, issues, suggestions, or collaboration opportunities, contact: awolseid@pukyong.ac.kr or youngk@pknu.ac.kr.
Users should cite the following resources.
- Code Ocean reproducible capsule (https://codeocean.com/capsule/0242917/tree):
  - Ebrie, A.S.; Kim, Y.J. (2023). pymops: A multi-agent reinforcement learning simulation environment for multi-objective optimization in power scheduling [Software Code]. https://doi.org/10.24433/CO.9235622.v1
- Article produced from the first version of the package:
  - Ebrie, A.S.; Paik, C.; Chung, Y.; Kim, Y.J. (2023). Environment-Friendly Power Scheduling Based on Deep Contextual Reinforcement Learning. Energies, 16, 5920. https://doi.org/10.3390/en16165920