Wrappers
Bauwerk provides a number of wrappers for its environments, built on the OpenAI Gym wrappers. To use a wrapper, apply it to a Bauwerk environment. For example:
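```python
import gym

import bauwerk  # needed so that gym.make() can find the "bauwerk/..." environment IDs
import bauwerk.envs.wrappers

# Create a Bauwerk environment and wrap it
env = gym.make("bauwerk/House-v0")
wrapped_env = bauwerk.envs.wrappers.InfeasControlPenalty(env)
```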
List of available wrappers
Wrappers for Bauwerk environments.
- class bauwerk.envs.wrappers.ClipActions(env, low, high)
Clip the actions that can be taken in the environment.
- Parameters
env (gym.Env) – environment to clip actions for.
low (Any) – lower bound of clipped action space (passed to gym.spaces.Box). This must fit the shape of the env’s action space.
high (Any) – upper bound of clipped action space (passed to gym.spaces.Box).
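The effect of this wrapper on a single action can be pictured as an elementwise clip into the bounds of the clipped Box action space. The plain-Python sketch below assumes that each action component is clipped into [low[i], high[i]]; `clip_action` is a hypothetical helper, not part of Bauwerk, and the real wrapper works on Gym spaces rather than plain lists.

```python
from typing import List, Sequence


def clip_action(action: Sequence[float], low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Clip each action component into [low[i], high[i]] (hypothetical helper)."""
    if not (len(action) == len(low) == len(high)):
        # Mirrors the requirement that low/high fit the shape of the action space
        raise ValueError("action, low and high must have the same shape")
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)]


# A one-dimensional (dis)charging action restricted to [-1.0, 1.0]
print(clip_action([2.5], [-1.0], [1.0]))  # [1.0]
```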
- class bauwerk.envs.wrappers.ClipReward(env, min_reward, max_reward)
Clip the reward of the environment.
Adapted from https://www.gymlibrary.dev/api/wrappers/#rewardwrapper. Note that in Bauwerk environments clipping the reward can change which policies are optimal: a policy that is optimal for the clipped reward need not be optimal for the original one. Use with care.
- Parameters
env (gym.Env) – environment to apply wrapper to.
min_reward (float) – minimum reward value.
max_reward (float) – maximum reward value.
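The caution about optimal policies can be seen with a small worked example. The sketch below clips a scalar reward into [min_reward, max_reward]; `clip_reward` and the two actions with their rewards are hypothetical and only illustrate the idea, not Bauwerk's reward values.

```python
def clip_reward(reward: float, min_reward: float, max_reward: float) -> float:
    """Clip a scalar reward into [min_reward, max_reward] (hypothetical helper)."""
    return max(min_reward, min(max_reward, reward))


# Two hypothetical actions: "idle" is clearly better than "charge" before clipping.
rewards = {"charge": -5.0, "idle": -2.0}
clipped = {name: clip_reward(r, min_reward=-1.0, max_reward=1.0) for name, r in rewards.items()}
print(clipped)  # {'charge': -1.0, 'idle': -1.0}
```

After clipping, both actions receive the same reward, so a policy that always charges looks as good as one that idles, even though the original rewards say otherwise.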
- class bauwerk.envs.wrappers.InfeasControlPenalty(env, penalty_factor=1.0)
Add a penalty to the reward when the agent tries infeasible control actions.
The penalty is computed from the absolute difference between the (dis)charging power that the agent last tried to apply to the battery and the power that was actually (dis)charged after accounting for the physics of the system.
- Parameters
env (bauwerk.HouseEnv) – environment to wrap.
penalty_factor (float, optional) – multiplicative factor applied to the power difference. This factor effectively sets the “price” of infeasible control, so its scale should be adapted to the pricing scheme of your control problem. Defaults to 1.0.
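The penalty described above can be written as penalty_factor × |attempted power − actual power|. The sketch below assumes this penalty is subtracted from the environment's reward; `penalised_reward` and its argument names are hypothetical and not part of the Bauwerk API.

```python
def penalised_reward(
    reward: float,
    attempted_power: float,
    actual_power: float,
    penalty_factor: float = 1.0,
) -> float:
    """Subtract a penalty proportional to the infeasible part of the action (assumed form)."""
    penalty = penalty_factor * abs(attempted_power - actual_power)
    return reward - penalty


# The agent tries to charge 3.0 units of power, but the battery only accepts 2.0.
print(penalised_reward(reward=-0.5, attempted_power=3.0, actual_power=2.0, penalty_factor=0.25))  # -0.75
```

Because the penalty is linear in the power difference, doubling penalty_factor doubles the cost of the same infeasible request, which is why its scale should match the prices used elsewhere in the reward.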
- step(action)
Run one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object): agent’s observation of the current environment.
reward (float): amount of reward returned after the previous action.
done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (dict): auxiliary diagnostic information (helpful for debugging, and sometimes learning).
- Return type
tuple
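The four-tuple contract of step() implies the usual rollout loop: step until done is True, then call reset() before stepping again. `ToyEnv` below is a hypothetical stand-in that follows the same step()/reset() contract so the loop runs without Gym or Bauwerk installed; with a wrapped Bauwerk environment the loop itself would look the same.

```python
from typing import Any, Dict, Tuple


class ToyEnv:
    """Hypothetical environment with the documented (observation, reward, done, info) contract."""

    def __init__(self, episode_len: int = 3) -> None:
        self.episode_len = episode_len
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t  # initial observation

    def step(self, action: float) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.t += 1
        reward = -abs(action)  # arbitrary toy reward
        done = self.t >= self.episode_len
        return self.t, reward, done, {"time_step": self.t}


env = ToyEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    obs, reward, done, info = env.step(0.5)
    total_reward += reward
# The episode has ended: reset before calling step() again.
obs = env.reset()
print(total_reward)  # -1.5
```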
- class bauwerk.envs.wrappers.NormalizeObs(env)
Normalise the Bauwerk environment’s observations.
- Parameters
env (bauwerk.HouseEnv) – environment to wrap.
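This page does not state which normalisation scheme NormalizeObs uses. One common scheme for bounded observation spaces is min-max scaling into [0, 1] using the space's lower and upper bounds; the plain-Python sketch below assumes that scheme and treats an observation as a flat sequence of numbers. `min_max_normalise` and the example bounds are hypothetical, and the wrapper may normalise differently.

```python
from typing import List, Sequence


def min_max_normalise(obs: Sequence[float], low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Scale each component from [low[i], high[i]] to [0, 1] (assumed scheme)."""
    out = []
    for o, lo, hi in zip(obs, low, high):
        if hi <= lo:
            raise ValueError("each upper bound must exceed its lower bound")
        out.append((o - lo) / (hi - lo))
    return out


# Hypothetical observation with two components bounded by [0, 10] and [0, 4]
print(min_max_normalise([5.0, 1.0], low=[0.0, 0.0], high=[10.0, 4.0]))  # [0.5, 0.25]
```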
- class bauwerk.envs.wrappers.TaskParamObs(env, task_param_names, task_param_low, task_param_high, normalize=False)
Wrapper that adds task parameters to the observation space.
- Parameters
env (bauwerk.HouseEnv) – environment to wrap.
task_param_names (list) – list of names of task parameters. Each name should be an attribute of the environment’s config.
task_param_low (np.array) – lower bound of the task parameters.
task_param_high (np.array) – upper bound of the task parameters.
normalize (bool, optional) – whether to normalise the task parameters. Defaults to False.
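The parameters above suggest how the extra observation entries are obtained: each name is looked up as an attribute of the environment's config and, if normalize is set, rescaled using the given bounds. The sketch below assumes the values are appended to the end of a flat observation and that normalisation is min-max scaling into [0, 1]; `ToyConfig`, its `battery_size` attribute and `task_param_obs` are hypothetical stand-ins, not Bauwerk's own classes.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToyConfig:
    """Hypothetical config object; real Bauwerk configs define their own attributes."""

    battery_size: float = 10.0


def task_param_obs(
    obs: Sequence[float],
    config: object,
    task_param_names: Sequence[str],
    task_param_low: Sequence[float],
    task_param_high: Sequence[float],
    normalize: bool = False,
) -> List[float]:
    # Look up each task parameter as an attribute of the config.
    params = [float(getattr(config, name)) for name in task_param_names]
    if normalize:
        # Assumed min-max scaling of each parameter into [0, 1].
        params = [(p - lo) / (hi - lo) for p, lo, hi in zip(params, task_param_low, task_param_high)]
    return list(obs) + params


obs = [0.3, 0.7]
print(task_param_obs(obs, ToyConfig(), ["battery_size"], [5.0], [15.0]))  # [0.3, 0.7, 10.0]
print(task_param_obs(obs, ToyConfig(), ["battery_size"], [5.0], [15.0], normalize=True))  # [0.3, 0.7, 0.5]
```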
- reset(*args, **kwargs)
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)