Wrappers#

Bauwerk provides a number of wrappers for its environments, built on OpenAI Gym's wrapper interface. To use one of the wrappers, simply apply it to a Bauwerk environment. For example:

import bauwerk  # importing bauwerk registers its environments with gym
import bauwerk.envs.wrappers
import gym

env = gym.make("bauwerk/House-v0")
wrapped_env = bauwerk.envs.wrappers.InfeasControlPenalty(env)

List of available wrappers#

Wrappers for Bauwerk environments.

class bauwerk.envs.wrappers.ClipActions(env, low, high)[source]#

Clip actions that can be taken in the environment.

Parameters
  • env (gym.Env) – environment to clip actions for.

  • low (Any) – lower bound of clipped action space (passed to gym.spaces.Box). This must fit the shape of the env’s action space.

  • high (Any) – upper bound of clipped action space (passed to gym.spaces.Box).
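
For illustration, a minimal usage sketch of ClipActions; the bounds below are illustrative values and must fit the shape of the environment's action space:

import bauwerk
import bauwerk.envs.wrappers
import gym

env = gym.make("bauwerk/House-v0")
# Illustrative bounds; they must fit the shape of env.action_space.
clipped_env = bauwerk.envs.wrappers.ClipActions(env, low=-0.5, high=0.5)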

class bauwerk.envs.wrappers.ClipReward(env, min_reward, max_reward)[source]#

Clip reward of environment.

Adapted from https://www.gymlibrary.dev/api/wrappers/#rewardwrapper. Note that in Bauwerk environments clipping the reward may lead to alternative optimal policies, so use this wrapper with care.

Parameters
  • env (gym.Env) – environment to apply wrapper to.

  • min_reward (float) – minimum reward value.

  • max_reward (float) – maximum reward value.
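
A minimal usage sketch of ClipReward; the reward bounds below are illustrative and should be chosen to fit the reward scale of your control problem:

import bauwerk
import bauwerk.envs.wrappers
import gym

env = gym.make("bauwerk/House-v0")
# Illustrative reward bounds; note that clipping may change the optimal policy.
clipped_env = bauwerk.envs.wrappers.ClipReward(env, min_reward=-1.0, max_reward=1.0)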

class bauwerk.envs.wrappers.InfeasControlPenalty(env, penalty_factor=1.0)[source]#

Add a penalty to the reward when the agent tries infeasible control actions.

The penalty is computed based on the absolute difference between the (dis)charging power that the agent last tried to apply to the battery, and the power that was actually discharged after accounting for the physics of the system.

Parameters
  • env (bauwerk.HouseEnv) – environment to wrap.

  • penalty_factor (float, optional) – multiplicative factor applied to the power difference, effectively acting as a price on infeasible control. The scale should be adapted to the pricing scheme of your control problem. Defaults to 1.0.

step(action)[source]#

Run one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns
  • observation (object) – agent’s observation of the current environment.

  • reward (float) – amount of reward returned after the previous action.

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type

tuple
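
A minimal sketch of stepping through a wrapped environment; the penalty_factor value is illustrative:

import bauwerk
import bauwerk.envs.wrappers
import gym

env = gym.make("bauwerk/House-v0")
wrapped_env = bauwerk.envs.wrappers.InfeasControlPenalty(env, penalty_factor=2.0)

obs = wrapped_env.reset()
# A random action may request infeasible (dis)charging power;
# the resulting reward then includes the penalty described above.
obs, reward, done, info = wrapped_env.step(wrapped_env.action_space.sample())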

class bauwerk.envs.wrappers.NormalizeObs(env)[source]#

Normalise a Bauwerk environment’s observations.

Parameters

env (bauwerk.HouseEnv) – environment to wrap.
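
A minimal usage sketch of NormalizeObs; per the signature above, the wrapper takes no arguments beyond the environment:

import bauwerk
import bauwerk.envs.wrappers
import gym

env = gym.make("bauwerk/House-v0")
normalized_env = bauwerk.envs.wrappers.NormalizeObs(env)
obs = normalized_env.reset()  # observations are now normalised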

class bauwerk.envs.wrappers.TaskParamObs(env, task_param_names, task_param_low, task_param_high, normalize=False)[source]#

Wrapper that adds task parameters to observation space.

Parameters
  • env (bauwerk.HouseEnv) – environment to wrap.

  • task_param_names (list) – list of names of task parameters. Each name should be an attribute of the environment’s config.

  • task_param_low (np.array) – lower bound of task parameters.

  • task_param_high (np.array) – upper bound of the task parameters.

  • normalize (bool, optional) – whether to normalise the task parameters. Defaults to False.

reset(*args, **kwargs)[source]#

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

the initial observation.

Return type

observation (object)
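
For illustration, a minimal sketch of adding a task parameter to the observation space. The parameter name "battery_size" and its bounds are illustrative assumptions; use attributes that actually exist in your environment’s config:

import bauwerk
import bauwerk.envs.wrappers
import gym
import numpy as np

env = gym.make("bauwerk/House-v0")
# "battery_size" is an assumed config attribute name; adapt it to your env's config.
wrapped_env = bauwerk.envs.wrappers.TaskParamObs(
    env,
    task_param_names=["battery_size"],
    task_param_low=np.array([5.0]),
    task_param_high=np.array([15.0]),
    normalize=True,
)
obs = wrapped_env.reset()  # observation now includes the task parameter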