Evaluation#

Individual Buildings#

In order to evaluate the performance of a policy \(\pi\) on a Bauwerk building \(b\), we consider the expected return when using policy \(\pi\) to operate building \(b\),

\[\mathbb{E}_{\pi}[\sum_{t=0}^{T}\gamma_b^t R_b(s_t, a_t)],\]

where \(R_b\) is the reward function and \(\gamma_b\) is the discount factor of building \(b\)’s partially observable Markov decision process (POMDP), and \(s_t\) and \(a_t\) are the random variables for the state and action at time step \(t\) under policy \(\pi\). Intuitively, this value reflects the expected cost of using policy \(\pi\) as a controller in building \(b\).
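As a minimal sketch of the quantity above, the discounted return of one episode and its Monte Carlo average over several episodes can be computed as follows. The helper names `discounted_return` and `monte_carlo_estimate` and the reward values are illustrative assumptions, not part of Bauwerk.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t] for a single episode."""
    total = 0.0
    # Iterate backwards so each step is a single multiply-add (Horner's scheme).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def monte_carlo_estimate(episodes: Sequence[Sequence[float]], gamma: float) -> float:
    """Average the discounted return over the sampled episodes."""
    returns = [discounted_return(episode, gamma) for episode in episodes]
    return sum(returns) / len(returns)


# Two made-up reward sequences, purely for illustration (not Bauwerk data).
episodes = [[-1.0, -2.0, -3.0], [-2.0, 0.0, -1.0]]
estimate = monte_carlo_estimate(episodes, gamma=0.5)
print(estimate)
```

With `gamma=1.0` the discounted return reduces to the plain sum of rewards over the episode.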

Below we estimate this expected return for a random policy by averaging the cumulative (undiscounted) reward over a number of sampled episodes.

[2]:

import gym
import bauwerk

NUM_SAMPLES = 10

def estimate_exp_rew(num_samples):
    """Estimate the expected (undiscounted) episode return of a random policy."""
    env = gym.make("bauwerk/House-v0")
    cum_rewards = []

    for _ in range(num_samples):
        env.reset()
        cum_reward = 0.0
        # Run one full episode with uniformly sampled actions.
        while True:
            obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
            cum_reward += reward
            if terminated or truncated:
                break
        cum_rewards.append(cum_reward)

    # Average the cumulative reward over all sampled episodes.
    return sum(cum_rewards) / num_samples

overall_reward = estimate_exp_rew(num_samples=NUM_SAMPLES)
print(f"Expected reward with random policy (estimated using {NUM_SAMPLES} samples): {overall_reward}")

Expected reward with random policy (estimated using 10 samples): -8549.335485292095

Next we quickly look at how the number of samples affects the estimate.

[10]:

import matplotlib.pyplot as plt
import numpy as np

nums_samples = np.arange(1,21)
estimates = []

for num_samples in nums_samples:
    estimates.append(estimate_exp_rew(num_samples=num_samples))

[11]:

plt.plot(nums_samples, estimates)

[Figure: estimated expected return plotted against the number of samples (1 to 20)]
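One way to quantify how the number of samples affects the estimate is the standard error of the mean, which shrinks roughly with the square root of the number of episodes. The sketch below uses synthetic Gaussian returns as a stand-in for per-episode returns. The helper `mean_and_standard_error` and all numbers are assumptions for illustration, not Bauwerk results.

```python
import math
import random
import statistics


def mean_and_standard_error(returns):
    """Return the sample mean and its standard error (needs at least 2 samples)."""
    mean = statistics.fmean(returns)
    std_err = statistics.stdev(returns) / math.sqrt(len(returns))
    return mean, std_err


# Synthetic stand-in for per-episode returns; NOT real Bauwerk data.
rng = random.Random(0)
synthetic_returns = [rng.gauss(-100.0, 10.0) for _ in range(256)]

for n in (4, 16, 64, 256):
    mean, std_err = mean_and_standard_error(synthetic_returns[:n])
    print(f"n={n:3d}  mean={mean:8.2f}  standard error={std_err:6.2f}")
```

Reporting the standard error alongside the mean makes it easier to judge whether differences between two policies exceed the sampling noise.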

Instead of using

[13]:

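# Timing env simulation.
# A bare `%timeit` line with no statement times nothing; to time the call in
# a notebook, prefix it instead, e.g. `%time run_random_steps(num_steps=...)`.

def run_random_steps(num_steps):
    """Step the environment with random actions, resetting after each episode."""
    env = gym.make("bauwerk/House-v0")
    env.reset()
    for i in range(num_steps):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            env.reset()
            print(f"env resetted {i}")

# Ten simulated years of hourly steps.
run_random_steps(num_steps=24*365*10)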

env resetted 8758
env resetted 17517
env resetted 26276
env resetted 35035
env resetted 43794
env resetted 52553
env resetted 61312
env resetted 70071
env resetted 78830
env resetted 87589
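Outside a notebook, a rough throughput measurement can be taken with `time.perf_counter` from the standard library. This is a minimal sketch: `steps_per_second` and `dummy_step` are hypothetical helpers, and the dummy workload only stands in for an environment step.

```python
import time


def steps_per_second(step_fn, num_steps):
    """Call step_fn num_steps times and return the measured steps per second."""
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return num_steps / elapsed if elapsed > 0 else float("inf")


def dummy_step():
    """Stand-in workload (assumption); replace with a real environment step."""
    sum(range(100))


rate = steps_per_second(dummy_step, num_steps=10_000)
print(f"{rate:.0f} steps per second")
```

To time an actual environment, `step_fn` could wrap a call such as `env.step(env.action_space.sample())` together with a reset whenever the episode ends.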
