Mini World of Bits benchmark

Mini World of Bits ("MiniWoB") is a benchmark for reinforcement learning agents that interact with websites. The agents perceive the raw pixels of a small (210x160 pixel) webpage and produce keyboard and mouse actions. The environments are written in HTML/Javascript/CSS and are designed to test the agent's capacity to interact with common web browser elements, such as buttons, text fields, sliders, date pickers, etc. The environments of this benchmark are accessible through OpenAI Universe.

MiniWoB environments

Each environment is an HTML page that is 210 pixels high and 160 pixels wide (i.e. identical to the ATARI ALE simulator dimensions). The top 50 pixels (with a yellow background) contain the task query - a description of what the agent should do in the environment. The environment logic is written in Javascript, which monitors for events and assigns a reward between -1.0 (failure) and 1.0 (success). We think of MiniWoB as an equivalent of the MNIST dataset for visual recognition, in that these environments are small, self-contained, and contain many of the challenges that an agent navigating websites on the internet must overcome.
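
To make the layout concrete, here is a minimal sketch (the helper name split_miniwob_screen is our own, not part of the benchmark API) that separates the query strip from the 160x160 game area, assuming the 210x160 MiniWoB screen has already been cropped out of the full observation as in the starter code below:

import numpy as np

def split_miniwob_screen(screen):
  """
  Splits a (210,160,3) uint8 MiniWoB screen into its two parts:
  the top 50 rows holding the yellow task query, and the 160x160
  game area underneath containing the actual UI elements.
  """
  assert screen.shape == (210, 160, 3)
  query = screen[:50, :, :]   # task query text
  game  = screen[50:, :, :]   # 160x160 interactive region
  return query, game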

The tasks in the benchmark contain many common UI elements and range from simple (e.g. "click the Cancel button") to complex (e.g. "search for a flight from SFO to LAX on 12/05/2016 and book the cheapest flight").

Preview the environments here.

Benchmark

The MiniWoB benchmark contains a set of environments with a train/test split. The ultimate objective is to perform well on the test environments given a limited number of interaction steps. One may use an unrestricted amount of pretraining on the train environments. We also plan to release demonstrations for the training environments, as many of them could be very difficult to perform well on with RL alone.

12/05/2016: Version 0, 80 environments (train/test split coming soon)

Contribute environments. Since the environments are very small and easy to write in Javascript/HTML/CSS, we encourage the community to contribute environments to future release versions of the benchmark. The complete MiniWoB source code will be released as a GitHub repository in the coming weeks, and contributions will be possible as pull requests.

Starter Code

These environments are integrated into OpenAI Universe.
To train RL agents, we adapt the Universe instructions to run the MiniWoB environments. The following minimal starter code creates an agent that clicks at random in the 160x160px "game" area of MiniWoB at 5 FPS:

import gym
import universe # register the universe environments
import numpy as np

def forward(ob):
  """ 
  Takes the raw (768,1024,3) uint8 screen and returns a list of VNC events.
  The browser window offsets the origin of MiniWoB by 75 pixels from the top
  and 10 pixels from the left. The first 50 pixels along the height are the query.
  """
  if ob is None: return []

  x = ob['vision']
  crop = x[75:75+210, 10:10+160, :]               # (210,160,3) miniwob screen; unused by this random agent
  xcoord = np.random.randint(0, 160) + 10         # random x inside the game area; todo: something more clever here
  ycoord = np.random.randint(0, 160) + 75 + 50    # random y, skipping the 50px query; todo: something more clever here

  # 1. move to (xcoord, ycoord) with the left button released, then 2. press and 3. release it (a click)
  action = [universe.spaces.PointerEvent(xcoord, ycoord, 0),
            universe.spaces.PointerEvent(xcoord, ycoord, 1),
            universe.spaces.PointerEvent(xcoord, ycoord, 0)]

  return action

env = gym.make('wob.mini.ClickButton-v0')
# automatically creates a local docker container
env.configure(remotes=1, fps=5,
              vnc_driver='go', 
              vnc_kwargs={'encoding': 'tight', 'compress_level': 0, 
                          'fine_quality_level': 100, 'subsample_level': 0})
observation_n = env.reset()

while True:
  action_n = [forward(ob) for ob in observation_n] # your agent here
  observation_n, reward_n, done_n, info = env.step(action_n)
  env.render()

Human Demonstrations

We plan to release a collection of human demonstrations. The recordings are a sequence of (768,1024,3) uint8 numpy arrays (which can be cropped to (210,160,3)), and the actions comprise keyboard and mouse events.

[Visualization of one recording]
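
The exact storage format may change before release; as a rough sketch (the function name crop_demo_frame is our own, and the pixel offsets are simply reused from the starter code above), each recorded frame can be cropped down to MiniWoB coordinates like this:

import numpy as np

def crop_demo_frame(frame):
  """
  Crops a recorded (768,1024,3) uint8 VNC frame down to the (210,160,3)
  MiniWoB screen, using the same offsets as the starter code above:
  75 pixels from the top and 10 pixels from the left.
  """
  assert frame.shape == (768, 1024, 3)
  return frame[75:75+210, 10:10+160, :]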

Download human demonstrations: coming soon

News

December 12, 2016: initial release of the benchmark page.

Contact

- The dataset was developed by Andrej Karpathy at OpenAI. Email: karpathy _at_ openai.com
- Jonathan Hernandez contributed a lot of the environments.