OpenAI Gym tutorial

3 minute read

Deep RL and Controls OpenAI Gym Recitation


VirtualEnv Installation

  • It is recommended that you install the gym and any dependencies in a virtualenv
  • The following steps will create a virtualenv with the gym installed:

virtualenv openai-gym-demo
source openai-gym-demo/bin/activate
pip install -U gym[all]
python -c 'import gym; gym.make("FrozenLake-v0")'

Basic RL Step

Basic Agent Loop

import gym

env = gym.make("Taxi-v2")
observation = env.reset()
for _ in range(1000):
    env.render()
    # your agent here (this takes random actions)
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    if done:
        env.render()
        break

Create Instance

  • Each gym environment has a unique name of the form ([A-Za-z0-9]+-)v([0-9]+)
  • To create an environment from its name, use env = gym.make(env_name)
  • For example, to create a Taxi environment: env = gym.make('Taxi-v2')

Reset Function

  • Resets the environment and returns the initial state

init_state = env.reset()

Step Function

step(action) -> (next_state, reward, is_terminal, debug_info)
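For instance, a single transition in the Taxi environment from the agent loop above; the tuple unpacks exactly as in the signature:

import gym

env = gym.make("Taxi-v2")
state = env.reset()
# one environment transition with a random action
next_state, reward, is_terminal, debug_info = env.step(env.action_space.sample())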

Render Function

  • Optional method
  • Used to display the state of your environment
  • Useful for debugging and qualitatively comparing different agent policies

Datatypes

  • Reward: float
  • Terminal: bool
  • Action: Depends on environment
  • State: Depends on environment

Environment Space Attributes

  • Most environments have two special attributes: action_space and observation_space
  • These contain instances of gym.spaces classes
  • They make it easy to find out what the valid states and actions are
  • There is a convenient sample method to generate uniform random samples in the space.
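For instance, on the Taxi environment (sampled values will vary):

import gym

env = gym.make('Taxi-v2')
print(env.action_space)           # Discrete(6)
print(env.observation_space)      # Discrete(500)
print(env.action_space.sample())  # a random valid action, e.g. 3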

gym.spaces

  • Action spaces and State spaces are defined by instances of classes of the gym.spaces module
  • Included types are:
      gym.spaces.Discrete
      gym.spaces.MultiDiscrete
      gym.spaces.Box
      gym.spaces.Tuple
  • All instances have a sample method which will sample random instances within the space

gym.spaces.Discrete

  • The homework environments will use this type of space
  • Specifies a space containing n discrete points
  • Each point is mapped to an integer from [0, n−1]
  • Discrete(10): a space containing 10 items mapped to integers in [0, 9]; sample will return integers such as 0, 3, and 9
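A short sketch of this space in use:

from gym import spaces

space = spaces.Discrete(10)  # integers 0..9
x = space.sample()           # uniform random draw, e.g. 7
assert space.contains(x)
print(space.n)               # 10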

gym.spaces.MultiDiscrete

  • You will use this to implement an environment in the homework
  • Specifies a space containing k dimensions, each with a separate number of discrete points
  • Each point in the space is represented by a vector of integers of length k
  • MultiDiscrete([(1, 3), (0, 5)]): a space with k = 2 dimensions; the first dimension has 4 points mapped to integers in [1, 3], the second has 6 points mapped to integers in [0, 5]; sample will return vectors such as [2, 5] and [1, 3]
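A minimal sketch, assuming the older gym API used above, where each dimension is given as a (min, max) pair; newer gym releases instead take the number of points per dimension (e.g. MultiDiscrete([4, 6])):

from gym import spaces

# one (min, max) pair per dimension (older gym API, as above)
space = spaces.MultiDiscrete([(1, 3), (0, 5)])
print(space.sample())  # e.g. [2 5]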

gym.spaces.Box

  • Used for multidimensional continuous spaces with bounds
  • You will see environments with these types of state and action spaces in future homeworks
  • Box(np.array((-1.0, -2.0)), np.array((1.0, 2.0))): a 2D continuous state space; the first dimension has values in range [−1.0, 1.0), the second in range [−2.0, 2.0); sample will return vectors such as [−.55, 2.] and [.768, −1.55]
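A sketch mirroring the constructor above (newer gym versions may also expect an explicit dtype argument):

import numpy as np
from gym import spaces

# per-dimension bounds: low = [-1, -2], high = [1, 2]
space = spaces.Box(np.array([-1.0, -2.0]), np.array([1.0, 2.0]))
print(space.low, space.high)  # [-1. -2.] [ 1.  2.]
print(space.sample())         # e.g. [ 0.768 -1.55 ]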

Creating an Environment

gym.Env Class

  • All environments should inherit from gym.Env
  • At a minimum you must override a handful of methods:
    • _step
    • _reset
  • At a minimum you must provide the following attributes: action_space and observation_space

Subclass Methods

  • _step has the same API as the step function used in the example
  • _reset has the same API as the reset function in the example
  • You may also provide the following methods for additional functionality:

    _render
    _close
    _configure
    _seed

Attributes

  • observation_space represents the state space
  • action_space represents the action space
  • Both are instances of gym.spaces classes
  • You can also provide a reward_range, but this defaults to (−∞, ∞)
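Putting the pieces together, here is a minimal sketch of a custom environment; the class name and dynamics are hypothetical, and the underscore-prefixed _step/_reset methods follow the gym API described above:

import gym
from gym import spaces

class CorridorEnv(gym.Env):  # hypothetical example environment
    """Walk along a corridor of n cells; reward 1 on reaching the end."""

    def __init__(self, n=5):
        self.n = n
        self.action_space = spaces.Discrete(2)       # 0 = left, 1 = right
        self.observation_space = spaces.Discrete(n)  # current cell index
        self.reward_range = (0.0, 1.0)
        self.state = 0

    def _reset(self):
        self.state = 0
        return self.state

    def _step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n - 1)
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}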

Registration

  • How do you get your environment to work with gym.make()? You must register it.

Registration Example

from gym.envs.registration import register

register(
    id='Deterministic-4x4-FrozenLake-v0',
    entry_point='gym.envs.toy_text.frozen_lake:FrozenLakeEnv',
    kwargs={'map_name': '4x4',
            'is_slippery': False})

  • id: the environment name used with gym.make
  • entry_point: module path and class name of environment
  • kwargs: dictionary of keyword arguments to environment constructor
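Once registered, the environment is created by name just like a built-in one:

import gym

env = gym.make('Deterministic-4x4-FrozenLake-v0')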

Discrete Environment Class

  • A subclass of gym.Env which provides the following attributes:
    • nS: number of states
    • nA: number of actions
    • P: model of environment
    • isd: initial state distribution

Model

  • P is a dictionary of dictionaries of lists:
    P[s][a] == [(prob, next_state, reward, terminal), …]
  • isd is a list or array of length nS:
    isd == [0., 0., 1., 0.]

Example

FrozenLake-v0 Example
demos/frozen_lake_demo.py

Monitoring and Scoring

OpenAI Gym Scoreboard

  • The gym also includes an online scoreboard
  • Gym provides an API to automatically record:
    • learning curves of cumulative reward vs. episode number
    • videos of the agent executing its policy
  • You can see other people’s solutions and compete for the best results

Monitor Wrapper

import gym
from gym import wrappers
env = gym.make('CartPole-v0')
env = wrappers.Monitor(env, '/tmp/cartpole-experiment-1')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()
gym.upload('/tmp/cartpole-experiment-1', api_key='blah')

Scoreboard Demo

demos/monitor_demo.py