Reinforcement Learning with ROS and Gazebo

9 minute read

Reinforcement Learning with ROS and Gazebo

Content based on Erle Robotics's whitepaper: Extending the OpenAI Gym for robotics: a toolkit for reinforcement learning using ROS and Gazebo.

  • The work presented here follows the same baseline structure displayed by researchers in the OpenAI Gym, and builds a gazebo environment
  • OpenAI gym focuses on the episodic setting of RL, aiming to maximize the expectation of total reward each episode
  • This toolkit aims to integrate the gym API with robotic hardware, validating reinforcement learning algorithm in real environments
  • Real world operation is achieved combining Gazebo simulator with the Robot Operating System, a set of libraries and tools that help software developers create robot applications
  • The main problem with RL in robotics is the high cost per trial which is not only the economical cost but also the long time needed to perform learning operation
  • Another issue is that learning with a real robot in a real environment can be dangerous

Architecture

Simplified software architecture used in OpenAI Gym for robotics.

  • The architecture consists of three main software blocks: OpenAI Gym,ROS and Gazebo
  • Gazebo provides a robust physics engine, high-quality graphics, and convenient programmatic and graphical interfaces
  • The architecture described was tested with three different robots:

Getting your robot into the gym

  • Let’s go ahead and code of a simple example with this OpenAI Gym extension for robotics(that we call the robot gym)
  • We’ll take the turtlebot and use RL(Q-learning) to teach the robot how to avoid obstacles using only a simulated LIDAR

Getting everything ready for the robot gym to work will need you to set it up appropriately. Refer to these instructions and do it yourself.

If you're looking for full and complete code example, refer to circuit2_turtlebot_lidar_qlearn.py.

An OpenAI gym extension for using Gazebo known as gym-gazebo

  • This work presents an extension of the initial OpenAI gym for robotics using ROS and Gazebo. A whitepaper about this work is available at Extending the OpenAI Gym for robotics. Please use the following BibTex entry to cite our work:

Community-maintained environments

The following are some of the gazebo environments maintained by the community using gym-gazebo. If you’d like to contribute and maintain an additional environment, submit a Pull Request with the corresponding addition.

Name Middleware Description Observation Space Action Space Reward range
GazeboCircuit2TurtlebotLidar-v0GazeboCircuit2TurtlebotLidar-v0 ROS A simple circuit with straight tracks and 90 degree turns. Highly discretized LIDAR readings are used to train the Turtlebot. Scripts implementing Q-learning and Sarsa can be found in the examples folder.      
GazeboCircuitTurtlebotLidar-v0GazeboCircuitTurtlebotLidar-v0.png ROS A more complex maze with high contrast colors between the floor and the walls. Lidar is used as an input to train the robot for its navigation in the environment.     TBD
GazeboMazeErleRoverLidar-v0 ROS, APM Deprecated      
GazeboErleCopterHover-v0 ROS, APM Deprecated      

Installation

Refer to Install manual

gym-gazebo

turtlebot_setup.bash in /gym-gazebo-master/gym_gazebo/envs/installation

  • Above shell scripts are definition of gazebo model in ../assets/models
#!/bin/bash

if [ -z "$GAZEBO_MODEL_PATH" ]; then
  bash -c 'echo "export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:"`pwd`/../assets/models >> ~/.bashrc'
else
  bash -c 'sed "s,GAZEBO_MODEL_PATH=[^;]*,'GAZEBO_MODEL_PATH=`pwd`/../assets/models'," -i ~/.bashrc'
fi

  • catkin_ws must have the turtlebot_gazebo and turtlebot_simulator

#Load turtlebot variables. Temporal solution
chmod +x catkin_ws/src/turtlebot_simulator/turtlebot_gazebo/env-hooks/25.turtlebot-gazebo.sh.em
bash catkin_ws/src/turtlebot_simulator/turtlebot_gazebo/env-hooks/25.turtlebot-gazebo.sh.em
  • load gazebo world and maze files
#add turtlebot launch environment variable
if [ -z "$GYM_GAZEBO_WORLD_MAZE" ]; then
  bash -c 'echo "export GYM_GAZEBO_WORLD_MAZE="`pwd`/../assets/worlds/maze.world >> ~/.bashrc'
else
  bash -c 'sed "s,GYM_GAZEBO_WORLD_MAZE=[^;]*,'GYM_GAZEBO_WORLD_MAZE=`pwd`/../assets/worlds/maze.world'," -i ~/.bashrc'
fi
if [ -z "$GYM_GAZEBO_WORLD_CIRCUIT" ]; then
  bash -c 'echo "export GYM_GAZEBO_WORLD_CIRCUIT="`pwd`/../assets/worlds/circuit.world >> ~/.bashrc'
else
  bash -c 'sed "s,GYM_GAZEBO_WORLD_CIRCUIT=[^;]*,'GYM_GAZEBO_WORLD_CIRCUIT=`pwd`/../assets/worlds/circuit.world'," -i ~/.bashrc'
fi
if [ -z "$GYM_GAZEBO_WORLD_CIRCUIT2" ]; then
  bash -c 'echo "export GYM_GAZEBO_WORLD_CIRCUIT2="`pwd`/../assets/worlds/circuit2.world >> ~/.bashrc'
else
  bash -c 'sed "s,GYM_GAZEBO_WORLD_CIRCUIT2=[^;]*,'GYM_GAZEBO_WORLD_CIRCUIT2=`pwd`/../assets/worlds/circuit2.world'," -i ~/.bashrc'
fi
if [ -z "$GYM_GAZEBO_WORLD_CIRCUIT2C" ]; then
  bash -c 'echo "export GYM_GAZEBO_WORLD_CIRCUIT2C="`pwd`/../assets/worlds/circuit2c.world >> ~/.bashrc'
else
  bash -c 'sed "s,GYM_GAZEBO_WORLD_CIRCUIT2C=[^;]*,'GYM_GAZEBO_WORLD_CIRCUIT2C=`pwd`/../assets/worlds/circuit2c.world'," -i ~/.bashrc'
fi
if [ -z "$GYM_GAZEBO_WORLD_ROUND" ]; then
  bash -c 'echo "export GYM_GAZEBO_WORLD_ROUND="`pwd`/../assets/worlds/round.world >> ~/.bashrc'
else
  bash -c 'sed "s,GYM_GAZEBO_WORLD_ROUND=[^;]*,'GYM_GAZEBO_WORLD_ROUND=`pwd`/../assets/worlds/round.world'," -i ~/.bashrc'
fi

  • copy kobuki urdf files into the catkin_ws/src/kobuki and mesh files

#copy altered urdf model
cp -r ../assets/urdf/kobuki_urdf/urdf/ catkin_ws/src/kobuki/kobuki_description

#copy laser mesh file
cp ../assets/meshes/lidar_lite_v2_withRay.dae catkin_ws/src/kobuki/kobuki_description/meshes
 
exec bash # reload bash

setup_kinetic.bash in /gym-gazebo-master/gym_gazebo/envs/installation

  • check ROS install
#!/bin/bash

if [ -z "$ROS_DISTRO" ]; then
  echo "ROS not installed. Check the installation steps: https://github.com/erlerobot/gym#installing-the-gazebo-environment"
fi
  • check gazebo install
program="gazebo"
condition=$(which $program 2>/dev/null | grep -v "not found" | wc -l)
if [ $condition -eq 0 ] ; then
    echo "Gazebo is not installed. Check the installation steps: https://github.com/erlerobot/gym#installing-the-gazebo-environment"
fi
  • start ros kinetic shell
source /opt/ros/kinetic/setup.bash
  • create catkin_ws work-space
# Create catkin_ws
ws="catkin_ws"
if [ -d $ws ]; then
  echo "Error: catkin_ws directory already exists" 1>&2
fi
src=$ws"/src"
mkdir -p $src
cd $src
catkin_init_workspace
  • ??? ROS kinetic dependencies none ?? in bash
sudo pip3 install rospkg catkin_pkg
sudo apt-get install python3-pyqt4

sudo apt-get install \
            cmake gcc g++ qt4-qmake libqt4-dev \
            libusb-dev libftdi-dev \
            python3-defusedxml python3-vcstool \
            ros-kinetic-octomap-msgs        \
            ros-kinetic-joy                 \
            ros-kinetic-geodesy             \
            ros-kinetic-octomap-ros         \
            ros-kinetic-control-toolbox     \
            ros-kinetic-pluginlib          \
            ros-kinetic-trajectory-msgs     \
            ros-kinetic-control-msgs           \
            ros-kinetic-std-srvs           \
            ros-kinetic-nodelet        \
            ros-kinetic-urdf               \
            ros-kinetic-rviz               \
            ros-kinetic-kdl-conversions     \
            ros-kinetic-eigen-conversions   \
            ros-kinetic-tf2-sensor-msgs     \
            ros-kinetic-pcl-ros \
            ros-kinetic-navigation
  • ?? Install Sophus ?? None in bash
#Install Sophus
cd
git clone https://github.com/stonier/sophus -b release/0.9.1-kinetic
cd sophus
mkdir build
cd build
cmake ..
make
sudo make install
echo "## Sophus installed ##\n"
  • ?? Install APM ?? None in bash
    • check ros kinectic version for APM

#Install APM/Ardupilot
cd ../../
mkdir apm
cd apm
git clone https://github.com/erlerobot/ardupilot.git -b gazebo_udp
git clone https://github.com/tridge/jsbsim.git
cd jsbsim
./autogen.sh --enable-libraries
make -j2
sudo make install
echo "## AMP/Ardupilot installed ##"
  • ?? Install theano and keras for python3
sudo pip3 install h5py
sudo apt-get install python3-skimage

# install Theano
cd ~/
git clone git://github.com/Theano/Theano.git
cd Theano/
sudo python3 setup.py develop

#install Keras
sudo pip3 install keras
  • import and build dependencies and catkin_make -> run modules bash
# Import and build dependencies
cd ../../catkin_ws/src/
vcs import < ../../gazebo.repos

cd ..
catkin_make --pkg mav_msgs
source devel/setup.bash
catkin_make -j 1
bash -c 'echo source `pwd`/devel/setup.bash >> ~/.bashrc'
echo "## ROS workspace compiled ##"

# add own models path to gazebo models path
if [ -z "$GAZEBO_MODEL_PATH" ]; then
  bash -c 'echo "export GAZEBO_MODEL_PATH="`pwd`/../../assets/models >> ~/.bashrc'
  exec bash #reload bashrc
fi

circuit2_turtlebot_lidar_qlearn.py

  • GazeboCircuit2TurtlebotLidar-v0 environment

if __name__ == '__main__':

    env = gym.make('GazeboCircuit2TurtlebotLidar-v0')

OpenAI Tutorial:create environment

from gym.envs.registration import register
...
register(
    id='GazeboCircuit2TurtlebotLidar-v0',
    entry_point='gym_gazebo.envs.turtlebot:GazeboCircuit2TurtlebotLidarEnv',
    # More arguments here
)
...
  • gazebo_env.py has the definition of a basic gym.Env and gazebo_circuit2_turtlebot_liday.py in turtlebot folder has an inheritance of gazebo_env.GazeboEnv class

  • gazebo_env.py
import gym
import rospy
#import roslaunch
...
from std_srvs.srv import Empty
import random

class GazeboEnv(gym.Env):
    ...
  • gazebo_circuit2_turtlebot_liday.py
import gym
import rospy
import roslaunch

from gym import utils, spaces
from gym_gazebo.envs import gazebo_env
from geometry_msgs.msg import Twist
from std_srvs.srv import Empty

from sensor_msgs.msg import LaserScan
from gym.utils import seeding

class GazeboCircuit2TurtlebotLidarEnv(gazebo_env.GazeboEnv):
    ...

  • Q-Learning
...
"create Qlearn class with action_space.n"
qlearn = qlearn.QLearn(actions=range(env.action_space.n),
                    alpha=0.2, gamma=0.8, epsilon=0.9)

initial_epsilon = qlearn.epsilon

epsilon_discount = 0.9986
...
for x in range(total_episodes):
    ...
    observation = env.reset()
    state = ''.join(map(str, observation))

    "decaying eplison value"
    if qlearn.epsilon > 0.05:
        qlearn.epsilon *= epsilon_discount

    for i in range(1500):
        "send action to Qlearn class\
         send selected action to env.step\
         and get observation,reward"
        # Pick an action based on the current state
        action = qlearn.chooseAction(state)    
        # Execute the action and get feedback
        observation, reward, done, info = env.step(action)
        cumulated_reward += reward

        nextState = ''.join(map(str, observation))
        "training and learning Q-learning formula"
        "\
        Q-learning:\
            Q(s, a) += alpha * (reward(s,a) + max(Q(s') - Q(s,a))\
        "
        qlearn.learn(state, action, reward, nextState)

gazebo_env.py

Writing the Publisher Node in ROS in Python

subprocess.Popen("swfdump /tmp/filename.swf -d")
subprocess.Popen("swfdump %s -d" % (filename))  # NOTE: filename is a variable

from subprocess import Popen, PIPE
process = Popen(['swfdump', '/tmp/filename.swf', '-d'], stdout=PIPE, stderr=PIPE)
stdout, stderr = process.communicate()

import gym
import rospy
import subprocess

class GazeboEnv(gym.Env):
    """Superclass for all Gazebo environments.
    """
    metadata = {'render.modes': ['human']}

    def __init__(self, launchfile):
        "set ROS port and gazebo simulator port"
        random_number = random.randint(10000, 15000)
        self.port = "11311"#str(random_number) #os.environ["ROS_PORT_SIM"]
        self.port_gazebo = str(random_number+1) #os.environ["ROS_PORT_SIM"]
        # os.environ["ROS_MASTER_URI"] = "http://localhost:"+self.port
        # os.environ["GAZEBO_MASTER_URI"] = "http://localhost:"+self.port_gazebo
        #
        # self.ros_master_uri = os.environ["ROS_MASTER_URI"];

        "start roscore:start master" 
        subprocess.Popen(["roscore", "-p", self.port])
        time.sleep(1)
        print ("Roscore launched!")

        # Launch the simulation with the given launchfile name
        "\
        The next line, rospy.init_node(NAME, ...), is very important as it tells\
        rospy the name of your node -- until rospy has this information, it \
        cannot start communicating with the ROS Master. In this case, your node\
        will take on the name talker. NOTE: the name must be a base name, i.e. it\
        cannot contain any slashes "/".\
        "
        rospy.init_node('gym', anonymous=True)

        if launchfile.startswith("/"):
            fullpath = launchfile
        else:
            fullpath = os.path.join(os.path.dirname(__file__), "assets","launch", launchfile)
        if not path.exists(fullpath):
            raise IOError("File "+fullpath+" does not exist")

        "example-> roslaunch oroca_ros_tutorials union.launch"
        "roslaunch package_name launch_file_name"
        subprocess.Popen(["roslaunch","-p", self.port, fullpath])
        print ("Gazebo launched!")

        self.gzclient_pid = 0

    def _close(self):

        # Kill gzclient, gzserver and roscore
        tmp = os.popen("ps -Af").read()
        gzclient_count = tmp.count('gzclient')
        gzserver_count = tmp.count('gzserver')
        roscore_count = tmp.count('roscore')
        rosmaster_count = tmp.count('rosmaster')

        if gzclient_count > 0:
            os.system("killall -9 gzclient")
        if gzserver_count > 0:
            os.system("killall -9 gzserver")
        if rosmaster_count > 0:
            os.system("killall -9 rosmaster")
        if roscore_count > 0:
            os.system("killall -9 roscore")

        if (gzclient_count or gzserver_count or roscore_count or rosmaster_count >0):
            os.wait()
    

gazebo_circuit2_turtlebot_liday.py

import gym
import rospy
import roslaunch
import time
import numpy as np

from gym import utils, spaces
from gym_gazebo.envs import gazebo_env
from geometry_msgs.msg import Twist
from std_srvs.srv import Empty

from sensor_msgs.msg import LaserScan
from gym.utils import seeding

class GazeboCircuit2TurtlebotLidarEnv(gazebo_env.GazeboEnv):

    def __init__(self):
        " start ROS nodes which includs mobile_base,gazebo"
        # Launch the simulation with the given launchfile name
        gazebo_env.GazeboEnv.__init__(self, "GazeboCircuit2TurtlebotLidar_v0.launch")
        self.vel_pub = rospy.Publisher('/mobile_base/commands/velocity', Twist, queue_size=5)
        self.unpause = rospy.ServiceProxy('/gazebo/unpause_physics', Empty)
        self.pause = rospy.ServiceProxy('/gazebo/pause_physics', Empty)
        self.reset_proxy = rospy.ServiceProxy('/gazebo/reset_simulation', Empty)

        "define action space size and reward size"
        self.action_space = spaces.Discrete(3) #F,L,R
        self.reward_range = (-np.inf, np.inf)

        self._seed()

    "override _step function"    
    def _step(self, action):

        "unpause gazebo"
        rospy.wait_for_service('/gazebo/unpause_physics')
        try:
            self.unpause()
        except (rospy.ServiceException) as e:
            print ("/gazebo/unpause_physics service call failed")

        "ROS node message \
         fill values in geometry_msgs.msg.Twist\
         and send(publish) message using /mobile_base/commands/velocity"    
        if action == 0: #FORWARD
            vel_cmd = Twist()
            vel_cmd.linear.x = 0.3
            vel_cmd.angular.z = 0.0
            self.vel_pub.publish(vel_cmd)
        elif action == 1: #LEFT
            vel_cmd = Twist()
            vel_cmd.linear.x = 0.05
            vel_cmd.angular.z = 0.3
            self.vel_pub.publish(vel_cmd)
        elif action == 2: #RIGHT
            vel_cmd = Twist()
            vel_cmd.linear.x = 0.05
            vel_cmd.angular.z = -0.3
            self.vel_pub.publish(vel_cmd)

        "gazebo sends scan data"    
        data = None
        while data is None:
            try:
                data = rospy.wait_for_message('/scan', LaserScan, timeout=5)
            except:
                pass

        "gazebo paueses after receiving scan data"
        rospy.wait_for_service('/gazebo/pause_physics')
        try:
            #resp_pause = pause.call()
            self.pause()
        except (rospy.ServiceException) as e:
            print ("/gazebo/pause_physics service call failed")

        "calculate next state and done status"    
        state,done = self.discretize_observation(data,5)

        "fill reward"
        if not done:
            if action == 0:
                reward = 5
            else:
                reward = 1
        else:
            reward = -200

        return state, reward, done, {}
    
    def discretize_observation(self,data,new_ranges):
        discretized_ranges = []
        min_range = 0.2
        done = False
        mod = len(data.ranges)/new_ranges
        for i, item in enumerate(data.ranges):
            if (i%mod==0):
                if data.ranges[i] == float ('Inf') or np.isinf(data.ranges[i]):
                    discretized_ranges.append(6)
                elif np.isnan(data.ranges[i]):
                    discretized_ranges.append(0)
                else:
                    discretized_ranges.append(int(data.ranges[i]))
            if (min_range > data.ranges[i] > 0):
                done = True
        return discretized_ranges,done

Reference sites

Reinforcement Learning with ROS and Gazebo

A toolkit for developing and comparing reinforcement learning algorithms using ROS and Gazebo.

A toolkit for developing and comparing reinforcement learning algorithms

Deep RL and Controls OpenAI Gym Recitation