We originally built OpenAI Gym as a tool to accelerate our own RL research. We hope it will be just as useful for the broader community.
Getting started
If you'd like to dive in right away, you can work through our tutorial. You can also help out while learning by reproducing a result.
Why RL?
Reinforcement learning (RL) is the subfield of machine learning concerned with decision making and motor control. It studies how an agent can learn to achieve goals in a complex, uncertain environment. It's exciting for two reasons:
- RL is very general, encompassing all problems that involve making a sequence of decisions: for example, controlling a robot's motors so that it's able to run and jump, making business decisions like pricing and inventory management, or playing video games and board games. RL can even be applied to supervised learning problems with sequential or structured outputs.
- RL algorithms have started to achieve good results in many difficult environments. RL has a long history, but until recent advances in deep learning, it required lots of problem-specific engineering. DeepMind's Atari results, BRETT from Pieter Abbeel's group, and AlphaGo all used deep RL algorithms that make few assumptions about their environment, and thus can be applied in other settings.
However, RL research is also slowed down by two factors:
- The need for better benchmarks. In supervised learning, progress has been driven by large labeled datasets like ImageNet. In RL, the closest equivalent would be a large and diverse collection of environments. However, the existing open-source collections of RL environments don't have enough variety, and they are often difficult to even set up and use.
- Lack of standardization of environments used in publications. Subtle differences in the problem definition, such as the reward function or the set of actions, can drastically alter a task's difficulty. This issue makes it difficult to reproduce published research and compare results from different papers.
OpenAI Gym is an attempt to fix both problems.
The Environments
OpenAI Gym provides a diverse suite of environments that range from easy to difficult and involve many different kinds of data. We're starting out with the following collections:
- Classic control and toy text: complete small-scale tasks, mostly from the RL literature. They're here to get you started.
- Algorithmic: perform computations such as adding multi-digit numbers and reversing sequences. One might object that these tasks are easy for a computer. The challenge is to learn these algorithms purely from examples. These tasks have the nice property that it's easy to vary the difficulty by varying the sequence length.
- Atari: play classic Atari games. We've integrated the Arcade Learning Environment (which has had a big impact on reinforcement learning research) in an easy-to-install form.
- Board games: play Go on 9x9 and 19x19 boards. Two-player games are fundamentally different than the other settings we've included, because there is an adversary playing against you. In our initial release, there is a fixed opponent provided by Pachi, and we may add other opponents later (patches welcome!). We'll also likely expand OpenAI Gym to have first-class support for multi-player games.
- 2D and 3D robots: control a robot in simulation. These tasks use the MuJoCo physics engine, which was designed for fast and accurate robot simulation. Included are some environments from a recent benchmark by UC Berkeley researchers (who incidentally will be joining us this summer). MuJoCo is proprietary software, but offers free trial licenses.
Over time, we plan to greatly expand this collection of environments. Contributions from the community are more than welcome.
Each environment has a version number (such as Hopper-v0). If we need to change an environment, we'll bump the version number, defining an entirely new task. This ensures that results on a particular environment are always comparable.
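To make this concrete, here is a minimal sketch of the interface shared by all of these environments: you create an environment from its versioned ID with `gym.make`, then drive it with `reset` and `step`. `CartPole-v0` is just an example ID; any environment from the collections above can be substituted.

```python
import gym

# Create an environment by its versioned ID; the "-v0" suffix pins the task
# definition, so results on it stay comparable over time.
env = gym.make('CartPole-v0')

observation = env.reset()
for _ in range(1000):
    env.render()
    # Sample a random action from the action space; a real agent would
    # choose its action based on the observation instead.
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()
```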
Evaluations
We've made it easy to upload results to OpenAI Gym. However, we've opted not to create traditional leaderboards. What matters for research isn't your score (it's possible to overfit or hand-craft solutions to particular tasks), but instead the generality of your technique.
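As a rough sketch of how an upload works during the beta (assuming the monitor/upload interface shipped with the beta; the directory path and API key below are placeholders): the monitor records episode statistics to a local directory, and `gym.upload` submits them to your OpenAI Gym account.

```python
import gym

env = gym.make('CartPole-v0')
# Record episode statistics (and video, where supported) to a local directory.
env.monitor.start('/tmp/cartpole-experiment-1')

for episode in range(20):
    observation = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # replace with your agent's policy
        observation, reward, done, info = env.step(action)

env.monitor.close()

# Submit the recorded results for evaluation (placeholder API key).
gym.upload('/tmp/cartpole-experiment-1', api_key='YOUR_API_KEY')
```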
We're starting out by maintaining a curated list of contributions that say something interesting about algorithmic capabilities. Long-term, we want this curation to be a community effort rather than something owned by us. We'll necessarily have to figure out the details over time, and we'd love your help in doing so.
We want OpenAI Gym to be a community effort from the beginning. We've started working with partners to put together resources around OpenAI Gym:
- NVIDIA: technical Q&A with John.
- Nervana: implementation of a DQN OpenAI Gym agent.
- Amazon Web Services (AWS): $250 credit vouchers for select OpenAI Gym users. If you have an evaluation demonstrating the promise of your algorithm and are resource-constrained from scaling it up, ping us for a voucher. (While supplies last!)
During the public beta, we're looking for feedback on how to make this into an even better tool for research. If you'd like to help, you can try your hand at improving the state-of-the-art on each environment, reproducing other people's results, or even implementing your own environments. Also please join us in the community chat!
ORIGINAL: OpenAI
by Greg Brockman and John Schulman
April 27, 2016