The first world

Japanese

I implemented the first world. In this post, I define the rules of the first world, write them in the OpenAI Gym format, register the world as an OpenAI Gym environment, and use Keras-rl2 to train a neural network in this environment.

I put the code on GitHub, so here I'll focus on the specifications of the world (a rough sketch of the environment follows the list).

  • The world is a 30 x 30 2D grid space
  • There is only one living entity (agent)
  • The agent has 100 energy at the beginning
  • Food is distributed randomly (both location and amount (0-100) are random)
  • Time is discretized in steps
  • In each step, the agent can either move in one of four directions or stay.
  • The agent spends 1 energy if it moves, or 0.5 energy if it stays.
  • If the agent overlaps with food, it gains the food's energy and receives a reward of (food amount)/100.
  • If it runs out of energy, it dies and loses 3 reward points.
  • In each step, there is a 2% chance that food appears somewhere in the world. The amount is random (0-100).
  • An episode ends when the agent dies or after 2000 steps.
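To make the specification concrete, here is a minimal sketch of how these rules could look as a Gym environment. This is illustrative only: the class name, the wraparound edges, the initial food count, and the observation encoding are all my assumptions, not necessarily what's in the repository.

```python
import numpy as np
import gym
from gym import spaces


class SmallWorldEnv(gym.Env):
    """Illustrative sketch of the rules above; not the actual repo code."""

    def __init__(self):
        self.size = 30
        self.action_space = spaces.Discrete(5)  # stay + 4 directions
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=(self.size, self.size, 1), dtype=np.float32)

    def reset(self):
        self.energy = 100.0
        self.steps = 0
        self.pos = np.random.randint(0, self.size, size=2)
        self.food = np.zeros((self.size, self.size))
        for _ in range(20):  # initial food; the count here is a guess
            self._drop_food()
        return self._observe()

    def step(self, action):
        moves = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # stay, N, S, W, E
        self.pos = (self.pos + moves[action]) % self.size   # wraparound edges (assumed)
        self.energy -= 0.5 if action == 0 else 1.0

        reward = 0.0
        amount = self.food[tuple(self.pos)]
        if amount > 0:  # eat the food on the current cell
            self.energy += amount
            reward += amount / 100.0
            self.food[tuple(self.pos)] = 0.0
        if np.random.rand() < 0.02:  # 2% chance of new food per step
            self._drop_food()

        self.steps += 1
        done = self.energy <= 0 or self.steps >= 2000
        if self.energy <= 0:
            reward -= 3.0  # death penalty
        return self._observe(), reward, done, {}

    def _drop_food(self):
        x, y = np.random.randint(0, self.size, size=2)
        self.food[x, y] = np.random.uniform(0, 100)

    def _observe(self):
        obs = self.food / 100.0     # food brightness in [0, 1]
        obs[tuple(self.pos)] = 1.0  # mark the agent's cell
        return obs[..., None].astype(np.float32)
```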

When registering the environment with OpenAI Gym, installing it with "pip install -e gym-smallworld" keeps the package editable during development, which makes it easy to keep modifying the world specification.
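For reference, registering a custom environment typically looks like the snippet below. The package layout and names here are hypothetical; the real repository may use different ones.

```python
# gym-smallworld/gym_smallworld/__init__.py (hypothetical layout)
from gym.envs.registration import register

register(
    id='smallworld-v0',  # afterwards: env = gym.make('smallworld-v0')
    entry_point='gym_smallworld.envs:SmallWorldEnv',
    max_episode_steps=2000,  # matches the episode cap in the spec
)
```

With the editable install, changes to the environment code take effect without reinstalling the package.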

The world is now defined. The network training is pretty much a simplified version of the Atari game example from Keras-rl2 (a training sketch follows the list). Specifically:

  • 2 convolutional layers with 3 x 3 kernels and 16 filters (stride 1)
  • 1 dense layer with 32 neurons
  • Scaled exponential linear unit (selu) for activation function
  • Deep Q Network (DQN) with epsilon greedy Q policy with linear annealing
  • Adam optimizer with learning rate = 0.001
  • 10000 warm-up steps
  • The target model is updated every 10000 steps.
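Put together, the setup above translates roughly into the following keras-rl2 code. The listed hyperparameters are used as given; the memory size, the annealing schedule, and the observation handling are my assumptions, so treat this as a sketch rather than the actual training script.

```python
import gym
import gym_smallworld  # hypothetical package that registers smallworld-v0
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Reshape, Conv2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory

env = gym.make('smallworld-v0')
nb_actions = env.action_space.n  # 4 directions + stay = 5

# Two 3 x 3 x 16 conv layers (stride 1) and one 32-neuron dense layer, all selu.
model = Sequential([
    # keras-rl prepends a window dimension of length 1; squeeze it back out
    Reshape((30, 30, 1), input_shape=(1, 30, 30, 1)),
    Conv2D(16, (3, 3), strides=1, activation='selu'),
    Conv2D(16, (3, 3), strides=1, activation='selu'),
    Flatten(),
    Dense(32, activation='selu'),
    Dense(nb_actions, activation='linear'),  # one Q-value per action
])

# Epsilon-greedy Q policy with linear annealing (schedule values assumed).
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps',
                              value_max=1.0, value_min=0.1,
                              value_test=0.05, nb_steps=1_000_000)
memory = SequentialMemory(limit=100_000, window_length=1)

dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=10_000, target_model_update=10_000,
               policy=policy)
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=2_000_000, visualize=True)
```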

The differences from the example are the size of the network (smaller) and the use of selu (with which I have had good experience). Most other things are the same. No GPU is used; if you install the appropriate packages (gym, Keras-rl2, etc., with Python 3), it should work fine on a CPU. Once it runs, the training begins and an animation like this is displayed.

The green dot is the agent, and the red dots are food. The brighter a dot, the more food is there. If you hide the animation behind another window so it doesn't render on your display, the drawing seems to be skipped and the computation runs more efficiently. I don't know exactly how it works, but it's a nice tip.

Pretty much all the agent has to do is move toward any available food. There are no obstacles or enemies, so this is a fairly straightforward task. With this simple training scheme, the episode reward goes up nicely and saturates near 1.5-2M steps (blue is the episode reward, orange is the Q-value). It takes about half a day on a Core i7-3770K.

One issue is that the current program does not reach the same performance when it is switched to test mode. I'll investigate the reason (my guess is that the policy change between training mode and test mode is the cause).
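If the policy difference is indeed the cause, keras-rl2 lets you set the test-time policy explicitly. By default DQNAgent evaluates greedily (no exploration), so one thing worth trying is keeping a small epsilon at test time. This sketch reuses the names from the training snippet above and is a guess at a fix, not a verified one:

```python
from rl.policy import EpsGreedyQPolicy

# DQNAgent defaults to a greedy policy (eps = 0) at test time.
# A little test-time exploration may make behavior closer to training.
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=10_000, target_model_update=10_000,
               policy=policy, test_policy=EpsGreedyQPolicy(eps=0.05))
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
dqn.test(env, nb_episodes=10, visualize=True)
```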

Anyway, the first world is complete. My next task is to make an environment for multiple agents that can be trained simultaneously and independently.

Software environment to make the small world

Japanese

In the last post, I said I wanted to make a world in which I can experiment with evolution. In this post, I'd like to talk about the software environment for doing that.

I think the best language for this task is Python, because the task will eventually require doing machine learning (including deep learning) with multiple agents. Python has frameworks like OpenAI Gym that make machine learning easy, and it is the most powerful language for deep learning because of its vast collection of libraries. In terms of pure speed, it is also competitive if you use modules like Numpy or Numba. (I tested Conway's Game of Life in Matlab, Python, and Julia and compared the speeds. With optimization, Python and Julia were similar; Matlab was a bit slower (~2x) than the other two. I'll omit the details of these experiments, but let me know if you are interested. I may write a post about them.)
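As an aside, the kind of vectorization that makes Python competitive looks like this: a minimal NumPy Game of Life step with wraparound edges. This is an illustration of the approach, not the benchmark code I actually ran.

```python
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One Game of Life step on a 2D 0/1 array with wraparound edges."""
    # Sum the 8 neighbors by rolling the grid in every direction.
    neighbors = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
    # A cell lives next step if it has 3 neighbors,
    # or if it is alive now and has 2 neighbors.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(grid.dtype)

grid = np.random.randint(0, 2, size=(100, 100))
for _ in range(1000):
    grid = life_step(grid)
```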

I will use PyCharm to write the code. I was torn between VSCode and PyCharm, but PyCharm seemed easier for debugging. I also thought about Vim, but it would need too many plugins to be practical, and more plugins mean more maintenance, so I skipped it this time. These are not final decisions.

I will try making my own environment, registering it with OpenAI Gym, and then combining it with one of the existing deep learning frameworks. I'm using a somewhat old Intel Mac, so I thought about using PlaidML + Keras to use the CPU's integrated graphics as a computational resource. However, I got errors when combining it with OpenAI Gym (discussed in this thread) and couldn't fix them. (The PlaidML benchmark itself ran fine on my machine, and it was fast.) Since I don't expect to use my current computer for the long term, I decided to use just the CPU for now.

In the near future, I plan to build a reasonable computer with a GPU and run the calculations on that. I will write another post on the hardware environment, but I don't think right now is the greatest time to talk about hardware. At the least, I should wait until the details of Apple Silicon Macs and AMD's Ryzen 5000 series are known.

This is a relatively short post without any code. In the next post, I will probably start deciding the rules of the world and register them with OpenAI Gym.

Dreaming of a virtual world with evolving electric sheep

Japanese Version

For a long time, I've been dreaming of making a small world (on a computer) where I can experiment with evolution. Too ambitious? Perhaps. We already know that order and complexity can emerge from simple ingredients (e.g., cellular automata). How about intelligence? Will it emerge out of simple rules?

To be more specific, I want to make a slightly more complicated version of a cellular automaton with somewhat more realistic, life-like entities. I'd like to make an environment in which lives can think, act, learn, reproduce, and die. Variation introduced by reproduction will create diversity among the lives, and natural selection will keep the entities that fit the current environment. The process goes on, and in the end, in the ideal scenario I'm dreaming of, progressively more intelligent entities would emerge from such an environment.

I know this project will not be easy. Many people have probably already tried and failed. Can I contribute at all by doing it as a hobby? I don't know. Will I have a unique perspective on the problem? Maybe. Even if I don't achieve much, that's fine; I'll learn something valuable through the process (at the very least, more programming skills).

There are two important ingredients. The first is a genetic algorithm implemented through inheritance and natural selection: rather than using a specific training objective, I'll just let the entities live in the environment. If they thrive, they survive. The second is neural networks whose structure can change through evolution and that can be trained over the course of a single life.

The articles I write in this blog will be like a journal of this exploration, with each article focusing on a small topic. Let's see how it goes. I'm excited.