A computational neuroscientist's Python environment

Quite a while ago now, I switched the language I use for work from Matlab to Python. It took some effort to get set up to the point where I could work comfortably, so I want to record here what I did.

Choice of machine: an Intel Mac

The reason I use a Mac is that the Control and Command keys are separate. There are plenty of other factors involved in choosing a platform, but personally this is the one that matters most. The advantage of having these two keys separated is a consistent feel across many apps, including VS Code, the editor described below. In particular, if you want Vim-like behavior in your apps, many commands end up conflicting when Control and Command are not separate. (Example: Ctrl-V is page-down in Vim, but on platforms other than the Mac it collides with paste.)

Why Intel? Simply because, at the time this article was written, support for Apple Silicon was lagging in a lot of software; these days Apple Silicon should be perfectly fine. For the advantages of Apple Silicon, see my earlier article.

Personally I'm not a fan of Apple's keyboards, so I use something else. I used a Topre RealForce for a while, then moved through a Ferris Sweep (see this article for the key layout) and now seem to be settling on a Corne. If you use a RealForce with a Mac, it may be worth rebinding a few keys with Karabiner-Elements.

Editor: Visual Studio Code (VS Code)

VS Code is a (nominally) open-source editor from Microsoft. It is highly versatile, and with extensions it supports a wide range of languages, including Python.

Its remote capability deserves special mention. This feature installs VS Code on a remote machine and runs it there as a server. Your local VS Code talks to that server to read and write files, so you get a local feel (no input lag) while actually working on the remote machine. Below are the details of my setup.

Extensions I use

Python plugin: Basic Python support for VS Code. Practically essential.

Pylance: Provides more advanced code completion (it also checks functions and other symbols by looking at other files) and richer highlighting. Nice to have.

Jupyter: You will probably use notebooks when sharing work with others or building demos, so this is worth having. Personally I'm not fond of notebooks because the inputs and outputs get tangled up, and I think Matlab's cells work better; for people like me there is also an interactive mode.

VS Code Neovim (for Vim lovers): Recommended if Vim is your main editor. There is also a Vim extension, but it does not fully reproduce Vim's behavior, only part of it. The Neovim extension, on the other hand, runs the Neovim installed on your system directly inside VS Code, so all of Neovim's features are available.

Remote development extension pack (if you want to work on remote machines): At the moment this is the single biggest reason I use VS Code. When you connect to a remote machine over SSH, it automatically installs the server software there and lets you drive it from your local VS Code. It is handy for reaching a machine in another location, and also for doing things a Mac can't do while keeping the Mac interface (such as using an Nvidia GPU sitting in a Linux box). Either way, it feels as if you are editing the files locally.

Other editor settings

Formatter: Black
Keeping code nicely formatted is great for maintainability, but thinking about formatting while you code is a chore, and fixing it after the fact is a chore too. Black, however, formats your code automatically. It's wonderful. For the many benefits of using Black, please see this blog post. If you check "Editor: Format On Save" in VS Code, the code is reformatted automatically every time you save.

Color theme: Monokai
With Pylance you get a special kind of highlighting called "semantic highlighting". This goes a step beyond ordinary syntax highlighting: it reads the code, distinguishes variables, classes, methods, functions, and so on, and gives each its own color. Unfortunately, not every color theme supports this, so with some themes, elements that could be distinguished are not. Monokai defines a distinct color for each element, so I recommend it.

Misc settings: see the attachment
I changed various other things as well, so I attached my settings file. If something seems off, take a look at it.

Debugging: set "justMyCode" to "False"
Debugging is configured in launch.json, and it is probably a good idea to set "justMyCode" in it to "False". The default is "True", and if you leave it that way, the debugger will not stop when an error occurs outside the code you wrote yourself. Errors frequently happen outside your own code, for example when you pass the wrong arguments to a library function, so I think "False" is the better setting.

A shortcut to run a single line: "Ctrl + ."
When trying something out, you often want to run just one line without creating a cell. Set up a shortcut for that: binding "Jupyter: Run Selection/Line in Interactive Window" to a convenient key (I use "Ctrl + .") speeds up the work considerably.

Profiling: cProfile + pyprof2calltree (+ QCacheGrind)
I'm not yet confident I've found the best solution here. There is a detailed explanation here. Personally, I find Matlab's built-in profiler easier to use. If you know of a better tool, I'd be grateful for a comment.

Summary

This short article skimmed a broad range of topics. Unlike PyCharm, which works well out of the box, VS Code gives the impression of needing some setup before it becomes comfortable. But set it up patiently and its rich ecosystem of extensions makes it a one-of-a-kind tool. I hope this article helps you get there.

When is GPU better than CPU for Deep Learning on Apple Silicon?

Apple Silicon has brought significant improvements in performance and energy efficiency, making it a viable choice for deep learning tasks. However, deciding between the GPU and the CPU for a given task can still be confusing. In this post, we'll explore the conditions under which GPUs outperform CPUs on Apple Silicon when using deep learning libraries like TensorFlow and PyTorch.

To determine when to use a GPU over a CPU on Apple Silicon for deep learning tasks, I conducted several experiments using TensorFlow and PyTorch libraries. I used a base-model MacBook Air (M1, 8GB RAM, 7-core GPU) with a thermal pad mod to eliminate any potential throttling.

What does the running time depend on?

When comparing the performance of GPUs and CPUs for deep learning tasks, several factors come into play. Generally speaking, GPUs perform computations faster than CPUs when the computation is highly parallel and requires relatively little memory access. When working with a simple feed-forward network, there are several factors to consider:

  • Number of layers
  • Number of hidden layer neurons
  • Batch size
  • Backend (TensorFlow or PyTorch)

While exploring different combinations of these numbers, I found that the computation time is mostly proportional to the number of layers, and that the number of layers does not change the speed ratio between CPU and GPU. So I fixed it at 4 layers for this exercise.

Now, let’s focus on the computations per layer. After tinkering with these numbers, I realized that the computation time mostly depends on the total amount of computation per layer. The total amount is determined by the number of hidden layer neurons and the batch size. More precisely, it can be calculated using this equation:

tflops = (2 * n_hidden * n_hidden + n_hidden) * batch_size / 1e12

This equation assumes a dense connection without batch normalization and uses the ReLU activation function, which has a small overhead. To test the performance in different scenarios, I varied the batch size from 16 to 1024 and the number of hidden neurons from 128 to 2048.
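
To make the formula concrete, here is a small Python sketch of it applied to a grid of configurations (the helper name and the intermediate grid values are illustrative; the post only fixes the 16-1024 batch and 128-2048 hidden ranges):

    def tflops_per_layer(n_hidden: int, batch_size: int) -> float:
        """Floating-point operations (in units of 1e12) for one dense layer.

        2 * n_hidden**2 counts the multiply-adds of the matrix product,
        and the extra n_hidden counts the bias addition, per sample.
        """
        return (2 * n_hidden * n_hidden + n_hidden) * batch_size / 1e12


    # Sweep a grid similar to the one used in this post
    for n_hidden in (128, 256, 512, 1024, 2048):
        for batch_size in (16, 64, 256, 1024):
            print(f"n_hidden={n_hidden:4d}  batch={batch_size:4d}  "
                  f"tflops_per_layer={tflops_per_layer(n_hidden, batch_size):.2e}")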

TensorFlow results

TensorFlow result

The results from my tests using TensorFlow showed that performance predictably improved with larger per-layer computations. While batch size isn’t explicit in the figure, it can be inferred from the values of n_hidden and tflops_per_layer. As long as tflops_per_layer is the same, different combinations of batch size and n_hidden perform similarly (except for cases with large networks, i.e., n_hidden 1024 or 2048, where memory allocation may have impacted the results). However, in all of the test cases, the performance of GPUs never exceeded that of CPUs.

PyTorch results

PyTorch Results

The pattern of the results from PyTorch is more irregular than that of TensorFlow, especially for the CPU. Generally speaking, PyTorch was faster than TensorFlow, except in some rare cases (n_hidden of 1024 or 2048 with a relatively small batch size). Notably, the CPU at small per-layer computations and the GPU at large per-layer computations both performed impressively. It was great to see the CPU sustain good performance (~700 Gflops) across a broad range of configurations, while the GPU exceeded that and reached ~1.3 Tflops. The theoretical peak of the GPU is ~2 Tflops, so it is getting close to that.

Conclusion

In this simple exercise, I demonstrated the following:

  • The matrix calculation performance improves as a function of the amount of computation per layer.
  • In my setup, PyTorch is faster than TensorFlow in general.
  • In PyTorch, GPU speed exceeds CPU speed at around 100 Mflops per layer.

Of course, these results will depend on multiple factors such as software versions, CPU performance, or the number of GPU cores, so I suggest running a test on your own. On the base M1 chip, the domain where the GPU outperforms the CPU was rather limited, but this might be very different on chips with larger GPU core counts.
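
If you want to run such a test yourself, a minimal PyTorch sketch along the following lines should do. The layer sizes, step counts, and synchronization trick are illustrative choices rather than the benchmark code used for the figures, and the "mps" backend requires a reasonably recent PyTorch build:

    import time
    import torch


    def bench(device, n_hidden=1024, batch_size=256, steps=200):
        """Time forward passes of a 4-layer MLP and return an estimated TFLOPS."""
        layers = []
        for _ in range(4):
            layers += [torch.nn.Linear(n_hidden, n_hidden), torch.nn.ReLU()]
        model = torch.nn.Sequential(*layers).to(device)
        x = torch.randn(batch_size, n_hidden, device=device)

        with torch.no_grad():
            for _ in range(10):  # warm-up
                model(x)
            model(x).sum().item()  # wait for queued work before starting the clock
            start = time.time()
            for _ in range(steps):
                y = model(x)
            y.sum().item()  # forces queued (possibly asynchronous) GPU work to finish
            elapsed = time.time() - start

        flop = 4 * (2 * n_hidden * n_hidden + n_hidden) * batch_size * steps
        return flop / elapsed / 1e12


    print("CPU:", bench("cpu"))
    if torch.backends.mps.is_available():
        print("GPU (MPS):", bench("mps"))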

Software versions

TensorFlow environment:
Python: 3.9.5
TensorFlow: 2.6.0

PyTorch environment:
Python: 3.8.13
PyTorch: 1.13.0.dev20220709

Simple 34-key layout for happy python coding

My 34-key keyboard, Ferris Sweep

A keyboard is a vital input device for computers, impacting work efficiency and even quality of life. While we often talk about ergonomics in terms of posture and the physical design of keyboards, the layout of the keys is also important. Unfortunately, most traditional keyboards come with a standard layout that hasn’t changed much in decades, leaving little room for customization. But things are starting to change.

Customizable keyboards like the Ergodox or Moonlander by ZSA Technology Labs are becoming more popular, and many people are even designing their own custom keyboards from scratch. With this increased flexibility, there’s a real opportunity to create a keyboard layout that works for you. That’s where this article comes in.

I’ve designed a simple layout for a 34-key keyboard that’s perfect for Python-writing computational neuroscientists who use NeoVim in VSCode as their main editor on Macs. Of course, this layout might not work for everyone, but it should give you some good ideas for customizing your own keyboard. I’d also like to acknowledge that this layout stands on the shoulders of giants (introduced later!) – I’ve borrowed some of the best ideas from existing layouts and added a few tweaks of my own. So if you’re tired of the same old keyboard layout and want to improve your typing experience, read on!

Why 34-key split keyboard?

Why use a 34-key split keyboard? Well, for starters, there are two main advantages: ergonomics and portability. Let’s dive into each of these.

When it comes to ergonomics, traditional keyboards have too many keys, and many of them sit in hard-to-reach positions. This can make typing uncomfortable and even lead to injury over time. By using more layers, however, you can bring all of those keys back to comfortable positions and eliminate the need for most of the keys on a traditional keyboard. The custom keyboard community has moved from minor variations on the traditional design to more radically reduced layouts. If you find it uncomfortable to reach more than one key’s distance on a keyboard, or you use your pinky a lot, a 40% keyboard layout may be right for you. These layouts are often column-staggered to match the lengths of your fingers, allowing you to type more naturally and with less strain.

In addition to improved ergonomics, small split keyboards are highly portable. With fewer keys, they can be made much smaller and lighter than traditional keyboards. For example, my 34-key split keyboard fits within 11 × 9 × 2 cm and weighs only 70 g per side. Even with the two halves combined, that’s still lighter than an iPhone 13 mini! They can be built wired or wireless for added convenience, and with a wireless setup you can use your keyboard anywhere without worrying about cables.

The layout

Here is my layout. I used ZMK, and this layout is not possible as-is with QMK because it uses shifted symbols in tap-mods. (I started from ZMK and didn’t think much about compatibility with QMK, which I probably should have. At the same time, I feel the layout should not be restricted by the limitations of the firmware.)

My layout

This layout is heavily inspired by Pnohty by rayduck (https://github.com/rayduck/pnohty) (most of the symbol keys are the same as in Pnohty) and by Ben Vallack’s video (https://www.youtube.com/watch?v=8wZ8FRwOzhU) for ideas on transitioning the layout. Overall, it should be understood as a slight modification of the Pnohty layout; please refer to the original write-up by rayduck for the good ideas implemented in Pnohty. The specific issues I had with the original Pnohty were: 1) the modifier keys (all done as combos in Pnohty) were difficult to use, especially when a modified input needs to be repeated, and 2) some of the chording (simultaneous key presses) was hard for me, particularly for navigation.

For issue 1, I went back to home row mods (https://precondition.github.io/home-row-mods) instead of using combos. For issue 2, I made the navigation layer modal.

Packing everything you want into 34 keys is not easy, but it is certainly possible using layers. The concept of a layer already exists on a traditional keyboard in the form of the Shift key: it momentarily changes what each key types, so letters become capitals, numbers become symbols, and symbols become other symbols. With modern keyboard firmware, you can design your own layers and type anything with fewer keys.

I have 4 main layers: ‘base (alphabets)’, ‘symbols’, ‘numbers’, and ‘navigation’. The base layer is the default, and you don’t have to do anything special to be in it. The symbol layer includes most of the symbols you need for everyday programming. The number layer contains numbers and basic arithmetic operators; the period is repeated in this layer for easy access to a decimal point.

Base Layer

For my base layer, I chose Colemak Mod-DH. I find this layout to be a good compromise between comfort and familiarity with the traditional QWERTY layout. It is also well-regarded by many users as an alternative to QWERTY.

Colemak Mod-DH is a modification of the original Colemak layout, with the positions of the D and H keys moved to the bottom row for greater comfort. If you prioritize the location of common keyboard shortcuts like “copy” and “paste” over optimal hand positioning, you may prefer to stick with the original Colemak layout.

Learning a new keyboard layout takes time and practice, but the benefits can be substantial. It generally takes a few weeks to a month to become comfortable with a new layout. During this time, you will likely start with a slow typing speed, gradually improving over the course of a week. With continued practice, you should be able to reach about 80% of your original typing speed after a month.

I recommend not completely abandoning your old layout or keyboard during this transition period, especially if you rely on typing for work. However, if you type frequently, the effort to retrain yourself with a new layout can be well worth it in the long run.

Symbol Layer

This layer contains symbols that are not used for arithmetic. To activate this layer, simply hold down the “layer 1” button located at the left thumb. Once you release the button, you’ll be back to the base layer. The symbol layer includes various commonly used symbols, such as parentheses and brackets. In Python, “[]” and “()” are used quite often, so they are placed in comfortable positions for easy access using the middle and ring fingers.

The location of the symbols on this layer was carefully considered, taking into account bigrams (two-key sequences) in Python. Additionally, I’ve added the home row mods as tap-mods to make it easier to access them.

For more information and ideas behind the choice of symbol positions, you can refer to the original Pnohty write-up by rayduck.

Number Layer

The Number Layer contains numbers and symbols used for arithmetic. To activate this layer, simply press the key located under your right thumb, which is symmetric to the key used for the Symbol Layer. Some symbols such as ‘*’ are duplicated in this layer, even though they appear in the Symbol Layer.

What’s interesting about this layer is the order in which the numbers are arranged. Unlike a typical numpad, the numbers 1, 2, and 3 are located in the center, while 4, 5, and 6 are at the top, and 7, 8, and 9 are at the bottom. This is because you type “123” more often than other numbers. Additionally, 0 has been assigned a special location near the thumb for easy access. I also added an extra spacebar in this layer since you often have to insert space when you are writing equations.

Navigation layer

The Navigation layer is a special layer in my layout that works differently than the Symbol and Number layers. While the Symbol and Number layers are activated momentarily with a key press, the Navigation layer is activated with a toggle key, similar to the Caps Lock key. Once activated, you stay in the Navigation layer until you press the “go back to base (layer 0)” button.

The Navigation layer is designed to make basic text editing operations easier. It includes features for navigating text, selecting, cutting, copying, pasting, deleting, and typing enter or space within the layer. Since these operations can be awkward if you have to keep pressing another key, making the Navigation layer a modal layer makes sense.

One potential issue with modal layers is forgetting to switch back to the original layer, which can result in pressing unwanted keys. To mitigate this, I turned the home-row keys of the left hand into pure modifiers, although this does not eliminate the problem completely.

Other layers

The remaining layers hold non-essential functions (Bluetooth-related keys and left-hand arrow keys). If you are curious, you can find them in my repository (https://github.com/shixnya/zmk-config).

Concluding remarks

While ergonomic keyboards are attractive, designing a 34-key layout that fulfills all of your needs is not a trivial task. However, I hope that this article has provided some guidance and inspiration for those who are starting the journey of finding the optimal key layout for their needs. Don’t be afraid to experiment and tweak your layout as you go along, and remember that what works for someone else may not necessarily work for you. Ultimately, the goal is to create a personalized keyboard that maximizes your comfort, productivity, and overall typing experience. Good luck on your keyboard journey!

A computational neuroscientist’s Python environment

I switched from Matlab to Python for data analysis. It was a little tricky to set up an environment that I like, so here I provide a list of the important elements of my environment, with a short description of each.

Machine: An Intel Mac

Why a Mac? Because the Control and Command keys are separate. This matters for a consistent UI feel, especially if you use Vim keybindings in VSCode. For example, Ctrl-V is page-down in Vim; on a Mac it stays separate from pasting (Command-V), whereas on Windows or Linux you’d have to give up one or the other.

Why an Intel processor? Because some numerical libraries still don’t work on Apple Silicon Macs. It doesn’t feel nice if, whenever you hit an error, you have to suspect that your processor could be the reason. If you work exclusively on a remote server and don’t run any computation locally (or only with libraries that are known to work), an Apple Silicon Mac is fine. This recommendation will change in the near future, because Apple’s chips are great and the community is making good progress on compatibility.

I personally don’t think Apple’s keyboards are great. If you spend a lot of time typing, I’d recommend getting a keyboard you are comfortable with. I like the Topre RealForce, but any decent keyboard will do. You can use Karabiner-Elements if you want to use a PC keyboard and rebind some keys.

Editor: Visual Studio Code (VS Code)

VS Code is an open-source (but see this) editor from Microsoft. It is a versatile text editor that, with extensions, supports many languages including Python.

A remarkable feature of VSCode is its remote capability. This feature installs a copy of VSCode on the remote machine and runs it as a server. You can edit files in your local VSCode with the feel of editing local files while saving to, and running on, the remote computer. This is a unique feature of VSCode that distinguishes it from other editors. Below, I explain the detailed settings of the editor in my environment.

Extensions

Python plugin: This is VSCode’s basic Python support. Why not.

Pylance: This is an extension that provides advanced code-completion and linting. Good to have.

Jupyter: Most likely you will be using Jupyter notebooks if you do a collaborative data science project. I’m not a fan of notebooks because the lengthy outputs of cells become a distraction when navigating code; I think Matlab’s cell style works better for the task. In VSCode there is an interactive mode that resolves this issue (and it also uses the Jupyter plugin).

VSCode Neovim (optional): This is nice if you come from Vim or like Vim keybindings. The NeoVim plugin is preferred over the Vim plugin: the Vim plugin is an effort to make keybindings compatible with Vim, whereas the Neovim plugin runs an actual Neovim under the hood. This enables some actions that don’t work with the Vim plugin (e.g. the behaviors of ‘.’ and undo/redo are a little strange in the Vim plugin). It is important to install Neovim >= 0.5.0. If you use Homebrew for package management, run "brew install --HEAD neovim".

Remote development extension pack (if you work with a server): This is one of the main reasons I use VSCode, as noted above. It installs the VSCode server on the remote machine, and you use the local client (the UI is local, so it’s responsive) while working directly with remote files and executables. This is also useful when you want to separate your client user interface from the server environment. In particular, it resolves the dilemma of wanting a Mac interface while doing deep learning on Nvidia GPUs: you can easily set up a Linux computer with a good GPU and connect to it remotely from your Mac, with the feel of working directly on that machine.

Other editor settings

Formatter: Black
Code formatting is important for keeping code legible. Black formats the code for you automatically: the same logic, the same result. You may have slightly different preferences from its defaults, but the point here is consistency. If you and your colleagues use Black, you don’t have to parse your colleagues’ code to work out their coding style, or feel guilty about mixing in your own. I just go with Black’s default settings for everything. Please read this nice article if you are not convinced yet. Using Black is as simple as setting "Python > Formatting: Provider" to "black". I also recommend keeping "Editor: Format On Save" checked in the settings.
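
As a small illustration of what that looks like, here is a made-up snippet before and after formatting; the "after" half is what Black's default settings should produce:

    # Before: inconsistent spacing, quotes, and argument formatting
    def spike_rate(spikes,t_start,t_end ):
        return len([s for s in spikes if t_start<=s<t_end])/(t_end-t_start)
    rates={'v1':spike_rate([0.1,0.5,0.9],0,1),"v2" : spike_rate([0.2,0.3],0,1)}


    # After Black: uniform spacing, double quotes, and blank lines around the function
    def spike_rate(spikes, t_start, t_end):
        return len([s for s in spikes if t_start <= s < t_end]) / (t_end - t_start)


    rates = {"v1": spike_rate([0.1, 0.5, 0.9], 0, 1), "v2": spike_rate([0.2, 0.3], 0, 1)}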

Color theme: Monokai
If you use Pylance, you get a special mode of syntax highlighting called "semantic highlighting". It parses the code a little more deeply than a typical syntax highlighter and understands the distinctions between classes, methods, local variables, function arguments, and so on. However, few color themes can distinguish all of these elements. In my opinion, Monokai does this best: it uses different colors, underlines, and italics to keep all of the elements separate without becoming too confusing.

Misc settings: see attachment
I also change a number of things in the settings, such as font size and rulers. I attached my settings.json to this article; please check it out if something doesn’t seem to be working correctly.

Debugging: set "justMyCode" to "False"
The key is to set up a nice debugging configuration file (launch.json). Please see the official guide for how to set it up; it’s definitely worth knowing. Often the errors you encounter have something to do with the inputs you pass to library functions, and with the default setting the VSCode debugger does not stop in such cases. By setting "justMyCode" to "False" in your launch.json, your code will stop there while debugging.
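
As a made-up example of the kind of error this matters for: the call below raises inside NumPy rather than in your own code, so with the default "justMyCode" setting the debugger would not break at the frame where the exception actually occurs:

    import numpy as np

    data = np.arange(10)

    # Wrong argument: 10 elements cannot be reshaped into a 3 x 4 array.
    # The ValueError is raised inside NumPy's own code, so with
    # "justMyCode" left at its default the debugger skips the raising frame.
    reshaped = data.reshape(3, 4)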

A shortcut for the interactive window: "Ctrl + ."
I like having the option to send just one line of code to an interactive window to see the result. You can do this by setting up a shortcut: in the keyboard shortcut settings, search for "Jupyter: Run Selection/Line in Interactive Window" and assign this keybinding.

Profiling: cProfile + pyprof2calltree (+ QCacheGrind)
This is where I’m still struggling. See this nice blog post for how it works. In my opinion, though, this is still less useful than Matlab’s line profiler with its GUI. I’d like to know if there is something better.
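
For reference, a minimal sketch of that workflow looks roughly like this (the profiled function and file names are placeholders; pyprof2calltree and QCacheGrind are installed separately):

    import cProfile
    import pstats


    def slow_analysis():
        # placeholder for the code you actually want to profile
        total = 0.0
        for i in range(1, 200_000):
            total += (i ** 0.5) / i
        return total


    # Collect profiling data and save it in cProfile's format
    cProfile.run("slow_analysis()", "analysis.prof")

    # Quick text summary without leaving Python
    pstats.Stats("analysis.prof").sort_stats("cumulative").print_stats(10)

    # For the call-graph view, convert and open in QCacheGrind from a shell:
    #   pyprof2calltree -i analysis.prof -o analysis.callgrind
    #   qcachegrind analysis.callgrind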

Summary

I quickly covered quite a few things about my coding environment. Unlike PyCharm, whose defaults are quite functional, VSCode needs a little more setup before it works well for data science in Python. However, once you know what to do and set things up patiently, it becomes a very versatile environment. You won’t regret it. You may not like everything I use, but hopefully you can adopt one or two of the things above in your own environment.

About 2020 Apple Silicon Macs

Today, Nov. 10, 2020, Apple announced three new models of Macs with Apple Silicon.

  • Macbook Air (13 inch, base price $999)
  • Macbook Pro (13 inch, base price $1299)
  • Mac mini (base price $699)

All of them have the new M1 chip, which integrates the CPU, GPU, and Neural Engine along with other functions. The base models have 8GB of RAM and a 256GB SSD.

Customization options are few due to the integrated chip design: upgrading the RAM to 16GB (the maximum) costs $200, and storage is about $400/TB up to 2TB.

There are more details on specs on other sites, so I’ll focus on writing just my opinions about them.

Apple seems to have focused on how "lazy" they could be on the hardware side this time. They designed a minimum number of chips (one!) while building a product lineup that reaches the largest number of customers, so it makes sense that they focused on the ‘entry’ models of their products. They also saved effort by reusing the exterior designs of existing products. They probably ended up with extra space inside from integrating functions into the M1 chip and removing the fan (in the Air), and that space might have been filled with extra battery. (We’ll see when iFixit tears them down next week.) It is possible that recent products were designed to be compatible with both the previous logic boards and the M1 board.

Where did the resources saved in hardware design go? My guess is that they went to the software that needed to change radically because of the switch in CPU architecture. Intel CPUs are in the x86 family, and the M1 is in the ARM family, which uses a different instruction set. I don’t know how much effort is needed to maintain good backward compatibility at a reasonable speed, but these Macs won’t sell well if many problems arise at release, so it is reasonable to be very cautious about this.

Which one you should buy ultimately depends on your preferences, but note that they all use the same chip. The only potential performance difference comes from the thermal capability of the Air, which went fanless this time. If thermal throttling occurs, it may sacrifice some performance during intensive tasks. It is also possible that the fanless design is just fine, because the M1 should be very power efficient. (Based on the A14’s 6W Thermal Design Power (TDP), I expect the M1’s TDP to be ~10W.) Other than that, they are all identical, so you can choose based on your use case. The laptops come with a complete set of interfaces, so they are more expensive than the mini. The main features that set the Pro apart are the fan, the Touch Bar, and the microphone array; if you don’t care about those, the Air is the better deal and saves you $300.

The differences between the A14 Bionic and the M1 are just the number of high-performance cores (2 vs. 4) and GPU cores (4 vs. 8). Several days before the event, there was a leaked Geekbench 5 score for a chip called the A14X (which does not actually exist). The score was ~7000. The A14 scores ~4000, and if you consider that the high-performance cores are doubled while the high-efficiency cores stay the same (4), this seems a reasonable score for the M1. It is close to the Ryzen 7 4800HS or the Core i7-10875H, which are relatively high-end mobile processors (though not the highest), and by itself it is not yet amazing. However, if the power needed for that computation is expected to be a third or a quarter, it suddenly becomes the king of performance per watt. Probably the closest competition is the Ryzen 4800U, which already has almost insane perf/watt with a score of ~6000 at a 15W TDP; the M1 would outperform even that, which sounds more insane still. These numbers matter when the core count is increased further for more powerful desktops, so I want proper measurements to be done.

For scientific computing, this should be pretty good hardware. However, the software may not be ARM-native at the beginning, and your favorite tool may not even work, so it might be wiser to buy one after making sure that what you want to use runs on these machines. These computers can replace pretty much everything but high-end desktops, so if you want to do very extensive computation, it is probably better to wait for the iMacs or Mac Pros coming later.

For me, if these computers can train neural networks on the Neural Engine cores with TensorFlow or PyTorch, I’d think about buying one for the World Making project. Even then, waiting for a more powerful iMac might be the better move. At the very least I’d like to see what happens over the next few weeks. This is a very exciting year for computing.

About the 2020 Apple Silicon Macs

On November 10, 2020, Apple announced new Mac models carrying Apple Silicon.

The lineup is:

  • Macbook Air (13-inch)
  • Macbook Pro (13-inch)
  • Mac mini

Each of them carries the M1 chip; there are no other processor options.

The base prices are:

  • Air: $999
  • Pro: $1299
  • mini: $699

and the base models come with 8GB of memory and 256GB of storage.

Since these machines use a chip with integrated graphics, customization options are few: upgrading the memory to 16GB costs an extra $200, and adding storage runs roughly $400/TB.

Other sites cover the specs in detail, so here I will write down my personal take.

What Apple focused on this time, I think, is how much effort they could save on the hardware side. They kept the number of chips they had to build to a minimum (one) and then assembled a product lineup (three entry-level models) that reaches as broad a range of customers as possible. For Apple Silicon's debut to have an impact, aiming at the entry segment, where the customer base is thickest, was all but inevitable. The hardware designs are carried over from earlier products, and the space freed up by integrating functions into the chip and removing the fan (in the Air) was presumably filled with extra battery (we'll find out when iFixit tears them down next week). It is also possible that products released over the last few years were designed with the M1 in mind so that the transition would go smoothly.

Where did the resources saved on hardware go? Probably into the software needed to maintain backward compatibility in the move from an x86-based to an ARM-based system. I don't know how hard it is to develop software compatibility across different architectures, but if a pile of problems surfaced right after launch the reputation would suffer, so concentrating effort on testing this part seems appropriate.

Which one to buy depends on the person, but it is worth noting that the Macbook Air and Pro run on exactly the same processor. The same processor does not necessarily mean the same performance: the Air is fanless and sheds heat poorly, so even moderately long computations may throttle and lose performance. This is where I want to see benchmark results. Given that the A14's TDP (Thermal Design Power) is 6W, the M1's is probably around 10W, so a fanless design may well be adequate. Beyond that the performance should be the same, so you can decide based on whether you want a laptop or a desktop, or whether you like the Touch Bar.

As for the M1 chip, the differences from the A14 are that the high-performance cores went from two to four and the GPU cores from four to eight; otherwise they are much the same. Shortly before the event there was apparently an unofficial leaked Geekbench 5 score for a (nonexistent) chip called the A14X, at around 7000. The A14 scores roughly 4000, and given that the high-performance cores are doubled while the efficiency cores stay at four, that number is believable for the M1. It is on par with the Ryzen 7 4800HS or the Core i7-10875H, roughly matching high-end mobile processors, which by itself is not a particularly astonishing figure for a laptop. But factor in that the power consumption is expected to be a third or a quarter of those high-end CPUs, and it becomes a startling number. Even the Ryzen 4800U, famous for almost unreasonable efficiency, scores about 6000 at 15W, so a roughly 10W processor hitting 7000 almost feels like something must be off. These numbers bear on whether chips with far higher core counts could eventually be used in data centers and the like, so I hope proper measurements are made (or data provided).

From the standpoint of scientific computing, the hardware should be more than up to the job. Early on, though, little software may run natively on ARM, so it may be smarter to buy after the software transition has progressed. Even as specced, these machines can handle a fair amount of computation (roughly a mid-range desktop, I'd say), and heat and fan noise should be essentially non-issues. If you want to replace a high-end desktop, waiting for the iMac or Mac Pro due next year or later looks like the better plan.

If training models on the Neural Engine becomes easy (with TensorFlow and the like), I am considering buying one. Ideally I would wait for an iMac with a more powerful processor, but I haven't quite made up my mind. That's all for now; this got long.

The first world

I implemented the first world. In this post, I set the rules of the first world, write them in the OpenAI Gym format, register it as an OpenAI Gym environment, and use Keras-rl2 to train a neural net for this environment.

I put the code on Github, so I’ll focus on the specifications of the world.

  • The world is a 30 x 30 2D grid space
  • There is only one living entity (agent)
  • The agent has 100 energy at the beginning
  • Food is distributed randomly (both location and amount (0-100) are random)
  • Time is discretized in steps
  • In each step, the agent can either move in one of four directions or stay.
  • The agent spends 1 energy if it moves or 0.5 energy if it stays.
  • If the agent overlaps with food, it gains the food's energy and gets a reward of (food amount)/100.
  • If it uses up the energy, it dies and loses 3 reward points.
  • In each step, there is a 2% chance that food appears in the world. The amount is random (0-100).
  • An episode ends if the agent dies or at 2000 steps.

When registering the environment with OpenAI Gym, installing it with "pip install -e gym-smallworld" keeps the environment editable during development, which makes it easy to keep modifying the world specification.
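
To give a feel for the shape of such an environment, here is a heavily simplified sketch of the Gym class and its registration. The class name, observation encoding, initial amount of food, and wrap-around edges are illustrative choices rather than the actual gym-smallworld code, and it follows the classic gym API (reset returns only the observation):

    import numpy as np
    import gym
    from gym import spaces
    from gym.envs.registration import register


    class SmallWorldEnv(gym.Env):
        """A 30 x 30 grid world with one agent that collects randomly placed food."""

        def __init__(self, size=30):
            self.size = size
            self.action_space = spaces.Discrete(5)  # up, down, left, right, stay
            self.observation_space = spaces.Box(0.0, 1.0, shape=(size, size, 2), dtype=np.float32)
            self.reset()

        def reset(self):
            self.energy = 100.0
            self.steps = 0
            self.agent = np.random.randint(0, self.size, size=2)
            self.food = np.zeros((self.size, self.size), dtype=np.float32)
            for _ in range(20):  # scatter some initial food (the count is arbitrary here)
                loc = tuple(np.random.randint(0, self.size, size=2))
                self.food[loc] = np.random.uniform(0, 100)
            return self._obs()

        def step(self, action):
            moves = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]
            self.agent = (self.agent + moves[action]) % self.size  # wrap around the edges
            self.energy -= 1.0 if action != 4 else 0.5
            reward = 0.0
            amount = self.food[tuple(self.agent)]
            if amount > 0:  # eat any food on the current cell
                self.energy += amount
                reward += amount / 100.0
                self.food[tuple(self.agent)] = 0.0
            if np.random.rand() < 0.02:  # new food appears with 2% probability per step
                loc = tuple(np.random.randint(0, self.size, size=2))
                self.food[loc] = np.random.uniform(0, 100)
            self.steps += 1
            done = self.steps >= 2000
            if self.energy <= 0:  # the agent starved
                reward -= 3.0
                done = True
            return self._obs(), reward, done, {}

        def _obs(self):
            agent_map = np.zeros((self.size, self.size), dtype=np.float32)
            agent_map[tuple(self.agent)] = 1.0
            return np.stack([agent_map, self.food / 100.0], axis=-1)


    # Normally done in the package's __init__.py so that gym.make("SmallWorld-v0") works
    register(id="SmallWorld-v0", entry_point=SmallWorldEnv)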

The world is defined now. The network training is pretty much a simplified version of the Atari game example from Keras-rl2. Specifically,

  • 2 convolutional layers of 3 x 3 x 16 (stride 1)
  • 1 dense layer with 32 neurons
  • Scaled exponential linear unit (selu) for activation function
  • Deep Q Network (DQN) with epsilon greedy Q policy with linear annealing
  • Adam optimizer with learning rate = 0.001
  • 10000 warm-up steps
  • The target model is updated every 10000 steps.
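
For orientation, a condensed sketch of this training setup with Keras-rl2 might look like the following. It mirrors the bullet points above, while the replay-memory size, annealing schedule, and total step count are illustrative, and the environment is assumed to be registered as SmallWorld-v0:

    import gym
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, Dense, Flatten, Reshape
    from tensorflow.keras.optimizers import Adam
    from rl.agents.dqn import DQNAgent
    from rl.memory import SequentialMemory
    from rl.policy import EpsGreedyQPolicy, LinearAnnealedPolicy

    env = gym.make("SmallWorld-v0")
    nb_actions = env.action_space.n
    obs_shape = env.observation_space.shape

    # 2 conv layers (3 x 3 x 16, stride 1) + 1 dense layer of 32, all with selu
    model = Sequential([
        # keras-rl feeds observations as (window_length, *obs_shape); drop the window axis
        Reshape(obs_shape, input_shape=(1,) + obs_shape),
        Conv2D(16, (3, 3), strides=1, activation="selu"),
        Conv2D(16, (3, 3), strides=1, activation="selu"),
        Flatten(),
        Dense(32, activation="selu"),
        Dense(nb_actions, activation="linear"),
    ])

    # Epsilon-greedy policy with linear annealing, as in the Atari example
    policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr="eps",
                                  value_max=1.0, value_min=0.1, value_test=0.05,
                                  nb_steps=1_000_000)

    dqn = DQNAgent(model=model, nb_actions=nb_actions,
                   memory=SequentialMemory(limit=100_000, window_length=1),
                   nb_steps_warmup=10_000, target_model_update=10_000, policy=policy)
    dqn.compile(Adam(learning_rate=0.001))
    dqn.fit(env, nb_steps=2_000_000)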

The differences from the example are the size of the network (smaller) and the use of selu (which I have had good experience with). Most other things are the same. No GPU is used: if you install the appropriate packages (gym, Keras-rl2, etc., with Python 3), it should work fine on a CPU. If it works, training begins and an animation like this is displayed.

The green dot is the agent and the red dots are food. The brighter a dot, the more food there is in that cell. If you hide this animation behind another window so that it does not render on your display, the rendering seems to be skipped and the computation becomes more efficient. I don’t know exactly how this works, but it’s a nice tip.

Pretty much all the agent has to do is move towards any available food. There are no obstacles or enemies, so it is a fairly straightforward task. With this simple training scheme, the episode reward goes up nicely and saturates near 1.5-2M steps (blue is the episode reward, orange is the Q value). Training takes about half a day on a Core i7-3770K.

One issue is that the current program does not reach the same performance when switched to test mode. I’ll investigate the reason for this (I suspect the policy change between training mode and test mode is the cause).

In any case, the first world is complete. My next task is to make an environment in which multiple agents can be trained simultaneously and independently.

The first world

Starting with this post, I begin implementing the world. The goal this time is to decide the rules of the first, simple world, write them down in a form that can be registered with OpenAI Gym, and get training running with Keras-rl2. Without worrying about the details, I'll build something that works.

The code is on Github, so here I'll describe the specification of the world.

  • A 30 × 30 two-dimensional grid space
  • There is only one agent
  • The agent starts with 100 energy
  • Food is placed randomly in the space (both location and amount (0-100) are random)
  • Time is discrete (steps)
  • In each step, the agent can move in one of the four directions or stay (five choices)
  • Moving costs the agent 1 energy; staying costs 0.5
  • If the agent overlaps with food, it absorbs the food's energy and gains a score of (food amount)/100
  • If the agent uses up all its energy, it dies and loses 3 points of score
  • In each step, with 2% probability, food of a random amount (0-100) appears in a random cell
  • The episode ends when the agent dies or after 2000 steps

When registering with OpenAI Gym, adding the -e (editable) option, as in pip install -e gym-smallworld, makes it easy to keep modifying the world specification.

Training, on the other hand, mostly reuses the Atari training code from the Keras-rl2 examples. Specifically, it is configured as follows.

  • Two 3 × 3 × 16 convolutional layers (stride 1)
  • One fully connected layer with 32 neurons
  • Scaled exponential linear unit (selu) as the activation function (a change from the example)
  • Deep Q Network (DQN) with an epsilon-greedy Q policy and linear annealing
  • Adam optimizer with a learning rate of 0.001
  • 10000 warm-up steps
  • The target model is updated every 10000 steps

I made the network smaller and changed the default relu to selu. Otherwise it is mostly the original code. It is not set up to run on a GPU, so with the appropriate packages installed (gym, keras-rl2, and so on) it should run on any computer. (I run it with Python 3.) If it runs, an animation like the one below appears and training begins.

Green is the agent and red is food. The brighter a dot, the more energy the food has. If you move this animation to the background so it is not drawn, the rendering seems to be skipped and efficiency improves. Bring it to the foreground when you want to watch and it displays at the proper speed. I don't know how this works, but it's handy.

The agent only has to collect the food it can see, and nothing gets in its way, so it is a very easy task; the Atari games are considerably harder. The episode reward rises steadily and converges at around 1.5-2M steps, as in the next figure (blue is the per-episode score, orange is the Q value). It took about half a day on a Core i7-3770K.

A problem at this point is that the performance achieved during training is not reproduced at test time; I'll track down the cause later. (I suspect it comes from the policy change between training and test.)

In any case, a working world is done. Next I want to implement training with multiple agents.

Software environment to make the small world

In the last post, I said that I want to make a world in which I can experiment with evolution. In this post, I’d like to talk about the software environment to do that.

I think the best language for this task is Python, because the task will eventually require machine learning (including deep learning) with multiple agents. Python has frameworks like OpenAI Gym that make machine learning easy, and it is the most powerful language for deep learning thanks to its vast collection of libraries. In terms of pure speed it is also competitive if you use modules like Numpy or Numba. (I ran Conway's Game of Life in Matlab, Python, and Julia and compared the speed. With optimization, Python and Julia were similar; Matlab was a bit slower (~2x) than the other two. I'll omit the details of these experiments, but let me know if you are interested and I may write a post about them.)
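
To show what "optimize with Numpy" can look like, here is a vectorized Game of Life step (the toroidal edges and grid size are illustrative choices, not necessarily the versions used in the comparison):

    import numpy as np


    def life_step(grid: np.ndarray) -> np.ndarray:
        """One Game of Life update on a 2D array of 0s and 1s (toroidal edges)."""
        # Sum the eight neighbors by rolling the grid in every direction
        neighbors = sum(
            np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1)
            for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0)
        )
        # A cell is alive next step if it has 3 neighbors,
        # or if it is alive now and has exactly 2 neighbors.
        return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(grid.dtype)


    rng = np.random.default_rng(0)
    grid = rng.integers(0, 2, size=(512, 512), dtype=np.uint8)
    for _ in range(100):
        grid = life_step(grid)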

I will use PyCharm to write the code. I was torn between VSCode and PyCharm, but PyCharm seemed easier for debugging. I also thought about Vim, but too many plugins would have to be installed to make it practical; more plugins mean more maintenance, so I skipped it this time. These are not final decisions.

I will try making my own environment, registering it with OpenAI Gym, and combining it with one of the existing deep learning frameworks. I'm using a somewhat old Intel Mac, so I considered using PlaidML + Keras to use the CPU's integrated graphics as a compute resource. However, I got errors when combining it with OpenAI Gym (discussed in this thread) and couldn't fix them. (I could run the PlaidML benchmark on my CPU; that was fast.) Since my current computer won't be my long-term machine anyway, I decided to use just the CPU for now.

In the near future, I plan to build a reasonable computer with a GPU and run the computation on that. I will write another post on the hardware environment, but I don't think right now is the best time to write about hardware. At the very least, I should wait until the details of the Apple Silicon Macs and AMD's Ryzen 5000 series are known.

This is a relatively short post without any code. In the next post, I will probably start deciding what the world's rules are and registering them with OpenAI Gym.

Preparing to build the world (software environment)

In the previous post I said I want to build a world in which I can run evolution experiments. This time I consider the software environment for making that happen.

Since this will eventually involve machine learning, including deep learning, with multiple agents, the best language at this point is Python. Machine learning frameworks such as OpenAI Gym are well established, and in deep learning Python has a near-monopoly thanks to its powerful libraries. In terms of speed, with good use of Numpy or Numba it is on par with other languages, and its flexibility arguably gives it the edge. (I compared the speed of Conway's Game of Life in Matlab, Python, and Julia; with proper optimization, Python and Julia performed about the same. Only Matlab could not be pushed to that speed. I'll omit the details of these experiments, but if anyone is interested, let me know and I'll write them up.)

I plan to use PyCharm as the editor. VSCode is also well regarded and I did consider it, but debugging looked easier in PyCharm, so I went with that. Vim was also an option, but it needs too many plugins before it becomes practical (which means more maintenance), so I gave up on it. Rather than forcing myself to use Vim, using PyCharm's Vim mode seemed easier in the short term. I intend to keep experimenting here as I go.

This time I will try registering my own environment (world) with OpenAI Gym and combining it with an existing deep learning library. Since I'm on a somewhat old Intel Mac, I hoped to use PlaidML + Keras to turn the Intel CPU's graphics into a compute resource, but it did not play well with the reinforcement learning framework and I couldn't figure out how to fix it, so I'll quietly run on the CPU. (I ran into the problem discussed here; PlaidML's position was that it is hard to address unless Keras changes. I was able to run the PlaidML benchmark itself.)

Eventually I'd like to build a reasonably priced machine with a GPU and run the computations there. Thinking about the hardware configuration for computation is also great fun, so I'll write about it separately, but now doesn't seem like the best time to build a machine, so that will come a little later. At the very least, I think I should wait until information about the Apple Silicon Macs and the AMD Ryzen 5000 series is out.

This post was short, with no code to introduce. Next time I will probably start by building a world that can be registered with OpenAI Gym. Stay tuned.