A computational neuroscientist's Python environment

English

It has been quite a while now, but I switched the language I use for work from Matlab to Python. Setting up a comfortable environment took some effort, so here is a record of what I did.

Machine: An Intel Mac

The reason I use a Mac is that the Control and Command keys are separate. Many other factors go into choosing a platform, but for me this is the important one. The benefit of having these two keys separated is a consistent feel across many apps, including VS Code, the editor described below. In particular, if you want Vim-like behavior in an app, many commands collide when Control and Command are not separate. (Example: Ctrl-V is page-down in Vim, but on non-Mac platforms it collides with paste.)

Why Intel? At the time this article was written, many software packages were slow to support Apple Silicon; by now, Apple Silicon should pose no problem at all. For the advantages of Apple Silicon, see my earlier article.

Personally, I am not a fan of Apple's keyboards, so I use something else. For a while I used a Topre RealForce, then a Ferris Sweep (see here for the key layout), and I now seem to be settling on a Corne. If you use a RealForce with a Mac, it may be a good idea to rebind some keys with Karabiner-Elements.

Editor: Visual Studio Code (VS Code)

VS Code is a (nominally) open-source editor from Microsoft. It is highly versatile and, with extensions, supports a wide range of languages including Python.

Its remote development capability deserves special mention. This feature lets you install VS Code on a remote machine and run it as a server. Your local VS Code communicates with that server to read and write files, so you get a local feel (no input latency) while working on the remote machine. Below are the details of my setup.

Extensions in use

Python plugin: VS Code's basic Python support. Practically essential.

Pylance: provides more advanced code completion (it also checks functions and so on across other files) and coloring. Convenient to have.

Jupyter: you will probably use notebooks when sharing work with others or building demos, so it is good to have. Personally I am not a fan of notebooks because inputs and outputs get tangled up, and I think Matlab's cells work better; for people like me, an interactive mode is also provided.

VS Code Neovim (for Vim fans): recommended if Vim is your main editor. There is also a Vim plugin, but it does not fully reproduce Vim's behavior and implements only a subset of its features. The Neovim plugin, on the other hand, runs the Neovim installed on your system directly inside VS Code, so all of Neovim's features are available.

Remote Development extension pack (if you want to work on remote machines): currently this is the biggest reason I use VS Code. When you connect to a remote machine over SSH, it automatically installs the server software there and lets you operate it from your local VS Code. It is handy when you want to access a machine in a distant location, and also when you want to do things a Mac cannot do from a Mac interface (for example, using an Nvidia GPU in a Linux box). Either way, it feels like working with local files directly.

Other editor settings

Formatter: Black
Keeping your code consistently formatted is great for maintainability, but thinking about formatting while coding is tedious, and fixing it afterwards is worse. Black, remarkably, formats your code automatically. It's wonderful. For the various benefits of using Black, please see this blog post. If you check "Editor: Format On Save" in VS Code, your code is reformatted automatically every time you save.

Color theme: Monokai
With Pylance you get a special kind of coloring called "semantic highlighting". It goes a step beyond ordinary highlighting: it reads the code, distinguishes variables, classes, methods, functions, and so on, and gives each its own color. Unfortunately, not every color theme supports this, so with some themes, elements that could be distinguished end up looking the same. Monokai defines a distinct color for each element, so I recommend it.

Misc settings: see the attachment
I changed various other things as well, so I have attached my settings file. If something seems off, take a look at it.

Debugging: set "justMyCode" to "False"
Debugging is configured in launch.json, and it may be a good idea to set "justMyCode" to "False" there. The default is "True", and with that setting the debugger does not launch when an error occurs in code you did not write. Errors outside your own code happen all the time, for example when you pass the wrong arguments to a library function, so I think "False" is the better choice.

A shortcut to run a single line: "Ctrl + ."
When trying something out, you sometimes want to run just one line without creating a cell. Set up a shortcut for that: bind "Jupyter: Run Selection/Line in Interactive Window" to a convenient key (I use "Ctrl + .") and your work will go much more smoothly.

Profiling: cProfile + pyprof2calltree (+ QCacheGrind)
I am not yet confident that this is the optimal solution. There is a detailed explanation here. Personally, Matlab's built-in profiler seems more convenient to use. If you know of a better tool, I would appreciate a comment.

Summary

This short article covered a broad range of topics at a glance. Unlike PyCharm, which works well out of the box, VS Code takes some setup before it becomes comfortable to use. But if you set it up patiently, its rich plugin ecosystem makes it a one-of-a-kind tool. I hope this article helps with that.

When is GPU better than CPU for Deep Learning on Apple Silicon?

Apple Silicon has brought significant improvements in performance and energy efficiency, making it a viable choice for deep learning tasks. However, choosing between the GPU and the CPU for a given task can still be confusing. In this post, we'll explore the conditions under which the GPU outperforms the CPU on Apple Silicon when using deep learning libraries like TensorFlow and PyTorch.

To determine when to use a GPU over a CPU on Apple Silicon for deep learning tasks, I conducted several experiments using TensorFlow and PyTorch libraries. I used a base-model MacBook Air (M1, 8GB RAM, 7-core GPU) with a thermal pad mod to eliminate any potential throttling.

What does the running time depend on?

When comparing the performance of GPUs and CPUs for deep learning tasks, several factors come into play. Generally speaking, GPUs can perform computations faster than CPUs when the computation is highly parallel and requires relatively little memory access. For a simple feed-forward network, the relevant factors are:

  • Number of layers
  • Number of hidden layer neurons
  • Batch size
  • Backend (TensorFlow or PyTorch)

While exploring different combinations of these parameters, I found that the computation time is mostly proportional to the number of layers, and that the layer count does not change the speed ratio between CPU and GPU. So I fixed it at 4 layers for this exercise.

Now, let’s focus on the computations per layer. After tinkering with these numbers, I realized that the computation time mostly depends on the total amount of computation per layer. The total amount is determined by the number of hidden layer neurons and the batch size. More precisely, it can be calculated using this equation:

tflops = (2 * n_hidden * n_hidden + n_hidden) * batch_size / 1e12

This equation assumes a dense connection without batch normalization and uses the ReLU activation function, which has a small overhead. To test the performance in different scenarios, I varied the batch size from 16 to 1024 and the number of hidden neurons from 128 to 2048.
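The equation above translates into a small helper for enumerating the tested configurations (the function name and the printed sweep are my own sketch, not code from the experiment):

```python
def tflops_per_layer(n_hidden: int, batch_size: int) -> float:
    """Tera-FLOPs for one dense layer forward pass:
    2 * n_hidden**2 multiply-adds plus n_hidden additions,
    repeated for every sample in the batch."""
    return (2 * n_hidden * n_hidden + n_hidden) * batch_size / 1e12

# Roughly the sweep used in this post: batch size 16..1024, hidden size 128..2048
for n_hidden in (128, 512, 2048):
    for batch_size in (16, 256, 1024):
        print(f"n_hidden={n_hidden:5d} batch_size={batch_size:5d} "
              f"tflops_per_layer={tflops_per_layer(n_hidden, batch_size):.2e}")
```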

TensorFlow results

[Figure: TensorFlow results]

The results from my tests using TensorFlow showed that performance predictably improved with larger per-layer computations. While batch size isn’t explicit in the figure, it can be inferred from the values of n_hidden and tflops_per_layer. As long as tflops_per_layer is the same, different combinations of batch size and n_hidden perform similarly (except for cases with large networks, i.e., n_hidden 1024 or 2048, where memory allocation may have impacted the results). However, in all of the test cases, the performance of GPUs never exceeded that of CPUs.

PyTorch results

[Figure: PyTorch results]

The pattern of the results from PyTorch is more irregular than that of TensorFlow, especially for the CPU. Generally speaking, PyTorch was faster than TensorFlow, except for some rare cases (n_hidden of 1024 or 2048 with a relatively small batch size). Notably, the CPU's performance at small per-layer computations and the GPU's performance at large per-layer computations were both impressive. It was great to see the CPU achieve good performance (~700 Gflops) across a broad range of configurations, while the GPU exceeded that and reached ~1.3 Tflops. The theoretical performance of the GPU is ~2 Tflops, so it is getting close.

Conclusion

In this simple exercise, I demonstrated the following:

  • The matrix calculation performance improves as a function of the amount of computation per layer.
  • In my setup, PyTorch is faster than TensorFlow in general.
  • In PyTorch, GPU speed exceeds CPU speed at around 100 Mflops per layer.

Of course, these results will depend on multiple factors such as the software version, CPU performance, or the number of GPU cores, so I suggest running a test on your own. On the base M1 chip, the domain where the GPU outperforms the CPU was rather limited, but this might be very different on other chips with larger GPU core counts.

Software versions

TensorFlow environment:
Python: 3.9.5
TensorFlow: 2.6.0

PyTorch environment:
Python: 3.8.13
PyTorch: 1.13.0.dev20220709

A computational neuroscientist’s Python environment

Japanese

I switched from Matlab to Python for data analysis. It was a little tricky to set up an environment that I like. Here I provide a list of important elements in my environment with a short description for each.

Machine: An Intel Mac

Why a Mac? Because the Control and Command keys are separate. This matters for a consistent UI feel, especially if you use Vim keybindings in VSCode. For example, Ctrl-V is page-down in Vim; on Macs it stays separate from pasting (Command-V), whereas on Windows or Linux you would have to give up one or the other.

Why an Intel processor? Because some numerical libraries still don't work on Apple Silicon Macs. It doesn't feel nice if, whenever you encounter an error, you have to suspect your processor could be the reason. If you work exclusively on a remote server and don't run any computation locally (or only with libraries that are known to work), an Apple Silicon Mac is fine. This recommendation will change in the near future because Apple's chips are great, and the community is making good progress on compatibility.

I personally don’t feel that Apple’s keyboards are great. If you spend a lot of time typing, I’d recommend getting a keyboard you are comfortable with. I like Topre RealForce, but any decent keyboard would be fine. You can use Karabiner-Elements if you want to use a PC keyboard with rebinding.

Editor: Visual Studio Code (VS Code)

VS Code is an open-source (but see this) editor from Microsoft. It is a versatile text editor that is compatible (via extensions) with many languages including Python.

A remarkable feature of VSCode is its remote capability. It installs a copy of VSCode on the remote machine and runs it as a server. You can edit files in your local VSCode with the feel of editing local files, while saving to and running on the remote computer. This is a unique feature of VSCode that distinguishes it from other editors. Below, I explain the detailed settings of the editor in my environment.

Extensions

Python plugin: This is VSCode’s basic Python support. Why not.

Pylance: This is an extension that provides advanced code-completion and linting. Good to have.

Jupyter: Most likely, you will be using Jupyter notebooks if you do a collaborative data science project. I'm not a fan of notebooks because the lengthy outputs of cells become a distraction when navigating code; I think Matlab's cell style works better for this. In VSCode, there is an interactive mode that resolves the issue (it also uses the Jupyter plugin).

VSCode Neovim (optional): This is nice if you come from Vim or like Vim keybindings. The Neovim plugin is preferred over the Vim plugin: the Vim plugin is an effort to make keybindings compatible with Vim, while the Neovim plugin runs an actual Neovim under the hood. This enables some actions that don't work with the Vim plugin (e.g., the behaviors of '.' and undo/redo are a little strange in the Vim plugin). It is important to install Neovim >= 0.5.0. If you use Homebrew for package management, run "brew install --HEAD neovim".

Remote Development extension pack (if you work with a server): This is one of the main reasons I use VSCode, as noted above. It installs the server component of VSCode on the remote machine, so you can use the local client (the UI is local, so it's responsive) while working directly with remote files and executables. This is also useful when you want to separate your client user interface from your server environment. In particular, it resolves the dilemma of wanting a Mac interface while doing deep learning on Nvidia GPUs: you can easily set up a Linux computer with a good GPU and connect to it remotely from your Mac, with the feel of working directly on that computer.

Other editor settings

Formatter: Black
Code formatting is important for keeping code legible. Black formats the code for you automatically: the same logic, the same result. You may have slightly different preferences from its settings, but the point here is consistency. If you and your colleagues use Black, you don't have to parse your colleagues' code to identify their coding style or feel guilty about mixing in your own. I just go with Black's default settings for everything. Please read this nice article if you are not convinced yet. To use Black, set "Python > Formatting: Provider" to "black". I also recommend keeping "Editor: Format On Save" checked in the settings.
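For reference, the corresponding fragment of settings.json might look like this (a minimal sketch; the "python.formatting.provider" key belongs to the Python extension of that era, and newer versions use a separate Black Formatter extension instead):

```json
{
  "python.formatting.provider": "black",
  "editor.formatOnSave": true
}
```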

Color theme: Monokai
If you use Pylance, you get a special mode of syntax highlighting called "semantic highlighting". It parses the code a little more deeply than a typical syntax highlighter and understands the distinctions between classes, methods, local variables, function arguments, etc. However, few color themes can distinguish all of these elements. In my opinion, Monokai does this best: it uses different colors, underscores, and italics to keep all of the elements separated without becoming confusing.

Misc settings: see attachment
I also change a bunch of things in the setting such as font size, ruler settings, etc. I attached my settings.json to this article. Please check it out if you feel something is not working correctly.

Debugging: keep “justMyCode: False”
The key is to set up a good debugging configuration (launch.json) file. Please see the official guide for how to do it; it's definitely worth knowing. Often the errors you encounter have something to do with your inputs to library functions. With its default settings, the VSCode debugger does not stop in such a case. By setting "justMyCode": false in your launch.json, the debugger will stop there while you are debugging.
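A minimal launch.json with this change might look like the following (the configuration name and program entry are just the common defaults, not anything specific to my setup):

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Current File",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal",
      "justMyCode": false
    }
  ]
}
```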

A shortcut for interactive window: “Ctrl + .”
I like having an option to send just one line of the code to an interactive window to see the results. You can do it by setting up this shortcut. In keyboard shortcut setting, search “Jupyter: Run Selection/Line in Interactive Window” and put in this keybinding.
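Alternatively, you can add the binding directly to keybindings.json. The command ID below is what the shortcut editor reports for this command in my version of the Jupyter extension; it may differ across versions, so double-check it in yours:

```json
[
  {
    "key": "ctrl+.",
    "command": "jupyter.execSelectionInteractive",
    "when": "editorTextFocus && editorLangId == python"
  }
]
```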

Profiling: cProfile + pyprof2calltree (+ QCacheGrind)
This is where I’m still struggling. You can see this nice blog post to see how it works. However, in my opinion, this is still less useful than Matlab’s line profiler with GUI. I’d like to know if there is something better than this.

Summary

I quickly covered quite a few things about my coding environment. Unlike PyCharm, whose defaults are quite functional, VSCode needs a little more setup before you can use it effectively for data science in Python. However, once you know how to do it and set things up patiently, it becomes a very versatile environment. You won't regret it. You may not like everything I use, but hopefully you can adopt one or two of the things above in your own environment.

About 2020 Apple Silicon Macs

Japanese

Today, Nov. 10, 2020, Apple announced three new models of Macs with Apple Silicon.

  • Macbook Air (13 inch, base price $999)
  • Macbook Pro (13 inch, base price $1299)
  • Mac mini (base price $699)

All of them have the new M1 chip, which integrates the CPU, GPU, and Neural Engine along with other components. The base models have 8GB RAM and a 256GB SSD.

Customization options are few due to the integrated chip design. Upgrading the RAM to 16GB (the maximum) costs $200, and storage is about $400/TB up to 2TB.

There are more detailed specs on other sites, so I'll focus on just my opinions about these machines.

Apple seems to have focused on how "lazy" they could be on the hardware this time. They designed a minimum number of chips (one!) while building a product lineup that reaches the largest number of customers, so it makes sense that they focused on the 'entry' models of their products. They also saved effort by reusing the exterior designs of existing products. Integrating functions into the M1 chip and removing the fan (in the Air) probably left extra space inside, which may have been filled with extra battery. (We'll see when iFixit tears one down next week.) It is possible that they have been designing recent products to be compatible with both the previous logic boards and the board for the M1.

Where did they put the resources saved on hardware design? My guess is that they went into the software, which needed radical changes because of the switch in CPU architecture. Intel CPUs are in the x86 family, and the M1 is in the ARM family, which has a different instruction set. I don't know how much effort is needed to maintain perfect backward compatibility at a reasonable speed, but these Macs won't sell well if many problems arise at release. I think it is reasonable for Apple to be very cautious about this.

Which one to buy will ultimately depend on preference, but note that they all use the same chip. The only potential source of a performance difference is the thermal capability of the Air, which went fanless this time. If thermal throttling occurs, it may sacrifice some performance during intensive tasks; it is also possible that the fanless design is just fine because the M1 will be very power efficient. (Based on the A14's 6W thermal design power (TDP), I expect the M1's TDP to be ~10W.) Other than that, they are all identical, so you can choose based on your use case. The laptops come as complete packages, so they are more expensive than the mini. The main features that set the Pro apart are the fan, the Touch Bar, and the microphone array. If you don't care about those, the Air is better and you save $300.

The differences between the A14 Bionic and the M1 are just the numbers of high-performance cores (2 vs. 4) and GPU cores (4 vs. 8). Several days before this event, there was a leaked Geekbench 5 score of ~7000 for a chip called "A14X" (which does not officially exist). The A14 scores ~4000, and if you assume the high-performance cores are doubled while the high-efficiency cores stay at 4, ~7000 seems a reasonable score for the M1. That score is close to the Ryzen 7 4800HS or Core i7-10875H, which are relatively high-end mobile processors (though not the highest), so by itself it is not amazing yet. However, if the power needed for the computation is expected to be 1/3 or 1/4, it suddenly becomes the king of performance per watt. Probably the closest competition is the Ryzen 7 4800U, which already has almost insane perf/watt with a score of ~6000 at 15W TDP; the M1 may well outperform it, which sounds even more insane. These numbers matter when you increase the core count further for more powerful desktops, so I want proper measurements to be done.

For scientific computing, these machines should be pretty good hardware. However, the software may not be ARM-native at the beginning, and your favorite package may not even work, so it might be wiser to buy one after making sure that what you want to use runs on these machines. These computers can replace pretty much every computer except high-end desktops, so if you want to do very extensive computation, it's probably better to wait for the iMacs or Mac Pros that are coming later.

For me, if these computers can do neural network training on the Neural Engine cores with TensorFlow or PyTorch, I'd think about buying one for the World Making project. Even then, waiting for a more powerful iMac might be better. At least I'd like to see what happens over the next few weeks. This is a very exciting year for computing.