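VS Code Neovim (for people who like Vim): Recommended if Vim is your main editor. There is also a Vim plugin, but it does not fully reproduce Vim and implements only a subset of its features. The Neovim plugin, in contrast, runs the Neovim installed on your system directly inside VS Code, so all of Neovim’s features are available.
Remote development extension pack (if you want to work on a remote machine): This is currently my biggest reason for using VS Code. When you connect to a remote machine over SSH, it automatically installs the server-side software there and lets you operate it from your local VS Code. It is handy when you want to access a computer in a distant location, and also when you want to do something a Mac cannot do while keeping the Mac interface (for example, using an Nvidia GPU in a Linux machine). Either way, it feels like editing files directly on your local machine.
Other editor settings
Formatter: Black Keeping code neatly formatted is very good for maintenance, but thinking about formatting while coding is tedious, and fixing it after the code is written is tedious too. Black, however, formats the code for you automatically. It’s great. For the various benefits of using Black, please see this blog post. If you check “Editor: Format On Save” in VS Code, the code is formatted automatically every time you save.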
Apple Silicon has brought significant improvements in performance and energy efficiency, making it a viable choice for deep learning tasks. However, it is not always obvious whether the GPU or the CPU is the better choice for a given task. In this post, we’ll explore the conditions under which GPUs outperform CPUs on Apple Silicon when using deep learning libraries like TensorFlow and PyTorch.
To determine when to use a GPU over a CPU on Apple Silicon for deep learning tasks, I conducted several experiments using TensorFlow and PyTorch libraries. I used a base-model MacBook Air (M1, 8GB RAM, 7-core GPU) with a thermal pad mod to eliminate any potential throttling.
What does the running time depend on?
When comparing the performance of GPUs and CPUs for deep learning tasks, several factors come into play. Generally speaking, GPUs compute faster than CPUs when the computation is highly parallel and requires little memory access. For a simple feed-forward network, the relevant factors are:
Number of layers
Number of hidden layer neurons
Batch size
Backend (TensorFlow or PyTorch)
While exploring different combinations of these numbers, I found that the computation time is mostly proportional to the number of layers, which means the number of layers does not change the speed ratio between the CPU and the GPU. I therefore fixed it at 4 layers for this exercise.
Now, let’s focus on the computations per layer. After tinkering with these numbers, I realized that the computation time mostly depends on the total amount of computation per layer. The total amount is determined by the number of hidden layer neurons and the batch size. More precisely, it can be calculated using this equation:
This equation assumes a dense connection without batch normalization and uses the ReLU activation function, which has a small overhead. To test the performance in different scenarios, I varied the batch size from 16 to 1024 and the number of hidden neurons from 128 to 2048.
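As a rough illustration (an assumption, not necessarily the exact formula behind the figures), a common estimate for one dense layer with n_hidden inputs and n_hidden outputs is 2 × batch_size × n_hidden² floating-point operations for the forward pass: one multiply and one add per weight per sample. The sketch below applies that estimate to the ranges described above. The function names and the power-of-two grid steps are hypothetical, and the actual figures may use a different constant (for example, one that also counts the backward pass).

```python
from itertools import product


def dense_layer_flops(batch_size: int, n_hidden: int) -> int:
    """Estimated forward-pass FLOPs for one n_hidden x n_hidden dense layer.

    Assumption: 2 * batch_size * n_hidden**2 (one multiply and one add per
    weight per sample). Bias and ReLU costs are ignored as negligible.
    """
    return 2 * batch_size * n_hidden * n_hidden


# Hypothetical grid: powers of two spanning the ranges quoted in the text.
BATCH_SIZES = [16, 32, 64, 128, 256, 512, 1024]
HIDDEN_SIZES = [128, 256, 512, 1024, 2048]


def build_grid():
    """Return (batch_size, n_hidden, flops) tuples sorted by estimated FLOPs."""
    grid = [
        (b, n, dense_layer_flops(b, n))
        for b, n in product(BATCH_SIZES, HIDDEN_SIZES)
    ]
    return sorted(grid, key=lambda row: row[2])


if __name__ == "__main__":
    for batch_size, n_hidden, flops in build_grid():
        print(f"batch={batch_size:5d}  n_hidden={n_hidden:5d}  "
              f"~{flops / 1e6:10.1f} Mflop per layer")
```

Under this estimate, different combinations can land on the same per-layer cost (for example, batch size 64 with n_hidden 128 and batch size 16 with n_hidden 256), which is what makes comparisons at equal per-layer computation possible.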
TensorFlow results
The results from my tests using TensorFlow showed that performance predictably improved with larger per-layer computations. While batch size isn’t explicit in the figure, it can be inferred from the values of n_hidden and tflops_per_layer. As long as tflops_per_layer is the same, different combinations of batch size and n_hidden perform similarly (except for cases with large networks, i.e., n_hidden 1024 or 2048, where memory allocation may have impacted the results). However, in all of the test cases, the performance of GPUs never exceeded that of CPUs.
PyTorch results
The pattern of the PyTorch results is more irregular than that of TensorFlow, especially for the CPU. Generally speaking, PyTorch was faster than TensorFlow, except in some rare cases (n_hidden of 1024 or 2048 with a relatively small batch size). Notably, the CPU performance at small per-layer computations and the GPU performance at large per-layer computations were both impressive. It was great to see the CPU achieve good performance (~700 Gflops) across a broad range of configurations, while the GPU exceeded that and reached ~1.3 Tflops. The theoretical performance of the GPU is ~2 Tflops, so the measured value is roughly 65% of the peak.
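Throughput numbers like these come from dividing an operation count by a measured wall-clock time. Below is a minimal, backend-agnostic sketch of that bookkeeping. `measure_throughput`, `fraction_of_peak`, and the toy workload are hypothetical names for illustration, not the code used for the figures. With a real GPU backend, the timed callable would also need to wait for the device to finish, since work is typically queued asynchronously; that part is left out here.

```python
import statistics
import time
from typing import Callable


def measure_throughput(step: Callable[[], None], flops_per_step: float,
                       warmup: int = 3, repeats: int = 10) -> float:
    """Return achieved FLOP/s for `step`, using the median of `repeats` timings."""
    for _ in range(warmup):  # warm-up runs are executed but not timed
        step()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        step()
        timings.append(time.perf_counter() - start)
    return flops_per_step / statistics.median(timings)


def fraction_of_peak(achieved: float, peak: float) -> float:
    """Achieved throughput as a fraction of the theoretical peak."""
    return achieved / peak


if __name__ == "__main__":
    # Toy stand-in workload: a pure-Python dot product of length n,
    # counted as 2 * n floating-point operations (multiply + add).
    n = 100_000
    a = [1.0] * n
    b = [2.0] * n

    def step() -> None:
        sum(x * y for x, y in zip(a, b))

    rate = measure_throughput(step, flops_per_step=2 * n)
    print(f"toy workload: {rate / 1e6:.1f} Mflops")
    # Figures quoted in the text: ~1.3 Tflops measured vs ~2 Tflops peak.
    print(f"GPU fraction of peak: {fraction_of_peak(1.3e12, 2e12):.0%}")
```

Using the median rather than the mean keeps a single slow run (for example, one interrupted by another process) from skewing the result.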
Conclusion
In this simple exercise, I demonstrated the following:
The matrix calculation performance improves as a function of the amount of computation per layer.
In my setup, PyTorch is faster than TensorFlow in general.
In PyTorch, the GPU becomes faster than the CPU at around 100 Mflops per layer.
Of course, these results will depend on multiple factors such as software version, CPU performance, or number of GPU cores, so I suggest running a test of your own. On the base M1 chip, the domain where the GPU outperforms the CPU was rather limited, but this might be very different on other chips that have more GPU cores.
I switched from Matlab to Python for data analysis. It was a little tricky to set up an environment that I like. Here I provide a list of important elements in my environment with a short description for each.
Machine: An Intel Mac
Why Mac? Because the Control and Command keys are separate. This matters for a consistent UI feel, especially if you use Vim key bindings in VSCode. For example, Ctrl-V starts blockwise Visual mode in Vim, and on a Mac it stays separate from pasting (Command-V). On Windows or Linux, both functions compete for Ctrl-V, so you’d have to give one up.
Why an Intel processor? Because some of the numerical libraries still don’t work on Apple Silicon Macs. It doesn’t feel nice if you always have to suspect that your processor could be the reason whenever you encounter an error. If you work exclusively on a remote server and don’t run any computation locally (or only use libraries that are known to work), you can use an Apple Silicon Mac. This recommendation will change in the near future because Apple chips are great, and the community is making great progress in making things compatible.
I personally don’t feel that Apple’s keyboards are great. If you spend a lot of time typing, I’d recommend getting a keyboard you are comfortable with. I like Topre RealForce, but any decent keyboard would be fine. You can use Karabiner-Elements if you want to use a PC keyboard with rebinding.
Editor: Visual Studio Code (VS Code)
VS Code is an open source (but see this) editor from Microsoft. It is a versatile text editor that is compatible (with extensions) with many languages including Python.
A remarkable feature of VSCode is its remote capability. This feature installs a copy of VSCode on the remote machine and runs it as a server. You can then edit files in the local VSCode as if they were local, while saving to and running on the remote computer. This is a unique feature of VSCode that distinguishes it from other editors. Below, I explain the detailed settings of the editor in my environment.
Extensions
Python plugin: This is VSCode’s basic Python support. Why not.
Pylance: This is an extension that provides advanced code-completion and linting. Good to have.
Jupyter: Most likely, you will be using Jupyter notebooks if you do a collaborative data science project. I’m not a fan of notebooks because lengthy cell outputs get in the way of code navigation. I think Matlab’s cell style works better for the task. In VSCode, there is an interactive mode that resolves the issue (it also uses the Jupyter plugin).
VSCode Neovim (optional): This is nice if you come from Vim or like Vim keybindings. I prefer the Neovim plugin over the Vim plugin. The Vim plugin tries to emulate Vim’s keybindings, whereas the Neovim plugin runs an actual Neovim instance under the hood. This enables some actions that don’t work well with the Vim plugin (e.g., the behaviors of ‘.’ and undo/redo are a little strange in the Vim plugin). It is important to install Neovim >= 0.5.0. If you are using Homebrew for package management, run “brew install --HEAD neovim”.
Remote development extension pack (if you work with a server): As noted above, this is one of the main reasons I use VSCode. It installs a VSCode server on the remote machine and connects your local client to it, so the UI runs locally (and stays responsive) while you work directly with remote files and executables. This is also useful when you want to separate your client user interface from your server environment. In particular, it resolves the dilemma of wanting the Mac interface while doing deep learning on Nvidia GPUs. You can easily set up a Linux computer with a good GPU and connect to it remotely from your Mac, with the feeling of working directly on that computer.
Other editor settings
Formatter: Black Code formatting is important for keeping the code legible. Black automatically formats the code for you. The same logic, the same result. Your preferences may differ slightly from its settings, but the point here is consistency. If you and your colleagues use Black, you don’t have to parse your colleagues’ code to identify their coding style or feel guilty about mixing in your own. I just go with Black’s default settings for everything. Please read this nice article if you are not convinced yet. To use Black, set “Python > Formatting: Provider” to “black”. I also recommend keeping “Editor: Format On Save” checked in the settings.
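To make “the same logic, the same result” concrete: a formatter changes only the layout of the code, never its meaning. The sketch below compares two hand-written versions of the same function using the standard `ast` module. The tidy version is only an approximation of the style Black aims for, not actual Black output, and `same_logic` is a hypothetical helper.

```python
import ast

MESSY = """
def scale(values,factor = 2):
    return [ v*factor for v in values ]
"""

TIDY = """
def scale(values, factor=2):
    return [v * factor for v in values]
"""


def same_logic(source_a: str, source_b: str) -> bool:
    """True if both sources parse to the same abstract syntax tree."""
    # ast.dump omits line and column attributes by default, so pure
    # layout differences do not affect the comparison.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


if __name__ == "__main__":
    print(same_logic(MESSY, TIDY))
```

Because the two versions share one syntax tree, a reviewer can ignore the layout and focus on the logic.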
Color theme: Monokai If you use Pylance, you get a special mode of syntax highlighting called “semantic highlighting”. It parses the code a little deeper than a typical syntax highlighter and understands the distinctions between classes, methods, local variables, function arguments, etc. However, few color themes distinguish all of these elements. In my opinion, Monokai does this best. It uses different colors, underlines, and italic fonts to keep all of the elements separate, without making things too confusing.
Misc settings: see attachment I also change a number of other settings, such as font size, ruler settings, etc. I attached my settings.json to this article. Please check it out if you feel something is not working correctly.
Debugging: set “justMyCode” to false The key is to set up a good debugging configuration file (launch.json). Please see this official guide for how to set it up. It’s definitely worth knowing. Often, the errors you encounter have something to do with your inputs to library functions. With its default settings, the VSCode debugger does not stop inside library code in such a case. By setting “justMyCode” to false in your launch.json, the debugger will stop in such a situation.
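As a small illustration of the kind of error meant here, the script below passes a bad input to a standard-library function, and the exception is raised inside the library rather than in the script itself. `average` and `innermost_frame_file` are hypothetical names, and the helper only shows where the traceback ends. In launch.json, the corresponding entry in the debug configuration is `"justMyCode": false`.

```python
import os
import statistics
import traceback


def average(values):
    """User code: forwards its input to a library function unchanged."""
    return statistics.mean(values)


def innermost_frame_file(func, *args) -> str:
    """Call func(*args); return the file name where an exception was raised.

    Returns an empty string if the call succeeds.
    """
    try:
        func(*args)
    except Exception as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        return os.path.basename(frames[-1].filename)
    return ""


if __name__ == "__main__":
    # An empty list is invalid input for statistics.mean, so the error
    # originates in the library module, not in average().
    print(innermost_frame_file(average, []))
```

The traceback ends inside the `statistics` module, which is exactly the frame you want to be able to inspect when the root cause is your own input.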
A shortcut for interactive window: “Ctrl + .” I like having the option to send just one line of code to an interactive window to see the result. You can do this by setting up this shortcut. In the keyboard shortcut settings, search for “Jupyter: Run Selection/Line in Interactive Window” and assign this keybinding.
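For context, the interactive window also treats `# %%` comments in a plain .py file as cell boundaries, which gives a Matlab-like cell workflow without a notebook. A minimal sketch (the variable names are made up for illustration):

```python
# %% Load some data (first cell)
samples = [2.0, 4.0, 4.0, 5.0]

# %% Compute a summary (second cell; can be re-run on its own)
mean_value = sum(samples) / len(samples)

# %% Inspect a single result
# Place the cursor on the line below and send it to the interactive
# window with the shortcut to see the value without re-running the file.
print(mean_value)
```

The file remains an ordinary Python script, so it still runs from the command line and diffs cleanly in version control.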
Profiling: cProfile + pyprof2calltree (+ QCacheGrind) This is where I’m still struggling. See this nice blog post for how it works. However, in my opinion, this is still less useful than Matlab’s line profiler with its GUI. I’d like to know if there is something better.
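For reference, the cProfile half of that workflow needs only the standard library. The sketch below profiles a toy function and returns a text report of the most expensive calls. `slow_sum` and `profile_call` are hypothetical names, and the workload is only a stand-in for real analysis code.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Toy workload: sum of squares computed in a Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, top: int = 5) -> str:
    """Run func(*args) under cProfile and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the `top` entries.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_call(slow_sum, 200_000))
    # To keep the raw data for other tools, the profiler's stats can be
    # written to disk with profiler.dump_stats("out.prof").
```

The report is per function, not per line, which is the main gap compared with a line profiler.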
Summary
I quickly covered quite a few things about my coding environment. Unlike PyCharm, whose defaults are quite functional, VSCode needs a little more setup to be used effectively for data science in Python. However, once you know how to do it and patiently set things up, it becomes a very versatile environment. You won’t regret it. You may not like everything I use, but hopefully you can adopt one or two of the things above in your own environment.