When is GPU better than CPU for Deep Learning on Apple Silicon?

Apple Silicon has brought about significant improvements in performance and energy efficiency, making it a viable choice for deep learning tasks. However, it is not always obvious whether the GPU or the CPU is the better choice for a given task. In this post, we’ll explore the conditions under which GPUs outperform CPUs on Apple Silicon when using deep learning libraries like TensorFlow and PyTorch.

To determine when to use a GPU over a CPU on Apple Silicon for deep learning tasks, I conducted several experiments using TensorFlow and PyTorch libraries. I used a base-model MacBook Air (M1, 8GB RAM, 7-core GPU) with a thermal pad mod to eliminate any potential throttling.

What does the running time depend on?

Generally speaking, GPUs can perform computations faster than CPUs when the computations are highly parallel and when memory access is not the bottleneck. For a simple feed-forward network, the parameters to consider are:

  • Number of layers
  • Number of hidden layer neurons
  • Batch size
  • Backend (TensorFlow or PyTorch)

While exploring different combinations of these numbers, I found that the computation time is roughly proportional to the number of layers, so the layer count does not change the speed ratio between the CPU and the GPU. I therefore fixed it at 4 layers for this exercise.

Now, let’s focus on the computations per layer. After tinkering with these numbers, I realized that the computation time mostly depends on the total amount of computation per layer. The total amount is determined by the number of hidden layer neurons and the batch size. More precisely, it can be calculated using this equation:

tflops = (2 * n_hidden * n_hidden + n_hidden) * batch_size / 1e12

This equation assumes densely connected layers without batch normalization and a ReLU activation function, which adds only a small overhead. To test the performance in different scenarios, I varied the batch size from 16 to 1024 and the number of hidden neurons from 128 to 2048.
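The formula above can be written as a small helper and swept over the ranges mentioned. This is only a sketch: the function name `tflops_per_layer` is borrowed from the figure labels, the power-of-two grid is an assumption (the post states only the end points of each range), and the reading of the individual terms in the docstring is an interpretation rather than something the post spells out.

```python
def tflops_per_layer(n_hidden: int, batch_size: int) -> float:
    """Tera floating-point operations for one dense layer and one batch.

    Interpretation (assumed, not stated in the post): 2 * n_hidden * n_hidden
    counts the multiply and add of the weight matrix product per sample, and
    the extra n_hidden term is a per-neuron cost such as a bias addition.
    """
    return (2 * n_hidden * n_hidden + n_hidden) * batch_size / 1e12


# Assumed grid: powers of two between the stated end points.
hidden_sizes = [128, 256, 512, 1024, 2048]
batch_sizes = [16, 32, 64, 128, 256, 512, 1024]

grid = {
    (n_hidden, batch_size): tflops_per_layer(n_hidden, batch_size)
    for n_hidden in hidden_sizes
    for batch_size in batch_sizes
}

smallest = min(grid.values())
largest = max(grid.values())
print(f"smallest: {smallest:.3e}, largest: {largest:.3e}")
```

Under this assumed grid, the per-layer workload spans roughly four orders of magnitude, which is why the results are easier to read against this quantity than against n_hidden or batch size alone.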

TensorFlow results

[Figure: TensorFlow results]

The results from my tests using TensorFlow showed that performance predictably improved with larger per-layer computations. While batch size isn’t explicit in the figure, it can be inferred from the values of n_hidden and tflops_per_layer. As long as tflops_per_layer is the same, different combinations of batch size and n_hidden perform similarly (except for cases with large networks, i.e., n_hidden 1024 or 2048, where memory allocation may have impacted the results). However, in all of the test cases, the performance of GPUs never exceeded that of CPUs.

PyTorch results

[Figure: PyTorch results]

The pattern of the results from using PyTorch is more irregular than that of TensorFlow, especially for the CPU. Generally speaking, PyTorch was faster than TensorFlow, except for some rare cases (n_hidden of 1024 or 2048 with a relatively small batch size). Notably, the performance of the CPU at small per-layer computations and the performance of the GPU at large per-layer computations were both impressive. The CPU achieved good performance (~700 Gflops) across a broad range of configurations, while the GPU exceeded that and reached ~1.3 Tflops. The theoretical performance of the GPU is ~2 Tflops, so it is getting close to that.

Conclusion

In this simple exercise, I demonstrated the following:

  • The matrix calculation performance improves as a function of the amount of computation per layer.
  • In my setup, PyTorch is faster than TensorFlow in general.
  • In PyTorch, GPU speed exceeds CPU speed at around 100 Mflops per layer.

Of course, these results will depend on multiple factors such as software version, CPU performance, or number of GPU cores, so I suggest running a test on your own. On the base M1 chip, the domain where the GPU outperforms the CPU was rather limited, but this might be very different on other chips that have larger GPU core counts.
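For readers who want to run such a test, a minimal timing harness might look like the sketch below. It is not the script used for the figures: `measure_gflops` and `dummy_step` are hypothetical names, and the dummy workload only stands in for one pass through a real network.

```python
import time
from typing import Callable


def measure_gflops(step: Callable[[], None], flop_per_step: float,
                   n_warmup: int = 3, n_repeat: int = 10) -> float:
    """Return achieved Gflops (1e9 floating-point operations per second)."""
    for _ in range(n_warmup):
        step()  # warm-up runs absorb one-time costs such as memory allocation
    start = time.perf_counter()
    for _ in range(n_repeat):
        step()
    elapsed = time.perf_counter() - start
    return flop_per_step * n_repeat / elapsed / 1e9


def dummy_step() -> None:
    # Hypothetical stand-in: replace with one pass through your own network.
    # On a GPU backend the step must wait for the device to finish,
    # otherwise only the time needed to queue the work is measured.
    sum(i * i for i in range(10_000))


# 4 layers, n_hidden = 512, batch size = 128, using the per-layer formula above.
flop_per_step = 4 * (2 * 512 * 512 + 512) * 128
result = measure_gflops(dummy_step, flop_per_step)
print(f"{result:.1f} Gflops (not meaningful for the dummy workload)")
```

Running the same harness once with the CPU and once with the GPU as the device, across several values of n_hidden and batch size, gives the kind of comparison described in this post.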

Software versions

TensorFlow environment:
Python: 3.9.5
TensorFlow: 2.6.0

PyTorch environment:
Python: 3.8.13
PyTorch: 1.13.0.dev20220709

Simple 34-key layout for happy Python coding

My 34-key keyboard, Ferris Sweep

A keyboard is a vital input device for computers, impacting work efficiency and even quality of life. While we often talk about ergonomics in terms of posture and the physical design of keyboards, the layout of the keys is also important. Unfortunately, most traditional keyboards come with a standard layout that hasn’t changed much in decades, leaving little room for customization. But things are starting to change.

Customizable keyboards like the Ergodox or Moonlander by ZSA Technology Labs are becoming more popular, and many people are even designing their own custom keyboards from scratch. With this increased flexibility, there’s a real opportunity to create a keyboard layout that works for you. That’s where this article comes in.

I’ve designed a simple layout for a 34-key keyboard that’s perfect for Python-writing computational neuroscientists who use NeoVim in VSCode as their main editor on Macs. Of course, this layout might not work for everyone, but it should give you some good ideas for how to customize your own keyboard. I’d also like to acknowledge that this layout stands on the shoulders of giants (introduced later!): I’ve borrowed some of the best ideas from existing layouts and added a few tweaks of my own. So if you’re tired of the same old keyboard layout and want to improve your typing experience, read on!

Why 34-key split keyboard?

Why use a 34-key split keyboard? Well, for starters, there are two main advantages: ergonomics and portability. Let’s dive into each of these.

When it comes to ergonomics, traditional keyboards have too many keys, and many of them are located in hard-to-reach positions. This can make typing uncomfortable and even lead to injury over time. However, by using more layers, you can bring all these keys back to comfortable positions and eliminate the need for the majority of keys found on a traditional keyboard. The custom keyboard community has moved from minor variations of the traditional design to more radically reduced layouts. If you find it uncomfortable to reach more than one key’s distance on a keyboard or use your pinky a lot, a 40% keyboard layout may be right for you. These layouts are often vertically staggered to fit the lengths of your fingers, allowing you to type more naturally and with less strain.

In addition to improved ergonomics, small split keyboards are also highly portable. With fewer keys, they can be made much smaller and lighter than traditional keyboards. For example, my 34-key split keyboard fits within 11 × 9 × 2 cm and weighs only 70 g per side. Even with both sides combined, that’s still lighter than an iPhone 13 mini! You can even make them wired or wireless for added convenience. With a wireless setup, you can use your keyboard anywhere without having to worry about cables.

The layout

Here is my layout. I used ZMK, and this layout will not be possible with QMK because it uses shifted symbols for tap-mods. (I started with ZMK and didn’t think much about compatibility with QMK, which I should have. At the same time, I feel that a layout should not be restricted by the limitations of the firmware.)

My layout

This layout is heavily inspired by Pnohty by rayduck (https://github.com/rayduck/pnohty), from which most of the symbol keys are taken, and by Ben Vallack’s video (https://www.youtube.com/watch?v=8wZ8FRwOzhU) for the idea of transitioning the layout. Overall, it should be understood as a slight modification of the Pnohty layout. Please refer to the original write-up by rayduck for the ideas implemented in Pnohty. I had two specific issues with the original Pnohty: 1) the modifier keys (all done as combos in Pnohty) were difficult to use, especially when modified inputs need to be repeated, and 2) some chording (simultaneous key presses) was hard for me, particularly for navigation.

For issue 1, I went back to home row mods (https://precondition.github.io/home-row-mods) instead of using combos. For issue 2, I changed the navigation layer to be modal.
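For readers unfamiliar with home row mods, the core idea is a key that sends a letter when tapped and acts as a modifier when held. The sketch below models only that decision. It is a simplification, not how ZMK implements it: real firmware also considers other keys pressed in the meantime, and the 200 ms threshold and the "a"/Ctrl binding are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapHoldKey:
    tap: str   # character sent on a quick tap
    hold: str  # modifier applied while the key is held


# Assumed threshold for this sketch, not a value taken from the layout.
TAPPING_TERM_MS = 200


def resolve(key: TapHoldKey, held_ms: int,
            tapping_term_ms: int = TAPPING_TERM_MS) -> str:
    """Return what a tap-hold key produces given how long it was held."""
    return key.hold if held_ms >= tapping_term_ms else key.tap


# Hypothetical binding: "a" on tap, Ctrl on hold.
home_key = TapHoldKey(tap="a", hold="ctrl")
print(resolve(home_key, held_ms=80))   # a
print(resolve(home_key, held_ms=350))  # ctrl
```

Because the modifier is a single held key rather than a combo, it can stay down while the other hand repeats the modified key, which is the situation that was awkward with combos.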

Packing everything you want into 34 keys is not easy, but it is certainly possible using layers. The concept of a layer already exists on a traditional keyboard in the form of the Shift key: it momentarily changes what each key types, so letters are capitalized, numbers turn into symbols, and symbols turn into other symbols. In modern keyboard firmware, you can design your own layers to type anything with fewer keys.

I have 4 main layers: ‘base (alphabets)’, ‘symbols’, ‘numbers’, and ‘navigation’. The base layer is the default one, and you don’t have to do anything special to be in this layer. The symbols layer includes most of the symbols that you need for normal programming. The numbers layer contains numbers and basic arithmetic operators. The period is also repeated in this layer for easy access to a decimal point.

Base Layer

For my base layer, I chose Colemak Mod-DH. I find this layout to be a good compromise between comfort and familiarity with the traditional QWERTY layout. It is also well-regarded by many users as an alternative to QWERTY.

Colemak Mod-DH is a modification of the original Colemak layout, with the positions of the D and H keys moved to the bottom row for greater comfort. If you prioritize the location of common keyboard shortcuts like “copy” and “paste” over optimal hand positioning, you may prefer to stick with the original Colemak layout.

Learning a new keyboard layout takes time and practice, but the benefits can be substantial. It generally takes a few weeks to a month to become comfortable with a new layout. During this time, you will likely start with a slow typing speed, gradually improving over the course of a week. With continued practice, you should be able to reach about 80% of your original typing speed after a month.

I recommend not completely abandoning your old layout or keyboard during this transition period, especially if you rely on typing for work. However, if you type frequently, the effort to retrain yourself with a new layout can be well worth it in the long run.

Symbol Layer

This layer contains symbols that are not used for arithmetic. To activate this layer, simply hold down the “layer 1” button located at the left thumb. Once you release the button, you’ll be back to the base layer. The symbol layer includes various commonly used symbols, such as parentheses and brackets. In Python, “[]” and “()” are used quite often, so they are placed in comfortable positions for easy access using the middle and ring fingers.

The location of the symbols on this layer was carefully considered, taking into account bigrams (two-key sequences) in Python. Additionally, I’ve added the home row mods as tap-mods to make it easier to access them.
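To get a feel for which symbol bigrams matter in your own code, something like the following sketch can be run over a few source files. It is not the analysis behind Pnohty or this layout; the function name `symbol_bigrams` and the sample string are made up for illustration.

```python
from collections import Counter
import string

SYMBOLS = set(string.punctuation)


def symbol_bigrams(source: str) -> Counter:
    """Count adjacent pairs of punctuation characters in a piece of source code."""
    counts = Counter()
    for first, second in zip(source, source[1:]):
        if first in SYMBOLS and second in SYMBOLS:
            counts[first + second] += 1
    return counts


# Made-up sample; in practice, read the text of your own .py files instead.
sample = "values = [f(x) for x in data[i][0]]\nprint(values[-1])\n"
for pair, count in symbol_bigrams(sample).most_common(3):
    print(pair, count)
```

Pairs that show up often are candidates for keys that are comfortable to press in sequence, ideally with different fingers.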

For more information and ideas behind the choice of symbol positions, you can refer to the original Pnohty write-up by rayduck.

Number Layer

The Number Layer contains numbers and symbols used for arithmetic. To activate this layer, simply press the key located under your right thumb, which is symmetric to the key used for the Symbol Layer. Some symbols such as ‘*’ are duplicated in this layer, even though they appear in the Symbol Layer.

What’s interesting about this layer is the order in which the numbers are arranged. Unlike a typical numpad, the numbers 1, 2, and 3 are located in the center, while 4, 5, and 6 are at the top, and 7, 8, and 9 are at the bottom. This is because 1, 2, and 3 are typed more often than the other digits. Additionally, 0 has been assigned a special location near the thumb for easy access. I also added an extra spacebar in this layer since you often have to insert spaces when you are writing equations.

Navigation layer

The Navigation layer is a special layer in my layout that works differently than the Symbol and Number layers. While the Symbol and Number layers are activated momentarily with a key press, the Navigation layer is activated with a toggle key, similar to the Caps Lock key. Once activated, you stay in the Navigation layer until you press the “go back to base (layer 0)” button.

The Navigation layer is designed to make basic text editing operations easier. It includes features for navigating text, selecting, cutting, copying, pasting, deleting, and typing enter or space within the layer. Since these operations can be awkward if you have to keep pressing another key, making the Navigation layer a modal layer makes sense.
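The difference between momentary and modal layers can be illustrated with a small state model. This is a conceptual sketch rather than ZMK code: the class and method names are hypothetical, and giving a held momentary layer priority over the modal layer is an assumption of the sketch.

```python
class LayerState:
    """Tracks the active layer for momentary (hold) and modal (toggle) layers."""

    BASE = "base"

    def __init__(self) -> None:
        self.held = None        # momentary layer, active only while its key is down
        self.modal = self.BASE  # modal layer, stays active until switched back

    def press_momentary(self, layer: str) -> None:
        self.held = layer

    def release_momentary(self) -> None:
        self.held = None

    def toggle_to(self, layer: str) -> None:
        self.modal = layer

    @property
    def active(self) -> str:
        # Assumption: a held momentary layer takes priority over the modal one.
        return self.held if self.held is not None else self.modal


state = LayerState()
state.press_momentary("symbols")
print(state.active)  # symbols
state.release_momentary()
print(state.active)  # base
state.toggle_to("navigation")
print(state.active)  # navigation (stays active with no key held)
state.toggle_to(LayerState.BASE)
print(state.active)  # base
```

With the modal layer, a run of navigation and editing keys needs no held key at all, at the cost of having to remember the explicit switch back to the base layer.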

One potential issue with modal layers is forgetting to switch back to the original layer, which can result in pressing unwanted keys. To mitigate this issue, I turned the left-hand home row into pure modifiers, although this does not eliminate the problem completely.

Other layers

The other layers provide non-essential functions (Bluetooth controls and left-hand arrow keys). If you are curious, you can check them in my repository (https://github.com/shixnya/zmk-config).

Concluding remarks

While ergonomic keyboards are attractive, designing a 34-key layout that fulfills all of your needs is not a trivial task. However, I hope that this article has provided some guidance and inspiration for those who are starting the journey of finding the optimal key layout for their needs. Don’t be afraid to experiment and tweak your layout as you go along, and remember that what works for someone else may not necessarily work for you. Ultimately, the goal is to create a personalized keyboard that maximizes your comfort, productivity, and overall typing experience. Good luck on your keyboard journey!