September 24th, 2019 at 3:50 AM
If you aspire to do anything half-decent in programming, do yourself a favour and learn either C or Assembly properly. Not only will you learn about all the low-level intricacies of memory management, you'll also learn how to reinvent the wheel when you need to.
I've been learning assembly for quite some time now.
It's not a difficult language. Actually, it's insanely primitive. The difficulty stems from making something useful out of it, but generally, that's not a major issue if you have access to system calls or externally linked libraries.
So, while practicing my assembly skills, I tend to find simple challenged that could be a few small lines in C or whatever, and then I try to port them over to ASM.
I then remembered an 'interview' question I heard a while back to filter out all the braindeads that were applying to be some Malaysian dude's programmer.
The question is simple, and an astounding number of people couldn't solve it. Out of literally hundreds of applicants, only one guy was able to solve it. Sure, it says a lot about the quality of code overseas, but have a look at the question:
That's it. No catch. Nothing weird. No need to try and find the primes in that list or anything.
Simple stuff.
So, back to my story. I decided to use this problem for my assembly challenges. At the same time, I figured it would be cool to have a quick test to see if ASM was significantly faster than C/C++, so I made a few programs to demonstrate, and compiled them with NASM and GCC respectively (with enough compiler flags for optimization, which I'll get into as well.
The Compiler.
Here are the sources I used. Pretty simple. No weird bitfuckery going on here.
C - https://pastebin.com/P8ZNwwSC
C++ - https://pastebin.com/zd0DmKfW
(N)ASM - https://pastebin.com/Mg9VT38s
Have a look at the difference in source code length. C/C++ are almost identical aside from how they handle output.
All these programs have the exact same output. Starts at 700, and prints every decrement of 13 until it dips below 200 (doesn't print 200, stops at 206.)
To compile the C samples, I used GCC/G++ to make my life easier.
To compile the ASM sample, I used NASM and then linked with ld.
Since I specified the main function as _start as a global for the ASM source, ld picked up on the default entry location. No need to f*** around with that.
Now, the first part of the tests: file size.
In each case, I used whatever compiler optimizations I knew to make it fast and have smaller file sizes. -s is the most notable, which basically strips comments and metadata during compilation. Most notably, in the C++ test with G++, I initially compiled without -s and the file size was 17.2 KB or so.
After everything was set and done, I ran a quick ls -l to get the actual byte sizes.
The number beside the 'roots' in each line is the raw byte size of each file.
For the source files, you can see that C/C++ are almost identical, but the ASM source is 5x/6x larger.
But what's interesting is when we get to the binaries.
I first thought that 8.5KB was pretty big for a file I wrote in ASM. I mean, shouldn't it at least be smaller than the source once it's all condensed into bytecode?
And sure enough, I was right, but I forgot to factor in one thing: ELF format file headers.
ELF is an executable format pretty much made for linux and microcontrollers, and has lots and lots of header data. Why it can't be condensed is beyond me, but the easiest way to figure out the actual size of the code is to launch readelf:
For this, you need to know a bit about assembly, but the important thing you need to know is that the actual code/logic is stored in the .text segment.
And how big is that segment?
Have a look at the size column (admittedly it's easier to read in a terminal.)
Eighty Four Bytes.
EIGHTY FOUR BYTES.
Oh wait, that's in hexadecimal. Convert it to decimal and you get 132 bytes.
Almost smaller than the raw C source code.
(.data and .bss segments are important as well for this program, but still, adding them all up is less than 100b.)
So the file itself is only large because of the headers (and a pretty far-back entry point.) I have yet to figure out if this is condensable, if at all. Could be a fun project sometime.
Back to the story.
Vulgar Display of SPEED.
This is where things get interesting.
If you weren't aware, you can use the 'time' command to get the execution time of a program or command. Use it as a prefix to whatever command, just as you would sudo and at the bottom, it'll always display execution time (assuming it's not a cronjob or something.)
time outputs three values: real, sys and user. sys and user refer to the time spent in either user mode or kernel mode, and we don't really care about them. The real time is the time between pressing enter and the return code of the application, or in other words, the actual execution time.
So, starting with the Assembly code as the last challenge's winner:
Whew, a single millisecond. Cool!
Note that because the ASM sample is written using syscalls exclusively for output, it doesn't spend any notable time in user mode and does everything at a kernel level.
Now it's time to check the others which are now lagging a bit behind:
Ouch! the C++ code ran in 5 milliseconds. Sure, it's not a noticeable difference, but that's mainly because this is a simple problem that only involves one loop. If you're searching a 100000 item array using binary sort, you're going to wish your code runs five times faster.
Now, for the last one: C.
Before I go on here, I'd like to first raise the common claim that "modern C++ is just as fast as C." You've probably heard it from that idiotic software engineering or computer science undergrad who just doesn't like not being able to import a library that does all the work for him.
Onto the test:
Okay, not a huge increase in performance, I'll admit. But again, when you consider scalability, that's still a 20% increase in speed.
And don't forget: I just used whatever compiler optimizations I knew of. There are way more out there.
Conclusion
So, the king of filesize is...
ASM
And the king of runtime speed is...
ASM
But were we really surprised?
Why does it matter?
When you do anything at a relatively low level (ie. GPIO pins on Arduino, a Pi, or any other MCU/microprocessor,) you're not only limited by the amount of space you have to store a program, but you also want that program to run fast enough that it doesn't lag any other operations that might be going on. Say you're using interrupts instead of a loop function to save power in arduino (good for you!), then you want those interrupts to respond to the exact nanosecond.
Of course, it always depends on use-case.
But from a variety of posts I've read online, although there's lots of skewed information, many users report up to 70% increase in speed (almost DOUBLE) from writing their code in C or ASM for the digitalWrite arduino function. For other methods and functions, there's likely much more performance to be gained.
So optimize your code. Just because your PC has 16GB of RAM doesn't mean you need to use all of it constantly.
I've been learning assembly for quite some time now.
It's not a difficult language. Actually, it's insanely primitive. The difficulty stems from making something useful out of it, but generally, that's not a major issue if you have access to system calls or externally linked libraries.
So, while practicing my assembly skills, I tend to find simple challenged that could be a few small lines in C or whatever, and then I try to port them over to ASM.
I then remembered an 'interview' question I heard a while back to filter out all the braindeads that were applying to be some Malaysian dude's programmer.
The question is simple, and an astounding number of people couldn't solve it. Out of literally hundreds of applicants, only one guy was able to solve it. Sure, it says a lot about the quality of code overseas, but have a look at the question:
Quote:Count down from 700 by 13 until you reach 200 (DO NOT GO BELOW 200!)
That's it. No catch. Nothing weird. No need to try and find the primes in that list or anything.
Simple stuff.
So, back to my story. I decided to use this problem for my assembly challenges. At the same time, I figured it would be cool to have a quick test to see if ASM was significantly faster than C/C++, so I made a few programs to demonstrate, and compiled them with NASM and GCC respectively (with enough compiler flags for optimization, which I'll get into as well.
The Compiler.
Here are the sources I used. Pretty simple. No weird bitfuckery going on here.
C - https://pastebin.com/P8ZNwwSC
C++ - https://pastebin.com/zd0DmKfW
(N)ASM - https://pastebin.com/Mg9VT38s
Have a look at the difference in source code length. C/C++ are almost identical aside from how they handle output.
All these programs have the exact same output. Starts at 700, and prints every decrement of 13 until it dips below 200 (doesn't print 200, stops at 206.)
To compile the C samples, I used GCC/G++ to make my life easier.
Code:
gcc CTest.c -o CTest -Wall -s -O2
g++ CPPTest.cpp -o CPPTest -Wall -s -O2
To compile the ASM sample, I used NASM and then linked with ld.
Code:
nasm -o ASMTest.o -f elf64 ASMTest.asm
ld -m elf_x86_64 -s -o ASMTest ASMTest.o
Since I specified the main function as _start as a global for the ASM source, ld picked up on the default entry location. No need to f*** around with that.
Now, the first part of the tests: file size.
In each case, I used whatever compiler optimizations I knew to make it fast and have smaller file sizes. -s is the most notable, which basically strips comments and metadata during compilation. Most notably, in the C++ test with G++, I initially compiled without -s and the file size was 17.2 KB or so.
After everything was set and done, I ran a quick ls -l to get the actual byte sizes.
Code:
RUNNER ~/ASMTests # ls -l
total 60
-rwxr-xr-x 1 root root 8552 Sep 23 18:48 ASMTest
-rw-r--r-- 1 root root 666 Sep 19 10:27 ASMTest.asm
-rw-r--r-- 1 root root 1328 Sep 23 18:47 ASMTest.o
-rwxr-xr-x 1 root root 14384 Sep 23 18:53 CPPTest
-rw-r--r-- 1 root root 131 Sep 23 18:52 CPPTest.cpp
-rwxr-xr-x 1 root root 14336 Sep 23 18:57 CTest
-rw-r--r-- 1 root root 104 Sep 23 18:56 CTest.c
RUNNER ~/ASMTests #
The number beside the 'roots' in each line is the raw byte size of each file.
For the source files, you can see that C/C++ are almost identical, but the ASM source is 5x/6x larger.
But what's interesting is when we get to the binaries.
I first thought that 8.5KB was pretty big for a file I wrote in ASM. I mean, shouldn't it at least be smaller than the source once it's all condensed into bytecode?
And sure enough, I was right, but I forgot to factor in one thing: ELF format file headers.
ELF is an executable format pretty much made for linux and microcontrollers, and has lots and lots of header data. Why it can't be condensed is beyond me, but the easiest way to figure out the actual size of the code is to launch readelf:
Code:
RUNNER ~/ASMTests # readelf -S ASMTest -W
There are 5 section headers, starting at offset 0x2028:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 0000000000401000 001000 000084 00 AX 0 0 16
[ 2] .data PROGBITS 0000000000402000 002000 000005 00 WA 0 0 4
[ 3] .bss NOBITS 0000000000402008 002005 000008 00 WA 0 0 4
[ 4] .shstrtab STRTAB 0000000000000000 002005 00001c 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
RUNNER ~/ASMTests #
For this, you need to know a bit about assembly, but the important thing you need to know is that the actual code/logic is stored in the .text segment.
And how big is that segment?
Have a look at the size column (admittedly it's easier to read in a terminal.)
Eighty Four Bytes.
EIGHTY FOUR BYTES.
Oh wait, that's in hexadecimal. Convert it to decimal and you get 132 bytes.
Almost smaller than the raw C source code.
(.data and .bss segments are important as well for this program, but still, adding them all up is less than 100b.)
So the file itself is only large because of the headers (and a pretty far-back entry point.) I have yet to figure out if this is condensable, if at all. Could be a fun project sometime.
Back to the story.
Vulgar Display of SPEED.
This is where things get interesting.
If you weren't aware, you can use the 'time' command to get the execution time of a program or command. Use it as a prefix to whatever command, just as you would sudo and at the bottom, it'll always display execution time (assuming it's not a cronjob or something.)
time outputs three values: real, sys and user. sys and user refer to the time spent in either user mode or kernel mode, and we don't really care about them. The real time is the time between pressing enter and the return code of the application, or in other words, the actual execution time.
So, starting with the Assembly code as the last challenge's winner:
Code:
RUNNER ~/ASMTests # time ./ASMTest
700
[...snipped...]
206
real 0m0.001s
user 0m0.000s
sys 0m0.001s
RUNNER ~/ASMTests #
Whew, a single millisecond. Cool!
Note that because the ASM sample is written using syscalls exclusively for output, it doesn't spend any notable time in user mode and does everything at a kernel level.
Now it's time to check the others which are now lagging a bit behind:
Code:
RUNNER ~/ASMTests # time ./CPPTest
700
[...snip...]
206
real 0m0.005s
user 0m0.004s
sys 0m0.002s
RUNNER ~/ASMTests #
Ouch! the C++ code ran in 5 milliseconds. Sure, it's not a noticeable difference, but that's mainly because this is a simple problem that only involves one loop. If you're searching a 100000 item array using binary sort, you're going to wish your code runs five times faster.
Now, for the last one: C.
Before I go on here, I'd like to first raise the common claim that "modern C++ is just as fast as C." You've probably heard it from that idiotic software engineering or computer science undergrad who just doesn't like not being able to import a library that does all the work for him.
Onto the test:
Code:
RUNNER ~/ASMTests # time ./CTest
700
[...snip...]
206
real 0m0.004s
user 0m0.001s
sys 0m0.003s
RUNNER ~/ASMTests #
Okay, not a huge increase in performance, I'll admit. But again, when you consider scalability, that's still a 20% increase in speed.
And don't forget: I just used whatever compiler optimizations I knew of. There are way more out there.
Conclusion
So, the king of filesize is...
ASM
And the king of runtime speed is...
ASM
But were we really surprised?
Why does it matter?
When you do anything at a relatively low level (ie. GPIO pins on Arduino, a Pi, or any other MCU/microprocessor,) you're not only limited by the amount of space you have to store a program, but you also want that program to run fast enough that it doesn't lag any other operations that might be going on. Say you're using interrupts instead of a loop function to save power in arduino (good for you!), then you want those interrupts to respond to the exact nanosecond.
Of course, it always depends on use-case.
But from a variety of posts I've read online, although there's lots of skewed information, many users report up to 70% increase in speed (almost DOUBLE) from writing their code in C or ASM for the digitalWrite arduino function. For other methods and functions, there's likely much more performance to be gained.
So optimize your code. Just because your PC has 16GB of RAM doesn't mean you need to use all of it constantly.