
Optimization and You

#1
If you aspire to do anything half-decent in programming, do yourself a favour and learn either C or Assembly properly. Not only will you learn about all the low-level intricacies of memory management, you'll also learn how to reinvent the wheel when you need to.

I've been learning assembly for quite some time now.

It's not a difficult language. Actually, it's insanely primitive. The difficulty stems from making something useful out of it, but generally, that's not a major issue if you have access to system calls or externally linked libraries.

So, while practicing my assembly skills, I tend to find simple challenges that would only be a few lines in C or whatever, and then try to port them over to ASM.

I then remembered an 'interview' question I heard a while back, used to filter out all the braindeads applying to be some Malaysian dude's programmer.

The question is simple, yet an astounding number of people couldn't solve it. Out of literally hundreds of applicants, only one guy managed it. Sure, it says a lot about the quality of code overseas, but have a look at the question:

Quote:Count down from 700 by 13 until you reach 200 (DO NOT GO BELOW 200!)

That's it. No catch. Nothing weird. No need to try and find the primes in that list or anything.

Simple stuff.

So, back to my story. I decided to use this problem for my assembly challenges. At the same time, I figured it would be cool to run a quick test to see if ASM was significantly faster than C/C++, so I made a few programs to demonstrate and compiled them with NASM and GCC respectively (with whatever optimization-related compiler flags I knew of, which I'll get into as well).



The Compiler.

Here are the sources I used. Pretty simple. No weird bitfuckery going on here.
C - https://pastebin.com/P8ZNwwSC
C++ - https://pastebin.com/zd0DmKfW
(N)ASM - https://pastebin.com/Mg9VT38s

Have a look at the difference in source code length. The C and C++ versions are almost identical aside from how they handle output.
All three programs produce the exact same output: start at 700 and print every decrement of 13 until the next step would dip below 200 (so 200 itself is never printed; the last value is 206).
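
The real sources are in the pastebin links above, so here's just a minimal sketch of the same logic in C for anyone who doesn't feel like clicking through (not necessarily line-for-line what the pastebin contains):

Code:
#include <stdio.h>

int main(void)
{
    /* Count down from 700 in steps of 13, never going below 200.
       The last value printed ends up being 206. */
    for (int i = 700; i >= 200; i -= 13)
        printf("%d\n", i);
    return 0;
}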

To compile the C samples, I used GCC/G++ to make my life easier.
Code:
gcc CTest.c -o CTest -Wall -s -O2
g++ CPPTest.cpp -o CPPTest -Wall -s -O2

To compile the ASM sample, I used NASM and then linked with ld.
Code:
nasm -o ASMTest.o -f elf64 ASMTest.asm
ld -m elf_x86_64 -s -o ASMTest ASMTest.o

Since I declared _start as a global symbol in the ASM source, ld picked it up as the default entry point. No need to f*** around with that.

Now, the first part of the tests: file size.
In each case, I used whatever compiler flags I knew of to make the output fast and small. -s is the most notable: it strips the symbol table and relocation/debug information from the resulting binary. In the C++ test with G++, for example, I initially compiled without -s and the file size was around 17.2 KB.

After everything was said and done, I ran a quick ls -l to get the actual byte sizes.

Code:
RUNNER ~/ASMTests # ls -l
total 60
-rwxr-xr-x 1 root root  8552 Sep 23 18:48 ASMTest
-rw-r--r-- 1 root root   666 Sep 19 10:27 ASMTest.asm
-rw-r--r-- 1 root root  1328 Sep 23 18:47 ASMTest.o
-rwxr-xr-x 1 root root 14384 Sep 23 18:53 CPPTest
-rw-r--r-- 1 root root   131 Sep 23 18:52 CPPTest.cpp
-rwxr-xr-x 1 root root 14336 Sep 23 18:57 CTest
-rw-r--r-- 1 root root   104 Sep 23 18:56 CTest.c
RUNNER ~/ASMTests #


The number after the 'root root' owner/group columns in each line is the raw byte size of the file.
For the source files, you can see that C and C++ are almost identical in size, but the ASM source is five to six times larger.

But what's interesting is when we get to the binaries.

I first thought 8.5 KB was pretty big for a file I wrote in ASM. I mean, shouldn't it at least be smaller than the source once it's all condensed down to machine code?

And sure enough, I was right, but I forgot to factor in one thing: ELF format file headers.

ELF is the standard executable format on Linux (and plenty of embedded systems), and it carries a lot of header and alignment data. Why it can't be condensed further is beyond me, but the easiest way to figure out the actual size of the code is to run readelf:

Code:
RUNNER ~/ASMTests # readelf -S ASMTest -W
There are 5 section headers, starting at offset 0x2028:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000401000 001000 000084 00  AX  0   0 16
  [ 2] .data             PROGBITS        0000000000402000 002000 000005 00  WA  0   0  4
  [ 3] .bss              NOBITS          0000000000402008 002005 000008 00  WA  0   0  4
  [ 4] .shstrtab         STRTAB          0000000000000000 002005 00001c 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
RUNNER ~/ASMTests #

For this, you need to know a bit about how executables are laid out, but the important part is that the actual code/logic lives in the .text section.

And how big is that segment?
Have a look at the size column (admittedly it's easier to read in a terminal.)

Eighty Four Bytes.
EIGHTY FOUR BYTES.

Oh wait, that's in hexadecimal. Convert it to decimal and you get 132 bytes.

Barely bigger than the raw C source code itself.
(The .data and .bss sections matter for this program too, but they only add another 0x5 + 0x8 = 13 bytes.)

So the file itself is only large because of the headers and section alignment (and a pretty far-back entry point). I have yet to figure out how far that can be condensed, if at all. Could be a fun project sometime.
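
If you ever want to pull those numbers out programmatically instead of eyeballing readelf, here's a rough Linux-only sketch using the structures from <elf.h> (my own illustration, not part of the original tests):

Code:
/* Sum the sizes of the allocated (loaded-at-runtime) sections of an ELF64
   binary: roughly the same numbers we read off readelf -S above. */
#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <elf64-file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    unsigned char *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    Elf64_Ehdr *eh = (Elf64_Ehdr *)base;
    Elf64_Shdr *sh = (Elf64_Shdr *)(base + eh->e_shoff);
    const char *strtab = (const char *)(base + sh[eh->e_shstrndx].sh_offset);

    unsigned long total = 0;
    for (int i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_flags & SHF_ALLOC) {   /* only sections mapped at runtime */
            printf("%-12s %6lu bytes\n",
                   strtab + sh[i].sh_name, (unsigned long)sh[i].sh_size);
            total += sh[i].sh_size;
        }
    }
    printf("total loaded: %lu bytes\n", total);

    munmap(base, st.st_size);
    close(fd);
    return 0;
}

Compile that with gcc, point it at ASMTest, and you get the per-section sizes plus a total, no hex conversion required.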

Back to the story.



Vulgar Display of SPEED.

This is where things get interesting.

If you weren't aware, you can use the 'time' command to get the execution time of a program or command. Use it as a prefix to whatever command, just as you would with sudo, and once the command finishes it'll print the timing (assuming you're in an interactive shell and not a cronjob or something).

time outputs three values: real, user and sys. user and sys are the CPU time spent in user mode and kernel mode respectively, and we don't really care about them here. The real value is the wall-clock time between pressing enter and the program exiting, or in other words, the actual execution time.
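
(Side note, and purely my own illustration rather than part of these tests: if you ever want to time a region of code from inside a C program instead of wrapping the whole binary in time, clock_gettime with a monotonic clock is the usual tool.)

Code:
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    /* the work being measured: the same countdown as before */
    for (int i = 700; i >= 200; i -= 13)
        printf("%d\n", i);

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    fprintf(stderr, "elapsed: %.6f s\n", elapsed);
    return 0;
}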

So, starting with the Assembly code as the last challenge's winner:

Code:
RUNNER ~/ASMTests # time ./ASMTest
700
[...snipped...]
206

real    0m0.001s
user    0m0.000s
sys 0m0.001s
RUNNER ~/ASMTests #

Whew, a single millisecond. Cool!
Note that because the ASM sample is written using syscalls exclusively for output, it doesn't spend any notable time in user mode and does everything at a kernel level.
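
For contrast, here's roughly what "output through syscalls only" looks like from the C side. This is my own sketch of the idea, not the pastebin source; printf, by comparison, formats and buffers everything in user space before the data ever reaches the kernel.

Code:
#include <unistd.h>   /* write() */

/* Print a non-negative integer plus a newline using nothing but the
   write(2) syscall: no stdio buffering, no user-space formatting library. */
static void print_num(long n)
{
    char buf[32];
    int  i = sizeof buf;

    buf[--i] = '\n';
    do {
        buf[--i] = '0' + (char)(n % 10);
        n /= 10;
    } while (n > 0);

    write(STDOUT_FILENO, buf + i, sizeof buf - i);
}

int main(void)
{
    for (long i = 700; i >= 200; i -= 13)
        print_num(i);
    return 0;
}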


Now it's time to check the others, which are lagging a bit behind:

Code:
RUNNER ~/ASMTests # time ./CPPTest
700
[...snip...]
206

real    0m0.005s
user    0m0.004s
sys 0m0.002s
RUNNER ~/ASMTests #



Ouch! The C++ code ran in 5 milliseconds. Sure, it's not a noticeable difference, but that's mainly because this is a simple problem involving a single loop. If you're doing something heavier, like repeatedly binary-searching a 100,000-item array, you're going to wish your code ran five times faster.

Now, for the last one: C.

Before I go on, I'd like to address the common claim that "modern C++ is just as fast as C." You've probably heard it from that idiotic software engineering or computer science undergrad who just doesn't like not being able to import a library that does all the work for him.

Onto the test:

Code:
RUNNER ~/ASMTests # time ./CTest
700
[...snip...]
206

real    0m0.004s
user    0m0.001s
sys 0m0.003s
RUNNER ~/ASMTests #

Okay, not a huge increase in performance, I'll admit. But again, when you consider scalability, that's still a 20% increase in speed.

And don't forget: I just used whatever compiler optimizations I knew of. There are way more out there.


Conclusion

So, the king of filesize is...
ASM

And the king of runtime speed is...
ASM

But were we really surprised?



Why does it matter?

When you're doing anything at a relatively low level (e.g. GPIO pins on an Arduino, a Pi, or any other MCU/microprocessor), you're not only limited by the amount of space you have to store a program; you also want that program to run fast enough that it doesn't hold up any other operations that might be going on. Say you're using interrupts instead of the loop function to save power on an Arduino (good for you!): then you want those interrupt handlers to respond as close to instantly as possible.

Of course, it always depends on use-case.

But from a variety of posts I've read online (and there's a lot of skewed information out there), many users report up to a 70% increase in speed (almost DOUBLE) from replacing Arduino's digitalWrite function with their own C or ASM equivalent. For other methods and functions, there's likely even more performance to be gained.
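
To make that concrete, here's the kind of replacement those posts describe. This is just an illustrative sketch, not code or numbers taken from them, and it assumes a standard ATmega328P board (Uno/Nano) with a 16 MHz clock, where Arduino digital pin 13 maps to bit 5 of PORTB:

Code:
#define F_CPU 16000000UL        /* assumed 16 MHz Uno clock */
#include <avr/io.h>
#include <util/delay.h>

int main(void)
{
    DDRB |= (1 << DDB5);            /* PB5 (Arduino pin 13) as output     */
    for (;;) {
        PORTB |= (1 << PORTB5);     /* pin high: a single register write, */
        _delay_ms(500);             /* versus digitalWrite(13, HIGH),     */
        PORTB &= ~(1 << PORTB5);    /* which does a pin-to-port lookup,   */
        _delay_ms(500);             /* a PWM check and interrupt masking  */
    }                               /* on every single call               */
}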

So optimize your code. Just because your PC has 16GB of RAM doesn't mean you need to use all of it constantly.
#2
You're completely right.

BASIC is fine, as are Python and C++, but it's in C and assembly where the people who do crazy stuff with systems shine.

Plus, I think I've only found a BASIC and a C compiler for old consoles like the Genesis, Atari 2600, NES, and C64.

beyond that it's pure assembly code.

Kinda hard to imagine that Sonic the Hedgehog, Super Mario Bros., Pac-Man, Asteroids, and Donkey Kong were all written in assembly, huh?
"I reject your reality and subsitute my own." - Adam Savage, Mythbusters
#3
(September 24th, 2019 at 4:03 AM)SpookyZalost Wrote: You're completely right. [...]

Funnily enough, I've actually started seeing BASIC come back as a demand in the security/systems sectors of software engineering/tech. A bunch of recruiters have approached me at hackathons (not the other way around, with me visiting their booth) and asked if I wanted a co-op placement (I'm a student; it's basically the equivalent of a paid internship done through the school).

In almost every single brochure/pamphlet/folder they've given me with skills or job descriptions, BASIC has been on the list.

I've played around with a BASIC terminal before, actually. There's a pretty cool Amstrad CPC (Z80-based) emulator for Windows called WinAPE. It comes with the stock BASIC environment, but also includes an assembler. That's mainly where I started learning anything about assembly, actually.

I consider it the Python of the assembly world, since as far as I know it's pretty much all interpreted or JIT-compiled. At least that's what it felt like when I briefly played with it in WinAPE.



Unless you're talking about BASIC as in Visual Basic. If that's what you meant, that's absolutely unforgivable, as the language was only ever meant to get older devs to migrate to the .NET/C# platform Tongue
#4
I took a Visual Basic class after dropping out of high school, before getting my GED, but no... I meant BASIC as in the language that was used heavily on home computers in the 1970s, '80s, and early '90s, considering many of them came with a BASIC programming environment, an assembler, and an interpreter built in.
"I reject your reality and subsitute my own." - Adam Savage, Mythbusters
#5
Nice post, +rep. Thank you for sharing!

I wish that people would spend more time attempting to properly optimize applications. Interpreted languages (especially "slower" languages, such as Python) are becoming increasingly common. Java is relatively decent as far as performance is concerned. C is obviously great. C#, a managed, JIT-compiled language, is gaining some traction over classic C and C++, and overall, people care much more about cross-platform options than they do about efficient, high-performance options.

It's to be expected. 15 years ago, when the best anyone had was a Pentium 3 or a Pentium 4, before gigabytes of RAM and multi-core processors, people had no real choice but to make sure their applications were highly optimized. As a result, if the application had any sort of demanding system requirements, you would pretty much have to do it in Java or C. Python wasn't going to cut it.

Nowadays, this is much less of a focus, as you mentioned. For example, Mac OS is now integrating a new cross-platform toolkit and a set of libraries to let developers easily port iOS applications to Mac OS (coming with the new version of the OS next month). Much of the groundwork for this had already been completed for Mac OS Mojave, which is out now, and several iOS apps (such as Stocks, News, and Reminders) were ported to the desktop as part of that release. These apps, despite launching very quickly on the iPhone, are very slow to launch on a presumably more powerful MacBook. (Although some of the newest iPhones have a total smoke show of a CPU built in. Apple truly undersold these things.)

So many additional libraries have to be dynamically loaded that launch times can easily stretch into the several-second range. These are the kinds of launch times you would expect from a pre-Core 2 Duo CPU (a Pentium 4, say). After 10-15 years, we haven't made a whole world of progress in terms of end-user performance (though we've definitely made strides in the right direction). Instead, all of this extra computational power goes toward helping the developer.

In the end, I don't think this will be changing. Developers seem to work for other developers, and not so much for the user. If the development can be done more easily, or be ported to more devices, the incentive for developers to work on projects is greater. As a result, the ecosystem of developers is larger, more diverse, and more qualified, and we end up with more applications that have more features and are ported to more devices. But the end user ultimately pays the price in the performance gains they would otherwise see.

#6
(September 25th, 2019 at 12:25 AM)Darth-Apple Wrote: Nice post, +rep. Thank you for sharing! [...]

Funny you mention MacOS/iOS, because you're absolutely right about the mobile-to-desktop porting stuff.

Apple devs are probably the biggest sluts for optimization, actually, since everything is standardized. Each iPhone (assuming the same model) will always have the same specifications. Same battery, same CPU, same chipsets, etc. And because it's standardized, Apple can also write iOS with those specific parts in mind.

No extra drivers. No extra unused kernel modules (note that iOS and MacOS are heavily based on BSD). No extra runtimes because Swift (and ObjC before it) both compile natively. There's very little overhead, and Apple can make it work because they don't need to worry about any extra variables and they can focus on making their software work near perfectly for the hardware they choose for each model.

Of course, there is some overhead, since the same firmware image can be installed on any (supported) device, so backwards compatibility accounts for that last little bit of extra weight, but it's more or less negligible.

I really do hope the toolkit makes some waves. I personally don't own any Apple products (a little too expensive for the dumpster-diver in me) but I have been debating getting a Macbook Pro or something for an incredibly long time. But I can only imagine that performance is gonna skyrocket and we can say that we've truly moved into the mobile era.

And with it comes learning RISC-V and ARM64 assembly.
Put me out of my misery; x86 is hard enough (although I'd only really be learning a new ISA and a few specifics like endianness or whatever, so I guess the knowledge is somewhat portable).



Otherwise, yeah. It's a bit depressing to see Node.js take up 500 MB for a simple bot that prints Hello World a few times to stdout.

I especially wish that JS devs cared more about performance. I mean, to some degree they do, but their 'solutions' are just quick and dirty hacks that somehow make their way upstream into releases of Node, jQuery, or the default ECMAScript-based browser sandboxes. No real attempts at improving the dumpster fire. I would probably use it regularly if it were a little better, since I do actually enjoy working with it for simple web projects, but I can't stand the overhead the runtime produces.

And don't get me started on the Electron framework. It literally takes me hours just to compile it. That's traumatized me enough.

At least Ruby and Perl devs seem to care about optimization. I'm not sure where Python is headed since van Rossum stepped down.



(September 24th, 2019 at 10:53 PM)SpookyZalost Wrote: I took a Visual Basic class after dropping out of high school, before getting my GED, but no... I meant BASIC as in the language that was used heavily on home computers in the 1970s, '80s, and early '90s, considering many of them came with a BASIC programming environment, an assembler, and an interpreter built in.

Yeah, lmao. You know how horrible it is. VB is another dumpster fire; I can't fathom that people actually use it for anything.

I can only imagine that's what BASIC is still used for: legacy systems. Since most of the companies I mentioned in my last post specialize in security/contracting, I imagine they work with a lot of legacy military systems or something along those lines.
#7
It would make sense. I'm sure BASIC is used in places like missile silos, NORAD, and other installations where it's difficult and expensive to upgrade hardware without introducing new security risks.

Security through obscurity and all that.
"I reject your reality and subsitute my own." - Adam Savage, Mythbusters

