This guide gives tips and ideas on how to speed up your programs using assembly language. The example dims an image/screen in FreeBasic using its inline assembler, but the same idea of mixing in some assembly can apply to any language.
The code comes from various games I have written, where the original version was too slow. The function was called every time the screen was refreshed, so it needed to be as fast as possible.
This is the original FB version:
    for j=0 to 479
        for i=0 to 639
            t=point(i,j)
            r=(t and &H00ff0000) shr 16
            g=(t and &H0000ff00) shr 8
            b=(t and &H000000ff)
            r=r shr 1
            g=g shr 1
            b=b shr 1
            pset(i,j),rgb(r,g,b)
        next i
    next j
Time taken : 0.076 seconds
It reads a point, extracts each colour channel, halves it and then plots the result back on the screen. It works, but it is not very quick because both 'point' and 'pset' are complex instructions that do a lot of work on every call.
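To make the masking and shifting concrete, here is a minimal sketch that applies the same steps to a single 32-bit colour value, with no screen involved. The sample colour and the variable name `dimmed` are made up for illustration.

```freebasic
' Halve each colour channel of one 32-bit pixel value (no screen needed).
Dim As ULong t = &H00FF8040                   ' example pixel: r=255, g=128, b=64
Dim As ULong r = (t And &H00FF0000) Shr 16    ' isolate red and move it down to 0..255
Dim As ULong g = (t And &H0000FF00) Shr 8     ' isolate green
Dim As ULong b = (t And &H000000FF)           ' blue is already in the low byte

r = r Shr 1                                   ' halve each channel
g = g Shr 1
b = b Shr 1

Dim As ULong dimmed = RGB(r, g, b)            ' RGB() also sets the alpha byte to &HFF
Print Hex(dimmed And &H00FFFFFF, 6)           ' prints 7F4020
```

Each channel drops to half its value (255 to 127, 128 to 64, 64 to 32), which is what the nested loops do for every pixel.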
This version uses a pointer to work directly on the screen:
    Dim buffer As UByte Ptr = ScreenPtr()
    Dim tmp As UByte
    for i as integer=0 to (640*480*4)-1
        tmp=*buffer shr 1   '' Read a byte from the screen, shift it and put it into tmp
        *buffer=tmp         '' Put our shifted byte back on screen
        buffer+=1           '' Move screen pointer to the next byte
    next
Time taken : 0.006 seconds
It is over 10 times quicker! It reads a pixel byte from the screen, shifts it and then puts it back.
This version also works directly on the screen, but does so with inline assembly:
    dim buffer As UByte Ptr = ScreenPtr()
    dim as uinteger c=(640*480*2)
    asm
        mov eax, [c]        ''counter
        mov ebx, [buffer]   ''screen buffer
    dtop:
        mov cx,[ebx]        ''screen pos
        shr ch, 1           ''shift colour bytes
        shr cl, 1
        mov [ebx],cx        ''put shifted colour back
        add ebx,2           ''move to next 2 bytes
        sub eax,1           ''decrease count
        jnz dtop
    end asm
Time taken : 0.002 seconds
Not as big a speed increase, but still 3 times faster again! It is very similar to the previous version, except that it reads the screen bytes two at a time, shifts both and then puts them back.
Note that there are probably assembly versions of this that are even faster but I think the above gets the point across.
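One such variation, sketched here in plain FreeBasic rather than assembly, handles all four bytes of a 32-bit pixel in a single operation. This is a commonly used masking trick, not the code from my games, and it has not been timed against the versions above. `DimBuffer` and `sample` are hypothetical names, and the sketch works on an ordinary array rather than the screen.

```freebasic
' Dim a buffer of 32-bit pixels by halving all four bytes of each pixel at once.
' DimBuffer is a hypothetical helper name used only for this sketch.
Sub DimBuffer(ByVal pixels As ULong Ptr, ByVal count As Integer)
    For i As Integer = 0 To count - 1
        ' Shift the whole pixel right by one bit, then clear the top bit of
        ' each byte so a bit shifted out of one channel cannot leak into the
        ' channel below it.
        pixels[i] = (pixels[i] Shr 1) And &H7F7F7F7F
    Next
End Sub

' Two sample pixels standing in for the screen.
Dim As ULong sample(0 To 1) = {&H00FF8040, &HFFFFFFFF}
DimBuffer(@sample(0), 2)
Print Hex(sample(0), 8)   ' prints 007F4020
Print Hex(sample(1), 8)   ' prints 7F7F7F7F
```

Like the byte-by-byte pointer version, this halves every byte, including the unused or alpha byte. Whether it actually beats the assembly loop depends on the compiler and machine, so it would need timing before being relied on.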
This example is cherry-picked in that it compares a slow, complex function to an optimised, specific one. The main points are:
See if there is a better method. Even the non-assembly version was over 10 times faster.
Find the slow parts of your program that would actually benefit from being optimised and target them. If a function is only called once in your program, spending time to make it take 0.1 second instead of 0.2 seconds is a waste of time. However, if that same function is called millions of times, shaving 0.1 seconds could massively speed up your program!
Time your functions. You don't want your hand-optimised function to turn out slower than the original...
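As a rough illustration of the timing point, the sketch below wraps a dimming loop in calls to FreeBasic's `Timer` function, which returns a time in seconds as a Double. It assumes an off-screen block of memory as a stand-in for the screen, and `BUF_BYTES`, `buf` and `elapsed` are made-up names.

```freebasic
' Time a dimming loop with Timer (returns seconds as a Double).
Const BUF_BYTES = 640 * 480 * 4                 ' one 640x480 screen at 4 bytes per pixel
Dim As UByte Ptr buf = CAllocate(BUF_BYTES)     ' zero-filled stand-in for the screen

Dim As Double t0 = Timer                        ' start time
For i As Integer = 0 To BUF_BYTES - 1
    buf[i] = buf[i] Shr 1                       ' the work being measured
Next
Dim As Double elapsed = Timer - t0              ' elapsed seconds

Print "Time taken : "; elapsed; " seconds"
Deallocate(buf)
```

A single run this short can be noisy, so running the loop many times and averaging the result generally gives a more trustworthy figure.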
There are many advantages to optimising your programs:
The main reason is that it will give your users a better experience. No one likes sluggish programs.
Your program will run better on lower-end hardware. On an Intel i7, the visible speed difference between the above functions is probably not noticeable, but on a Pentium 3 it would be. This is also better for the environment, as older hardware can be kept in use for longer.
The faster your program finishes, the sooner it can get onto the next task.
Last updated 19/01/2019