CH552 Assembly: 2 - Again in Assembly

Apr 11, 2025

Now that we have written our first program and verified that the toolchain works, we are going to rewrite our C program in assembly. Before we can really get started, we need to be able to produce output so that we can see the results of our programs. It would be nice to be able to handle input as well, so that we can interact, which will allow us to make more complex programs. This is exactly what we did in our C program, so now it's time to learn to do it in assembly.

To start with, SDCC leaves behind a lot of build artifacts, including the .asm assembly language files. We can use these to help us learn assembly. For example, if you haven't run make clean since compiling the previous program, you should have a main.asm file in the project directory. You can open that up in an editor and see the assembly code SDCC generated from main.c. There will also be a WS2812.asm with the assembly code from WS2812.c, though most of that file is inline assembly to begin with, so it won't be much different. What you will find that is different is a huge amount of setup information and a bit of extra code added by SDCC. We will go over what most of that does in future tutorials, but you can ignore most of it for now. A lot of the stuff there is for setting up variables in various parts of memory, but those areas are empty because we didn't create any variables in those locations. The labels that start with sdcc are special functions for maintaining thread safety in interrupts, which we aren't using and don't need. In main.asm you can scroll all the way down to _main, where our code starts. In WS2812.asm, there is only one function, _neopixel_show. SDCC uses the given function names, prepended with an underscore, as the labels that mark the beginning of functions. You will also notice that most of the assembly code generated by SDCC includes comments showing the C code the assembly represents. This can be really handy for figuring out how to translate C concepts into assembly code.

Let's get started. First, create a new project directory. In that directory create main.asm. We also need to copy the simplified NeoPixel library for the QT Py CH552 into this directory from ch552_libs/WS2812_QT_Py_CH552/. You might want to open a second terminal window and open main.asm from the previous tutorial. This will allow you to compare what the compiler generated with what we are writing. The generated file will likely look rather intimidating, but it's not as bad as it looks. We can skip a lot of what the compiler includes. Imperative programming languages (like C, Python, Java, and most common languages) generally lack the ability to convey intent. The compiler thus knows what each command should do but not why. This means that the compiler has to make a lot of assumptions about the programmer's intent, and it has to accommodate the broadest range of possible intents. Not only does this prevent certain kinds of optimizations, it requires much more verbose assembly to be generated. For example, it doesn't know whether a particular symbol is going to be used in another file or not, so it needs to make anything that could be used elsewhere visible at the global level. It also has to set up memory regions that we may never actually use. As the programmers, we do know our intent though, and that means we can skip a lot of fluff that the compiler has to include just in case.

Let's start by naming our module. This won't affect our program in any way, but it's probably a good habit to be in. If you eventually start writing libraries or more complex programs in assembly, module labels could become useful. You can name it whatever you want. I've named mine `main`, so that I know at a glance that this is the module containing the program starting location. We name modules with the `.module` directive. It is considered good style to indent directives and instructions by one tab, while labels and constant definitions are not indented.

	.module main

Next we need to set up our global references. Let's take a quick detour and learn about labels. Labels take the form of a name followed by a colon, and they are essentially markers for memory locations. A label represents the memory address of the object directly after it. This could be an instruction, or it could be a memory allocation. Labels can be used in place of any direct memory address in an instruction argument. When the assembler comes across a label, it notes the label and the memory address associated with it, and when it encounters an instruction that uses that label, it replaces the label with its address. Normally, the label is discarded once the assembly process is complete. This becomes a problem when a label is used in another file, or when an instruction uses a label that is defined in another file. The assembler doesn't know the addresses of labels in other files, so it can't replace those labels with their addresses. This is why we need a linker as part of the compilation process. The assembler leaves information for the linker when it finds addresses that it can't resolve on its own, and the linker handles them. For that to work though, the assembler needs to know when labels need to be preserved for the linker.
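As a minimal sketch of the two kinds of labels described above, here is a label on a memory allocation and a label on an instruction. The names DSEG, counter, and wait_here are made up for this illustration and are not part of our program. The .area and .ds directives are explained later in this tutorial.

```asm
	.area DSEG (DATA)
counter:                ; marks the address of one byte of internal RAM
	.ds 1

	.area TEXT (CODE)
wait_here:              ; marks the address of the instruction below
	inc counter         ; "counter" is replaced with the address of that byte
	sjmp wait_here      ; jump back to the address marked by wait_here
```

Neither label takes up any space in the final program. Each one is only a name for an address.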

We have to tell the assembler which labels and references to labels need to be preserved for the linker. This is what global references are for. Global references are created with the .globl directive.1 There are two places where we need to declare labels as global. One is labels in the current assembly file that will be used in external files. If you are following along in the assembly file created by SDCC in the previous tutorial, you will see that _main is in the list of labels declared global.2 _main needs to be declared global, because it could be called from SDCC's library modules.3 We don't need to do this, because we aren't using SDCC's library functions. In fact, our program doesn't contain any externally referenced labels, so we don't need to declare any labels within our program global.

The other place where we need to declare labels as global is external labels we will need to reference from within our program. Since we are using an external library for controlling the NeoPixel, we will need at least one global for that. We will actually need two. First, we are going to call neopixel_show(). Again, remember that SDCC adds an underscore to the beginning of the name. If you aren't sure what label the compiler translates a function name into, compile the library in question and look at the assembly code. We can look at WS2812.asm from the previous tutorial and see _neopixel_show in its list of globals. So our first global will be .globl _neopixel_show.

Next, we have to figure out how this function accepts arguments. The SDCC calling conventions for the MCS51 can be difficult to find in the SDCC documentation. On Windows, the documentation can be found at C:/Program Files/SDCC/doc/sdccman.pdf, if you installed to the default location. Section 4.1.5 (page 70, on my installation) contains this information. What we find is that the first argument is passed in DPL, DPH, B, and ACC, using as many of those as its size requires (so long as it isn't a bit or struct type). Only the first argument is passed this way. Any further arguments are passed in RAM (or on the stack, for reentrant functions). neopixel_show() takes two arguments, a two-byte memory address in external RAM and an 8-bit integer. The first argument will be put in DPL and DPH (which together are DPTR). The assembly code used in WS2812.c expects the len argument to be in B, but as the second argument it does not get passed there by default. I was able to correct this (so that you won't need to) by specifying the memory address to pass the argument in, using the memory address of the B register. As a result, the compiler treats this as a parameter passed in RAM. Near the end of 4.1.5.3, we learn that parameters placed in RAM are placed in a location labeled _<function_name>_PARM_<n>, where n is the parameter number (starting from 1). Since this is the second parameter of neopixel_show(), we can work out that this label should be named _neopixel_show_PARM_2. It is probably a good idea to verify this by looking at the assembly files. Looking in WS2812.asm from the first tutorial, we find the line .globl _neopixel_show_PARM_2, and further down (line 40 in mine) we find the constant _neopixel_show_PARM_2, assigned the value 0x00f0, which is the memory address of B. From here, we don't care that the argument is being passed in B. We only care about the constant assigned to the address it will be passed in. So we will add this to our globals in our assembly language program. Now the program should look like the following.

	.module main

; Setup external references
	.globl _neopixel_show
	.globl _neopixel_show_PARM_2

If you can't figure out how to call a particular C library function from assembly, you can always write a very simple C program which makes the function call in question, compile it with the same compiler switches as you are using for your project, and then look at the assembly code generated for the program and the library. If you don't have much assembly experience, it will likely be hard to understand at first, but it gets much easier with experience.

Now we need to define some constants. Remember in the C program we had to set up variables using the memory addresses of the special function registers (SFRs) we wanted to use? We don't have to do that here, but it will make our code much more readable and easier to write, so we should. You might notice in the assembly for the previous project that SDCC even does it (lines 28-34 in main.asm in mine). Just like SDCC does, we can define constants merely by assigning a value to a symbol name. There are some rules for symbol names. For example, they can only contain underscores, letters, and numbers, and they can't start with a number, much like C variable name rules. As usual, SDCC prepends them with underscores. We don't need to do that ourselves though. We need all of the same constants we defined as SFR variables in the C program, and we just need to assign to them their respective memory addresses. (Note that SDCC uses 0x00??. Because these are all 8-bit addresses, we can skip the leading zeros. I do this to indicate that these are 8-bit memory addresses, not 16-bit ones.)

; Define constants
SAFE_MOD  = 0xa1
CLOCK_CFG = 0xb9
P1_MOD_OC = 0x92
P1_DIR_PU = 0x93
P3_MOD_OC = 0x96
P3_DIR_PU = 0x97
P3        = 0xb0

You might have noticed the semicolon followed by text in our code. This is how you do comments in this particular version of assembly.

The next step is setting up various memory areas and allocating memory. First we have to set up the register segment. (This is mandatory. If this segment does not exist, the assembly will silently fail.) Strictly speaking, we only need to create the segment. We don't actually have to allocate register banks. That said, the linker can provide a nice graph of how we have allocated the internal RAM, and if we aren't explicit about how we are allocating things like registers, that graph won't be very useful. Also, the linker has some flexibility in how it allocates certain memory area types, and if we are using an external library that gives the linker this flexibility but we don't tell the linker how we are using memory, it could end up overlaying memory used by the library on top of memory that we are using and don't want touched. So if we are using an external library, it's probably a good idea to let the linker know what we are up to. Remember _neopixel_show_PARM_2? It is in an area declared .area OSEG (OVR, DATA). Because it is not declared absolute (ABS), the linker will attempt to put it in the lowest unallocated memory in internal RAM. If we don't allocate the lowest register bank, the linker would likely put it there. We don't want that, because we want to be able to trust that that bank won't be overwritten every time neopixel_show() is called. We will start by creating an area called RSEG, in the internal RAM, that is placed at a specific memory location.

The `.area` directive takes a name and a list of parameters. The name can be anything. The convention used by SDCC is to start with one or more initials identifying what it is for or where it is, followed by "SEG" for segment. Since this is the segment for registers, we will call it "RSEG". (Note that while you could name it something else, the linker may expect this name specifically for the purpose of producing memory maps and such.) The list of parameters tells the linker which memory bank to put the area in, whether it can overlay other stuff in the same area, and whether it is allowed to choose where in the area it can put allocated stuff. Multiple areas can be defined with the same name, and they will all be grouped together, so these arguments tell the linker about how it can or should arrange everything in multiple areas with the same name. In this case, we want to define the area to be in internal RAM, so we need to include DATA in the list, and we want to tell the linker where to put it rather than letting the linker choose, so we also need ABS.

; Set beginning of data area (register segment)
	.area RSEG (ABS, DATA)

Directly after this we will use another directive, .org, which tells the assembler to move its location counter to the specified address before continuing (here, .org 0x0000). This means that anything allocated after that within the current area will be allocated starting from the specified address. The CH552 has four banks of 8-bit registers, each containing 8 registers, and these start at the very beginning of internal RAM. Our program only really needs one of these register banks though, so we can leave the rest for the linker to use as it sees fit by allocating only the first 8 bytes as registers. We will do this by creating a new area in internal RAM and allocating 8 bytes.

; Allocate register banks
	.area REG_BANK_0 (REL, OVR, DATA)
	.ds 8

;	.area REG_BANK_1 (REL, OVR, DATA)
;	.ds 8

;	.area REG_BANK_2 (REL, OVR, DATA)
;	.ds 8

;	.area REG_BANK_3 (REL, OVR, DATA)
;	.ds 8

I've included the allocation code for the other three register banks as well, commented out, so that if we later want to use this program as a starting point for something more complex, we can just uncomment the additional banks we want to use.

There is some new stuff in these that we need to discuss. First though, you can find the documentation for this in C:/Program Files/SDCC/doc/sdas/asmlnk.txt, in section 1.4.22 (try opening this text file in your browser and use the browser's search capability to find the section). This includes a list of options, however it is missing some 8051-specific information. That can be found in C:/Program Files/SDCC/doc/sdas/README, under the heading "8051 ASSEMBLER". (The most important note in that section is that the _CODE should not be used for 8051 based CPUs.) We will go over the various options as we use them, but these are the official reference materials for this information.

Let's start with REL. According to the documentation, REL indicates that an area is relocatable. An area that is relocatable can be placed wherever the linker chooses to place it. This should generally be used if you don't care where it goes. The linker typically places relocatable areas starting at the next free memory region with enough room, after placing all absolute areas. In this case the RSEG segment is the only absolute segment in internal RAM, and it is empty (and should probably be left empty), so REG_BANK_0 will start at 0x00, which happens to be where the first register bank is. (I'm not sure why the register banks need to be relocatable, however I did some experimenting, and the linker doesn't get higher banks right unless they are relocatable. Additionally, allocating space in RSEG does not displace the register banks. This suggests the linker gives the register banks special treatment, perhaps based on their names.) Areas are relocatable by default, so this doesn't strictly need to be specified, but being explicit here is probably a good idea.

Areas declared with OVR are overlaid. This means they use the same memory as all other areas with the same name. You might notice that in the previous tutorial, both main.asm and WS2812.asm declare a REG_BANK_0 area. Declaring them as OVR means that they are literally using the same memory. They are overlaid on top of each other. This means that we need to be careful with how we use registers. When we call a function, we can't assume that the function won't change or overwrite data we've put in those registers, unless we've checked and verified that it doesn't. So if we want to preserve data in registers, we may need to push that data onto the stack or store it somewhere else in memory before calling a function, and then restore it after the function returns. The alternative to OVR is CON. CON is the default, and it concatenates areas with the same name rather than overlaying them. CON cannot be used with ABS, as ABS implies OVR.
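Here is a minimal sketch of preserving a register across a function call, as described above. It assumes register bank 0 is selected, that the stack pointer has already been initialized (covered later in this tutorial), and that ar7 is defined by hand the same way SDCC defines it in its generated files. The function some_function is hypothetical and only stands in for any routine that overwrites R7.

```asm
ar7 = 0x07                  ; direct address of R7 in register bank 0

	mov r7, #10             ; a value we still need after the call
	push ar7                ; PUSH needs a direct address, so we use ar7, not r7
	lcall some_function     ; may overwrite R7
	pop ar7                 ; R7 holds 10 again
done:
	sjmp done               ; stop here so we don't fall into the function

some_function:              ; hypothetical function that clobbers R7
	mov r7, #0
	ret
```

PUSH and POP only accept direct addresses, which is why the register's RAM address is used instead of its name.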

We are already familiar with DATA. That tells the linker that the area is in the internal RAM. We also have BIT, which puts the area in the 16 bytes of bit addressable memory starting at 0x20 in the internal RAM. If we need bit addressable memory, we would declare the area with BIT. XDATA specifies that we want the area in external memory, and CODE puts the area in program memory (which is the only place the CH552 can execute instructions from). Here's some commented code for allocating BIT and non-absolute DATA memory that might be good to include for future use.

; Allocate memory in the bit addressable
; segment of internal RAM.
;       .area BSEG      (BIT)
;       .ds 16   ; Allocates all 16 bits of the bit addressable memory

; This area can be used for
; relocatable data in internal RAM.
;       .area DSEG      (DATA)

We aren't going to use the stack directly in this program, but the function call and return instructions will use it, and the linker uses it to make the memory use graph look right, so we need to set the stack pointer anyway. The first step here is to define a stack area and create a label we can reference to set the stack pointer.

; Set the stack area
	.area SSEG
__start__stack:
	.ds 1

We can let the linker handle where to put the stack, and since we are only declaring one area named SSEG, we can just let it use the default behaviors (REL, CON, and DATA). The important part of this is that we are labeling the beginning of the stack. We have to allocate a minimum of one byte, otherwise the label will apply to whatever the next thing allocated is. When we start writing the actual code, we will use that label to initialize our stack pointer.

Now we need to allocate memory for our led_data global variable. As mentioned in the previous tutorial, the NeoPixel library expects it to be in external RAM, so that's where we need to put it.

; Allocate variables in external memory
	.area XSEG (XDATA)
led_data:
	.ds 3

Again, the defaults (REL and CON) are appropriate here, so we only need to specify which memory we want it in with XDATA. .ds 3 skips forward three bytes, allocating the memory we need for the NeoPixel color data.

Now we can finally start writing code! According to the documentation, the CH552 (and other 8051 CPUs) begin execution at memory address 0x0000. On page 27 of the datasheet, we see the addresses of the interrupt vectors, and we find that the first one starts at 0x0003. This presents a problem. If we put our entire program at 0x0000, it will overlap with the interrupt vectors. This is only acceptable if our program will never use interrupts, because otherwise, when an interrupt is triggered, execution will move to some place near the beginning of our code.

The last interrupt, the one for the watchdog timer, starts at 0x006B. When an interrupt triggers, it moves execution to the address of the interrupt vector. There are eight bytes between interrupt vector addresses, so each one has eight bytes to do whatever it is going to do and return. This isn't enough to do much, so common practice is for the interrupt vector to immediately jump to a function somewhere else in program memory that acts as the interrupt handler for the specific interrupt. This can be done with AJMP, LJMP, or SJMP. (It's not generally done with ACALL or LCALL, because the return address is already on the stack and the interrupt handler can use RETI to return to that address directly rather than returning to the interrupt vector.) AJMP and SJMP are each two bytes, and LJMP is three bytes. What this means is that, with very few exceptions, an interrupt vector will never need more than three bytes.

So why is this important? Well, if we know the last interrupt vector is at 0x006B, and we know that interrupt vector won't need more than three bytes, we can put our code three bytes beyond that address, and we don't have to worry about getting in the way of the interrupt. So the closest we can put the code to the interrupt vectors while allowing for the normal use case is 0x006E. For the sake of simplicity let's give it two more bytes, so that it starts at 0x0070. (Note that certain other CH55x microcontrollers have more interrupts and thus will need the program to be pushed back a little bit further.)

So we want our code to start at 0x0070, but execution starts at 0x0000. The first interrupt vector starts at 0x0003. That means we have exactly three bytes to get from 0x0000 to 0x0070. That's just enough for an LJMP! And that's exactly why the developers of the 8051 put exactly three bytes before the first interrupt vector. So, we need to put our LJMP instruction right at the beginning of the program memory. We are going to do this by creating an absolute area in code memory, putting it at address 0x0000, and then putting our LJMP in it.

; Program execution starts at address
; 0x0000 in code memory.  This is
; followed by interrupt vectors, so we
; have to use the 3 bytes available
; for an ljmp to the program code.
        .area ISR       (ABS, CODE)
        .org 0x0000
        ljmp    _start

I've named the area ISR, because it is where the interrupts are (ISR is for Interrupt Service Routines). I've seen it named INTV (for interrupt vectors) as well. If that makes more sense to you, you can give it that name, but you'll have to remember this for later when we get to writing the linker script. We set the starting address to 0x0000 with .org. Then we LJMP to _start. Why _start? Because that's what is commonly used in Linux when writing pure assembly programs (at least, for 32-bit programs on ARM), as the starting place for the program. Either the linker or the OS itself knows to begin execution at _start. We can name it anything we want here, so long as we give LJMP the right label, but I've named it _start because it will be familiar to anyone with Linux assembly programming experience.
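Our program doesn't use interrupts, but as a rough sketch of the common practice described earlier, here is how a vector for one interrupt could be filled in using the same absolute area. This assumes the standard 8051 Timer 0 vector at 0x000B (check the vector table in the datasheet before relying on it). The name timer0_handler is made up for this illustration, and the code that would enable the interrupt is not shown.

```asm
	.area ISR (ABS, CODE)
	.org 0x000B             ; assumed address of the Timer 0 interrupt vector
	ljmp timer0_handler     ; 3 bytes, well within the 8-byte slot

	.area TEXT (CODE)
timer0_handler:
	; ... the real interrupt handling would go here ...
	reti                    ; return directly to the interrupted code
```

Because the handler ends with RETI, execution never returns to the vector itself, which is why a jump is used rather than a call.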

Now we can write our program. We will start by creating a new code area. We don't want it to be concatenated to the ISR area, so we will give it a new name. Again, you can name it whatever you want. I've named it TEXT. If you are new to assembly, this might seem odd, as it's going to contain assembly instructions rather than text data. This is another familiarity thing. Traditionally, the area where the code goes is called the "text area". People familiar with assembly programming will be able to look at this assembly file and know exactly where the program code starts when they see this. While you can name it whatever you want, you should probably name it TEXT for now, to avoid later confusion. You might notice in a moment that this is going to be a relocatable area. How then are we going to put it at address 0x0070? We will do that in the linker script, and we have to use the same name there. So if you do choose to use a different name, keep in mind you'll have to change it in the linker script as well.

; This is our relocatable code area,
; where our program goes.
	.area TEXT (CODE)
_start:

There is the beginning of our _start "function".

First things first: We need to set up the stack. While we won't be using it directly in this program, it will actually get used. Regardless, it's probably good practice to set up the stack whether we will be using it or not, because if we later modify the program to use the stack, and we haven't set it up, it could end badly. So we will start our program with the following.

	; Set the beginning of the stack
	mov sp, #(__start__stack - 1)

The stack pointer is a special register, SP. It is used to keep track of the top of the stack. On 8051 based CPUs, the stack grows upwards, and the stack pointer is incremented before writing, so that it always points at the last byte written to the stack. We can see this in the "Action" section for the PUSH instruction on the CH552 Instruction Set Quick Reference Card (which you can download from here). The action is listed as "SP = SP + 1, @SP = direct". So first, the stack pointer is incremented, then the byte in the memory address given as the argument is stored at the new stack pointer location. POP does the reverse, "direct = @SP, SP = SP - 1". First it copies the data from the current stack location, then it decrements the stack pointer. You might have noticed that the address we put in the stack pointer is the beginning of the stack minus one. This way, when we push something to the stack, it will increment SP to point to the first byte of the stack and then write the data there. (On the Quick Reference Card, you will notice that PUSH and POP are not the only instructions that use the stack pointer. All of the "Subroutine Call / Return" instructions also use the stack, to store and retrieve return addresses. This is where our program will use the stack and why we absolutely must set it up.)
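To make the PUSH and POP actions concrete, here is a small sketch that traces SP through one push and one pop. It is not part of our program. It assumes the __start__stack label from the SSEG area above, and that the assembler recognizes acc as the direct address of the accumulator, as it does in SDCC's generated code. The value 0x5A is arbitrary.

```asm
	mov sp, #(__start__stack - 1)   ; SP = one byte below the stack
	mov a, #0x5A
	push acc        ; SP = SP + 1 (now __start__stack), then @SP = 0x5A
	mov a, #0x00    ; overwrite A so we can see POP restore it
	pop acc         ; A = @SP (0x5A), then SP = SP - 1
```

After the POP, SP is back at __start__stack - 1, exactly where it started.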

Now we are ready to replicate the code in main() of our C program. The first step is to set up the clock. We start by putting the CPU into safe mode, so that we can write to CLOCK_CFG.

	; Set clock to 16MHz
	mov SAFE_MOD, #0x55
	mov SAFE_MOD, #0xAA

Note that the first argument is always the destination, and that the # prefix marks an immediate (literal) value. Without the #, the assembler would treat the number as a direct memory address. Next we have to do this: CLOCK_CFG = (CLOCK_CFG & ~0b111) | 0x05;, but in assembly. We will go into more depth on math operations in a future tutorial, but for now we need to know how to do bitwise AND and bitwise OR. These are found in the "Logical" section of the Quick Reference Card. ANL is the instruction for bitwise AND, and ORL is the instruction for bitwise OR. First we need to AND the clock config register with ~0b111. This is 8 bits, so ~0b111 = 0b11111000. If you look at the code from the previous tutorial, you will find that the compiler has changed this to 0xf8. You can use that, or you can use ~0b111. The assembler can do the math to figure out what ~0b111 is in binary, so if you find that to be more readable, do it. To AND this with the clock config register, one of the two needs to be moved into the accumulator (the A register), because ANL can't do direct to direct, and because we don't want to write the clock config register during intermediate steps. So we will start with mov a, #~0b111. Now we can AND the clock config register into A with anl a, CLOCK_CFG. This ANDs the two and stores the result in A. Next we have to OR that with 0x05, to set the lowest three bits of CLOCK_CFG to 0b101, for 16MHz. We will do that with orl a, #0x05. (Again, you could use 0b101 in place of 0x05, if that's more readable to you. The assembler will understand that.) Then we write that back to CLOCK_CFG with mov CLOCK_CFG, a. Lastly, we can kick the CPU out of safe mode by writing to SAFE_MOD with mov SAFE_MOD, #0x00. That gives us the following code for clock setup.

	; Set clock to 16MHz
	mov SAFE_MOD, #0x55
	mov SAFE_MOD, #0xAA

	mov a, #~0b111
	anl a, CLOCK_CFG
	orl a, #0x05
	mov CLOCK_CFG, a

	mov SAFE_MOD, #0x00

We need a ~5ms delay here, to let the clock settle, but if you look at the delay assembly from the previous tutorial, you will see that it's kind of complicated. It's also incredibly imprecise. The delay is not really as long as it should be either. Now that we are working in assembly, we have a lot more control over timing. This means we can get much more precise delays. This is not a trivial task though, so let's drop a comment here and leave it for later. I like to write comments that indicate something is missing or needs fixing like this (no indent).

; !!! 5ms delay here !!!

This makes the comment stand out when I'm scanning through the code, so that I don't forget about it. If that doesn't stand out enough for you to notice it, find something else that does and use that. We will come back to this later.

Now we can set up the NeoPixel pin. This is fairly straightforward. We've already learned the ANL and ORL instructions, and all we are doing is ANDing and ORing some bits.

	; Setup NeoPixel pin
	anl P1_MOD_OC, #0b11111110
	orl P1_DIR_PU, #0b00000001
	;                        ^

It's as simple as that. Both of these instructions can take a direct destination and an immediate for the other operand, so we don't need to do any juggling of data. You might have noticed that I've included a comment after these instructions using a caret symbol to point to the bit we care about. You don't have to do this, but it improves readability, so it's probably a good idea.

Now we need to set up the BOOT button pin. We just need to clear one bit in each of two registers, which means AND operations. Again, in the last project the compiler translated our NOTed binary values into a hex value, 0xbf. For readability purposes, I've elected to use the binary form. You can use whatever you find to be more readable, but you won't be able to point to the relevant bit with a caret if you don't use binary form.

	; Setup BOOT button pin
	anl P3_MOD_OC, #0b10111111
	anl P3_DIR_PU, #0b10111111
	;                  ^

Again, this is very simple. Higher level languages aren't always simpler than assembly.

Now we put the color data into the led_data variable. So far we've only interacted with data in the accumulator and in SFRs. Now it's time to learn how to interact with the external memory. It's only mildly more complicated. The internal RAM only has 256 addresses, so we can address it with a single byte. Those SFR names we've used in the above instructions are just constants that represent the memory addresses we assigned to them near the top of the program. We won't be able to use memory addresses like that directly to interact with external RAM. Instead we have a special 16-bit register, DPTR, that we load with the address we want to interact with, and then we use the MOVX instruction to transfer data between the accumulator and the memory at the address in DPTR. In the Quick Reference Card, you will see that there is a special version of the MOV instruction that can load a 16-bit immediate value into DPTR. We can use that to load the address of led_data into DPTR with mov dptr, #led_data. MOVX can only move data between external RAM and the accumulator, so we have to put the values we want to put in led_data into A first. Then we can MOVX A into external RAM at the address pointed to by DPTR. The result is the following code.

	; Put color data in led_data
	mov dptr, #led_data
	mov a, #8
	movx @dptr, a
	mov dptr, #(led_data + 0x0001)
	mov a, #64
	movx @dptr, a
	mov dptr, #(led_data + 0x0002)
	rr a
	movx @dptr, a

This is what the compiler produced from the previous tutorial. There are some interesting optimizations in here. First, we can see the pattern. We load the memory address into DPTR, then we move the value we want to store into A, and after that we MOVX the contents of A into the memory at the address in DPTR. For led_data[1], we add one to the address we put into DPTR and for led_data[2] we add two. But what is this rr a instead of putting 32 in A? RR is the right rotate instruction. It rotates the value in A one bit to the right. As long as the lowest bit is 0, that's the equivalent of dividing by two. (If the lowest bit is 1, it wraps around to the highest bit instead of being dropped.) Since the previous value in A was 64, rr a changes it to 32. Why use rr a instead though? mov a, #32 uses an immediate argument. This means it is two bytes and takes two cycles to execute. rr a is one byte and takes one cycle. So by using rr a instead of mov a, #32 we save one cycle and one byte. In startup code, this is too trivial to matter, so if you want to change that rr a into mov a, #32 to make the code more readable, go right ahead. (Or you could just put a comment after rr a.) Keep in mind though: Sometimes there is more than one way to do something, and there are cases where a less intuitive choice will provide valuable benefits.
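As one illustration of "more than one way", here is a sketch of a variant that stores the same three bytes. This is not what the compiler produced, and you don't need to use it. It steps DPTR forward with inc dptr instead of reloading it each time, and it loads 32 directly for readability.

```asm
	; Put color data in led_data (alternative sketch)
	mov dptr, #led_data
	mov a, #8
	movx @dptr, a       ; led_data[0] = 8
	inc dptr            ; DPTR = led_data + 1
	mov a, #64
	movx @dptr, a       ; led_data[1] = 64
	inc dptr            ; DPTR = led_data + 2
	mov a, #32          ; plain load instead of rr a
	movx @dptr, a       ; led_data[2] = 32
```

inc dptr is a one-byte instruction, while mov dptr with a 16-bit immediate is three bytes, so stepping through consecutive addresses this way also makes the code smaller.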

Now let's light up the NeoPixel! This is where that special global pointing to B comes in. We need to call _neopixel_show. To do that we need to set up the arguments, so that they are where the function is going to look for them. The first argument is a pointer to led_data. This is just a 16-bit memory address for external RAM, and since the first two argument passing registers (DPL and DPH) together make up DPTR, that's where it goes. After we move the address of led_data into DPTR, we need to prepare the second argument. This goes in the memory location labeled _neopixel_show_PARM_2. So we put our 3 in _neopixel_show_PARM_2, and then we call the function. Here we use LCALL, because we don't know where the linker is going to place _neopixel_show, so we don't know if it will be close enough to call with ACALL, which requires the target to be within the same 2K block. (This is a short program, so it probably will be, but we can't guarantee it will stay that way if we add more code later.)

	; Call _neopixel_show
	; Prepare first arg in dptr
	mov dptr, #led_data
	; Prepare second arg (uint8_t len)
	mov _neopixel_show_PARM_2, #0x03
	; Call function
	lcall _neopixel_show

Now we need a delay again, but we don't have the delay function yet, so we will drop a note to ourselves to do that later.

; !!! Brief delay here !!!

That covers all of the initialization code! Following is what the program should look like now.

	.module main

; Setup external references
	.globl _neopixel_show
	.globl _neopixel_show_PARM_2


; Define constants
SAFE_MOD  = 0xa1
CLOCK_CFG = 0xb9
P1_MOD_OC = 0x92
P1_DIR_PU = 0x93
P3_MOD_OC = 0x96
P3_DIR_PU = 0x97
P3        = 0xb0


; Set beginning of data area (register segment)
	.area RSEG      (ABS, DATA)
	.org 0x0000

; Allocate register banks
	.area REG_BANK_0        (REL, OVR, DATA)
	.ds 8

;	.area REG_BANK_1        (REL, OVR, DATA)
;	.ds 8

;	.area REG_BANK_2        (REL, OVR, DATA)
;	.ds 8

;	.area REG_BANK_3        (REL, OVR, DATA)
;	.ds 8


; Allocate memory in the bit addressable
; segment of internal RAM.
;	.area BSEG      (BIT)
;	.ds 16


; This area can be used for
; relocatable data in internal RAM.
;	.area DSEC      (DATA)


; Set the stack area
	.area SSEG
__start__stack:
	.ds 1


; Allocate variables in external memory
	.area XSEG      (XDATA)
led_data:
	.ds 3


; Program execution starts at address
; 0x0000 in code memory.  This is
; followed by interrupt vectors, so we
; have to use the 3 bytes available
; for an ljmp to the program code.
	.area ISR       (ABS, CODE)
	.org 0x0000
	ljmp    _start


; This is the relocatable code area,
; where our program goes.
	.area TEXT      (CODE)
_start:
	; Set the beginning of the stack
	mov sp, #(__start__stack - 1)

	; Set clock to 16MHz
	mov SAFE_MOD, #0x55
	mov SAFE_MOD, #0xAA

	mov a, #~0b111
	anl a, CLOCK_CFG
	orl a, #0x05
	mov CLOCK_CFG, a

	mov SAFE_MOD, #0x00


; !!! 5ms delay here !!!


	; Setup NeoPixel pin
	anl P1_MOD_OC, #0b11111110
	orl P1_DIR_PU, #0b00000001
	;                        ^

	; Setup BOOT button pin
	anl P3_MOD_OC, #0b10111111
	anl P3_DIR_PU, #0b10111111
	;                  ^

	; Put color data in led_data
	mov dptr, #led_data
	mov a, #8
	movx @dptr, a
	mov dptr, #(led_data + 0x0001)
	mov a, #64
	movx @dptr, a
	mov dptr, #(led_data + 0x0002)
	rr a
	movx @dptr, a

	; Call _neopixel_show
	; Prepare first arg in dptr
	mov dptr, #led_data
	; Prepare second arg (uint8_t)
	mov _neopixel_show_PARM_2, #0x03
	; Call function
	lcall _neopixel_show

; !!! Brief delay here !!!

The next step is the main loop. Most of the main loop is stuff we've already done before. In fact, we can reuse the code for putting values in led_data and for calling neopixel_show(). We haven't written the delay code yet, but the main loop needs the same delay we left a placeholder for in the setup portion. The only new thing we are doing in the main loop is reading the BOOT button and conditionally branching based on the button state.

Let's start by setting up the main loop itself. This is trivial. We need to create a label, which we will call main_loop:, and we will put a jump instruction after it that goes back to it. Here's what that looks like.

main_loop:

	sjmp main_loop    ; Restart loop

Any code we place between these will continue running over and over indefinitely.

Now for the button. Note that the following code goes directly after the main_loop: label but before the SJMP instruction. Aside from the delay functions, all the rest of our code goes here. Now, to get the button data, first we have to read P3. The button is at bit six, so that's the only bit we care about. In the C program, we handled this by shifting bit six right to the position of bit zero, and then we did a logical AND with 1 to clear all of the other bits. This makes the code more readable, because it makes it clear which bit we care about. Technically, we could have just ANDed P3 with 0b01000000 and given that to the if statement, because any value that isn't zero is interpreted as true. This isn't a compiler quirk. The C standard defines it: an if statement takes its branch when the controlling expression compares unequal to 0, so the position of the set bit doesn't matter. That rule maps neatly onto the hardware instructions available on most CPUs, and the CH552 is no exception. The instruction SDCC uses in our previous tutorial is JZ. That is "jump if zero", and the value under test is the A register. What this means is zero is false, while anything else is true, and the instruction jumps to the specified location on false. In C we wrote the shift and the compiler dutifully reproduced it. In assembly, there is no abstraction layer, so we know exactly how our code is going to be interpreted. As such, we can make some optimizations.

Following is the assembly generated by SDCC in the previous tutorial for the beginning of the if statement.

;	main.c:46: if ((P3 >> 6) & 1) {
	mov     a,_P3
	rl      a
	rl      a
	anl     a,#0x01
	jz      00103$

First, it reads P3. Then it rotates left twice. But wait, we told it to shift right, didn't we? It turns out that, for our purposes, they give the same result. Looking at the Quick Reference Card, we find that the CH552 doesn't even have shift instructions. To perform a shift, we would have to do a rotate and then use AND to mask off the part that rotated from one side to the other. That doesn't matter here though, because we are masking out seven of the eight bits anyway. Additionally, you don't get to specify the distance of the rotation. It's always one bit, so if you want to rotate more than one bit, you have to do multiple rotate instructions. And what this means is that to shift six bits to the right, we will need six RR instructions. Or, because this is rotation and not shifting, if we rotate left twice that will have the exact same effect. So SDCC chose the shorter distance to save cycles and executable space. That makes sense, doesn't it? Then we do a logical AND to mask out everything but the bit we care about, and now we jump to label 00103$: if that remaining bit is zero. Knowing that the instruction used here is JZ though, we can do better. There's no need to rotate, if the conditional jump is only testing for zero, because it doesn't actually matter where the one is, if the button is pressed.
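The claim that two left rotates followed by a mask match the shift-and-mask from the C source is easy to check by brute force. Here is a small C sketch for a desktop compiler that compares the two for every possible port value. All of the function names are hypothetical and exist only for this check.

```c
#include <stdint.h>

/* Model of RL A: rotate left by one bit, bit 7 wraps around to bit 0. */
uint8_t rl8(uint8_t a)
{
    return (uint8_t)((a << 1) | (a >> 7));
}

/* What SDCC emitted: two left rotates, then mask everything but bit 0. */
uint8_t button_bit_rotated(uint8_t p3)
{
    return rl8(rl8(p3)) & 0x01;
}

/* What the C source asked for: shift right six, then mask bit 0. */
uint8_t button_bit_shifted(uint8_t p3)
{
    return (p3 >> 6) & 0x01;
}

/* Returns 1 if both versions agree for all 256 possible values of P3. */
int rotations_agree(void)
{
    for (int v = 0; v < 256; v++) {
        if (button_bit_rotated((uint8_t)v) != button_bit_shifted((uint8_t)v))
            return 0;
    }
    return 1;
}
```

Both versions move bit 6 into bit 0 before the mask, so rotations_agree() returns 1. The bits that wrap around during the rotate are the ones the AND throws away.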

	; Check BOOT button and set LED color
	mov a, P3           ; Get port 3
	anl a, #0b01000000  ; Clear all other bits
	jz if_off           ; If button is not pressed

This is what we are going to use in the assembly version. We read P3, then we mask out all of the bits except the one we care about, using a logical AND, and then we fire off the conditional jump-on-zero. We've saved two cycles and two bytes. It might not seem like a lot, but this is in our main loop, so saving two cycles could add up to a lot over time, for a more complex program. Additionally, because JZ does nothing more than test whether A is zero, the mask sits right next to the test that uses it, so this optimization arguably makes the code more readable rather than less. [4]

JZ jumps if the button value is zero. So the next part needs to be the code for if the button value is not zero. That's what JZ will be jumping over. This is literally nothing more than setting led_data to the color we want the NeoPixel to be when the button is pressed.

	; If BOOT button is pressed, do this...
	mov dptr, #led_data ; \
	mov a, #8           ;  Put 8 into led_data[0]
	movx @dptr, a       ; /
	mov dptr, #(led_data + 0x0001) ; led_data[1]
	movx @dptr, a
	mov dptr, #(led_data + 0x0002) ; led_data[2]
	mov a, #64
	movx @dptr, a
	sjmp end_if

You've seen this before, near the end of the setup section. The colors are different. The C code for this section in the previous tutorial was set_pixel_for_GRB_LED(led_data, 0, 8, 8, 64);. Notice the two eights in a row? Notice how there's no mov a, #8 for the second value? This is because we already put 8 in A, and copying it to external RAM didn't change it, so it wasn't necessary to do it again. The compiler can do it, and so can we. This saves two bytes of program memory and two CPU cycles. Also notice something new at the end? Yep, the next section is going to be what happens when the button isn't pressed, and we don't want that to run if this is running, so we have to jump over it. We will put the appropriate label at the end of it.

Now for what happens when the button is not pressed. Our JZ jumps to the label if_off, so we will start by making that. After that we copy the previous code that sets the NeoPixel color and change the values a bit. Instead of an SJMP at the end, we will put the label for the SJMP, end_if.

if_off:
	; If BOOT button is not pressed, do this...
	mov dptr, #led_data
	mov a, #64
	movx @dptr, a
	mov dptr, #(led_data + 0x0001)
	mov a, #8
	movx @dptr, a
	mov dptr, #(led_data + 0x0002)
	movx @dptr, a
end_if:

This one is setting the NeoPixel to (8, 64, 8), and after rearrangement to fit the NeoPixel's GRB data format, we once again can omit a MOV to set A to 8. Now our SJMP and JZ instructions have places to go, and we've set led_data to the colors we want.
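For comparison, here is roughly what the whole if/else looks like back in C, written the way the assembly does it: mask bit 6 and branch on the result without shifting. This is a sketch for a desktop compiler, not the code from the previous tutorial. The names BOOT_BUTTON_MASK, set_grb, and pick_color are invented for the illustration, and the port value is passed in as a plain argument instead of being read from P3.

```c
#include <stdint.h>

#define BOOT_BUTTON_MASK 0x40u  /* bit 6 of P3 (hypothetical name) */

/* Store an RGB color in the G, R, B byte order the NeoPixel expects. */
void set_grb(uint8_t *led_data, uint8_t r, uint8_t g, uint8_t b)
{
    led_data[0] = g;
    led_data[1] = r;
    led_data[2] = b;
}

/* Mirrors the JZ test: any non-zero masked value selects the first branch. */
void pick_color(uint8_t p3, uint8_t *led_data)
{
    if (p3 & BOOT_BUTTON_MASK) {
        set_grb(led_data, 8, 8, 64);   /* the code JZ falls through into */
    } else {
        set_grb(led_data, 8, 64, 8);   /* the if_off branch */
    }
}
```

With bit 6 set, led_data ends up as 8, 8, 64. With bit 6 clear it ends up as 64, 8, 8. Those are the same byte sequences the two assembly blocks write, and in both cases two neighboring bytes are 8, which is what lets the assembly skip one load of A.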

Now we are out of the if statement. Next, just like in the C program, we will call neopixel_show() to tell the NeoPixel to change the color to whichever one was selected in the if statement. After that, we need that delay, so we will put our comment to remind us.

	; Call _neopixel_show
	; Prepare first arg in dptr
	mov dptr, #led_data
	; Prepare second arg (uint8_t)
	mov _neopixel_show_PARM_2, #0x03
	; Call function
	lcall _neopixel_show

; !!! Brief delay here !!!

Now the entire program is done, aside from the delays. We need somewhere in the ballpark of a 5 millisecond delay to let the clock stabilize after setting it to 16MHz. I looked up information on NeoPixels, and I found two latch times. Older NeoPixels needed a minimum of 6 microseconds of low signal after receiving data to latch and display. Certain newer ones require a minimum of 250 microseconds to latch. I don't know which is on the QT Py CH552, I don't really want to look it up, and even if I did, I would like this to work with the widest range of NeoPixels possible, so let's go for a 250 microsecond minimum delay for the NeoPixel.

I'm not going to go into a ton of details on the delay functions. The important thing is that we need different resolutions, and we only have 8-bit registers that we can reasonably use for counters. We could chain multiple registers, and that's sort of what we will end up doing, but let's not complicate it unnecessarily. First, the CPU is running at 16MHz. Our delays are going to assume this will always be the case. This means that for other clock speeds, the delay functions will have to be adjusted. 16MHz is 16,000,000 cycles per second, so one cycle is 1/16,000,000 seconds. I don't think we will need sub-microsecond accuracy, so let's start with this: 1 microsecond is 16 cycles. We need to pass our delay_us function an 8-bit value, and we need to delay for 16 times that many cycles. Ideally we can do this in as few bytes as possible, to minimize the code memory the function requires. This is what I came up with:

; Expects microsecond value in r0 (if r0 = 0, 256us delay)
; Changes r0, a, and b
; Tuned to 16MHz
; Delays 1 or 2 extra cycles (1/16 or 1/8 microsecond)
	.even  ; This is to make jump cycles predictable
delay_us:                  ; [clock cycles] memory used, even or odd address
	nop                ; [1] 1 byte        0
	nop                ; [1] 1 byte        1
	nop                ; [1] 1 byte        0
	div ab             ; [4] 1 byte        1
	div ab             ; [4] 1 byte        0
	djnz r0, delay_us  ; [2 | 5] 2 bytes   1
	ret                ; [4/5(target)]     1

As you can see from my comments, I managed to get pretty close. This takes an argument between 0 and 255, passed in R0, and it delays that number of microseconds, plus one or two extra cycles. (The number of cycles used by the RET instruction depends on whether the return address is even or odd. If it is even, it takes four cycles, and odd takes five.) These figures count only the cycles spent inside delay_us. Loading R0 and the call instruction add a few more. For a delay of 1 microsecond, it runs 6.25% long if the return address is even or 12.5% long if it is odd. For a delay of 256 microseconds (if you give it 0, it will delay for 256 microseconds, because DJNZ decrements first then checks), it runs about 0.024% long (even return address) or about 0.049% long (odd return address). So for short delays it is not super accurate. If you need extreme accuracy for very short delays, you should put the delay in-line rather than calling a delay function, to eliminate uncertainty in branch instructions. For longer microsecond delays, this is quite good accuracy.
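The cycle arithmetic can be checked with a few lines of C on a desktop compiler. This sketch only adds up the per-instruction cycle counts taken from the comments in the listing. It assumes those counts are right and does not measure anything on real hardware. The function name delay_us_cycles is made up for the illustration.

```c
#include <stdint.h>

/* Cycles spent inside delay_us for a given R0 value, using the
   per-instruction counts from the listing's comments.
   ret_cycles is 4 (even return address) or 5 (odd return address). */
uint32_t delay_us_cycles(uint8_t r0, uint32_t ret_cycles)
{
    /* DJNZ decrements before it tests, so 0 wraps around to 256 passes. */
    uint32_t loops = (r0 == 0) ? 256u : r0;
    /* Three 1-cycle NOPs plus two 4-cycle DIV AB instructions. */
    uint32_t body = 3u * 1u + 2u * 4u;
    /* Every pass but the last takes the 5-cycle jump back to the top.
       The last pass falls through DJNZ in 2 cycles and then returns. */
    return (loops - 1u) * (body + 5u) + (body + 2u) + ret_cycles;
}
```

For R0 = 1 this gives 17 or 18 cycles, and for R0 = 250 with an even return address it gives 4001, so the result is always 16 cycles per microsecond plus the one or two extra.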

That lets us add delays of up to a few milliseconds quite easily. A single call can give us slightly over a quarter of a millisecond, but what happens when we want to add a 5 millisecond delay to allow the clock to stabilize? We would have to call this 20 times. It's also not great for high precision delays in the millisecond range, because each call consumes a few extra clock cycles. What would be really nice is if we also had a delay_ms function that will delay for the number of milliseconds we give it. So I wrote that as well.

; Expects millisecond value in r0 (if r0 = 0, 256ms delay)
; Changes r0, r1, a, and b
; Tuned to 16MHz
; Delays 4 or 5 cycles extra
	.even
delay_ms:
	; Move r0 into r1 (delay_us does not touch r1)
	mov a, r0               ; [1] 1 byte
	mov r1, a               ; [1] 1 byte
delay_ms_loop:
	mov r0, #250            ; [2] 2 bytes
	acall delay_us          ; [4] 2 bytes (+4001 cycles)

	mov r0, #250            ; [2] 2 bytes
	acall delay_us          ; [4] 2 bytes (+4001 cycles)

	mov r0, #250            ; [2] 2 bytes
	acall delay_us          ; [4] 2 bytes (+4001 cycles)

	mov r0, #248            ; [2] 2 bytes
	acall delay_us          ; [4] 2 bytes (+3969 cycles)

	djnz r1, delay_ms_loop  ; [2 | 4] 2 bytes
	ret                     ; [4/5(target)]

This function calls delay_us in 250 microsecond increments, but because the function calls, setting up parameters, and the loop itself consume cycles, I had to reduce the fourth call by two microseconds. This one is amazingly accurate. Sometimes you just get lucky and the numbers line up neatly. As the comments say, this only adds four to five extra cycles onto what you asked for. At the microsecond resolution, even one or two cycles make a significant difference for very small delays. At the millisecond resolution, with a 16MHz clock speed, not so much. One millisecond is 16,000 cycles. Four to five extra cycles is 0.025% and 0.031% respectively. Better yet, this can go up to 256 just like the previous one (by giving it a 0), and with that delay, the deviation is only 0.000098% to 0.00012%. The reduction of the final call by two microseconds doesn't just adjust for the overhead in this function, but it also helps to correct for the deviation in delay_us. [5]
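To see how the numbers line up, the same kind of bookkeeping can be done for delay_ms. The C sketch below adds up the cycle counts from the comments in the listing, assuming each delay_us call inside the loop returns to an even address and therefore costs 16 cycles per microsecond plus 1. The names us_call and delay_ms_cycles are hypothetical, and this is arithmetic on a desktop machine, not a measurement.

```c
#include <stdint.h>

/* Cycles for one "mov r0, #us / acall delay_us" pair inside the loop:
   2 for the MOV, 4 for the ACALL, and 16 per microsecond plus 1 for
   delay_us itself (even return address assumed). */
static uint32_t us_call(uint32_t us)
{
    return 2u + 4u + (16u * us + 1u);
}

/* Total cycles spent inside delay_ms for a given R0 value.
   ret_cycles is 4 or 5, depending on the caller's return address. */
uint32_t delay_ms_cycles(uint8_t r0, uint32_t ret_cycles)
{
    /* DJNZ decrements before it tests, so 0 wraps around to 256 passes. */
    uint32_t loops = (r0 == 0) ? 256u : r0;
    /* Three 250 us calls and one 248 us call: 15996 cycles per pass. */
    uint32_t body = 3u * us_call(250u) + us_call(248u);
    /* 2 cycles to copy R0 into R1, then each pass but the last ends with
       a taken DJNZ (4 cycles). The last pass falls through in 2. */
    return 2u + (loops - 1u) * (body + 4u) + (body + 2u) + ret_cycles;
}
```

Each full pass through the loop comes to exactly 16,000 cycles, so a request for 1 millisecond costs 16,004 or 16,005 cycles, and the extra four or five cycles stay constant no matter how many milliseconds are requested.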

Being able to do delays that are multiples of a second would be nice, but it isn't important for this program, so I didn't bother writing a function for that. Perhaps we will do it in a future tutorial. This is sufficient for now though. To call these functions, you would do the following.

	; Delay to let clock stabilize
	mov r0, #5
	lcall delay_ms

That will call delay_ms to give the 5 millisecond delay to let the clock stabilize.

	; Delay to let NeoPixel display
	mov r0, #250
	lcall delay_us

This will call delay_us to give the NeoPixel the 250 microseconds it needs to latch and display the specified color.

Now you can replace the delay comment placeholders in our code with those. The first goes right after the clock initialization, and the second goes after the two places where we call neopixel_show(). Following is what your program should look like now.

	.module main

; Setup external references
	.globl _neopixel_show
	.globl _neopixel_show_PARM_2


; Define constants
SAFE_MOD  = 0xa1
CLOCK_CFG = 0xb9
P1_MOD_OC = 0x92
P1_DIR_PU = 0x93
P3_MOD_OC = 0x96
P3_DIR_PU = 0x97
P3        = 0xb0


; Set beginning of data area (register segment)
	.area RSEG      (ABS, DATA)
	.org 0x0000

; Allocate register banks
	.area REG_BANK_0        (REL, OVR, DATA)
	.ds 8

;       .area REG_BANK_1        (REL, OVR, DATA)
;       .ds 8

;       .area REG_BANK_2        (REL, OVR, DATA)
;       .ds 8

;       .area REG_BANK_3        (REL, OVR, DATA)
;       .ds 8


; Allocate memory in the bit addressable
; segment of internal RAM.
;       .area BSEG      (BIT)
;       .ds 16


; This area can be used for
; relocatable data in internal RAM.
;       .area DSEC      (DATA)


; Set the stack area
	.area SSEG
__start__stack:
	.ds 1


; Allocate variables in external memory
	.area XSEG      (XDATA)
led_data:
	.ds 3


; Program execution starts at address
; 0x0000 in code memory.  This is
; followed by interrupt vectors, so we
; have to use the 3 bytes available
; for an ljmp to the program code.
	.area ISR       (ABS, CODE)
	.org 0x0000
	ljmp    _start


; This is the relocatable code area,
; where our program goes.
	.area TEXT      (CODE)
_start:
	; Set the beginning of the stack
	mov sp, #(__start__stack - 1)

	; Set clock to 16MHz
	mov SAFE_MOD, #0x55
	mov SAFE_MOD, #0xAA

	mov a, #~0b111
	anl a, CLOCK_CFG
	orl a, #0x05
	mov CLOCK_CFG, a

	mov SAFE_MOD, #0x00


	; Delay to let clock stabilize
	mov r0, #5
	lcall delay_ms


	; Setup NeoPixel pin
	anl P1_MOD_OC, #0b11111110
	orl P1_DIR_PU, #0b00000001
	;                        ^

	; Setup BOOT button pin
	anl P3_MOD_OC, #0b10111111
	anl P3_DIR_PU, #0b10111111
	;                  ^

	; Put color data in led_data
	mov dptr, #led_data
	mov a, #8
	movx @dptr, a
	mov dptr, #(led_data + 0x0001)
	mov a, #64
	movx @dptr, a
	mov dptr, #(led_data + 0x0002)
	rr a
	movx @dptr, a

	; Call _neopixel_show
	; Prepare first arg in dptr
	mov dptr, #led_data
	; Prepare second arg (uint8_t)
	mov _neopixel_show_PARM_2, #0x03
	; Call function
	lcall _neopixel_show


	; Delay to let NeoPixel display
	mov r0, #250
	lcall delay_us


main_loop:
	; Check BOOT button and set LED color
	mov a, P3           ; Get port 3
	anl a, #0b01000000  ; Clear all other bits
	jz if_off           ; If button is not pressed skip ahead
	; If BOOT button is pressed, do this...
	mov dptr, #led_data ; \
	mov a, #0x08        ;  Put 8 into led_data[0]
	movx @dptr, a       ; /
	mov dptr, #(led_data + 0x0001) ; led_data[1]
	movx @dptr, a
	mov dptr, #(led_data + 0x0002) ; led_data[2]
	mov a, #0x40
	movx @dptr, a
	sjmp end_if
if_off:
	; If BOOT button is not pressed, do this...
	mov dptr, #led_data
	mov a, #0x40
	movx @dptr, a
	mov dptr, #(led_data + 0x0001)
	mov a, #0x08
	movx @dptr, a
	mov dptr, #(led_data + 0x0002)
	movx @dptr, a
end_if:


	; Call _neopixel_show
	; Prepare first arg in dptr
	mov dptr, #led_data
	; Prepare second arg (uint8_t)
	mov _neopixel_show_PARM_2, #0x03
	; Call function
	lcall _neopixel_show

	; Delay to let NeoPixel display
	mov r0, #250
	lcall delay_us

	sjmp main_loop   ; Restart loop


; Expects microsecond value in r0 (if r0 = 0, 256us delay)
; Changes r0, a, and b
; Tuned to 16MHz
; Delays 1 or 2 extra cycles (1/16 or 1/8 microsecond)
	.even  ; This is to make jump cycles predictable
delay_us:
	nop                ; [1] 1 byte        0
	nop                ; [1] 1 byte        1
	nop                ; [1] 1 byte        0
	div ab             ; [4] 1 byte        1
	div ab             ; [4] 1 byte        0
	djnz r0, delay_us  ; [2 | 5] 2 bytes   1
	ret                ; [4/5(target)]     1


; Expects millisecond value in r0 (if r0 = 0, 256ms delay)
; Changes r0, r1, a, and b
; Tuned to 16MHz
; Delays 4 or 5 cycles extra
	.even
delay_ms:
	; Move r0 into r1 (delay_us does not touch r1)
	mov a, r0               ; [1] 1 byte
	mov r1, a               ; [1] 1 byte
delay_ms_loop:
	mov r0, #250            ; [2] 2 bytes
	acall delay_us          ; [4] 2 bytes (+4001 cycles)

	mov r0, #250            ; [2] 2 bytes
	acall delay_us          ; [4] 2 bytes (+4001 cycles)

	mov r0, #250            ; [2] 2 bytes
	acall delay_us          ; [4] 2 bytes (+4001 cycles)

	mov r0, #248            ; [2] 2 bytes
	acall delay_us          ; [4] 2 bytes (+3969 cycles)

	djnz r1, delay_ms_loop  ; [2 | 4] 2 bytes
	ret                     ; [4/5(target)]

And that is it. Your program is finished. Now we just need to assemble it and link it. Assembling is easy, but linking requires one more step. You may have noticed in the previous tutorial, that the compiler produced a ton of different files. Some of those are instructions for the assembler. Some are files produced by the linker, so we can see where it put stuff. One in particular is a linker script, produced by the compiler. That file is what the linker needs to produce the executable. I'm not going to go over all of the details of writing linker scripts right now (the assembler documentation includes the linker documentation; you can find it here if you installed SDCC to the default location: C:/Program Files/SDCC/doc/sdas/asmlnk.txt), but I've written one for you, and I'll explain the parts it has.

-muwx
-i main.ihx
-M
-b TEXT = 0x0070
-b ISR  = 0x0000
-b XSEG = 0x0001
main.rel
WS2812.rel

-e

The first line is linker arguments. If you run sdld -h in the terminal, it will spit out help information explaining what the arguments do. -i tells it the name of the output file. -M tells the linker to produce a .mem file. For this it will be named main.mem. This is the file that gives a visual graph of how the linker used the internal memory. This can be really handy for things like making sure things you want allocated are allocated (register banks, for example) and for checking how much stack space you have available. The -b arguments tell the linker exactly where to put areas in their respective memory banks. You might recall we worked out earlier that the main code section, TEXT could go at address 0x0070 without getting in the way of interrupt vectors. This is where we specify that. We've also told the linker to put the ISR area at 0x0000. Sure, we used .org for that in the assembly code, but just to be 100% safe, I've included it here as well. I'm honestly not sure why XSEG is starting at address 0x0001. That's where SDCC put it, and since I don't know any better at this point, I did the same. I suspect I know why though. It's traditional to set uninitialized pointers to address 0. That's what "null pointer" means. NULL is 0, so a null pointer is a pointer set to 0. You could probably set XSEG to start at 0x0000 without any problems, but if you did that, you wouldn't be able to tell the difference between a null pointer and a valid pointer to address 0x0000. That said, the CH552 only has 1KB of external RAM but 64KB of address space. You could just set uninitialized pointers to 0xFFFF and test for that instead of null. Unless you really need that one byte of memory though, it probably doesn't matter. Lastly we have the two .rel files we are linking, and -e just means we are done and that is the end of the linker script. This needs to go in a file named main.lk. Now we have everything we need.

To assemble, we run sdas8051.exe -plosffwv main.rel main.asm. The assembler is not great at producing meaningful error output. On several occasions I forgot commas between instruction arguments, and the assembler silently failed. If you want to know what that long list of switches does, run sdas8051.exe -h. Don't forget to compile WS2812.c like we did before. Then run the linker, sdld -nf main.lk. (You can remove the n from the switches, if you want to see the commands from the linker script as they are executed.) Now you have main.ihx. As before, that needs to be packed into .hex format with packihx main.ihx > main.hex. Now you can upload to your QT Py CH552 using vnproch55x like we did in the previous tutorial. As usual, I've produced a Makefile specifically for this tutorial that does not delete your main.asm file or the linker script when you tell it to clean.

F_CPU=16000000
C_FLAGS=


main.hex: main.ihx
	packihx main.ihx > main.hex

main.ihx: main.rel WS2812.rel
	sdld -nf main.lk

main.rel: main.asm
	sdas8051.exe -plosffwv main.rel main.asm

WS2812.rel: WS2812.c
	sdcc WS2812.c -c -DF_CPU=$(F_CPU) $(C_FLAGS)


clean:
	-rm *.lst
	-rm *.sym
	-rm *.rel
	-rm *.ihx
	-rm *.hex
	-rm *.map
	-rm *.mem
	-rm *.rst
	-rm WS2812.asm

Now your QT Py CH552 should do exactly the same thing it did once you completed the previous tutorial, except a little more optimally. Assuming this is your first time programming an 8051 clone in assembly, you've just successfully written your first 8051 assembly program! In addition, this can serve as a template to write bigger programs from, with just enough I/O for basic debugging. We will use this as the template for some of the future tutorials as well. For now, enjoy the feeling of achievement you’ve earned by completing this task.

[1] Note that most modern assemblers will accept either .globl or .global. Early assemblers used .globl, likely to make parsing simpler on resource constrained systems. Most modern assemblers use .global but still accept .globl for backward compatibility. sdas8051 will not accept .global, so we will have to use .globl.

[2] SDCC prepends symbols in the C program with an underscore, to avoid naming collisions with pure assembly modules. This is important to keep in mind when calling C functions from assembly, as we will see soon. It also means that _main is the C function main().

[3] It actually isn't called externally. If you scroll down far enough (line 186 on my version), you'll find a label __sdcc_program_startup: directly followed by a long jump to _main. That's where SDCC calls main() from, not from an external file.

[4] I also changed the name of the label. That 00103$ is a "local label" that is only in scope between two named labels, and that format is used to avoid naming collisions. We aren't using it here for readability reasons, but it's a good idea in more complex programs when making simple conditionals and loops. You can find more information in the footnotes of section 3.11.4 in the SDCC documentation installed at C:/Program Files/SDCC/doc/sdccman.pdf, if you installed SDCC at the default location. Long story short, it's five numerals followed by $, and the label is only in scope between the closest named labels.

[5] Notice the .even directive used before each function? That tells the assembler to align the starting address to be even. That allows me to select whether return addresses are even or odd, which guarantees that delay_us will only add 1 extra cycle when called from within delay_ms.

Technium Adeptus
