User Tools

Site Tools


aslfaqs

Assembly Language FAQs

Some simple questions about how to translate high level language concepts into the idiom of assembly lnaguage.

Why are there so many instructions?

There are only really a few dozen instrucions in 6809 assembly language. At first it can be confusing because registers are tagged onto the end of operations without any punctuation. So LDA, STB, LDD, CMPX, LDY, STU and CMPS can seem like a mysterious random jumble rather than just three basic operations on different registers.

When faced with an unfamiliar instruction, try to remove a register from the end to see what is the real operation:

orb #20     ; Or B #20
addd #400   ; Add D #400
sty $1000   ; St Y $1000 
inca        ; Inc A
comb        ; Com B
pshu        ; Psh U
sex         ; *not* Se X, but Sign EXtend
daa         ; *not* Da A, but Decimal Addition Adjust

Which registers do I use?

For an address variable pick an index register, X, Y or even (user stack) U, it doesn't matter which. For a character or small numeric variable pick an accumulator, A or B, again it doesn't matter which. If you need 16-bit values use D, which is the two accumulators combined (A high, B low).

Choice between index or accumulator severely limits the operations you can perform. Only LD, ST, and CMP are common between all user registers. General arithmetic operations (NEG, AND etc.) are limited to the 8-bit accumulators, except for ADD and SUB which are available to 16-bit D. Index registers uniquely have the LEAX operation, and the stacks (U and S) have PSH and PUL.

Don't use the system registers (PC, CC, DP, S) directly unless you know exactly what you're doing.

In practice there are minor differences between registers. One accumulator may have a specialised operation not available to the other, and the LEA instruction works a little differently for stacks so you can't use LEAU -1,U as a loop counter. Also X is the preferred index register as its instructions are sometimes coded more efficiently

How do i declare, assign, and modify variables?

Reserve space for a global variable with the RMB (reserve memory bytes) directive. This isn't a machine language instruction, rather it simply tells the assembler how to arrange space in program memory. Remember that only byte (8-bit) and word (16-bit) values can be handled by the CPU in a single operation. More complex variables are referenced by their (16-bit) addressses.

Variables are fetched into a register with the LD (load) operation, the registers worked upon as needed, and updated in memory with the ST (store) operation. A useful operation for initialising string and other structured variables is LEAX (load effective address).

nxtChr   rmb 1       ; VAR nxtChr:byte
nxtLine  rmb 2       ; VAR nxtLine:word 
strAdr   rmb 2       ; VAR strAdr:^string
msgLen   fcb 5       ; VAR msgLen:byte=5 
msgTxt   fcb "Hello" ; CONST msgTxt="Hello" 
maxLine  fdb 1000    ; VAR maxLine:word=1000

lda #space        
sta nxtChr         ; nxtChr:=32
ldd #1  
std nxtLine        ; nxtLine:=1
leax msgTxt,pcr    ; address of string via PC relative mode
stx strAdr         ; strAdr:=@msgTxt
...
ldx strAdr         ; retrieve pointer  
lda ,x+            ; read character and increment pointer 
ora #$20
sta nxtChr         ; store character
stx strAdr         ; store pointer

A variable can be initialised at assembly time with the FCB (format constant byte) and FDB (format double byte) directives. This isn't a good idea except for constant data as the initialisation happens only when the program is first loaded, and not when it is run a second time.

What is a bigendian?

The 6809 is described by the Swiftian term “big-endian”; the Intel 8080, like many early micros, is “little-endian”. This refers to the way 16-bit values are stored in memory - big byte first, or little byte first.

It may seem the less sensible concept (which gets sillier as the design decision persists to 64-bithood), but it's simpler to design an 8-bit microprocessor as little-endian. For example, to add a 16-bit constant 8 bits at a time, storing the low byte first makes it easier to deal with the carry. If there was no ADDD instruction for the 6809 we might do this:

  addb ,x+      ; add low byte
  adca ,x+      ; add high byte, using the carry

The assembler will take care of the byte ordering for us when we specify 16-bit constants, but it's something to bear in mind when converting programs between micros.

What does the LEA instruction do?

The LEA (Load Effective Address) instruction is used to initialise and modify index registers, ie. address variables. Only indexed addressing modes are available.

The jargon term 'effective affress', in the context of a LDA instruction for example, is simply the address from which the register A is loaded:

  ldx #$0400
  ldb #$10
  lda $20,x   ; Effective Address is X+$20, ie. $0420
  lda b,x     ; Effective Address is X+B, ie. $0410

Therefore a LEA instruction is simply performing addition:

  leax $20,x  ; X:=X+32
  leax b,x    ; X:=X+B

Registers can be mixed as needed:

  leay ,x     ; Y:=X 
  leau 4,s    ; U:=S+4

Any indexed mode can be used, so index registers can be initialised using PC relative mode:

helloStr fcb "Hello World",0
  leax helloStr,pcr      ; point X to string
  lda ,x+                ; A contains 72 (ASCII 'H')

Don't use the autoincrement modes, stick to constant offsets. Only the Zero flag of the condition codes is modified, and only for assignments to X and Y. This is so the stacks can be adjusted at the end of a subroutine without affecting the zero flag which might be checked by the caller.

  leax ,x++   ; does nothing
  leay ,-y    ; not recommended
  leay -1,y   ; approved loop counter
  bne loop  

How do I call a subroutine?

Use the JSR (Jump to SubRoutine) or BSR (Branch to SubRoutine) instructions. How to pass parameters depends entirely on how the subroutine was written. Typically a parameter is loaded into an accumulator, or its address is loaded into an index register.

  lda xPos
  ldb yPos
  leax bmapShip,pcr
  jsr drawBmap
moreProgram:
  adda #20  

A subroutine continues until it reaches a RTS (ReTurn from Subroutine) instruction, at which point execution continues with the instruction following the JSR. Results are returned the same way as parameters are passed. Setting the condition codes is also an option, such as setting the Carry flag in case of an error.

drawBmap:
  pshs a,b,y        ; save registers as needed
  ldy #screenBase
  lsra
  leay a,y
  ...
failure:
  orcc #1
finish:        
  puls a,b,y        ; we didn't save X as we don't expect the caller to reuse it 
  rts

Subroutines work by pushing the PC (Program Counter), which always contains the address of the next instruction, onto the system stack (S register), then loading the PC with the destination address. The next RTS instruction should simply PULS the PC's value from the stack. Note that this won't work if the stack has been modified and not correctly restored.

It's often a good idea to save and restore the values or register that are modified during the subroutine. If the last instruction before the RTS is a PULS we can optimise by combining the two:

  pshs x,y       ; save registers
  leax 2,x
  leay -4,y
  ...
  puls x,y,pc    ; restore registers and return 

What is a stack and how do I (safely) use it?

A stack is a crucial concept of machine language programming. Basically it's a way of storing information and retrieving it later without too much worry about the fine details. You can PuSH registers onto the stack and PulL (sometimes known as POP) them off later. As long as pushing and pulling is done in matching order (like matching the brackets in an arithmetical expression) all we need to know is the name of the register (S or U in the case of the 6809).

People like to use physical analogies. For example, you're at your desk scribbling your memoirs when you need to check a detail in those photos you took on holiday in Ibiza. So you put your papers aside (onto the “stack”) to make room. Then the phone rings and you need to get out your appointment book, so the photos go onto the “stack”. And then there's someone at the door… this could go on indefinitely, though there is ultimately a physical limit to the size of the stack.

The point is that you can easily imagine retrieving your papers from the “stack” and smoothly carrying on with the last job after every interruption. It only needs a little order in your working methods. And if you have a whole program written down beside you, and the current line number is just another detail that can be jotted down and pushed on the stack, nothing could possibly go wrong.

The system stack in a 6809 system is pointed to by the register S. Typically it will be initialised by the operating system at the highest address in RAM (the user program doesn't usually set it). Pushing a byte means first decrementing S by 1, then storing the byte into address S. Doing this twice pushes a 16-bit word. To pull a byte, read its value from address S then increment S by 1. Note that this is identical to using the 6809's autoincrement/decrement modes.

There are two stacks on the 6809, U (User) and S (system); normally we don't bother much with U. Any combination of registers can be pushed and pulled in a single instruction. With one exception: we can't stack a stack register onto itself, which would have little point anyway.

  pshs a,b,x,y,u
  ldd #loopCount
loop:  
  pshs d
  lda ,x+
  cmpa ,y+
  ...
  puls d
  subd #1
  bne loop  
  puls a,b,x,y,u,pc

If we PSHS A, modify A, then PULS A we get the same value back, and we don't have to care what was done to the stack before the push. Equally, our “modify A” part might have included PSHS X,Y followed by PULS X,Y, and we would not notice. But we must do our stacking in matching pairs with no overlap; PSHS A then PSHS X,Y then PULS A will not restore A.

Calling subroutines relies on the system stack. A JSR pushes the Program Counter (PC) onto the stack, and a RTS pulls it. Thus the program can continue where it left off when the subroutine is over. Instead of using RTS, a subroutine might add the PC to a list of pulls at the end of a subroutine.

Stacking order is always the same, regardless of how the assembly language instruction is written. For example PSHS Y,B,A,X will be retrieved correctly by PULS D,Y,X. When all the registers are stacked on S it will look like this:

    CC    A    B   DP    X         Y         U        PC 
S+   0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 | 9  | 10 | 11 | 12

The S register can be used as an index register just like any other to access these values individually. But don't use negative offsets as you can't trust anything below the stack; the system may overwrite these bytes at any time, literally between instructions.

You can use the LEA instruction to reserve space on the stack; LEAS -12,S is like a push, LEAS 12,S a pull. As before, you must do this in matching pairs.

  pshs d,x,y,u
  leas -4,s         ; reserve 4 bytes to do what we like with
  clr 2,s
  ...
  lda #4
  pshs a
  lda 3,s           ; the same address we cleared above
  ...
  puls a            ; A equals 4
  leas 4,s
  puls d,x,y,u,pc

The dangers of using the stack:

  • There's nothing to stop it expanding to overwrite programs and data, but this shouldn't happen in practice.
  • Using the S register for our own purposes when a system interrupt may occur, overwriting our data seemingly at random.
  • Generally not doing pushing and pulling in matching pairs and mixing up the data, with particular reference to return addresses.

How do I do something a hundred times over?

In a high-level language, typically you would set a variable to zero, then count up to a hundred:

n:=0;
REPEAT
  writeln;
  n:=n+1
UNTIL n=100

Assembly language prefers an idiom where you start at a hundred and count down to zero:

  ldb #100
repeatLoop:
  lbsr writeln
  decb
  bne repeatLoop

Assuming the value of the loop counter isn't used anywhere, it's a little simpler this way.

How do I write an IF / THEN statement?

The easy way is to reverse the logic. So if you had:

IF xPos>100 THEN xPos:=100

Imagine instead it was:

IF xPos<=100 SKIP xPos:=100

In assembly language that could be:

  lda xPos
  cmpa #100      ; IF
  bls noLimit    ; SKIP 
  lda #100       ; optional
  sta xPos       ; statements 
noLimit:
  lda yPos       ; continue with program           

How do I write a REPEAT / UNTIL loop?

Simply write the body, then a comparison, and a conditional branch back to the start of the body.

REPEAT
  plotPixel (left, top)
  left:=left+1
UNTIL left>right  

Becomes:

  lda left
  ldb top 
plotLoop:          ; REPEAT
  jsr plotPixel
  inca
  cmpa right        
  bls plotLoop     ; UNTIL
  ldb bottom       ; continue

How do I write a WHILE loop?

This is very similar to REPEAT, except the comparison has to be made at the start of the loop. This is done simply by preceding the loop with a branch to the conditional test at the end:

WHILE left<=right DO
  plotPixel (left, top)
  left:=left+1    

Becomes:

  lda left    
  ldb top
  bra plotTest
plotLoop:          ; DO    
  jsr plotPixel
  inca
plotTest:    
  cmpa right
  bls plotLoop     ; WHILE
  ldb bottom       ; continue

How do I access an array?

Generally you won't be using arrays in quite the same way as with high level languages. But if you need to, the programmer must do all the work of reserving memory and calculating offset addresses. For an array of bytes this is simple; point to its start address with an index register and use an offset:

numList rmb 100
  leax numList,pcr    ; point to array
  ldb #99             ; last element
  lda b,x             ; A:=numList[99]

For an array of addresses (two bytes each) the offset has to be doubled by bit-shifting. Note that the range of offsets is limited to -128 to 127 with an 8-bit offset so D may have to be used.

adrList rmb 400
  leax adrList,pcr
  ldb #199
  clra
  lslb
  rola
  ldy d,x             ; Y=adrList[199]

Two-dimensional arrays can be accessed using the MUL (unsigned MULtiply) instruction. A 32×16 character text screen could be treated as an array of 16 lines of 32 bytes:

cursorFlash:
  ; A=column, B=line
  pshs a,b,x
  cmpa #32
  bhs out
  cmpb #16 
  bhs out
  ldx #textScreen
  leax a,x        ; X:=X+A
  lda #32
  mul             ; D:=B*32
  leax d,x        ; X:=X+D  
  lda ,x          ; A:=textScreen[A,B]
  eora #$40
  sta ,x
out:    
  puls a,b,x,pc

How do I use strings?

Like any other complex data structure you have to do all the work yourself. There is however a convention of the “null-terminated string” which is widely used and may take little work. It's simply a list of bytes terminated by a zero. Thus a string can be easily defined and as long as you like, the only limit being it can't include any zeroes.

strDelay   fcb "Please wait.",0                 ; easy static string
strWelcome fcb "Hello %USER% and welcome.",0    ; substitution expected, so a problem
  leax strDelay,pcr
  bsr strLen
  jmp romPrint16              ; print length of strDelay
strLen:
; return length of string at X in B
  pshs x
  ldd #0
  bra strL2
strL1:  
  addd #1                      ; count a valid character
strL2:  
  tst ,x+                      ; null terminator? 
  bne strL1
  puls x,pc

This works well for sending messages to the screen, or making simple modifications such as changing to upper case. But anything more, such as simply catenating two strings, requires the programmer to ensure there is enough space available. Getting this wrong may crash the program.

An alternative structure might be to precede the string by two bytes specifying its length and bytes available:

strWelcome    fcb strWelcomeEnd-strWelcome-2, 100  ; length followed by max length
              fcb "Hello %USER% and welcome."      ; string characters (25 bytes)     
strWelcomeEnd rmb 100-strWelcomeEnd+strWelcome+2   ; spare space (75 bytes)
strSpaceEnd:

These strings are awkward to define in assembly language. The programmer must then write a library of reliable routines to manipulate this structure.

How do I define data records and arbitrary structures?

As with arrays and strings, you point an index register where you hope the structure will be, and then do all the housekeeping yourself. Assembly language can help a little by allowing us to define constants to label offsets within the structure. Clearly this allows for variant records (fields having dual meanings).

; sprite structure (8 bytes total)
spr_type EQU 0    ; 8-bit type
spr_xPos EQU 1    ; 16-bit X co-ord
spr_yPos EQU 3    ; 16-bit Y co-ord
spr_mask EQU 5    ; 8-bit collision mask 
spr_bmap EQU 6    ; 16-bit bitmap address 
spr_size EQU 8
spr_maxN EQU 20   ; room for 20 sprites
spr_table rmb spr_maxN*spr_size

sprEraseAll:
; erase all active sprites
  leau spr_table,pcr
  ldb #spr_maxN
sprEraseLoop:  
  lda ,u
  beq sprNext         ; zero type means not active
  ldx spr_xPos,u      ; X co-ord
  ldy spr_yPos,u      ; Y co-ord
  lbsr sprErase  
sprNext:
  leau spr_size,u     ; point to next sprite  
  decb                ; loop count
  bne sprEraseLoop
  rts
  

How do I use local variables?

The simplest way to use a local variable is to push a register onto the stack (remembering to remove it later).

shift: 
; shift register A right B times
; return result in D
  pshs b             ; B is now a local variable with address S
  clrb               ; destroy B to hold result
  tst ,s             ; was B zero?
  beq shiftOut
shiftLoop:  
  lsra               ; 16-bit shift right
  rorb
  dec ,s             ; decrease local variable
  bne shiftLoop      ; loop (original) B times 
shiftOut:  
  leas 1,s           ; remove local variable
  rts

The problem with this is that when we push more temporary values on the stack, the offset to our local variable changes. In the example above, if we chose to PSHS X then our offset now becomes 2,S . We can deal with this, but it's awkward.

A solution is to reserve space on the stack, then point to it with another register, and use this one as the offset base. The principle is the same as with the structures described above. It can be extended to allow for the passing of arbitrary parameters on the stack. For example:

PROCEDURE drawBox (left, top, right, bottom:integer);
VAR xPos, yPos:integer;
BEGIN
  xPos=left;
  yPos=right;
  REPEAT
     ...
  UNTIL yPos>=bottom
END

Becomes:

xPos   EQU -4
yPos   EQU -2
left   EQU 8
top    EQU 10
right  EQU 4
bottom EQU 6          
; 2 would be the return address
; 0 would be the saved value of U

; draw a 10x6 box at (0,2)
  ldx #0            ; left:=0  
  ldy #2            ; top:=2
  pshs x,y
  ldx #10           ; right:=10 
  ldy #8            ; bottom:=8 
  pshs x,y         
  lbsr drawBox
  leas 8,s          ; remove 4 16-bit parameters 
  rts
drawBox:  
  pshs u            ; save value of U
  leau ,s           ; U is an address between the locals and the parameters
  leas -4,s         ; reserve space for 2 16-bit integers on stack
  ldd left,u        ; U+8
  std xPos,u        ; U-4 
  ldd top,u         
  std yPos,u
drawBoxLoopY:  
  ...
  ldd yPos,u
  cmpd high,u
  blt drawBoxLoopY
  leas 4,s         ; release reserved space
  puls u,pc        ; restore U and return

Can I do recursion?

Yes, there's no reason why an assembly language subroutine shouldn't call itself. But the same caveats apply as in any other language. You must use local variables for instance, and not alter global memory in unexpected ways.

Beware of very deep recursion; a pixel flood-fill routine might call itself thousands of times, overwriting more than the available RAM with the system stack.

What is position independence and when do I use it?

Ideally a machine language program can be loaded at any address in memory and still work correctly; the only thing that has to change is the program start address. This is Position Independent Code (PIC). It is more important for smaller routines as several may be loaded at the same time.

What prevents position independence? Addresses being hard-coded rather than relative. Direct jumps to subroutines are a common example. All jumps and branches (within the program, not eg. external ROM routines) can be as easily coded using relative offsets; instead of JMP and JSR use BRA/LBRA and BSR/LBSR.

The same applies to data references. Variables stored on the stack are fine. As are pointers returned by the operating system. Other structures need to be referenced using PC relative addressing.

The PC (Program Counter) always contains the address of the next instruction. The same mechanism used for branching can be used for data addressing. Giving a relative offset from the PC means the actual address at which the program is loaded is irrelevant. We rarely give the actual offset, just as we don't give the number of bytes to skip in a relative branch. Instead we specify the target address, and indicate this by following it with “,PCR”.

frameCount rmb 1
frameList rmb 20
  lda frameCount         ; direct extended addressing
  lda frameCount,pcr     ; same address calculated via PC relative
                         ; the assembler calculates the offset
                         ; NOT the same as PC+frameCount
  inca
  ldb -27,pc             ; PC-27 - this form is rarely useful
  ...
  ldx #frameList         ; initialise index using immediate mode
  leax frameList,pcr     ; X has the same value as above
  ldy ,x++               ; and the routine can proceed just as before

Not all addresses should be accessed via PCR; only the ones that move with the program. This means NOT I/O ports, screen buffers, ROM routines etc.

There are downsides to PCR, mainly that it is less efficient, taking more bytes and machine cycles to do the same task. Sometimes it can be more complex to program:

bufSize EQU 100
buffer rmb bufSize     
bufEnd:

bufClear:
  leax buffer,pcr
bufCloop:  
  clr ,x+
  cmpx #???            ; er, how to compare X with end of buffer space?
  blo bufCloop
  
bufClear:  
  leax bufEnd,pcr      ;
  pshs x               ; stack address of end of buffer space
  leax buffer,pcr
bufCloop:  
  clr ,x+
  cmpx ,s              ; compare X with end of buffer space
  blo bufCloop
  leas 2,s             ; tidy stack

An alternative to PCR addressing is direct paged addressing; copying PC to DP takes just a few steps. This is both time and space efficient, but limits the program to start on 256 byte pages. It's a good solution for large programs such as arcade games.

org $4000
start:
  lbsr main         ; stack PC
counter rmb 1
variables rmb 100  
main: 
  puls d            ; retrieve old PC (points to 'counter')
  tfr a,dp          ; high byte of PC goes to DP
  setdp $40         ; directive to help the assembler
  clr counter       ; most assemblers will now use DP automatically

In general, you should always use position independent code for programs intended to have any sort of longevity. The exceptions are special cases such as embedded systems, ROM cartridges, and plain quick-and-dirty mash-ups.

What are interrupts?

An interrupt is a physical signal that can occur at any time, triggering the CPU into stacking its registers, executing a subroutine, then retrieving its registers to continue as normal. This is a way of providing a software response to a hardware event. It may be as simple as updating a counter in memory in sync with an external timer.

The details of interrupts are intricate and a relatively advanced topic, but most home micros have a simple use for them: vertical blanking. On an old-fashioned CRT, every time the display of a graphics frame is complete the circuitry issues an interrupt. Syncing with this allows for smooth animation in games.

gameLoop:
  sync                  ; wait for interrupt
  lbsr eraseSprite
  lbsr moveSprite
  lbsr drawSprite       ; most of the time is spent with the sprite drawn, not erased
  bra gameLoop

The 6809 has four hardware interrupts (IRQ, FIRQ, NMI, and RESET), and three software interrupts (SWI, SWI2, SWI3), each with its own vector pointing to a service routine. Software interrputs are machine language instructions rather than hardware events, and typically are used for debugging breakpoints or calling operating system routines.

An interrupt service routine must end with the instruction RTI (Return from Interrupt) so that the registers are unstacked correctly. This first pulls the CC and checks the E (Entire) bit. If set it then pulls all the registers from the stack, if clear it just pulls the PC. The fast interrupt (FIRQ) doesn't stack the working registers, and the E bit is the way of flagging the difference.

Two other CC bits are involved with interrupts. Setting I disables (or masks) the IRQ response, and setting F disables the FIRQ response.

What is reentrant code and how do I write it?

A reentrant machine language program can be very roughly treated; you can stop it (saving its registers), restart it from the beginning, stop it again, and still have it continue correctly where it left off (restoring its registers of course). This is exactly what is needed for a multi-tasking operating system: the same code used by multiple processes.

The concept is related to interrupts and recursion. Whereas a recursive routine secures its variable so it's safe to call itself from just one point, a reentrant routine must try harder. Basically, it must allocate itself a new set of variables whenever it runs.

There's no reason why a short routine shouldn't be reentrant. Simply allocate variables on the stack and never use globals. And avoid eccentricities like self-modifying code. For a larger program things become complex. Fresh memory might be allocated from a heap by the operating system for example. It's an advanced topic, but still doable on the 6809.

What is indirection and how do I use it?

How do I use virtual methods to create polymorphic objects?

How do negative numbers work?

What do all the instructions do?

[Will get round to this one]

aslfaqs.txt · Last modified: 2025/09/09 19:26 by reggie

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki