===== Assembly Language FAQs ===== Some simple questions about how to translate high level language concepts into the idiom of assembly lnaguage. ==== Why are there so many instructions? ==== There are only really a few dozen instrucions in 6809 assembly language. At first it can be confusing because registers are tagged onto the end of operations without any punctuation. So LDA, STB, LDD, CMPX, LDY, STU and CMPS can seem like a mysterious random jumble rather than just three basic operations on different registers. When faced with an unfamiliar instruction, try to remove a register from the end to see what is the real operation: orb #20 ; Or B #20 addd #400 ; Add D #400 sty $1000 ; St Y $1000 inca ; Inc A comb ; Com B pshu ; Psh U sex ; *not* Se X, but Sign EXtend daa ; *not* Da A, but Decimal Addition Adjust ==== Which registers do I use? ==== For an address variable pick an index register, X, Y or even (user stack) U, it doesn't matter which. For a character or small numeric variable pick an accumulator, A or B, again it doesn't matter which. If you need 16-bit values use D, which is the two accumulators combined (A high, B low). Choice between index or accumulator severely limits the operations you can perform. Only LD, ST, and CMP are common between all user registers. General arithmetic operations (NEG, AND etc.) are limited to the 8-bit accumulators, except for ADD and SUB which are available to 16-bit D. Index registers uniquely have the LEAX operation, and the stacks (U and S) have PSH and PUL. Don't use the system registers (PC, CC, DP, S) directly unless you know exactly what you're doing. In practice there are minor differences between registers. One accumulator may have a specialised operation not available to the other, and the LEA instruction works a little differently for stacks so you can't use LEAU -1,U as a loop counter. Also X is the preferred index register as its instructions are sometimes coded more efficiently ==== How do i declare, assign, and modify variables? ==== Reserve space for a global variable with the RMB (reserve memory bytes) directive. This isn't a machine language instruction, rather it simply tells the assembler how to arrange space in program memory. Remember that only byte (8-bit) and word (16-bit) values can be handled by the CPU in a single operation. More complex variables are referenced by their (16-bit) addressses. Variables are fetched into a register with the LD (load) operation, the registers worked upon as needed, and updated in memory with the ST (store) operation. A useful operation for initialising string and other structured variables is LEAX (load effective address). nxtChr rmb 1 ; VAR nxtChr:byte nxtLine rmb 2 ; VAR nxtLine:word strAdr rmb 2 ; VAR strAdr:^string msgLen fcb 5 ; VAR msgLen:byte=5 msgTxt fcb "Hello" ; CONST msgTxt="Hello" maxLine fdb 1000 ; VAR maxLine:word=1000 lda #space sta nxtChr ; nxtChr:=32 ldd #1 std nxtLine ; nxtLine:=1 leax msgTxt,pcr ; address of string via PC relative mode stx strAdr ; strAdr:=@msgTxt ... ldx strAdr ; retrieve pointer lda ,x+ ; read character and increment pointer ora #$20 sta nxtChr ; store character stx strAdr ; store pointer A variable can be initialised at assembly time with the FCB (format constant byte) and FDB (format double byte) directives. This isn't a good idea except for constant data as the initialisation happens only when the program is first loaded, and not when it is run a second time. ==== What is a bigendian? ==== The 6809 is described by the Swiftian term "big-endian"; the Intel 8080, like many early micros, is "little-endian". This refers to the way 16-bit values are stored in memory - big byte first, or little byte first. It may seem the less sensible concept (which gets sillier as the design decision persists to 64-bithood), but it's simpler to design an 8-bit microprocessor as little-endian. For example, to add a 16-bit constant 8 bits at a time, storing the low byte first makes it easier to deal with the carry. If there was no ADDD instruction for the 6809 we might do this: addb ,x+ ; add low byte adca ,x+ ; add high byte, using the carry The assembler will take care of the byte ordering for us when we specify 16-bit constants, but it's something to bear in mind when converting programs between micros. ==== What does the LEA instruction do? ==== The LEA (Load Effective Address) instruction is used to initialise and modify index registers, ie. address variables. Only indexed addressing modes are available. The jargon term 'effective affress', in the context of a LDA instruction for example, is simply the address from which the register A is loaded: ldx #$0400 ldb #$10 lda $20,x ; Effective Address is X+$20, ie. $0420 lda b,x ; Effective Address is X+B, ie. $0410 Therefore a LEA instruction is simply performing addition: leax $20,x ; X:=X+32 leax b,x ; X:=X+B Registers can be mixed as needed: leay ,x ; Y:=X leau 4,s ; U:=S+4 Any indexed mode can be used, so index registers can be initialised using PC relative mode: helloStr fcb "Hello World",0 leax helloStr,pcr ; point X to string lda ,x+ ; A contains 72 (ASCII 'H') Don't use the autoincrement modes, stick to constant offsets. Only the Zero flag of the condition codes is modified, and only for assignments to X and Y. This is so the stacks can be adjusted at the end of a subroutine without affecting the zero flag which might be checked by the caller. leax ,x++ ; does nothing leay ,-y ; not recommended leay -1,y ; approved loop counter bne loop ==== How do I call a subroutine? ==== Use the JSR (Jump to SubRoutine) or BSR (Branch to SubRoutine) instructions. How to pass parameters depends entirely on how the subroutine was written. Typically a parameter is loaded into an accumulator, or its address is loaded into an index register. lda xPos ldb yPos leax bmapShip,pcr jsr drawBmap moreProgram: adda #20 A subroutine continues until it reaches a RTS (ReTurn from Subroutine) instruction, at which point execution continues with the instruction following the JSR. Results are returned the same way as parameters are passed. Setting the condition codes is also an option, such as setting the Carry flag in case of an error. drawBmap: pshs a,b,y ; save registers as needed ldy #screenBase lsra leay a,y ... failure: orcc #1 finish: puls a,b,y ; we didn't save X as we don't expect the caller to reuse it rts Subroutines work by pushing the PC (Program Counter), which always contains the address of the next instruction, onto the system stack (S register), then loading the PC with the destination address. The next RTS instruction should simply PULS the PC's value from the stack. Note that this won't work if the stack has been modified and not correctly restored. It's often a good idea to save and restore the values or register that are modified during the subroutine. If the last instruction before the RTS is a PULS we can optimise by combining the two: pshs x,y ; save registers leax 2,x leay -4,y ... puls x,y,pc ; restore registers and return ==== What is a stack and how do I (safely) use it? ==== A stack is a crucial concept of machine language programming. Basically it's a way of storing information and retrieving it later without too much worry about the fine details. You can PuSH registers onto the stack and PulL (sometimes known as POP) them off later. As long as pushing and pulling is done in matching order (like matching the brackets in an arithmetical expression) all we need to know is the name of the register (S or U in the case of the 6809). People like to use physical analogies. For example, you're at your desk scribbling your memoirs when you need to check a detail in those photos you took on holiday in Ibiza. So you put your papers aside (onto the "stack") to make room. Then the phone rings and you need to get out your appointment book, so the photos go onto the "stack". And then there's someone at the door... this could go on indefinitely, though there is ultimately a physical limit to the size of the stack. The point is that you can easily imagine retrieving your papers from the "stack" and smoothly carrying on with the last job after every interruption. It only needs a little order in your working methods. And if you have a whole program written down beside you, and the current line number is just another detail that can be jotted down and pushed on the stack, nothing could possibly go wrong. The system stack in a 6809 system is pointed to by the register S. Typically it will be initialised by the operating system at the highest address in RAM (the user program doesn't usually set it). Pushing a byte means first decrementing S by 1, then storing the byte into address S. Doing this twice pushes a 16-bit word. To pull a byte, read its value from address S then increment S by 1. Note that this is identical to using the 6809's autoincrement/decrement modes. There are two stacks on the 6809, U (User) and S (system); normally we don't bother much with U. Any combination of registers can be pushed and pulled in a single instruction. With one exception: we can't stack a stack register onto itself, which would have little point anyway. pshs a,b,x,y,u ldd #loopCount loop: pshs d lda ,x+ cmpa ,y+ ... puls d subd #1 bne loop puls a,b,x,y,u,pc If we PSHS A, modify A, then PULS A we get the same value back, and we don't have to care what was done to the stack before the push. Equally, our "modify A" part might have included PSHS X,Y followed by PULS X,Y, and we would not notice. But we must do our stacking in matching pairs with no overlap; PSHS A then PSHS X,Y then PULS A will not restore A. Calling subroutines relies on the system stack. A JSR pushes the Program Counter (PC) onto the stack, and a RTS pulls it. Thus the program can continue where it left off when the subroutine is over. Instead of using RTS, a subroutine might add the PC to a list of pulls at the end of a subroutine. Stacking order is always the same, regardless of how the assembly language instruction is written. For example PSHS Y,B,A,X will be retrieved correctly by PULS D,Y,X. When all the registers are stacked on S it will look like this: CC A B DP X Y U PC S+ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 The S register can be used as an index register just like any other to access these values individually. But don't use negative offsets as you can't trust anything below the stack; the system may overwrite these bytes at any time, literally between instructions. You can use the LEA instruction to reserve space on the stack; LEAS -12,S is like a push, LEAS 12,S a pull. As before, you must do this in matching pairs. pshs d,x,y,u leas -4,s ; reserve 4 bytes to do what we like with clr 2,s ... lda #4 pshs a lda 3,s ; the same address we cleared above ... puls a ; A equals 4 leas 4,s puls d,x,y,u,pc The dangers of using the stack: * There's nothing to stop it expanding to overwrite programs and data, but this shouldn't happen in practice. * Using the S register for our own purposes when a system interrupt may occur, overwriting our data seemingly at random. * Generally not doing pushing and pulling in matching pairs and mixing up the data, with particular reference to return addresses. ==== How do I do something a hundred times over? ==== In a high-level language, typically you would set a variable to zero, then count up to a hundred: n:=0; REPEAT writeln; n:=n+1 UNTIL n=100 Assembly language prefers an idiom where you start at a hundred and count down to zero: ldb #100 repeatLoop: lbsr writeln decb bne repeatLoop Assuming the value of the loop counter isn't used anywhere, it's a little simpler this way. ==== How do I write an IF / THEN statement? ==== The easy way is to reverse the logic. So if you had: IF xPos>100 THEN xPos:=100 Imagine instead it was: IF xPos<=100 SKIP xPos:=100 In assembly language that could be: lda xPos cmpa #100 ; IF bls noLimit ; SKIP lda #100 ; optional sta xPos ; statements noLimit: lda yPos ; continue with program ==== How do I write a REPEAT / UNTIL loop? ==== Simply write the body, then a comparison, and a conditional branch back to the start of the body. REPEAT plotPixel (left, top) left:=left+1 UNTIL left>right Becomes: lda left ldb top plotLoop: ; REPEAT jsr plotPixel inca cmpa right bls plotLoop ; UNTIL ldb bottom ; continue ==== How do I write a WHILE loop? ==== This is very similar to REPEAT, except the comparison has to be made at the start of the loop. This is done simply by preceding the loop with a branch to the conditional test at the end: WHILE left<=right DO plotPixel (left, top) left:=left+1 Becomes: lda left ldb top bra plotTest plotLoop: ; DO jsr plotPixel inca plotTest: cmpa right bls plotLoop ; WHILE ldb bottom ; continue ==== How do I access an array? ==== Generally you won't be using arrays in quite the same way as with high level languages. But if you need to, the programmer must do all the work of reserving memory and calculating offset addresses. For an array of bytes this is simple; point to its start address with an index register and use an offset: numList rmb 100 leax numList,pcr ; point to array ldb #99 ; last element lda b,x ; A:=numList[99] For an array of addresses (two bytes each) the offset has to be doubled by bit-shifting. Note that the range of offsets is limited to -128 to 127 with an 8-bit offset so D may have to be used. adrList rmb 400 leax adrList,pcr ldb #199 clra lslb rola ldy d,x ; Y=adrList[199] Two-dimensional arrays can be accessed using the MUL (unsigned MULtiply) instruction. A 32x16 character text screen could be treated as an array of 16 lines of 32 bytes: cursorFlash: ; A=column, B=line pshs a,b,x cmpa #32 bhs out cmpb #16 bhs out ldx #textScreen leax a,x ; X:=X+A lda #32 mul ; D:=B*32 leax d,x ; X:=X+D lda ,x ; A:=textScreen[A,B] eora #$40 sta ,x out: puls a,b,x,pc ==== How do I use strings? ==== Like any other complex data structure you have to do all the work yourself. There is however a convention of the "null-terminated string" which is widely used and may take little work. It's simply a list of bytes terminated by a zero. Thus a string can be easily defined and as long as you like, the only limit being it can't include any zeroes. strDelay fcb "Please wait.",0 ; easy static string strWelcome fcb "Hello %USER% and welcome.",0 ; substitution expected, so a problem leax strDelay,pcr bsr strLen jmp romPrint16 ; print length of strDelay strLen: ; return length of string at X in B pshs x ldd #0 bra strL2 strL1: addd #1 ; count a valid character strL2: tst ,x+ ; null terminator? bne strL1 puls x,pc This works well for sending messages to the screen, or making simple modifications such as changing to upper case. But anything more, such as simply catenating two strings, requires the programmer to ensure there is enough space available. Getting this wrong may crash the program. An alternative structure might be to precede the string by two bytes specifying its length and bytes available: strWelcome fcb strWelcomeEnd-strWelcome-2, 100 ; length followed by max length fcb "Hello %USER% and welcome." ; string characters (25 bytes) strWelcomeEnd rmb 100-strWelcomeEnd+strWelcome+2 ; spare space (75 bytes) strSpaceEnd: These strings are awkward to define in assembly language. The programmer must then write a library of reliable routines to manipulate this structure. ==== How do I define data records and arbitrary structures? ==== As with arrays and strings, you point an index register where you hope the structure will be, and then do all the housekeeping yourself. Assembly language can help a little by allowing us to define constants to label offsets within the structure. Clearly this allows for variant records (fields having dual meanings). ; sprite structure (8 bytes total) spr_type EQU 0 ; 8-bit type spr_xPos EQU 1 ; 16-bit X co-ord spr_yPos EQU 3 ; 16-bit Y co-ord spr_mask EQU 5 ; 8-bit collision mask spr_bmap EQU 6 ; 16-bit bitmap address spr_size EQU 8 spr_maxN EQU 20 ; room for 20 sprites spr_table rmb spr_maxN*spr_size sprEraseAll: ; erase all active sprites leau spr_table,pcr ldb #spr_maxN sprEraseLoop: lda ,u beq sprNext ; zero type means not active ldx spr_xPos,u ; X co-ord ldy spr_yPos,u ; Y co-ord lbsr sprErase sprNext: leau spr_size,u ; point to next sprite decb ; loop count bne sprEraseLoop rts ==== How do I use local variables? ==== The simplest way to use a local variable is to push a register onto the stack (remembering to remove it later). shift: ; shift register A right B times ; return result in D pshs b ; B is now a local variable with address S clrb ; destroy B to hold result tst ,s ; was B zero? beq shiftOut shiftLoop: lsra ; 16-bit shift right rorb dec ,s ; decrease local variable bne shiftLoop ; loop (original) B times shiftOut: leas 1,s ; remove local variable rts The problem with this is that when we push more temporary values on the stack, the offset to our local variable changes. In the example above, if we chose to PSHS X then our offset now becomes 2,S . We can deal with this, but it's awkward. A solution is to reserve space on the stack, then point to it with another register, and use this one as the offset base. The principle is the same as with the structures described above. It can be extended to allow for the passing of arbitrary parameters on the stack. For example: PROCEDURE drawBox (left, top, right, bottom:integer); VAR xPos, yPos:integer; BEGIN xPos=left; yPos=right; REPEAT ... UNTIL yPos>=bottom END Becomes: xPos EQU -4 yPos EQU -2 left EQU 8 top EQU 10 right EQU 4 bottom EQU 6 ; 2 would be the return address ; 0 would be the saved value of U ; draw a 10x6 box at (0,2) ldx #0 ; left:=0 ldy #2 ; top:=2 pshs x,y ldx #10 ; right:=10 ldy #8 ; bottom:=8 pshs x,y lbsr drawBox leas 8,s ; remove 4 16-bit parameters rts drawBox: pshs u ; save value of U leau ,s ; U is an address between the locals and the parameters leas -4,s ; reserve space for 2 16-bit integers on stack ldd left,u ; U+8 std xPos,u ; U-4 ldd top,u std yPos,u drawBoxLoopY: ... ldd yPos,u cmpd high,u blt drawBoxLoopY leas 4,s ; release reserved space puls u,pc ; restore U and return ==== Can I do recursion? ==== Yes, there's no reason why an assembly language subroutine shouldn't call itself. But the same caveats apply as in any other language. You must use local variables for instance, and not alter global memory in unexpected ways. Beware of very deep recursion; a pixel flood-fill routine might call itself thousands of times, overwriting more than the available RAM with the system stack. ==== What is position independence and when do I use it? ==== Ideally a machine language program can be loaded at any address in memory and still work correctly; the only thing that has to change is the program start address. This is Position Independent Code (PIC). It is more important for smaller routines as several may be loaded at the same time. What prevents position independence? Addresses being hard-coded rather than relative. Direct jumps to subroutines are a common example. All jumps and branches (within the program, not eg. external ROM routines) can be as easily coded using relative offsets; instead of JMP and JSR use BRA/LBRA and BSR/LBSR. The same applies to data references. Variables stored on the stack are fine. As are pointers returned by the operating system. Other structures need to be referenced using PC relative addressing. The PC (Program Counter) always contains the address of the next instruction. The same mechanism used for branching can be used for data addressing. Giving a relative offset from the PC means the actual address at which the program is loaded is irrelevant. We rarely give the actual offset, just as we don't give the number of bytes to skip in a relative branch. Instead we specify the target address, and indicate this by following it with ",PCR". frameCount rmb 1 frameList rmb 20 lda frameCount ; direct extended addressing lda frameCount,pcr ; same address calculated via PC relative ; the assembler calculates the offset ; NOT the same as PC+frameCount inca ldb -27,pc ; PC-27 - this form is rarely useful ... ldx #frameList ; initialise index using immediate mode leax frameList,pcr ; X has the same value as above ldy ,x++ ; and the routine can proceed just as before Not all addresses should be accessed via PCR; only the ones that move with the program. This means NOT I/O ports, screen buffers, ROM routines etc. There are downsides to PCR, mainly that it is less efficient, taking more bytes and machine cycles to do the same task. Sometimes it can be more complex to program: bufSize EQU 100 buffer rmb bufSize bufEnd: bufClear: leax buffer,pcr bufCloop: clr ,x+ cmpx #??? ; er, how to compare X with end of buffer space? blo bufCloop bufClear: leax bufEnd,pcr ; pshs x ; stack address of end of buffer space leax buffer,pcr bufCloop: clr ,x+ cmpx ,s ; compare X with end of buffer space blo bufCloop leas 2,s ; tidy stack An alternative to PCR addressing is direct paged addressing; copying PC to DP takes just a few steps. This is both time and space efficient, but limits the program to start on 256 byte pages. It's a good solution for large programs such as arcade games. org $4000 start: lbsr main ; stack PC counter rmb 1 variables rmb 100 main: puls d ; retrieve old PC (points to 'counter') tfr a,dp ; high byte of PC goes to DP setdp $40 ; directive to help the assembler clr counter ; most assemblers will now use DP automatically In general, you should always use position independent code for programs intended to have any sort of longevity. The exceptions are special cases such as embedded systems, ROM cartridges, and plain quick-and-dirty mash-ups. ==== What are interrupts? ==== An interrupt is a physical signal that can occur at any time, triggering the CPU into stacking its registers, executing a subroutine, then retrieving its registers to continue as normal. This is a way of providing a software response to a hardware event. It may be as simple as updating a counter in memory in sync with an external timer. The details of interrupts are intricate and a relatively advanced topic, but most home micros have a simple use for them: vertical blanking. On an old-fashioned CRT, every time the display of a graphics frame is complete the circuitry issues an interrupt. Syncing with this allows for smooth animation in games. gameLoop: sync ; wait for interrupt lbsr eraseSprite lbsr moveSprite lbsr drawSprite ; most of the time is spent with the sprite drawn, not erased bra gameLoop The 6809 has four hardware interrupts (IRQ, FIRQ, NMI, and RESET), and three software interrupts (SWI, SWI2, SWI3), each with its own vector pointing to a service routine. Software interrputs are machine language instructions rather than hardware events, and typically are used for debugging breakpoints or calling operating system routines. An interrupt service routine must end with the instruction RTI (Return from Interrupt) so that the registers are unstacked correctly. This first pulls the CC and checks the E (Entire) bit. If set it then pulls all the registers from the stack, if clear it just pulls the PC. The fast interrupt (FIRQ) doesn't stack the working registers, and the E bit is the way of flagging the difference. Two other CC bits are involved with interrupts. Setting I disables (or masks) the IRQ response, and setting F disables the FIRQ response. ==== What is reentrant code and how do I write it? ==== A reentrant machine language program can be very roughly treated; you can stop it (saving its registers), restart it from the beginning, stop it again, and still have it continue correctly where it left off (restoring its registers of course). This is exactly what is needed for a multi-tasking operating system: the same code used by multiple processes. The concept is related to interrupts and recursion. Whereas a recursive routine secures its variable so it's safe to call itself from just one point, a reentrant routine must try harder. Basically, it must allocate itself a new set of variables whenever it runs. There's no reason why a short routine shouldn't be reentrant. Simply allocate variables on the stack and never use globals. And avoid eccentricities like self-modifying code. For a larger program things become complex. Fresh memory might be allocated from a heap by the operating system for example. It's an advanced topic, but still doable on the 6809. ==== What is indirection and how do I use it? ==== ==== How do I use virtual methods to create polymorphic objects? ==== ==== How do negative numbers work? ==== ==== What do all the instructions do? ==== [Will get round to this one]