===== Assembly Language FAQs =====

Some simple questions about how to translate high level language concepts into the idiom of assembly language.

==== Why are there so many instructions? ====

There are only really a few dozen instructions in 6809 assembly language. At first it can be confusing because registers are tagged onto the end of operations without any punctuation. So LDA, STB, LDD, CMPX, LDY, STU and CMPS can seem like a mysterious random jumble rather than just three basic operations on different registers.

When faced with an unfamiliar instruction, try to remove a register from the end to see what is the real operation:

  orb #20     ; Or B #20
  addd #400   ; Add D #400
  sty $1000   ; St Y $1000 
  inca        ; Inc A
  comb        ; Com B
  pshu        ; Psh U
  sex         ; *not* Se X, but Sign EXtend
  daa         ; *not* Da A, but Decimal Addition Adjust
  
==== Which registers do I use? ====

For an address variable pick an index register, X, Y or even (user stack) U, it doesn't matter which. For a character or small numeric variable pick an accumulator, A or B, again it doesn't matter which. If you need 16-bit values use D, which is the two accumulators combined (A high, B low).

The choice between index register and accumulator severely limits the operations you can perform. Only LD, ST, and CMP are common to all user registers. General arithmetic operations (NEG, AND etc.) are limited to the 8-bit accumulators, except for ADD and SUB which are also available for the 16-bit D. Index registers uniquely have the LEA operation, and the stacks (U and S) have PSH and PUL.

Don't use the system registers (PC, CC, DP, S) directly unless you know exactly what you're doing.

In practice there are minor differences between registers. One accumulator may have a specialised operation not available to the other, and the LEA instruction works a little differently for stacks so you can't use LEAU -1,U as a loop counter. Also X is the preferred index register as its instructions are sometimes coded more efficiently.
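
To illustrate the LEA difference, here is a minimal sketch of a 16-bit countdown held in X, followed by the extra comparison that is needed if U holds the count instead. The labels and the count of 1000 are invented for the example.

    ldx #1000
  waitX:
    leax -1,x       ; X:=X-1, Zero flag is set when X reaches 0
    bne waitX
  
    ldu #1000
  waitU:
    leau -1,u       ; U:=U-1, but the Zero flag is NOT updated
    cmpu #0         ; so an explicit test is needed
    bne waitU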

==== How do I declare, assign, and modify variables? ====

Reserve space for a global variable with the RMB (reserve memory bytes) directive. This isn't a machine language instruction; it simply tells the assembler how to arrange space in program memory. Remember that only byte (8-bit) and word (16-bit) values can be handled by the CPU in a single operation. More complex variables are referenced by their (16-bit) addresses.

Variables are fetched into a register with the LD (load) operation, the registers worked upon as needed, and updated in memory with the ST (store) operation. A useful operation for initialising string and other structured variables is LEAX (load effective address).

  nxtChr   rmb 1       ; VAR nxtChr:byte
  nxtLine  rmb 2       ; VAR nxtLine:word 
  strAdr   rmb 2       ; VAR strAdr:^string
  msgLen   fcb 5       ; VAR msgLen:byte=5 
  msgTxt   fcb "Hello" ; CONST msgTxt="Hello" 
  maxLine  fdb 1000    ; VAR maxLine:word=1000
  
  lda #space        
  sta nxtChr         ; nxtChr:=32
  ldd #1  
  std nxtLine        ; nxtLine:=1
  leax msgTxt,pcr    ; address of string via PC relative mode
  stx strAdr         ; strAdr:=@msgTxt
  ...
  ldx strAdr         ; retrieve pointer  
  lda ,x+            ; read character and increment pointer 
  ora #$20
  sta nxtChr         ; store character
  stx strAdr         ; store pointer
  
A variable can be initialised at assembly time with the FCB (form constant byte) and FDB (form double byte) directives. This isn't a good idea except for constant data, as the initialisation happens only when the program is first loaded, and not when it is run a second time.

==== What is a bigendian? ====

The 6809 is described by the Swiftian term "big-endian"; the Intel 8080, like many early micros, is "little-endian". This refers to the way 16-bit values are stored in memory - big byte first, or little byte first.

Little-endian may seem the less sensible choice (and one that gets sillier as the design decision persists to 64-bithood), but it's simpler to design an 8-bit microprocessor that way. For example, when adding a 16-bit value 8 bits at a time, storing the low byte first makes it easier to deal with the carry. If there was no ADDD instruction for the 6809, and the value at X were stored low byte first, we might do this:

    addb ,x+      ; add low byte
    adca ,x+      ; add high byte, using the carry

The assembler will take care of the byte ordering for us when we specify 16-bit constants, but it's something to bear in mind when converting programs between micros.
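
For comparison, here is a sketch of the same byte-at-a-time addition using the 6809's own big-endian layout, where the low byte sits at the higher address. The label value and the numbers are invented for the example.

  value fdb $1234   ; stored as $12 then $34
    leax value,pcr
    ldd #$0101
    addb 1,x        ; add low byte ($34), found at the higher address
    adca ,x         ; add high byte ($12), using the carry
                    ; D is now $1335, the same result as ADDD ,X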

==== What does the LEA instruction do? ====

The LEA (Load Effective Address) instruction is used to initialise and modify index registers, ie. address variables. Only indexed addressing modes are available.

The jargon term 'effective address', in the context of a LDA instruction for example, is simply the address from which the register A is loaded:

    ldx #$0400
    ldb #$10
    lda $20,x   ; Effective Address is X+$20, ie. $0420
    lda b,x     ; Effective Address is X+B, ie. $0410

Therefore a LEA instruction is simply performing addition: 

    leax $20,x  ; X:=X+32
    leax b,x    ; X:=X+B

Registers can be mixed as needed:

    leay ,x     ; Y:=X 
    leau 4,s    ; U:=S+4

Any indexed mode can be used, so index registers can be initialised using PC relative mode:

  helloStr fcb "Hello World",0
    leax helloStr,pcr      ; point X to string
    lda ,x+                ; A contains 72 (ASCII 'H')

Don't use the autoincrement modes, stick to constant offsets. Only the Zero flag of the condition codes is modified, and only for assignments to X and Y. This is so the stacks can be adjusted at the end of a subroutine without affecting the zero flag which might be checked by the caller.     

    leax ,x++   ; does nothing
    leay ,-y    ; not recommended
    leay -1,y   ; approved loop counter
    bne loop  
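
A sketch of why leaving the flags alone matters for the stacks. The routine isSpace and the label foundSpace are hypothetical; the point is only that the LEAS after the comparison does not disturb the Zero flag the caller goes on to test.

  isSpace:
  ; hypothetical routine: return with Zero set if A is a space
    leas -2,s       ; reserve two bytes of scratch space
    ...
    cmpa #32        ; Zero set if A=32
    leas 2,s        ; release the space, Zero is left alone
    rts
  
  ; in the caller:
    bsr isSpace
    beq foundSpace  ; still reflects the CMPA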

==== How do I call a subroutine? ====

Use the JSR (Jump to SubRoutine) or BSR (Branch to SubRoutine) instructions. How to pass parameters depends entirely on how the subroutine was written. Typically a parameter is loaded into an accumulator, or its address is loaded into an index register.

    lda xPos
    ldb yPos
    leax bmapShip,pcr
    jsr drawBmap
  moreProgram:
    adda #20  

A subroutine continues until it reaches a RTS (ReTurn from Subroutine) instruction, at which point execution continues with the instruction following the JSR. Results are returned the same way as parameters are passed. Setting the condition codes is also an option, such as setting the Carry flag in case of an error.    

  drawBmap:
    pshs a,b,y        ; save registers as needed
    ldy #screenBase
    lsra
    leay a,y
    ...
  failure:
    orcc #1
  finish:        
    puls a,b,y        ; we didn't save X as we don't expect the caller to reuse it 
    rts
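
A sketch of the Carry convention seen from both sides, as a variant of the routine above. It assumes the success path clears Carry explicitly, which the listing above does not show; drawFailed is a hypothetical label in the caller.

    jsr drawBmap
    bcs drawFailed    ; Carry set: something went wrong
    ...
  drawBmap:
    pshs a,b,y
    ...
    andcc #$FE        ; success: clear Carry
    bra finish
  failure:
    orcc #1           ; failure: set Carry
  finish:
    puls a,b,y,pc     ; CC is not in the list, so Carry survives the pull

Neither PULS (without CC in its list) nor RTS changes the condition codes, so the flag arrives at the caller intact.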

Subroutines work by pushing the PC (Program Counter), which always contains the address of the next instruction, onto the system stack (S register), then loading the PC with the destination address. The next RTS instruction should simply PULS the PC's value from the stack. Note that this won't work if the stack has been modified and not correctly restored.

It's often a good idea to save and restore the values of registers that are modified during the subroutine. If the last instruction before the RTS is a PULS we can optimise by combining the two:

    pshs x,y       ; save registers
    leax 2,x
    leay -4,y
    ...
    puls x,y,pc    ; restore registers and return 

==== What is a stack and how do I (safely) use it? ====

A stack is a crucial concept of machine language programming. Basically it's a way of storing information and retrieving it later without too much worry about the fine details. You can PuSH registers onto the stack and PulL (sometimes known as POP) them off later. As long as pushing and pulling is done in matching order (like matching the brackets in an arithmetical expression) all we need to know is the name of the register (S or U in the case of the 6809). 

People like to use physical analogies. For example, you're at your desk scribbling your memoirs when you need to check a detail in those photos you took on holiday in Ibiza. So you put your papers aside (onto the "stack") to make room. Then the phone rings and you need to get out your appointment book, so the photos go onto the "stack". And then there's someone at the door... this could go on indefinitely, though there is ultimately a physical limit to the size of the stack. 

The point is that you can easily imagine retrieving your papers from the "stack" and smoothly carrying on with the last job after every interruption. It only needs a little order in your working methods. And if you have a whole program written down beside you, and the current line number is just another detail that can be jotted down and pushed on the stack, nothing could possibly go wrong.

The system stack in a 6809 system is pointed to by the register S. Typically it will be initialised by the operating system at the highest address in RAM (the user program doesn't usually set it). Pushing a byte means first decrementing S by 1, then storing the byte into address S. Doing this twice pushes a 16-bit word. To pull a byte, read its value from address S then increment S by 1. Note that this is identical to using the 6809's autoincrement/decrement modes.
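
As a sketch, each push or pull below moves the same data as the load or store beside it. One difference worth knowing: LD and ST update the N, Z and V flags, while pushes and pulls leave the condition codes alone (unless CC itself is pulled).

    pshs a          ; much the same as  sta ,-s
    puls a          ; much the same as  lda ,s+
    pshs x          ; much the same as  stx ,--s
    puls x          ; much the same as  ldx ,s++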

There are two stacks on the 6809, U (User) and S (System); normally we don't bother much with U. Any combination of registers can be pushed and pulled in a single instruction, with one exception: we can't stack a stack register onto itself, which would have little point anyway.

    pshs a,b,x,y,u
    ldd #loopCount
  loop:  
    pshs d
    lda ,x+
    cmpa ,y+
    ...
    puls d
    subd #1
    bne loop  
    puls a,b,x,y,u,pc

If we PSHS A, modify A, then PULS A we get the same value back, and we don't have to care what was done to the stack before the push. Equally, our "modify A" part might have included PSHS X,Y followed by PULS X,Y, and we would not notice. But we must do our stacking in matching pairs with no overlap; PSHS A then PSHS X,Y then PULS A will not restore A.

Calling subroutines relies on the system stack. A JSR pushes the Program Counter (PC) onto the stack, and a RTS pulls it. Thus the program can continue where it left off when the subroutine is over. Instead of using RTS, a subroutine might add the PC to a list of pulls at the end of a subroutine.

Stacking order is always the same, regardless of how the assembly language instruction is written. For example PSHS Y,B,A,X will be retrieved correctly by PULS D,Y,X. When all the registers are stacked on S it will look like this:

  Register   CC   A    B    DP   X      Y      U      PC
  S+         0    1    2    3    4-5    6-7    8-9    10-11    (12 bytes in all)
  
The S register can be used as an index register just like any other to access these values individually. But don't use negative offsets as you can't trust anything below the stack; the system may overwrite these bytes at any time, literally between instructions.
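
A minimal sketch of reading stacked values back through S. The routine name is invented; the offsets follow from the stacking order shown above.

  useSaved:
  ; hypothetical routine
    pshs a,b,x      ; now A is at 0,S  B at 1,S  X at 2,S
    ...             ; registers get modified here
    lda 1,s         ; A:=the original B
    ldx 2,s         ; X:=the original X (it stays on the stack)
    ...
    puls a,b,x,pc   ; restore registers and return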
  
You can use the LEA instruction to reserve space on the stack; LEAS -12,S is like a push, LEAS 12,S a pull. As before, you must do this in matching pairs.

    pshs d,x,y,u
    leas -4,s         ; reserve 4 bytes to do what we like with
    clr 2,s
    ...
    lda #4
    pshs a
    lda 3,s           ; the same address we cleared above
    ...
    puls a            ; A equals 4
    leas 4,s
    puls d,x,y,u,pc

The dangers of using the stack:
  * There's nothing to stop it expanding to overwrite programs and data, but this shouldn't happen in practice. 
  * Using the S register for our own purposes when a system interrupt may occur, overwriting our data seemingly at random.
  * Generally not doing pushing and pulling in matching pairs and mixing up the data, with particular reference to return addresses.

==== How do I do something a hundred times over? ====

In a high-level language, typically you would set a variable to zero, then count up to a hundred:

  n:=0;
  REPEAT
    writeln;
    n:=n+1
  UNTIL n=100

Assembly language prefers an idiom where you start at a hundred and count down to zero:

    ldb #100
  repeatLoop:
    lbsr writeln
    decb
    bne repeatLoop

Assuming the value of the loop counter isn't used anywhere, it's a little simpler this way.
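
For comparison, a literal translation of the count-up version needs an explicit comparison each time round the loop. Like the loop above, this sketch assumes that writeln preserves B; the label countLoop is invented.

    clrb            ; n:=0
  countLoop:
    lbsr writeln
    incb            ; n:=n+1
    cmpb #100       ; the extra instruction; DECB gave us the Zero flag for free
    bne countLoop   ; UNTIL n=100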

==== How do I write an IF / THEN statement? ====

The easy way is to reverse the logic. So if you had:

  IF xPos>100 THEN xPos:=100

Imagine instead it was:

  IF xPos<=100 SKIP xPos:=100
  
In assembly language that could be:

    lda xPos
    cmpa #100      ; IF
    bls noLimit    ; SKIP 
    lda #100       ; optional
    sta xPos       ; statements 
  noLimit:
    lda yPos       ; continue with program           
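
An ELSE part needs one extra branch to jump over it. A sketch, with the labels and the ELSE action invented for the example:

  ; IF xPos>100 THEN xPos:=100 ELSE xPos:=xPos+1
    lda xPos
    cmpa #100       ; IF
    bls elsePart    ; condition false: go to the ELSE part
    lda #100        ; THEN part
    bra endIf       ; skip over the ELSE part
  elsePart:
    inca            ; ELSE part
  endIf:
    sta xPos        ; both paths store the result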

==== How do I write a REPEAT / UNTIL loop? ====

Simply write the body, then a comparison, and a conditional branch back to the start of the body.

  REPEAT
    plotPixel (left, top)
    left:=left+1
  UNTIL left>right

Becomes:
  
    lda left
    ldb top 
  plotLoop:          ; REPEAT
    jsr plotPixel
    inca
    cmpa right        
    bls plotLoop     ; UNTIL
    ldb bottom       ; continue

==== How do I write a WHILE loop? ====

This is very similar to REPEAT, except the comparison has to be made at the start of the loop. This is done simply by preceding the loop with a branch to the conditional test at the end:

  WHILE left<=right DO
    plotPixel (left, top)
    left:=left+1

Becomes:

    lda left    
    ldb top
    bra plotTest
  plotLoop:          ; DO    
    jsr plotPixel
    inca
  plotTest:    
    cmpa right
    bls plotLoop     ; WHILE
    ldb bottom       ; continue

==== How do I access an array? ====

Generally you won't be using arrays in quite the same way as with high level languages. But if you need to, you must do all the work of reserving memory and calculating offset addresses yourself. For an array of bytes this is simple; point to its start address with an index register and use an offset:
  
  numList rmb 100
    leax numList,pcr    ; point to array
    ldb #99             ; last element
    lda b,x             ; A:=numList[99]

For an array of addresses (two bytes each) the offset has to be doubled by bit-shifting. Note that the range of offsets is limited to -128 to 127 with an 8-bit offset so D may have to be used.

  adrList rmb 400
    leax adrList,pcr
    ldb #199
    clra
    lslb
    rola
    ldy d,x             ; Y=adrList[199]
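
The signed 8-bit offset is an easy trap with byte arrays too. A sketch of the problem and one way round it, using ABX, which adds B to X as an unsigned value; the array bigList is invented for the example.

  bigList rmb 256
    leax bigList,pcr
    ldb #200
    lda b,x         ; reads bigList-56, as 200 is -56 when signed
    abx             ; X:=X+B, with B treated as unsigned
    lda ,x          ; A:=bigList[200]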

Two-dimensional arrays can be accessed using the MUL (unsigned MULtiply) instruction. A 32x16 character text screen could be treated as an array of 16 lines of 32 bytes:

  cursorFlash:
    ; A=column, B=line
    pshs a,b,x
    cmpa #32
    bhs out
    cmpb #16 
    bhs out
    ldx #textScreen
    leax a,x        ; X:=X+A
    lda #32
    mul             ; D:=B*32
    leax d,x        ; X:=X+D  
    lda ,x          ; A:=textScreen[A,B]
    eora #$40
    sta ,x
  out:    
    puls a,b,x,pc

==== How do I use strings? ====

Like any other complex data structure you have to do all the work yourself. There is however a convention of the "null-terminated string" which is widely used and may take little work. It's simply a list of bytes terminated by a zero. Thus a string can be easily defined and as long as you like, the only limit being it can't include any zeroes.

  strDelay   fcb "Please wait.",0                 ; easy static string
  strWelcome fcb "Hello %USER% and welcome.",0    ; substitution expected, so a problem
    leax strDelay,pcr
    bsr strLen
    jmp romPrint16              ; print length of strDelay
  strLen:
  ; return length of string at X in D
    pshs x
    ldd #0
    bra strL2
  strL1:  
    addd #1                      ; count a valid character
  strL2:  
    tst ,x+                      ; null terminator? 
    bne strL1
    puls x,pc

This works well for sending messages to the screen, or making simple modifications such as changing to upper case. But anything more, such as simply catenating two strings, requires the programmer to ensure there is enough space available. Getting this wrong may crash the program.
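
As a sketch of such a simple modification, here is an upper-case routine for a null-terminated string. The routine name and labels are invented, and it assumes plain ASCII text.

  strUpper:
  ; convert the null-terminated string at X to upper case, in place
    pshs a,x
  strU1:
    lda ,x+         ; read character and advance pointer
    beq strU2       ; null terminator: finished
    cmpa #$61       ; below 'a'?
    blo strU1
    cmpa #$7A       ; above 'z'?
    bhi strU1
    suba #$20       ; 'a' becomes 'A'
    sta -1,x        ; write back; X has already moved on
    bra strU1
  strU2:
    puls a,x,pc

The string never changes length, so no space check is needed.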

An alternative structure might be to precede the string by two bytes specifying its length and bytes available:

  strWelcome    fcb strWelcomeEnd-strWelcome-2, 100  ; length followed by max length
                fcb "Hello %USER% and welcome."      ; string characters (25 bytes)     
  strWelcomeEnd rmb 100-strWelcomeEnd+strWelcome+2   ; spare space (75 bytes)
  strSpaceEnd:
  
These strings are awkward to define in assembly language. The programmer must then write a library of reliable routines to manipulate this structure.  
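
One such routine might look like the sketch below, which appends a character and refuses if the string is full. The name strAddChr and the Carry-on-failure convention are assumptions for the example; the layout is the one above (length at offset 0, maximum at offset 1, characters from offset 2).

  strAddChr:
  ; append character A to the length-prefixed string at X
  ; returns Carry set if the string is already full
    pshs b,x
    ldb ,x           ; current length
    cmpb 1,x         ; compare with maximum length
    bhs strAddFull   ; no room left
    inc ,x           ; length:=length+1
    leax 2,x         ; skip the two header bytes
    abx              ; X:=X+old length (unsigned)
    sta ,x           ; store the new character
    andcc #$FE       ; clear Carry: success
    puls b,x,pc
  strAddFull:
    orcc #1          ; set Carry: failure
    puls b,x,pc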

==== How do I define data records and arbitrary structures? ====

As with arrays and strings, you point an index register where you hope the structure will be, and then do all the housekeeping yourself. Assembly language can help a little by allowing us to define constants to label offsets within the structure. Clearly this allows for variant records (fields having dual meanings).

  ; sprite structure (8 bytes total)
  spr_type EQU 0    ; 8-bit type
  spr_xPos EQU 1    ; 16-bit X co-ord
  spr_yPos EQU 3    ; 16-bit Y co-ord
  spr_mask EQU 5    ; 8-bit collision mask 
  spr_bmap EQU 6    ; 16-bit bitmap address 
  spr_size EQU 8
  spr_maxN EQU 20   ; room for 20 sprites
  spr_table rmb spr_maxN*spr_size
  
  sprEraseAll:
  ; erase all active sprites
    leau spr_table,pcr
    ldb #spr_maxN
  sprEraseLoop:  
    lda ,u
    beq sprNext         ; zero type means not active
    ldx spr_xPos,u      ; X co-ord
    ldy spr_yPos,u      ; Y co-ord
    lbsr sprErase  
  sprNext:
    leau spr_size,u     ; point to next sprite  
    decb                ; loop count
    bne sprEraseLoop
    rts

==== How do I use local variables? ====

The simplest way to use a local variable is to push a register onto the stack (remembering to remove it later).

  shift: 
  ; shift register A right B times
  ; return result in D
    pshs b             ; B is now a local variable with address S
    clrb               ; destroy B to hold result
    tst ,s             ; was B zero?
    beq shiftOut
  shiftLoop:  
    lsra               ; 16-bit shift right
    rorb
    dec ,s             ; decrease local variable
    bne shiftLoop      ; loop (original) B times 
  shiftOut:  
    leas 1,s           ; remove local variable
    rts

The problem with this is that when we push more temporary values on the stack, the offset to our local variable changes. In the example above, if we chose to PSHS X then our offset now becomes 2,S . We can deal with this, but it's awkward.

A solution is to reserve space on the stack, then point to it with another register, and use this one as the offset base. The principle is the same as with the structures described above. It can be extended to allow for the passing of arbitrary parameters on the stack. For example:

  PROCEDURE drawBox (left, top, right, bottom:integer);
  VAR xPos, yPos:integer;
  BEGIN
    xPos:=left;
    yPos:=top;
    REPEAT
       ...
    UNTIL yPos>=bottom
  END

Becomes:

  xPos   EQU -4
  yPos   EQU -2
  left   EQU 8
  top    EQU 10
  right  EQU 4
  bottom EQU 6          
  ; 2 would be the return address
  ; 0 would be the saved value of U
  
  ; draw a 10x6 box at (0,2)
    ldx #0            ; left:=0  
    ldy #2            ; top:=2
    pshs x,y
    ldx #10           ; right:=10 
    ldy #8            ; bottom:=8 
    pshs x,y         
    lbsr drawBox
    leas 8,s          ; remove 4 16-bit parameters 
    rts
  drawBox:  
    pshs u            ; save value of U
    leau ,s           ; U is an address between the locals and the parameters
    leas -4,s         ; reserve space for 2 16-bit integers on stack
    ldd left,u        ; U+8
    std xPos,u        ; U-4 
    ldd top,u         
    std yPos,u
  drawBoxLoopY:  
    ...
    ldd yPos,u
    cmpd bottom,u
    blt drawBoxLoopY
    leas 4,s         ; release reserved space
    puls u,pc        ; restore U and return

==== Can I do recursion? ====

Yes, there's no reason why an assembly language subroutine shouldn't call itself. But the same caveats apply as in any other language. You must use local variables for instance, and not alter global memory in unexpected ways.

Beware of very deep recursion; a pixel flood-fill routine might call itself thousands of times, overwriting more than the available RAM with the system stack.
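
A small sketch of a recursive routine, summing the numbers from 1 to B. The routine is invented for illustration (a loop would do the job better); it shows the local variable living on the stack, and each level of recursion costing three bytes of stack: one for the local and two for the return address.

  sumTo:
  ; hypothetical: return D = 1+2+...+B (B unsigned, 0 gives 0)
    tstb
    bne sumStep
    clra               ; base case: B is already 0, so D:=0
    rts
  sumStep:
    pshs b             ; local copy of B for this level
    decb
    bsr sumTo          ; D:=sum of 1 to B-1
    addb ,s+           ; add the saved B to the low byte, removing the local
    adca #0            ; carry into the high byte
    rts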

==== What is position independence and when do I use it? ====

Ideally a machine language program can be loaded at any address in memory and still work correctly; the only thing that has to change is the program start address. This is Position Independent Code (PIC). It is more important for smaller routines as several may be loaded at the same time. 

What prevents position independence? Addresses being hard-coded rather than relative. Direct jumps to subroutines are a common example. All jumps and branches (within the program, not eg. external ROM routines) can be as easily coded using relative offsets; instead of JMP and JSR use BRA/LBRA and BSR/LBSR.

The same applies to data references. Variables stored on the stack are fine. As are pointers returned by the operating system. Other structures need to be referenced using PC relative addressing. 

The PC (Program Counter) always contains the address of the next instruction. The same mechanism used for branching can be used for data addressing. Giving a relative offset from the PC means the actual address at which the program is loaded is irrelevant. We rarely give the actual offset, just as we don't give the number of bytes to skip in a relative branch. Instead we specify the target address, and indicate this by following it with ",PCR".

  frameCount rmb 1
  frameList rmb 20
    lda frameCount         ; direct extended addressing
    lda frameCount,pcr     ; same address calculated via PC relative
                           ; the assembler calculates the offset
                           ; NOT the same as PC+frameCount
    inca
    ldb -27,pc             ; PC-27 - this form is rarely useful
    ...
    ldx #frameList         ; initialise index using immediate mode
    leax frameList,pcr     ; X has the same value as above
    ldy ,x++               ; and the routine can proceed just as before

Not all addresses should be accessed via PCR; only the ones that move with the program. This means NOT I/O ports, screen buffers, ROM routines etc.

There are downsides to PCR, mainly that it is less efficient, taking more bytes and machine cycles to do the same task. Sometimes it can be more complex to program:
  
  bufSize EQU 100
  buffer rmb bufSize     
  bufEnd:
  
  bufClear:
    leax buffer,pcr
  bufCloop:  
    clr ,x+
    cmpx #???            ; er, how to compare X with end of buffer space?
    blo bufCloop
    
  ; one way round it: stack the end address and compare against that
  bufClear:  
    leax bufEnd,pcr      ;
    pshs x               ; stack address of end of buffer space
    leax buffer,pcr
  bufCloop:  
    clr ,x+
    cmpx ,s              ; compare X with end of buffer space
    blo bufCloop
    leas 2,s             ; tidy stack

An alternative to PCR addressing is direct paged addressing; copying PC to DP takes just a few steps. This is both time and space efficient, but limits the program to start on 256 byte pages. It's a good solution for large programs such as arcade games.

  org $4000
  start:
    lbsr main         ; stack PC
  counter rmb 1
  variables rmb 100  
  main: 
    puls d            ; retrieve old PC (points to 'counter')
    tfr a,dp          ; high byte of PC goes to DP
    setdp $40         ; directive to help the assembler
    clr counter       ; most assemblers will now use DP automatically

In general, you should always use position independent code for programs intended to have any sort of longevity. The exceptions are special cases such as embedded systems, ROM cartridges, and plain quick-and-dirty mash-ups.    
       
    
==== What are interrupts? ====

An interrupt is a physical signal that can occur at any time, triggering the CPU into stacking its registers, executing a subroutine, then retrieving its registers to continue as normal. This is a way of providing a software response to a hardware event. It may be as simple as updating a counter in memory in sync with an external timer.

The details of interrupts are intricate and a relatively advanced topic, but most home micros have a simple use for them: vertical blanking. On an old-fashioned CRT, every time the display of a graphics frame is complete the circuitry issues an interrupt. Syncing with this allows for smooth animation in games.

  gameLoop:
    sync                  ; wait for interrupt
    lbsr eraseSprite
    lbsr moveSprite
    lbsr drawSprite       ; most of the time is spent with the sprite drawn, not erased
    bra gameLoop

The 6809 has four hardware interrupts (IRQ, FIRQ, NMI, and RESET), and three software interrupts (SWI, SWI2, SWI3), each with its own vector pointing to a service routine. Software interrupts are machine language instructions rather than hardware events, and typically are used for debugging breakpoints or calling operating system routines.

An interrupt service routine must end with the instruction RTI (Return from Interrupt) so that the registers are unstacked correctly. This first pulls the CC and checks the E (Entire) bit. If set it then pulls all the registers from the stack, if clear it just pulls the PC. The fast interrupt (FIRQ) doesn't stack the working registers, and the E bit is the way of flagging the difference.

Two other CC bits are involved with interrupts. Setting I disables (or masks) the IRQ response, and setting F disables the FIRQ response.
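
A sketch of both ideas: holding off interrupts around a critical piece of code, and a minimal service routine. The names frameIrq and frameCount are assumptions, and a real routine would also have to be installed at the right vector and acknowledge the interrupt source, which depends on the machine.

    pshs cc         ; remember the current mask bits
    orcc #$50       ; set F ($40) and I ($10): FIRQ and IRQ held off
    ...             ; code that must not be interrupted
    puls cc         ; put the masks back as they were
  
  frameIrq:
  ; hypothetical IRQ service routine: count frames
    inc frameCount
    rti             ; unstack the registers and resume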

==== What is reentrant code and how do I write it? ====

A reentrant machine language program can be very roughly treated; you can stop it (saving its registers), restart it from the beginning, stop it again, and still have it continue correctly where it left off (restoring its registers of course). This is exactly what is needed for a multi-tasking operating system: the same code used by multiple processes.

The concept is related to interrupts and recursion. Whereas a recursive routine secures its variables so it's safe to call itself from just one point, a reentrant routine must try harder. Basically, it must allocate itself a new set of variables whenever it runs.

There's no reason why a short routine shouldn't be reentrant. Simply allocate variables on the stack and never use globals. And avoid eccentricities like self-modifying code. For a larger program things become complex. Fresh memory might be allocated from a heap by the operating system for example. It's an advanced topic, but still doable on the 6809. 
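
A deliberately tiny sketch of the difference. Both routines exchange A and B (EXG A,B would do it in one instruction; the point is only where the temporary value lives). The names are invented.

  tmp rmb 1
  swapBad:
  ; exchange A and B using a global: NOT reentrant
    sta tmp
    tfr b,a
    ldb tmp         ; wrong if another caller has used tmp in the meantime
    rts
  
  swapGood:
  ; the same, with the temporary on the stack: reentrant
    pshs a
    tfr b,a
    puls b
    rts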

==== What is indirection and how do I use it? ====

The standard way of accessing a variable in memory is via **direct extended** addressing. A 16-bit address is given, and an 8 or 16 bit operand is read from this 'effective address (EA)'. Indirect addressing adds an extra step: the 16-bits at the initial EA are read and used as the true EA of the operand. Put square brackets around the memory expression to use indirection (some assemblers allow the more usual curved brackets).

  CURSPOS rmb 2
  
    ldx #$0400    ; initialise cursor position
    ldd #$4849    ; two character message
    std ,x        ; print it 
    stx CURSPOS   ; save cursor position
  
    ldb [CURSPOS] ; what character is at the cursor position?
    rts

As well as direct extended mode, indirection can be used on PC relative and indexed addressing modes too. But note that because we are always specifying a 16-bit address, auto-increment/decrement by just one (rather than two) cannot be used.

Another example: suppose we are using strings with a data structure of a length byte followed by the ASCII codes of <length> characters. We can now iterate through an array of the addresses of such strings to access the length values of each one in turn, for example to find the longest.

  strCount EQU 3    ; 3 strings
  strTable fdb str1, str2, str3
  
  start:   
    leax strTable,pcr   ; point to table
    ldb #strCount
    bsr longest         ; find length of longest string 
    rts
    
  longest:
    pshs b,x
    clr ,-s             ; reserve and zero value of longest string so far
  loop:
    lda [,x++]          ; read EA from X, then inc X by 2, then load A from EA
    cmpa ,s             ; is A higher than longest?
    bls short           ; skip if not
    sta ,s              ; up-date longest
  short:        
    decb                ; for B strings
    bne loop
    puls a,b,x,pc       ; return longest value in A   
  
  str1 fcb 11,"Hello World"
  str2 fcb 13,"Hows it going"
  str3 fcb 12,"Wotcha folks"  

How useful is this? Not very. We can save the use of an index register, but it's still slow, and very limited in terms of what we can do with a single address. Indirection seems powerful but has fallen out of fashion; it's a hangover from the early days of micros. The 6502 for example had no 16-bit index registers, instead relying on 16-bit addresses stored in zero page (8-bit addressable) memory.

Where indirection is truly useful is with jump tables. For example we might have a library of subroutines for system basics such as input/output and memory allocation. We can compile a list of addresses of these routines into a table. Our program can now access library routines only through the table. This makes things more flexible as the routines in the library can be changed as long as the table is updated in accordance; our program need not change at all.

  sprLibraryTable fdb sprInit, sprErase, sprDraw, sprMove, sprKill
  
  ; call third routine (+0, +2, +4 bytes from start) in table
    jsr [sprLibraryTable+4]     ; draw a sprite
  
    ldb #2                      ; use an index (0, 1, 2) into the table
    leay sprLibraryTable,pcr  
    aslb                        ; double B to 4 
    jsr [b,y]                   ; draw a sprite
    
  sprInit rts
  sprErase rts
  sprDraw rts
  sprMove rts
  sprKill rts 

This can be turned into a powerful mechanism.    
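
As a first step, a sketch of a dispatcher that checks the index before jumping. The routine name sprCall is invented; note that it alters B and Y.

  sprCall:
  ; hypothetical: call routine number B from sprLibraryTable
    cmpb #5         ; five entries, numbered 0 to 4
    bhs sprCallOut  ; out of range: do nothing
    leay sprLibraryTable,pcr
    aslb            ; two bytes per entry
    jmp [b,y]       ; the routine's own RTS returns to our caller
  sprCallOut:
    rts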
 
==== How do I use virtual methods to create polymorphic objects? ====

Importing high-level concepts into assembly language is something one can easily overdo. But there's no reason why we can't cherry pick some nice features.

Suppose we have a sprite library that deals with simple movement and animation, and want to expand it to include features such as collision detection, keeping in bounds, and more. To our sprite object we can add a field sprMethods to point to a table of subroutines. Thus every sprite instance can have its own dynamic type specification.

We can create a hierarchical structure where one class inherits its methods from its parent. So for a game we might start with:

Basic:    A movable image with just a single frame that interacts with nothing.\\
--Animated: As Basic, but with a framelist displaying animated images\\
----Morpher:  As Animated, but morphing when a counter runs down, to turn into a bonus drop for example

  ; Structure offsets
  
  sprStatus EQU 0
  sprMask EQU 1
  sprXPos EQU 2
  sprYPos EQU 4
  sprGraphic EQU 6
  sprMethods EQU 8
  sprCounter EQU 10  
  sprMorphTo EQU 12
  
  ; Subroutine library
  
  sprInitBasic rts
  sprEraseBasic rts
  sprDrawBasic rts
  sprMoveBasic rts
  sprDoneBasic rts
  sprDrawAnimated rts
  sprDoneMorpher rts
  
  ; Class definitions:  Basic - - > Animated - - > Morpher
   
  sprMethodsBasic     fdb sprInitBasic, sprEraseBasic, sprDrawBasic, sprMoveBasic, sprDoneBasic
  sprMethodsAnimated  fdb sprInitBasic, sprEraseBasic, sprDrawAnimated, sprMoveBasic, sprDoneBasic
  sprMethodsMorpher   fdb sprInitBasic, sprEraseBasic, sprDrawAnimated, sprMoveBasic, sprDoneMorpher
  
  ; Instance templates
  
  ; field order follows the structure offsets: status/mask, xPos, yPos, graphic, methods, counter
  cloud fdb 0,0,0,bmCloud,sprMethodsBasic,0
  explosion fdb 0,0,0,flExplosion,sprMethodsAnimated,60
  bonus fdb 0,0,0,bmBonus,sprMethodsBasic,0
  bonusDrop fdb 0,0,0,flExplosion,sprMethodsMorpher,30,bonus
  
Note that all the organisational work is down to the programmer; the assembler won't give any help keeping the structures in order. 
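
Calling a method then takes two steps: fetch the instance's table address, and jump indirectly through the wanted slot. A sketch, assuming U points to a sprite instance; the slot names (mInit and so on) are invented, and simply number the table entries in the order used above.

  mInit  EQU 0      ; hypothetical names for the method slots
  mErase EQU 2
  mDraw  EQU 4
  mMove  EQU 6
  mDone  EQU 8
  
  ; U points to a sprite instance
    ldx sprMethods,u   ; X:=address of this sprite's method table
    jsr [mDraw,x]      ; call whichever Draw routine its class supplies

The calling code is the same for every sprite; a cloud gets sprDrawBasic and an explosion gets sprDrawAnimated.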

==== How do negative numbers work? ====

Think of an old-fashioned clock face as an analogy with an 8-bit register. An hour has 60 minutes, marked as 0 to 59. The minute hand goes only one way, and when it comes round to zero the hour hand advances.

An 8-bit register has 256 values, marked as 0 to 255, and when it overflows round to zero the carry flag is set.   

How do negative numbers work on a clock? It's entirely down to how we choose to read them. We can say "to" the hour or "past" the hour. The mechanism of the clock doesn't have to change at all. Ten to eight in the morning is 07:50. So "minus ten" in clockface terms is calculated by "60-10". To turn the clock back another 10 minutes, we can instead turn it forward 50 to get the correct answer of 20 to the hour.

In binary the principle works just the same, only (for an 8-bit value) we subtract from 256 instead of 60, ie. "minus ten" equals "256-10", being 246, or in hex $f6. Then if we see 246 and want to see the "minus" variant, we do the same, subtract it from 256.

Which signed numbers are positive and which negative? Highest bit set means negative, so unsigned values 128 to 255 represent -128 to -1. Positive signed values range from 0 to 127. This is called "two's-complement" format, because to negate we can flip (or COMplement) all the bits and add 1 to the result.

This format is chosen because the underlying circuitry doesn't need to change; it's simply down to the programmer to keep track of which variables are seen as signed and unsigned. There are two flags in the condition codes to help with signed numbers: the N flag is set whenever bit 7 of a result is set, and V is set when a signed result overflows. An overflow is an unintended sign change; to use our clock analogy it's when the minute hand passes through the half-hour without going through the hour first.

Which instructions deal with signed numbers? There's SEX (Sign EXtend) which changes a signed value in B to a signed value in D by simply filling A with 8 copies of bit 7 of B. To negate an 8-bit value use NEG. All the addition and subtraction instructions work with either interpretation, but note that MUL is always unsigned. The conditional branches (also having long variants) are: 

  Signed branches    Meaning             Unsigned equivalent
  
       BPL             PLus                    
       BMI             MInus
       BLT             Less-Than                BLO
       BLE             Less-than or Equal       BLS
       BGT             Greater-Than             BHI
       BGE             Greater-than or Equal    BHS
       
  Miscellaneous instructions
    
  NEG (8-bit only)     NEGate                   COM
  SEX                  Sign EXtend B to D
  COMA; COMB; ADDD #1  negate D

Note that other than the simple BPL/BMI, the signed variants are not commonly used. If you do choose one, think twice whether you meant the unsigned variant. Signed comparisons should not be used with addresses for example.
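
A few worked values, as a sketch; the branch targets isHigher and isGreater are hypothetical labels.

    lda #10
    nega            ; A=$F6: 246 unsigned, -10 signed
    adda #30        ; A=$14 (20), wrapping past zero sets Carry
    ldb #$F6
    sex             ; D=$FFF6: still -10, now as a 16-bit value
    ldb #100
    addb #100       ; B=$C8: 200 unsigned, but -56 signed, so V is set
  
    lda #$FF        ; 255 unsigned, -1 signed
    cmpa #1
    bhi isHigher    ; taken: 255 is higher than 1
    bgt isGreater   ; would not be taken: -1 is not greater than 1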

 
==== What do all the instructions do? ====

[Will get round to this one]