In this, the first part of several, I will talk about the creation of Realms of Antiquity for the TI-99/4a home computer. There will be technical, narrative, and designer content, with plenty of side-treks. Strap yourselves in!
I should add that these articles are likely to be very “crunchy” with technical detail. The intended audience would have some passing knowledge of assembly language as well as being CRPG enthusiasts. If you want more detail, program listings, algorithm definitions, by all means post and ask!
The best place to start? The platform it was built upon, and the infrastructure of how the code drives the program. And on that note, the number one best resource EVER is the TI Tech pages. They were an awesome and useful resource with my project!
A big reason that there was a distinct lack of good software for the TI until well after the home computer division was cancelled in 1983 is the processor that drives the TI-99/4a, the TMS9900 microprocessor, released in 1976.
It has occasionally been called the “first” 16-bit processor, but that claim is disputed by IBM and Motorola. It’s VERY different from the 6502, arguably the most popular and well-known microprocessor of its era, in a multitude of ways:
- 3 MHz clock speed, around 3 times the speed of contemporary processors
- 16-bit instead of 8-bit
- Can do register-to-memory, memory-to-register, or even memory-to-memory operations
- Big-Endian (high byte first, then low byte, going left to right)
- Hardware unsigned multiplication and division operators
- No native stack implementation
- No memory page addressing; no “zero page” concept
- 15-bit address bus, so it accesses 32,768 “words” of 16-bit size
- Byte operations are handled via special op codes
- 16 general-purpose CPU registers available
  - Instead of being hardware registers, they are relocatable anywhere in CPU memory using a workspace pointer
  - You can have as many register sets as you want; this effectively replaces a “stack” concept, as you can use registers as a means to pass values
- Only a few registers have special uses
  - R11 is always the return address for a branch and link
  - R12 is used by the communications register unit (CRU) for special purposes; the SAMS card needs this for page swaps
  - R13-15 are used for context switches. They store the workspace pointer, return address, and status register (in that order) from the prior context
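To make the workspace idea concrete, here is a minimal Python simulation (not TMS9900 code) of registers living in ordinary memory at the address held in the workspace pointer. The `Cpu` class and its method names are invented for illustration. The `blwp` and `rtwp` methods mimic the context switch described in the list above, and as a simplification the new workspace and program counter are passed in directly, where the real instruction reads them from a two-word vector in memory.

```python
class Cpu:
    """Toy model of TMS9900 workspace registers (illustrative names only)."""

    def __init__(self):
        self.mem = bytearray(0x10000)  # 64K of byte-addressable memory
        self.wp = 0x8300               # workspace pointer, here in the scratchpad
        self.pc = 0
        self.st = 0

    def read_word(self, addr):
        # Big-endian: high byte at the even address, low byte right after it
        addr &= 0xFFFE
        return (self.mem[addr] << 8) | self.mem[addr + 1]

    def write_word(self, addr, value):
        addr &= 0xFFFE
        self.mem[addr] = (value >> 8) & 0xFF
        self.mem[addr + 1] = value & 0xFF

    def get_reg(self, n):
        # Register n is simply the word at WP + 2*n
        return self.read_word(self.wp + 2 * n)

    def set_reg(self, n, value):
        self.write_word(self.wp + 2 * n, value)

    def blwp(self, new_wp, new_pc):
        # Context switch: save the old WP, PC and ST in the NEW workspace's R13-R15
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp = new_wp
        self.set_reg(13, old_wp)
        self.set_reg(14, old_pc)
        self.set_reg(15, old_st)
        self.pc = new_pc

    def rtwp(self):
        # Read all three saved values before changing WP, since they live at the current WP
        old_wp, old_pc, old_st = self.get_reg(13), self.get_reg(14), self.get_reg(15)
        self.wp, self.pc, self.st = old_wp, old_pc, old_st


cpu = Cpu()
cpu.set_reg(0, 0x1234)      # R0 in the first workspace
cpu.pc = 0xA000
cpu.blwp(0x8320, 0xB000)    # switch to a second register set
cpu.set_reg(0, 0xBEEF)      # a different R0; the caller's R0 is untouched
cpu.rtwp()                  # back to the original workspace and program counter
```

After the `rtwp`, the first workspace's R0 still holds its original value, which is the sense in which multiple register sets can stand in for a stack.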
Because opcodes are 16-bit, TMS9900 assembly uses more memory than a 6502 line-by-line, as instructions can run anywhere from 1-3 words (2-6 bytes). But you save memory because you can do in one instruction what takes several on other processors.
To use a car analogy, if a standard 8086 processor is your typical car, the TMS9900 is a Cadillac. Big luxurious driving, but kind of expensive and fuel-consumptive. 🙂
The TI-99/4a has a 16-bit addressing range, for 64k total.
Unlike other architectures, none of this addressing space is used to map video; the VDP chip has its own 16K of dedicated RAM, which is accessed through memory-mapped ports. These ports only allow you to read/write bytes, not words. This bottleneck is the cause of much anguish and annoyance on the part of TI programmers. It can and HAS been overcome; the single most impressive work in this area, in my opinion, is Mike Brent’s “Dragon’s Lair” for the TI-99/4a, which runs on the base console and renders all the original videos in a fairly decent rendition on TI’s bitmap mode.
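As a rough picture of why this is a bottleneck, the following Python sketch models the VDP as 16K of private RAM behind byte-wide ports. It assumes the usual TMS9918A convention of sending the address as two bytes, low byte first, with the 0x40 bit of the second byte marking a write. The class and function names are invented for illustration, and details such as the real chip's read-ahead buffer and register writes are left out.

```python
class Vdp:
    """Toy model of the VDP's 16K RAM behind byte-wide ports (simplified)."""

    def __init__(self):
        self.vram = bytearray(0x4000)  # 16K of dedicated video RAM
        self.addr = 0                  # internal auto-incrementing address
        self.latch = None              # first byte of a two-byte address write

    def write_address_port(self, byte):
        # The address arrives as two bytes: low byte first, then high byte
        if self.latch is None:
            self.latch = byte & 0xFF
        else:
            # The top two bits of the second byte are command flags (0x40 = write),
            # so only its low 6 bits contribute to the 14-bit address
            self.addr = ((byte & 0x3F) << 8) | self.latch
            self.latch = None

    def write_data_port(self, byte):
        self.vram[self.addr] = byte & 0xFF
        self.addr = (self.addr + 1) & 0x3FFF  # auto-increment after each byte

    def read_data_port(self):
        value = self.vram[self.addr]
        self.addr = (self.addr + 1) & 0x3FFF
        return value


def vdp_write_block(vdp, vram_addr, data):
    """Set up a write address, then push the data one byte at a time."""
    vdp.write_address_port(vram_addr & 0xFF)
    vdp.write_address_port(((vram_addr >> 8) & 0x3F) | 0x40)
    for b in data:
        vdp.write_data_port(b)


def vdp_read_block(vdp, vram_addr, count):
    """Set up a read address, then pull bytes back one at a time."""
    vdp.write_address_port(vram_addr & 0xFF)
    vdp.write_address_port((vram_addr >> 8) & 0x3F)
    return bytes(vdp.read_data_port() for _ in range(count))


vdp = Vdp()
vdp_write_block(vdp, 0x1000, b"HELLO")
readback = vdp_read_block(vdp, 0x1000, 5)
```

Every byte that reaches the screen goes through one of these port accesses, so a full-screen update means thousands of individual byte transfers rather than a block copy within CPU memory.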
The base console only has 256 bytes (!) of CPU RAM, nicknamed the “scratchpad”, which is fast 16-bit memory. Most of the time, it’s best to locate your register set here. Some values in the space are used by internal processes, but most of it is available for your use. TI’s Parsec cartridge (which features horizontal pixel bitmap scrolling) had to locate its scrolling routine in the scratchpad for maximum speed.
If you have the 32K memory expansion, you get two large blocks of CPU RAM, a lower 8K block and an upper 24K block. This RAM is accessed through a slower 8-bit multiplexer, which adds wait states on every access, so many 99’ers try to move time-critical code into the scratchpad for best performance. My personal experience has been that the slower speed is not really an impediment unless you’re doing something really over the top.
So where, you ask, are memory pages, like Apple and Commodore have? Well, there aren’t any in the base TI architecture. The only page switches occur with some cartridges, which have their own 8K space. That’s where the SAMS card comes in.
Reverse-engineered from the never-released TI-99/8 architecture, the Super Advanced Memory System (SAMS) card allows you to swap out 4K pages anywhere in the addressing space that RAM exists, just using some simple instructions to configure it. These pages don’t even need to be unique; you technically could assign the same page twice in two different places. The base SAMS card gives the TI 1MB of memory, or 256 pages, which is a bounty of space to play in!
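The paging scheme can be sketched as a small Python model. This is a simulation under stated assumptions, not the card's actual register interface: the 64K address space is treated as sixteen 4K windows, only the windows that hold expansion RAM can be remapped, and each window starts out showing the page with the same number. `SamsModel` and its method names are made up for the example.

```python
PAGE_SIZE = 0x1000                             # 4K pages
RAM_WINDOWS = {2, 3, 10, 11, 12, 13, 14, 15}   # >2000->3FFF and >A000->FFFF


class SamsModel:
    """Toy model of a 1MB paged memory card (names are illustrative)."""

    def __init__(self, total_pages=256):
        self.total_pages = total_pages
        self.ram = bytearray(total_pages * PAGE_SIZE)
        # Assumed default mapping: window n shows page n
        self.window_page = {w: w for w in RAM_WINDOWS}

    def map_page(self, window, page):
        if window not in RAM_WINDOWS:
            raise ValueError("only RAM windows can be remapped")
        if not 0 <= page < self.total_pages:
            raise ValueError("page out of range")
        self.window_page[window] = page

    def physical(self, addr):
        # The top 4 bits of the address pick the window, the low 12 bits the offset
        window, offset = addr >> 12, addr & 0x0FFF
        return self.window_page[window] * PAGE_SIZE + offset

    def read(self, addr):
        return self.ram[self.physical(addr)]

    def write(self, addr, value):
        self.ram[self.physical(addr)] = value & 0xFF


sams = SamsModel()
sams.map_page(2, 0x40)     # page >40 appears at >2000
sams.map_page(10, 0x40)    # the same page also appears at >A000
sams.write(0x2005, 0x99)   # a write through one window...
```

Because both windows point at the same page, the byte written at >2005 is also visible at >A005, which is the "same page twice" case mentioned above.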
But how to write code for such a system? Well, that’s the tricky bit…
One thing to call out is that Realms of Antiquity is written in 100% assembly language. It’s reasonable to ask why, when high-level languages could be utilized to simplify maintenance and understanding.
Well, for one, because I wanted to. 🙂
Second, if I were writing a game for modern computers directly, I would not hesitate to use a high-level language. Besides being easier to manage, the most important thing about them is that they can be compiled for different architectures. If I wrote a game in Java, I know it would run on a PC, Mac, or even Linux without any problems, regardless of what kind of chipset or hardware is present.
But for a classic retro computer? You know the hardware and how it works and how to optimize for it. If you need speed and performance, assembly is the way to go. Any high-level language may apply a software pattern that works but could have been implemented with less memory or better efficiency.
The TI-99/4a differs from a lot of other microcomputers of the era in that assembly language isn’t readily accessible with the base console. TI BASIC completely blocks access to it; TI Extended BASIC can load and run assembly programs, but no assembler is provided.
Most 99’ers use the Editor/Assembler to do their work. It was the big package deal, requiring the full system (disk drives, 32K expansion) to use. It had both a text editor and assembler, and two disks of utilities. They even threw in the complete source code for one of their games, Tombstone City.
Now on the TI, there are two kinds of assembly binaries:
- Tagged-object code
  - Can be loaded anywhere in memory
  - Can co-exist with other object code and be run independently by name
  - Can refer to each other using assembly directives
- Fixed-binary code
  - Only loads to specific memory locations
  - Stored as “memory images” with a maximum size of 8K per image minus six bytes for a header value
  - Loaded as a chain of files to fill up the entire memory space
  - Usually called “EA5” format as they were loaded using the Editor/Assembler cartridge’s option #5 “Load Program File”
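For readers curious about that six-byte header, here is a hedged Python sketch of reading one. It assumes the commonly documented layout of three big-endian words: a flag (>FFFF when another file follows in the chain), a length, and a load address. Whether the length word counts the header itself is not settled here, so check the TI Tech pages before relying on it. The function name and sample data are invented for the example.

```python
import struct

EA5_HEADER_SIZE = 6
EA5_MAX_FILE_SIZE = 0x2000  # 8K per image, header included


def parse_ea5_header(image):
    """Split a memory-image file into its (assumed) header fields and payload."""
    more_flag, length, load_addr = struct.unpack(">HHH", image[:EA5_HEADER_SIZE])
    return {
        "more_files": more_flag == 0xFFFF,  # assumed: >FFFF means the chain continues
        "length": length,
        "load_address": load_addr,
        "payload": image[EA5_HEADER_SIZE:],
    }


# A made-up 16-byte image destined for the start of the upper 24K block
payload = bytes(range(16))
sample = struct.pack(">HHH", 0xFFFF, len(payload), 0xA000) + payload
info = parse_ea5_header(sample)

# Code space left in each file once the header is accounted for
max_payload = EA5_MAX_FILE_SIZE - EA5_HEADER_SIZE
```

The last line is just the "8K minus six bytes" figure from the list: 8,186 bytes of program per file.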
Most 99’ers write assembly programs as tagged-object code to start with, then convert them to fixed-binary files using a utility. Programs load much faster this way, as they are read in 8K segments directly into memory.
As for how to load a SAMS program, which occupies more space than the 32K RAM? We’ll get to that in a bit. First, we go into…
So in the summer of 2017, I made the decision to convert Realms of Antiquity to use the SAMS memory card. As part of this, I had to figure out HOW to use it effectively.
The only assembler ever written for the AMS was an extension of a popular macro assembler called “Ragtime”, written by Art Green. I’ll give him credit; he did create an entire assembler/linker/loader platform which could utilize the card. But I had already been using a cross-assembler on the PC for speed and efficiency so I didn’t really want to try and compile everything on the TI in emulation.
So instead I read the documentation on how it built modules that were linked to each other and I figured out the pattern.
The first thing to do with any modular program is identify the “root” functions that are absolutely needed everywhere. These form the basis of your “root” module, which is always present in memory and is accessed by everything. Then, figure out how many other modules you need. Ideally, the root module and any one other module should always fit into the existing address space together. (On the TI that is 32K, split into the 8K and 24K blocks.)
For Realms of Antiquity, I wanted the 8K block for data pages only, utilized by the modules for various functions. So I split the upper 24K into two 12K areas: one for the root module, and one shared by (initially) four other modules:
- Start Module (Contains the title screen, character creation, music player and data, and end game sequence)
- Travel Module (Contains the code for travel mode, includes map loading and mob interactions)
- Manager Module (Contains the code for inventory, stat screens, and complex transaction management)
- Combat Module (Contains the code for combat mode)
I later added more modules and sub-modules:
- Encounter Module (Contains the code to generate battlemaps, as well as end battle management such as chests, traps, rewards, etc.)
- FX/Scan Module (Contains all the code to create FX for combat, sprite based effects, as well as the code to create the monster stat screen. Only swaps out the last 4K page of the Combat or Encounter module.)
- AI Module (Contains all the code to determine monster actions. Only swaps out the last 4K page of the Combat module.)
The sub-modules occurred as modules got full and I didn’t want to try and create a whole new module. This made me realize after the fact that I could have done a better job splitting up functionality and making smaller modules instead of larger monolithic modules. A good lesson for future projects!
So how to compile it? I just created several batch files to run my cross-assembler against a combination of the root module files and each targeted module’s files, effectively compiling them as separate binary files. I then created some utility programs that copy the binary code into the program binary file at specific locations for each module.
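The copy step might look something like the following Python sketch. It is not the actual utility: the function name, the assumption that the program binary begins with the contents of page 10, and the page numbers in the example are all hypothetical, chosen only to show page-aligned placement.

```python
PAGE_SIZE = 0x1000
FIRST_FILE_PAGE = 10   # assumption: the file starts with what will become page 10


def place_module(image, binary, first_page):
    """Copy an assembled module into the image at its page-aligned offset."""
    offset = (first_page - FIRST_FILE_PAGE) * PAGE_SIZE
    end = offset + len(binary)
    if len(image) < end:
        image.extend(bytes(end - len(image)))   # pad the file with zeros as needed
    image[offset:end] = binary
    return offset


image = bytearray()
root_offset = place_module(image, b"\x01" * 100, 10)    # stand-in for the root module
module_offset = place_module(image, b"\x02" * 50, 13)   # stand-in for another module
```

Keeping every module on a 4K boundary in the file means a loader can treat the file as a plain sequence of pages.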
Here’s a picture of my file and memory map:
The greyed out areas are pages that are technically assigned to ROM addressing space at start-up time. That is the default mode for SAMS; pages 0-15 are just assigned consecutively. So that means after the program has started I can freely use those pages for data.
Pages 2-3 are the 8K lower memory space which means they are switched as needed for different functions. Pages 10-12 are always the root module. Pages 13-15 are the alternate modules. Everything after those is raw data used by the game, up to page 53. I use page 64 onwards for storing saved game data in memory while you play.
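Since each page is 4K and the default mapping is consecutive, the page numbers above translate directly into address ranges. The short Python sketch below does that arithmetic; the dictionary labels and function names are mine, and the page ranges are the ones quoted in the paragraph.

```python
PAGE_SIZE = 0x1000  # 4K


def region_range(first_page, last_page):
    """Address range covered by consecutive pages under the default mapping."""
    return first_page * PAGE_SIZE, (last_page + 1) * PAGE_SIZE - 1


LAYOUT = {
    "lower 8K data windows": (2, 3),
    "root module": (10, 12),
    "alternate module": (13, 15),
}

ranges = {name: region_range(*pages) for name, pages in LAYOUT.items()}

# Pages 10 through 53 inclusive hold the program and its data
program_pages = 53 - 10 + 1
program_kbytes = program_pages * PAGE_SIZE // 1024

for name, (start, end) in ranges.items():
    print(f"{name}: >{start:04X}->{end:04X}")
```

This puts the root module at >A000 through >CFFF and the alternate module at >D000 through >FFFF, and gives 44 pages (176K) for pages 10 through 53.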
So how to get this into the SAMS card? The E/A loader certainly can’t do this. Time for a custom loader…
The first issue to deal with is getting a loader in the right place. The default location for most assembly programs is the start of the upper 24K block. That’s not ideal here, though, because we want the root module there, and if we swap it out at any point in the loading process the loader code will be lost! So we make sure it’s located in the lower 8K RAM instead. This is achievable by using an assembler directive called AORG (Absolute Origin) to locate the program there.
The loader has to be self-contained, so it contains not just the loading code itself but subroutines for reading and writing to VDP. This is necessary beyond just updating the screen; the TI device service routines (DSRs), which the disk system utilizes, require you to use buffer space in the video memory. This curious design is likely because on a base TI console that was the only RAM any architect could rely upon being there for a buffer. Unfortunately, that means all data has to be copied from the VDP back into CPU memory.
The loader loads 8K chunks of data from the program binary at a time into the upper RAM, which is assigned to the requisite pages. It updates the page assignments on each pass, so each 4K block ends up in its correct page. With 44 pages of data, or 176K, it takes a bit! I originally designed my loader to read in 12K blocks from the program binary, but I found in practice this didn’t work, even when I was certain the VDP memory was freed up. I have noticed that TI was biased towards 8K blocks as a maximum size.
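The loop can be sketched in Python as a simulation. The real loader is assembly and goes through the disk DSR and VDP buffer, which are not modelled here. The function name, the choice of windows 14 and 15 as the load area, and the dictionary standing in for SAMS memory are all assumptions made for illustration.

```python
PAGE_SIZE = 0x1000
CHUNK_SIZE = 2 * PAGE_SIZE   # 8K per read


def load_program(binary, first_page, load_windows=(14, 15)):
    """Return ({page_number: bytes}, read_count) as a loader might leave memory."""
    sams_pages = {}
    window_map = {}
    reads = 0
    page = first_page
    for offset in range(0, len(binary), CHUNK_SIZE):
        chunk = binary[offset:offset + CHUNK_SIZE]   # one 8K read from the file
        reads += 1
        # Point the two load windows at the next two destination pages
        for i, window in enumerate(load_windows):
            window_map[window] = page + i
        # Copy each 4K half through its window into the page mapped there
        for i, window in enumerate(load_windows):
            half = chunk[i * PAGE_SIZE:(i + 1) * PAGE_SIZE]
            if half:
                sams_pages[window_map[window]] = bytes(half)
        page += len(load_windows)
    return sams_pages, reads


# A fake 44-page program binary in which every byte of a page holds its index
fake_binary = b"".join(bytes([i]) * PAGE_SIZE for i in range(44))
pages, read_count = load_program(fake_binary, first_page=10)
```

With 44 pages and two pages per pass, the loop runs 22 times and fills pages 10 through 53.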
For the cartridge ROM, I have a different approach. I use 8K ROM pages to store the program, using 2K of the space for the loader and 6K for program segments. This is necessary because you aren’t guaranteed what ROM page a cartridge starts on, so you have to replicate your root code in every page. It does a direct CPU to CPU memory copy from the ROM page into the upper 24K page space with page swaps in a similar fashion. This is one reason the cartridge is by far the fastest way to load the game, no VDP in the middle of the process!
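As back-of-the-envelope arithmetic only, the Python lines below estimate how many 8K ROM pages that layout needs. They assume the cartridge stores the same 176K (44 pages) as the disk version, which may not match the real cartridge image.

```python
PAGE_SIZE = 4 * 1024          # SAMS page
ROM_PAGE_SIZE = 8 * 1024      # cartridge ROM page
LOADER_SIZE = 2 * 1024        # loader copy repeated in every ROM page

payload_per_rom_page = ROM_PAGE_SIZE - LOADER_SIZE   # 6K of program per ROM page
program_size = 44 * PAGE_SIZE                        # assumed: same 176K as on disk

# Ceiling division without floating point
rom_pages_needed = -(-program_size // payload_per_rom_page)
rom_size_kbytes = rom_pages_needed * ROM_PAGE_SIZE // 1024
```

Under that assumption the program needs 30 ROM pages, or 240K of ROM, with a quarter of it spent on repeated loader code.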
And here ends Part 1. In Part 2, we will start looking at specific modules and routines and going into excruciating detail on them!