Meow in Motion: A Z80 Assembly Version of a Classic Cat Animation on the ZX81


Watch the Feline Frenzy as Assembly Meets BASIC in this Retro Remake!

After finishing the BASIC version of the cat animation, I wanted to improve the frame rate. This meant converting from BASIC to machine code. My first thought was to do the conversion with MCODER II. I dropped that idea, deciding to focus on a custom Z80 assembly version. The result, Fast Cat, is the cat written in assembly: a faster version and a bit more.

# Boy, that is a bit faster.

Let’s start with the obvious. Written in machine code, Fast Cat outperforms my BASIC version by a wide margin. While testing, the faster frame rate impressed me. It does seem to stutter a bit in the online emulator due to frame skipping. Yet, it should look fine on a real ZX81.

Now, I noticed that in the original version the cat had a gap in it on one of the frames. I corrected that in this version, partly to make this version unique, but also because I liked that frame of the cat. Fixing it made it look better.

Fast Cat (Cat ASM), ZX81 Screenshot - Gap Fixed, 2023 by Steven Reid

That said, I still don’t like the animation of the cat’s tail. I didn’t fix that, so you’ll find that the tail slaps down at the end of the last frame. In real life the motion would be much smoother. I should note that the animation isn’t bad, but it looks odd to me. If I were to make any changes, it would be to add a little mouse to chase.

# Building a new version

As I noted, this version does use the frames from my ZX81 BASIC program. I could have done a straight conversion to machine code, but where’s the fun in that? Instead, I actually reverted to the original BASIC code from the book.

In that version, the program stored each frame in a DATA statement. Taking that as inspiration, I loaded all the frames into a data section of my assembly code. To improve the memory use, I went a step further and implemented RLE compression. This isn’t the first time I’d used run-length encoding, which made the routine straightforward. Funnily enough, I spent most of my time entering the data for each frame by hand instead of writing a program to do it.

I could have skipped the compression. One advantage to that would have been a simpler loop routine. The disadvantage would be the wasted memory. Fast Cat is only six frames, making that less of an issue here. It sure would have been a lot easier to enter the data. I don’t like easy. Plus, I’d already made the decision to compress the frames.

# Let’s talk video.

To be honest, I was a bit worried the compression would slow things down. It didn’t. Reviewing the program made me think about a video player I saw years ago. That program displayed an Apple Shuffle commercial on a ZX81. As amazing as it was, when I saw that program I didn’t understand how it worked. After finishing Fast Cat, I understand it better.

As different as these programs are, I suspect they use similar techniques. Fast Cat is quite a bit simpler. Each character has only two states: an empty or a filled space. The video version uses the ZX81’s default graphics mode. That means it has to store a bit more information, especially given it has more frames to display. To compensate, I assume it used a better compression algorithm.

Fast Cat (Cat ASM), ZX81 Screenshot #2, 2023 by Steven Reid

Now that may seem slower, but an unrolled decompression routine can be quite fast. In researching different compression routines for my own games, I found quite a few designed for speed. Even unrolled, which takes more memory, many were still manageable. Knowing the basics of Fast Cat, it wouldn’t be too hard to build something similar.

# Delving into the innards.

The routine to decompress and print each cat frame is straightforward. For each frame, I set the DE register to the screen position. I debated using a double buffer, but it was fast enough without it. To be honest, I suspect that is why the frame can look off. This version works and keeps the code simple.

I then load the HL register with the frame location. The RLE routine always reads two bytes. The first byte holds the count, and the second holds the character. The animation only uses two characters: $00 for space and $80 for filled space.

cat_frames:
  ; frame 1
  db 32,$00,1,$76
  db 32,$00,1,$76
  db 32,$00,1,$76
  db 22,$00,7,$80,1,$00,1,$80,1,$00,1,$76
  db 11,$00,11,$80,7,$00,2,$80,1,$00,1,$76
  db 8,$00,3,$80,20,$00,1,$80,1,$76
  db 7,$00,1,$80,15,$00,1,$80,2,$00,3,$80,2,$00,1,$80,1,$76
  db 5,$00,2,$80,16,$00,1,$80,2,$00,1,$80,2,$00,2,$80,1,$00,1,$76
  db 1,$00,5,$80,8,$00,11,$80,1,$00,2,$80,4,$00,1,$76
  db 1,$80,3,$00,1,$80,4,$00,5,$80,11,$00,5,$80,2,$00,1,$76
  db 4,$00,5,$80,19,$00,1,$80,1,$00,1,$80,1,$00,1,$76
  db 3,$00,2,$80,24,$00,1,$80,1,$00,1,$80,1,$76
  db 2,$00,1,$80,1,$00,1,$80,27,$00,1,$76
  db 4,$00,2,$80,26,$00,1,$fe

To make things easy, I always use a full line of 32 characters. At the end of the line, I print the end-of-line character $76. This is a bit wasteful, but it makes the routine very manageable while still being fast. Each frame is 495 bytes expanded. The example frame is 180 bytes. That’s a decent reduction. As a plus, I don’t have to track frames or line positions.
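
Since I entered the frame data by hand, here is a rough sketch of what a helper to generate it might look like. It is written in Python rather than assembly, purely as an illustration. The names `encode_row`, `encode_frame` and `as_db`, and the use of `#` to mark a filled cell, are my own assumptions and not part of Fast Cat.

```python
# Character codes taken from the frame data above.
SPACE, BLOCK, NEWLINE, END_FRAME = 0x00, 0x80, 0x76, 0xFE
WIDTH = 32  # every row is a full 32-character line


def encode_row(row, last=False):
    """Encode one 32-character row as a flat list of (count, char) values.

    '#' marks a filled cell; any other character is an empty cell
    (a convention assumed for this sketch only).
    """
    if len(row) != WIDTH:
        raise ValueError("each row must be exactly 32 characters")
    pairs = []
    run_char, run_len = None, 0
    for ch in row:
        code = BLOCK if ch == "#" else SPACE
        if code == run_char:
            run_len += 1
        else:
            if run_len:
                pairs += [run_len, run_char]
            run_char, run_len = code, 1
    pairs += [run_len, run_char]
    # Close the row with an end-of-line pair, or end-of-frame on the last row.
    pairs += [1, END_FRAME if last else NEWLINE]
    return pairs


def encode_frame(rows):
    """Encode a list of rows; the final row ends with the $fe marker."""
    out = []
    for i, row in enumerate(rows):
        out += encode_row(row, last=(i == len(rows) - 1))
    return out


def as_db(pairs):
    """Format one row's pairs as an assembler db line."""
    items = zip(pairs[0::2], pairs[1::2])
    return "  db " + ",".join(f"{n},${c:02x}" for n, c in items)


if __name__ == "__main__":
    row = " " * 22 + "#" * 7 + " # "
    print(as_db(encode_row(row)))
```

Feeding it the fourth row of frame 1 should reproduce the matching db line from the listing, which is a handy way to check hand-entered data.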

With the above design, writing the routine to decompress and print the frame was simple. The routine shown below is ten instructions in 16 bytes. Not too shabby. The first section determines how many characters to print. The second section prints that many characters. It exits the loop when it reaches $FE, which I used to denote end of frame.

line_loop:
  ; read one RLE pair: count, then character
  ld b,(hl) ; loop size
  inc hl
  ld a,(hl) ; what to print
  inc hl
  cp $fe ; done with frame?
  jp nc,frame_done ; yes: A is $fe or above

  ; print the character B times
rle_loop:
  ld (de),a
  inc de
  djnz rle_loop

  jp line_loop ; fetch the next pair
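
For anyone who would rather trace the logic than the registers, the same loop can be modelled in Python. This is a sketch of the behaviour, not the code Fast Cat runs. The function name `decode_frame` is hypothetical, and the model assumes every count is between 1 and 255 (on the Z80, a count of 0 would make `djnz` loop 256 times).

```python
END_FRAME = 0xFE  # any character value of $fe or above ends the frame


def decode_frame(data, start=0):
    """Expand one RLE frame; return (screen_bytes, position_after_frame).

    Mirrors the assembly: `pos` plays the role of HL and `out` stands in
    for the screen memory that DE walks through.
    """
    out = bytearray()
    pos = start
    while True:
        count = data[pos]      # ld b,(hl)
        char = data[pos + 1]   # ld a,(hl)
        pos += 2
        if char >= END_FRAME:  # cp $fe / jp nc,frame_done
            return bytes(out), pos
        out.extend([char] * count)  # the rle_loop / djnz section


if __name__ == "__main__":
    # Two rows: a blank line, then the last row of frame 1 from the listing.
    frame = bytes([32, 0x00, 1, 0x76, 4, 0x00, 2, 0x80, 26, 0x00, 1, 0xFE])
    screen, next_pos = decode_frame(frame)
    print(len(screen), next_pos)
```

Returning the position after the frame means the next call can start where this one stopped, much as HL is left pointing at the following frame.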

Because each frame takes a variable amount of time to display, I didn’t want to add a delay loop. Instead, I grab the ZX81’s frame counter and save it at the start. Then, after the program prints the frame, I restore it and subtract 5 frames. I then loop as the ZX81 counts down each frame. When it reaches the new value, I print the next frame.
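
The arithmetic behind that wait can be sketched in a few lines of Python. This is a simplified model under assumptions: it treats the frame counter as a plain 16-bit value that counts down once per frame, and the names `wait_target` and `frames_remaining` are made up for the example.

```python
def wait_target(start, hold=5):
    """Counter value at which the next frame should be drawn.

    The counter counts down, so the target is `hold` frames below the
    value saved before drawing, wrapped to 16 bits.
    """
    return (start - hold) & 0xFFFF


def frames_remaining(now, target):
    """How many more counter ticks until a down-counter at `now` hits `target`."""
    return (now - target) & 0xFFFF


if __name__ == "__main__":
    start = 1000                 # counter saved before drawing
    target = wait_target(start)
    now = 998                    # drawing took two frames
    print(target, frames_remaining(now, target))
```

The appeal of this approach is that a frame that draws quickly waits longer and a slow one waits less, so each frame stays on screen for about the same time. Under this model, a draw that took longer than the hold period would already be past the target, which is worth keeping in mind if the hold is ever reduced.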

# Doing something more complex.

As straightforward as Fast Cat is, the bones are there for a more complex routine. With the Shuffle demo in my head, I should be able to do something similar. I had tested a more complex bitwise compression while writing Gem Quest. That routine was fast, although I later rewrote it to save space.

To make this more interesting, I would need to convert frames of video into stills. There are a couple of programs I could use to do that. Given the ZX81’s screen resolution and speed, a few frames per second would suffice. I’d also use a double buffer to smooth out each frame.

For this particular program, I could have eked out more space with some clever use of bit math. The line can only have 32 characters, which is 5 bits. That leaves 3 bits I could have used for the black, white, end of line and end of frame. That would have made the frame 90 bytes with very little impact on frame speed. A fun exercise, but not needed here.
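
To make the bit math concrete, here is one way such a packed byte might be laid out, again as a Python sketch rather than anything in Fast Cat. The layout is an assumption: the low five bits hold the count minus one (so runs of 1 to 32 fit), and the top three bits hold a kind code. The `KIND_*` names and values are hypothetical.

```python
# Hypothetical kind codes stored in the top three bits of each byte.
KIND_SPACE, KIND_BLOCK, KIND_NEWLINE, KIND_END = 0, 1, 2, 3


def pack(count, kind):
    """Pack a run of 1..32 characters and a kind code into one byte."""
    if not 1 <= count <= 32:
        raise ValueError("count must be 1..32")
    if not 0 <= kind <= 7:
        raise ValueError("kind must fit in 3 bits")
    return (kind << 5) | (count - 1)


def unpack(byte):
    """Return (count, kind) from a packed byte."""
    return (byte & 0x1F) + 1, byte >> 5


if __name__ == "__main__":
    # Fourth row of frame 1: six pairs become six bytes instead of twelve.
    row = [(22, KIND_SPACE), (7, KIND_BLOCK), (1, KIND_SPACE),
           (1, KIND_BLOCK), (1, KIND_SPACE), (1, KIND_NEWLINE)]
    packed = bytes(pack(n, k) for n, k in row)
    print(packed.hex())
```

Each count and character pair collapses into a single byte, which is where the halving of the frame size comes from. The decoder would pay for it with a mask and a few shifts per run.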

Fast Cat accomplished my goal: it is a faster version of the cat, written in assembly. It is also an easy-to-understand program that made me realize how I could do something more complex if I wanted to. Now that would make a great future project.


