Saturday, June 25, 2011

Putting the text in 'text adventures'

Decoding text so it can be displayed on-screen is something the Z-machine has to do a lot of. Let's see how it's done:

Text in a Z-machine story file is stored in an encoded format in which every two bytes hold three five-bit 'z-characters', each with a value from 0 to 31. These z-characters are then decoded into ZSCII, a character encoding similar to ASCII. The algorithm that converts z-characters to ZSCII differs somewhat between Z-machine versions 1, 2 and 3, which makes it awkward to implement cleanly with an inheritance model like the one I have chosen.

First we convert the encoded text into z-characters.


protected ImmutableStack<byte> EncodedTextToZCharacters(ref int address)
{
  ushort character;
  ImmutableStack<byte> characters = null;
  var storyLength = this.StoryLength;
  do
  {
    if (address >= storyLength)
    {
      this.FrontEnd.ErrorNotification(ErrorCondition.InvalidAddress, "Encoded text does not end before the end of memory.");
      break;
    }

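    // Each 16-bit word holds three five-bit z-characters, highest bits first.
    character = this.Memory.ReadWord(address);
    characters = characters.Add((byte)((character >> 10) & 31));
    characters = characters.Add((byte)((character >> 5) & 31));
    characters = characters.Add((byte)(character & 31));
    address += 2;
  }
  while ((character & 32768) == 0); // The top bit marks the last word of the text.

  // The z-characters were added last-first, so reverse them into reading order.
  return characters.Reverse();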
}
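
To see the packing in action, here is a minimal standalone sketch that unpacks z-characters from a plain byte array. The class name ZTextUnpackSketch and the use of List<byte> are my own stand-ins for illustration and are not part of the interpreter code above. It assumes words are stored big-endian, as the Z-machine standard specifies, and that the top bit of a word marks the last word of the string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the interpreter.
public static class ZTextUnpackSketch
{
    // Splits big-endian 16-bit words into five-bit z-characters, stopping
    // after the first word whose top bit (0x8000) is set or at end of memory.
    public static List<byte> Unpack(byte[] memory, int address)
    {
        var zcharacters = new List<byte>();
        while (address + 1 < memory.Length)
        {
            int word = (memory[address] << 8) | memory[address + 1];
            zcharacters.Add((byte)((word >> 10) & 31)); // bits 14-10
            zcharacters.Add((byte)((word >> 5) & 31));  // bits 9-5
            zcharacters.Add((byte)(word & 31));         // bits 4-0
            address += 2;
            if ((word & 0x8000) != 0)
            {
                break; // end-of-text marker
            }
        }

        return zcharacters;
    }
}
```

For example, the four bytes 0x35 0x51 0xC6 0x85 unpack to the z-characters 13, 10, 17, 17, 20, 5. The second word has its top bit set, so unpacking stops there.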


You may notice that three five-bit values per two bytes leave one bit unused. That top bit is set on the last word to mark the end of the text, as no other length indicator is given. Once we have our z-characters, we can convert them to ZSCII. The parts of the algorithm common to all versions live in the following method, which in turn calls a virtual method of the same name containing the version-specific differences.

protected ImmutableStack<Zscii> ZCharactersToZscii(bool calledRecursively, ImmutableStack<byte> zcharacters)
{
  byte lockedAlphabet = 0;
  byte nextAlphabet = 0;
  ImmutableStack<Zscii> zsciiText = null;
  while (zcharacters != null)
  {
    var currentAlphabet = nextAlphabet;
    nextAlphabet = lockedAlphabet;
    var zcharacter = zcharacters.Top;
    zcharacters = zcharacters.Tail;
    this.ZCharactersToZscii(calledRecursively, zcharacter, currentAlphabet, ref nextAlphabet, ref lockedAlphabet, ref zcharacters, ref zsciiText);
  }
  return zsciiText.Reverse();
}


This method loops through all the z-characters, building up the ZSCII text as it goes. Alphabet values range from 0 to 2 and affect the meaning of an individual z-character. Initially the alphabet is 0, but it can be shifted for a single character (changing nextAlphabet) or shift-locked (changing lockedAlphabet). The calledRecursively parameter is used in version 2 and above, as we'll see in a bit. Next we'll see the actual decoding as it is done in version 1.

protected virtual void ZCharactersToZscii(bool calledRecursively, byte zcharacter, byte currentAlphabet, ref byte nextAlphabet, ref byte lockedAlphabet, ref ImmutableStack<byte> zcharacters, ref ImmutableStack<Zscii> zsciiText)
{
  switch (zcharacter)
  {
    case 0:
      zsciiText = zsciiText.Add(Zscii.Space);
      break;
    case 1:
      zsciiText = zsciiText.Add(Zscii.NewLine);
      break;
    case 2:
    case 3:
      // Single shift: affects the next z-character only.
      nextAlphabet = (byte)((lockedAlphabet + zcharacter - 1) % 3);
      break;
    case 4:
    case 5:
      // Shift lock: affects all following z-characters.
      nextAlphabet = lockedAlphabet = (byte)((lockedAlphabet + zcharacter) % 3);
      break;
    default:
      if (zcharacter == 6 && currentAlphabet == 2)
      {
        // Escape sequence: the next two z-characters form the ZSCII value.
        if (zcharacters.Count() > 1)
        {
          zsciiText = zsciiText.Add((Zscii)((zcharacters.Top * 32) + zcharacters.Tail.Top));
          zcharacters = zcharacters.Tail.Tail;
        }
        else
        {
          // Truncated escape sequence: discard whatever is left.
          zcharacters = null;
        }

        break;
      }

      // Ordinary character: look it up in the current alphabet.
      zsciiText = zsciiText.Add(this.GetZsciiAlphabetCharacter((byte)((currentAlphabet * 26) + zcharacter - 6)));
      break;
  }
}
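
To make the version 1 rules concrete, here is a small standalone sketch of the same state machine over a plain list with an integer index. It is only an illustration under a few assumptions: the class name V1DecodeSketch is hypothetical, the alphabet tables are copied from my reading of the version 1 tables in the Z-machine standard (GetZsciiAlphabetCharacter itself isn't shown in this post, so check them before relying on them), and ZSCII values are cast straight to char, which is only valid for the printable ASCII range.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical standalone sketch; not the interpreter's actual code.
public static class V1DecodeSketch
{
    // Assumed version 1 alphabet tables. Index 0 of each row corresponds to
    // z-character 6. The leading space in row 2 is a placeholder for the
    // escape code and is never printed.
    private static readonly string[] Alphabets =
    {
        "abcdefghijklmnopqrstuvwxyz",
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
        " 0123456789.,!?_#'\"/\\<-:()"
    };

    // Decodes z-characters (each assumed to be in the range 0-31).
    public static string Decode(IList<byte> zcharacters)
    {
        var text = new StringBuilder();
        int locked = 0;
        int next = 0;
        for (int i = 0; i < zcharacters.Count; i++)
        {
            int current = next;
            next = locked;
            int z = zcharacters[i];
            if (z == 0)
            {
                text.Append(' ');
            }
            else if (z == 1)
            {
                text.Append('\n');
            }
            else if (z == 2 || z == 3)
            {
                next = (locked + z - 1) % 3;      // single shift
            }
            else if (z == 4 || z == 5)
            {
                next = locked = (locked + z) % 3; // shift lock
            }
            else if (z == 6 && current == 2)
            {
                if (i + 2 >= zcharacters.Count)
                {
                    break; // truncated escape sequence
                }

                // Escape: 32 times the first z-character plus the second.
                text.Append((char)((zcharacters[i + 1] * 32) + zcharacters[i + 2]));
                i += 2;
            }
            else
            {
                text.Append(Alphabets[current][z - 6]);
            }
        }

        return text.ToString();
    }
}
```

With these tables, Decode(new byte[] { 2, 13, 14 }) gives "Hi", and Decode(new byte[] { 3, 6, 2, 0 }) gives "@", since the escape sequence yields 2 * 32 + 0 = 64.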


As you can see, z-characters 0 and 1 are translated to 'space' and 'newline' respectively. Values 2 and 3 are single-character alphabet shifts, while 4 and 5 are shift locks. With a single exception, all other values are converted to ZSCII by subtracting 6, adding 26 times the current alphabet number, and looking up the result in the alphabet table (GetZsciiAlphabetCharacter). The exception is z-character 6 when the current alphabet is 2, which starts an escape sequence. In this case the ZSCII value is determined by the next two z-characters and is calculated as 32 times the first plus the second. Next is version 2, which adds a new feature called abbreviations.

protected override void ZCharactersToZscii(bool calledRecursively, byte zcharacter, byte currentAlphabet, ref byte nextAlphabet, ref byte lockedAlphabet, ref ImmutableStack<byte> zcharacters, ref ImmutableStack<Zscii> zsciiText)
{
  if (zcharacter == 1)
  {
    this.AppendAbbreviation(zcharacter, calledRecursively, ref zcharacters, ref zsciiText);
    return;
  }

  base.ZCharactersToZscii(calledRecursively, zcharacter, currentAlphabet, ref nextAlphabet, ref lockedAlphabet, ref zcharacters, ref zsciiText);
}


Z-character 1 is no longer a newline, but instead represents an abbreviation decoded in the following method.

protected void AppendAbbreviation(byte zcharacter, bool calledRecursively, ref ImmutableStack<byte> zcharacters, ref ImmutableStack<Zscii> zsciiText)
{
  if (calledRecursively)
  {
    this.FrontEnd.ErrorNotification(ErrorCondition.NestedAbbreviation, "Nested abbreviation detected.");
    return;
  }

  if (zcharacters != null)
  {
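    // Abbreviations are numbered 32 * (z-character - 1) + the following z-character.
    var abbreviationNumber = ((zcharacter - 1) * 32) + zcharacters.Top;
    zcharacters = zcharacters.Tail;
    // The header word at byte 24 (0x18) holds the address of the abbreviations table.
    var abbreviationsTableAddress = this.Memory.ReadWord(24);
    // Each table entry is a word address, so double it to get a byte address.
    var abbreviationAddress = 2 * this.Memory.ReadWord(abbreviationsTableAddress + (2 * abbreviationNumber));
    // Pass calledRecursively = true so that nested abbreviations are rejected.
    var abbreviation = this.ZCharactersToZscii(true, this.EncodedTextToZCharacters(ref abbreviationAddress));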
    foreach (var zsciiCharacter in abbreviation.Enumerable())
    {
      zsciiText = zsciiText.Add(zsciiCharacter);
    }
  }
}
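
The address arithmetic in the method above can be tried in isolation. The following is a rough sketch against a plain byte array. AbbreviationLookupSketch and its ReadWord helper are hypothetical names of my own that assume big-endian words; they only stand in for this.Memory.ReadWord.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the interpreter.
public static class AbbreviationLookupSketch
{
    // Reads a big-endian 16-bit word from memory.
    private static int ReadWord(byte[] memory, int address)
    {
        return (memory[address] << 8) | memory[address + 1];
    }

    // Returns the byte address of the abbreviation selected by an
    // abbreviation z-character (1-3) and the z-character that follows it.
    public static int AbbreviationAddress(byte[] memory, int zcharacter, int nextZCharacter)
    {
        int number = ((zcharacter - 1) * 32) + nextZCharacter;
        int tableAddress = ReadWord(memory, 0x18);
        // Table entries are word addresses, hence the factor of two.
        return 2 * ReadWord(memory, tableAddress + (2 * number));
    }
}
```

For instance, z-character 2 followed by z-character 3 selects abbreviation 35, whose entry sits 70 bytes into the table.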


An abbreviation is essentially just another encoded string, which needs to be decoded separately and appended to the text we've decoded so far. After looking up the location of its text, we end up calling the common ZCharactersToZscii method recursively (remember that parameter?). The calledRecursively parameter lets us detect an abbreviation that contains another abbreviation, which is illegal according to the Z-machine standard and could otherwise lead to an endless loop. Lastly, version 3 expands the number of possible abbreviations and alters the behavior of alphabet shifts.

protected override void ZCharactersToZscii(bool calledRecursively, byte zcharacter, byte currentAlphabet, ref byte nextAlphabet, ref byte lockedAlphabet, ref ImmutableStack<byte> zcharacters, ref ImmutableStack<Zscii> zsciiText)
{
  switch (zcharacter)
  {
    case 1:
    case 2:
    case 3:
      // All three values now select one of three banks of 32 abbreviations.
      this.AppendAbbreviation(zcharacter, calledRecursively, ref zcharacters, ref zsciiText);
      break;
    case 4:
    case 5:
      if (currentAlphabet == 0)
      {
        // Single shift: 4 selects alphabet 1, 5 selects alphabet 2.
        nextAlphabet = (byte)(zcharacter % 3);
        break;
      }

      // A shift while already shifted becomes a shift lock (Infocom behavior).
      nextAlphabet = lockedAlphabet = (byte)(currentAlphabet - ((zcharacter - currentAlphabet) % 3));
      break;
    default:
      base.ZCharactersToZscii(calledRecursively, zcharacter, currentAlphabet, ref nextAlphabet, ref lockedAlphabet, ref zcharacters, ref zsciiText);
      break;
  }
}
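
The shift arithmetic above is compact enough that it helps to tabulate it. Here is a small sketch that isolates just the case 4 / case 5 branch. V3ShiftSketch is a hypothetical name, and the method simply mirrors the two expressions from the override so the resulting states can be inspected.

```csharp
using System;

// Hypothetical helper mirroring the version 3 shift branch shown above.
public static class V3ShiftSketch
{
    // Returns the next and locked alphabets after shift z-character 4 or 5.
    public static (int Next, int Locked) Shift(int zcharacter, int current, int locked)
    {
        if (current == 0)
        {
            // Plain single shift; the lock is left untouched.
            return (zcharacter % 3, locked);
        }

        // Already shifted: the result becomes both next and locked alphabet.
        int alphabet = current - ((zcharacter - current) % 3);
        return (alphabet, alphabet);
    }
}
```

Working through the cases: from alphabet 0, a 4 shifts to alphabet 1 and a 5 shifts to alphabet 2. From alphabet 1, a second 4 locks alphabet 1 while a 5 locks alphabet 0. From alphabet 2, a second 5 locks alphabet 2 while a 4 locks alphabet 0.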


This implementation allows shift locks in version 3 caused by consecutive single shifts, although version 1.1 of the standard disallows them. Many (all?) of Infocom's interpreters for version 3 and later did this, which is why I included it.

That's it. The resulting ZSCII is mostly ready to be displayed. I say mostly because some ZSCII values fall in what is called the 'extra characters' range, outside the standard ASCII printable characters and used primarily to display accented characters and the like. These are easily converted to Unicode via a simple lookup.
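
As a rough idea of what that lookup might look like, here is a partial sketch. ExtraCharactersSketch is a hypothetical name, and the table holds only the first nine entries of the default extra characters table as I understand it from the Z-machine standard (starting at ZSCII 155). A real implementation would need the full table, and would also have to honor any custom table a story file supplies.

```csharp
using System;

// Hypothetical, deliberately partial lookup for illustration only.
public static class ExtraCharactersSketch
{
    private const int FirstExtraCharacter = 155;

    // Assumed first entries of the default table: a, o, u with diaeresis
    // (lower then upper case), sharp s, and the two guillemets.
    private static readonly char[] FirstEntries =
    {
        '\u00E4', '\u00F6', '\u00FC', '\u00C4', '\u00D6', '\u00DC',
        '\u00DF', '\u00BB', '\u00AB'
    };

    // Maps a ZSCII output value to a Unicode character, or '?' if unknown.
    public static char ToUnicode(int zscii)
    {
        int index = zscii - FirstExtraCharacter;
        if (index >= 0 && index < FirstEntries.Length)
        {
            return FirstEntries[index];
        }

        if (zscii >= 32 && zscii <= 126)
        {
            return (char)zscii; // printable ASCII maps to itself
        }

        return '?'; // not covered by this partial sketch
    }
}
```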

2 comments:

  1. Ahh, I love compact byte formats. Always lots of neat cleverness. (No, really, I'm not being sarcastic!)

    Do you aggressively use immutable types like this? Most C# devs, myself included, would use an int index into an array, since it seems more 'CSharpey', but I do have fond Haskell memories.

  2. I do use immutable types quite a bit in my Z-machine code, but less so at work as few of the other developers I work with are familiar with them. I use them in my own projects more now that I am studying F#. I'm fairly new to functional programming but I like what I've seen of F# so far. I'm currently reading two good books: "Real-World Functional Programming" and "Expert F# 2.0".
