It looks like you're new here. If you want to get involved, click one of these buttons!
As I mentioned in a reply to @Leonardo.Flores's code critique of some NaNoGenMo works, we can think of Hugh Kenner and Joseph O'Rourke's 1984 "Travesty" program as the ancestor of many other methods for generating text based on some source text. It's a common point of reference for discussions of computer-aided creativity (Charles Hartmann, for example, demonstrates it in Virtual Muse), and it's relatively easy to understand what it does.
Since Kenner and O'Rourke first shared Travesty by publishing it in Byte Magazine, I thought it would be interesting to take a closer look at what the code does. Does it demonstrate some ideas about the original text it works on? Since the term "travesty" is pretty dramatic, is there evidence in what it's doing that validates that idea of what it's doing to the text? Is what goes on in Travesty truly similar to processes that we think of as its descendants (Markov chain and char-rnn, mainly), or is something else at work?
Travesty is written in Pascal, a language I don't have any prior experience with. Still, getting it to run was pretty straightforward. I've transcribed it line by line (including comments) from Byte and shared it in a Github repository, along with some instructions on using it.
Since the directions say to post the code you're critiquing in your thread, I'll include it here as well. I'll post some initial thoughts in a reply to this thread.
Listing 1: Travesty, a program for generating pseudo-text. The program will scan a sample text and generate a "nonsense" imitation. For an order-n scan, every n-character sequence in the output occurs somewhere in the input.
\n
\n
PROGRAM travesty (input,output); { Kenner/ O'Rourke, 5/9/84}
(* This is based on Brian Hayes' article in Scientific *)
(* American, November 1983. It scans a text and generates *)
(* an n-order simulation of its letter combinations. For *)
(* order n, the relation of output to input is exactly: *)
(* "Any pattern n characters long in the output *)
(* has occurred somewhere in the input, *)
(* and at about the same frequency." *)
(* Input should be ready on disk. Program asks how many *)
(* characters of output you want. It next asks for the *)
(* "Order" -- i.e. how long a string of characters will be *)
(* cloned to output when found. You are asked for the *)
(* name of the input file, and offered a "Verse" option. *)
(* If you select this, and if the input has a "|" char- *)
(* acter at the end of each line, words that ends lines in *)
(* the original will terminate output lines. Otherwise, *)
(* output lines will average 50 characters in length. *)
CONST
ArraySize = 3000; {maximum number of text chars}
MaxPat = 9; {maximum Pattern length}
VAR
BigArray : PACKED ARRAY [1..ArraySize] of CHAR;
FreqArray, StartSkip : ARRAY[' '..'|'] of INTEGER;
Pattern : PACKED ARRAY [1..MaxPat] of CHAR;
SkipArray : ARRAY [1..ArraySize] of INTEGER;
OutChars : INTEGER; {number of characters to be output}
PatLength : INTEGER;
f : TEXT;
CharCount : INTEGER; {characters so far output}
Verse, NearEnd : BOOLEAN;
NewChar : CHAR;
TotalChars : INTEGER; {total chars input, + wraparound}
Seed : INTEGER;
FUNCTION Random (VAR RandInt : INTEGER) : REAL;
BEGIN
Random := RandInt / 1009;
RandInt := (31 * RandInt + 11) MOD 1009
END;
PROCEDURE InParams;
(* Obtains user's instructions *)
VAR
InFile : STRING [12];
Response : CHAR;
BEGIN
WRITELN ('Enter a Seed (1..1000) for the randomizer');
READLN (Seed);
WRITELN ('Number of characters to be output?');
READLN (OutChars);
REPEAT
WRITELN ('What order? <2-', MaxPat,'>');
READLN (PatLength)
UNTIL (PatLength IN [2..MaxPat]);
PatLength := PatLength - 1;
WRITELN ('Name of input file?');
READLN (InFile);
ASSIGN (f, InFile);
RESET (f);
WRITELN ('Prose or Verse? <p/v>');
READLN (Response);
IF (Response = 'V') OR (Response = 'v') THEN
Verse := true
ELSE Verse := false
END; {Procedure InParams}
PROCEDURE ClearFreq;
(* FreqArray is indexed by 93 probable ASCII characters, *)
(* from " " to "|". Its elements are all set to zero. *)
VAR
ch : CHAR;
BEGIN
FOR ch := ' ' TO '|' DO
FreqArray[ch] := 0
END; {Procedure ClearFreq}
PROCEDURE NullArrays;
(* Fill BigArray and Pattern with nulls *)
VAR
j : INTEGER;
BEGIN
FOR j := 1 TO ArraySize DO
BigArray[j] := CHR(0);
FOR j := 1 TO MaxPat DO
Pattern[j] := CHR(0)
END; {Procedure NullArrays}
PROCEDURE FillArray;
(* Moves textfile from disk into BigArray, cleaning it *)
(* up and reducing any run of blanks to one blank. *)
(* Then copies to end of array a string of its opening *)
(* characters as long as the Pattern, in effect wrapping *)
(* the end to the beginning. *)
VAR
Blank : BOOLEAN;
ch: CHAR;
j: INTEGER;
PROCEDURE Cleanup;
(* Clears Carriage Returns, Linefeeds, and Tabs out of *)
(* input stream. All are changed to blanks. *)
BEGIN
IF ((ch = CHR(13)) {CR}
OR (ch = CHR(10)) {LF}
OR (ch = CHR(9))) {TAB}
THEN ch := ' '
END;
BEGIN {Procedure FillArray}
j := 1;
Blank := false;
WHILE (NOT EOF(f)) AND ( j <= (ArraySize - MaxPat)) DO
BEGIN {While Not EOF}
READ (f,ch);
Cleanup;
BigArray[j] := ch; {Place character in BigArray}
IF ch = '' THEN Blank := true;
j := j + 1;
WHILE (Blank AND (NOT EOF(f))
AND (j <= (ArraySize - MaxPat))) DO
BEGIN {While Blank} {When a blank has just been}
READ (f,ch); {printed, Blank is true,}
Cleanup; {so succeeding blanks are skipped,}
IF ch <> '' THEN {thus stopping runs.}
BEGIN {If}
Blank := false;
BigArray[j] := ch; {To BigArray if not a Blank}
j := j + 1
END {If}
END {While Blank}
END; {While Not EOF}
TotalChars := j - 1;
IF BigArray[TotalChars] <> '' THEN
BEGIN {If no Blank at end of text, append one}
TotalChars := TotalChars + 1;
BigArray[TotalChars] := ' '
END;
{Copy front of array to back to simulate wraparound.}
FOR j := 1 TO PatLength DO
BigArray[TotalChars + j] := BigArray[j];
TotalChars := TotalChars + PatLength;
WRITELN('Characters read, plus wraparound = ',TotalChars:4)
END; {Procedure FillArray}
PROCEDURE FirstPattern;
(* User selects "order" of operation, an integer, n, in the *)
(* range 1 .. 9. The input text will henceforth be scanned *)
(* in n-sized chunks. The first n-1 characters of the input *)
(* file are placed in the "Pattern" Array. The Pattern is *)
(* written at the head of output. *)
VAR
j : INTEGER;
BEGIN
FOR j := 1 TO PatLength DO {Put opening chars into Pattern}
Pattern[j] := BigArray[j];
CharCount := PatLength;
NearEnd := false;
IF Verse THEN WRITE (' '); {Align first line}
FOR j := 1 TO PatLength DO
WRITE (Pattern[j])
END; {Procedure FirstPattern}
PROCEDURE InitSkip;
(* The i-th entry of SkipArray contains the smallest index *)
(* j > i such that BigArray[j] = BigArray[i]. Thus SkipArray *)
(* links together all identical characters in BigArray. *)
(* StartSkip contains the index of the first occurrence of *)
(* each character. These two arrays are used to skip the *)
(* matching routine through the text, stopping only at *)
(* locations whose character matches the first character *)
(* in Pattern. *)
VAR
ch : CHAR;
j : INTEGER;
BEGIN
FOR ch := ' ' TO '|' DO
StartSkip[ch] := TotalChars + 1;
FOR j := TotalChars DOWNTO 1 DO
BEGIN
ch := BigArray[j];
SkipArray[j] := StartSkip[ch];
StartSkip[ch] := j
END
END; {Procedure InitSkip}
PROCEDURE Match;
(* Checks BigArray for strings that match Pattern; for each *)
(* match found, notes following character and increments its *)
(* count in FreqArray. Position for first trial comes from *)
(* StartSkip; thereafter positions are taken from SkipArray. *)
(* Thus no sequence is checked unless its first character is *)
(* already known to match first character of Pattern. *)
VAR
i : INTEGER; {one location before start of the match in BigArray}
j : INTEGER; {index into Pattern}
Found : BOOLEAN; {true if there is a match from i+1 to i+j - 1 }
ch1 : CHAR; {the first character in Pattern; used for skipping}
NxtCh : CHAR;
BEGIN {Procedure Match}
ch1 := Pattern[1];
i := StartSkip[ch1] - 1; {is is 1 to left of the Match start}
WHILE (i <= TotalChars - PatLength - 1) DO
BEGIN {While}
j := 1;
Found := true;
WHILE (Found AND (j <= PatLength)) DO
IF BigArray[i+j] <> Pattern[j]
THEN Found := false {Go thru Pattern til Match fails}
ELSE j := j + 1;
IF Found THEN
BEGIN {Note next char and increment FreqArray}
NxtCh := BigArray[i + PatLength + 1];
FreqArray[NxtCh] := FreqArray[NxtCh] + 1
END;
i := SkipArray[i + 1] - 1 {Skip to next matching position}
END {While}
END; {Procedure Match}
PROCEDURE WriteCharacter;
(* The next character is written. It is chosen at Random *)
(* from characters accumulated in FreqArray during last *)
(* scan of input. Output lines will average 50 character *)
(* in length. If "Verse" option has been selected, a new *)
(* line will commence after any word that ends with "|" in *)
(* input file. Thereafter lines will be indented until *)
(* the 50-character average has been made up. *)
VAR
Counter, Total, Toss : INTEGER;
ch : CHAR;
BEGIN
Total := 0;
FOR ch := ' ' TO '|' DO
Total := Total + FreqArray[ch]; {Sum counts in FreqArray}
Toss := TRUNC (Total * Random(Seed)) + 1;
Counter := 31;
REPEAT
Counter := Counter + 1; {We begin with ' '}
Toss := Toss - FreqArray[CHR(Counter)]
until Toss <= 0; {Char chosen by}
NewChar := CHR(Counter); {successive subtractions}
IF NewChar <> '|' THEN
WRITE (NewChar);
CharCount := CharCount + 1;
IF CharCount MOD 50 = 0 THEN NearEnd := true;
IF ((Verse) AND (NewChar = '|')) THEN WRITELN;
IF ((NearEnd) AND (NewChar = ' ')) THEN
BEGIN {If NearEnd}
WRITELN;
IF Verse THEN WRITE (' ');
NearEnd := false
END {If NearEnd}
END; {Procedure Write Character}
PROCEDURE NewPattern;
(* This removes the first character of the Pattern and *)
(* appends the character just printed. FreqArray is *)
(* zeroed in preparation for a new scan. *)
VAR
j : INTEGER;
BEGIN
FOR j := 1 to PatLength - 1 DO
Pattern[j] := Pattern[j + 1]; {Move all chars leftward}
Pattern[PatLength] := NewChar; {Append NewChar}
ClearFreq
END; {Procedure NewPattern}
BEGIN {Main Program}
ClearFreq;
NullArrays;
InParams;
FillArray;
FirstPattern;
InitSkip;
REPEAT
Match;
WriteCharacter;
NewPattern
UNTIL CharCount >= OutChars;
END. {Main Program}
Comments
I had to add those two
\n
's at the top of the code block to make the line numbers in this forum match what was printed in the magazine. For some reason the forum here kept insisting on representing two blank lines as<br/>
'sI think I've transcribed pretty carefully, but there may be errors in what I've typed. Please let me know if you find any (and I've already caught one since posting it here.)
Also, the code is printed in Helvetica in the magazine so, because that's not a monospace font, my representation of the spaces and indentation may not be exact.
Finally, I seem to remember that when I first tried out this transcribed code, I encountered an error that turned out to be a typo in the the published code. I can't find that now, though, so perhaps I'm mistaken. If you do see something like that, let me know!
ETA: Also the syntax highlighting has gone all wonky here. It may not be reliably interpreting Pascal.
I suppose it depends on how we characterize a "travesty." If we use it in the more light-hearted burlesque drama sense of "an adaptation with incongruous language" then the "incongruous" part certainly seems to fit. The etymology from "dressed up" and "disguised" (travesti) is also interesting: One of the central preoccupations of text generators and chatbots as cultural interventions has always been about their outputs passing or not passing... so it is an interesting twist to see the article illustrated (caption: "with apologies to James Joyce and Henry James") by two authors lightly disguised and then unmasked as each other. Interface indeed.
I was also interested in this part of the article, on the optimized version of the code based on Shannon's text-as-lookup-table:
I went ahead and forked the repo, added a Hellbat branch and added a rough draft of the code and conversion notes for creating that variant. I haven't gone ahead and applied them to creating an actual hellbat.pas yet, but it might be interesting -- particularly given that this is a second version of the code that is only being distributed in the artiicle conceptually as a procedure or diff rather than a transcript.
The article itself was fantastic - a very nice write-up of various attempts to improve the algorithm given the constraints of the machines of the day.
This may be of interest, by way of contrast. I've thrown a gist together that uses two approaches that would've been beyond the reach of machines the authors had access to.
The first uses the approach they describe as unworkably slow - it just scans through the whole text to produce each successive character, building a frequency table as described in the article. Over the entirety of the text of Ulysses*, that runs at about two characters per second.
The second builds a sparse table of every combination of n-gram prefixes, mapping each to a frequency table of succeeding characters. There's a small up-front cost for preprocessing, but then the whole thing runs blazingly fast.
It goes to show how the constraints of yesteryear are turned completely on their head. Does that matter? Is the concern with the output of those algorithms (or their equivalent), or with the effort required to express them at the time?
* I made no attempt to split the source back up into its eighteen constituent parts. Newlines were removed; the whole converted to lower-case before processing.
To respond to @zachwhalen and agree with @jeremydouglass, when I think of a travesty generator I do associate it with an over-the-top, outrageous performance. And I think this sits very comfortably with the 'fake research paper' genre which seems to be routinely used to scandalize and make an example of conferences and journals suspected to have low standards, or to call-out predatory conferences which attempt to pass as legitimate ones.
One fairly involved travesty generated fake paper performance which I have heard of is the one put on by Jeremy Stribling, Daniel Aguayo and Maxwell Krohn, who are the creators of SCIgen. It's "...a program that generates random Computer Science research papers, including graphs, figures, and citations. It uses a hand-written context-free grammar to form all elements of the papers. ...[its aim] is to maximize amusement, rather than coherence. One useful purpose for such a program is to auto-generate submissions to conferences that you suspect might have very low submission standards"
https://pdos.csail.mit.edu/archive/scigen/#talks
Surprisingly, I wasn't able to find references to the BYTE article on the SCIgen webpage.
A github repository for SCIgen exists, and looks like it's written in perl: https://github.com/strib/scigen
The creators of SCIgen used it to generate a paper which was accepted to a predatory conference, but it was later declined because of the bad publicity that the generated paper had created. The creators were later able to raise money to book a space across the hallway from the predatory conference, and on the same day hosted a session where 3 randomly generated talks were presented. Photos and a video documenting the session show that the generated talk speakers/creators were using false names and were wearing novelty disguise or costume items such as wigs, although it is not explained why this is done.