
2024 Participants: Hannah Ackermans * Sara Alsherif * Leonardo Aranda * Brian Arechiga * Jonathan Armoza * Stephanie E. August * Martin Bartelmus * Patsy Baudoin * Liat Berdugo * David Berry * Jason Boyd * Kevin Brock * Evan Buswell * Claire Carroll * John Cayley * Slavica Ceperkovic * Edmond Chang * Sarah Ciston * Lyr Colin * Daniel Cox * Christina Cuneo * Orla Delaney * Pierre Depaz * Ranjodh Singh Dhaliwal * Koundinya Dhulipalla * Samuel DiBella * Craig Dietrich * Quinn Dombrowski * Kevin Driscoll * Lai-Tze Fan * Max Feinstein * Meredith Finkelstein * Leonardo Flores * Cyril Focht * Gwen Foo * Federica Frabetti * Jordan Freitas * Erika FülöP * Sam Goree * Gulsen Guler * Anthony Hay * SHAWNÉ MICHAELAIN HOLLOWAY * Brendan Howell * Minh Hua * Amira Jarmakani * Dennis Jerz * Joey Jones * Ted Kafala * Titaÿna Kauffmann-Will * Darius Kazemi * andrea kim * Joey King * Ryan Leach * cynthia li * Judy Malloy * Zachary Mann * Marian Mazzone * Chris McGuinness * Yasemin Melek * Pablo Miranda Carranza * Jarah Moesch * Matt Nish-Lapidus * Yoehan Oh * Steven Oscherwitz * Stefano Penge * Marta Pérez-Campos * Jan-Christian Petersen * gripp prime * Rita Raley * Nicholas Raphael * Arpita Rathod * Amit Ray * Thorsten Ries * Abby Rinaldi * Mark Sample * Valérie Schafer * Carly Schnitzler * Arthur Schwarz * Lyle Skains * Rory Solomon * Winnie Soon * Harlin/Hayley Steele * Marylyn Tan * Daniel Temkin * Murielle Sandra Tiako Djomatchoua * Anna Tito * Introna Tommie * Fereshteh Toosi * Paige Treebridge * Lee Tusman * Joris J.van Zundert * Annette Vee * Dan Verständig * Yohanna Waliya * Shu Wan * Peggy WEIL * Jacque Wernimont * Katherine Yang * Zach Whalen * Elea Zhong * TengChao Zhou
CCSWG 2024 is coordinated by Lyr Colin (USC), Andrea Kim (USC), Elea Zhong (USC), Zachary Mann (USC), Jeremy Douglass (UCSB), and Mark C. Marino (USC). Sponsored by the Humanities and Critical Code Studies Lab (USC), and the Digital Arts and Humanities Commons (UCSB).

[Code Critique] Cloak of Darkness and the Code Style of Large Language Model Generated Code

Cloak of Darkness
Emily Short and Graham Nelson, Inform 7, 2006
& ChatGPT 3.5, 2024

Large language model (LLM) generated text is text-from-nowhere: it has no single author, but like all written text it has a style, and that style is generally quite poor. Being synthesised from a pool of existing text, LLM prose tends to be flat and clichéd. Consider the first paragraph of Arthur Conan Doyle's The Sign of Four:

Sherlock Holmes took his bottle from the corner of the mantel-piece and his hypodermic syringe from its neat morocco case. With his long, white, nervous fingers he adjusted the delicate needle, and rolled back his left shirt-cuff. For some little time his eyes rested thoughtfully upon the sinewy forearm and wrist all dotted and scarred with innumerable puncture-marks. Finally he thrust the sharp point home, pressed down the tiny piston, and sank back into the velvet-lined arm-chair with a long sigh of satisfaction.

Compare it with this ChatGPT alternative (the prompt being "Write an opening paragraph in the third person in which Sherlock Holmes injects himself with a syringe of cocaine and then sits in a chair with pleasure. The paragraph should have four sentences."):

In the dimly lit study of 221B Baker Street, Sherlock Holmes methodically prepared a syringe of cocaine with practiced precision. With a swift and practiced motion, he injected the solution into his arm, feeling the familiar rush of euphoria cascade through his veins. As the drug took hold, a contented sigh escaped his lips, and he sank into the plush armchair, his mind already racing ahead to the mysteries awaiting his razor-sharp intellect.

The generated prose has more clichéd constructions ("rush of euphoria", "razor-sharp intellect") and is less viscerally described; the details lack specificity. Over the course of several such paragraphs, to the practised reader, LLM-generated prose is almost unmistakable. Unlike the author, the generator has no internal picture of the scene or sense of the world (of mantel-pieces, morocco cases, velvet-lined arm-chairs). But is the same true of generated code?

It would be a gross mistake to treat code and literature as having the same aim: the primary 'audience' for code is the compiler, not other human beings (Spolsky, 2004, p. 72). Does it matter if the tone is flat? Surely 'clichéd' code is time-tested and more reliable? Compare the following two code snippets, excerpts from Inform 7 implementations of Cloak of Darkness. The first was generated by ChatGPT from the Cloak of Darkness specification; the second was written by Emily Short and Graham Nelson from the same specification:

After going south from the Foyer:
    now the Bar is lit;
    if the message is not scratched:
        now the message is scratched.    

A message is a kind of thing.
The message is in the Bar. The printed name is "message".
The description of the message is "[if the Bar is lit]You have won[otherwise]You have lost[end if]."

The scrawled message is scenery in the Bar. Understand "floor" or "sawdust" as the message.

Neatness is a kind of value. The neatnesses are neat, scuffed, and trampled. The message has a neatness. The message is neat.

Instead of doing something other than going in the bar when in darkness:
    if the message is not trampled, change the neatness of the message to the neatness after the neatness of the message;
    say "In the dark? You could easily disturb something."

Instead of going nowhere from the bar when in darkness:
    now the message is trampled;
    say "Blundering around in the dark isn't a good idea!"

In the first example, the code alludes to a property it never gets around to defining: that a message can be 'scratched' or 'not scratched'. This has been derived from the line of the specification "A message is scratched in the sawdust on the floor." Assuming the rest were well produced, this would be a perfectly serviceable distinction to make. However, the code does nothing with the distinction thus made; the message never needs to be scratched, and the property is wholly extraneous.
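The extraneousness is easier to see stripped of Inform 7's syntax. A minimal sketch in Python (with hypothetical names, not part of either implementation): the scratched flag is written once and never read, so deleting it would leave the observable behaviour unchanged.

```python
# Sketch of the generated code's dead state: `scratched` is assigned,
# but nothing ever reads it, so it cannot affect the game's output.
class Message:
    def __init__(self):
        self.scratched = False

def go_south(message):
    """Rough analogue of 'After going south from the Foyer'."""
    if not message.scratched:
        message.scratched = True   # a write with no corresponding read
    return "You enter the Bar."

msg = Message()
print(go_south(msg))   # identical output with or without the flag
```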

Superficially, the properties of the message in the human-authored example are similarly described: here the message can be 'neat', 'scuffed', or 'trampled'. But unlike in the generated example, the distinction serves a purpose: the message can become progressively more trampled. The way this progressive trampling is enacted, "change the neatness of the message to the neatness after the neatness of the message", i.e. from neat to scuffed, or from scuffed to trampled, is very elegant, achieving its intent in one well-formed English-language sentence.
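For readers unfamiliar with Inform 7, the successor-value trick can be sketched in Python (hypothetical names; the clamp at 'trampled' stands in for Short and Nelson's guard "if the message is not trampled"):

```python
# Mirrors "The neatnesses are neat, scuffed, and trampled." - an ordered
# sequence of values that the message steps through as it is disturbed.
NEATNESSES = ["neat", "scuffed", "trampled"]

def neatness_after(current):
    """'the neatness after the neatness of the message': one step along
    the sequence, with 'trampled' as the terminal state."""
    index = NEATNESSES.index(current)
    return NEATNESSES[min(index + 1, len(NEATNESSES) - 1)]

neatness = "neat"
neatness = neatness_after(neatness)   # "scuffed"
neatness = neatness_after(neatness)   # "trampled"
neatness = neatness_after(neatness)   # stays "trampled"
```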

Cloak of Darkness is an example game, a minimal implementation intended to show off some features of an interactive narrative language, and the authors do that here with aplomb. The code may be compiler-facing (in that it literally will compile), but its style is designed to teach and impress. Not only will the generated code example not compile (a message is defined as a kind of thing, and then immediately as a specific thing in a definite location, a clear contradiction), but, having no author, it has no agenda. Anything we read into it is accidental.

LLM-generated computer code has greatly increased in popularity (some two thirds of ChatGPT's 180 million users are using it for programming), but before these tools were available, programmers were already copying code snippets from existing sources. However, this new borrowing is somewhat different. On an internet forum such as Stack Exchange, code is offered by a person who might be expected to understand how it works, while an LLM has no deep conceptual understanding or analysis of the purpose and connectedness of code (Sharma & Sodhi, 2023), any more than it has an understanding of narrative style. Writing code, like writing prose, is a craft.

The literary critic J. Middleton Murry, in his The Problem of Style, describes three senses of the word style as it pertains to writing: idiosyncrasy of expression, lucid exposition of ideas, and magnificent, personal, particular expression. Generated code and prose have notable idiosyncrasies, so they have a 'style' in the most trivial sense. At its best (when explaining what some piece of code does, say), generated prose can be a lucid exposition, though, through its frequent mistakes, generated code generally isn't lucid. Most of all, lacking an agential agenda, having no sense of craft underpinning it, it can never have a magnificence of style (at least in the very personal sense Murry is talking about). If the prompt-writer is lucky, the generated code may have the intended effect on the compiler, but as an artefact for a human to read, it is an ugly, thoughtless thing.

Bibliography
Cloak of Darkness, originally devised by Roger Firth, circa 1999
ChatGPT and the Death of the Author, Saffron Huang, The New Statesman, 26 February 2023
The Sign of Four, Arthur Conan Doyle, Broadview, 2010
Calculating Originality of LLM Assisted Source Code, Shipra Sharma & Balwinder Sodhi, Computing Research Repository, 2023
Joel on Software, Joel Spolsky, Apress, 2004
The Problem of Style, J. Middleton Murry, OUP, 1960

Comments

  • LLM-generated computer code has greatly increased in popularity— some two thirds of ChatGPT's 180 million users are using it for programming, but before these tools were available, programmers were already copying code snippets from existing sources. However, this new borrowing is somewhat different [. . .] Writing code, like writing prose, is a craft.

    I've been engaging in an extended experiment with GitHub Copilot and my own code projects. In some recent commits to my tool Extwee, for example, I've used Copilot as part of writing unit tests. In programming, a unit test checks whether a block of code works as it should, verifying the results of feeding different types of data to functions. In my own code, the use of Copilot has been pretty mixed. While it can adapt to common patterns (my tests tend to use the same general test data, for example), it seems to struggle with anticipating creative edge cases. It is good at "ugly" code; it is not good at inventing new structures.

    One of the observations I've made, matching the "thoughtless" comment, is the role hallucinations can play in code patterns. Often I have to fight with Copilot when it thinks the next section of code should follow a pattern I've used before that would not make sense again. It would prefer I test the same section of code many times rather than move on to another. It strongly prefers things it knows to trying to come up with tests for something unexpected.
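    The kind of unit testing described above can be sketched in miniature (in Python rather than Extwee's JavaScript; parse_version is a hypothetical helper, not part of Extwee). The typical case is the sort of pattern a generator completes readily; the malformed-input case is the kind of edge case it tends not to volunteer.

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def test_typical_input():
    # The common pattern a code generator happily repeats.
    assert parse_version("2.2.0") == (2, 2, 0)

def test_malformed_input():
    # The "creative edge case" a generator tends not to anticipate.
    try:
        parse_version("not a version")
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError")

test_typical_input()
test_malformed_input()
```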

  • edited February 28

    @JoeyJones I love the use of Cloak of Darkness as an example here--not because it is IF, but because it was designed to serve as a kind of Rosetta Stone for multiple sometimes very different IF languages / parsers / IDEs, and as such it has a specification that you can feed into an LLM.

    Very very few pieces of code have a complete natural language specification that precedes the code itself. In fact, I am currently struggling to think of any other at-hand examples outside of Knuth textbooks....

    Regarding the flatness of code generated from the specification, I wonder to what extent this is inherent and to what extent it might be remediated with chain-of-thought prompting that breaks the coding down into smaller steps and/or models stylistics. Of course, by the time you exert that much effort perhaps you should have just written it yourself....
