by Jeremy Douglass & Mark Marino
“AI” is a current zeitgeist phrase in academia and culture at large, due in large part to the recent rise to public prominence of (and hype about) large language models (LLMs) and the consequences of their rapidly increasing capability, especially in the generation of images, prose, and code.
For Critical Code Studies, the large language model era raises a number of questions with respect to our methodologies. Systems which could automatically summarize and translate code into plain-text descriptions (or could generate code from plain-text descriptions) were previously rare, highly specialized, and limited--and suddenly they are becoming commonplace, in part due to a concentrated research agenda on code generation (i.e., if there's one thing programmers would like LLMs to produce…). This evolving situation raises at least three broad categories of questions about the intentional humanistic reading of code:
There are of course many questions beyond these. Conversations about CCS+AI occur in the context of a number of related discourses, with one notable recent addition being Critical AI. As they write on https://criticalai.org:
Though rooted in critical methods from the humanities, social sciences, and arts, Critical AI works with technologists, scientists, economists, policy makers, health professionals, teachers, community organizers, legislators, lawyers, and entrepreneurs who share the understanding of interdisciplinary research as a powerful tool for building and implementing accountable technology in the public interest. Open to ideas born of new interdisciplinary alliances; design justice principles; antiracist, decolonial, and democratic political practices; community-centered collaborations; experimental pedagogies; and public outreach, Critical AI functions as a space for the production of knowledge, research endeavors, teaching ideas, and public humanities that bears on the ongoing history of machine technologies and their place in the world.
Our goal for this “AI” special topic of the Critical Code Studies working group is to solicit through this discussion as wide a range as possible of different experiences, perspectives, and insights into the intersection of contemporary AI and code, and what that tells us about Critical Code Studies. For some of our members this is a current area of active research--or active pedagogical practice. For others, being drawn into the hype of “AI” headlines may ultimately be a trap, whether due to the empty signifier of artificial “intelligence,” the devastating environmental impacts that the corporate LLM paradigm appears to entail, or the implication of AI agents in the ongoing alienation of labor / “deskilling” enacted by algorithmic neoliberalism--among other possible reasons.
To kick off this week’s conversation, Mark and I brainstormed a list of a few CCS+LLM-related topics and questions to share with each other in an informal conversation. These included questions about intentional writing, interpreter personas, code and accessibility, and the role of the detail in code interpretation.
Below is the ~15 minute video:
We have also provided our shared pre-discussion topic brainstorm list below as an aid to discussion:
Our ask for participants is to:
...and, in addition, you might consider:
Comments
Still mulling over the topic, but wanted to briefly comment on bespoke code (great term btw!) vs Copilot-generated or other generative code. The generative code that I've seen is very similar in (lack of) style to prose from ChatGPT: homogeneous, vanilla coding, interspersed with random tangents when it gets confused.
Early on, I tried generating boilerplate code for a cloud project in AWS, something I figured would be easy: IDEs already generate these and there are many near-identical examples to train on. It got stuck importing hundreds of hallucinated packages, and never got to the code itself.
I then requested a brainfuck interpreter. This also has many examples on github, as it's often done by programmers for fun. The human-written bf-interpreters are often (but not always) flamboyantly written; in a single line, for example, in reference to the minimalism of brainfuck. Copilot crafted a perfect interpreter and could do so in multiple languages, but always utilitarian and clearly organized, with none of the flashiness of the human-written scripts. It seems designed to maintain that -- at least, I have not yet seen it break from a style that defers to "good" code as per most corporate guidelines.
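To make the contrast concrete, here is a minimal Brainfuck interpreter sketched in that plain, utilitarian register (my own illustration in Python, not Copilot's actual output):

```python
# A minimal Brainfuck interpreter in the utilitarian, clearly organized style described above.

import sys

def run(program, tape_size=30000):
    """Execute a Brainfuck program, reading from stdin and writing to stdout."""
    tape = [0] * tape_size
    pointer = 0

    # Pre-compute matching bracket positions for '[' and ']'.
    jumps = {}
    stack = []
    for i, ch in enumerate(program):
        if ch == '[':
            stack.append(i)
        elif ch == ']':
            start = stack.pop()
            jumps[start] = i
            jumps[i] = start

    pc = 0
    while pc < len(program):
        ch = program[pc]
        if ch == '>':
            pointer += 1
        elif ch == '<':
            pointer -= 1
        elif ch == '+':
            tape[pointer] = (tape[pointer] + 1) % 256
        elif ch == '-':
            tape[pointer] = (tape[pointer] - 1) % 256
        elif ch == '.':
            sys.stdout.write(chr(tape[pointer]))
        elif ch == ',':
            data = sys.stdin.read(1)
            tape[pointer] = ord(data) if data else 0
        elif ch == '[' and tape[pointer] == 0:
            pc = jumps[pc]
        elif ch == ']' and tape[pointer] != 0:
            pc = jumps[pc]
        pc += 1

if __name__ == '__main__':
    # "Hello World!" in Brainfuck.
    run("++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]"
        ">>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.")
```

Nothing here is wrong, and that is partly the point: every choice defers to the most legible, guideline-friendly option, with none of the one-line flamboyance you find in the human-written versions.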
Thank you Daniel--asking a code generation LLM to write a code interpreter is an excellent experiment that I hadn't thought of in this context, even though many of my own experiments have involved asking it to act as a (hermeneutic) interpreter of code.
Were your experiments using GitHub Copilot for code generation, or one of the other various Microsoft "Copilot" general-purpose interfaces that they have been rolling out over the past year or so? (I saw that Microsoft bought a Super Bowl advertisement last night: "Copilot: Your everyday AI companion".)
It was the Github Copilot Beta.
@jeremydouglass, I hadn't considered your point about how the AI prompt becomes part of the chain of meaning of code. I'm reminded here of Katherine Hayles's flickering signifier: the idea that the code or prompt we write is the top layer of many levels of activity and re-interpretation (e.g., Python to C to assembly to machine code).
When we write Python code, we may have little control over how our code is optimized at the assembly level. But we do expect that our code will function according to the Python spec; if we add two numbers, they will not subtract instead. An LLM doesn't have such a spec, and the prompts we use are not formalized in that way. How it responds to the same prompt may change over time as the training data changes. So can we think of the prompt as code, and what does it mean to have code with such ambiguity?
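To put that contrast in code (the ask_llm helper below is hypothetical, a stand-in for whichever chat interface one uses):

```python
def add(a, b):
    # Guaranteed by the Python spec: the same inputs always give the same result.
    return a + b

assert add(2, 3) == 5  # holds on every conforming interpreter, today and tomorrow

# There is no comparable spec for a prompt. The "same program" may behave differently
# across runs, sampling temperatures, model versions, or training updates:
#
#   code_v1 = ask_llm("Write a Python function that adds two numbers.")
#   code_v2 = ask_llm("Write a Python function that adds two numbers.")
#   code_v1 == code_v2   # not guaranteed -- and neither is the behavior of the returned code
```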
Also, I was unfamiliar with the use of ChatGPT to explain existing code. That is very cool if it works, and I wonder if code will be written to be more understandable to an AI reader -- or perhaps as an antistyle, to obfuscate it against AI summarizing.
Also still mulling over this, but here are some stream-of-consciousness thoughts:
Mostly as an aside: so far I haven't had much luck getting something like ChatGPT to write code snippets for me, though it's possible I'm just not skilled at prompt engineering; my most common issue was it ignoring part of my requirements. I think, similar to asking it to generate any text (be it a thank-you note or a letter of recommendation or anything), it requires editing and picking through to fix and fine-tune, and that experience really felt like picking through a stranger's (or your past self's) code and figuring out what's going on.
Out of curiosity I threw a (mostly) uncommented solution from last year's Advent of Code into ChatGPT and asked it to explain, and it broke it down pretty well. I also asked it to rewrite it in pseudocode, and I found that less impressive. I want to do this with some genuinely old code I have no memory of writing (especially from when I was more of a novice), but I don't have any of that on hand. I could see the former being super useful for accessibility, letting non-coders follow along and understand what they're looking at, but I'm not sure how useful it would be for actually learning at a deeper level. In some ways it might be analogous to using Google Translate to have a conversation: it can get you by, but some things definitely get lost.
One can see prompts as the first step in a series of translations, as always happens while programming (transpiling, interpreting, or compiling). The more precise you are, the more the generated code (step 2) corresponds to your first "code" -- until you write a perfect prompt that is nearly the code itself. This is Donald Knuth's dream of Literate Programming.
This way of programming as "expressing a desire for a program" is killing the idea that "there is more than one way to do it" (TIMTOWTDI): the code produced, as @Temkin pointed out, is utilitarian and clearly organized. Even if we know that this style depends on how the training data were chosen, and even if we know that the "temperature" of the result could be changed, I imagine that in the end this style will become a model for young coders.
In the long run it will have the effect of definitively splitting code in two: the good and the evil (e.g., insecure, untrusted, poetic, artistic, political, queer, ...) kind. There will be a single step 1: the one and only prompt that gives the correct result.
Very interesting thoughts!
On the prompting-as-programming aspect, I do think that GPT natural language interfaces (prompts) are, ultimately, still interfacing with a computational machine, so they're ultimately another kind of programming language. There are some people starting to experiment with more traditional programming constructs (loops, conditionals), such as this blog post or this paper. I wouldn't be surprised if, as the novelty wears off and model versions/architectures stabilize, strongly pattern-based linguistic expression becomes the norm! (I also wonder what people thought of languages like Algol at the time of their release, and whether they thought it was "almost" like English!)
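As a rough sketch of what those experiments tend to look like (the llm() helper below is hypothetical, standing in for whichever completion API is actually being called):

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a real chat-completion call."""
    return "(model response would appear here)"

def summarize_until_short(text: str, max_words: int = 50, max_rounds: int = 5) -> str:
    """Traditional programming constructs (a loop and a conditional) wrapped around prompts."""
    summary = text
    for _ in range(max_rounds):                 # loop
        if len(summary.split()) <= max_words:   # conditional
            return summary
        summary = llm(f"Summarize the following in under {max_words} words:\n\n{summary}")
    return summary
```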
When we start to include code analysis, then I start to wonder about the extent of the difference between an LLM and a code editor, from a functional perspective: they both create standard snippets of code, and they both present an abstracted version of the source code (in the editor, via class declaration lists, function signatures, and general tooltips from standard library functions). I guess what we gain in a nicer reading experience from the LLM, we lose in terms of accuracy.
And finally, I also found the point about LLMs in the context of critical code studies quite striking. From a writing perspective, does it become relevant to know the exact metadata of the model used? Version, date, training dataset? Beyond reproducibility, how might we know the idiosyncrasies of a particular version? From a reading perspective, it might be interesting to think of it as a sort of dynamic documentation. Using the documentation or the reference for a piece of software is one part of the methodological scaffolding which, without being considered the canonical description of the software, is nonetheless very useful in getting one's bearings!
So many great questions here. I want to offer one bit of a sleuthing note. Unless you change the prompt, the comments in most ChatGPT-created code have one tell: they are in the second person or are at least in conversation with the reader/programmer of the code. Here are some examples from some code I recently had generated:
The first examples come from a JavaScript page for displaying the output of multiple Tracery bots; here are two of the comments:
and
Then, I had ChatGPT make a color changer for Prism.js as an experiment in Code Poetry. See these comments:
So in my VERY preliminary experiments, the comments in the code seem to maintain that ChatGPT conversational tone. No doubt that is a product of the prompts that I write that are often requests: Write me some code that does x... And this could be changed with a System prompt, more detail in the prompts about the style of the comments, or fine tuning.
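For illustration only, the tell looks something like this (a made-up snippet in the spirit of those examples, not ChatGPT's actual output):

```python
import random

def shuffle_palette(colors):
    # Here, we shuffle your list of colors so each run gives you a fresh ordering.
    # Feel free to change this function if you want a different effect!
    shuffled = colors[:]   # we make a copy so your original list stays untouched
    random.shuffle(shuffled)
    return shuffled
```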
But I do think a very naive use of ChatGPT to generate code would reveal similar patterns -- assuming there were someone out there trying to, say, discourage students from using ChatGPT to generate their code. I don't recommend playing that game of cat-and-mouse, though. Too many holes in it. Swiss cheese, in fact.
Has anyone noticed similar patterns?
A lot of these questions seem to hinge on "style." Quite apart from ChatGPT, it is certainly interesting how easy it is to tell two pieces of code apart based on style alone, apart from comments, variable names, etc. It is even easier when style becomes consciously enforced in a project. For example, Linus Torvalds mandates 8-space tabs to discourage nesting, which ends up making the Linux kernel very flat and (arguably) easy to read.[1]

To me, the kind of code ChatGPT puts out is most similar to what you'd find in a tutorial, rather than a finished product. That doesn't necessarily mean that it's worse or less functional, but there's a certain kind of thing that's generally present in any large-scale coding project that I've never seen ChatGPT produce. I'm not entirely sure what to call it -- maybe a casually poetic moment. Fully fluent programmers play with their languages (whether they are aware of it or not), and big projects tend to accumulate little moments of this interspersed throughout. What comes to mind currently is a spot in the very old x86 Linux kernel code where the author uses 12 instructions for a print function, in the process abusing the daa instruction to convert to ASCII hex.[2] (daa is an incredibly obscure instruction that works with Intel's support of binary coded decimal, where you store decimal numbers as if they were hexadecimal numbers -- so nothing to do with its actual use here.) This sort of thing is not always remarked on, since it's not usually essential or necessarily useful for any actual purpose. But you find it all over code written by real humans.

When I asked ChatGPT to give me a similar routine, it did an OK job and used the much more predictable and readable method of adding the number corresponding to ASCII '0' to the register masked by 0x0F, then shifting right for the next 4 bits.[3]
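(For readers who don't speak x86, the "predictable" approach paraphrased in Python rather than assembly runs roughly as follows -- my own sketch, not the routine from the transcript.)

```python
def byte_to_hex(value: int) -> str:
    """Convert one byte to two ASCII hex digits, the low-drama way."""
    digits = []
    for _ in range(2):
        nibble = value & 0x0F                           # mask off the low 4 bits
        if nibble < 10:
            digits.append(chr(nibble + ord('0')))       # 0-9   -> '0'-'9'
        else:
            digits.append(chr(nibble - 10 + ord('A')))  # 10-15 -> 'A'-'F'
        value >>= 4                                     # shift right for the next 4 bits
    return ''.join(reversed(digits))

assert byte_to_hex(0x3F) == "3F"
```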
The thing is, since some of these questions are about the relationship between style and intention in AI code, I don't believe for a second that ChatGPT never ran across a more clever way of writing this routine in its training data. Moreover, I don't believe for a second that ChatGPT autonomously arrived at this as the best way to code. Rather, there is something about this style that is either baked into the objective function when training, or dictated after the fact by one of the vague extra bits that aren't just a completely vanilla LLM. In other words, yes, clearly there is intent here. In part, I think this relates to one of the more general challenges LLMs have: commercial viability might require mimicking styles, but the output has to somehow appear unstyled, completely generic, uncultural, uninteresting. Obviously this has the same issues with code that it has with every other language: ChatGPT never accidentally slips into a dialect where "he walk" is more common than "he walks," and it never slips into a dialect of code where obfuscations, tropes, etc. are more common than "clearly written," functional, and well-commented code. We have to be careful in both cases to recognize that this is not a "less particular" dialect, but in fact is something styled, particular, cultural, interesting, but is pretending to be none of those things.
So just for fun I thought I might try and see if it could create programs in different styles. The results are...not amazing.[4] I mean, it actually did a sort of plausible job at mimicking a lot of styles, which is not the easiest task in the world. But, only sort of plausible. It's kind of like making a musical phrase sound more like Mozart by adding more notes, and then commenting that "Mozart was known for using too many notes" [sorry, gratuitous Amadeus reference]. In particular, at the end I tried to force it to generate a program mimicking a beginner, full of errors and conceptual misunderstandings. Interestingly, this caused it to hallucinate more errors than it was actually willing to create, for example, "return 0; // Incorrectly returning non-zero value". It's not definitive evidence, of course, but it does seem possibly indicative of some sort of guardrail that is fixing up the code after the LLM generates it.
These are interesting questions, but of course it does bear saying that neither intent nor authorship is necessary for something to have meaning, and anything hermeneutic ought to fall within our remit.
[1] https://github.com/torvalds/linux/blob/master/Documentation/process/coding-style.rst
[2] https://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/linux-2.4.git/tree/arch/i386/boot/bootsect.S
[3] https://chat.openai.com/share/065848d9-ac91-496d-a1d8-1a66160d1786
[4] https://chat.openai.com/share/6db8552b-7fe9-4ae5-a1ba-9c54838c713f
@ebuswell very interesting. I agree that LLM models are not natural objects and that they have to obey some business goals -- and weird code is less sellable than classically styled code. The question becomes who the intended buyer is: a freshman? A programmer with one year of working experience? The programmer's boss?
They are surely aware that there are some (tentative) studies on coding style and its relation to good programming, like this one: Exploring the Impact of Code Style in Identifying Good Programmers. So it will be interesting to see whether in the end they will try to mimic different styles, or just identify the "good" style, the one recommended by Torvalds.
@markcmarino if I understood correctly, you are suggesting that we take into account comments and not only code. I have been interested in how coders use different persons in their code comments ("here we are looping..."), because often there is an identification between the coder and some kind of magic entity (the program). In your example, on the contrary, the author of the comments is clearly different from the program itself. There are three subjects: the questioner, the oracle, and (hidden somewhere) the program.
@Temkin -- thanks for that interesting comparison of the chain of prompt-LLM-generation to "flickering signifiers." For those not familiar, that's from Hayles' How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and Informatics in Chapter 2: “Virtual Bodies and Flickering Signifiers”:
Regarding being "unfamiliar with the use of ChatGPT to explain existing code" -- I have created a separate thread with a very short demo of the kind of persona-based prompt engineering for critical code studies that I'm talking about:
So, along with reading code with AI (or LLMs), we can write code with them, whether using a generator through its website or app, or as an extension on popular programming platforms, like the way VS Code integrates Copilot. What can CCS do with computer-written code? What do experiments producing code with LLMs indicate about the future of programming? We have addressed this question of "computer-generated code" over the years in different forms, for example, when popular programming platforms generate code or stub code.
Though some scholars might be concerned that writing code with LLMs disrupts the intentionality of messaging, intentionality was always a slippery notion.
A larger concern might be that the relationship between the programmer and the code changes when they have only prompted the code. Is that all that different from times in the history of computing when one person designed the algorithm or the process and another person had to encode it? If so, how?
Furthermore, what happens when we use an LLM to write code in a language we can't read or aren't fluent in? I've added an example of a time I used an LLM to generate an interactive game using Inform 7. The game was themed around Searle's Chinese Room because of that thought experiment's echoes with our current conundrum. I have launched a discussion of that code as a code critique for more in depth discussion.
LLM Writes The Chinese Room (Code Critique)
During our meetup, @jeremydouglass and I demonstrated a few of the ways to prompt an LLM to perform a Critical Code Studies reading on source code. @ranjodh had the brilliant idea of uploading the contents of my Critical Code Studies book and 10 PRINT. That left Jeremy and me wondering whether the LLM does better or worse with that training data. ChatGPT already seems to have quite a bit of awareness of CCS, perhaps from training on the conversations on our forum. We were also wondering what might make good training content: Code Critique threads, the DHQ special issue, or other content from the CCS Bibliography. Any thoughts?
Human language interfaces to code summarization and generation:
The question here is whether the generated code would reliably reflect the stated functionality it is to perform, or, more importantly, the intended functionality. True understanding and authentic representation rely heavily on context and shared beliefs. While NL interfaces can lead to a valid version of the code and facilitate entry into code generation, they are unlikely to lead to reliable, extendable code usable for sophisticated systems. Thus, they are most useful for artistic code generation, rather than, say, the generation of code for sophisticated banking and aerospace systems. At the same time, art and exploration are fundamental avenues for understanding our world.
Intentional writing and distinguishing AI-generated code from bespoke code:
Is it important to do this, aside from a desire to properly attribute authorship -- and assign blame if something goes wrong? Perhaps we need to articulate the conditions under which it matters whether something is AI-generated or human-generated.
Interesting details in code:
How do we define what is interesting? What mechanism is used to recognize interestingness? Bacon, for example, is an early machine learning/production system model developed by Pat Langley and colleagues in the 1970s named after Roger Bacon [1] [2]. One of the challenges of automatic discovery proved to be knowing when what you have discovered is interesting. Recognizing a pattern of interest in a data set is one thing, finding something of interest that we aren’t looking for is far more difficult.
[1] P. Langley, “Bacon. 1 : A general discovery system,” in Proc. 2nd Biennial Conf. of the Canadian Society for Computational Studies of Intelligence, 1978, 1978, pp. 173–180. Accessed: Feb. 17, 2024. [Online]. Available: https://cir.nii.ac.jp/crid/1571417125831781120
[2] P. Langley, “BACON: A PRODUCTION SYSTEM THAT DISCOVERS EMPIRICAL LAWS,” in International Joint Conference on Artificial Intelligence, 1977. Accessed: Feb. 17, 2024. [Online]. Available: https://www.ijcai.org/Proceedings/77-1/Papers/057.pdf
Explaining code is also an interesting proposition. Doing it well requires first identifying the level of abstraction desired, then being able to generate an explanation at that level -- that is, aimed at a particular context. A response to the question "What is your dissertation topic/code about?" will vary depending upon whether you are speaking with your great-grandmother, a dissertation advisor, or a CCSWG colleague, and how much time you have. The same goes for a sophisticated FORTRAN 77 formatted print statement. Given a set of goals, requirements, and knowledge of how a defense system is expected to work, one can write a program to analyze whether the code meets its objectives and is likely to be reliable, then translate that analysis into natural language. We can do the first part; I'm not sure whether we have tackled the second part -- or need to, given that the appropriate people seem to be able to understand the analysis and report out on it as needed.
The more we understand our motivations and define what we are looking for and how it is we do things, the more we will unlock the ability to converse with computer code.
I did some exercises with the aim of better understanding the possible uses of generative artificial intelligence for (critically) reading code, as a help for someone who isn't able to do so on their own. You can find a full description of my attempts in this thread.
I asked ChatGPT to act as a senior advisor explaining the interesting parts of my PHP poem.
ChatGPT 3.5 was surprisingly able to understand the jokes in the license and the philosophical intentions behind the code (the play among different identities in the same person). It was also able to translate the code from PHP to English, and from a program into a story or a theater play, to explain the "meaning" of the poem (not of the code!) to an 8-year-old child.
Questions arising:
One of the interesting experiments I have been carrying out concerns the use of institutional APIs within digital humanities. These are often quite complicated, and although they are available for use by digital humanists, I wonder how much they are actually usable by humanities scholars. Using ChatGPT, however, it is very easy to get the protocol, a summary of the API, and other information, for example:
Library of Congress API
Library of Congress API description:
Protocol: REST (Representational State Transfer)
Authentication: API key required for access
Data Format: Responses typically in JSON or XML
Endpoints: Endpoints for searching, retrieving metadata, and accessing digital content
Rate Limiting: Requests may be subject to rate limits based on API usage policies
The Library of Congress offers various APIs allowing access to its vast collections, including books, manuscripts, photographs, maps, and more. These APIs enable developers to build applications that search, retrieve, and explore the Library's resources programmatically.
Asking ChatGPT to generate sample code in Python to access the API gives this:
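(A representative sketch of the kind of snippet generated, reconstructed here to match the critique below rather than reproduced verbatim; the search endpoint and api_key parameter follow ChatGPT's own description of the API above rather than the Library of Congress's actual requirements.)

```python
import requests

API_KEY = "YOUR_API_KEY_HERE"                    # API key stored directly in the code
search_endpoint = "https://www.loc.gov/search/"  # hardcoded endpoint

params = {
    "q": "digital humanities",
    "fo": "json",
    "api_key": API_KEY,
}

response = requests.get(search_endpoint, params=params)

if response.status_code == 200:
    data = response.json()
    for item in data.get("results", []):
        print(item.get("title"))
    # Pagination information is checked but never followed up.
    if data.get("pagination", {}).get("next"):
        print("More results available.")
else:
    print("Error:", response.status_code)
```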
ChatGPT can then be asked to critique this code, and will happily do so, seeking to improve its original code fragments. But this was a technical code review:
API Key Handling: Storing the API key directly in the code is not recommended for security reasons. It's better to use environment variables or configuration files to manage sensitive information.
Error Handling: The code checks for a successful response (status code 200), but it doesn't handle other possible HTTP status codes, network errors, or exceptions that may occur during the request. Adding more robust error handling would make the code more resilient.
Pagination Handling: While the code checks for pagination information in the response, it doesn't demonstrate how to handle pagination to retrieve additional pages of results. Adding logic to handle pagination and concatenate results from multiple pages would be beneficial.
Code Structure and Readability: The code could benefit from better organization and comments to improve readability and maintainability. Breaking down the logic into smaller functions and adding descriptive comments would make it easier to understand.
API Endpoint Handling: Hardcoding the API endpoint (search_endpoint) may make the code less flexible. It's better to make this configurable or provide options for different endpoints if the API offers multiple endpoints.
Consistency in Error Reporting: The error reporting is currently printed to the console using print, which may not be the most consistent or user-friendly approach. Consider using Python's logging module for consistent error reporting and logging.
Testing and Validation: The code doesn't include any validation or testing of the response data. It's important to validate the structure and content of the response to ensure that the code can handle various scenarios and edge cases gracefully.
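Applied to the sketch above, those suggestions produce something like the following (again illustrative, not ChatGPT's actual revision):

```python
import logging
import os

import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

BASE_URL = os.environ.get("LOC_API_URL", "https://www.loc.gov/search/")  # configurable endpoint
API_KEY = os.environ.get("LOC_API_KEY")                                  # key kept out of the source


def search_loc(query: str, max_pages: int = 3) -> list:
    """Search the Library of Congress API, following pagination up to max_pages."""
    results = []
    params = {"q": query, "fo": "json"}
    if API_KEY:
        params["api_key"] = API_KEY
    url = BASE_URL
    for _ in range(max_pages):
        try:
            response = requests.get(url, params=params, timeout=30)
            response.raise_for_status()              # surface non-200 responses
        except requests.RequestException as exc:     # network errors, timeouts, bad status codes
            logger.error("Request failed: %s", exc)
            break
        data = response.json()
        results.extend(data.get("results", []))
        next_url = data.get("pagination", {}).get("next")
        if not next_url:
            break
        url, params = next_url, {}                   # follow the link to the next page
    return results


if __name__ == "__main__":
    for item in search_loc("digital humanities"):
        logger.info(item.get("title"))
```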
This seems to me to have a lot of potential for using LLMs like ChatGPT to provide critical code readings and analysis of working with institutional code interfaces. Perhaps performing comparative analysis but also seeking to understand how these APIs are shaped in particular ways to reflect the embedded norms of an institution.
I then asked ChatGPT to make a critical code studies reading of this code:
Agency and Control: The code reflects a form of technological agency, enabling users to access and interact with digital resources provided by the Library of Congress API. However, it also embodies control mechanisms, as the API provider dictates the terms of access, usage, and data availability, shaping users' interactions with cultural heritage materials.
Visibility and Access: The code embodies the digital divide by facilitating access to digital resources for those who have the technical proficiency to interact with APIs and access digital content. This raises questions about equity, as not all individuals or communities have equal access to digital tools, resources, and opportunities for cultural participation and engagement.
Surveillance and Privacy: The code's reliance on API keys and authentication mechanisms underscores issues of surveillance and privacy in digital environments. Users must disclose personal information (such as API keys) to access digital resources, potentially subjecting themselves to monitoring, tracking, and data collection by both the API provider and third parties.
Ownership and Control of Information: The code highlights issues of ownership and control of information in digital spaces. While the Library of Congress API provides access to cultural heritage materials, it also retains ownership and control over the data and the terms under which it can be accessed, shared, and reused. This reflects broader debates about intellectual property rights, open access, and the commodification of cultural artifacts in digital contexts.
Interrogating Technological Determinism: The code exemplifies technological determinism, the idea that technology shapes society and human behavior. However, a critical perspective challenges deterministic assumptions by acknowledging the role of social, political, and cultural factors in shaping technological development, implementation, and impact. It encourages questioning the underlying power structures, biases, and ideologies embedded within technological systems.
Digital Labor and Exploitation: The code implicates issues of digital labor and exploitation, as developers and users invest time, effort, and resources into navigating technological systems and accessing digital resources. This labor is often unpaid or undervalued, contributing to the maintenance and reproduction of digital infrastructures and platforms.
Cultural Memory and Representation: The code intersects with questions of cultural memory and representation by facilitating access to historical and cultural materials preserved by the Library of Congress. However, it also raises questions about whose histories, voices, and perspectives are represented in digital collections and how these representations shape collective memory, identity, and cultural narratives.
Interestingly ChatGPT can be prompted to try to address some of these issues, but often these are relatively limited code changes in the first instance. Nonetheless, there is a real possibility of deepening this kind of analysis on other code sources.
I have also attempted to query the Norwegian Nasjonalbiblioteket API
and then make usable python code to query it
@davidmberry -- in your LOC API example, after you asked your LLM to generate a critique (with headings like "Surveillance and Privacy" and "Digital Labor and Exploitation"), I was particularly struck by your final note:
In some ways I have held these two cases separate:
What is interesting to me about your final gesture is that it suggests there is no particular reason these need to be separate. CCS could also be part of a generative circuit in which the code is rewritten based on the findings.
It also suggests an interesting way to reimagine engaging with code as cultural. For example, rather than a Wikipedia edit-a-thon, you could imagine a code jam in which coders working with LLMs are exploring a particular cultural area (e.g. gender representation in code) across a very broad and diverse code base (e.g. large portions of GitHub in multiple programming languages) and generating proposed pull requests.
I hope I don't go too far off topic. But I would like to focus my comment on two points that I believe are relevant to thinking about the code as a cultural product: intentionality and work. Which I am also interested in linking in this comment.
In terms of intentionality, I think a useful framework for thinking about the transformations it undergoes in the context of artificial intelligence is Latour's concept of mediation (https://sixyears2011.files.wordpress.com/2011/03/latour-bruno-on-technical-mediation.pdf). On the one hand, I find relevance in the idea that intentionality is not something that belongs exclusively to people, nor to objects, but rather that it is realized in the encounter between the two, in the actualization of the possibilities of action proposed by the object. In the case of AI, in this sense, I wonder what the ultimate goal of a technology that generates code is. What I can speculate is that in the end it has to do with making the code completely opaque in favor of a technology that executes actions not prescribed in its initial programming, only through prompts. Returning to Latour, another dimension of his mediation theory that is useful is 'delegation'. What Latour argues is that when a user actualizes the potential of a technical object, the intentionality is dislocated in time and space, since what is put into play is a dispersed and past intentionality that corresponds to the intention of the creator of the tool. In this sense, it is not that our negotiation over intention happens as a human-machine cooperation, but rather that the machine mediates and expresses other, more diffuse intentions: on the one hand, those of the people who wrote the code on which the LLM was trained as a statistical model, but also those of the people who make the tool available with its respective interfaces. I wonder, then, what intentionality are we questioning or trying to interpret when we do a critical code analysis?
On the other hand, in relation to these different intentions that materialize in a technology like ChatGPT, I wonder who owns the labor that is present, which is not only that of machines fitting a statistical model to a series of samples. Here Marx's concept of the "appropriation of living labor" is useful to me. This concept refers to how, within capitalism, the 'social knowledge' or 'general intellect' that initially belongs to the commons, and that is individualized in subjects, is privatized as fixed capital when that knowledge is concretized in the form of machinery. My intuition is that there is a strong parallel between that scenario and our current moment. The main difference is that, while in the 'machine age' living labor was represented mainly by manual skills and practical knowledge, what we see now is a privatization of immaterial labor, which until now was thought to be inseparable from individuals. The parallel is clear when we think of other cultural artifacts, such as the images that DALL-E produces, where our sense of authorship is stronger: the work of thousands of illustrators and artists is privatized through statistical normalization to produce a result which, although not creative in the classical sense, does displace the original creators, carrying out a massive privatization of 'social talent'. In the field of computer code, the creativity and cleverness that we attribute to a good programmer runs the risk of being overshadowed by the idea that code is only instrumental and functional, given that our notion of authorship is already weaker when we talk about creating software.
To respond on the question of intentionality: In my code critique on LLM-generated code style, I touched upon the argument that it has no author, but I'd like to expand a little more on whether this is a problem for interpreting and critically engaging with code as an artefact.
In their book Truth, Fiction, and Literature, Lamarque and Olsen (OUP, 1996) argue that when we read fiction we take a "fictive stance". That is to say, when we read what we take to be fiction, we treat it as fiction and take its propositions to hold within the fiction. Imagine you discover what appears to be a diary in the attic, its provenance completely unknown. You can treat it as a historical artefact, as non-fiction, and in doing so you might judge whether it succeeds as such (e.g. does it accurately record anything?). Or you could treat it as a novel, take the fictive stance towards it, and then you would judge it on completely different grounds (e.g. character, plot, pacing). In either case, you might get a fruitful reading of it without having to know what it was really intended as. In a similar way, a modern reader can read J.W. Dunne's An Experiment With Time (A & C Black, 1927), in which he describes how he came to believe in prophetic dreams, either as a philosophical work or, taking the fictive stance, as an occult thriller. In the manner it was intended, the work isn't very convincing, but as an imaginative work it is quite compelling.
AI-generated code from large language models has no initial author, but in curating or using the code it becomes authored. Imagine someone sticking their hand in a bin and casting out the rubbish onto the ground. The litterer is analogous to the prompt writer who puts in a prompt and hopes for a useful or interesting response. There may be some strategy in picking the right trashcan or the best angle for delving, but the composition of the resulting mess is unplannable. The people who put their rubbish in the bin in the first place didn't author the mess, nor did the factory that produced the bin. However, say the litterer discovered a slurry of crumbs and half-decomposed salad that produced an interesting splatter on the paving slab: the nutritional value of the mixture could be questioned, or an artistic stance could be taken towards the splatter as if it had been intended. And if the litterer took that slurry and ate it, or displayed it in an art gallery, its value as food or art could be judged: the trash would be transformed through human curation into an artefact worth considering (even if, most of the time, we might expect it to be the effluvial waste that it appears to be).
All of which is to say that yes, we can critically examine AI-generated code; we can take a stance towards it, treating it as if it were something intentionally designed, but we should expect the experience to be more like dumpster-diving than fine dining.
@StephanieAugust -- Great question re: "How do we define what is interesting? What mechanism is used to recognize interestingness?" ... "finding something of interest that we aren’t looking for is far more difficult."
In the context of large language models I conceptualize "interesting" as related to AI prompt engineering of personas and how they influence salience in responses. Even though the token 'interesting' is probably overdetermined for the language model (and we could use any synonym to similar effect) the concept has value. Here is a toy example:
So in the simplest and least ambiguous case, "interesting" is related to salience, and defining a persona is crafting the space of self-attention that the transformer uses to search for salient words ('pirate' is closer to 'parrot' than to 'bagel'). This isn't very exciting at first--you don't even need transformers to do these kinds of ngram correlations, and you could use many different approaches (e.g. topic modeling) to find a bunch of words in a "pirate cluster." Still:
As soon as things get even a bit complex, there is the possibility that the output may be surprising--the LLM unprompted suggests 'Venus' for both the relationship of biology to astronomy (the sciences) and use in plant names. This isn't something I necessarily could have anticipated, and it matters that the LLM doesn't just return the result ('Venus') which might seem like an error but instead describes why it thinks it is interesting (i.e. it is salient in the discursive contexts of the sciences and of plant names).
For the AI persona, "interesting" connects a form of attention to a discursive space. By analogy, I can get an LLM to search for "interesting" excerpts from source code by telling it who I am (or who it is): a Marxist, an eco-critic, a games studies scholar, et cetera. For code hermeneutics here is a second small example in code poetry (it still doesn't involve real functionality, just matching keywords and strings).
Here, rather than a Babbage and Lovelace "difference engine", the AI language model is acting as a kind of attention engine: tell it who it is / who you are, and it will tell you what it finds "interesting."
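In rough code, the gesture is something like this (the ask_llm helper and the inline snippet are hypothetical stand-ins for a real chat-completion call and a real artifact):

```python
def ask_llm(system: str, user: str) -> str:
    """Hypothetical stand-in for a real chat-completion call."""
    return "(model reading would appear here)"

SOURCE_CODE = "<?php echo 'hello'; ?>"  # stand-in for whatever code artifact is under discussion

personas = ["a Marxist", "an eco-critic", "a games studies scholar"]

for persona in personas:
    reading = ask_llm(
        system=f"You are {persona} practicing critical code studies.",
        user=f"Quote the three most interesting lines of this code and explain why:\n\n{SOURCE_CODE}",
    )
    print(f"--- {persona} ---\n{reading}\n")
```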
I am going to be brief in this one, but something I have been thinking about is the long history of "AI" as a field, and how what that means has shifted over time.
In 1979, Hubert Dreyfus published "What Computers Can't Do." His criticisms of the foundations of the field are ones that I find still apply today. Pamela McCorduck summarized one of Dreyfus's points as follows:
I find this statement to be just as applicable now as it was in 1979.
The Wikipedia page is a pretty decent introduction: https://en.wikipedia.org/wiki/Hubert_Dreyfus's_views_on_artificial_intelligence
A theme that underlies many posts in this fascinating discussion -- thank you! -- is the issue of who originates and takes care of the information in any AI system.
A disturbing illustration of why this is important, particularly for artists and writers: a student of mine at the School of the Art Institute of Chicago input images of her own work -- sometimes process-oriented, sometimes with conceptual elements -- into an AI system. In return, she got conventional amateur abstracts, conventional figurative work, and a scowling, creepy, hostile-looking audience figure in front of one of her works.
Welcome to the culture wars.
A paper I read ("How AI Fails Us," Siddarth D. et al.) illustrates a point that might be similar to Hubert Dreyfus's point mentioned by @Steve.Klabnik: how major AI companies create "self-reinforcing and uncritically deterministic narratives about intelligence and innovation." A core idea of the paper was that the narratives these AI companies push polarize the future of AI into only two directions:
However, this vision of the future depends on the goals these AI companies currently and ubiquitously promote -- "solving intelligence" and "outperforming humans" -- a development ideology for AI that focuses specifically on human competition and AI autonomy. In addition, both of these directions depend on the concentration of AI control and AI power in a small number of people, namely the technical and executive core of exactly these companies.
@leonardoaranda I think your commentary on labor is insightful, and I believe it could be applied to services like AMT (Amazon Mechanical Turk), where a similar process is at work: this type of immaterial labor is managed and algorithmized, and thus transformed into a programming interface for the "innovative/creative" entrepreneur/programmer (I'm thinking of Irani, L., "Difference and Dependence among Digital Workers"). AMT and AI models similarly appropriate labor, during training (for AI) and during use (both); both transform human processes into software that the private tech company and programmer can use algorithmically, bypassing the crucial components of labor relations, labor management, and labor rights that traditional industry has to acknowledge. But the reality is that these components of labor are not really being removed or simplified; they are "abstracted" in the programming sense. Complex processes are bracketed into simple interfaces, and the mechanisms are made deliberately opaque so that the user cannot even look inside to evaluate and understand them.

In the case of AMT, companies that should realistically be seen as labor/management companies receive venture capital investment as tech companies, while the problematic labor practices are backgrounded by Amazon, ultimately harming the rights of AMT's digital workers. AI's data problems are not new, since web crawling and data scraping have been around for a while, but the issues become prominent now that AI possesses a real capability to replicate this type of labor. And this dilution is not insignificant: I think it is improbable and unrealistic that attribution and credit can be given to every source in a large dataset, though for the fine-tuning of a model on a specific person's style it may become realistically meaningful and significant. So I think this is definitely a real issue that needs to be explored theoretically and technically.