Defactoring Code as a Critical Methodology

jorisvanzundert · February 2024

Hi all, this thread focuses on the article that Matthew Burton (University of Pittsburgh) and I wrote for the DHQ special issue that is still to be published: “Defactoring ‘Pace of Change’”.

Abstract

Our article highlights the increasing importance of code in computational literary analysis and the lack of recognition and visibility code as scientific output currently receives in scholarly publishing. We argue that bespoke code (i.e. purpose built analytical code intended for single or infrequent use) should be considered a fundamental part of scholarly output. As such, a more overt inclusion of code in the scholarly discourse is warranted. An unanswered question is what a proper methodology for this would be.

As an experimental contribution to developing such a methodology we introduce the concept of “defactoring”. We propose defactoring as a technique for critically reading and evaluating code used in humanities research. Defactoring involves closely reading, thoroughly commenting, and potentially reorganizing source code to create a narrative around its function. This technique is used to critically engage with code, peer review code, understand computational processes, and teach computational methods.

We describe a case study where we have analyzed in this way the code associated with a publication in literary studies by Ted Underwood and Jordan Sellers (“The Longue Durée of Literary Prestige”). Based on our case study experience we question the separation between scholarly publications and code, and we advocate breaking down these boundaries to create a more robust scholarly dialogue. Linking code to theoretical exposition can enhance scholarly discourse and invites further exploration of the relationship between literary interpretation and computational methodology.

Finally, we also reflect on the challenges we met in publishing an article that combines theoretical discussion with defactored code, and we highlight the gap between scholarly argument and case-study material that is enforced by current academic publishing platforms. We suggest that there is a need for academic genre conventions for publishing bespoke code and we propose the idea of a notebook-centric scholarly publication that integrates code and interpretation seamlessly.

Questions

I have been wondering again of late (but this is a years-old unease tbh) why scholars and scientists seem so indifferent to the quality of the code and algorithms they routinely apply in research. To me it seems that the technical and methodological quality of code applied in any analytical fashion should be interrogated quite rigorously. Any tiny error in that code may completely invalidate any finding. However, all our quality control processes (peer review, metrics, academic crediting, institutional evaluation – flawed as they may be by themselves) are almost exclusively aimed at the final outcome of research: the publication. One could argue that a thorough discussion of the code applied should be part of any research paper and should therefore have been scrutinized during peer review. But this is almost never the case. Only a facile methodological abstraction is presented on paper, and there is the assumption that this expresses what the code actually does. Arguably in many cases tailor-made algorithms will not do exactly what the author would have us believe. (To substantiate this at least a little bit: a colleague of mine, who is a research software engineer, almost completely refuses to use any Python libraries for statistical analysis on account of her finding them almost all flawed to a lesser or greater extent in mathematical precision.) Many reasons have been put forward why we as researchers do not engage with code in some peer review fashion: lack of technical skills and knowledge, the impossibility of adding yet another infeasible task to the academic process, lack of resources, misplaced trust in perceived impartiality and mathematical correctness of code, assumption that quality control of code is being fully covered by RSEs (Research Software Engineers). All of these explanations are part of the problem.
However, I cannot escape the impression that we mostly hesitate at taking intellectual responsibility because we acknowledge the sheer insurmountable effort involved with organizing and executing code peer review, all the while being rather poignantly aware that we are methodologically falling short. Will code peer review be in a nascent state perpetually, because it presents us with too many inconvenient truths?

ebuswell · February 2024

What would you say the difference is between "peer review" and "open source"? Proponents of the latter often say that the key advantage of open source is that it has been scrutinized and critiqued by many (admittedly largely nonacademic) people, which sounds a lot like peer review. There's also a certain kind of article that goes something like "x library/language/the linux kernel has problems with y," sometimes adding "we implemented y and here are the results." E.g. this article addresses concerns about python library accuracy: https://doi.org/10.1002/widm.1531

This article model seems like it works ok for compensating review labor, but it doesn't do anything for concerns about compensating the coding labor itself. Without that, the code is always going to be minimized in relation to the article—except by people like me (and probably a lot of us) who don't know what's good for them.

StephanieAugust · February 2024

“Bespoke code” is analogous to a short story or other literary work. What makes a literary work interesting? Elegant structure or language? Its message? The metaphor it represents? Parts that can be purloined and repurposed? Scholarly publication venues each have their own mission. Defining the mission of critical code studies in terms and thoughts familiar to a lay person, and perhaps more challenging, to a software engineer accustomed to dealing with and dwelling in concrete absolutes, would open up computational literary analysis in a way that could lead to greater scholarly discourse on the topic. There is a lot of potential here if we can construct a window that reveals the purpose of CCS to outsiders.

anthony_hay · February 2024

Hi @jorisvanzundert. Some unstructured thoughts about defactoring code. I'm a computer programmer, not an academic. Like many people I'm interested in truth and honesty.

You mentioned thoroughly commenting and reorganizing source code to create a narrative around its function. This reminded me of Knuth's Literate Programming:

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reïnforce each other.

Writing code and determining that it is correct can be hard; making it comprehensible adds another layer of difficulty. As a programmer my main concern was in making the code correct and maintainable (understandable) and sometimes fast or small enough. Although style was important to me, because it affects understandability, exposition and excellence of style was not my main concern because I was not attempting to communicate ideas, I was only interested in making something that worked and was useful.

It's becoming more common for scientists to publish their code and data and I hope this will become obligatory. But it seems that often published results are not reproducible. This perhaps supports your call for defactoring in academic publication.

Stefano · February 2024

Probably a necessary step towards this kind of analysis should be something similar to DOI: there should be a direct, permanent URI that allows for retrieving of single chunks of code, without the need of inserting all the code inside the publication, which could be difficult or impossible for many reasons (mainly: dimension, inclusion of libraries, copyright infringement, and so on).
This goal is what the Software Heritage project is going to achieve in some way.
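
To make the idea of a permanent identifier for a chunk of code concrete, here is a minimal sketch. It assumes the Software Heritage identifier (SWHID) scheme for file contents, which, as far as I understand it, reuses the Git blob hash and allows a `lines=` qualifier for a fragment; the function names `git_blob_sha1` and `content_identifier` are hypothetical and only for illustration. Check the SWHID specification before relying on the exact syntax.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the Git blob hash of `content`.

    Git hashes a header ("blob", a space, the size in bytes, a NUL byte)
    followed by the raw bytes of the file.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def content_identifier(content: bytes, first_line=None, last_line=None) -> str:
    """Build a SWHID-style identifier for a file, optionally narrowed to lines.

    Assumption: the 'swh:1:cnt:' prefix and ';lines=' qualifier follow the
    SWHID scheme; this sketch does not contact any archive.
    """
    identifier = "swh:1:cnt:" + git_blob_sha1(content)
    if first_line is not None:
        line_range = str(first_line)
        if last_line is not None:
            line_range += "-" + str(last_line)
        identifier += ";lines=" + line_range
    return identifier


if __name__ == "__main__":
    source = b"def mean(xs):\n    return sum(xs) / len(xs)\n"
    # Identifier for the whole file, and for its second line only.
    print(content_identifier(source))
    print(content_identifier(source, 2))
```

Because the identifier is derived from the bytes themselves, a citation built this way changes whenever the code changes, which is the property a publication would need in order to point at exactly the code that was run.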

ranjodh · February 2024

Fascinating! I wonder if one model to reach for is a variation of the open data registration that sometimes happens in sciences, where a peer-reviewed publication can sometimes come (depending on the publication venue) with an attachment/appendix that has the 'raw' data. This is not the same thing (and one could argue stuff like the post45 data collective is already taking steps in the raw data direction), but it could be an argument to borrow for code critique. (Would also answer some of the valid questions raised by Nan Da in her influential CI article against CLS.)

dancox · February 2024

We argue that bespoke code (i.e. purpose built analytical code intended for single or infrequent use) should be considered a fundamental part of scholarly output.

I absolutely agree! Much of my own contributions to multiple communities exists solely as code or as part of extended documentation. An ongoing problem I have as an early-career scholar is making this work visible in different ways. Like you mention, @jorisvanzundert, this is a problem for many within and on the edge of DH projects and publications.

However, all our quality control processes (peer review, metrics, academic crediting, institutional evaluation – flawed as they may be by themselves) are almost exclusively aimed at the final outcome of research: the publication.

I like the issue @anthony_hay raises of a lack of reproducibility; the issue for many programmers, and I include myself in this, is often of "making something that worked and was useful." However, at the same time, much of the research and consulting work I do is on long-term projects. Paired with the lack of reproducibility is a sister issue of maintainability.

Once the publication is out, many people do not continue to maintain their code or work to fix any security issues. Which, on the one hand, I understand: I spend several hours a week just keeping up with new versions of the dependencies of a handful of software projects and what changes I need to make to my own code or testing processes as a result. It's a great deal of labor. That written, many projects exist as a proof-of-concept or a hobby exploration of an idea. Within the interactive digital storytelling field, for example, there are hundreds of new tools created to tell stories with very, very few surviving beyond a couple years. (For those interested in this problem, I have a chapter on it.)

I like the idea of "bespoke" code, but I worry, as someone who studies old code and software projects, of the ongoing issues highlighted as a companion to this also mentioned for DH projects:

...lack of technical skills and knowledge, the impossibility of adding yet another infeasible task to the academic process, lack of resources, misplaced trust in perceived impartiality and mathematical correctness of code, assumption that quality control of code is being fully covered by RSEs (Research Software Engineers).

On the topic of code review, defactoring presents an intriguing promise of helping to explain the code for a particular problem. A central worry of mine, and a topic I often stress to my programmer students, is in the creation of testing frameworks. This is adding more labor to programmers, it absolutely is. However, something I remind my students of is how testing frameworks can serve as a demonstration of working code in a better way than a link to a GitHub project can. Showing each part working through a series of tests can help highlight the reasoning behind the code, and it can also expose documentation that claims the code can do things it might not do in practice or in certain contexts.

In an ideal world, the bespoke code could be defactored and include its tests and documentation, demonstrating not only that it works but clearly explaining why and how to a potential lay audience. It's asking for more labor, but as many people have commented before me, being able to point at sections or clearly articulate reasoning behind code chunks is increasingly vital for bringing attention to code as scholarly output able to be recognized by institutions and peers as major contributions to projects and for possible promotion.

jeremydouglass · February 2024

@jorisvanzundert I'm thinking here about your and Matthew Burton's imminently forthcoming article and the way that it pushes at the boundaries of what is possible in current critical code studies publishing about code (e.g. sharing a large notebook environment through an XML-based scholarly publication). The available forms for code distribution are shaped in part by institutions, and to reshape the practices we need to incrementally change the institutions.

Part of the issue for code peer review relates to portable reproducibility, and part of reproducibility relates to a software and hardware stack (to use the Platform Studies term, and gesture at Benjamin Bratton's larger concept). So, we can distribute the code or post it to GitHub and license it as open source, and we know that code runs, but we may also know that the code only runs on one specific machine on Earth. That is, our research code may only run on a 2014 macOS version tied to specific scarce old hardware, and/or only runs on one such particular machine with a particular slurry of specifically versioned external libraries and tools whose installation process was never specified and is now lost to memory, and/or only runs depending on the presence of specific data which we cannot easily redistribute alongside the code in total due to copyright law or IRB restrictions or a cursed monkey paw, and/or et cetera.

I don't mean that we should not normalize the distribution of code--I absolutely agree that we should normalize it, even before we solve further problems or even if we could never solve further problems. I'm just noting that the next steps of making that distributed code part of scholarly rigor--like getting others to examine the code in review and perhaps even having others run it for reproducibility--is sometimes a huge leap from the reality of internal Rube Goldberg research code. This then is one concrete distinction regarding @ebuswell's question above about the "difference is between 'peer review' and 'open source'": code may be posted publicly yet only ever executable internally, and the inability to execute can put hard limits on what we mean by "review". Emulation, virtual machines and containers have given us one way of tackling such reproducibility problems, but that increases the size of the object to be archived and peer-reviewed: it is never the code, always the code-in-a-stack.
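
One small, partial step toward documenting the "code-in-a-stack" is to record the stack at the moment the analysis runs. The following is a minimal sketch under stated assumptions: it only captures the Python interpreter, the operating system, and the installed Python packages, and the function name `environment_manifest` is hypothetical. It does not capture system libraries, hardware, or data, so it complements containers rather than replacing them.

```python
import json
import platform
import sys
from importlib import metadata


def environment_manifest() -> dict:
    """Collect interpreter, OS, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip damaged installs that report no name
            packages[name] = dist.version
    return {
        "python": sys.version,
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be saved next to the results.
    print(json.dumps(environment_manifest(), indent=2))
```

Saving this output alongside published results at least tells a later reviewer which versions were present, even when the original machine can no longer be reconstructed.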

Steve.Klabnik · February 2024

A further problem here is that reproducibility is not trivial, even for professional software developers. Many languages and libraries do not make it easy to exactly reproduce a particular build of software, let alone its output. I both understand the clear crisis here, as the bar I'm speaking of is even higher, but also can't entirely blame folks when things being truly reproducible is not always trivial.

That being said, inclusion of source code and a collection of the necessary data feels like the bare minimum.

jorisvanzundert · March 2024

@ebuswell Thx for that thought. And apologies for returning late, I was at a conference, and well.. conferences keep one busy :-) We treat this matter a little more in depth in the article, but in essence open source is often a good guarantee for the technical soundness of code. If, and only if I would add, the code is used by many. In the case of bespoke code for a specific analytical task that is performed only once, the code is arguably only seen by the coder and/or researcher creating it. In the case of "Pace of Change" I doubt that besides Ted, Jordan, Matt and me anybody has looked at the code. Thus in such cases a peer review process for code could well be useful. But there is another reason why I would suggest that code peer review could be useful, maybe even essential. Where open source and test first development will often guarantee that code functions correctly in a technical sense, it does not (usually in any case) evaluate code in a methodological sense. We wanted to test drive code peer review as a means to evaluate if what Ted wrote in the eventual article methodologically matched with what we found in the code (it did btw, kudos to Ted and Jordan). What we think is pivotal here, is that there is no sort of "hard link" between the methodology that is expressed in the code and the methodology that is expressed in the writing. Which convinced us that code peer review might have to be part of a scientific quality assurance process.

jorisvanzundert · March 2024

@StephanieAugust: Absolutely! I personally think that the strong boundary between (creative) writing and programming has been seriously overplayed by the IT industry. I am going to be overgeneralizing in what follows, but for the sake of argument… I tend to think professional software development as an (industry) practice has all but colonized code as a language of expression. According to the myths of “rigorous programming” the only decent code is a type of hyper functional super modularized unit tested highly scalable software. But why would we accept only that form of coding as valid and allowable? Code that intentionally utilizes absurd functions or sacrifices speed and performance in the interest of an interesting style or affordance can represent great value, and can be highly creative expressions (see: code work).

jorisvanzundert · March 2024

@anthony_hay Great comment! And yes, actually Donald Knuth's thoughts on literate programming were a great inspiration for our thinking during the research. I certainly appreciate the tension between readability, maintainability, performance, scalability, and correctness of code. I regularly write custom code for my own research, and I often have witnessed that upping one of those aspects will have detrimental effects on one of the others. It is a hard job indeed to find the sweet spot for all those aspects. I can only guess how big a part of the reproducibility crisis should be attributed to bad code, but I tend to think it is much bigger than we dare imagine. Hence my advocacy for code peer review indeed.

jorisvanzundert · March 2024

@Stefano A great suggestion. Part of improving code consistency and reproducibility is indeed to make it ever easier to create reusable parts of code. Having constraints like you suggest (in the form of URIs so that we know exactly what code is used) and others that might guard soundness of code, could certainly be useful. On the other hand I tend to think that the affordances of rearrangement and composition of such “code legos” would reintroduce a lot of potential for erroneous code. I guess we need some sort of a combination of quality assured code objects and processes?

jorisvanzundert · March 2024

@ranjodh I would be in favor of having to publish code through open code repositories as a requirement for publication, most certainly. Ideally researchers would also provide a virtualized environment (e.g. via Docker) to show that the code actually produces what is suggested. Of course, like sensitive data there will also be sensitive code, so maybe not everything can be open for good ethical reasons, but we can strive in any case. The technology to support this is readily available, so much is certain. Obviously this would also add to the workload and responsibilities of researchers and programmers, but that cannot imho be an excuse to relegate publishing code and warranting reproducibility to some afterthought.

jorisvanzundert · March 2024

@dancox Your situation is so tremendously familiar to me! I am currently not deeply embedded in research software engineering, but yes: I have witnessed all that you report as well. You point to one of the nastiest core problems in mitigating our suboptimal processes. There are good strategies, methods, and tools to improve security, maintainability, trustworthiness, etc. But implementing all these strategies in many cases will triple maybe even quadruple work load. Figuring out how we build an ecosystem, an academic infrastructure, and code publication processes that balance all the quality assurance aspects with time and resource feasibility, is a seriously daunting and complicated task.

From experience in any case I can hand you one tip (although I am pretty sure from what you write that you will have experienced this yourself): test-first development is worth the effort. It takes programmers an initial investment that they usually don't like, but once they for the first time catch a critical defect before instead of after a new version is released, they will never go back. I firmly believe that test-first is one of the easiest “fixes” we have to improve the trustworthiness of our methods.
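
A minimal sketch of the test-first rhythm, assuming a made-up helper called `relative_frequency` (it is not taken from the "Pace of Change" code). In test-first order the three test functions are written before the function body exists, and the body is then filled in until they pass.

```python
def relative_frequency(tokens, word):
    """Share of `tokens` equal to `word`, ignoring case; 0.0 for an empty text."""
    if not tokens:
        return 0.0
    target = word.lower()
    matches = sum(1 for token in tokens if token.lower() == target)
    return matches / len(tokens)


# Each test states one methodological decision in executable form.
def test_counts_case_insensitively():
    assert relative_frequency(["The", "cat", "the", "mat"], "the") == 0.5


def test_empty_text_is_zero_not_an_error():
    assert relative_frequency([], "the") == 0.0


def test_absent_word_is_zero():
    assert relative_frequency(["a", "b"], "c") == 0.0
```

The tests can be collected by a runner such as pytest or simply called directly. Their names double as a short, checkable description of choices (case folding, handling of empty texts) that a prose methods section might leave implicit.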

Lastly, yes the academic credit system is still very much in favor of article publication, much to the disadvantage of those whose indispensable contributions are of any other form. Very slowly though, very slowly indeed, there seems to be some movement to remedy this. At least in my (Dutch) context. But we need to keep reminding academic management how important it is that we get this system that is now rigged against so much important work in a fair state.

jorisvanzundert · March 2024

@jeremydouglass Agree, especially regarding the Rube Goldberg code bases.

jorisvanzundert · March 2024

@all: I apologize that my answers and reactions have been rather more asynchronous than intended. Possibly quite naively I assumed that I would be able to keep an eye on the blog while attending an intense conference. I do hope my reactions are of some use to you in any case. Thanks to all of you big time for what is a very worthwhile exchange of thoughts and ideas most certainly from my perspective!

annatito · March 2024

We have talked a lot here about research reproducibility, which is important, but we haven't really touched on research recoverability. In creative and non-creative research code alike, seeing how the code came to be in the state it is in at the point of research release is important. It provides a layer of context to the research, i.e. the challenges faced (particularly technical ones, e.g. floating point accuracy inconsistencies between libraries or platforms) and the concessions made to work with them. This is where using things like source control as part of your research methodology becomes critical, as it becomes a 'time of change' code documentation system that allows future researchers to follow and recover the process that went into the development of the research code. For me, source control as a research recoverability tool has been grossly underestimated. Rilla Khaled, Jonathan Lessard and Pippin Barr, in their paper Documenting Trajectories in Design Space: a Methodology for Applied Game Design Research, have discussed using source control as a tool for recoverability in creative research. Their proposed solution does have its limitations, particularly in the regularity of check ins, but it really develops the way we can use source control as part of the methodology of documenting research software development.
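
A small illustration of the kind of floating point inconsistency mentioned above, as a sketch rather than a claim about any particular library. The helper `naive_sum` is hypothetical; `math.fsum` is part of the Python standard library. Both add the same ten values, yet they need not agree to the last digit, which is the sort of concession a commit history can document.

```python
import math


def naive_sum(values):
    """Add floats left to right, letting rounding error accumulate."""
    total = 0.0
    for value in values:
        total += value
    return total


values = [0.1] * 10

naive_total = naive_sum(values)
careful_total = math.fsum(values)  # tracks the low-order bits lost along the way

if __name__ == "__main__":
    print(naive_total, careful_total)
    # The two totals differ in the last digit but are equal within tolerance.
    print(naive_total == careful_total)
    print(math.isclose(naive_total, careful_total, rel_tol=1e-12))
```

To my knowledge, even the built-in `sum` changed its float summation strategy in Python 3.12, so the same script can give slightly different totals depending on the interpreter version. Comparisons with an explicit tolerance, as in the last line, are one common way to work with that.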

jorisvanzundert · March 2024

@annatito That is a very good point there, about recoverability. It is an important aspect, which Matt and I have mused a tiny bit about in the imminent article, but not much and not in great depth. For us it was not the code check ins that drew our attention, but what happens to recoverability after all check ins have been made, and the code is released. In bespoke research code we saw that there is a kind of evolution of description of the code, or rather maybe a devolution. Where the code itself and the traces of its development through repository insights are probably the closest possible description of methodology for the analytics, we see that each following description loses concreteness and becomes more abstract and remote. In the case of Pace of Change there was a blogpost, a pre-publication, and a final article. We found that each in turn lost a lot of detail about the concrete operations of the software. Not because Underwood and Sellers wanted to increasingly provide more abstract description, but because the journal editors wanted them to be sparse with technical detail, surmising it might just distract or confuse the intended audience. What this does for recoverability and understanding the code, I shall leave as an exercise for the reader, I guess.

A point I'd like to add about check in history as a tool for code recoverability is that this will have its limitations indeed. Although code check ins form at least a concrete and quite often lasting trace, I would argue that commit messages constitute a literary genre by themselves, with as many styles and a poetics as there are programmers. A similar observation holds for the motivations of check ins, I think. I suspect that more than one check in history will therefore suggest a development path quite different actually to what the code in reality went through. Research into recoverability will necessarily, I guess, always also involve studying issue trackers painstakingly, as well as post hoc interviews to get a good sense of what really went on during development. But yes, at least we should appreciate most certainly the trace check ins constitute. Just critically, as we do everything :-)
