2026 Participants: Martin Bartelmus * David M. Berry * Alan Blackwell * Gregory Bringman * David Cao * Claire Carroll * Sean Cho Ayres * Hunmin Choi * Jongchan Choi * Lyr Colin * Dan Cox * Christina Cuneo * Orla Delaney * Adrian Demleitner * Pierre Depaz * Mehulkumar Desai * Ranjodh Singh Dhaliwal * Koundinya Dhulipalla * Kevin Driscoll * Iain Emsley * Michael Falk * Leonardo Flores * Jordan Freitas * Aide Violeta Fuentes Barron * Erika Fülöp * Tiffany Fung * Sarah Groff Hennigh-Palermo * Gregor Große-Bölting * Dennis Jerz * Joey Jones * Titaÿna Kauffmann * Haley Kinsler * Todd Millstein * Charu Maithani * Judy Malloy * Eon Meridian * Luis Navarro * Collier Nogues * Stefano Penge * Marta Perez-Campos * Arpita Rathod * Abby Rinaldi * Ari Schlesinger * Carly Schnitzler * Arthur Schwarz * Haerin Shin * Jongbeen Song * Harlin/Hayley Steele * Daniel Temkin * Zach Whalen * Zijian Xia * Waliya Yohanna * Zachary Mann
CCSWG 2026 is coordinated by Lyr Colin-Pacheco (USC), Jeremy Douglass (UCSB), and Mark C. Marino (USC). Sponsored by the Humanities and Critical Code Studies Lab (USC), the Transcriptions Lab (UCSB), and the Digital Arts and Humanities Commons (UCSB).

[Code Critique] getCrimeaStatusCookie, Yandex and very large codebases

Yandex Code Critique


Title: Yandex Maps
Author/s: Yandex Corporation
Language/s: TypeScript (JavaScript)
Year/s of development: 2021
Software/hardware requirements (if applicable): Web


Code

/**
 * Возвращает куку, отвечающую за статус Крыма.
 *
 * @see https://st.yandex-team.ru/MAPSUI-720
 */
function getCrimeaStatusCookie(cookies: Record<string, string>): string | undefined {
    if (!cookies.yp) {
        return;
    }
    const values = yandexYCookie.parseYpCookie(cookies.yp);
    return values.cr && values.cr.value;
}

(The Russian doc comment reads, in English: "Returns the cookie responsible for the status of Crimea.")

Context

In 2023, the source code of Yandex, the equivalent of Google in the Russophone internet, was leaked. Given the ties of the Yandex engineers with their western counterparts, and the ties of the Yandex management with the government of the Russian Federation, this is quite a unique corpus, as it inscribes both corporate and governmental power. It is also an incredible challenge to make sense of it.

I attempted to do that in a recently published paper, "createPoliticsResponse: the political computation of state borders in Yandex maps" (edited by @orladelaney9 and @davidmberry). Most of the article traces how Yandex Maps decides which borders to show, to whom, and under which conditions. In this sense, it is a material testimony of what is already assumed but hard to prove at the interface level, and in this case it really shows the specific contribution of CCS to platform studies.

One code snippet that I looked at in the paper is the function above, from the frontend part of Yandex.Maps, which seems to extract a value about Crimea's status from a client cookie. On the one hand, it is quite obvious that a Kremlin-linked Yandex wants to treat one of the most contested geopolitical areas of Europe as an edge case. On the other hand, I have found it particularly hard to show how exactly this is treated, and as which kind of edge case. This is a big limitation of critically studying this function: it only allows us to study the reading of a value, and not its writing, hence telling only half of the story.
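
To make the "reading only" point concrete, here is a minimal sketch that runs the function against a stand-in parser. The real yandexYCookie.parseYpCookie is not part of the snippet, so the stub below is hypothetical: it assumes the yp cookie is a '#'-separated list of "expires.name.value" records, and the cookie value used in the usage note is a placeholder, not a value observed in the codebase.

```typescript
// Hypothetical stand-in for yandexYCookie.parseYpCookie; the real
// implementation is not shown in the post. ASSUMPTION: the yp cookie is a
// '#'-separated list of "expires.name.value" records.
interface YpEntry {
    expires: number;
    value: string;
}

const yandexYCookie = {
    parseYpCookie(raw: string): Record<string, YpEntry | undefined> {
        const entries: Record<string, YpEntry | undefined> = {};
        for (const record of raw.split('#')) {
            const [expires, name, ...rest] = record.split('.');
            if (!name || rest.length === 0) {
                continue; // skip malformed records
            }
            // Re-join so that values containing dots survive the split.
            entries[name] = {expires: Number(expires), value: rest.join('.')};
        }
        return entries;
    }
};

// The function from the post, unchanged.
function getCrimeaStatusCookie(cookies: Record<string, string>): string | undefined {
    if (!cookies.yp) {
        return;
    }
    const values = yandexYCookie.parseYpCookie(cookies.yp);
    return values.cr && values.cr.value;
}
```

Under these assumptions, getCrimeaStatusCookie({}) yields undefined, and getCrimeaStatusCookie({yp: '1999999999.cr.example-value'}) yields 'example-value'. Nothing in the function constrains what that string may be or says who wrote it, which is exactly the half of the story the snippet cannot tell.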

One reason for this is that the Yandex codebase is orders of magnitude larger than the snippets that constitute most of the corpus of CCS: the whole leaked codebase clocks in at upwards of 44 GB, and the maps module at more than 4 GB; both are mostly composed of plaintext files (see the resources section for a repo containing the maps section of Yandex's source code). This shift in quantity seems to me to be a shift in quality, and raises new questions for the methods of CCS, some of which I've sketched out below.


Questions

  • Quite an uncritical start, but where is the value of the crimeaStatus cookie field set? How do we handle variable names changing as they get passed as arguments/references/assignments?

  • How does one go about reading 44 GB of code? The default means of search in textual software (matching patterns of characters) is heavily biased towards a lexical approach, one that looks at the surface of the text, rather than a semantic one. Could tools that focus on the structure of code (e.g. class relationships, function definitions and references, data structuring, argument passing) rather than on its surface help here? If we do CCS on large corpora with only tools that enable such lexical analysis, rather than tools that do static structural analysis, what are we missing?

  • Is it enough to focus on the name of a function (e.g. getCrimeaStatusCookie) as an argument to critique the relationship between a private corporation and the (imperial) policy of a nation-state, without knowing exactly what the function does? In other words, what is the relationship between lexical choices and semantic choices as epistemic building blocks in a critical code study? Is there a critique of the data structuring that is independent of how data structures are named?

  • Thinking of structure, how much can/should CCS draw on existing CS entities and denominations? I'm thinking here of design patterns, best practices, testing strategies, application architectures or language features. Specifically here, what kinds of parts of a CCS grammar could something like middlewares or localizations be (getCrimeaStatusCookie being both of these)?

  • The nature of leaked code seems to always imply a lack. In this case, documentation and specifications are missing, as are all the ML components of Yandex. So how does one investigate incomplete code? How does one account for the part that is lacking, and how can one make extrapolations about it? What kind of forensics is this?
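
The first two questions can be illustrated with a toy sketch contrasting a lexical search with a structural one. Every name and edge below is invented for illustration and none is taken from the Yandex codebase; in practice the edges would have to be extracted by a parser or language server rather than written by hand. The sketch only shows why following a value backwards through renames can surface a writer that a pattern match on "crimea" would never find.

```typescript
// A flow edge says: the value held under `from` is later held under `to`.
// ASSUMPTION: all names and edges are hypothetical, hand-written examples.
type FlowEdge = {from: string; to: string};

const toyEdges: FlowEdge[] = [
    {from: 'regionFlag', to: 'ypFields.cr'},   // hypothetical writer
    {from: 'ypFields.cr', to: 'cookies.yp'},   // serialized into the cookie
    {from: 'cookies.yp', to: 'values.cr'},     // parsed back on read
    {from: 'values.cr', to: 'crimeaStatus'},   // returned by the getter
    {from: 'userLocale', to: 'ypFields.lang'}  // unrelated flow
];

// Lexical search: which names match a character pattern?
function lexicalSearch(edges: FlowEdge[], pattern: RegExp): string[] {
    const names = new Set<string>();
    for (const edge of edges) {
        if (pattern.test(edge.from)) { names.add(edge.from); }
        if (pattern.test(edge.to)) { names.add(edge.to); }
    }
    return Array.from(names).sort();
}

// Structural search: walk edges backwards from a name to every upstream name,
// regardless of what those names look like.
function upstreamOf(edges: FlowEdge[], target: string): string[] {
    const seen = new Set<string>();
    const queue: string[] = [target];
    while (queue.length > 0) {
        const current = queue.shift() as string;
        for (const edge of edges) {
            if (edge.to === current && !seen.has(edge.from)) {
                seen.add(edge.from);
                queue.push(edge.from);
            }
        }
    }
    return Array.from(seen).sort();
}
```

On this toy graph, lexicalSearch(toyEdges, /crimea/i) finds only crimeaStatus, while upstreamOf(toyEdges, 'crimeaStatus') also reaches values.cr, cookies.yp, ypFields.cr and regionFlag. The gap between the two results is a small-scale picture of what a purely lexical reading of a large corpus may miss.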


Resources
