2024 Participants: Hannah Ackermans * Sara Alsherif * Leonardo Aranda * Brian Arechiga * Jonathan Armoza * Stephanie E. August * Martin Bartelmus * Patsy Baudoin * Liat Berdugo * David Berry * Jason Boyd * Kevin Brock * Evan Buswell * Claire Carroll * John Cayley * Slavica Ceperkovic * Edmond Chang * Sarah Ciston * Lyr Colin * Daniel Cox * Christina Cuneo * Orla Delaney * Pierre Depaz * Ranjodh Singh Dhaliwal * Koundinya Dhulipalla * Samuel DiBella * Craig Dietrich * Quinn Dombrowski * Kevin Driscoll * Lai-Tze Fan * Max Feinstein * Meredith Finkelstein * Leonardo Flores * Cyril Focht * Gwen Foo * Federica Frabetti * Jordan Freitas * Erika FülöP * Sam Goree * Gulsen Guler * Anthony Hay * SHAWNÉ MICHAELAIN HOLLOWAY * Brendan Howell * Minh Hua * Amira Jarmakani * Dennis Jerz * Joey Jones * Ted Kafala * Titaÿna Kauffmann-Will * Darius Kazemi * andrea kim * Joey King * Ryan Leach * cynthia li * Judy Malloy * Zachary Mann * Marian Mazzone * Chris McGuinness * Yasemin Melek * Pablo Miranda Carranza * Jarah Moesch * Matt Nish-Lapidus * Yoehan Oh * Steven Oscherwitz * Stefano Penge * Marta Pérez-Campos * Jan-Christian Petersen * gripp prime * Rita Raley * Nicholas Raphael * Arpita Rathod * Amit Ray * Thorsten Ries * Abby Rinaldi * Mark Sample * Valérie Schafer * Carly Schnitzler * Arthur Schwarz * Lyle Skains * Rory Solomon * Winnie Soon * Harlin/Hayley Steele * Marylyn Tan * Daniel Temkin * Murielle Sandra Tiako Djomatchoua * Anna Tito * Introna Tommie * Fereshteh Toosi * Paige Treebridge * Lee Tusman * Joris J.van Zundert * Annette Vee * Dan Verständig * Yohanna Waliya * Shu Wan * Peggy WEIL * Jacque Wernimont * Katherine Yang * Zach Whalen * Elea Zhong * TengChao Zhou
CCSWG 2024 is coordinated by Lyr Colin (USC), Andrea Kim (USC), Elea Zhong (USC), Zachary Mann (USC), Jeremy Douglass (UCSB), and Mark C. Marino (USC). Sponsored by the Humanities and Critical Code Studies Lab (USC), and the Digital Arts and Humanities Commons (UCSB).

Week 2: Indigenous Programming (Main thread)

Indigenous Programming

by Jon Corbett, Outi Laiti, Jason Edward Lewis, Daniel Temkin

Our group will discuss Indigenous Programming; that is, code and programming languages built on spoken and written Indigenous languages.

With the wider support for Unicode, there has been an uptick in languages that challenge English's dominance. The most widely remarked-upon in the academic community is قلب by Ramsey Nasser, an Arabic LISP (also discussed in the 2014 working group). It showed not just that such a language is possible, but through the difficulties in making it functional, illustrated the Western biases of tools used by programmers. Another example is Yorlang, a Yoruba-based programming language by Karounwi Anu, a Nigerian developer. He uses English with his clients, but Yoruba with his friends, and wanted to be able to write code in the language they consider their own.

Outi Laiti's thesis on Ethnoprogramming gives insight into these works. She shows how programming's hardware and software levels rest on a terminology level, which itself rests on a cultural level. People who are excluded from the cultural level may find the other levels more challenging and be discouraged from programming. Alternately, they may be forced to embrace an English-based programming language, thus reinforcing a colonial language’s dominance locally. As Laiti says, "If we want to see a new generation of computer programmers who blur the borders of language, gender and culture, the ethnic side of computing needs to be researched and discussed."

However, what sets Indigenous languages apart can be difficult to define. Even the term Indigenous itself is debated. In her thesis, Laiti notes that the term has been criticized for defining people by their connection to colonial history. However, she provides an outline that might be helpful: "Indigenous people have their own language. They have a small population inside a dominant culture of the country. They still practice their cultural traditions and at last, some of them live in a territory that is, or used to be, theirs. They identify themselves as Indigenous people." Part of the issue arises from Western concepts that treat all Indigeneity as analogous, as if being Indigenous merely meant being non-European or non-Western regardless of geographic locale. But to say, for example, that Indigenous people in Papua New Guinea share the same or similar political and cultural concerns as the Inuit people would be false. There is, however, a commonality that could be seen as a Pan-Indigenous lens: the socio-political and cultural perspectives of Indigenous people collectively contest the power of assimilationist nation-states and strongly advocate for community sovereignty and autonomy.

Beyond the question of bringing these languages into the text of code is the question of how Indigenous culture might shape programming language design at a deeper level. Jon Corbett’s Cree# language, which we will look at more deeply in a code critique thread, serves as an example. In Cree#, each program begins with smudging, a command that mirrors the ceremonial cleansing practice common to many North American Native people. This practice in life is intended to clear the mind, ears, eyes, mouth, heart, body and spirit. In his digital incorporation of this ceremony, the smudge command clears the buffer, resets the virtual machine to an initial state, and prepares it for a new activity. This connects the computer’s activities to living practices so that, by extension, computing becomes braided and harmonious with everyday life.

We invite you to join us in exploring these and other issues in Indigenous computation, with these questions in mind:

  • With the advancements in language technologies (unicode, etc), why is 90% of all computing done in English still? Is this changing? Why can’t we write code in our ancestral languages?

  • What is uniquely at stake for Indigenous communities vs. other communities that happen to speak languages other than English?

  • How do we get code to reflect Indigenous thinking beyond just using relevant keywords? How can we bring the cultural logic of the community into the language?

Comments

  • Thank you! It's so great to learn about the critical complexities and generative work emerging from this field of research and practice.

  • @Temkin said:

    With the advancements in language technologies (unicode, etc), why is 90% of all computing done in English still? Is this changing? Why can’t we write code in our ancestral languages?

    This seems to be a relic of English being the expected lingua franca. We write code in English because "everyone" understands it. Where it becomes hairy is that not everyone understands it and, as I just brought up on @joncorbett 's thread, the language we write in influences the way we think. For example, I (and many other bilingual people) can tell you that who I am in English is different than who I am in Spanish. By making everything in English, it seems like stifling creativity as well as the ability to bring your background into your work, as I am in the camp that culture and language directly reflect and encourage each other. I'd love to hear your thoughts on this and whether or not you completely disagree! I want to encourage coding in other languages because it will be beneficial for everyone who uses code, but don't know how to do that. If you have any tips, I'd be grateful for those as well.

  • This is perhaps a poor analogy, but I was thinking about languages in outer space recently. The French director Claire Denis, when she was writing the film High Life (2018), her first English-language film, said "I had a screenplay which was naturally in English, because the story takes place in space and, I don't know why, but for me, people speak English – or Russian or Chinese – but definitely not French in space."

    Coding is not space travel, but at some level they are both about futurity, right? When Denis is asking which languages seem correct in space, she is closing off many languages from the possibility of sharing that future. Of course French, like Russian, English, and Chinese, is an imperial language. Other languages are in a far more precarious position when it comes to futures, whether in a science-fiction film or a Silicon Valley startup. That is what the question of "stakes" calls to my mind.

  • @KalilaShapiro said:

    @Temkin said:

    With the advancements in language technologies (unicode, etc), why is 90% of all computing done in English still? Is this changing? Why can’t we write code in our ancestral languages?

    This seems to be a relic of English being the expected lingua franca. We write code in English because "everyone" understands it.

    This is one way to see this. However, when I studied formal programming, my code was a combination of three languages: the programming language with the reserved English words and libraries, and then Finnish or Northern Sámi in variables or comments. The end-product, the actual code, does not represent any language but a multilingual combination of known words and expressions. If you don't know for example Northern Sámi, in my code you see programming language, English and symbols that could form words but they do not have any meaning to you.

    The purpose of the code defines a lot. If I'm programming just for me, I can choose the language combination. And this is usually the case when kids are learning programming in elementary schools; they are not producing code for the masses. Programming in this kind of framework can be a dialectic process between the programmer and the machinery, where the cultural background of the programmer becomes visible.

    And what if we think of programming as a tool that gets things done? If someone asks me to build a house, does it actually matter which hammer I use? In the house 1.0 version it does not matter. But if someone wants to make updates or improvements to my house 1.0, they probably want to know my tools. They can see that, oh, here she used an apple-kind of screwdriver that does not fit any commonly known standard, and where can we find that tool? So common politeness and expectations (our own or those of the wider audience) guide our choices. Awareness of our choices helps us recognize these situations. It is not necessary that English is the default option, we just work that way.

  • edited January 2020

    It would be interesting to consider the ways in which not just the words that are used (GOTO, For, While, If) but also the syntactical language construction contains embedded assumptions about a certain grammatical standard or practice (e.g. using the English constructions of conditions).

    So for example "Hello, World!" in:

    Jelly ( https://github.com/DennisMitchell/jellylanguage/wiki/Tutorial ):

    3a;»“3aė;œ»

    ><>

    | v~" H e l l o , W o r l d ! "
    ~o<< ;!!!? l

    Brainfuck

    - - < - < < + [ + [ < + > - - - > - > - > - < < < ] > ] < < - - . < + + + + + + . < < - . . < < . < + . > > . > > . < < < . + + + . > > . > > - . < < < + .

    Piet

    https://i.stack.imgur.com/TnaHd.png

    Help, WarDoq!

    H

    And then in a number of different languages:

    易语言 - compiler-independent Chinese programming language

    公开 启动类
    {
        公开 静态 启动()
        {
            控制台.输出("你好,世界!");
        }
    }

    Linotte - Linotte is a French programming language

    BonjourLeMonde:
    début
    affiche "Bonjour le monde !"

    Rapira - Rapira is a Russian procedural programming language initially used to teach computer programming in USSR schools

    ПРОЦ СТАРТ()
    ВЫВОД: 'Привет, мир!!!'
    КОН ПРОЦ

    Shaili Prathmik - Shaili Prathmik is an educational programming language that is part of the Hindawi Programming System

    10 टिप्पणी Hello world in Hindi BASIC
    20 लिखो ( "नमस्ते दुनिया" )
    30 इति

    Aheui - Aheui is a Korean-based esoteric programming language. While it is largely impractical, it is the first esolang to be designed using the Korean alphabet, Hangul.

    밤밣따빠밣밟따뿌
    빠맣파빨받밤뚜뭏
    돋밬탕빠맣붏두붇
    볻뫃박발뚷투뭏붖
    뫃도뫃희멓뭏뭏붘
    뫃봌토범더벌뿌뚜
    뽑뽀멓멓더벓뻐뚠
    뽀덩벐멓뻐덕더벅

    I also think there is an interesting line of thought to pursue around the distinction between the language of keywords (e.g. English) and the underlying thought model, e.g. instrumental relationships between subject-objects. So is this embedded within the computer language, or does it inherit that relation from the spoken/written language? Perhaps it is an imposition on the symbolic representation regardless of language due to the underlying mechanical organisation of the computer? Hence the language of implementation (e.g. Piet above) is merely a surface manifestation of a deeper way of thinking embedded in the underlying hardware and fundamental operating software?

  • I also think that the problem can have multiple layers of complexity. For instance, here in Colombia, we had the colonizing process of Spain, in which almost all the indigenous groups were lost along with their languages and traditions. But nowadays, on the technical front, the hegemonic language is English, so we also need to override our Spanish language.
    Maybe this can have a parallel with the concept of internal colonialism, the process in which a country that has been a colony, as in our case, reproduces the same colonial patterns, excluding the most vulnerable groups like the indigenous. Perhaps, in programming languages, we are replicating a similar process, adopting English as our primary language and segregating the rest.
    There is also the perception of time and progress, something that creates new difficulties, for example, when indigenous groups learn how to use software tools for editing videos. In this case, our models of reality are challenged, and the sequential nature of our programming languages may create another great barrier.

  • edited January 2020

    @outilaiti said:
    Awareness of our choices helps to recognize these situations. It is not necessary that English is the default option, we just work that way.

    Yes! This is a much more eloquent and clear way of stating what I was trying to get across.

    @Zach_Mann I like your analogy :smile:

  • @KalilaShapiro said: We write code in English because "everyone" understands it. Where it becomes hairy is that not everyone understands it and, as I just brought up on @joncorbett 's thread, the language we write in influences the way we think.

    I agree with this, Kalila, specifically where you say "not everyone understands it". I very much wonder what the experience is for someone who is learning to program and does not already possess any working knowledge of English. Recalling my own experiences learning to code many years ago, the coding part was already hard enough on its own and I'm a native English speaker. However, also being bilingual, I know that in LatAm, for example, there are not a lot of developer resources in Spanish and Brazilian Portuguese; they're mostly in English. Meetups and organizations like Platzi are stepping up to fill in the content gap that developers in this region need in their native language (albeit, not an indigenous one).

    I'd love to hear from developers who did not speak English as a first language, and/or who perhaps first worked with English via code. Given how code is English-based, how has that shaped their understanding of coding? What is "missing" in code or not easily addressed by existing code paradigms in any particular programming language? (For example: @joncorbett's Cree# has a smudge function that can be included in the beginning or end of a program to "reset"; it's culturally and programmatically significant.) What else could code do if we extended the mental models of coding beyond those more readily accessible to English speakers?

  • @DavidBerry I am interested in similar things, i.e. the instrumental relationship between subject and object. What is the logic of English, and how much does that logic/grammar inform Boolean logic?
    What logics do other languages possess? This is sort of what @joncorbett is working on. I also remember someone writing a language in Tagalog a while back: https://github.com/jjuliano/bato

  • @smorillo said:
    Recalling my own experiences learning to code many years ago, the coding part was already hard enough on its own and I'm a native English speaker.

    I grew up as a first-generation Turkish American with a father and uncle as engineers. They both learned technical terms (I'm expanding the conversation here outside of just programming languages) in English while attending college in Turkey. While my dad moved here, my uncle remained in Turkey, yet they both talk about technical terms in English.

    I asked my uncle if there were Turkish words for the things he was mentioning and he said, "Yes, but they are absurd and are not as concise as the English equivalents."

    This remark has stuck with me since then. I wonder if implicit in learning programming in another language perpetuates a hierarchy? It was clear in my uncle's response that he found learning the Turkish equivalents to be not worth his time. Does this mean there is an economic value in learning to program in one way or another?

  • @derya

    Thank you for sharing this! I'm particularly intrigued by this statement:

    "I asked my uncle if there were Turkish words for the things he was mentioning and he said 'Yes, but they are absurd and are not as concise as the English equivalents.'"

    Did your uncle specify what about them was absurd? Are the terms direct translations of the English? And do other Turkish engineers prefer to use the English terms or Turkish ones?

    This made me think a lot about the Icelandic language and how linguists in Iceland are tasked with coming up with Icelandic equivalents for digital terms that are originally in English. In other words, Icelanders try not to adopt the anglicization of tech terms; that way, the language continues to evolve. (See "Iceland has the best words for technology" and "Iceland is inventing a new vocabulary for a high-tech future".)

    In response to your question:

    I wonder if implicit in learning programming in another language perpetuates a hierarchy? It was clear in my uncle's response that he found learning the Turkish equivalents to be not worth his time.

    I do believe this is true, though not from any intention or preference of non-native English speakers. With many programming languages in use commercially being derived from English, developers in other countries will likely use these technical terms. I'd be very interested in hearing if there are developers who prefer to use their native language equivalents of technical terms rather than the English ones, and whether this is a personal preference or a regional/cultural practice (as is the case with Iceland).

  • edited January 2020

    I'm interested in how some of the linguistic aspects of challenges faced by indigenous programming are part of a continuum of problems for non-English programming -- and how indigenous programming may be disproportionately likely to face extreme forms of such problems.

    Right now, Unicode is also the most common infrastructure for making existing code languages expressible through different languages, but this infrastructure is very unevenly distributed. The possibilities for multilingual programming vary both across languages and also within each language, where the specific limitations may be different between strings, variables, classes, functions and methods, output, and so forth. In particular, I'm thinking about three layers in this example: documentation, editor interface, and language design.

    For example, Processing 3 is localized in Arabic, Chinese, Dutch, English, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, and Ukrainian -- but not (for example) in languages like those of the Chakma or Cree peoples. In the languages for which that localization exists, the application menus and basic documentation are available to make the Processing 3 programming software accessible. However, this does not mean that the code examples, reference pages, and tutorials are available in translation, even in the most supported non-English languages such as Spanish. There is no technological infrastructure reason that they could not be; it is primarily a question of community and of labor.

    There are, however, technological limitations in the PDE editor software, and these are different at the level of software interface from the limitations of the coding language implementation. For example, I can set a String actor equal to "Raven" (English), "Cuervo" (Spanish), "掠夺" (Chinese), or "ᑳᐦᑳᑭᐤ" (Cree) and each of these is valid Processing 3 that will compile and run. There is no limitation from the point of view of language design. However, in the case of both Chinese and Cree, the editor software cannot actually display the characters correctly in the interface, even though they can be typed, edited, saved, and run.

    Here I define a String containing Cree unicode text:

    String actor = "ᑳᐦᑳᑭᐤ"; // "Raven"

    However it will not display correctly in the PDE editing interface, even though the code is valid and will run.

    A more fundamental kind of limitation comes from the Java 8-based Processing 3 language itself, rather than from the PDE editor interface. Even if I can assign certain kinds of values in Cree, I still cannot name a variable in Cree, nor can I name a class, method, or function in Cree:

    class ᑳᐦᑳᑭᐤ {}

    ERROR: Not expecting symbol 0x1473, which is CANADIAN SYLLABICS KAA.

    However, what if I want to name a class Raven in German, French, or Spanish? No problem -- I must still use the English keyword class, but I can still name all my variables, classes, and methods in whatever language I want... until I need to step outside the allowed character set for naming, at which point I am stuck with the English alphabet:

    class Raven {} // good!
    class Rabe {} // gut!
    class Corbeau {} // bien!
    class Cuervo {} // bueno!
    class Tree {} // good!
    class Baum {} // gut!
    class Arbre {} // bien!
    class Árbol {} // ... malo!

    ERROR: Not expecting symbol 0xC1, which is LATIN CAPITAL LETTER A WITH ACUTE.
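
    The code points named in these two error messages can be checked against the Unicode character database. This is a minimal sketch in Python 3 using only the standard unicodedata module. The helper name describe is made up for illustration, and the sketch only inspects the characters; it does not reproduce how Processing parses names.

```python
import unicodedata

def describe(ch):
    """Return (hex code point, Unicode name, general category) for one character."""
    return hex(ord(ch)), unicodedata.name(ch), unicodedata.category(ch)

# First character of the Cree class name, and the accented letter in "Árbol".
kaa = describe("\u1473")
a_acute = describe("\u00c1")

print(kaa)      # ('0x1473', 'CANADIAN SYLLABICS KAA', 'Lo')
print(a_acute)  # ('0xc1', 'LATIN CAPITAL LETTER A WITH ACUTE', 'Lu')
```

    Both characters fall in Unicode letter categories (Lo, "other letter", and Lu, "uppercase letter"), which suggests the rejection comes from the tool's own rules about names rather than from Unicode classifying them as symbols.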

    Note that this is not getting into trying to translate the keywords of the language itself, such as if, for, while, int, void, class et cetera -- or the Java built-ins like String or ArrayList -- or the Processing API like setup(), draw(), line(), rect(). Instead, these are limits that we encounter when we try to engage in naming new variables, functions, classes and objects as freely as an English speaker does, even within the bounds of those pre-existing English designs.

    But that is just one programming language (based on Java 8). From what I have observed on the Processing forums as a moderator, it is actually fairly common for beginner programmers in any of the Processing dialects--whether Java, JavaScript, Python 2, et cetera--to learn to write code in a mix of required English keywords along with 75% other language identifiers and code comments in their primary language of Spanish, German, or French. However, the farther that primary language gets from the expected norm (English) the harder it is. Ukrainian is far and hard. Chinese is farther and harder. Cree would be even farther and even harder.

    By contrast, in languages like Swift or Julia, any Unicode identifier can be used for variables (again, setting keywords aside).[1] Similarly in Python 3, valid variable names may contain any Unicode letter character. This is valid code for raven = "3" in both Julia 0.6 and Python 3:

    ᑳᐦᑳᑭᐤ = "ᐃ"
    print(ᑳᐦᑳᑭᐤ)

    And yet, the Cree numeral is still only stored as a string. ᑳᐦᑳᑭᐤ = ᐃ with no quotes is an error in either language, because ᐃ is then read as the name of an undefined variable rather than as a number, and print( ᐃ + ᐃ ) fails the same way. In Julia, print( "ᐃ" + "ᐃ" ) is a MethodError, while in Python 3 it is valid code that joins the two strings but does not add them as numbers.
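
    The Python 3 side of this can be sketched as follows. The variable names joined, parsed_as_number, and unquoted_error are only for illustration, and the sketch shows CPython's behaviour, not Julia's.

```python
# Cree syllabics are Unicode letters, so Python 3 accepts them as a variable name.
ᑳᐦᑳᑭᐤ = "ᐃ"

# "+" on two strings joins them; nothing is added numerically.
joined = ᑳᐦᑳᑭᐤ + ᑳᐦᑳᑭᐤ

# int() only understands decimal-digit characters, and ᐃ is classified as a letter.
try:
    int(ᑳᐦᑳᑭᐤ)
    parsed_as_number = True
except ValueError:
    parsed_as_number = False

# Without quotes, ᐃ is read as a name that has never been assigned.
try:
    exec("ᑳᐦᑳᑭᐤ = ᐃ", {})
    unquoted_error = None
except NameError as err:
    unquoted_error = type(err).__name__

print(joined, parsed_as_number, unquoted_error)
```

    So the syllabic can be a name or the content of a string, but nothing in the language treats it as a quantity.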

    In fact, Cree# also uses the vertical bar character as a counting stick for numbers such as | (1), || (2), et cetera-- and this character is reserved by many computer languages, not just as a keyword, but as a primitive operator, commonly for "or". Those languages generally consider this to not have semiotic implications because operators (like + - / * | & et cetera) are assumed to be non-alphanumeric across languages. That assumption has design implications: Julia can assign to a variable raven1, but assigning to ᑳᐦᑳᑭᐤ| causes a syntax error because the | character is assumed to be an operator rather than part of a name.
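
    The same assumption can be observed in Python 3, which also reserves | as an operator. This is a rough sketch, not a statement about Julia's parser; try_compile is a hypothetical helper written for this example.

```python
def try_compile(source):
    """Return None if the source compiles, else the name of the error raised."""
    try:
        compile(source, "<example>", "exec")
        return None
    except SyntaxError as err:
        return type(err).__name__

# A name ending in a digit is fine; a name ending in the counting stick is not.
plain = try_compile("raven1 = 1")
with_stick = try_compile("ᑳᐦᑳᑭᐤ| = 1")

print("raven1".isidentifier(), plain)
print("ᑳᐦᑳᑭᐤ|".isidentifier(), with_stick)
```

    The digit 1 is allowed to be part of a name, while the stick | is split off as a separate operator token before the parser ever considers it as part of a word.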

    I've been discussing these kinds of problems in terms of the possibilities of non-English and indigenous programming within existing code languages and frameworks. To me, careful consideration of these challenges provides powerful arguments both for alternatives and for accessibility. On the one hand, the problems are so systemic and extreme that there need to be real alternatives, including autonomous indigenous programming languages with their own authentic paradigms and practices, not beholden to imperial or hegemonic infrastructure. On the other hand, this is no reason to stop continued engagement with making dominant programming culture more inhabitable and more humane for non-English learners and for indigenous peoples. As an example, I would love to see meaningfully complete Cree support added to Processing or to p5.js -- if Cree learners were interested in such a thing, of course! Such a thing might involve work on documentation, interface, code language design flexibility, hosting new sites of encounter, and more. That would not be instead of Cree#, but as a companion and complement to it, with the potential that each (authentic autonomous code languages and humanely inhabitable polyglot code languages) might serve as a gateway to the other for learners who ultimately need more than a little of both.



    1. That includes emoji. In Julia one can write ?= "ᐃ". However, emoji are culturally delimited as well. If you want to indicate a specific kind of bird in Unicode, you can designate a chicken or chick or rooster, dove, duck, eagle, owl, parrot, peacock, penguin, swan, turkey, or a bird (generic) which is variously depicted in different image sets as a bluebird, cardinal, or pigeon. However, you cannot designate a raven, which has not yet been deemed significant enough for inclusion.

  • I love the thorough examination, Jeremy! Thanks so much - you beat me to a lot of descriptions of some of the challenges I am encountering. Using a Unicode platform is first on my list - for exactly all the reasons you list. And you are absolutely correct, my use of the stick "|" as a number glyph has already been problematic, resulting in me using a hex or ASCII code equivalent to properly escape it in the construction of my interpreter. This just further highlights some of the forced logics the system applies. Even independent of English, these limitations restrict the range of culturally defined perspectives that can actually be utilized.

    Also in relation to what @smorillo said...

    This made me think a lot about the Icelandic language and how linguists in Iceland are tasked with coming up with Icelandic equivalents for digital terms that are originally in English.

    This is an initiative I would love to lead in Canada because, as was related to me by a Haudenosaunee elder, the reason Canadian First Nation peoples do not have (or haven't created) "proper" words for technological developments as they came into being through the 20th century is primarily because of the Canadian Residential School system, and governmental policies that forced Indigenous people to use English or French in all communications. Especially in the case of residential schools, using your heritage language was punishable with a variety of physical abuses. He said that preservation of the self left little room to exercise the natural practice and evolution of these languages in relation to advancements of the technologies. This results in my obscenely long name for Cree# in Cree - I am passionate about seeing that change.

  • @jeremydouglass, I would love to use emojis! ?= "ᐃ" is definitely in line with Indigenous worldviews, I might add that as a feature!

  • My first encounter with programming was in Finnish – at some after-school computer club thing in the suburbs of Helsinki, mid 1980s, we were doing turtle programming in a Logo variant with C=64s. This was in elementary school, well before the idea that some other people speak and think in some other languages had even occurred to me. I would never have made any semantic sense of "KÄÄNNY OIKEALLE 90; KYNÄ ALAS; ETEEN 10;" or whatever out of my own language. Would it even have been language to me, or just pressing buttons on a keyboard?

    English as a language for programming is a double tension, since it is also a very long (and I think laudable) outcome of careful and slow development away from artificial mathematical-logical or engineering notations, in a movement towards natural languages (well, a language). If programming was done in artificial notation, it would be more "universal" in the sense that it would be more detached from everyone's spoken languages. I wonder if it was easier then to imagine programs as music or movements?

  • @jeremydouglass, it's an important point that there are different levels of validation of a language as you've described. Just to summarize: Is its alphabet valid in a string? Can it be used in variable names? Does the IDE support it (here I'm also thinking of right-to-left languages etc.)? Is there documentation written in that language?

    In terms of variable names, different languages have taken very different approaches to the question of what's allowed. Gcc only supports non-Western symbols in C variables when using the compiler flag -fextended-identifiers, and even then, it's limited, and excludes Canadian Syllabics, among other character sets.

    C# allows for more symbols, so long as they are of the correct Unicode character category. There's a list of these in section 9.4.2 of the C# spec. Letter-characters are allowed in any position of a variable name, but also combining characters, connecting characters, decimal characters, and formatting characters in any letter after the first.
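
    The category rule described here can be approximated in a few lines of Python with the standard unicodedata module. This is a sketch of my reading of the rule, not the C# compiler's implementation: the function name is hypothetical, the category sets are my own transcription (including Nl among the letters and allowing a leading underscore), and keywords, @-prefixes and escape sequences are ignored.

```python
import unicodedata

# Unicode general categories, as an assumption about how the rule maps onto them.
LETTER = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}   # allowed in any position
LATER_ONLY = {"Mn", "Mc", "Nd", "Pc", "Cf"}     # combining, decimal, connecting, formatting

def looks_like_csharp_identifier(name):
    """Rough check of the category rule only."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    if first != "_" and unicodedata.category(first) not in LETTER:
        return False
    return all(unicodedata.category(ch) in LETTER | LATER_ONLY for ch in rest)

for candidate in ["\u00c1rbol", "A\u0301rbol", "\u0301A", "ᑳᐦᑳᑭᐤ", "raven|"]:
    print(repr(candidate), looks_like_csharp_identifier(candidate))
```

    Under this rule a combining accent is accepted after a letter but not as the first character, and the Cree word passes because every syllabic is classified as a letter.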

    So in C#, I should be able to declare a variable with the LATIN CAPITAL LETTER A WITH ACUTE symbol used in the Processing example above. But there are actually two different versions of A with acute: one produced by a single Unicode character, 0x00C1, and the other produced by combining capital A with the acute accent as a combining character, which is 0x0041 followed by 0x0301. It turns out these are both valid, so I can create two variables that look identical in the same program, but actually have different values:

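    using System;

    namespace UnicodeTest
    {
        class Program
        {
            static void Main(string[] args)
            {
                int Á = 1; // precomposed form: U+00C1
                int Á = 2; // "A" (U+0041) followed by combining acute (U+0301)
                Console.WriteLine($"{Á} {Á}"); // precomposed name first, then combined name
            }
        }
    }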

    Open this code file in HxD or a similar hex editor, and we can see that the two variables have different names and so are in fact distinct variables.

    Run it, and it will print 1 2.
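
    For comparison, Python 3 takes a third approach: it normalizes identifiers while parsing, so the two spellings collapse into one variable. This is a small sketch using the standard unicodedata module; the variable names are only for illustration.

```python
import unicodedata

precomposed = "\u00c1"   # single code point: LATIN CAPITAL LETTER A WITH ACUTE
combined = "A\u0301"     # "A" followed by COMBINING ACUTE ACCENT

# As plain strings the two spellings differ, but they agree after normalization.
same_text = precomposed == combined
same_after_nfc = unicodedata.normalize("NFC", combined) == precomposed

# Python normalizes identifiers (NFKC) while parsing, so both spellings name one variable.
namespace = {}
exec(f"{precomposed} = 1\n{combined} = 2", namespace)
names = [key for key in namespace if key != "__builtins__"]

print(same_text, same_after_nfc, names, namespace[precomposed])
```

    Here the second assignment overwrites the first, so the equivalent program would print 2 twice rather than 1 2.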

    It seems like this is the type of issue that gcc tries to avoid. But is it really that much of a problem? It's easy to create confusing or misleading code, but whether or not this is actually an issue has been treated very differently in these two languages.

    Even using the Unicode categories can be misleading, btw. The project Stack Trace Art uses a control character meant to combine Korean characters into composites (in the Lo -- Letter, Other category) which actually appears as whitespace visually. This allowed Stack Trace Art's creator to make function names that seem to have whitespace in them, to build ASCII art in the stack trace. A clever project made possible by exhaustively testing every possible whitespace option.
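    To see how a blank-looking character can count as a letter, the sketch below prints the Unicode category of a few characters. The choice of U+3164 HANGUL FILLER is an assumption for illustration; the post does not say which code point the project actually uses.

```python
import unicodedata

# Assumption: U+3164 HANGUL FILLER stands in for the character the project uses.
filler = "\u3164"

for ch in (" ", "_", filler):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),   # two-letter general category
        unicodedata.name(ch),
        ch.isspace(),               # whether Python treats it as whitespace
    )
```

    The ordinary space is in category Zs, while the filler is reported as Lo and is not treated as whitespace, even though it typically renders as blank.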

  • So much going on in this thread :) .

    I just want to throw in a little analogical historical material. If we look at the history of mathematical notation (which I think someone suggested might be seen as more neutral) there is a mix of direct influence by the dominant scientific languages of the time, along with more abstract concerns. In the Mediterranean world, prior to Descartes, all mathematics was written in the form of natural language, though maybe that's a stretch. The actual language was heavily formulaic and abbreviated, but it was still basically Latin, Arabic, etc. This abbreviation got more and more terse, but then there was a leap into modern mathematical notation. We can see remnants of this in the modern "+" sign, which comes from the ligature version of Latin "et" ("and"). Descartes's symbol for equals was actually also taken from Latin and was a sort of left-facing fish, or an infinity sign with the right loop cut in half. This was an abbreviation of the ae ligature in aequalitas. This is the only part of his notation that didn't catch on; the modern "=" sign comes from Robert Recorde, "for nothing can be as equal as parallel lines" (IIRC), which is probably not as obvious as he thought in other discourses not so obsessed with parallel lines. Of course, we also use "Arabic" numerals, which actually were mostly invented in India, but are so named because for a while the Arab world was the center of learning from which the Latin world got its scientific knowledge, especially in mathematics. Though also of course for a while the Arab world was reading and writing in Syriac, rather than Arabic. And before that Greek. Etc.

    So mathematical language certainly isn't neutral. In fact, it asserts the scientific dominance of the Latin language. But in addition to this there are other things going on that are a bit further from natural language, like the = sign, or the function notation which does away with the infix notation. Infix notation can be traced right back to abbreviation, in "1 and 2 equal 3" (although there are arguments for it other than traditionalism as well). Functional notation clearly separates function and arguments linguistically in the f(x,y,z) notation, in order to also separate them conceptually and think about function in general as a mathematical concept. Because this latter has become so important in computer science (though perhaps only through bad metaphors), we see a constant tension between allowing for or disallowing the infix notation for + and *, with different languages choosing different things. Do some languages order the "and" relationship differently? Does this make algebraic notation more difficult for those speakers? That'd be interesting.

    Even in so-called "natural language" scientific discourse, we can see a pretty generalized historical pattern where scientific discourse is pulled into the hegemonic language of an empire, but with considerable lag, and where it often ends up written in the language of dead empires. In fact, it might even be the case that scientific discourse has flourished the most when it's been conducted in the languages of dead empires. English is definitely going that way; there are already divergences between the emerging international standard English language and the way native speakers talk. So, again: there is a huge relationship between scientific discourse within a language and the hegemonic domination by the people who speak that language, but it is an odd relationship.

    I don't know exactly what all this adds up to in terms of programming languages, and indigenous programming. Like, "if" is from English, but does it reflect English language thinking? Historically speaking, is "if" a symbol of colonial domination, or of the internationalization of scientific discourse? (Those could also be the same thing, too, but I think we tend to attach different valences to them.) Is it an advantage to know the English "if" or is it maybe an advantage to have "if" presented without the nontechnical baggage of everyday English? Is "if" a mark of the inability of programming languages to be wholly independent of a "host language" or is it a vestigial remnant in the path towards total symbolization of a certain, not yet entirely clear, "programmatic" way of thinking? I don't know.

    But one thing I think is useful to think of via this history is that if mathematical notation is a product of the attempted domination of the Latin-speaking world, obviously that domination entirely failed, in so far as the language is concerned. The Latin roots are basically forgotten, and as far as use goes, it's become extremely clear that mathematics is an international discipline which for a while now has not been dominated by the diaspora of Latin-speaking countries. So the master's tools can certainly be used to take down the master's house—granting that possibly they shouldn't be so used, and that maybe there are better options. But also, when the master's tools do take down the master's house, they tend to change in the process, even if only changing the context in which they get meaning.

    Which leads to some other questions: what about indigenous programming in vanilla languages like C or Java? I'd be interested to hear about that part, too.

  • edited January 2020

    @ebuswell said:
    Which leads to some other questions: what about indigenous programming in vanilla languages like C or Java? I'd be interested to hear about that part, too.

    I used this Java example in my thesis:

    This was just for testing that Northern Sámi words can be compiled.

    The mathematics part is also interesting. I have loved the ethnomathematics studies of Ubiratàn D'Ambrosio. Mathematics is seen as a cultural product, and the focus is on the mathematical expressions and ideas of those (Indigenous) people who do not share the Latin version of mathematics.

  • @Mace.Ojala said:
    My first encounter with programming was in Finnish – at some after-school computer club thing in suburbs of Helsinki, mid 1980s, we were doing turtle programming in a Logo variant with C=64s.

    My first line of code was LOAD "*",8,1; I was four or five. The platform was also a C64 in the mid 1980s. I saw programming as abstract symbols, not words. The first time I mixed Finnish and a programming language was in the early 2000s. I realized that programming can be learned by using a supporting language on the side, as the teaching happened in Finnish and all the comments and variables were in Finnish. And then I started to wonder whether it would be possible to combine Northern Sámi with a programming language and, if so, whether it would help learning as well. When learning languages (natural or programming), one language supports another. It was interesting to think that maybe a natural language can support the learning of a programming language. And I guess the main idea back then was that a natural language can help when you learn programming as long as the natural language is English. In Finland, the Curriculum Reform in 2016 placed programming in the syllabus of grades 1 to 9, and programming was integrated in all subjects. It is also a basic human right to learn in your native tongue, so it is a problem if programming does not fit in that scenario. Of course it depends on the interpretation of the concept of programming in that framework, but anyway, this can be challenged.

  • edited January 2020

    @ebuswell - in terms of this:

    Is it an advantage to know the English "if" or is it maybe an advantage to have "if" presented without the nontechnical baggage of everyday English?

    you might want to look at the for/else or while/else constructs in Python.

    I've yet to meet a native English speaker who correctly guesses what the else does (in particular, people with a bit of prior programming experience typically make the same wrong guess for this). I think the construct was initially proposed by Knuth - but the choice of term only really makes sense if you know the underlying code that a construct like while might get compiled down into.

    (There's an analogous else in the - rather overloaded - try statement too.)
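    For readers who have not met the construct, here is a small sketch of what the loop else actually does. The function name `first_even` is made up for the example.

```python
def first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            message = f"found {n}"
            break               # leaving via break skips the else block
    else:
        # Runs only when the loop finished without hitting break.
        message = "no even number"
    return message


print(first_even([1, 3, 4, 5]))   # found 4
print(first_even([1, 3, 5]))      # no even number
print(first_even([]))             # no even number
```

    The common wrong guess is that the else branch runs when the loop body never executes; in fact it runs whenever the loop ends without a break, which includes the empty case.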

    On the question of infix versus prefix notation, it might be of interest to note the ability in Haskell to turn infix operators into prefix functions (and vice versa):

    -- ordinarily
    let a = f x y
    ...
    -- can also be
    let a = x `f` y
    ...
    -- and alternatively
    let b = 1 + 2
    ...
    -- could be
    let b = (+) 1 2
    ...
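    A loose analogue exists in Python, where the standard `operator` module exposes infix operators as ordinary functions. This is only a comparison sketch; Python has no counterpart to Haskell's backtick syntax for turning a named function into an infix operator.

```python
import operator

infix = 1 + 2                   # the usual infix spelling
prefix = operator.add(1, 2)     # the same operation as an ordinary function call
print(infix == prefix)          # True

# Because the prefix form is a plain function, it can be passed around.
print(list(map(operator.add, [1, 2, 3], [10, 20, 30])))   # [11, 22, 33]
```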

    (... and furthermore, there's a pretty strong glyphic tradition in Haskell where conventionally, you might be able to guess that <$> is sort-of related to $ and <> - which can make the shorthand notations selected by those well-versed in the language particularly opaque to newcomers.)

    On the subject of alienness versus naturalness of various syntaxes, I think it's always worth having a look at APL. Not only does it use an invented notation, typed on a specially invented keyboard - many of its conventions (such as right-to-left evaluation) fly in the face of conventional approaches of languages contemporary to it - so it rather diverges from established mathematical convention (at least regarding the relative precedence of arithmetic operations) too. Anyone who's looking to understand what it's like to experience programming in an alien language need look no further :-)

  • I am loving this conversation and join in only to keep the ball in play.

    I was caught by @outilaiti's observation:

    when I studied formal programming, my code was a combination of three languages: the programming language with the reserved English words and libraries, and then Finnish or Northern Sámi in variables or comments. The end-product, the actual code, does not represent any language but a multilingual combination of known words and expressions. If you don't know for example Northern Sámi, in my code you see programming language, English and symbols that could form words but they do not have any meaning to you.

    In rhetoric studies we have a concept called "code meshing," which is gaining popularity as both a means of describing what people already do when we mix languages and a means of celebrating (rather than correcting) a true polyglossia, to borrow a word from Bakhtin. Unlike another term, code switching, code meshing presents the mixing of languages as a productive amalgam, the formation of a sophisticated mix, rather than a kind of error or binary switching between systems. Now, code seems to be a system that is unforgiving of this pluralism -- except in the very solution that @outilaiti describes. Could code meshing be a useful concept for understanding the multiplicity of languages in a program?

    But this also made me wonder, perhaps naively, are computer languages deterministic? Does the syntactic structure of English pervade programming languages at a more fundamental level? Friedrich Kittler points out that English suits programming because of its “context-free verbal units,” presumably as opposed to German. We can find evidence of English down to the mnemonics of assembly. But is English baked into programming paradigms as well? And here I am picking up @DavidBerry's comment:

    there is an interesting line of thought to pursue around the distinction between the language of keywords (e.g. English) and the underlying thought model, e.g. instrumental relationships between subject-objects.

    As we open up to alternatives, such as @joncorbett's Cree#, are we opening up new opportunities for more fundamental programming structures?

  • @ebuswell said:

    Is "if" a mark of the inability of programming languages to be wholly independent of a "host language" or is it a vestigial remnant in the path towards total symbolization of certain not yet entirely clear "programmatic" way of thinking?

    An interesting example of a "symbolized" if/else is the conditional ternary operator. A common form uses ? to indicate the if() and : to separate the true / false branches, and this can be found in programming languages including awk, bash, C, C++, C#, Java, JavaScript, Perl, PHP, Ruby, and Swift, among others. (Python is a notable exception: its conditional expression is spelled with keywords, as b if a else c.)

    a ? b : c

    The expression can be read "if a then b, else c" -- not unlike if(a) { b } else { c }, but with symbols rather than keywords. This ternary operator is built-in to a lot of languages, but in many learning contexts using ternary form may be considered advanced, obscure, or even bad style. It is usually considered harder than if/then rather than easier or more accessible.
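    As a point of comparison, Python keeps English keywords even in its expression form of the conditional. The sketch below puts the expression and statement forms side by side; the function names are made up for the example.

```python
def sign_label(n):
    # Conditional expression: value_if_true if condition else value_if_false
    return "negative" if n < 0 else "non-negative"


def sign_label_statement(n):
    # The same decision written as an if/else statement.
    if n < 0:
        return "negative"
    else:
        return "non-negative"


for n in (-2, 0, 7):
    print(n, sign_label(n), sign_label(n) == sign_label_statement(n))
```

    Note that the keyword form also reorders the parts: the condition sits in the middle rather than at the front, which is a different reading order from both a ? b : c and the if statement.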

    Also re: "the path towards total symbolization": @jang recommended looking at APL, and I agree, noting that APL was first developed in the 1960s. Instead of built-in keywords, APL uses a rich set of one-character glyphs. One consequence of this design decision is that logical operations in APL are essentially a separate language that one must learn -- although they are systematically biased for a particular prior literacy, that is mathematics. Another consequence is that special keyboards (or soft keyboard interfaces) are required so that the programmer can generate all those special symbols.

    "APL keyboard with glyphs"

    These glyphs include:

    ⋄ ¨ ¯ < ≤ = ≥ > ≠ ∨ ∧ × ÷ ? ⍵ ∊ ⍴ ~ ↑ ↓ ⍳ ○ * ← → ⊢ ⍺ ⌈ ⌊ _ ∇ ∆ ∘ ' ⎕ ⍎ ⍕ ⊂ ⊃ ∩ ∪ ⊥ ⊤ | ⍝ ⍀ ⌿

    An APL program may look like this popular implementation of Conway's Game of Life:

    life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}
    in←5 5⍴0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0
    life in

    The only keywords used here were user defined. However, the scope for definition is still limited: 生活←life gives an INVALID TOKEN error, while str←'生活' lists and prints as str ← '*******'.[1]
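    For readers who cannot parse the glyphs, here is my reading of what the one-liner computes, spelled out as a Python sketch. It is not a translation of APL semantics in general: it assumes the grid wraps around at the edges (as the rotations do) and applies the rule "alive next if the 3 by 3 block sums to 3, or sums to 4 and the cell is currently alive". The names `life` and `glider` are chosen to mirror the APL example.

```python
def life(grid):
    """One generation of Conway's Game of Life on a wrap-around grid."""
    rows, cols = len(grid), len(grid[0])

    def block_sum(r, c):
        # Sum of the 3x3 block centred on (r, c), including the cell itself.
        return sum(
            grid[(r + dr) % rows][(c + dc) % cols]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
        )

    return [
        [
            1 if block_sum(r, c) == 3 or (grid[r][c] and block_sum(r, c) == 4) else 0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# The same 5 by 5 glider used as input in the APL example.
glider = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

for row in life(glider):
    print(" ".join(str(cell) for cell in row))
```

    The comparison makes the trade-off visible: the Python version needs keywords such as def, for, if, and return, while the APL version needs none, at the cost of a separate symbolic literacy.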

    Another issue is that, in order to speak about these symbols, they need names. For example, the ⍝ or U+235D is named "up shoe jot" corresponding to other "shoe" glyphs ⊂ ⊃ ∩ ∪ -- although it is more commonly called "lamp." So there is a symbol that is beyond language -- like the # or // comment markers -- and yet precisely because it is a glyph there becomes a need to put it back into language in a way that comes with a lot of culturally specific assumptions about common objects and their (English) names.
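    The English names attached to these glyphs can be read straight out of the Unicode character database. A minimal sketch using Python's standard `unicodedata` module:

```python
import unicodedata

# Print each glyph with its code point and its official Unicode name.
for glyph in "⍝⊂⊃∩∪#":
    print(glyph, f"U+{ord(glyph):04X}", unicodedata.name(glyph))
```

    The lamp glyph comes back as APL FUNCTIONAL SYMBOL UP SHOE JOT, and the # comment marker as NUMBER SIGN, so even the supposedly language-free symbols carry English names in the standard itself.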


    1. Tested on tryapl.org ↩︎

  • edited February 2020

    I think the English names for common objects stem from later ISO standardisation efforts to unify glyph names.

    As a kid (I got hold of Iverson's APL report through an inter-library loan) the only names I knew for these were the original ones that reflected their purpose. So / might be listed as "slash" in the unicode tables today, but then - if I thought of it by name at all, and having nobody else to talk to about APL, I largely coded in it nonverbally - I'd have called it "reduce", and its partner, "scan".
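    The purpose-based names survive in other languages' standard libraries. As a small sketch, Python's `functools.reduce` and `itertools.accumulate` correspond roughly to the two operations named here:

```python
from functools import reduce
from itertools import accumulate
import operator

values = [1, 2, 3, 4]

# "Reduce": fold the whole list down to one value.
print(reduce(operator.add, values))             # 10

# "Scan": keep every intermediate result of the same fold.
print(list(accumulate(values, operator.add)))   # [1, 3, 6, 10]
```

    The correspondence is only rough: APL's reduce works from right to left, which gives different answers from Python's left-to-right fold for operations such as subtraction.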

    So there are at least two contexts here - and that's not counting just writing software without verbalising the sequence of symbols. There's an original context with historical glyphic naming, and a modern context where the names have been updated. APL still sees use; which of these contexts should be used to interpret its glyphic traditions?

  • @Mace.Ojala said:
    If programming was done artificial notation, it would be more "universal" in the sense that it would be more detached from everyone's spoken languages. I wonder if it was easier then to imagine programs as music or movements then?

    I like this thought experiment. I'll note that before computer science was formalized and just being thought of as a concept back in the 1800s, Ada Lovelace (the mother of programming!!) hypothesized that the code could be used to build symphonies if musical notes were input as numbers.

    In response to @markcmarino:

    @markcmarino said:
    As we open up to alternatives, such as @joncorbett's Cree#, are we opening up new opportunities for more fundamental programming structures?

    I think @joncorbett's Cree# is the perfect example of new coding structures appearing along with new coding languages (in spoken languages other than English). From my understanding, Cree# will mimic the Cree language down to word development leading to a different language basis. If languages inform how we think and speak (which I believe they do, though it is up for debate in the cognitive science community to my knowledge), coding in English leads us to think in an "English" way. It seems that Cree# will encourage visual storytelling in a way that follows the Cree culture, leading coders to code in that style of thinking, etc.

  • @jeremydouglass said:
    For example, Processing 3 is localized in Arabic, Chinese, Dutch, English, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, and Ukrainian -- but not (for example) in languages like those of the Chakma or Cree peoples. In the languages for which that localization exists, the application menus and basic documentation are available to make the Processing 3 programming software accessible, however this does not mean that the code examples, reference pages, and tutorials are available in translation, even in the most supported non-English languages such as Spanish. There is no technological infrastructure reason that they could not be; it is primarily a question of community and of labor.

    (for reference in what comes next, I am responsible for the translation of the Arduino IDE to over 40 languages, plus the full website to Spanish back in 2010 -with over 700 documents)

    The Arduino IDE was first translated to Japanese thanks to the contribution of Shigueru Kanemoto. This contribution included a well described process on how we could render the whole IDE in multiple languages using properties files including translations that could be loaded on the fly. Someone made a Github repo of the instructions I shared with the community back then: https://github.com/juandg/IDE_translations
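    To give a feel for the mechanism, here is a minimal sketch of loading translations from properties-style text with a fallback to English. It is written in Python rather than Java, it handles only plain key=value lines with \uXXXX escapes, and the keys and strings are made up for illustration; it is not the Arduino IDE's actual code or file contents.

```python
# Hypothetical file contents; keys and strings are invented for this example.
ENGLISH = r"""
# default strings
menu.file=File
menu.upload=Upload
"""

JAPANESE = r"""
menu.file=\u30d5\u30a1\u30a4\u30eb
"""


def parse_properties(text):
    """Parse a small subset of the .properties format: key=value lines only."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue    # skip blank lines and comments
        key, _, value = line.partition("=")
        # Decode \uXXXX escapes; assumes the file itself is plain ASCII.
        table[key.strip()] = value.strip().encode("ascii").decode("unicode_escape")
    return table


def make_translator(localized, fallback):
    """Look up a key in the localized table, then English, then echo the key."""
    def translate(key):
        return localized.get(key, fallback.get(key, key))
    return translate


tr = make_translator(parse_properties(JAPANESE), parse_properties(ENGLISH))
print(tr("menu.file"))     # localized string
print(tr("menu.upload"))   # falls back to English
```

    The escaped form is part of what makes such files awkward to edit by hand: the translator sees code points rather than readable text unless the editor decodes them.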

    While I automated the process of including the different translations of the IDE, I found some challenges, like dealing with right-to-left languages that would not render properly on my Linux machine. Another was that the translations of each string had to be declared as strings with completely different characters to represent e.g. EOL or CR. The support for those extra languages is something that I had to install, somehow breaking part of my distro, making my life a lot more interesting ;-)

    Thus, to me the statement "There is no technological infrastructure reason that they could not be" is not entirely accurate. The developer needs to have a system supporting something as simple as an editor capable of mixing properties files that will be imported by a binary file compiled in the Java language, which is written left-to-right, with strings of text (from the properties file) that are written right-to-left. The fact is that IDEs are compiled by someone, and people need to add translations given a framework established by the tech in the hands of the developer. On my computer I mix Spanish, Swedish, English, and German ... all of them on top of the same keyboard layout. Adding extra languages to the mix, especially ones that I do not understand and that my tools cannot render, is a challenge. So I would like to introduce the perspective of the well-intended developer into the conversation: the developer who tries to integrate multiple translations to help disseminate the culture of building [with] digital tools.

    There are however technological limitations in the PDE editor software, and these are different at the level of software interface from the limitations of the coding language implementation. For example, I can set a String actor equal to "Raven" (English) "Cuervo" (Spanish) "掠夺" (Chinese), or "ᑳᐦᑳᑭᐤ" (Cree) and each of these is valid Processing 3 that will compile and run. There is no limitation from the point of view of language design. However, in the case of both Chinese and Cree, the editor software cannot actually display the characters correctly in the interface, even though they can be typed, edited, saved, and run.

    The fact is that most of these languages are written in an underlying language, which is typically C. This raises the issue of needing someone who knows enough English and enough C to write a compiler that will understand a programming language made in a different natural language. And I am not talking about a translation here, but about a language supporting that other language. In order to make this happen, we need some cultural absorption by someone who speaks and writes that human language. C is built on top of a whole lot of cultural assumptions (as someone mentions in this thread) that come from a certain culture. An indigenous programmer creating an IDE in her own language will have to go through a phase of adoption of the English language that will for sure influence her thinking. IOW, there is no way, given how digital technology works, that a person will manage to make a compiler without learning English.

    I have been thinking for a while that concepts like the passing of time have influenced the way we understand the world, and therefore have influenced the way we create human language, culture, digital technology, and programming languages. Have you ever wondered how a programming language could emerge from a digital technology created by people who speak a tenseless language like e.g. Maya? This idea came to me during a recent visit to Guatemala, where I agreed to collaborate with a local researcher in creating a translation of Arduino's reference and IDE to one of the many Mayan variations. If the language doesn't think about time in the same way English or Spanish do ... how am I supposed to explain concepts like concurrency by time-sharing the processor?

    If time is not understood in the same way, what sense does it make to think about a device that accurately counts the time since it booted (like any computer does)? What would a tenseless computer look like? Would it work commanded by what we call events and interrupts? How would a processor operate if it was not multiplexing operations in time, if it was fully event operated?

    This is the challenge to me, indigenous programming requires indigenous computers, which BTW might not even count in binary.

    Is there any kind of reference anyone in the list can provide in this direction? So far I have found this text:

    https://www.acsu.buffalo.edu/~jb77/MdG_ECC-Time_04_Bohnemeyer.pdf

    (and this nice Youtube video that summarizes the main aspects of it in just 10m: )

  • @outilaiti said:
    The first time of mixing Finnish and a programming language happened in the early 2000s. I realized that programming can be learned by using a supporting language on the side, as the teaching happened in Finnish and all the comments and variables were in Finnish. [...] It is also a basic human right to learn in your native tongue, so it is a problem if programming does not fit in that scenario.

    I really feel that tension. Beyond computing, on the one hand international standards like ISO 7001 for "public information symbols" attempt to make things like bathrooms legible to many different linguistic communities simultaneously by creating an additional symbolic context. On the other hand, many countries have standards for pervasive multilingual signage in a few supported languages. The "international airport" model of language infrastructure for programming languages might be a combination of these two approaches -- dominant lingua franca(s) plus shared symbols. Still, those chosen few languages and those symbols will never be culturally neutral (there is nothing culturally neutral about a stick figure in a dress signifying a bathroom). The other thing about these approaches is that they cannot provide a rich "supporting language" experience. Even if they are more accessible, for a great number of people they are not local and never will be.

    I wonder to what extent it would matter if basic control-flow operators of a language -- such as if, while, for, break et cetera -- were automatically localizable in all major programming languages and replaced during a pre-processing step, even if only in LTR ASCII. This would not change the underlying English paradigm of the language, but it is shocking that we aren't already there -- it seems like so little to do. Perhaps it is because translating the code text disconnects it from various code paratexts (like reference documentation and examples) and from the peer systems other than compilers that also support a programming language (like syntax highlighters, linters, documentation generators, code folding and various other IDE plugins, et cetera) -- and each of these would then need to support the same localization in parallel. When you change just the vocabulary in a minimal way, you are already pulling at many strands of the web.
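    As a thought experiment, the pre-processing step could be sketched as below for Python source. Everything here is hypothetical: the Spanish keyword table, the name `delocalize`, and the sample program are invented for illustration. The sketch uses the standard `tokenize` module so that only name tokens are replaced and string contents are left alone.

```python
import io
import tokenize

# Hypothetical localized spellings mapped to Python's real keywords.
KEYWORDS_ES = {
    "si": "if",
    "sino": "else",
    "mientras": "while",
    "para": "for",
    "en": "in",
    "romper": "break",
}


def delocalize(source, table):
    """Replace localized keywords with the originals, token by token."""
    lines = source.splitlines(keepends=True)
    edits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in table:
            edits.append((tok.start[0], tok.start[1], tok.end[1], table[tok.string]))
    # Apply edits from the end backwards so earlier columns stay valid.
    for row, start, end, replacement in sorted(edits, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:start] + replacement + line[end:]
    return "".join(lines)


source_es = '''
total = 0
para n en range(10):
    si n == 5:
        romper
    total = total + n
etiqueta = "si"
'''

translated = delocalize(source_es, KEYWORDS_ES)
print(translated)

scope = {}
exec(translated, scope)
print(scope["total"], scope["etiqueta"])   # 10 si
```

    Even this toy version shows the strands being pulled: the localized words become reserved, so a variable called en would silently turn into in, and error messages, documentation, and editor tooling would still refer to the English spellings.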

  • edited February 2020

    @dcuartielles: Thanks for those mind-expanding diagrams of the Mayan language and how it depicts time!

    I think there might (?) be a small misunderstanding re: my statement:

    "this does not mean that the code examples, reference pages, and tutorials are available in translation, even in the most supported non-English languages such as Spanish. There is no technological infrastructure reason that they could not be"

    By "they" I did not mean "all languages into the IDE", I meant those supporting examples, reference pages, and tutorials, e.g. in Spanish. My point was to highlight the deficiencies even in the easiest scenario, not to argue that hard languages are trivial to support -- you are right, they are not! To unpack my point: Spanish, which is "easy" in the sense that it has no RTL or significant Unicode challenges, is already localized in the Processing PDE. However, even then, that is not full language support. The code examples (.pde files) and reference pages (.html files) are not all also available in Spanish. There are no technological reasons why those particular file formats could not be provided in Spanish as well, they just aren't -- even in the easiest case. I was trying to point this out as context for the hard cases, when even localizing a single word on an application menu can be a significant challenge, let alone translating hundreds of pages of supporting documentation.

  • This was also something I was grappling with when I was thinking about the meaning of indigenous languages in the context of Malaysia, with different communities making claims of indigeneity. I am no expert in most indigenous languages in Malaysia, but the one language that likes to claim indigenous ascendancy is Malay (in the Malaysian context, which is different from the Indonesian or Bruneian one, since all of these countries also use a similar official language). Therefore, much of the study of ethnoscience/ethnomathematics is centered around the Malay, rather than on other, older indigenous cultures occupying the same land (although there are some scholars attempting to codify the oral cultures of these other indigenous groups). As a result, there is little study as of now regarding how logic is coded within these indigenous cultures (I am exploring that through a non critical code studies project). There are a lot of colonial, postcolonial and decolonial aspects waiting to be unpacked there, let alone how one would define race in this context.

    That said, given Malay's status as an official language of Malaysia, it has been appropriated and used by non-native Malay speakers in many other ways (the non-natives are defined as those who did not speak the language from birth or at home, even if they might be steeped in that language at school), which gives rise to the development of a very colorful informal lexicon overlaying the formal syntax. The same applies to English as a working language for most, although a segment of Malaysians would still claim English to be their first language. On top of that, Malaysians' ability to speak in a melange of languages at once (sometimes even within the same sentence!) means that it is not uncommon to have English, Malay, and another language (or two!) mashed together in a way that only other Malaysians, or 'naturalized' residents, could understand. For instance, this forum for discussing the Malaysian stockmarket provides ample examples of cross-trolling in English (I have other examples in Malay but that would not be useful here) https://klse.i3investor.com/servlets/stk/5099.jsp. I picked the forum specific to this particular company because of its relevance to the Malaysian political landscape. This goes beyond thinking about Englishes as representing different articulations of Standard English - instead, we get a form of argot. For insiders, the choice of certain expressions gives away their ethnic identity.

    This gives birth to the dilemma that I am facing in a project on internet trolling in the Malaysian context (which is presently on temporary hiatus as I have to work on something else). As mentioned in my previous posting, I am interested in understanding the Malaysian troll's appropriation of existing platforms to serve their cultural/personal perspectives, and am planning out the design of a bot that could do more than merely scrape postings from various social media/digital platforms, and could instead proceed more dialectically in the process of engaging with the trolls, à la ELIZA. However, identifying types of trolls by ethnicity requires understanding the linguistic heterogeneity of merely a single ethnic community (a linguistic babel, so to speak), let alone multiple communities, and perhaps determining a pattern of logic in their expressivity across languages. How does one write code that could differentiate between such forms of expressivity? There are works studying Malaysian English (whatever that means), for instance, but not from an indigenous coding perspective as yet.

    That said, an environment like this is also a great exploratory place for inventive programming languages.

    @alvarotriana said:
    I also think that the problem can have multiple layers of complexity. For instance, here in Colombia, we had the colonizing process of Spain, in which almost all the indigenous groups were lost along with their languages and traditions. But nowadays, on the technical front, the hegemonic language is English, so we also need to override our Spanish language.
    Maybe this can have a parallel with the concept of internal colonialism, the process in which a country that has been a colony, as in our case, reproduces the same colonial patterns, excluding the most vulnerable groups like the indigenous. Perhaps, in programming languages, we are replicating a similar process, adopting English as our primary language and segregating the rest.
    There is also the perception of time and progress, something that creates new difficulties, for example, when indigenous groups learn how to use software tools for editing videos. In this case, our models of reality are challenged, and the sequential nature of our programming languages may create another great barrier.

  • That is the same issue for Malaysia https://appsmu.ukm.my/etisas/index.php.
    It is also one reason why, although there is a surfeit of programming manuals in Malay, no one has yet written a compiler or created a program using that language.

    @derya said:

    @smorillo said:
    Recalling my own experiences learning to code many years ago, the coding part was already hard enough on its own and I'm a native English speaker.

    I grew up as a first-generation Turkish American with a father and uncle as engineers. They both learned technical terms (I'm expanding the conversation here outside of just programming languages) in English while attending college in Turkey. While my dad moved here, my uncle remained in Turkey, yet they both talk about technical terms in English.

    I asked my uncle if there were Turkish words for the things he was mentioning and he said "Yes, but they are absurd and are not as concise as the English equivalents."

    This remark has stuck with me since then. I wonder whether learning programming in another language implicitly perpetuates a hierarchy. It was clear from my uncle's response that he found learning the Turkish equivalents not worth his time. Does this mean there is an economic value in learning to program in one language rather than another?

  • @jeremydouglass said:
    I wonder to what extent it would matter if basic control-flow operators of a language -- such as if, while, for, break et cetera -- were automatically localizable in all major programming languages and replaced during a pre-processing step, even if only in LTR ASCII.

    I like this idea, and while most languages have fewer than a hundred keywords, there are also the standard libraries, which are essentially part of the language; the functions and extensions used in almost every program (allowing for i/o, interaction with other systems, etc). If we can write code that compiles using Malay characters, but then have to use the word "print" in English to interact with the user, that's a problem. Also, these libraries are more likely to change in new releases of the language.

    But I could see that changing if there were demand for it. Microsoft ensures that each new version of Excel is usable in a host of different languages; they could do the same with new versions of C# if this were what it took to stay competitive.
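
    To make the pre-processing idea concrete, here is a minimal sketch in Python, not a description of any existing tool. It assumes a hypothetical Malay-to-Python word list (`LOCALIZED`) and a hypothetical helper (`delocalize`), both invented for illustration. It uses the standard `tokenize` module so that only names are swapped, while string literals and comments are left alone. Note that `cetak` stands in for `print`, which is a library function rather than a keyword, so the table has to cover both layers.

```python
import io
import tokenize

# Hypothetical Malay-to-Python mapping, chosen for illustration only.
LOCALIZED = {
    "jika": "if",
    "selagi": "while",
    "untuk": "for",
    "dalam": "in",
    "cetak": "print",  # a built-in function, not a keyword
}


def delocalize(source: str) -> str:
    """Swap localized names for their Python equivalents.

    Works on the token stream, so the same words inside string
    literals or comments are not touched.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in LOCALIZED:
            tokens.append((tok.type, LOCALIZED[tok.string]))
        else:
            tokens.append((tok.type, tok.string))
    # With (type, string) pairs, untokenize regenerates the spacing itself,
    # so the output is valid Python but not character-for-character identical.
    return tokenize.untokenize(tokens)


source = (
    "untuk i dalam range(3):\n"
    "    jika i > 0:\n"
    "        cetak('jika', i)\n"
)
translated = delocalize(source)
exec(translated, {})  # runs the translated program
```

    The word 'jika' inside the string survives untranslated, which is the point of working on tokens rather than doing a plain search-and-replace. What the sketch does not solve is the part raised above: every new library release would need its names added to the table.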

    @jeremydouglass said:
    Also re: "the path towards total symbolization": @jang recommended looking at APL, and I agree, noting that APL was first developed in the 1960s. Instead of built-in keywords, APL uses a rich set of one-character glyphs.

    This is an easier way to do it, and in our new theoretical all-symbol language, we could replace the Greek letters with, say, box-drawing characters.

  • edited February 2020

    @jeremydouglass said:
    The code examples (.pde files) and reference pages (.html files) are not all also available in Spanish. There are no technological reasons why those particular file formats could not be provided in Spanish as well, they just aren't -- even in the easiest case. I was trying to point this out as context for the hard cases, when even localizing a single word on an application menu can be a significant challenge, let alone translating hundreds of pages of supporting documentation.

    This is much more clear now, thanks :-)

    I also ran an experiment in translating the whole Arduino wiki into multiple languages through community collaboration, with Spanish as the first language. It took a bit more than a week to get the 700 documents the website had back then into Spanish. The issue is that the technology, the way it was made back then, was not suited for multi-dimensional localization (many collaborators, many languages, many directions of corrections). Therefore the translations aged pretty quickly and it was hard to trace changes.

    Also I made an educational programme in Spanish and English that reached over 2000 schools in several countries (and later was translated to Italian, German, Polish ...). Once you go into making courseware, contemporary translation tools are in no way supportive. They are ready to support unidirectionality (e.g. English to other languages) but nothing else. Translating the comments in the examples was one of the most problematic issues, exactly as you mention.

    If anyone wanted to make something good for the community, building an understanding of these three dimensions of localization is key, in my opinion: collaboration, languages, directionality.
