Published

Where are the non-English programming languages? Thoughts prompted by a small but mighty bug

A short two-parter

Part 1: A small-but-mighty bug

I’m doing a few coding-for-designers workshops with the students in the MA Graphic Media Design programme at the LCC, the first one was this past Thursday.

We’re focusing on web stuff since that’s what they expressed the most interest in and it’s in my wheelhouse. We started with a broad and brief overview of code, about how we use human-readable programming languages to communicate with computers. But 99% of the time, when we say “human-readable” we really mean “English-based”. More on this later.

After my short intro, we put together a simple webpage with HTML and CSS. A few of the students ran in to a small-but-mighty bug that I’ve never encountered before.

We were working on a basic CSS rule set, something like:

body {
  background-color: linear-gradient(blue, pink);
}

And for one of the students, it just wasn’t working. I checked it a bunch of times, it was all typed perfectly. No missing spaces or characters, spelling was fine. VSCode indicated that the background-color declaration wasn’t finished, which seemed weird. I looked really closely at it and noticed that the semicolon seemed a little thinner than the others in the file.

Turns out that the student had typed a full-width semicolon (U+FF1B) instead of a semicolon (U+003B). The full-width semicolon is used in Chinese to “demarcate parallel structures in a paragraph”. Another student ran in to the same problem a few minutes later, using a full-width left curly bracket (U+FF5B) instead of a left curly bracket (U+007B).

I had asked them to type in a semicolon, to type in a curly bracket, and so they typed the characters the way they normally do in their native languages. Super understandable.

If you were just starting to learn what code is and how it works, I can’t imagine how hard it would be to debug this sort of language-based problem on your own.

Part 2: Where are the non-English programming languages?

Gretchen McCulloch wrote a very worthwhile Wired article titled “Coding Is for Everyone—as Long as You Speak English”. I feel like every programmer / dev should read it.

Programming doesn’t have to be English-centric. As McCulloch puts it:

The computer doesn’t care. The computer is already running an invisible program (a compiler) to translate your IF or <body> into the 1s and 0s that it functions in, and it would function just as effectively if we used a potato emoji 🥔 to stand for IF and the obscure 15th century Cyrillic symbol multiocular O ꙮ to stand for <body>.

How would we go about implementing non-English HTML tags? W3C has an FAQ on the topic where they state that “HTML or XHTML tags are all pre-defined (in English) and must remain that way if they are to be correctly recognized by user agents (eg. browsers).

Since browsers just implement the standards set by W3C (I’m pretty sure that’s right?), I’m guessing that W3C would have to approve it if we wanted native non-English HTML support. It seems like it would be a bit of a mountain to climb but if we take something like Wikipedia’s translation efforts as an example, surely there are tons of people out there that would help with translation?

Gonna keep an eye on this. Need to find a book or good article on the history of ALGOL.