Low-resource languages: what they are and why they matter

Every so often, I like to talk about low-resource languages. They’re a matter of particular importance in today’s tech world and, for me, they’re especially important for expanding access to vital information throughout the world. So today I want to discuss what low-resource languages are and why they matter, and illustrate the problem with an example from Harry Potter.

Google Translate, artificial intelligence, and how they work

Google Translate and other machine translators have come such a long way. You’ll especially appreciate this if you remember how awful they used to be: translating literally word for word in poorly constructed sentences and often misinterpreting the intended meaning. Nowadays, Google Translate usually gets it right.

It’s a powerful tool that’s too easily taken for granted. If you need to find information that’s only available in German, you can just copy and paste the German text into Google Translate and get the information in English. It’s a great resource for people around the world to access all sorts of important information, from the latest medical studies of a certain ailment to how to fix a car problem under the hood of your foreign-built vehicle.

At least, that’s the case if you speak one of just a couple dozen languages. The approaches in vogue today for natural language processing (NLP, the branch of artificial intelligence behind machine translation) rely on ginormous corpora of written language. These corpora inform the translation by serving as models of proper language use, as written by native speakers. It’s a brilliant concept that works very well.

Low-resource languages and their speakers are left out

But the vast majority of the world’s thousands of languages do not have ginormous corpora of written language. Some languages don’t even have a writing system, and most of those that do have a few hundred books’ worth of material at most. (For perspective, the number of books written in English or German is in the tens of millions.) Languages with relatively few written samples are called “low-resource languages” because they don’t have enough text to serve as models for natural language processing.

Speakers of these low-resource languages generally don’t have ready access to information on the latest medical advice or computing mechanics. They have to learn a second language to get much out of the internet. What’s more, because so little literature exists in these languages in the first place, they often lack an established discourse for talking about basic modern medicine or finance. Most languages don’t even have sufficient vocabulary for those topics.

Building meaning: Harry Potter as example

To understand this issue a little better, let’s look at this excerpt from the seventh book in the Harry Potter series, which I selected simply by opening the book to a random page. Words like Disapparate, Patronus, Death Eaters, and Stupefy belong to a discourse that the Harry Potter series introduced to you.

“I say we find a quiet place to Disapparate,” Hermione whispered.

“Can you do that talking Patronus thing, then?” asked Ron.

“I’ve been practicing and I think so,” said Hermione.

Two Death Eaters drew their wands. The force of their spells shattered the tiled wall where Ron’s head had just been, as Harry, still invisible, yelled, “Stupefy!”

Harry Potter and the Deathly Hallows

This passage is meaningless to anyone who hasn’t read the whole series. It only makes sense after reading the previous six novels.

Now let’s stop and think about this. That’s a tremendous amount of information you had to consume to really appreciate the meaning of this one passage! It didn’t require any study or hard work on your part, and it was probably a lot of fun along the way.

We first learn about Disapparating in the Chamber of Secrets, when Ron explains why his parents don’t need the flying car. Harry learns to cast a Patronus in the Prisoner of Azkaban to protect himself from Dementors. Death Eaters are also mentioned in the Prisoner of Azkaban, the same novel where we start to learn Harry’s connection to a whole movement of wizards that worked to defeat them. We dive deeper into our understanding of the Patronus charm in the Order of the Phoenix, where we find it can be manipulated by well-practiced wizards and witches into a means of communication. And although simple, the Stupefy spell is introduced only in the Goblet of Fire, where the seriousness of the Dark Arts threat begins to cast its shadow.

That’s a ton of information to swallow all at once. Imagine reading Deathly Hallows before the rest of the series and having to read an article from the Harry Potter Lexicon every time a new word or character pops up.

Using Harry Potter to make low-resource languages more resourceful

But the fact that a children’s novel can build a discourse around a fantasy world, a set of abstract ideas that don’t even exist, shows that we can certainly articulate new ideas in any language. The key is building up to them gradually, rather than introducing them all at once.

There are two ways you can build up knowledge. One is through oral education. That tends to be labor-intensive and costly, and it’s not so easy to go back and review what was said. To be fair, the accessibility of audio and video recordings has alleviated many of those concerns over the past decade or two.

The other method is through the written word, which survives long-term and can be easily reviewed as needed. But the kind of literacy needed to make use of written material requires time, patience, and stamina that readers can only attain through practice.

Translating Harry Potter into low-resource languages is one great way to give their speakers such practice. Not only is it a proven, enjoyable pastime for children, but its length and its genre as a fantasy novel give them practice in patience and stamina, and in piecing together a whole new world from slow and gradual detail.

Harry Potter has been translated into a number of low-resource languages precisely for this reason, some more successfully than others. Frisian, Occitan, and Luxembourgish are a few. As I mentioned in a previous post, the Maori translation was particularly successful, building its world from within Maori traditions. This should be done in more low-resource languages, while taking care not to overwhelm children with so many neologisms that they can’t keep up. That would also require J. K. Rowling and Warner Bros. to give translators more flexibility to adapt the story as needed: in some cases, restrictions on translating names like “Slytherin” may even be burdensome.

Want more?

Since the topic of this post is relatively dry, I kept it intentionally brief, despite having a lot more to say. If you’re interested in learning more about anything discussed in this post, leave a comment or contact me and let me know, and I can expand on that discussion in a future post!

Meanwhile, be sure to check back weekly for new posts at Potter of Babble, and follow us on Instagram and Twitter! And don’t forget to join the discussion on our email list.
