Zeerak Ahmed has spent years in the U.S., working for some of the world’s biggest tech companies. But one thing he has grown frustrated with is how “computing treats non-Latin languages as second class citizens.” One such language is his mother tongue, Urdu, the national language and lingua franca of Pakistan, which is also widely spoken in India. Ahmed, who is from Lahore, has had many conversations with his friends and family about the difficulties of trying to use existing Urdu keyboards or read Urdu type. And he has witnessed many young people instead resorting to English or so-called Roman Urdu, using the Latin script to produce a phonetic transliteration, in the absence of a better solution.
While undertaking his Master’s degree in engineering design at Harvard University, he came up with his own solution. After five years of working on the project, last year he launched the Matnsaz iOS app. The app offers users a more refined Urdu keyboard that groups letters by shape, autocorrects and even suggests subsequent words. It’s a stark improvement on the standard Urdu keyboards available on mainstream devices.
Despite being the 10th most widely spoken language in the world, according to reference publication Ethnologue, Urdu has fallen behind in the digital age due to multiple limitations. Many Pakistanis outside of the tech industry believe that Urdu text is incompatible with computing, says Ahmed. But he argues that’s a flaw on the part of computing rather than the language. An effort is underway to change the narrative.
The challenges of the Urdu script
“We live in a text-saturated society, so the exposure the younger generations have to typographic complexity is very high,” says graphic designer and web developer Abeera Kamran. “They expect sophisticated results.” What’s available in Urdu often doesn’t meet those expectations, because the complexity of Urdu’s written form long led writers to resist digitization. (Urdu uses the Nastaliq font, an ornate and fluid variation of written Arabic that is particularly complex because the shape of each letter relies on the following letter.) As a result, there’s very little digital content available in Urdu that can compete with what users are used to in Latin scripts. Roman Urdu is often used as a stand-in online. Earlier attempts at digitizing the Urdu script relied on the Naskh Arabic font, which is straighter and therefore easier to code. But some have argued that the Naskh font is inferior to Nastaliq when used to express the Urdu language in writing. As more of our lives become dependent on digital information and communication, some worry that the lack of an accessible digital version of the true written form of the language might lead to Urdu becoming irrelevant for younger generations, who spend more time online than their elders.
“There’s this belief that you can’t use Urdu for modern purposes, and so it makes it hard for the language to evolve and stay relevant for young people,” says Ahmed.
Ahmed and Kamran are among those who are leading the push to prevent that from happening. The Matnsaz app is part of a larger initiative by the same name, which aims to build consumer and developer tools for Urdu online. Currently Ahmed’s work includes Makhzan, an open source Urdu text corpus, and Naqqash, a string processing library for Arabic script.
Ahmed says he’d been toying with the idea for years before starting the effort in earnest in 2017. “In Europe most people are using computers in their native languages, but in Pakistan we don’t do that,” he says. “If you talk to Pakistanis outside the tech industry they believe you can’t do modern computing in Urdu.” Ahmed believes Urdu could easily be used in computing in the same way if it was given the same importance as Latin scripts and had tools built to support it. That idea became the basis for Matnsaz because so much progress in Urdu was being hindered simply because the basic building blocks didn’t exist, he says.
Urdu is spoken by roughly 230 million people globally, largely in Pakistan and India, as well as among diaspora communities around the world. While there have been individual attempts to digitize the language, the gaps between those efforts need to be bridged for them to have a global impact, Kamran says. She notes that the adoption of typographic printing in Urdu didn’t happen until late in the 20th century, because of the complexity of the Nastaliq font and a lack of interest within Pakistani society in Naskh as an alternative. Before that, newspapers and books were handwritten and then photocopied to make multiple copies as needed.
Pakistan has a longstanding cultural tie with Nastaliq, and Lahori Nastaliq in particular—the style in which Urdu is written—which can be complex to code with existing datasets, Kamran says. The cultural associations with the language are also important to understand and consider in the effort to digitize Urdu, she adds. “I realized we can’t do any of this without a cultural reckoning of how Urdu and Nastaliq are ideologically tied to each other.”
Pakistan’s socio-political struggles and the push to create a Muslim-Pakistani national identity have resulted in resistance to certain changes deemed to be western- or foreign-influenced developments, Kamran says. Urdu is entwined with Pakistani identity, and is the country’s official language, though a number of other languages are spoken there. In pre-partition India, Urdu speakers used the language as a way of standing up against British colonization. Today, it remains a point of tension in India, where it is still spoken by millions of people, but is under threat. Although Urdu is not exclusively spoken by Muslims, it is closely associated with the practice of the faith in the region. “When it comes to Lahori Nastaliq, we think of both Pakistanis and Muslim identity together, and because of that we resisted change,” Kamran says.
This connection is what led Zeeshan Nasar and his father Nasrullah Mehr to start MehrType, a digital type foundry focusing on customized Urdu, Arabic, and Persian fonts. Mehr, an acclaimed calligrapher in Pakistan, works in the Lahori Nastaliq script. For him, the available ligature-based keyboards, which allow for flexibility when typing scripts like Arabic and Nastaliq by adapting to changing letter styles depending on what’s being typed, were simply not enough. Nasar calls ligature-based approaches to Urdu a ‘jugad,’ an Urdu term for a makeshift solution. “With ligature-based fonts, if you enter a new word into the keyboard, and it doesn’t recognize it, it will break it up and ruin the word and style of the letters,” Nasar says.
That’s why MehrType focused on creating character-based lightweight fonts that would work well for web embedding. Nasar says that if a file is too large, it takes too long to work when embedded into a link and can cause websites to slow down. The team is currently testing a new setting tool that will not only give their fonts better online security but will also include typography features. In the future, the company aims to preserve different Urdu calligraphy styles by turning them into typography. It also hopes to get more people involved in using Urdu digitally, by providing services to customize typography, offering short courses to learn Urdu typography, and converting existing books and texts to Mehr fonts.
Across the border in India, Sanjiv Saraf has curated the largest online library of Urdu poetry under the banner Rekhta. The online platform, named after an early name for the Urdu dialect, digitizes Urdu content to make it more accessible. The site started with work from 50 poets in 2013 and is now home to more than 5,000 writers’ work. Saraf’s goal is to bring Urdu to a wider audience, and the website presents Urdu literature and poetry in various formats. “A lot of the content we have online has also been recited so people get an idea of the diction, because pronunciation is also very important for the language,” Saraf says. While the Rekhta team, which consists of 230 employees and 100 volunteers, is scanning huge numbers of Urdu books to publish the contents online each day, they can’t truly digitize the works until they’re able to type out Urdu text and literature. The current lack of optical character recognition (OCR)—which converts typed, written or printed text into machine-encoded text—for Urdu has limited the accessibility of scanned materials in the language. Kamran says there are multiple reasons for this.
“Firstly, text reads differently in print and online, so that creates an accessibility issue. You can’t search in images and that means that any sort of research and finding [available] resources gets difficult, because you might never know they are there. Roman Urdu also has no set spellings so there’s no one way to search up Urdu material,” she says.
Kamran started her Master’s degree in typography after trying to create an Urdu website for Karachi Urban Lab, an organization focused on research, teaching and advocacy around development and urbanization in Karachi through data. She found the lack of typographical resources available to be a stumbling block. Her goal is to help contribute to the work that developers and language experts across the world are doing to digitize Urdu by reckoning with its cultural history. She says that the importance of Nastaliq cannot be understood until its links to Muslim-Pakistani identity building are equally understood. “Urdu and Nastaliq are ideologically tied to each other,” Kamran says. Because of the sensitivities around Urdu, she believes any changes to its presentation must be accepted in society before progress can be made.
“The result should be creating complex resources that users can use in documents to create detailed stylized documents in Urdu the same way we see formatting in English,” she says.
The current stage of development in Urdu digitization, with keyboards and basic fonts now available, has been a long time in the making and there is much more to be done. Nasar has been working on Urdu development for 18 years. Many of the fonts he wants to work on are still in progress simply because font development is such a costly process. But there has been great progress over the last decade. Developers have gone from relying for years on Inpage (a word processor and page layout program used for languages such as Arabic, Urdu and Persian) to having multiple efforts underway on data sets and design. This rapid growth in Urdu digitization efforts and resource development over the past few years offers hope that the foundation will become easier to build upon. And now, with AI language models in focus, tech giants like Google might help contribute to that momentum. The company announced in July that AI platform Google Bard now supports nine Indian languages, including Urdu.
Saraf has a front-row seat to the progress and is optimistic. “I don’t think Urdu is struggling online, with the way our readership is growing. We have 24 million followers on Rekhta for Urdu content and every month it’s growing,” he says. “So the key is simply presenting the information in an easy-to-access manner.”
The post The Fight to Preserve the Urdu Script in the Digital World appeared first on TIME.