
English dictionaries in the age of the internet (III)

 Most people, of course, now never go near a dictionary, but simply type phrases into Wikipedia (used more often as a dictionary than an encyclopedia, research suggests) or rely on Google, which – through a deal with Oxford Dictionaries – offers thumbnail definitions, audio recordings of pronunciations, etymology, a graph of usage over time and translation facilities. If you want to know what a word means, you can just yell something at Siri or Alexa.

Dictionaries have been far too slow to adjust, argues Jane Solomon of Dictionary.com. “Information-retrieval is changing so fast,” she said. “Why don’t dictionaries respond intelligently to the semantic or user context, like figuring out that you’re searching for food words, and give you related vocabulary or recipes?” And not just words: “I’d love to include emojis; people are so creative with them. They’ve become a whole separate language. People sometimes need explanation; if you send your daughter the eggplant emoji, she might think that’s weird.”


Some have dared to dream even bigger than polysemous aubergines. One is a computer science professor at the Sapienza University of Rome called Roberto Navigli, who in 2013 soft-launched a site called Babelnet, which aims to be the dictionary to beat all dictionaries – in part by not really being a dictionary at all. Described as a “semantic network” that pulls together 15 existing resources including Wikipedia, Wiktionary and Microsoft Terminology, it aims to create a comprehensive, hierarchical map of not just English but of 271 languages simultaneously, making it the largest lexicon/encyclopedia/thesaurus/reference work on the web. Navigli told me that his real aim was to use “semantic technology” to enable the holy grail for software engineers everywhere: autonomous machine-reading of text. “This is the dream, right?” he said. “The machine that can read text and understand everything we say.”

Machines already understand a lot, of course. Some have talked of “culturomics”, a form of computational lexicology that uses corpus tools to analyse and forecast trends in human behaviour. A 31-month study of Twitter tried to measure the shifting sentiments of the British population about austerity, and there is even a claim – somewhat disputed – that a “passively crowd-sourced” study of global media could have foretold the Arab spring. At least on a large scale, computers, and the information giants who own and lease the data, may be able to comprehend language better than we comprehend it ourselves.
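To give a flavour of what “corpus tools” mean in practice, here is a deliberately toy sketch of sentiment tracking over time. It is not the method used in the Twitter austerity study or the “culturomics” research mentioned above; the word lists and sample texts are invented for illustration, and real studies use far larger corpora and more sophisticated models.

```python
from collections import Counter
from datetime import date

# Tiny, hand-picked sentiment lexicons -- purely illustrative, not the
# lexicons used in any of the studies mentioned in the article.
POSITIVE = {"growth", "recovery", "hope", "improve"}
NEGATIVE = {"cuts", "austerity", "crisis", "anger"}

# Invented sample "corpus": (date, text) pairs standing in for dated tweets.
corpus = [
    (date(2011, 1, 15), "Austerity cuts spark anger across the country"),
    (date(2011, 6, 3), "Signs of recovery as growth figures improve"),
    (date(2012, 2, 20), "Crisis deepens despite hope of recovery"),
]

def sentiment_score(text: str) -> int:
    """Crude score: count of positive words minus count of negative words."""
    words = Counter(text.lower().split())
    return sum(words[w] for w in POSITIVE) - sum(words[w] for w in NEGATIVE)

# Aggregate scores by month to see how the overall mood drifts over time.
by_month = Counter()
for when, text in corpus:
    by_month[(when.year, when.month)] += sentiment_score(text)

for (year, month), score in sorted(by_month.items()):
    print(f"{year}-{month:02d}: {score:+d}")
```

Scaled up to millions of dated texts, the same basic move – score each text, then aggregate by time period – is what lets researchers chart shifts in public mood from language alone.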

For lexicographers and Google alike, one linguistic frontier remains stubbornly inaccessible. Whereas it’s now easy to assemble written-text corpora and open a window on how language functions in a particular environment, doing so for spoken language has always been far harder. The reason is obvious: recording speech, then transcribing it and creating a usable database, is both time-consuming and hugely expensive. Speech corpora do exist, but are notoriously small and unrepresentative (it’s easy to work with court transcripts; far harder to eavesdrop on what lawyers say down the pub).

For lexicographers, speech is the most precious resource of all, and the most elusive. If you could capture large samples of it – people speaking in every context imaginable, from playgrounds to office canteens to supermarkets – you could monitor even more accurately how we use language, day to day. “If we cracked the technology for transcribing normal conversations,” Michael Rundell said, “it really would be a game-changer.”

For OED’s editors, this world is both exhilarating and, one senses, mildly overwhelming. The digital era has enabled Oxford lexicographers to run dragnets deeper and deeper through the language, but it has also threatened to capsize the operation. When you’re making a historical dictionary and are required to check each and every resource, then recheck those resources when, say, a corpus of handwritten 17th-century letters comes on stream, the problem of keeping the dictionary up to date expands to even more nightmarish proportions. Adding to that dictionary to accommodate new words – themselves visible in greater numbers than ever before, mutating ever-faster – increases the nightmare exponentially. “In the early years of digital, we were a little out of control,” Peter Gilliver told me. “It’s never-ending,” one OED lexicographer agreed. “You can feel like you’re falling into the wormhole.”

Adding to the challenge is a story that has become wearily familiar: while more people are consulting dictionary-like resources than ever, almost no one wants to shell out. Sales of hard-copy dictionaries have collapsed, far more calamitously than in other sectors. (OUP refused to give me figures, citing “commercial sensitivities”. “I don’t think you’ll get any publisher to fess up about this,” Michael Rundell told me.) While reference publishers amalgamate or go to the wall, information giants such as Google and Apple get fat by using our own search terms to sell us stuff. If you can get a definition by holding your thumb over a word on your smartphone, why bother picking up a book?

“Go to a dictionary conference these days and you see scared-looking people,” Rundell said. Although he trained as a lexicographer, he now mainly works as a consultant, advising publishers on how to use corpus-based resources. “It used to be a career,” he went on. “But there just aren’t the jobs there were 30 years ago.” He pointed to his shelves, which were strikingly bare. “But then I’m not sentimental about print; I gave most of my dictionaries away.”

Even if the infrastructure around lexicography has fallen away or been remade entirely, some things stay pleasingly consistent. Every lexicographer I spoke to made clear their distaste for “word-lovers”, who in the dictionary world are regarded as the type of person liable to scrawl “fewer” on to supermarket signs reading “10 items or less”, or recite “antidisestablishmentarianism” to anyone who will listen. The normally genial John Simpson writes crisply that “I take the hardline view that language is not there to be ‘enjoyed’”; instead, it is there to be used.

But love is, most grudgingly admit, what draws people to spend their lives sifting and analysing language. It takes a particular sort of human to be a “word detective”: something between a linguistics academic, an archival historian, a journalist and an old-fashioned gumshoe. Though hardly without its tensions – corpus linguists versus old-school dictionary-makers, stats nerds versus scholarly etymologists – lexicography seems to be one specialist profession with a lingering sense of common purpose: us against that ever-expanding, multi-headed hydra, the English language. “It is pretty obsessive-compulsive,” Jane Solomon said.

The idea of making a perfect linguistic resource was one most lexicographers knew was folly, she continued. “I’ve learned too much about past dictionaries to have that as a personal goal.” But then, part of the thrill of being a lexicographer is knowing that the work will never be done. English is always metamorphosing, mutating, evolving; its restless dynamism is what makes it so absorbing. “It’s always on the move,” said Solomon. “You have to love that.”

There are other joys, too: the thrill of catching a new sense, or crafting a definition that feels, if not perfect, at least right. “It sounds cheesy, but it can be like poetry,” Michael Rundell reflected. “Making a dictionary is as much an art as a craft.”

Despite his pessimism about the industry, he talked with real excitement about a project he was about to join, working with experts from the Goldfields Aboriginal Language Centre on indigenous Australian languages, scantily covered by lexicographers. “Dictionaries can make a genuine difference,” he said. “They give power to languages that might have had very little power in the past; they can help preserve and share it. I really believe that.”

Throughout it all, OED churns on, attempting to be ever so slightly more complete today than it was yesterday or the day before. The dictionary team now prefer to refer to it as a “moving document”. Words are only added; they are never deleted. When I suggested to Michael Proffitt that it resembled a proud but leaky Victorian warship whose crew were trying to plug the leaks and simultaneously keep it on course, he looked phlegmatic. “I used to say it was like painting the Forth bridge, never-ending. But then they stopped – a new kind of paint, I think.” He paused. “Now it’s just us.”

These days OED issues online updates four times a year; though it has not officially abandoned the idea of another print edition, that idea is fading. Seven months after I first asked how far they had got into OED3, I enquired again; the needle had crept up to 48.7%. “We are going to get it done,” Proffitt insisted, though as I departed Oxford, I thought James Murray might have raised a thin smile at that. If the update does indeed take until 2037, it will rival the 49 years it took the original OED to be created, whereupon it will presumably need overhauling all over again.

A few days ago, I emailed to see if “mansplain” had finally reached the OED. It had, but there was a snag – further research had pushed the word back a crucial six months, from February 2009 to August 2008. Then, no sooner had Paton’s entry gone live in January than someone emailed to point out that even this was inaccurate: they had spotted “mansplain” on a May 2008 blog post, just a month after the writer Rebecca Solnit had published her influential essay Men Explain Things to Me. The updated definition, Proffitt assured me, will be available as soon as possible.
