It is easier to achieve good results from CAT when the input conforms not to a natural language in its raw and living state but to a restricted code, a delimited subspecies of a language. In an aircraft-maintenance manual you find only a subset of the full range of expressions possible in English. To produce the hundred or so language versions of the manual that are needed through an automatic-translation device, you do not need to make the device capable of handling restaurant menus, song lyrics, or party chitchat—just aircraft-maintenance language. One way of doing this is to pre-edit the input text into a regularized form that the computer program can handle, and to have proficient translators post-edit the output to make sure it makes sense (and the right sense) in the target tongue. Another way of doing it is to teach the drafters of the maintenance manuals a special, restricted language—Boeinglish, so to speak—designed to eliminate ambiguities and pitfalls within the field of aircraft maintenance. This is now a worldwide practice. Most companies that have global sales have house styles designed to help computers translate their material. From computers helping humans to translate we have advanced to having humans help computers out. It is just one of the truths about translation that show that a language is really not like a Meccano set at all. Languages can always be squeezed and shaped to fit the needs that humans have, even when that means squeezing them into computer-friendly shapes.
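The heart of a controlled language like the one described above is a closed, approved lexicon that drafters must stay inside. A minimal sketch of such a check might look like this — the word list and the example sentences are invented for illustration; real controlled languages, such as ASD Simplified Technical English, carry far richer rules about grammar and phrasing as well:

```python
# Toy controlled-language checker in the spirit of "Boeinglish".
# The approved lexicon below is hypothetical, not a real standard.
APPROVED_WORDS = {"remove", "the", "bolt", "before", "you",
                  "install", "panel", "do", "not", "turn"}

def check_sentence(sentence: str) -> list[str]:
    """Return the words that fall outside the approved lexicon."""
    words = sentence.lower().rstrip(".").split()
    return [w for w in words if w not in APPROVED_WORDS]

# A compliant maintenance instruction passes cleanly...
print(check_sentence("Remove the bolt before you install the panel."))  # → []
# ...while a chattier phrasing is flagged for pre-editing.
print(check_sentence("Kindly extract the bolt."))  # → ['kindly', 'extract']
```

A drafter (or a pre-editor) would then rewrite every flagged sentence until the checker is satisfied, which is exactly what makes the text tractable for the translation machinery downstream.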
Computer-aided human translation and human-aided computer translation are both substantial achievements, and without them the global flows of trade and information of the past few decades would not have been nearly so smooth. Until recently, they remained the preserve of language professionals. What they also did, of course, was to put huge quantities of translation products (translated texts paired with their source texts) in machine-readable form. The invention and the explosive growth of the Internet since the 1990s have made this huge corpus available for free to everyone with a terminal. And then Google stepped in.
Using software built on mathematical frameworks originally developed in the 1980s by researchers at IBM, Google has created an automatic-translation tool that is unlike all others. It is not based on the intellectual presuppositions of Weaver, and it has no truck with interlingua or invariant cores. It doesn’t deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before. It uses vast computing power to scour the Internet in the blink of an eye looking for the expression in some text that exists alongside its paired translation. The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the Web by individuals, libraries, booksellers, authors, and academic departments. Drawing on the already established patterns of matches between these millions of paired documents, GT uses statistical methods to pick out the most probable acceptable version of what’s been submitted to it. Much of the time, it works. It’s quite stunning. And it is largely responsible for the new mood of optimism about the prospects for FAHQT, Weaver’s original pie in the sky.