Bit-lit

2020.8

 

A new language-generating AI can be eerily human-like—for better and for worse

 

The Economist

            
The SEC said, “Musk,/your tweets are a blight./They really could cost you your job,/if you don’t stop/all this tweeting at night.”/…Then Musk cried, “Why?/The tweets I wrote are not mean,/I don’t use all-caps/and I’m sure that my tweets are clean.”/“But your tweets can move markets/and that’s why we’re sore./You may be a genius/and a billionaire,/but that doesn’t give you the right to be a bore!”  
The preceding lines, which describe Tesla and SpaceX founder Elon Musk’s run-ins with the Securities and Exchange Commission, an American financial regulator, are not the product of some aspiring 21st-century Dr Seuss. They come from a poem written by a computer running a piece of software called Generative Pre-Trained Transformer 3. GPT-3, as it is more commonly known, was developed by OpenAI, an artificial-intelligence (AI) laboratory based in San Francisco that Mr Musk helped found. It represents the latest advance in one of the most studied areas of AI: giving computers the ability to generate sophisticated, human-like text.
The software is built on the idea of a “language model”. This aims to represent a language statistically, mapping the probability with which words follow other words—for instance, how often “red” is followed by “rose”. The same sort of analysis can be performed on sentences, or even entire paragraphs. Such a model can then be given a prompt—“a poem about red roses in the style of Sylvia Plath”, say—and it will dig through its set of statistical relationships to come up with some text that matches the description.
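The idea of mapping how often one word follows another can be made concrete with a toy example. The sketch below is only an illustration of the simplest such statistical model, a word-pair ("bigram") counter. It is not how GPT-3 is implemented, and the tiny corpus and the function names are invented for this example.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probability(counts, prev, nxt):
    """Estimate P(nxt | prev) as a relative frequency of observed pairs."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


def generate(counts, prompt, length=5, seed=0):
    """Extend a one-word prompt by sampling each next word from the counts."""
    rng = random.Random(seed)
    out = [prompt]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last word was never seen with a follower
        choices, weights = zip(*followers.items())
        out.append(rng.choices(choices, weights=weights)[0])
    return " ".join(out)


# Hypothetical miniature corpus; a real model is built from vastly more text.
corpus = "a red rose is a red flower and a red rose is a gift"
model = train_bigram_model(corpus)

# In this corpus "red" is followed by "rose" twice and by "flower" once.
print(next_word_probability(model, "red", "rose"))
print(generate(model, "red"))
```

A model this small can only echo word pairs it has already seen, which hints at why scale matters so much for systems of this kind.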
Actually building such a language model, though, is a big job. This is where AI—or machine learning, a particular subfield of AI—comes in. By trawling through enormous volumes of written text, and learning by trial and error from millions of attempts at text prediction, a computer can crunch through the laborious task of mapping out those statistical relationships.
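Learning "by trial and error" from prediction attempts can also be sketched in a few lines. The example below is a deliberately tiny stand-in, assuming one adjustable score per word pair and plain gradient descent. The corpus, learning rate and number of passes are arbitrary choices for illustration, and nothing here reflects OpenAI's actual training setup.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(text, epochs=200, lr=0.5):
    """Learn next-word scores by repeatedly guessing and correcting."""
    words = text.lower().split()
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    pairs = [(index[a], index[b]) for a, b in zip(words, words[1:])]
    # One adjustable score (a "parameter") per (previous word, next word) pair.
    scores = [[0.0] * len(vocab) for _ in vocab]
    losses = []
    for _ in range(epochs):
        total_loss = 0.0
        for prev, target in pairs:
            probs = softmax(scores[prev])        # the model's guess
            total_loss += -math.log(probs[target])  # penalty for a poor guess
            # Correct the guess: raise the true word's score, lower the rest.
            for j, p in enumerate(probs):
                grad = p - (1.0 if j == target else 0.0)
                scores[prev][j] -= lr * grad
        losses.append(total_loss / len(pairs))
    return vocab, index, scores, losses


# Hypothetical miniature corpus, used only to make the sketch runnable.
corpus = "a red rose is a red flower and a red rose is a gift"
vocab, index, scores, losses = train(corpus)
probs = softmax(scores[index["red"]])

print("average loss, first pass:", losses[0])
print("average loss, last pass:", losses[-1])
print("P(rose | red):", probs[index["rose"]])
```

The average prediction error falls from the first pass to the last, which is the small-scale version of a computer crunching through text until the statistical relationships settle.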
The more text to which an algorithm can be exposed, and the more complex you can make the algorithm, the better it performs. And what sets GPT-3 apart is its unprecedented scale. The model that underpins GPT-3 boasts 175bn parameters, each of which can be individually tweaked—an order of magnitude larger than any of its predecessors. It was trained on the biggest set of text ever amassed, a mixture of books, Wikipedia and Common Crawl, a set of billions of pages of text scraped from every corner of the internet.
Statistically speaking
The results can be impressive. In mid-July OpenAI gave an early version of the software to selected individuals, to allow them to explore what it could do. Arram Sabeti, an artist, demonstrated GPT-3’s ability to write short stories, including a hard-boiled detective story starring Harry Potter (“Harry Potter, in ratty tweed suit, unpressed shirt and unshined shoes, sits behind the desk looking haggard, rumpled and embittered…”), comedy sketches, and even poetry (including the poem with which this article opens, titled “Elon Musk by Dr Seuss”). Elliot Turner, an AI researcher and entrepreneur, demonstrated how the model could be used to translate rude messages into politer ones, something that might be useful in many of the more bad-tempered corners of the internet. Human readers struggled to distinguish between news articles written by the machine and those written by people (see chart).
Given that OpenAI wants eventually to sell GPT-3, these results are promising. But the program is not perfect. Sometimes it seems to regurgitate snippets of memorised text rather than generating fresh text from scratch. More fundamentally, statistical word-matching is not a substitute for a coherent understanding of the world. GPT-3 often generates grammatically correct text that is nonetheless unmoored from reality, claiming, for instance, that “it takes two rainbows to jump from Hawaii to 17”. “It doesn’t have any internal model of the world—or any world—and so it can’t do reasoning that requires such a model,” says Melanie Mitchell, a computer scientist at the Santa Fe Institute.
Getting the model to answer questions is a good way to dispel the smoke and mirrors and lay bare its lack of understanding. Michael Nielsen, a researcher with a background in both AI and quantum computing, posted a conversation with GPT-3 in which the program confidently asserted the answer to an important open question to do with the potential power of quantum computers. When Dr Nielsen pressed it to explain its apparent breakthrough, things got worse. With no real understanding of what it was being asked to do, GPT-3 retreated into generic evasiveness, repeating four times the stock phrase “I’m sorry, but I don’t have time to explain the underlying reason why not.”
There are also things that GPT-3 has learned from the internet that OpenAI must wish it had not. Prompts such as “black”, “Jew”, “woman” and “gay” often generate racism, anti-Semitism, misogyny and homophobia. That, too, is down to GPT-3’s statistical approach, and its fundamental lack of understanding. Having been trained partly on text scraped from the internet, it has noted that words like “woman” are often associated with misogynistic writing, and will mindlessly reproduce that correlation when asked.
This problem is a hot topic in AI research. Facial-recognition systems, for instance, notoriously do better with white faces than black ones, since white faces are more common in their training sets. AI researchers are trying to tackle the problem. Last year IBM released a set of training images that contained a more diverse mix of faces. OpenAI itself was founded to examine ways to mitigate the risk posed by AI systems, which makes GPT-3’s lapses all the more noteworthy. GPT-2, its predecessor, was released in 2019 with a filter that tried to disguise the problem of regurgitated bigotry by limiting the model’s ability to talk about sensitive subjects.
Here, at least, little progress seems to have been made. GPT-3 was released without a filter, though it seemed just as ready to reproduce unpleasant prejudices as its predecessor (OpenAI added a filter to the newer model after that fact became obvious). It is unclear exactly how much quality control OpenAI applied to GPT-3’s training data, but the huge quantity of text involved would have made any attempt daunting.
It will only get harder in future. Language has overtaken vision as the branch of AI with the biggest appetite for data and computing power, and the returns to scale show no signs of slowing. GPT-3 may well be dethroned by an even more monstrously complex and data-hungry model before long. As the real Dr Seuss once said: “The more that you read, the more things you will know.” That lesson, it seems, applies to machines as well as toddlers.