Artificial creativity in the eye of the beholder

A shorter version of this essay was published in 2014 by TheConversation.com under the title "No need to panic – Artificial intelligence has yet to create a doomsday machine".


This month [December 2014] saw a remarkable intervention into the debate about the future of AI from the well-known physicist Stephen Hawking. The intelligent system that Hawking uses to convert facial movements into speech has recently been upgraded, allowing him to talk faster and more fluently. This is achieved partly through the software’s improved ability to predict what Hawking might wish to say next. Somewhat surprised by the machine’s facility for anticipating his next word, Hawking told a BBC reporter of his concern that AI will one day surpass human intelligence and be able to improve itself. Further, according to Hawking, this eventuality could mean the end of the human race.

Stephen Hawking in 2015 using his predictive speech device. CC BY 4.0.

Stephen Hawking is not alone in worrying about superhuman AI. As discussed in my earlier Conversation piece, “Superintelligent machines aren’t to be feared” [or in “Runaway human intelligence” in this blog], a growing number of futurologists, philosophers and AI researchers are beginning to express concerns about a possible rapid rise in AI capability that could leave the human race outsmarted and outmanoeuvred. 

My own view, however, is that such a scenario is unlikely, or at least very far off. The reason, as I argued in my previous essay, is that humans will always take advantage of the latest AI to boost their own intelligence. Therefore it will not be enough for a malevolent AI to outwit raw human brain power; rather, it will have to be better than us combined with whatever loyal AI technology we are able to command. That the combination of human and machine intelligence will be better than either alone is not only to be expected; there are already many good examples of it. For instance, Clive Thompson, in his book “Smarter Than You Think”, describes the current state of top-level chess, where AIs surpassed individual human grandmaster capability some time ago. It turns out that the best chess players in the world right now are not humans or AIs working alone, but humans and AIs (chess-playing programs) working together in teams.

Nevertheless, for those who are still afraid that the machines will one day take over, it would be useful to have measures of how close the AIs are getting, so that we know when to sound the alarm. Whilst I don’t believe that surpassing raw (unaided) human intelligence will be the trigger for an apocalypse, I would still agree that it provides an interesting benchmark. But how will we know once AI has reached it? Unfortunately, AI researchers do not agree on this issue either.

One of the best-known and longest-lasting ideas for benchmarking AI achievements is the so-called “Turing Test”, developed from a thought experiment described by the late, great code-cracking AI pioneer Alan Turing, whose ultimately tragic personal history was retold in the recent movie “The Imitation Game”. Turing took the question “Can a machine think?” and sought to turn it from a philosophical, mathematical or even theological question into a simple practical matter; indeed, into an “imitation game” that we can play with our computers. The challenge was, and is: could a machine take part in a conversation, on any topic whatsoever, in such a way that its human interviewer would be unable to tell whether they were communicating with another person or with a machine?

Turing, who died in 1954, four years after proposing his game, might have been surprised at the level of interest it would go on to generate. In 1991, the inventor Hugh Loebner launched an annual competition in honour of Turing, named the Loebner Prize, to try to find an AI, more specifically what we now term a ‘chatbot’, that could pass Turing’s test. The 2014 edition of the competition was held at what has to be the perfect venue: Alan Turing’s old stomping ground at Bletchley Park, where the Enigma code was once cracked. The panel of expert judges included writers, computer scientists, and the world-weary and technology-wise James May (of BBC’s “Top Gear” fame). With a substantial cash prize ($25,000) on offer to any eventual winner, one would hope for some significant improvement each year, given the relentless rise of the machine (at least in its smartphone/tablet variant). However, as Ian Hocking, one of the 2014 competition judges, reported in this blog [n.b. link no longer available], if today’s Loebner Prize entrants represent our best shot at human-like intelligence, then current AIs are only matching the tip of the iceberg of human cognition, and anything like success looks decades away.

Don’t be fooled, either, by efforts to water down the Turing test by imitating child-like intelligence or the linguistic capacity of a non-native speaker. The University of Reading recently exhibited a chatbot that was claimed to have passed the Turing test, but the challenge was to match the conversational capability of a thirteen-year-old Ukrainian boy, and the level of intelligence displayed, though reflecting some cunning on the part of the programmers, was a long way from what Turing might have envisaged. AI systems have been fooling people with simple conversational skills for decades, Joseph Weizenbaum’s artificial therapist Eliza, developed in the mid-1960s, being a famous early example. Eliza, like many early Loebner Prize entrants, simulated conversation using a pattern-matching algorithm (basic computer science in modern terms) rather than anything like human understanding. Eliza showed that you can fool some of the people some of the time, but the fact that the Loebner Prize has never been won demonstrates that, when performed correctly, the Turing test is a fairly demanding measure of human-level intelligence.
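
To give a feel for how little “understanding” this kind of pattern matching requires, here is a minimal sketch in Python of an Eliza-style responder. The rules, the word-swapping table and the function names `reflect` and `respond` are hypothetical illustrations written for this essay, not Weizenbaum’s original script; the program simply looks for a matching phrase, swaps pronouns in whatever follows, and slots it into a canned reply.

```python
import re

# Hypothetical Eliza-style rules: (pattern, reply template).
# Illustrative only; not Weizenbaum's original DOCTOR script.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first- and second-person words so echoed fragments read naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    """Lower-case the fragment and swap pronouns word by word."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(utterance: str) -> str:
    """Return the reply for the first matching rule, or a stock prompt."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(group) for group in match.groups()))
    # No rule matched: fall back to a content-free prompt.
    return "Please go on."


print(respond("I am worried about my exams."))
```

Given “I am worried about my exams.” the sketch replies “How long have you been worried about your exams?”, which can feel attentive even though nothing in the program represents worry, exams or the person typing.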

So if the Turing test cannot be passed (or at least not just yet), are there other key aspects of human intelligence at which AI might be doing better?  One recent proposal is that we might develop a measure that tests computer creativity.  After all, if robots are going to take over the world then they will need to at least match the human capacity for thinking “outside the box” that has got our species to where we are today.

Inspired by Turing’s imitation game, but also by an earlier proposal to measure machine creativity, Mark Riedl from Georgia Tech has proposed something he calls the Lovelace 2.0 test. The new test requires an AI to create an artefact that meets a plausible but arbitrarily complex set of design constraints, where meeting these requirements would be deemed evidence of creative thinking in a person (and thus, by extension, in a machine). The design constraints are set by an evaluator, who also judges whether the machine has succeeded. For example, the evaluator might ask the machine to “create a story in which a boy falls in love with a girl, aliens abduct the boy, and the girl saves the world with the help of a talking cat” (an example suggested by Riedl himself!). A crucial difference from the Turing test is that there is no comparator; that is, we are not testing the output of the machine against that of a person. Creativity, and by implication also intelligence, is therefore to be decided by appropriately chosen experts. Interestingly, Riedl suggests that we leave aesthetics out of the picture; judges are simply asked whether the output meets the constraints. Thus if the machine tells a suitable science fiction tale in which Jack, Jill and Bagpuss repel ET and save Earth, then that’s a pass, even if the result would rate as rather unoriginal in terms of children’s fiction.

I like the idea of testing creativity because I do think that there are talents underlying human inventiveness that AI developers have barely begun to fathom. However, the essence of Riedl’s test appears to be large-scale constraint satisfaction: challenging, perhaps, but not everyone’s idea of creativity. Also, I am not sure that I am ready to drop the competitive element of Turing’s verbal tennis match. With the Loebner Prize you at least get a result that is easy to agree on: the chatbot fooled this percentage of judges this much of the time. With Lovelace 2.0, creativity is in the eye of the beholder, and when it comes to art (even leaving out aesthetics), it can be hard for different judges to agree.

Reading the proposal for Lovelace 2.0, I was also reminded of the artist Harold Cohen, who created the robotic painter AARON, whose outputs have been exhibited in art galleries across the world. Cohen was sometimes surprised by AARON’s creations and considered his program to be creative, but only to a degree; he felt that much of the true creativity lay in the work he had done himself in writing the program. Riedl seems ambivalent about this too, watering down the requirement of an earlier creativity test, Lovelace 1.0, that the AI should produce an output that was surprising and unexplainable even to its program’s originator.

The 2013 Living Machines exhibition, at the London Science Museum, featured live visual art from Harold Cohen’s AARON (centre-top of photo) and a string quartet performing music generated by the Artificial Intelligence EMI, created by David Cope. Photograph by Sarah Prescott (used with permission).

Ada Lovelace, friend of the computer inventor Charles Babbage, and for whom Riedl has named his new test, is reported to have said that “the Analytical Engine [Babbage’s computer] has no pretensions to originate anything. It can do whatever we know how to order it to perform”. Modern AI systems can certainly surprise us with their behaviour; indeed, we now understand that complex systems of any kind, including computer programs, can have “emergent” properties that weren’t envisaged by their inventors.

Would Ada Lovelace recognise any present-day AIs as truly creative?

For me, however, the fact that an AI system does something surprising, whilst also meeting constraints, is a rather weak measure of creativity. In humans, creativity seems to rely heavily on our ability to re-imagine ideas: translating images into words and then back into ideas, ruminating over them for long periods during which they are subject to subconscious processes (where who knows what happens), shuffling them from one person’s brain into another via conversation or artistic interchange, then getting them back again in a modified form. These seem to be some of the prerequisites for the kind of creativity we see in human culture. Without such capacities, and I think we are some way from developing them in AI, I don’t see computers being able to match human creativity.

So, for the meantime, I believe AIs will be most successful when working alongside their human peers and benefiting from our ability to think imaginatively, while we make use of their capacity for faultless memory and perfect precision. It’s certainly worth monitoring AI’s progress against intelligence and creativity tests, but for the most part these show how far AI still has to go before it catches up with us. All things considered, I don’t think we need to hit the panic button just yet.
