‘Like a small child’: Left alone, ChatGPT got up to mischief

Jan 17, 2024 at 04:20 pm by admin

South African publisher Daily Maverick experimented with OpenAI to discover what the technology could do and how it could be used, while learning from its editors.

In an INMA Ideas Blog post, MavEngine head developer Jason Norwood-Young told of the impact the “new-fangled GPT thing” had at the Cape Town company.

“Take GPT and use it to summarise the news. How hard could it be?” he asked.

Daily Maverick was among those blown away by the human-like responses OpenAI’s ChatGPT-3 gave to questions.

“Everyone wanted in on the hype, and news publishers – still suffering from FOMO and PTSD from the rise of the internet – were not going to be left out,” he says.

“Inspired by a talk from FT Strategies’ Tim Part on newsroom experimentation, I wanted to build something small to see what the technology could do. I started experimenting with OpenAI’s Playground, which offers more options than the usual ChatGPT interface.

“I fed articles to GPT and asked it to summarise, create headlines, check for errors, and create Tweets. That taught us its shortcomings quickly: GPT-3 was good at summarisation, OK at headlines, useless at finding copy errors. For Tweets, it was unpredictable, creating a brilliant Tweet one second and an unusable one the next. But we could see the potential; it was surprisingly creative, generating hashtags and the occasional emoji for Tweets.”

Norwood-Young says its biggest struggle was counting. “GPT-3 wouldn’t limit itself to 140 characters, or X number of sentences for a summary, or even summarise in four bullet points with any reliability.

“It’s the same reason GPT-3 is poor at rhyming: It cannot plan ahead. (This is what AI researchers call ‘thinking slow’. Large language models like GPT-3 are good at ‘thinking fast’, but planning ahead is their Achilles’ heel.)
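Since the model could not be trusted to count, one way to cope is to check the limits in code after generation and regenerate or flag anything that fails. The sketch below is only an illustration of that idea, not SummaryEngine's actual code; the `Limits` shape and all function names are hypothetical, and the sentence and bullet counting is deliberately rough.

```typescript
// Hypothetical limits; the article mentions 140 characters, a number of
// sentences, and four bullet points as targets the model failed to respect.
interface Limits {
  maxChars?: number;
  maxSentences?: number;
  bulletCount?: number;
}

// Count by Unicode code point, so a simple emoji counts once
// rather than as two UTF-16 code units.
function countChars(text: string): number {
  return Array.from(text).length;
}

// Rough sentence count: split on ., ! or ? followed by whitespace or end of text.
function countSentences(text: string): number {
  return text
    .split(/[.!?]+(?:\s+|$)/)
    .filter((part) => part.trim().length > 0).length;
}

// Count lines that start with a bullet marker (-, * or •).
function countBullets(text: string): number {
  return text.split("\n").filter((line) => /^\s*[-*•]\s+/.test(line)).length;
}

// Return a list of problems; an empty list means the output is within limits.
function checkLimits(text: string, limits: Limits): string[] {
  const problems: string[] = [];
  if (limits.maxChars !== undefined && countChars(text) > limits.maxChars) {
    problems.push(`too long: ${countChars(text)} > ${limits.maxChars} characters`);
  }
  if (limits.maxSentences !== undefined && countSentences(text) > limits.maxSentences) {
    problems.push(`too many sentences: ${countSentences(text)} > ${limits.maxSentences}`);
  }
  if (limits.bulletCount !== undefined && countBullets(text) !== limits.bulletCount) {
    problems.push(`expected ${limits.bulletCount} bullets, found ${countBullets(text)}`);
  }
  return problems;
}
```

A caller could pass `{ maxChars: 140 }` for a Tweet or `{ bulletCount: 4 }` for a bullet summary and request a fresh generation whenever the returned list is not empty.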

“Another problem is that OpenAI’s models want to summarise every major point in an article. This isn’t necessarily what a newspaper wants; depending on the use case, it might want to entice readers to read a summary giving a few top facts and then click through to the full article if they want more details. AI summaries didn’t complement the articles, they made them redundant.”

And he says if an article wasn’t written in a strict pyramid style, the summaries were often wildly off the point. “A flowery intro – common in opinion writing – would throw GPT-3 off completely.

“Occasionally the summaries would miss the hook completely or get the article factually wrong.”

Another issue was tone: The summaries were dry without careful prompting, lacking the stab-and-thrust of Daily Maverick’s typical jousting style. “This was solved largely by adding, ‘...in the style of Daily Maverick’,” he says.

However, adding this occasionally resulted in summaries that began, ‘According to an article in Daily Maverick…’ or ‘This article is about…’. “Gouge my eyes out!”
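The style instruction and the unwanted openers it sometimes produced can be illustrated with a small sketch. This is an assumption about how such a check might look, not the plugin's real prompt or code: `buildPrompt`, `META_OPENERS` and `hasMetaOpener` are hypothetical names, and the patterns cover only the two openers quoted above.

```typescript
// Hypothetical prompt builder: adds the style instruction quoted in the article.
function buildPrompt(article: string): string {
  return `Summarise the following article in the style of Daily Maverick:\n\n${article}`;
}

// The two openers the article cites as unwanted. Illustrative, not exhaustive.
const META_OPENERS: RegExp[] = [
  /^according to an article in daily maverick/i,
  /^this article is about/i,
];

// True when a generated summary starts by talking about the article
// instead of the story itself.
function hasMetaOpener(summary: string): boolean {
  const trimmed = summary.trim();
  return META_OPENERS.some((pattern) => pattern.test(trimmed));
}
```

A summary that fails this check could be regenerated automatically or simply flagged for the editor.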

“While GPT-3 was astonishingly good, it was also get-yourself-sued bad. Like a toddler, you wouldn’t want to leave it home alone, unsupervised, because, at some point, mischief would occur.”

With a “human-in-the-loop” – a live person assisting the technology – summaries were marked as unapproved until an editor had checked them. “The editor could decide to accept it as is, reject it (which immediately generates a new option), or edit the summary.”
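The review flow described here can be modelled as a small set of state transitions. The following is a minimal sketch under stated assumptions, not the plugin's implementation (which the article says is written in PHP): the type and function names are hypothetical, the generator is synchronous for simplicity, and treating an edit as an approval is an assumption the article does not confirm.

```typescript
type Status = "unapproved" | "approved";

interface Summary {
  articleId: number;
  text: string;
  status: Status;
}

// Stand-in for the call to the language model, injected so it can be mocked.
// A real call would be asynchronous.
type Generate = (articleId: number) => string;

// Every machine-generated summary starts out unapproved.
function newSummary(articleId: number, generate: Generate): Summary {
  return { articleId, text: generate(articleId), status: "unapproved" };
}

// Accept the summary as is.
function accept(summary: Summary): Summary {
  return { ...summary, status: "approved" };
}

// Reject: discard the text and immediately generate a new, unapproved option.
function reject(summary: Summary, generate: Generate): Summary {
  return newSummary(summary.articleId, generate);
}

// Edit: replace the text with the editor's version.
// Assumption: a human edit counts as approval.
function edit(summary: Summary, text: string): Summary {
  return { ...summary, text, status: "approved" };
}

// Only approved summaries should ever reach readers.
function isPublishable(summary: Summary): boolean {
  return summary.status === "approved";
}
```

Keeping the transitions as pure functions makes the rule easy to verify: nothing produced by the model is publishable until an editor has acted on it.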

Norwood-Young says he built two separate interfaces for what became the SummaryEngine WordPress plugin. One interface appeared while the editor was editing the article; the second was an overview of all summaries for all articles, which let an editor work through multiple summaries quickly.

“It was an inexpensive experiment to start generating summaries, see how editors use them, and build a foundation of knowledge, awareness, and acceptance of AI within the organisation.

“Of the two interfaces, only the article CMS is really used – providing one of my big takeaways from this project: Editors are busy people, apparently. They want to do everything on one screen, and this screen is the one they work on the most. Even when it’s someone’s job to use a web interface, the same rules that work on a potential browser or customer apply: ‘Don’t make me think’ and definitely ‘Don’t make me load a new page’.

“Two major design iterations of the user interface on the edit article page saw one of the smallest changes have one of the biggest impacts: changing ‘Unapprove’ to ‘Reject’. (Thanks to Styli Charalambous for suggesting this.)

“To my delight, the editors started using the summaries, but we still didn’t know what we were going to do with them.

“Options included a summary newsletter, use in our mobile app, or even as a popup or sidebar while reading an article. And while I wouldn’t usually recommend starting a project without a firm design, in this case it worked out very well: Daily Maverick has quietly launched a completely new interface with summaries as another cheap experiment to see how our users interact with them. It uses two types of summaries – a short summary and bullet points – and has quickly become my favourite way to read Daily Maverick. It was visited by over 17,000 readers in just one week.

“The fact that we could quickly put out this product, and maintain it, even though we’re just starting to experiment with AI, shows the promise of the technology.”

Norwood-Young says the project has also opened up the possibility of other use cases: experiments with GPT-4 show better results in terms of length and tone. It still cannot sub-edit or copy-edit worth a damn, but it can do translations and suggest headlines. “Rolled into the headline scorer I built – which has changed how news editors structure headlines based on what our readers respond to – it has helped Daily Maverick surpass 10 million unique readers a month.”

He says the interventions will again be built with people in mind, to give them a tool and not try to replace them. And again, a warning: “If you’re going to let AIs write headlines alone, expect the same result as giving a toddler access to a permanent marker, a bucket of paint, and your makeup drawer, and then leaving them alone for a few hours.”

Daily Maverick runs on WordPress, so SummaryEngine was built in PHP on top of WordPress, using MySQL as the data store, with some of the user interface built in Svelte with TypeScript. The GPT-3 Completion interface was used, but the team plans to trial GPT-3.5 and GPT-4 using the Chat interface.
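The practical difference between the two interfaces is the shape of the request: OpenAI's Completion endpoint (`/v1/completions`) takes a single `prompt` string, while the Chat endpoint (`/v1/chat/completions`) takes a list of role-tagged `messages`. The sketch below only builds the two request bodies and makes no network call; the instruction text, the function names and the model names passed in are assumptions for illustration, not details from SummaryEngine.

```typescript
// Body for the Completion interface: one flat prompt string.
interface CompletionRequest {
  model: string;
  prompt: string;
  max_tokens: number;
}

// Body for the Chat interface: a list of role-tagged messages.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  max_tokens: number;
}

// Hypothetical instruction; the plugin's real prompt is not given in the article.
const INSTRUCTION = "Summarise the following article in a short paragraph.";

// Completion style: instruction and article are concatenated into one prompt.
function toCompletionRequest(article: string, model: string, maxTokens: number): CompletionRequest {
  return { model, prompt: `${INSTRUCTION}\n\n${article}`, max_tokens: maxTokens };
}

// Chat style: the instruction goes in a system message, the article in a user message.
function toChatRequest(article: string, model: string, maxTokens: number): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: INSTRUCTION },
      { role: "user", content: article },
    ],
    max_tokens: maxTokens,
  };
}
```

Moving from one interface to the other is therefore mostly a matter of restructuring the prompt, which is one reason trialling newer models can be a fairly small change.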

