In a case study, researchers estimate that between 33% and 46% of Mechanical Turk workers used LLMs when completing a text summarization task

File this one under inevitable, but hilarious. Mechanical Turk is a service that from its earliest days seemed to invite shenanigans …

TechCrunch 2023-06-15 Devin Coldewey

Discussion

@manoelribeiro Manoel on x
One of our key sources of human data is no longer fully “human”! We estimate that 33-46% of crowd workers on MTurk used large language models (LLMs) in a text production task - which may increase as ChatGPT and the like become more popular and powerful. https://arxiv.org/... [ima…
@maggiekb1 Maggie Koerth on x
So a major place scientists go to run experiments is about 1/3 ai now. And those results are still getting treated as human, including in tasks used to train ai. https://twitter.com/...
@jeremybowers Jeremy Bowers on x
It's too late, we poisoned MTurk https://twitter.com/...
@noamchompers Noam Chompers on x
In addition to being a disaster for a lot of empirical work, this would also be very funny https://twitter.com/...
@emollick Ethan Mollick on x
In a great (& destructive) irony, Mechanical Turk is just AI, now. The MTurk crowdworking platform is a major place for researchers (and companies) to get humans to do small tasks & experiments. But this paper finds 33-46% of Turkers use LLMs to do tasks. https://arxiv.org/... [i…
@rdbinns @rdbinns on x
Our AI couldn't really do the job so we outsourced it to a human who could, but didn't, who outsourced it to another AI that couldn't really do the job either. https://twitter.com/...
@atabarrok Alex Tabarrok on x
Wait, so the mechanical Turk now actually is a mechanical Turk?! What a time to be alive! https://twitter.com/...
@perttu_h @perttu_h on x
Large Language Models are making MTurk etc. fundamentally unreliable in collecting text data, as predicted in our CHI paper (https://dl.acm.org/...) https://twitter.com/...
@veredshwartz Vered Shwartz on x
This seems consistent with my experience lately. We had to manually verify text written by annotators and filtered out quite a bit of text that looked LM-generated. In addition, the annotations for verification/ranking tasks were so noisy that we decided to not do them in mturk. …
@rstephens Robert Stephens on x
“Rechanical Turk” https://www.techmeme.com/...
@noahpinion Noah Smith on x
There goes every MTurk study... https://twitter.com/...

