In a case study, researchers estimate that between 33% and 46% of Mechanical Turk workers used LLMs when completing a text summarization task
File this one under inevitable, but hilarious. Mechanical Turk is a service that from its earliest days seemed to invite shenanigans …
Devin Coldewey, TechCrunch
Discussion
-
@manoelribeiro
Manoel
on x
One of our key sources of human data is no longer fully “human”! We estimate that 33-46% of crowd workers on MTurk used large language models (LLMs) in a text production task - which may increase as ChatGPT and the like become more popular and powerful. https://arxiv.org/... [ima…
-
@maggiekb1
Maggie Koerth
on x
So a major place scientists go to run experiments is about 1/3 ai now. And those results are still getting treated as human, including in tasks used to train ai. https://twitter.com/...
-
@jeremybowers
Jeremy Bowers
on x
It's too late, we poisoned MTurk https://twitter.com/...
-
@noamchompers
Noam Chompers
on x
In addition to being a disaster for a lot of empirical work, this would also be very funny https://twitter.com/...
-
@emollick
Ethan Mollick
on x
In a great (& destructive) irony, Mechanical Turk is just AI, now. The MTurk crowdworking platform is a major place for researchers (and companies) to get humans to do small tasks & experiments. But this paper finds 33-46% of Turkers use LLMs to do tasks. https://arxiv.org/... [i…
-
@rdbinns
@rdbinns
on x
Our AI couldn't really do the job so we outsourced it to a human who could, but didn't, who outsourced it to another AI that couldn't really do the job either. https://twitter.com/...
-
@atabarrok
Alex Tabarrok
on x
Wait, so the mechanical Turk now actually is a mechanical Turk?! What a time to be alive! https://twitter.com/...
-
@perttu_h
@perttu_h
on x
Large Language Models are making MTurk etc. fundamentally unreliable in collecting text data, as predicted in our CHI paper (https://dl.acm.org/...) https://twitter.com/...
-
@veredshwartz
Vered Shwartz
on x
This seems consistent with my experience lately. We had to manually verify text written by annotators and filtered out quite a bit of text that looked LM-generated. In addition, the annotations for verification/ranking tasks were so noisy that we decided to not do them in mturk. …
-
@rstephens
Robert Stephens
on x
“Rechanical Turk” https://www.techmeme.com/...
-
@noahpinion
Noah Smith
on x
There goes every MTurk study... https://twitter.com/...