2025-11-04 07:42:20 UTC

Have you looked into METR at all? They’ve done some interesting work on measuring AI model capability and autonomy against a benchmark of software engineering tasks which have baseline measurements of how long they take a human to do.

TL;DR: frontier models currently have a ~50% success rate on tasks that would take a human a little over two hours, and that task length has been doubling roughly every 7 months. Assuming that trajectory holds, they’ll have the autonomy to complete a full 40-hour human work week’s worth of work in less than three years.

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
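For anyone who wants to check the "less than three years" figure, here is a rough back-of-the-envelope sketch of the extrapolation. It assumes a starting horizon of exactly 2 hours (a conservative reading of "a little over two hours"), a constant 7-month doubling time, and a 40-hour target. The function name and parameters are made up for illustration and are not from METR.

```python
import math


def months_until_horizon(current_hours, target_hours, doubling_months=7.0):
    """Months until the task horizon grows from current_hours to target_hours,
    assuming it keeps doubling every doubling_months (an assumption, not a given)."""
    if current_hours <= 0 or target_hours <= 0 or doubling_months <= 0:
        raise ValueError("all inputs must be positive")
    # Number of doublings needed to get from the current horizon to the target.
    doublings = math.log2(target_hours / current_hours)
    # If the target is already reached, no time is needed.
    return max(0.0, doublings * doubling_months)


# Assumed inputs: 2-hour horizon today, 40-hour work week as the target.
months = months_until_horizon(2.0, 40.0)
print(f"{months:.1f} months (~{months / 12:.1f} years)")
# prints "30.3 months (~2.5 years)"
```

Getting from 2 hours to 40 hours takes about 4.3 doublings, so roughly 30 months. A starting horizon slightly above 2 hours shortens that a bit, and the whole estimate only holds if the doubling trend continues.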

Author Public Key

npub1jgnwmgfxcch992rkaqm5sk740vzsnpnmkrrnrm3pxu3uv6smmc8qhkqcgk

Seen on

wss://relay.bitcoinpark.com

Doc Orange on Nostr