2025-11-04 07:42:20 UTC

Doc Orange on Nostr:

Have you looked into METR at all? They've done some interesting work on measuring AI model capability and autonomy against a benchmark of software engineering tasks, each with a baseline measurement of how long it takes a human to complete.

TL;DR: frontier models currently have a ~50% success rate on tasks that would take humans a little over two hours, and that task length has been doubling roughly every 7 months. If that trajectory holds, they'll have the autonomy to complete a 40-hour human work week's worth of work (at that same ~50% success rate) in less than three years.
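For anyone who wants to check the arithmetic, here's a rough back-of-the-envelope sketch. It assumes a starting horizon of exactly 2 hours (rounding down from "a little over two") and a constant 7-month doubling time. The function name and defaults are mine, not METR's.

```python
import math

def months_to_reach(target_hours, current_hours=2.0, doubling_months=7.0):
    """Months until the task-length horizon grows from current_hours to
    target_hours, assuming it doubles every doubling_months (an assumption,
    not a guarantee that the trend continues)."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months

months = months_to_reach(40.0)
print(f"{months:.1f} months (~{months / 12:.1f} years)")
# prints "30.3 months (~2.5 years)"
```

Starting from slightly more than 2 hours shaves a little off that, so "less than three years" holds as long as the doubling rate does.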

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/