2025-11-24 16:41:08 UTC
Jimmy on Nostr:

One can recreate the results with a simpler transformer architecture, without multiple levels. The trick is the training setup and the iterative Q-learning loss, not the hierarchy or the recursion through latent space.
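As a rough illustration of the kind of iterative Q-learning loss meant here (a halt/continue Q-head trained with bootstrapped targets across improvement steps, in the spirit of adaptive-computation-time training), here is a minimal NumPy sketch. All names and the exact target rule are assumptions for illustration, not the original authors' code:

```python
import numpy as np

def iterative_q_targets(q_halt, q_continue, is_correct):
    """Bootstrapped targets for a halt/continue Q-head over T
    improvement steps (illustrative; the real setup may differ).

    q_halt, q_continue : (T,) predicted Q-values per step
    is_correct         : (T,) 1.0 if the step's answer is correct
    """
    T = len(q_halt)
    g_halt = is_correct.astype(float)       # reward for stopping now
    g_continue = np.empty(T)
    for t in range(T):
        if t == T - 1:
            g_continue[t] = g_halt[t]       # no step after the last one
        else:
            # bootstrap from the next step's best action (Q-learning target)
            g_continue[t] = max(q_halt[t + 1], q_continue[t + 1])
    return g_halt, g_continue

def q_loss(q_halt, q_continue, is_correct):
    """Mean squared error of both Q-heads against their targets."""
    g_h, g_c = iterative_q_targets(q_halt, q_continue, is_correct)
    return np.mean((q_halt - g_h) ** 2 + (q_continue - g_c) ** 2)
```

The point of the sketch is that this loss only needs one network applied iteratively with per-step supervision; nothing in it requires multiple hierarchical modules or recursion through a latent space.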