In recent years, the phenomenon of grokking—where deep learning models exhibit a delayed yet sudden transition from memorization to generalization—has prompted renewed investigation into training dynamics and the mechanics of generalization.
This project is a hands-on exploration of the "grokking" phenomenon in neural networks, using a simple algorithmic task: modular addition (i.e., predicting (a + b) mod p). The project trains a small model on this task over long runs, tracking train and validation accuracy and watching for the characteristic delayed jump from memorization to generalization.
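To make the setup concrete, here is a minimal sketch of such an experiment in PyTorch. It is not the project's actual code: the architecture (a shared operand embedding feeding a two-layer MLP), the modulus p = 97, the 40% train split, and the optimizer settings are all illustrative assumptions.

```python
# Minimal grokking-style experiment sketch (illustrative, not the project's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

p = 97  # prime modulus for (a + b) mod p

# Full dataset: every pair (a, b) with label (a + b) mod p
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))  # shape (p*p, 2)
labels = (pairs[:, 0] + pairs[:, 1]) % p                        # shape (p*p,)

# Random train/validation split; the 40% train fraction is an assumption
perm = torch.randperm(p * p)
n_train = int(0.4 * p * p)
tr, va = perm[:n_train], perm[n_train:]

class AdderMLP(nn.Module):
    def __init__(self, p, d=128, h=256):
        super().__init__()
        self.embed = nn.Embedding(p, d)  # shared embedding for both operands
        self.mlp = nn.Sequential(nn.Linear(2 * d, h), nn.ReLU(), nn.Linear(h, p))

    def forward(self, x):                          # x: (batch, 2) integer operands
        return self.mlp(self.embed(x).flatten(1))  # logits over the p residues

model = AdderMLP(p)
# Nontrivial weight decay is a common ingredient in grokking experiments
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(20000):  # full-batch training
    opt.zero_grad()
    loss = F.cross_entropy(model(pairs[tr]), labels[tr])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            acc = (model(pairs[va]).argmax(-1) == labels[va]).float().mean()
        print(f"step {step:6d}  train loss {loss.item():.4f}  val acc {acc.item():.3f}")
```

In a run like this, the training loss typically collapses early while validation accuracy lingers near chance; the grokking signature, when it appears, is a later, sharp climb in validation accuracy long after the training set has been memorized.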
Grokking was first reported in 2022, when OpenAI researchers training small models on simple mathematical tasks such as modular addition noticed, by accident, that a network may train for thousands of epochs with little to no improvement in validation performance after memorizing the training set, and then suddenly jump to near-perfect generalization (Power et al., 2022).