✍️
by Alfonso de la Osa fonz@botverse.com
There is an easier and shorter version of this article here.
You might have heard that scientists still don’t know precisely how large language models (LLMs) perform all their feats, and this is only partly true. While the fundamental mechanisms of transformers are understood, the emergence of complex capabilities from scaled-up models is an active area of investigation. There has been a great deal of reverse-engineering effort, a field known as Mechanistic Interpretability, dedicated to understanding how these artificial minds work by examining their internal weights and activations to map model behaviors to specific components (Doshi-Velez & Kim, 2017; Lipton, 2018; Saphra & Wiegreffe, 2024). So why do even some of the scientists who created LLMs suggest they don’t fully know how all their capabilities arise? It is often because they didn’t explicitly engineer the models’ ability to tell coherent stories, share ideas, or exhibit rudimentary reasoning; these capabilities emerged as the models were scaled in parameters, data, and computational resources (Kaplan et al., 2020; Wei et al., 2022; Steinhardt, 2022). Artificial General Intelligence (AGI) is thus hypothesized by some to arise primarily from such emergent properties of large-scale neural systems rather than from explicitly hand-coded rules for each cognitive faculty. Throughout this article, “won’t be engineered” refers specifically to higher-order cognitive syntheses that arise atop an intentionally engineered substrate—architectures, learning paradigms, data pipelines, and, crucially, memory systems—rather than to the engineering of that substrate itself. Supporting this, recent work from Anthropic (2025) demonstrates that LLMs internally develop and represent features that correspond to human-interpretable concepts (e.g., the Golden Gate Bridge, or abstract concepts like “a critique of a philosophical argument”), indicating that higher-level cognition can self-organize from the lower-level statistical patterns learned during training on vast amounts of text. This self-organization is a key aspect of how complex behaviors might emerge (Anthropic, 2025; Hubinger et al., 2021).
Think of the brain; it is a collection of cortical columns, remarkably similar in their general structure across different regions (Mountcastle, 1997). When we are born, our brain has not yet formed the intricate web of connections between these columns that will ultimately enable complex intelligence. During development, as a consequence of sensory stimuli and interaction with the environment, these connections are dynamically created and refined, and intelligence appears. In nature, many, if not most, complex biological and cognitive systems are understood as emergent phenomena (Anderson, 1972; Forrest, 1990; Harper & Lewis, 2012). Here, “emergence” denotes qualitatively novel behaviors or capabilities that become observable only when a system surpasses certain scale or complexity thresholds—capabilities that were not explicitly programmed line by line but arise from the interaction of the underlying components (Wei et al., 2022; Steinhardt, 2022).
This natural emergence of intelligence in biological systems finds a compelling parallel in psychological theories of human intelligence, particularly the distinction between fluid intelligence (Gf) and crystallized intelligence (Gc), first extensively theorized by Raymond Cattell (1963) and John Horn (Horn & Cattell, 1966). Fluid intelligence refers to the innate ability to reason, perceive relationships among variables, solve novel problems, and think abstractly, largely independent of acquired knowledge or specific cultural content (Cattell, 1963; Horn, 1968). It’s considered the raw processing power and adaptability of the mind, often likened to the AI aspiration for systems that can flexibly tackle entirely new challenges (Carpenter et al., 1990). Crystallized intelligence, conversely, represents the accumulation of knowledge, facts, vocabulary, and skills acquired through education and experience (Horn, 1968). This is strikingly analogous to the vast information an LLM learns from its training data, which forms its knowledge base. In human development, Gf is thought to peak in early adulthood and may gradually decline thereafter, whereas Gc can continue to grow throughout an individual’s life as they accumulate more experiences and learning (Horn & Cattell, 1966). Importantly, these two forms of intelligence are not entirely separate; fluid intelligence often underpins the ability to effectively acquire and integrate the knowledge that constitutes crystallized intelligence (Colom et al., 2006; Chuderski, 2013; Wilhelm et al., 2013). The development of both Gf and Gc in humans is not a result of explicit programming of each individual skill, but rather emerges from the brain’s dynamic interaction with the world, leading to the strengthening of neural connections and the formation of complex cognitive abilities over time.
It is important to note, however, that the emergence of a new feature, whether in nature or in AI, typically happens on top of a very sophisticated system—either a carefully engineered one or one refined by eons of evolution. It is emergent, but not spontaneous from a simple substrate. The current Transformer architecture (Vaswani et al., 2017), which is at the core of every LLM, was indeed envisioned with translation between languages in mind, originally featuring an encoder to process the input language and a decoder to generate the output language. It so happened that simplifying this to a decoder-only architecture and then scaling it with vast amounts of data and computational power led to the emergence of the versatile LLMs we see today (Brown et al., 2020; Kaplan et al., 2020). The term “AGI” itself, however, lacks a universally accepted definition, with criteria varying significantly, making definitive claims about its emergence pathway challenging (Morris et al., 2023; Feng et al., 2024).
We don’t know precisely which engineering advancement will be the critical catalyst to cascade into AGI, nor do we know when this might occur; it could be relatively soon or far into the future. Engineers will undoubtedly create the foundational technological substrates, and my guess is that a pivotal element of this substrate will be increasingly sophisticated memory systems.
In this article, I’m not predicting whether these improvements will stem from human-driven labs or machine-driven (AI-assisted) discovery. Instead, I aim to help you, the reader, understand how these often unintended side effects of engineering advances—these emergent capabilities—are what many believe will lead us towards AGI.
There is substantial empirical evidence that scaling various forms of memory—such as active context within transformers and dedicated external memory modules—along with improvements to memory interfaces, significantly enhances AI capabilities. These advancements are leading to the beginnings of more sophisticated learning, rudimentary planning, and limited autonomy in specific contexts. The connection between robust memory systems and the development of these foundational cognitive skills is strong and increasingly well-documented. For instance, early architectures like Neural Turing Machines (NTMs) (Graves et al., 2014) demonstrated that an external differentiable memory allowed a recurrent controller to learn algorithmic tasks such as copying, sorting, and associative recall from examples. Differentiable Neural Computers (DNCs) (Graves et al., 2016) extended this by adding content-addressable, temporally linked memory, enabling them to solve graph-traversal QA and block puzzle tasks that were beyond the capacity of standard LSTMs. More recently, systems like Google’s Titans architecture (Behrouz et al., 2024) show promise by learning to memorize and manage historical context effectively at test time, handling sequences exceeding two million tokens.
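To make the memory-interface idea concrete, here is a minimal numpy sketch of the content-based addressing that sits at the heart of NTM/DNC-style external memory: a key vector is compared against every memory slot, a sharpened softmax turns the similarities into a read weighting, and writes apply an erase-then-add update under the same weighting. In the published architectures the key, erase, and add vectors are emitted by a trained neural controller and everything is learned end to end; the names, shapes, and constants below are illustrative only, not the original implementations.

```python
# Minimal sketch of NTM/DNC-style content-based memory addressing (illustrative only).
import numpy as np

def cosine_similarity(key, memory):
    # key: (D,), memory: (N, D) -> one similarity per memory slot, shape (N,)
    key_norm = key / (np.linalg.norm(key) + 1e-8)
    mem_norm = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    return mem_norm @ key_norm

def content_read(memory, key, beta=5.0):
    # A sharpened softmax over similarities gives the read weighting;
    # the read vector is the weighted sum of memory rows.
    sims = cosine_similarity(key, memory)
    weights = np.exp(beta * sims)
    weights /= weights.sum()
    return weights @ memory, weights

def write(memory, weights, erase, add):
    # Erase-then-add update, distributed across slots by the addressing weights.
    memory = memory * (1 - np.outer(weights, erase))
    return memory + np.outer(weights, add)

# Toy usage: in a real system a controller network would emit key/erase/add vectors.
M = np.random.randn(8, 16)                                   # 8 slots, 16-dim contents
read_vec, w = content_read(M, M[3] + 0.01 * np.random.randn(16))
M = write(M, w, erase=np.ones(16) * 0.5, add=np.random.randn(16))
```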
However, the proposition that only robust memory interfaces need to be meticulously engineered for AGI to emerge is an oversimplification. The complex systems that utilize these memory interfaces—including the core processing architectures (often sophisticated neural networks themselves), the advanced learning paradigms (such as self-supervised learning on vast datasets and reinforcement learning from various forms of feedback), and the overarching system design—are themselves products of intensive engineering efforts. As noted in critiques of purely emergence-driven AGI, the underlying neural architectures, the memory systems themselves (e.g., NTMs, DNCs), learning algorithms, and data pipelines represent significant engineering (Graves et al., 2014; Graves et al., 2016). The Hugging Face “Technical Framework for Building an AGI” (Hugging Face, 2024), for example, details a “core cognitive engine” and “symbolic field representations” as distinct but integral components alongside a multi-tier memory hierarchy, highlighting that the “engineered requirement” extends well beyond just the memory interface. These elements collectively create the complex, engineered substrate from which more advanced behaviors can arise.
Furthermore, while foundational cognitive abilities like pattern recognition, basic problem-solving, and certain types of learning show clear signs of emergence with scale and architectural refinement (Wei et al., 2022), the spontaneous appearance of robust, human-level common sense, deeply nuanced Theory of Mind (ToM), or genuine self-awareness solely from scaling memory (even with highly advanced interfaces) is not yet empirically demonstrated and remains largely speculative. Many researchers argue that current LLMs frequently lack robust common sense reasoning, often failing on tasks requiring basic physical or social understanding (Marcus & Davis, 2019; Mitchell, 2023; Ullman, 2023; Shanahan, 2023), and that their reasoning can be closer to sophisticated pattern matching than deep causal understanding (Marcus, 2020). Similarly, current ToM capabilities in LLMs are often described as nascent and non-robust, with success on benchmarks potentially attributable to exploiting statistical regularities rather than genuine understanding (Kosinski, 2023; Ullman, 2023; Alzahrani et al., 2024; Zhang et al., 2024). Claims of self-awareness are even more contentious, widely considered artifacts of training data rather than subjective experience (Metzinger, 2004; Butlin et al., 2023). These higher-order functions likely require further significant architectural innovations, different or more complex learning objectives, or other enabling factors beyond just memory capacity and efficient data access, possibly including embodiment, explicit causal reasoning modules, or hybrid neuro-symbolic architectures (Lake et al., 2017; Marcus, 2020; Roy et al., 2021; Hugging Face, 2024).
It’s plausible that the capabilities essential for navigating complex, open-ended, and socially rich environments—qualities integral to human-level general intelligence such as nuanced emotional understanding, rich common sense, and sophisticated social intuition (Theory of Mind), or even a recognizable form of creativity—may be among those most likely to benefit from an emergent pathway once a powerful foundational cognitive substrate is established. If core capabilities like robust memory, flexible learning, and effective planning are strongly engineered, they create a richer, more dynamic internal environment. Within such an environment, the “engineering cost” to elicit these more human-like attributes may decrease, as these features could arise more naturally or with lighter-touch guidance over time, building upon the complex interplay of the already established core functions. The stronger the engineered foundation, the more readily these subtle, higher-order qualities might self-organize.
The journey towards Artificial General Intelligence is unlikely to be a linear assembly of pre-defined modules. Instead, we anticipate a possible series of cascading emergences, where foundational advancements, particularly in the realm of memory, unlock subsequent layers of cognitive capabilities. As the substrate of intelligence becomes more robust and flexible, higher-order functions may arise with increasing spontaneity, building upon the shoulders of prior breakthroughs. This section outlines plausible pathways for such a cascade, starting from the bedrock of memory and ascending towards more complex cognitive phenomena. It’s important to note, however, that human cognitive development suggests these capabilities are often deeply intertwined and co-develop with complex feedback loops, rather than following a strictly linear progression.
1. From Fleeting Attention to Stable Memory: The Genesis of a Cognitive Workspace
The initial critical step lies in transforming the transient, fleeting activations within neural networks into more durable and accessible representations. Innovations like segment-level recurrence (Transformer-XL, Dai et al., 2019) and memory compression techniques (Compressive Transformer, Rae et al., 2020) have dramatically extended the effective “attentional span” of models, allowing them to process information over much longer timescales. More recent architectures like Titans (Behrouz et al., 2024) push this further, enabling models to learn to memorize and manage vast sequences by dynamically updating a neural long-term memory. This enhancement of Working Memory (the active processing of current information) and the initial encoding of Semantic Memory (general world knowledge) create a stable cognitive workspace. However, it is more accurate to characterize these mechanisms as vastly increased effective context windows or sophisticated medium-term caches than as independently addressable, persistent long-term storage; describing them as durable memory traces, while evocative, may overstate their equivalence to human long-term episodic memory (Graves et al., 2016; Dai et al., 2019; Rae et al., 2020).
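As a concrete illustration of how segment-level recurrence extends the attentional span, the sketch below caches hidden states from earlier segments and lets the current segment attend over the concatenation of cache and segment, in the spirit of Transformer-XL. Query/key/value projections, multiple heads, relative position encodings, and the stop-gradient on the cache are all omitted for brevity; shapes and constants are assumptions for the sake of the example.

```python
# Minimal sketch of segment-level recurrence: attend over cached + current states.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_with_memory(h_current, h_cached):
    # h_current: (L, D) states of the segment being processed
    # h_cached:  (M, D) states carried over from earlier segments
    d_k = h_current.shape[-1]
    context = np.concatenate([h_cached, h_current], axis=0)   # (M + L, D)
    scores = h_current @ context.T / np.sqrt(d_k)             # (L, M + L)
    return softmax(scores) @ context                          # (L, D)

# Toy usage: process a stream segment by segment, carrying a fixed-size cache,
# so each segment effectively "sees" further back than its own length.
D, seg_len, cache_len = 16, 4, 8
cache = np.zeros((0, D))
for _ in range(5):
    segment = np.random.randn(seg_len, D)
    out = attend_with_memory(segment, cache)
    cache = np.concatenate([cache, segment], axis=0)[-cache_len:]  # keep most recent states
```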
2. From Stable Memory to Adaptive Learning: Forging Malleable Knowledge
Once information can be reliably stored and accessed, the stage is set for more adaptive learning paradigms. Persistent representations, whether held in an extended context window or managed by dedicated external memory modules (as pioneered by Neural Turing Machines, Graves et al., 2014, and Memory-Augmented Neural Networks, Santoro et al., 2016), enable Few-Shot Learning. Models can rapidly bind new information to existing knowledge, adapting to novel tasks with minimal examples, often by using memory to quickly encode and retrieve new information without extensive weight updates for that specific new data (Santoro et al., 2016). Furthermore, robust Episodic Memory (memory of specific events) and mechanisms for consolidating Procedural Memory (memory for skills) are crucial for tackling the challenge of Continual Learning (as explored in “Rethinking Memory in AI,” arXiv:2505.00675; Momeni et al., 2025; Zhuang et al., 2024). This allows systems to acquire new skills and facts over time, ideally without catastrophically forgetting prior knowledge (McCloskey & Cohen, 1989; French, 1999), thereby enhancing overall Adaptability. It’s important to note that NTMs learn how to use their memory algorithmically via gradient descent during their initial training phase (Graves et al., 2014). Their subsequent “rapid learning” or adaptation then refers to applying this learned meta-algorithmic strategy to new instances of similar tasks or quickly binding new data to existing representations, rather than learning entirely new, complex algorithms without any further gradient-based signals (Graves et al., 2014; Santoro et al., 2016). Robust continual learning remains a significant challenge, with memory replay being one, but not a complete, solution (Lopez-Paz & Ranzato, 2017; Kirkpatrick et al., 2017).
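A simple way to see how stored episodic traces support continual learning is experience replay: keep a reservoir sample of past examples and interleave a few of them with every new batch, one common (though incomplete) remedy for catastrophic forgetting. The sketch below is generic rather than the mechanism of any specific paper cited above; `train_step` and the task iterator are hypothetical placeholders.

```python
# Minimal sketch of experience replay for continual learning (illustrative only).
import random

class ReplayBuffer:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.items = []     # (input, target) pairs from earlier tasks
        self.seen = 0

    def add(self, example):
        # Reservoir sampling keeps an approximately uniform sample of everything seen.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

def continual_train(tasks, train_step, buffer, replay_k=8):
    # `tasks` yields batches of (input, target) examples; `train_step` updates the model.
    for task in tasks:
        for batch in task:
            mixed = list(batch) + buffer.sample(replay_k)   # new data plus replayed memories
            train_step(mixed)                               # one gradient step on the mixture
            for example in batch:
                buffer.add(example)
```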
3. From Memory and Learning to Strategic Planning: Simulating Futures
With a foundation of stable memory and adaptive learning, the capacity for Planning and Decision Making begins to emerge. Stored Episodic trajectories (past experiences) and well-organized Semantic knowledge (facts and rules about the world) allow an agent to construct and consult internal World Models. As demonstrated by Differentiable Neural Computers (Graves et al., 2016) learning to solve graph-traversal tasks and block puzzles by leveraging memory to store and update state information, or modern Retrieval Augmented Generation systems performing multi-step reasoning (e.g., HopRAG, Liu et al., 2025; DualRAG, arXiv:2504.18243), structured memory enables internal simulation and foresight. This allows the system to move beyond reactive responses to deliberate, goal-oriented Problem Solving by evaluating potential action sequences. However, the planning demonstrated by systems like DNCs, while impressive, is typically task-specific and learned for particular structured environments. It does not necessarily represent general-purpose abstract planning applicable to entirely novel domains without retraining or significant adaptation (Graves et al., 2016).
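The sketch below illustrates the "internal simulation" idea in its simplest possible form: transitions an agent has experienced are stored in memory as edges of a graph, and a breadth-first search over that remembered graph yields an action plan before anything is executed in the world. This is an illustration of memory-enabled planning under my own simplifying assumptions, not the DNC's learned mechanism.

```python
# Minimal sketch of planning over remembered transitions (illustrative only).
from collections import deque, defaultdict

class TransitionMemory:
    def __init__(self):
        self.edges = defaultdict(list)   # state -> [(action, next_state), ...]

    def store(self, state, action, next_state):
        self.edges[state].append((action, next_state))

    def plan(self, start, goal):
        # Breadth-first search over stored transitions returns a sequence of actions.
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, actions = frontier.popleft()
            if state == goal:
                return actions
            for action, nxt in self.edges[state]:
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, actions + [action]))
        return None   # the goal is unreachable from what has been experienced so far

# Toy usage: remember a few transitions, then simulate a route before acting.
memory = TransitionMemory()
memory.store("A", "right", "B")
memory.store("B", "down", "C")
memory.store("A", "down", "D")
print(memory.plan("A", "C"))   # ['right', 'down']
```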
4. From Planning to Purposeful Autonomy: The Dawn of Self-Direction
When planning capabilities are coupled with Episodic Memory and intrinsic motivation systems, a more sophisticated form of Autonomy can materialize. Reinforcement learning agents such as Never Give Up and Agent57 (Badia et al., 2020), which leverage episodic memory to track novelty and generate intrinsic rewards for exploration, and Go-Explore (Ecoffet et al., 2021), which maintains an archive of previously visited states to return to and explore from, exemplify this. Such systems can set internal goals (e.g., curiosity-driven exploration) and pursue them without constant external curricula, leading to more self-directed learning and behavior in complex task solving. This marks a shift from merely executing plans to actively seeking out and achieving objectives in a more independent manner. However, this autonomy is typically situated within the context of solving a predefined (though complex) game or task. It does not yet imply general autonomy in open-ended environments or the self-determination of high-level goals beyond the scope of the specific RL problem (Ecoffet et al., 2021; Badia et al., 2020).
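To give a flavor of how episodic memory can drive intrinsic motivation, the sketch below computes a novelty bonus from the distances between the current state embedding and its nearest neighbours in this episode's memory, loosely in the spirit of Never Give Up. The embedding network, kernel normalisation, and reward scaling of the real agents are omitted; the class and constants here are my own illustrative choices.

```python
# Minimal sketch of an episodic novelty bonus for exploration (illustrative only).
import numpy as np

class EpisodicNovelty:
    def __init__(self, k=5, eps=1e-3):
        self.memory = []   # embeddings of states visited during the current episode
        self.k = k
        self.eps = eps

    def intrinsic_reward(self, embedding):
        if self.memory:
            dists = np.linalg.norm(np.array(self.memory) - embedding, axis=1)
            knn = np.sort(dists)[: self.k]
            # States close to many remembered states are familiar: their summed
            # inverse-distance "similarity" is high, so the bonus is small.
            # Novel states, far from everything in memory, earn a larger bonus.
            similarity = np.sum(1.0 / (knn + self.eps))
            bonus = 1.0 / np.sqrt(similarity)
        else:
            bonus = 1.0
        self.memory.append(embedding)
        return bonus

# Toy usage: repeated visits to nearly the same state earn shrinking bonuses.
novelty = EpisodicNovelty()
state = np.ones(4)
print([round(novelty.intrinsic_reward(state + 0.01 * i), 3) for i in range(3)])
```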
5. From Rich Memory and Concepts to Nascent Common Sense: Grounding Understanding
As the memory systems become vast, incorporating long context, structured knowledge (e.g., via RAG systems), and internally developed Conceptual Features (as suggested by Anthropic’s interpretability work, 2025), the groundwork is laid for the emergence of Common Sense Reasoning and broader Transfer Learning. A rich, interconnected web of information, accessible through sophisticated retrieval, allows a system to make more nuanced inferences, understand context more deeply, and begin to generalize knowledge across a wider array of situations. While robust, human-level common sense is a formidable challenge, with current LLMs often critiqued for lacking deep understanding and relying on pattern matching (Marcus & Davis, 2019; Mitchell, 2023; Shanahan, 2023; Ullman, 2023; arXiv:2505.10309), this pathway suggests how a sufficiently advanced memory and conceptual substrate could significantly lower the barrier to its development, potentially enhanced by modules for Causal Reasoning or hybrid neuro-symbolic approaches (Lake et al., 2017; Marcus, 2020).
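As one concrete shape such a retrieval-backed substrate can take, the sketch below runs a small multi-hop retrieval loop: evidence gathered on each hop is folded into the next query so that later hops can reach facts no single lookup would surface. The `retrieve` and `generate` callables are hypothetical stand-ins for a vector store and an LLM client, not the API of any of the systems cited above, and the prompts are illustrative.

```python
# Minimal sketch of multi-hop retrieval-augmented reasoning (illustrative only).

def multi_hop_answer(question, retrieve, generate, hops=3, k=4):
    evidence = []
    query = question
    for _ in range(hops):
        passages = retrieve(query, top_k=k)          # fetch candidate passages from memory
        evidence.extend(passages)
        # Ask the model what is still missing; stop early if nothing is.
        query = generate(
            "Question: " + question +
            "\nEvidence so far:\n" + "\n".join(evidence) +
            "\nWhat single follow-up query would fill the biggest gap? "
            "Reply NONE if the evidence already suffices."
        )
        if query.strip().upper() == "NONE":
            break
    # Final answer is grounded only in what was retrieved across the hops.
    return generate(
        "Answer the question using only the evidence.\n"
        "Question: " + question + "\nEvidence:\n" + "\n".join(evidence)
    )
```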
6. From Internal Monitoring to Meta-Cognition and Early Social Cognition: Reflecting Inward and Outward
With increasingly complex internal states and memory operations, the potential arises for an agent to monitor its own cognitive processes. This Introspection can lead to basic forms of Meta-cognition, such as developing confidence estimates for its own predictions or retrieved memories (as discussed in the “AI Awareness” paper, arXiv:2504.20084). This self-monitoring, particularly of its own knowledge and memory states, might also provide the initial scaffolding for rudimentary Theory of Mind (ToM)—the ability to model the knowledge and beliefs of other agents. While true social understanding is far more complex, and current LLM ToM capabilities are considered nascent and non-robust, often criticized as pattern matching rather than genuine understanding (Kosinski, 2023; Ullman, 2023; Alzahrani et al., 2024; Zhang et al., 2024; ToM surveys like arXiv:2502.06470), the ability to reflect on internal information states is a plausible precursor. The ethical implications of developing advanced, yet potentially misaligned, ToM also warrant careful consideration (Mukobi et al., 2023; Scheurer et al., 2024).
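A toy version of such a meta-cognitive signal is sketched below: the agent samples several answers and combines their agreement with the relevance scores of the memories it retrieved into a single rough confidence estimate. `sample_answer` and the retrieval scores are hypothetical inputs rather than any particular system's API, and a deployed system would calibrate this signal on held-out data rather than rely on a hand-picked weighting.

```python
# Minimal sketch of a self-monitoring confidence estimate (illustrative only).
from collections import Counter

def self_confidence(question, sample_answer, retrieval_scores, n_samples=5):
    # Signal 1: self-consistency. If independent samples agree, trust rises.
    answers = [sample_answer(question) for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    agreement = count / n_samples

    # Signal 2: how relevant the retrieved memories were (e.g., similarity
    # scores in [0, 1] from the memory lookup that supported the answer).
    support = sum(retrieval_scores) / len(retrieval_scores) if retrieval_scores else 0.0

    # Crude, uncalibrated combination of the two signals.
    confidence = 0.5 * agreement + 0.5 * support
    return top_answer, confidence
```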
7. From Meta-Cognition to a Computational Self-Model: The Emergence of Functional Self-Awareness
The culmination of self-referential access to its own vast memory, coupled with advanced meta-cognitive abilities to reflect on its internal states, history, and processing, could yield an explicit Computational Self-Model. This isn’t to claim the emergence of subjective, phenomenal consciousness (a far deeper philosophical and scientific question, explored in papers like arXiv:2502.05007; Butlin et al., 2023; Metzinger, 2004), but rather a functional Self-Awareness: an agent that can represent and reason about itself as a distinct entity with its own knowledge, capabilities, and goals. Current displays of “self-awareness” in LLMs are widely considered to be artifacts of their training data rather than genuine subjective experience or a stable self-model (Metzinger, 2004; Butlin et al., 2023). This step remains highly theoretical.
While the later stages of this cascade are more speculative, the progression highlights how engineering efforts focused on creating powerful, flexible, and scalable memory systems might not just incrementally improve performance, but could potentially unlock a sequence of emergent cognitive capabilities, bringing us closer to a more holistic artificial intelligence. However, the journey will likely involve addressing the profound engineering challenges of the memory systems themselves and acknowledging that genuinely robust higher-order cognition may demand architectural breakthroughs that go beyond memory alone.
References
Alzahrani, S., et al. (2024). LLM-ToM: Large Language Models show Theory-of-Mind but fail on complex tasks. arXiv preprint.
Anderson, P. W. (1972). More Is Different. Science, 177(4047), 393–396.
Anthropic. (2025). On the Biology of a Large Language Model. Transformer Circuits Thread.
Badia, A. P., et al. (2020). Agent57: Outperforming the Atari Human Benchmark. Proceedings of the 37th International Conference on Machine Learning (ICML).
Behrouz, A., Zhong, P., & Mirrokni, V. (2024). Titans: Learning to Memorize at Test Time. arXiv:2501.00663.
Brown, T. B., et al. (2020). Language Models are Few-Shot Learners. arXiv:2005.14165.
Butlin, P., et al. (2023). Consciousness in Artificial Intelligence: Insights from the Science of Consciousness. arXiv:2308.08708.
Carpenter, P. A., Just, M. A., & Shell, P. (1990). What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test. Psychological Review, 97(3), 404–431.
Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 54(1), 1–22.
Chuderski, A. (2013). When are working memory capacity and fluid intelligence the same? Journal of Experimental Psychology: General, 142(3), 797–804.
Colom, R., et al. (2006). Working memory and intelligence are highly related constructs, but why? Intelligence, 34(1), 33–48.
Dai, Z., et al. (2019). Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. arXiv:1901.02860.
Doshi-Velez, F., & Kim, B. (2017). Towards A Rigorous Science of Interpretable Machine Learning. arXiv:1702.08608.
Ecoffet, A., et al. (2021). First return, then explore. Nature, 590(7847), 580–586.
Feng, G., et al. (2024). Levels of AGI: Operationalizing Progress on the Path to AGI. arXiv:2401.00770.
Forrest, S. (1990). Emergent Computation: Self-organizing, Collective, and Cooperative Phenomena in Natural and Artificial Computing Networks. Physica D: Nonlinear Phenomena, 42(1-3), 1–11.
French, R. M. (1999). Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, 3(4), 128–135.
Graves, A., et al. (2014). Neural Turing Machines. arXiv:1410.5401.
Graves, A., et al. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471–476.
Harper, F. M., & Lewis, D. J. (2012). Emergence: A Philosophical Account. Oxford University Press.
Horn, J. L. (1968). Organization of abilities and the development of intelligence. Psychological Review, 75(3), 242–259.
Horn, J. L., & Cattell, R. B. (1966). Refinement and test of the theory of fluid and crystallized general intelligences. Journal of Educational Psychology, 57(5), 253–270.
Hubinger, E., et al. (2021). Risks from Learned Optimization in Advanced Machine Learning Systems. arXiv:1906.01820.
Hugging Face. (2024). A Technical Framework for Building an AGI. Hugging Face Blog.
Kaplan, J., et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361.
Kirkpatrick, J., et al. (2017). Overcoming catastrophic forgetting in neural networks. PNAS, 114(13), 3521–3526.
Kosinski, M. (2023). Theory of Mind May Have Spontaneously Emerged in Large Language Models. arXiv:2302.02083.
Lake, B. M., et al. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40, E253.
Lipton, Z. C. (2018). The Mythos of Model Interpretability. Queue, 16(3), 31–57.
Liu, H., et al. (2025). HopRAG: Hop-wise Retrieval Augmented Generation for Multi-hop Question Answering. arXiv:2502.12442.
Lopez-Paz, D., & Ranzato, M. (2017). Gradient episodic memory for continual learning. NIPS.
Marcus, G. (2020). The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence. arXiv:2002.06177.
Marcus, G., & Davis, E. (2019). Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon Books.
McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation, 24, 109–165.
Metzinger, T. (2004). Being no one: The self-model theory of subjectivity. MIT Press.
Mitchell, M. (2023). Why AI is Harder Than We Think. In Architects of Intelligence (pp. 559-575). Packt Publishing. (Conceptual, actual book is older, but sentiment is current).
Momeni, M., et al. (2025). Kernel Linear Discriminant Analysis for Continual Learning. (Illustrative; specific paper may vary).
Morris, M. R., et al. (2023). Levels of AGI for Operationalizing Progress on the Path to AGI. arXiv:2311.02462.
Mountcastle, V. B. (1997). The columnar organization of the neocortex. Brain, 120(4), 701–722.
Mukobi, J., et al. (2023). Exploitation in multi-agent systems. (Illustrative, specific paper on LLM deception varies).
Rae, J. W., et al. (2020). Compressive Transformers for Long-Range Sequence Modelling. arXiv:1911.05507.
Roy, N., et al. (2021). Embodied AI. MIT Press. (Conceptual, specific papers vary).
Santoro, A., et al. (2016). One-shot Learning with Memory-Augmented Neural Networks. arXiv:1605.06065.
Saphra, N., & Wiegreffe, S. (2024). Defining mechanistic interpretability. (Conceptual, specific paper may vary).
Scheurer, J., Balesni, M., & Hobbhahn, M. (2024). Strategic deception in LLMs. (Illustrative, specific paper on LLM deception varies).
Shanahan, M. (2023). Talking About Large Language Models. arXiv:2212.03551.
Steinhardt, J. (2022). Emergent Abilities of Large Language Models. (Blog post, often cited conceptually).
Ullman, T. (2023). Large Language Models Fail on Trivial Alterations of Theory-of-Mind Tasks. arXiv:2302.08399.
Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30.
Wei, J., et al. (2022). Emergent Abilities of Large Language Models. TMLR.
Wilhelm, O., et al. (2013). Working-memory capacity and intelligence—the same or different constructs? Psychological Science, 24(5), 759-768.
Zhang, M., et al. (2024). Data contamination and overfitting in ToM benchmarks. (Illustrative, specific paper varies).
Zhuang, H., et al. (2024). Continual learning with ridge regression. (Illustrative, specific paper varies).
Various Authors. (2024-2025). Papers cited from arXiv:2505.00675 (Rethinking Memory), arXiv:2505.10309 (Commonsense critiques), arXiv:2504.20084 (AI Awareness), arXiv:2502.06470 (ToM surveys), arXiv:2502.05007 (AI Consciousness), arXiv:2504.18243 (DualRAG).