My research focuses on three interconnected areas. Below are my publications organised by these research themes, with each paper listed under its primary theme even where it overlaps with the others.

Open-Source LLMs and Responsible AI

I focus on developing open-source LLMs that are transparent and compliant with current legal frameworks. This work provides a foundation for society to build trustworthy AI products and services while enabling researchers to better understand the benefits and risks of LLM-based systems. Key technical challenges include quantifying the impact of training data governance decisions, understanding memorization risks in large-scale models, and developing infrastructure that democratizes access to language modeling research across different computational scales.
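
As a minimal illustration of what respecting web crawling opt-outs involves in practice, the sketch below filters a toy corpus by each source's robots.txt directives using Python's standard urllib.robotparser. The user-agent string, URLs, and corpus format are illustrative assumptions, not details taken from the papers listed below.

```python
# Minimal sketch: keep only documents whose source allows a given AI crawler.
# User agent, URLs, and corpus layout are illustrative placeholders.
from urllib import robotparser

def allowed_by_robots(url: str, robots_url: str, user_agent: str = "GPTBot") -> bool:
    """Return True if robots.txt at robots_url permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # download and parse the robots.txt file
    return parser.can_fetch(user_agent, url)

# Hypothetical corpus of crawled documents.
corpus = [{"url": "https://example.com/article", "text": "..."}]
kept = [doc for doc in corpus
        if allowed_by_robots(doc["url"], "https://example.com/robots.txt")]
```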

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs
Dongyang Fan, Vinko Sabolčec, Matin Ansaripour, Ayush Kumar Tarun, Martin Jaggi, Antoine Bosselut, Imanol Schlag. Preprint 2025
Positional Fragility in LLMs: How Offset Effects Reshape Our Understanding of Memorization Risks
Yixuan Xu, Antoine Bosselut, Imanol Schlag. Preprint 2025
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić, Dylan Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag. Preprint 2023

Neural Architecture Research

I advance neural architecture research through fast weight programmers such as DeltaNet, which contribute to arguably the most significant architectural innovation since the rise of the Transformer. Like linear RNNs such as Mamba or RWKV, these models offer greater efficiency and generality than attention-based architectures. This research explores self-referential weight matrices that learn to modify themselves, block-recurrent mechanisms that maintain global context while enabling parallel training, and novel activation functions derived through mathematical integration principles.
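
As a minimal sketch of the fast weight programming mechanism, the snippet below implements a delta-rule update in plain NumPy: a slow network would normally generate the keys, values, queries, and write strengths, which are random placeholders here, and the unit-norm key stands in for the feature maps used in the papers below.

```python
# Minimal sketch of a delta-rule fast weight programmer (DeltaNet-style).
# The slow network that would generate keys/values/queries is replaced by
# random placeholders; shapes and hyperparameters are illustrative.
import numpy as np

def delta_rule_fwp(keys, values, queries, betas):
    """Program a fast weight matrix W step by step and read it with queries.

    keys, values, queries: arrays of shape (T, d); betas: shape (T,) in [0, 1].
    Returns outputs of shape (T, d).
    """
    T, d = keys.shape
    W = np.zeros((d, d))                  # fast weights, rewritten every step
    outputs = np.zeros((T, d))
    for t in range(T):
        k = keys[t] / (np.linalg.norm(keys[t]) + 1e-8)   # unit-norm key
        v_old = W @ k                                     # value currently stored under k
        v_new = betas[t] * values[t] + (1 - betas[t]) * v_old
        W = W + np.outer(v_new - v_old, k)                # delta-rule write
        outputs[t] = W @ queries[t]                       # read out with the query
    return outputs

# Toy usage with random projections standing in for learned ones.
rng = np.random.default_rng(0)
T, d = 8, 16
y = delta_rule_fwp(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                   rng.normal(size=(T, d)), rng.uniform(size=T))
```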

Towards Fully FP8 GEMM LLM Training at Scale
Alejandro Hernández-Cano, Dhia Garbaya, Imanol Schlag, Martin Jaggi. Preprint 2025
Deriving Activation Functions Using Integration
Allen Hao Huang, Imanol Schlag. Preprint 2024
Block-Recurrent Transformers
DeLesley Hutchins*, Imanol Schlag*, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur. NeurIPS 2022
Fast Weight Programmers for Greater Systematic Generalisation in Language
Imanol Schlag. Thesis 2023
Learning Associative Inference Using Fast Weight Memory
Imanol Schlag, Tsendsuren Munkhdalai, Jürgen Schmidhuber. ICLR 2021
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie*, Imanol Schlag*, Róbert Csordás, Jürgen Schmidhuber. NeurIPS 2021
Linear Transformers are Secretly Fast Weight Programmers
Imanol Schlag*, Kazuki Irie*, Jürgen Schmidhuber. ICML 2021
A Modern Self-Referential Weight Matrix That Learns to Modify Itself
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber. Deep RL Workshop, NeurIPS 2021
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving
Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, Jianfeng Gao. Preprint 2019
Learning to Reason with Third Order Tensor Products
Imanol Schlag, Jürgen Schmidhuber. NeurIPS 2018
Gated Fast Weights for On-the-fly Neural Program Generation
Imanol Schlag, Jürgen Schmidhuber. NIPS Metalearning Workshop 2017

LLM Scaling and Generalization

I investigate fundamental questions about LLM scaling and generalisation, in particular how to train these systems more efficiently and how to enable them to generalise beyond their current limitations. This includes exploring self-modifying neural networks as a pathway toward more general AI systems. Technical focus areas include developing compute-optimal training strategies through adaptive scaling laws, investigating the role of tokenisation and language imbalance in cross-lingual transfer, and creating hybrid neuro-symbolic approaches that augment classical algorithms with neural components for robust generalisation.
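
As a concrete reference point for the scaling-law theme, the sketch below finds a compute-optimal parameter/token split under a standard Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta with the common C ≈ 6ND approximation. The coefficients are illustrative placeholders rather than fitted values from the papers below, and the adaptive-training aspects of that work are not modelled here.

```python
# Minimal sketch of compute-optimal allocation under a Chinchilla-style
# scaling law; coefficient values are illustrative placeholders only.
import numpy as np

def predicted_loss(N, D, E=1.7, A=400.0, B=400.0, alpha=0.34, beta=0.28):
    """Parametric loss as a function of parameters N and training tokens D."""
    return E + A / N**alpha + B / D**beta

def compute_optimal_split(C, n_grid=10_000):
    """Grid-search the model size N that minimises predicted loss at compute C."""
    Ns = np.logspace(6, 12, n_grid)   # candidate parameter counts
    Ds = C / (6.0 * Ns)               # tokens implied by C ≈ 6 * N * D
    losses = predicted_loss(Ns, Ds)
    i = int(np.argmin(losses))
    return Ns[i], Ds[i], losses[i]

N_opt, D_opt, L_opt = compute_optimal_split(C=1e21)
print(f"N* ≈ {N_opt:.2e} params, D* ≈ {D_opt:.2e} tokens, loss ≈ {L_opt:.3f}")
```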

On the Effect of (Near) Duplicate Subwords in Language Modelling
Anton Schäfer, Thomas Hofmann, Imanol Schlag*, Tiago Pimentel*. ACL 2024
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann. ICML 2024
The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments
Anton Schäfer, Shauli Ravfogel, Thomas Hofmann, Tiago Pimentel*, Imanol Schlag*. Preprint 2024
Large Language Model Programs
Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li. Preprint 2023
Mindstorms in Natural Language-Based Societies of Mind
Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, et al. Computational Visual Media 2025
Solving Quantitative Reasoning Problems with Language Models
Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, et al. NeurIPS 2022
Augmenting Classic Algorithms with Neural Components for Strong Generalisation on Ambiguous and High-Dimensional Data
Imanol Schlag, Jürgen Schmidhuber. AIPLANS Workshop, NeurIPS 2021
Improving Baselines in the Wild
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber. Workshop on Distribution Shifts, NeurIPS 2021