My research focuses on three interconnected areas. Below are my publications organised by these research themes, with each paper listed under its primary theme even where it overlaps with the others.

Open-Source LLMs and Responsible AI

I focus on developing open-source LLMs that are transparent and compliant with current legal frameworks. This work provides a foundation for society to build trustworthy AI products and services while enabling researchers to better understand the benefits and risks of LLM-based systems. Key technical challenges include quantifying the impact of training data governance decisions, understanding memorization risks in large-scale models, and developing infrastructure that democratizes access to language modeling research across different computational scales.
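
As a minimal illustration of what respecting web crawling opt-outs involves in practice, the sketch below filters a toy corpus by each source's robots.txt directives using Python's standard urllib.robotparser. The user-agent string, URLs, and corpus format are illustrative assumptions, not details taken from the papers listed below.

```python
# Minimal sketch: keep only documents whose source allows a given AI crawler.
# User agent, URLs, and corpus layout are illustrative placeholders.
from urllib import robotparser

def allowed_by_robots(url: str, robots_url: str, user_agent: str = "GPTBot") -> bool:
    """Return True if robots.txt at robots_url permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # download and parse the robots.txt file
    return parser.can_fetch(user_agent, url)

# Hypothetical corpus of crawled documents.
corpus = [{"url": "https://example.com/article", "text": "..."}]
kept = [doc for doc in corpus
        if allowed_by_robots(doc["url"], "https://example.com/robots.txt")]
```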

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs
Dongyang Fan, Vinko Sabolčec, Matin Ansaripour, Ayush Kumar Tarun, Martin Jaggi, Antoine Bosselut, Imanol Schlag. Preprint 2025
Positional Fragility in LLMs: How Offset Effects Reshape Our Understanding of Memorization Risks
Yixuan Xu, Antoine Bosselut, Imanol Schlag. Preprint 2025
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić, Dylan Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag. Preprint 2023

Neural Architecture Research

I advance neural architecture research through fast weight programmers such as DeltaNet, which contribute to arguably the most significant architectural innovation since the rise of the Transformer. Like linear RNNs such as Mamba or RWKV, these models offer greater efficiency and generality than attention-based architectures. This research explores self-referential weight matrices that learn to modify themselves, block-recurrent mechanisms that maintain global context while enabling parallel training, and novel activation functions derived through mathematical integration principles.
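
As a minimal sketch of the fast weight programming mechanism, the snippet below implements a delta-rule update in plain NumPy: a slow network would normally generate the keys, values, queries, and write strengths, which are random placeholders here, and the unit-norm key stands in for the feature maps used in the papers below.

```python
# Minimal sketch of a delta-rule fast weight programmer (DeltaNet-style).
# The slow network that would generate keys/values/queries is replaced by
# random placeholders; shapes and hyperparameters are illustrative.
import numpy as np

def delta_rule_fwp(keys, values, queries, betas):
    """Program a fast weight matrix W step by step and read it with queries.

    keys, values, queries: arrays of shape (T, d); betas: shape (T,) in [0, 1].
    Returns outputs of shape (T, d).
    """
    T, d = keys.shape
    W = np.zeros((d, d))                  # fast weights, rewritten every step
    outputs = np.zeros((T, d))
    for t in range(T):
        k = keys[t] / (np.linalg.norm(keys[t]) + 1e-8)   # unit-norm key
        v_old = W @ k                                     # value currently stored under k
        v_new = betas[t] * values[t] + (1 - betas[t]) * v_old
        W = W + np.outer(v_new - v_old, k)                # delta-rule write
        outputs[t] = W @ queries[t]                       # read out with the query
    return outputs

# Toy usage with random projections standing in for learned ones.
rng = np.random.default_rng(0)
T, d = 8, 16
y = delta_rule_fwp(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                   rng.normal(size=(T, d)), rng.uniform(size=T))
```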

Towards Fully FP8 GEMM LLM Training at Scale
Alejandro Hernández-Cano, Dhia Garbaya, Imanol Schlag, Martin Jaggi. Preprint 2025
Deriving Activation Functions Using Integration
Allen Hao Huang, Imanol Schlag. Preprint 2024
Block-Recurrent Transformers
DeLesley Hutchins*, Imanol Schlag*, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur. NeurIPS 2022
Fast Weight Programmers for Greater Systematic Generalisation in Language
Imanol Schlag. Thesis 2023
Learning Associative Inference Using Fast Weight Memory
Imanol Schlag, Tsendsuren Munkhdalai, Jürgen Schmidhuber. ICLR 2021
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie*, Imanol Schlag*, Róbert Csordás, Jürgen Schmidhuber. NeurIPS 2021
Linear Transformers are Secretly Fast Weight Programmers
Imanol Schlag*, Kazuki Irie*, Jürgen Schmidhuber. ICML 2021
A Modern Self-Referential Weight Matrix That Learns to Modify Itself
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber. Deep RL Workshop, NeurIPS 2021
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving
Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, Jianfeng Gao. Preprint 2019
Learning to Reason with Third Order Tensor Products
Imanol Schlag, Jürgen Schmidhuber. NeurIPS 2018
Gated Fast Weights for On-the-fly Neural Program Generation
Imanol Schlag, Jürgen Schmidhuber. NIPS Metalearning Workshop 2017

LLM Scaling and Generalization

I investigate fundamental questions about LLM scaling and generalisation, in particular how to train these systems more efficiently and how to enable them to generalise beyond their current limitations. This includes exploring self-modifying neural networks as a pathway toward more general AI systems. Technical focus areas include developing compute-optimal training strategies through adaptive scaling laws, investigating the role of tokenisation and language imbalance in cross-lingual transfer, and creating hybrid neuro-symbolic approaches that augment classical algorithms with neural components for robust generalisation.
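
As a concrete reference point for the scaling-law theme, the sketch below finds a compute-optimal parameter/token split under a standard Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta with the common C ≈ 6ND approximation. The coefficients are illustrative placeholders rather than fitted values from the papers below, and the adaptive-training aspects of that work are not modelled here.

```python
# Minimal sketch of compute-optimal allocation under a Chinchilla-style
# scaling law; coefficient values are illustrative placeholders only.
import numpy as np

def predicted_loss(N, D, E=1.7, A=400.0, B=400.0, alpha=0.34, beta=0.28):
    """Parametric loss as a function of parameters N and training tokens D."""
    return E + A / N**alpha + B / D**beta

def compute_optimal_split(C, n_grid=10_000):
    """Grid-search the model size N that minimises predicted loss at compute C."""
    Ns = np.logspace(6, 12, n_grid)   # candidate parameter counts
    Ds = C / (6.0 * Ns)               # tokens implied by C ≈ 6 * N * D
    losses = predicted_loss(Ns, Ds)
    i = int(np.argmin(losses))
    return Ns[i], Ds[i], losses[i]

N_opt, D_opt, L_opt = compute_optimal_split(C=1e21)
print(f"N* ≈ {N_opt:.2e} params, D* ≈ {D_opt:.2e} tokens, loss ≈ {L_opt:.3f}")
```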

On the Effect of (Near) Duplicate Subwords in Language Modelling
Anton Schäfer, Thomas Hofmann, Imanol Schlag*, Tiago Pimentel*. ACL 2024
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann. ICML 2024
The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments
Anton Schäfer, Shauli Ravfogel, Thomas Hofmann, Tiago Pimentel*, Imanol Schlag*. Preprint 2024
Large Language Model Programs
Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li. Preprint 2023
Mindstorms in Natural Language-Based Societies of Mind
Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, et al. Computational Visual Media 2025
Solving Quantitative Reasoning Problems with Language Models
Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, et al. NeurIPS 2022
Augmenting Classic Algorithms with Neural Components for Strong Generalisation on Ambiguous and High-Dimensional Data
Imanol Schlag, Jürgen Schmidhuber. AIPLANS Workshop, NeurIPS 2021
Improving Baselines in the Wild
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber. Workshop on Distribution Shifts, NeurIPS 2021