I'm an AI Research Scientist at the ETH AI Center and co-lead of Apertus, developed as part of the Swiss AI Initiative. Together with a large team of engineers, researchers, and students across multiple Swiss institutions, we are currently developing the largest open-source, responsibly trained, and compliant large language model (LLM) in the world. I also teach a novel course at ETHZ called Large-Scale AI Engineering, which gives MSc students hands-on, practical training in efficiently training large distributed neural networks on the Alps supercomputer at CSCS.
Bio. I began my career with an apprenticeship in informatics at a Swiss bank, followed by my military service. I then earned a BSc in Computer Science from FHNW and an MSc in Artificial Intelligence with distinction from the University of St Andrews, Scotland. In 2023 I completed my PhD with distinction at USI/IDSIA under Prof. Jürgen Schmidhuber, focusing on the systematic generalisation of neural networks and on fast weight programmers, a family of scalable self-modifying neural architectures (thesis). During my PhD I was invited to join Meta FAIR, Google Research, and Microsoft Research for research internships, where I investigated foundational questions in neural computation, scalable neural network architectures, and LLMs. After my defense I worked as a postdoctoral researcher with Prof. Thomas Hofmann before moving to the ETH AI Center.
Opportunities
Students. We welcome motivated MSc students from ETHZ, EPFL, and other universities to join our research efforts through a semester project, an MSc thesis, or a student assistant position. We offer research and engineering opportunities across topics such as LLM development, high-performance infrastructure, and responsible AI. Students can apply through our application form.
Engineers. We're actively hiring machine learning research engineers to join our team, which develops cutting-edge foundation models in collaboration with researchers across the Swiss AI Initiative. Open positions are available through ETHZ or EPFL.
Research Focus
My research centers on three interconnected areas that advance both the capabilities and responsibility of large-scale AI systems.
First, I focus on developing Apertus, a family of state-of-the-art open-source LLMs that are transparent and compliant with current legal frameworks. This work gives society a foundation for building trustworthy AI products and services while enabling researchers to better understand the benefits and risks of LLM-based systems.
Second, I advance neural architecture research through fast weight programmers such as DeltaNet, which contribute to one of the most significant architectural innovations since the rise of the Transformer. Like linear RNNs such as Mamba or RWKV, DeltaNet offers greater efficiency than attention-based architectures without sacrificing generality (a minimal sketch of its update rule follows at the end of this section). Recently, DeltaNet became a core component of Qwen3-Next, a major release from one of the world's leading AI labs.
Third, I investigate fundamental questions around LLM scaling and generalisation: in particular, how to train these systems more efficiently and how to enable them to generalise beyond their current limitations. This includes exploring self-modifying neural networks as a pathway toward more general AI systems.
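To make the fast weight programmer idea concrete, here is a minimal single-head sketch of the delta-rule update at the heart of DeltaNet, following Linear Transformers are Secretly Fast Weight Programmers (Schlag, Irie, Schmidhuber, ICML 2021). The softmax-normalised keys, random inputs, and per-step write strength are simplifying assumptions for illustration; they stand in for the learned projections, feature map, and gating of the full model.

```python
import torch

def delta_rule_step(W, k, v, q, beta):
    """One delta-rule fast weight update followed by a read.

    W:    (d_v, d_k) fast weight matrix acting as short-term memory
    k, q: (d_k,) normalised key and query vectors
    v:    (d_v,) value vector to store
    beta: scalar write strength in [0, 1]
    """
    v_old = W @ k                              # value currently associated with key k
    W = W + beta * torch.outer(v - v_old, k)   # partially overwrite it with v
    y = W @ q                                  # retrieve with the query
    return W, y

# Toy usage: recurrently process a short sequence of key/value/query triples.
torch.manual_seed(0)
d_k, d_v, seq_len = 8, 8, 5
W = torch.zeros(d_v, d_k)
for t in range(seq_len):
    k = torch.softmax(torch.randn(d_k), dim=0)  # stand-in for the paper's feature map
    q = torch.softmax(torch.randn(d_k), dim=0)
    v = torch.randn(d_v)
    beta = torch.sigmoid(torch.randn(()))       # stand-in for a learned gate
    W, y = delta_rule_step(W, k, v, q, beta)
```

Because each write is a rank-one outer product, the recurrence can be reorganised into a chunkwise-parallel form, which is what makes DeltaNet-style layers practical to train at scale.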
Recent News
Oct 2025 — Apertus featured in a 3sat and SRF nano TV report on open AI models for European independence
Oct 2025 — Hosted the Apertus workshop and Swiss AI SME workshop at the AI+X Summit, and presented to the general public on why Switzerland is building its own language models at the Public Night
Sep 2025 — Received an award for our work on Apertus at the Culture & Society AI Awards Night
Sep 2025 — Gave a keynote at the Zürich AI Safety Day
Sep 2025 — Two papers accepted at NeurIPS 2025 (1, 2) and one oral paper accepted at COLM 2025 (link)
Sep 2025 — Met with National Council members Gerhard Andrey and Benoit Gaillard (with Chris Beyeler, Judith Niederberger, and Alberto Pasquale Ferrara from KImpact) on AI legislation
Sep 2025 — Follow-up interview with 10vor10 on national TV
Sep 2025 — Inside AI Podcast appearance hosted by Marcel Salathé (EPFL AI Center)
Sep 2025 — Presented the Swiss AI Initiative and Apertus to the SRG SSR AI & Data Guild
Sep 2025 — Keynote at the eHealth Summit presenting Apertus and generative AI use cases in health
Sep 2025 — Keynote on Apertus at the Trustworthy AI in Practice event by LatticeFlow
Sep 2025 — The newly released Qwen3-Next model uses DeltaNet, which I developed with Kazuki Irie, to improve LLM efficiency at scale!
Sep 2025 — Presentation on Apertus to KImpact - Verband für künstliche Intelligenz
Sep 2025 — Keynote on the Swiss AI Initiative and Apertus at the EnhanceR Symposium
Sep 2025 — Interview on Apertus with 10vor10 on national TV
Sep 2025 — 🎉 Released the Apertus 8B and 70B LLMs, trained on 15T tokens, fully open and compliant with Swiss law and the EU AI Act
Aug 2025 — Presentation on the Swiss AI Initiative and our LLM effort at the AI Meetup for Business Leaders
Jul 2025 — Prompt Zero Podcast appearance by Blick (in Swiss German)
Jul 2025 — Keynote at the first International Open-Source Model Builder Summit before the AI for Good Summit in Geneva
Jun 2025 — pan.talk keynote on Swiss AI Initiative: The Path to AI Sovereignty
Jun 2025 — Grant accepted: "A Swiss-Centric Foundation Model for Switzerland's Sovereign AI Future"
Jun 2025 — Grant accepted: "Democratizing LLMs for Global Languages with Mixtures of Multilingual Experts"
Jun 2025 — Successfully taught the first iteration of our MSc course at ETHZ: Large-Scale AI Engineering
May 2025 — Presentation of the Swiss AI Initiative to the European Commission with EU delegations from each member state
May 2025 — Invited talk at FH Graubünden AI event presenting the Swiss AI Initiative and our LLM effort
Mar 2025 — Keynote at Swiss Legal Tech Conference
Mar 2025 — Keynote at the AI in Marketing conference (400+ people)
Mar 2025 — Invited talk at HPC-AI Conference on the Swiss AI Initiative and our LLM Effort
Mar 2025 — Invited talk at GenAI 360
Mar 2025 — Redefining AI Podcast appearance (Season 3, Ep. 17)
Mar 2025 — Expert input to SRF Echo der Zeit episode
Dec 2024 — Zürich NLP Meetup talk on "The Swiss AI LLM Effort: Building Transparent and Responsible AI for Switzerland and Beyond"
Dec 2024 — Contributed talk at Swiss Community Day on Data
Dec 2024 — Keynote and panel at EY National Trusted AI Conference with Marc Stampfli and Anne Scherer
Nov 2024 — Invited talk at DeepMind, London on Linear Transformers and DeltaNet
Nov 2024 — Appearance on the SRF KI Fachrunde (AI expert roundtable)
Oct 2024 — Invited talk at the Swiss AI Initiative workshop at the AI+X conference
Sep 2024 — Invited talk at the ETH-wide AI Upskilling event «Die Magie der KI entschlüsseln» ("Decoding the Magic of AI")
May 2024 — Invited talk at 2024 IEEE Switzerland Section General Assembly
May 2024 — Invited talk at the Swiss publishers' association (Verlegerverband) on Large Language Models and the Swiss AI Initiative
May 2024 — Marketing Booster Podcast appearance
Feb 2024 — Started a position as research scientist at the ETH AI Center
Oct 2023 — Started a postdoctoral position at ETHZ with Prof. Thomas Hofmann
Aug 2023 — Invited talk at IBM on Linear Transformers and DeltaNet
May 2023 — Defended my PhD thesis, Fast Weight Programmers for Greater Systematic Generalisation in Language, with distinction
Selected Publications
Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs
D. Fan, V. Sabolčec, M. Ansaripour, A.K. Tarun, M. Jaggi, A. Bosselut, I. Schlag — COLM 2025
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
A. Romanou, N. Foroutan, A. Sotnikova, Z. Chen, et al. — ICLR 2025
On the Effect of (Near) Duplicate Subwords in Language Modelling
A. Schäfer, T. Hofmann, I. Schlag, T. Pimentel — ACL 2024
Large Language Model Programs
I. Schlag, S. Sukhbaatar, A. Celikyilmaz, W. Yih, J. Weston, J. Schmidhuber, X. Li — Preprint 2023
A Modern Self-Referential Weight Matrix That Learns to Modify Itself
K. Irie*, I. Schlag*, R. Csordás, J. Schmidhuber — ICML 2022
Linear Transformers are Secretly Fast Weight Programmers
I. Schlag*, K. Irie*, J. Schmidhuber — ICML 2021