My research focuses on three interconnected areas. Below, my publications are organized by these themes; each paper appears under its primary category, although many span more than one.
Open-Source LLMs and Responsible AI
I focus on developing open-source LLMs that are transparent and compliant with current legal frameworks. This work provides a foundation for society to build trustworthy AI products and services while enabling researchers to better understand the benefits and risks of LLM-based systems. Key technical challenges include quantifying the impact of training data governance decisions, understanding memorization risks in large-scale models, and developing infrastructure that democratizes access to language modeling research across different computational scales.
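To make the memorization risk concrete: one common operationalization treats a training sequence as extractable if, prompted with a prefix of it, the model reproduces the true continuation verbatim under greedy decoding. Below is a minimal sketch of such a check, assuming a Hugging Face causal LM; the model name and the sample document are placeholders, not details from my work.

```python
# Minimal sketch of a verbatim-memorization (extractability) check:
# a training sequence counts as memorized if greedy decoding from its
# prefix reproduces the true continuation token-for-token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def is_extractable(model, tokenizer, text, prefix_len=50, cont_len=50):
    """True if greedy decoding from the prefix reproduces the continuation."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    if ids.shape[0] < prefix_len + cont_len:
        return False  # sequence too short to run the check
    prefix = ids[:prefix_len].unsqueeze(0)
    target = ids[prefix_len:prefix_len + cont_len]
    with torch.no_grad():
        out = model.generate(prefix, max_new_tokens=cont_len, do_sample=False)
    generated = out[0, prefix_len:prefix_len + cont_len]
    return bool(torch.equal(generated, target))

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
doc = "..."  # stand-in for a document drawn from the training corpus
print(is_extractable(model, tokenizer, doc))
```

Run over a sample of the training corpus, the fraction of extractable sequences gives a simple, auditable memorization estimate that can be compared across data governance choices.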
Neural Architecture Research
I advance neural architecture research through fast weight programmers such as DeltaNet, part of what is arguably the most significant wave of architectural innovation since the rise of the Transformer. Like linear RNNs such as Mamba and RWKV, DeltaNet offers greater efficiency and generality than attention-based architectures. This research also explores self-referential weight matrices that learn to modify themselves, block-recurrent mechanisms that maintain global context while enabling parallel training, and novel activation functions derived through mathematical integration principles.
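At its core, a fast weight programmer maintains a fast weight matrix that is rewritten at every token with a delta rule, W_t = W_{t-1} + beta_t (v_t - W_{t-1} k_t) k_t^T, which overwrites the value currently stored under key k_t instead of blindly accumulating outer products. The NumPy sketch below is a minimal illustration of that update; the dimensions, random projections, and fixed write strength are placeholders rather than the published DeltaNet architecture.

```python
# Minimal sketch of a delta-rule fast weight update (DeltaNet-style).
# Keys, values, and beta are stand-ins here; real models produce them
# with learned projections and train via chunked parallel recurrences.
import numpy as np

def delta_rule_step(W, k, v, beta):
    """Write value v under key k into fast weights W with write strength beta."""
    v_old = W @ k                              # value currently stored under k
    return W + beta * np.outer(v - v_old, k)   # replace, don't just accumulate

d_k, d_v, T = 8, 8, 16
rng = np.random.default_rng(0)
W = np.zeros((d_v, d_k))                       # fast weights, rewritten per token
for t in range(T):
    x = rng.standard_normal(d_k)               # stand-in token representation
    k = x / np.linalg.norm(x)                  # normalized key
    v = rng.standard_normal(d_v)               # value (a learned projection in practice)
    beta = 0.5                                 # write strength (a sigmoid output in practice)
    W = delta_rule_step(W, k, v, beta)
    y = W @ k                                  # read-out (query equals key here)
print(y.shape)
```

Because the state is a matrix updated in constant time per token, inference cost stays linear in sequence length, which is the efficiency advantage over attention mentioned above.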
LLM Scaling and Generalization
I investigate fundamental questions around LLM scaling and generalization: in particular, how to train these systems more efficiently and how to enable them to generalize beyond their current limitations. This includes exploring self-modifying neural networks as a pathway toward more general AI systems. Technical focus areas include developing compute-optimal training strategies through adaptive scaling laws, investigating the role of tokenization and language imbalance in cross-lingual transfer, and creating hybrid neuro-symbolic approaches that augment classical algorithms with neural components for robust generalization.
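As a concrete example of the scaling-law angle: a Chinchilla-style analysis models the loss as L(N, D) = E + A/N^alpha + B/D^beta under a compute budget C ≈ 6ND FLOPs, then picks the parameter count N and token count D that minimize the loss within that budget. The sketch below uses the coefficients reported in the Chinchilla paper purely for illustration; an adaptive scaling law would refit such coefficients during training rather than treat them as fixed.

```python
# Minimal sketch: compute-optimal parameter/token split under a FLOP budget,
# assuming a Chinchilla-style loss L(N, D) = E + A/N**alpha + B/D**beta
# and C ~ 6*N*D. Coefficients are the published Chinchilla fit, used
# here only for illustration.
import numpy as np

E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

def optimal_split(C):
    """Grid-search model sizes N; token count D follows from C = 6*N*D."""
    N = np.logspace(6, 13, 2000)      # candidate parameter counts
    D = C / (6 * N)                   # tokens implied by the budget
    i = np.argmin(loss(N, D))
    return N[i], D[i]

C = 1e21                              # FLOP budget
N_opt, D_opt = optimal_split(C)
print(f"N* = {N_opt:.3g} params, D* = {D_opt:.3g} tokens, "
      f"tokens/param = {D_opt / N_opt:.1f}")
```

The same objective can be minimized in closed form; the grid search is used here only to keep the budget trade-off explicit.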