Publications
2024
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, Haim Sompolinsky Conference on Neural Information Processing Systems (NeurIPS), December 2024
MoEUT: Mixture-of-Experts Universal Transformers Róbert Csordás, Kazuki Irie, Christopher Potts, Christopher D. Manning Conference on Neural Information Processing Systems (NeurIPS), December 2024
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Róbert Csordás, Piotr Piękos, Kazuki Irie, Jürgen Schmidhuber Conference on Neural Information Processing Systems (NeurIPS), December 2024
Neural representational geometry of concepts in large language models Linden Schrage, Kazuki Irie, Haim Sompolinsky NeurIPS Workshop on Symmetry and Geometry in Neural Representations (NeurReps), December 2024
Self-Organising Neural Discrete Representation Learning à la Kohonen Kazuki Irie*, Róbert Csordás*, Jürgen Schmidhuber International Conference on Artificial Neural Networks (ICANN), September 2024
Exploring the Promise and Limits of Real-Time Recurrent Learning Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber International Conference on Learning Representations (ICLR), May 2024
2023
Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber Conference on Empirical Methods in Natural Language Processing (EMNLP), Short paper, December 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP-Findings), December 2023
Contrastive Training of Complex-Valued Autoencoders for Object Discovery Aleksandar Stanić*, Anand Gopalakrishnan*, Kazuki Irie, Jürgen Schmidhuber Conference on Neural Information Processing Systems (NeurIPS), December 2023
Mindstorms in Natural Language-Based Societies of Mind Mingchen Zhuge*, Haozhe Liu*, Francesco Faccio*, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber NeurIPS Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models (R0-FoMo), December 2023 Best Paper Award @ R0-FoMo Workshop
Topological Neural Discrete Representation Learning à la Kohonen Kazuki Irie*, Róbert Csordás*, Jürgen Schmidhuber ICML Workshop on Sampling and Optimization in Discrete Space, July 2023 Later version published at ICANN 2024
Accelerating Neural Self-Improvement via Bootstrapping Kazuki Irie, Jürgen Schmidhuber ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo), May 2023
Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules Kazuki Irie, Jürgen Schmidhuber International Conference on Learning Representations (ICLR), May 2023
2022
CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber Conference on Empirical Methods in Natural Language Processing (EMNLP), Short paper, December 2022
Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber Conference on Neural Information Processing Systems (NeurIPS), November 2022
Learning to Control Rapidly Changing Synaptic Connections: An Alternative Type of Memory in Sequence Processing Artificial Neural Networks Kazuki Irie, Jürgen Schmidhuber NeurIPS Workshop on Memory in Artificial and Real Intelligence (MemARI), November 2022
The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention Kazuki Irie*, Róbert Csordás*, Jürgen Schmidhuber International Conference on Machine Learning (ICML), July 2022
A Modern Self-Referential Weight Matrix That Learns to Modify Itself Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber International Conference on Machine Learning (ICML), July 2022
The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber International Conference on Learning Representations (ICLR), April 2022
Unsupervised Learning of Temporal Abstractions using Slot-based Transformers Anand Gopalakrishnan, Kazuki Irie, Jürgen Schmidhuber, Sjoerd van Steenkiste Neural Computation, March 2023
2021
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers Kazuki Irie*, Imanol Schlag*, Róbert Csordás, Jürgen Schmidhuber Conference on Neural Information Processing Systems (NeurIPS), December 2021
The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber Conference on Empirical Methods in Natural Language Processing (EMNLP), November 2021
Linear Transformers Are Secretly Fast Weight Programmers Imanol Schlag*, Kazuki Irie*, Jürgen Schmidhuber International Conference on Machine Learning (ICML), July 2021
A Modern Self-Referential Weight Matrix That Learns to Modify Itself Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber NeurIPS Workshop on Deep Reinforcement Learning (DeepRL), December 2021 Later version published at ICML 2022
Improving Baselines in the Wild Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber NeurIPS Workshop on Distribution Shifts (DistShift), December 2021
Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber NeurIPS Workshop on Advances in Programming Languages and Neurosymbolic Systems (AIPLANS), December 2021 Later version published at ICLR 2022
Unsupervised Learning of Temporal Abstractions using Slot-based Transformers Anand Gopalakrishnan, Kazuki Irie, Jürgen Schmidhuber, Sjoerd van Steenkiste NeurIPS Workshop on Deep Reinforcement Learning (DeepRL) & Workshop on Offline Reinforcement Learning (OfflineRL), December 2021 Later version published in Neural Computation
Training and Generating Neural Networks in Compressed Weight Space Kazuki Irie, Jürgen Schmidhuber ICLR 2021 Workshop on Neural Compression, May 2021
Publications before 2020 (Language Modeling & Speech Recognition)
Advancing Neural Language Modeling in Automatic Speech Recognition K. Irie PhD Thesis, Computer Science Department, RWTH Aachen University, Aachen, Germany, May 2020
The RWTH ASR system for TED-LIUM release 2: Improving Hybrid HMM with SpecAugment W. Zhou, W. Michel, K. Irie, M. Kitza, R. Schlüter, and H. Ney IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020
Domain Robust, Fast, and Compact Neural Language Models A. Gerstenberger, K. Irie, P. Golik, E. Beck, and H. Ney IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020
How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers K. Irie, A. Gerstenberger, R. Schlüter, and H. Ney IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020. [slides]
Training Language Models for Long-Span Cross-Sentence Evaluation K. Irie, A. Zeyer, R. Schlüter, and H. Ney IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019. [poster]
A comparison of Transformer and LSTM encoder decoder models for ASR A. Zeyer, P. Bahar, K. Irie, R. Schlüter, and H. Ney IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
Language Modeling with Deep Transformers K. Irie, A. Zeyer, R. Schlüter, and H. Ney Interspeech, September 2019. [slides] Best Student Paper Award
On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition K. Irie, R. Prabhavalkar, A. Kannan, A. Bruguier, D. Rybach, and P. Nguyen Interspeech, September 2019. [slides] Google internship outcome
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention C. Lüscher, E. Beck, K. Irie, M. Kitza, W. Michel, A. Zeyer, R. Schlüter, and H. Ney Interspeech, September 2019
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen et al. Technical Report, Google, February 2019 Google internship outcome
Investigation on Estimation of Sentence Probability By Combining Forward, Backward and Bi-directional LSTM-RNNs K. Irie, Z. Lei, L. Deng, R. Schlüter, and H. Ney Interspeech, September 2018
Improved training of end-to-end attention models for speech recognition A. Zeyer, K. Irie, R. Schlüter, and H. Ney Interspeech, September 2018
RADMM: Recurrent Adaptive Mixture Model with Applications to Domain Robust Language Modeling K. Irie, S. Kumar, M. Nirschl, and H. Liao IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2018. [slides] Google internship outcome
Prediction of LSTM-RNN Full Context States as a Subtask for N-gram Feedforward Language Models K. Irie, Z. Lei, R. Schlüter, and H. Ney IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2018 Best Student Paper Award & IEEE Spoken Language Processing Student Travel Grant Award
The 2016 RWTH Keyword Search System for Low-Resource Languages P. Golik, Z. Tüske, K. Irie, E. Beck, R. Schlüter, and H. Ney International Conference Speech and Computer (SPECOM), Lecture Notes in Computer Science, September 2017
Investigations on byte-level convolutional neural networks for language modeling in low resource speech recognition K. Irie, P. Golik, R. Schlüter, and H. Ney IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2017
LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech Recognition K. Irie, Z. Tüske, T. Alkhouli, R. Schlüter, and H. Ney Interspeech, September 2016
The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation T. Menne, J. Heymann, A. Alexandridis, K. Irie, A. Zeyer, M. Kitza, P. Golik, I. Kulikov, L. Drude, R. Schlüter, H. Ney, R. Haeb-Umbach, and A. Mouchtaris Interspeech Workshop on Speech Processing in Everyday Environments (CHiME), September 2016
Automatic Speech Recognition Based on Neural Networks R. Schlüter, P. Doetsch, P. Golik, M. Kitza, T. Menne, K. Irie, Z. Tüske, and A. Zeyer International Conference Speech and Computer (SPECOM), Lecture Notes in Computer Science, August 2016
Investigation on log-linear interpolation of multi-domain neural network language model Z. Tüske, K. Irie, R. Schlüter, and H. Ney IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2016