Kazuki Irie

Postdoctoral Fellow
Department of Psychology
Gershman Computational Cognitive Neuroscience Lab
Harvard University

Northwest Building
52 Oxford St, Cambridge, MA 02138

Email: kirie@g.harvard.edu

Google Scholar / GitHub / dblp / OpenReview / arXiv

Bio

Kazuki Irie received his Ph.D. in Computer Science from RWTH Aachen University, Germany, in 2020, after completing his undergraduate and Master’s studies in Applied Mathematics at École Centrale Paris and ENS Cachan, France. From 2020 to 2023, he was a postdoctoral researcher at the Swiss AI Lab IDSIA and a lecturer teaching deep learning at the University of Lugano, Switzerland. He is currently a postdoctoral fellow at the Department of Psychology, Harvard University, USA.

His current research investigates the computational principles of memory, learning, perception, self-reference, analogy & decision making, and problem solving & creation, with the dual goals of advancing general-purpose artificial intelligence and developing tools to better understand our own intelligence. The scope of his research interests has expanded from language modelling during his Ph.D., to general sequence and program learning during his postdoc at IDSIA, and now to computational cognitive neuroscience in his current position at Harvard.

Research

I conduct fundamental research in deep learning aimed at developing the foundational building blocks for ever more capable artificial intelligence (AI) systems. In particular, I work on general-purpose sequence-processing neural networks and closely related learning and memory algorithms. As such, my projects span a variety of domains—including supervised/unsupervised learning in language, vision, and algorithmic tasks, as well as reinforcement learning in partially observable environments such as video games—without committing to a single application area (although my original training was in language modelling). My focus is on achieving sophisticated cognitive abilities such as continual learning, (meta-)metalearning, few-shot learning, analogical and compositional generalization, and memory-guided exploration, all through a unified view of developing advanced memory algorithms that better leverage past experiences to improve future behaviour.

I also like the idea that advances in AI algorithms can offer computational models that serve as novel hypotheses in the study of natural intelligence—hypotheses optimized by computer scientists towards their own objectives of efficiency and problem solving, which may yield unconventional insights that differ from those of traditional approaches in cognitive science and neuroscience. Conversely, thinking about human intelligence provides insights into the elements currently missing from state-of-the-art AI systems, while also highlighting the unique strengths of machine intelligence—reminding me of why I like both humans and machines. It is also this comparative and multidisciplinary perspective on intelligence that guides my vision of human-centric artificial general intelligence (AGI): AGI that supports humans in addressing open challenges and enhances human well-being.

Publications

Please click the Publications tab to find the list of my publications.

For the most up-to-date list of my publications, please refer to my Google Scholar profile.

Note: In my prior life as a Master’s and Ph.D. student (2013-2020), I worked intensively on language modelling, before the current era of large language models (LLMs). My Ph.D. work addressed several topics that remain popular in the field, including: the development of large, deep (>100-layer) transformer language models (Interspeech 2019), concurrent with OpenAI’s GPT-2 in 2019 (we already had a 96-layer model, the same depth as GPT-3, released a year later in 2020); mixture-of-experts approaches to scaling up language models at Google/YouTube (ICASSP 2018); identifying and addressing the “KV-cache” memory problem of transformers (ICASSP 2020); distillation of large transformer language models into compact recurrent neural network language models (another ICASSP 2020 paper, building on my earlier work on distilling LSTM language models into n-gram language models; ICASSP 2018); and byte-level processing for modelling low-resource languages with special alphabets (ICASSP 2017), among others.

Recent talks

Recent teaching materials