Project Detail
Two new compression paradigms
Deep learning, the subfield of machine learning concerned with brain-inspired algorithms such as neural networks, is increasingly being used as a basis for data compression systems. With the support of the Marie Sklodowska-Curie Actions, the NNESCI project will explore parallel compression algorithms for efficiently compressing huge quantities of data. Specifically, the project will develop two new compression paradigms. The first is the compression of volumetric medical images using generative models based on 3D convolutions. The second is the compression of audio and video using time-series latent variable models. The project will also design a domain-specific language for expressing codecs that are lossless by construction.
Techniques based on neural networks (NNs), the study of which is often referred to as ‘deep learning’, have recently been shown to be extremely effective as a basis for data compression systems. I will develop the new field of ‘neural compression’, which has emerged around these ideas, focussing primarily on lossless compression in the two directions which I believe are most important:
Scale: NNs go hand in hand with parallel hardware, and I will investigate new parallel compression algorithms for efficiently compressing huge quantities of data. Specifically, I will develop two entirely new compression paradigms. The first is compression of volumetric images using generative models based on 3D convolutions, applied to medical imaging, where teleradiology and new cloud-based analysis make the need for efficient compression particularly acute. The second is compression of audio and video using time-series latent variable models, known as ‘state space models’, which offer uniquely efficient utilization of parallel hardware.
Systems: I will research, design and implement a domain-specific language (DSL) for concisely expressing codecs which are guaranteed to be lossless by construction. Until now, implementations of compression systems have always separated the implementation of the encoder from that of the decoder, and relied on an ad hoc debugging and testing process to ensure that data are recovered correctly. During my PhD, I discovered that it is sometimes possible for a computer to automatically convert an encoder function into a decoder, and vice versa, potentially halving the amount of code. I will explore the limits, in terms of flexibility and efficiency, of this novel idea, using insights from ‘automatic differentiation’, a related functional transformation, on which I am a leading expert.
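To make the idea of a codec that is lossless by construction more concrete, here is a minimal sketch in Python. It is an illustration only and not the project's DSL: the names `Codec`, `then`, `map_each`, `delta` and `zigzag` are hypothetical, and the sketch assumes that each primitive is supplied as a correct encode/decode pair. Under that assumption, every codec built with the combinators gets its decoder derived automatically, so the two directions cannot drift apart. No entropy coding is performed; the example only shows the composition principle.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Codec:
    """A pair of mutually inverse functions (hypothetical illustration)."""
    encode: Callable[[Any], Any]
    decode: Callable[[Any], Any]

    def then(self, other: "Codec") -> "Codec":
        # Sequential composition: the decoder is built automatically by
        # applying the component decoders in reverse order.
        return Codec(
            encode=lambda x: other.encode(self.encode(x)),
            decode=lambda y: self.decode(other.decode(y)),
        )


def map_each(codec: Codec) -> Codec:
    # Lift an element-wise codec to a codec over lists.
    return Codec(
        encode=lambda xs: [codec.encode(x) for x in xs],
        decode=lambda ys: [codec.decode(y) for y in ys],
    )


def _delta_encode(xs):
    out, prev = [], 0
    for x in xs:
        out.append(x - prev)  # store the difference from the previous value
        prev = x
    return out


def _delta_decode(ds):
    out, acc = [], 0
    for d in ds:
        acc += d  # a running sum undoes the differencing
        out.append(acc)
    return out


# Primitive 1: delta coding of an integer sequence.
delta = Codec(_delta_encode, _delta_decode)

# Primitive 2: zigzag mapping from signed to non-negative integers.
zigzag = Codec(
    encode=lambda n: 2 * n if n >= 0 else -2 * n - 1,
    decode=lambda u: u // 2 if u % 2 == 0 else -((u + 1) // 2),
)

# The encoder is written once; the decoder comes for free.
pipeline = delta.then(map_each(zigzag))

samples = [100, 102, 101, 101, 105]
encoded = pipeline.encode(samples)   # [200, 4, 1, 0, 8]
restored = pipeline.decode(encoded)  # equal to samples
```

Only the primitives need to be checked by hand in this design. A DSL of the kind described above would presumably go further, for example by deriving the inverse of a primitive from its definition, which this sketch does not attempt.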