Project Detail
Transforming spatial data processing with machine learning
In the digital era, where most data are spatial, the efficiency of spatial data operations is paramount. Traditional spatial join approaches, integral to applications like traffic management and robotics control, become inefficient as datasets grow larger and more complex. With the support of the Marie Skłodowska-Curie Actions, the LEJO project will leverage machine learning to revolutionise spatial join processing. It aims to understand spatial data distributions and to introduce learned approaches for binary and multi-way spatial joins. By addressing bottlenecks, implementing distribution-aware partitioning, and designing model-based indexes, LEJO promises real-world impacts across spatial data applications. The project fosters knowledge exchange, combining machine learning expertise with spatial data management to shape the future of data processing in and beyond Europe.
Arguably 80% of all data is spatial. This calls for highly efficient and effective spatial data operations. Among them, spatial joins are frequently needed as a key primitive in various applications such as traffic management, robotics control, location-based services and even human brain modelling. However, existing spatial join approaches follow the traditional filter-and-refinement paradigm, which is oblivious to the data distribution. As a result, existing approaches become increasingly inefficient as the spatial datasets to be joined grow larger and more complex. The LEJO project will use machine learning techniques to better understand the distributions of spatial data and, accordingly, design learned approaches for highly efficient spatial join processing. Specifically, the research actions of LEJO include (1) learned approaches for binary spatial joins of memory-resident data; (2) learned approaches for binary spatial joins of disk-resident data; (3) learned approaches for multi-way spatial joins. The research actions will mainly concern analysis of the bottlenecks of existing approaches, design of distribution-aware space/data partitioning, design of learned model-based indexes and join algorithms, and implementation and evaluation of the proposed techniques. These research actions, as well as project planning and management, will significantly strengthen the fellow’s research profile and management skills. This in turn will put him in a considerably better position for future career development after the project. Moreover, a two-way knowledge transfer is expected, as LEJO combines the fellow’s expertise in machine learning with the host university’s expertise in spatial data management. Focusing on the challenging intersection of spatial data management and machine learning, LEJO will not only advance frontier research in academia but also bring potential impacts to many spatial data application domains in and beyond Europe.
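For readers unfamiliar with the filter-and-refinement paradigm mentioned above, the following is a minimal illustrative sketch in Python, not LEJO's method. It assumes objects are circles given as `(x, y, r)` tuples and uses a fixed uniform grid, which is exactly the kind of distribution-oblivious partitioning the project aims to improve on. All function names and the `cell_size` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def mbr(circle):
    """Minimum bounding rectangle (x_min, y_min, x_max, y_max) of a circle (x, y, r)."""
    x, y, r = circle
    return (x - r, y - r, x + r, y + r)


def mbrs_overlap(a, b):
    """Cheap filter test: do two axis-aligned rectangles overlap?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def circles_intersect(a, b):
    """Exact refinement test on the real geometry."""
    return hypot(a[0] - b[0], a[1] - b[1]) <= a[2] + b[2]


def cells(box, cell_size):
    """All grid cells touched by a rectangle (uniform grid, ignores data distribution)."""
    x0, y0, x1, y1 = box
    xs = range(floor(x0 / cell_size), floor(x1 / cell_size) + 1)
    ys = range(floor(y0 / cell_size), floor(y1 / cell_size) + 1)
    return product(xs, ys)


def spatial_join(left, right, cell_size=1.0):
    """Return (sorted intersecting index pairs, number of filter-step candidates)."""
    # Partition: assign each right-hand object to every grid cell its MBR touches.
    grid = defaultdict(list)
    for j, obj in enumerate(right):
        for cell in cells(mbr(obj), cell_size):
            grid[cell].append(j)

    # Filter: collect pairs that share a cell and whose MBRs overlap.
    # A set removes duplicates from objects spanning several cells.
    candidates = set()
    for i, obj in enumerate(left):
        box = mbr(obj)
        for cell in cells(box, cell_size):
            for j in grid.get(cell, ()):
                if mbrs_overlap(box, mbr(right[j])):
                    candidates.add((i, j))

    # Refine: run the exact geometric test only on the surviving candidates.
    results = sorted(p for p in candidates if circles_intersect(left[p[0]], right[p[1]]))
    return results, len(candidates)


if __name__ == "__main__":
    left = [(0, 0, 1), (5, 5, 1)]
    right = [(1.5, 0, 1), (1.5, 1.5, 1), (10, 10, 0.5)]
    print(spatial_join(left, right))
```

With a fixed `cell_size`, skewed data leaves some cells overloaded and others empty. A distribution-aware partitioning would instead choose cell boundaries from the data itself, which is the direction the project text describes.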