Mihail Stoian

I am a second-year PhD student in the newly founded Data Systems Lab at UTN, advised by Andreas Kipf. My current focus is robust query processing.

Previously, I worked as a student research assistant in the TUM database group (Prof. Thomas Neumann) and in the DAML group (Prof. Stephan Günnemann).

During my studies, I completed two industry internships, at Oracle Labs and at Amazon Redshift.

[Google Scholar] | [GitHub] | [Twitter]

Publications

2025

🪂 Parachute: Single-Pass Bi-Directional Information Passing [code]
Mihail Stoian, Andreas Zimmerer, Skander Krid, Amadou Latyr Ngom, Jialin Ding, Tim Kraska, Andreas Kipf
VLDB 2025

TL;DR

Semi-join filtering with bi-directional information flow. Perfect for repetitive OLAP workloads.
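To make "semi-join filtering" concrete, here is a minimal Python sketch of the textbook idea. It is not Parachute's single-pass algorithm: the join keys of one input prune the rows of the other input that cannot find a join partner. The two-pass `bidirectional_filter` is a naive stand-in for bi-directional information flow, and all function names are hypothetical. An exact key set is assumed for simplicity; real engines typically use compact filters such as Bloom filters.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

def semi_join_filter(build: List[Row], probe: List[Row], key: str) -> List[Row]:
    """Keep only probe rows whose join key appears on the build side."""
    # Exact key set for clarity; a Bloom filter would trade exactness for space.
    build_keys = {row[key] for row in build}
    return [row for row in probe if row[key] in build_keys]

def bidirectional_filter(left: List[Row], right: List[Row], key: str) -> Tuple[List[Row], List[Row]]:
    """Hypothetical two-pass reduction: prune right with left, then left with the pruned right."""
    right_reduced = semi_join_filter(left, right, key)
    left_reduced = semi_join_filter(right_reduced, left, key)
    return left_reduced, right_reduced
```

For example, filtering left rows with `id` 1, 2, 3 against right rows with `id` 2, 3, 4 keeps only the rows with `id` 2 and 3 on both sides, so the subsequent join touches fewer tuples.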
🌰 Instance-Optimized String Fingerprints [code]
Mihail Stoian*, Johannes Thürauf*, Andreas Zimmerer, Alexander van Renen, Andreas Kipf
AIDB @ VLDB 2025

TL;DR

Workload-optimized string processing just landed.

DPconv: Super-Polynomially Faster Join Ordering [code]
Mihail Stoian, Andreas Kipf
SIGMOD 2025
SIGMOD Honorable Mention

[Slides @Microsoft GSL] [Slides @SIGMOD]

TL;DR

Forget about $O(3^n)$-time dynamic programming for join ordering.

Redbench: A Benchmark Reflecting Real Workloads [workloads]
Skander Krid, Mihail Stoian, Andreas Kipf
aiDM @ SIGMOD 2025

TL;DR

_The_ benchmark for instance-optimized database components.

Virtual: Compressing Data Lake Files
Mihail Stoian, Alexander van Renen, Jan Kobiolka, Ping-Lin Kuo, Andreas Zimmerer, Josif Grabocka, Andreas Kipf
EDBT Demo 2025
Best Demo Award

TL;DR

Virtual learns sparse regressors to shrink Parquet files, while keeping the column-scan overhead bounded in the number of reference columns.

Optimizing Linearized Join Enumeration by Adapting to the Query Structure
Altan Birler, Mihail Stoian, Thomas Neumann
BTW 2025
Best Paper Award

TL;DR

Just forget about LinDP: this is way faster.

2024

Lightweight Correlation-Aware Table Compression [code] [recording]
Mihail Stoian, Alexander van Renen, Jan Kobiolka, Ping-Lin Kuo, Josif Grabocka, Andreas Kipf
Table Representation Learning @ NeurIPS 2024

[Slides @TRL]

TL;DR

Virtual learns sparse regressors to shrink Parquet files, while keeping the column-scan overhead bounded in the number of reference columns.

Unified Mechanism-Specific Amplification by Subsampling and Group Privacy Amplification [code]
Jan Schuchardt, Mihail Stoian*, Arthur Kosmala*, Stephan Günnemann
NeurIPS 2024

TL;DR

The differential privacy framework for deriving mechanism-specific guarantees.
DataLoom: Simplifying Data Loading with LLMs [code]
Alexander van Renen, Mihail Stoian, Andreas Kipf
VLDB Demo 2024

Approximate Min-Sum Subset Convolution
Mihail Stoian
WAOA @ ALGO 2024

[Slides @WAOA]

TL;DR

The first proposal for approximate min-sum subset convolution. It yields out-of-the-box exponential-time $(1 + \varepsilon)$-approximations for prize-collecting Steiner tree, min-cost $k$-coloring, problems on protein networks, and further applications in computational biology.
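For reference, the exact problem being approximated is $h(S) = \min_{T \subseteq S} \big(f(T) + g(S \setminus T)\big)$ for every subset $S$ of an $n$-element ground set. The Python sketch below is only the naive exact baseline, not the approximation algorithm from the paper: it enumerates every submask $T$ of every $S$, which costs $O(3^n)$ time in total. Subsets are encoded as bitmasks, and the function name is hypothetical.

```python
import math
from typing import List

def min_sum_subset_convolution(f: List[float], g: List[float], n: int) -> List[float]:
    """Exact h[S] = min over T subset of S of f[T] + g[S minus T]; subsets are n-bit masks."""
    h = [math.inf] * (1 << n)
    for s in range(1 << n):
        t = s
        while True:  # enumerate all submasks t of s, from s down to 0
            # s ^ t encodes S minus T because t is a submask of s
            h[s] = min(h[s], f[t] + g[s ^ t])
            if t == 0:
                break
            t = (t - 1) & s
    return h
```

With `f = [0, 3, 5, 9]`, `g = [0, 4, 1, 7]` and $n = 2$, the result is `[0, 3, 1, 4]`: for the full set, splitting it into element 0 (cost `f[1] = 3`) and element 1 (cost `g[2] = 1`) gives the minimum $3 + 1 = 4$.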
Corra: Correlation-Aware Column Compression
Hanwen Liu, Mihail Stoian, Alexander van Renen, Andreas Kipf
CloudDB @ VLDB 2024

TL;DR

Are you still using FOR, delta, or RLE encodings? Correlation-aware column encodings can compress your data even better!

On the Optimal Linear Contraction Order of Tree Tensor Networks, and Beyond [netzwerk]
Mihail Stoian, Richard Milbradt, Christian B. Mendl
SIAM Journal on Scientific Computing

→ Check out our package netzwerk, with plug-ins for opt_einsum and cotengra.

TL;DR

A polynomial-time algorithm for the optimal linear contraction order of tree tensor networks under the total contraction cost. It extends the well-known IKKBZ join-ordering algorithm from databases.

2023

Fast Joint Shapley Values [code] [recording]
Mihail Stoian
SRC @ SIGMOD 2023

Faster FFT-based Wildcard Pattern Matching [code] [recording]
Mihail Stoian
SRC @ SIGMOD 2023

2022

Concurrent Link-Cut Trees [code] [recording]
Mihail Stoian | Advised by Jana Giceva and Philipp Fent
SRC @ SIGMOD 2022

2021

Towards Practical Learned Indexing [code] [recording]
Mihail Stoian, Andreas Kipf, Ryan Marcus, and Tim Kraska
AIDB @ VLDB 2021

Benchmarking Learned Indexes [blog] [code] [leaderboard]
Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska
VLDB 2021

2020

RadixSpline: A Single-Pass Learned Index [code] [talk]
Andreas Kipf*, Ryan Marcus*, Alexander van Renen*, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann
aiDM @ SIGMOD 2020

2019

SOSD: A Benchmark for Learned Indexes [code]
Andreas Kipf*, Ryan Marcus*, Alexander van Renen*, Mihail Stoian, Alfons Kemper, Thomas Neumann, and Tim Kraska
ML for Systems @ NeurIPS 2019

Preprints

Did Fourier Really Meet Möbius? Fast Subset Convolution via FFT
Mihail Stoian

TL;DR

Fast subset convolution does not need zeta/Möbius transforms: FFT suffices, even with the same running time.
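For context, here is a Python sketch of the classic ranked zeta/Möbius approach to fast subset convolution, i.e. the transform-based approach the preprint argues FFT can replace. The sketch does not implement the preprint's FFT method. It computes $h(S) = \sum_{T \subseteq S} f(T)\, g(S \setminus T)$ in $O(2^n n^2)$ arithmetic operations, versus $O(3^n)$ for direct submask enumeration, which is included as a check. Subsets are encoded as bitmasks; all function names are hypothetical.

```python
from typing import List

def popcount(x: int) -> int:
    return bin(x).count("1")

def ranked_zeta(a: List[int], n: int) -> List[List[int]]:
    """Split a into layers by subset size, then zeta-transform each layer."""
    size = 1 << n
    layers = [[a[s] if popcount(s) == k else 0 for s in range(size)] for k in range(n + 1)]
    for layer in layers:
        for i in range(n):
            bit = 1 << i
            for s in range(size):
                if s & bit:
                    layer[s] += layer[s ^ bit]  # accumulate sums over subsets
    return layers

def moebius(a: List[int], n: int) -> List[int]:
    """In-place inverse of the zeta transform."""
    for i in range(n):
        bit = 1 << i
        for s in range(1 << n):
            if s & bit:
                a[s] -= a[s ^ bit]
    return a

def fast_subset_convolution(f: List[int], g: List[int], n: int) -> List[int]:
    size = 1 << n
    fz, gz = ranked_zeta(f, n), ranked_zeta(g, n)
    h = [0] * size
    for k in range(n + 1):
        # Rank-k product: pair layers whose sizes add up to k.
        layer = [sum(fz[j][s] * gz[k - j][s] for j in range(k + 1)) for s in range(size)]
        moebius(layer, n)
        for s in range(size):
            if popcount(s) == k:  # keep only entries where |T| + |S minus T| = |S|
                h[s] = layer[s]
    return h

def naive_subset_convolution(f: List[int], g: List[int], n: int) -> List[int]:
    """O(3^n) reference: sum f[t] * g[s ^ t] over all submasks t of s."""
    h = [0] * (1 << n)
    for s in range(1 << n):
        t = s
        while True:
            h[s] += f[t] * g[s ^ t]
            if t == 0:
                break
            t = (t - 1) & s
    return h
```

For instance, with `f = [1, 2, 3, 4]`, `g = [5, 6, 7, 8]` and $n = 2$, both functions return `[5, 16, 22, 60]`. The rank split is what makes the transform approach work: zeta, pointwise product and Möbius alone give the covering product over pairs $(T, U)$ with $T \cup U = S$, and additionally requiring $|T| + |U| = |S|$ keeps only the disjoint pairs.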
TSP Escapes the $O(2^n n^2)$ Curse [proof-of-concept] [blog]
Mihail Stoian

TL;DR++

This xkcd needs a remake.
Original: https://xkcd.com/399/

Invited Talks

DPconv: Super-Polynomially Faster Join Ordering
January 2025
@Microsoft GSL

Virtual: Compressing World's Parquet Files
January 2025
@TUMuchData

What do databases and tensor networks have in common?
August 2023
@Universität Jena, Joachim Giesen's group. Check out their amazing tools: Matrix Calculus (used in RelationalAI's AutoDiff) and more!

Contact

mihail.stoian@utn.de