Beta-Trees: Multivariate Histograms with Confidence Statements

Abstract

Multivariate histograms are difficult to construct due to the curse of dimensionality. Motivated by k-d trees in computer science, we show how to construct an efficient data-adaptive partition of Euclidean space that possesses the following two properties: (1) with high confidence, the distribution from which the data are generated is close to uniform on each rectangle of the partition; and (2) finite-sample simultaneous confidence intervals can be provided for the probability of each rectangle in the partition. The method produces confidence intervals whose widths depend only on the probability content of the rectangles and not on the dimensionality of the space, thus avoiding the curse of dimensionality. Moreover, the widths essentially match the optimal widths in the univariate setting. The simultaneous validity of the confidence intervals allows us to use this construction, which we call Beta-trees, for various data-analytic purposes. We illustrate this by using Beta-trees for visualization and for multivariate mode-hunting in flow cytometry data.
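
To make the two ingredients of the abstract concrete (a k-d-style data-adaptive partition, and simultaneous finite-sample intervals for rectangle probabilities), here is a minimal sketch in Python. It is not the Beta-tree construction presented in the talk. It swaps in a simpler, generic technique: the partition is built by median splits on one half of the sample, and Bonferroni-corrected Clopper-Pearson intervals are computed from counts on the held-out half, so the rectangles are fixed with respect to the data used for counting. All function names, the simulated Gaussian data, the leaf size of 25, and the level alpha = 0.10 are assumptions made for illustration.

```python
import math
import random

def kd_partition(points, bounds, min_points):
    """Split recursively at the median of the widest-spread coordinate (k-d style).
    Returns a list of rectangles, each a list of half-open (lo, hi) intervals."""
    if len(points) < 2 * min_points:
        return [bounds]
    dims = range(len(bounds))
    axis = max(dims, key=lambda j: max(p[j] for p in points) - min(p[j] for p in points))
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    cut = pts[mid][axis]
    lo, hi = bounds[axis]
    left = bounds[:axis] + [(lo, cut)] + bounds[axis + 1:]
    right = bounds[:axis] + [(cut, hi)] + bounds[axis + 1:]
    return kd_partition(pts[:mid], left, min_points) + kd_partition(pts[mid:], right, min_points)

def binom_sf(k, n, p):
    """P(Bin(n, p) >= k), summed directly; adequate for n up to a few hundred."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def solve_increasing(f, target, iters=40):
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha):
    """Exact two-sided binomial interval for a success probability."""
    lower = 0.0 if k == 0 else solve_increasing(lambda p: binom_sf(k, n, p), alpha / 2)
    upper = 1.0 if k == n else solve_increasing(lambda p: binom_sf(k + 1, n, p), 1 - alpha / 2)
    return lower, upper

def contains(rect, point):
    return all(lo <= x < hi for (lo, hi), x in zip(rect, point))

# Simulated 2-d data (assumption): half builds the partition, half is held out.
random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(800)]
build, held_out = data[:400], data[400:]

rects = kd_partition(build, [(-math.inf, math.inf)] * 2, min_points=25)
counts = [sum(contains(r, p) for p in held_out) for r in rects]

# Bonferroni: each interval at level alpha / (number of rectangles).
alpha = 0.10
n = len(held_out)
intervals = [clopper_pearson(k, n, alpha / len(rects)) for k in counts]

for r, k, (lo, hi) in zip(rects, counts, intervals):
    print(f"count={k:3d}  interval=({lo:.3f}, {hi:.3f})  rect={r}")
```

In this sketch the interval widths are driven by the held-out count in each rectangle and the number of rectangles, not by the dimension of the data, which mirrors the qualitative point of the abstract. The price of sample splitting and the Bonferroni correction is wider intervals than a purpose-built construction would be expected to give.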

Date
May 23, 2024 8:25 AM — 10:25 AM
Qian Zhao
Assistant Professor in Statistics

My research interests are high-dimensional statistics, statistical genetics, and data science education.