Beta-trees --- Multivariate histograms with confidence statements

Abstract

Multivariate histograms are difficult to construct due to the curse of dimensionality. Motivated by k-d trees in computer science, we show how to construct an efficient data-adaptive partition of Euclidean space that possesses the following two properties. With high confidence the distribution from which the data are generated is close to uniform on each rectangle of the partition; and despite the data-dependent construction we can give guaranteed finite sample simultaneous confidence bounds for the probabilities (and hence for the average densities) of each rectangle in the partition. This partition will automatically find lower-dimensional structures and large regions where the distribution is nearly uniform. The methodology for setting finite sample confidence bounds is designed such that their widths will automatically adapt to the decreased difficulty of these simpler regions. The simultaneous validity of the confidence bounds allows to use this construction, which we call Beta-trees, for various data-analytic purposes. We illustrate this by using Beta-trees for visualizing data and for multivariate mode-hunting.

Qian Zhao
Qian Zhao
Postdoctoral Scholar in Biomedical Data Science

My research interests are high-dimensional statistics, statistical genetics, and data science education.