2018 Probability, Statistics and Their Applications Seminar Series, Talk 1

Published: 2018-01-04

: 李启寨 (Li Qizhai), Researcher, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Pooled testing is useful for identifying positive specimens in large-scale screening, and matrix pooling is one of the commonly used algorithms. In this work, we investigate the properties of matrix pooling and reveal that its efficiency is related to the magnitude of overlap among groups. Based on this property, we develop a new design that further improves efficiency while taking testing error into account. The efficiency, pooling sensitivity, and pooling specificity of this algorithm are explicitly derived and verified through a plasmode simulation of detecting acute human immunodeficiency virus among patients suspected to have malaria in rural Uganda. We show that the new design outperforms matrix pooling in efficiency while retaining the pooling sensitivity and specificity.
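The basic matrix-pooling scheme the abstract builds on can be illustrated with a short simulation. The sketch below is not the speaker's new design; it only shows standard error-free matrix pooling (test each row pool and column pool, then retest specimens at positive-row/positive-column intersections individually), so the function name and the retest rule are assumptions for illustration.

```python
import numpy as np

def matrix_pooling_tests(status, rows, cols):
    """Count tests used by basic matrix pooling (no testing error).

    Specimens are laid out in a rows x cols grid; each row pool and
    each column pool is tested once, then every specimen at the
    intersection of a positive row and a positive column is retested
    individually to resolve ambiguity.
    """
    grid = np.asarray(status, dtype=bool).reshape(rows, cols)
    row_pos = grid.any(axis=1)           # row pool results
    col_pos = grid.any(axis=0)           # column pool results
    # individual retests at positive-row x positive-column intersections
    retests = int(row_pos.sum() * col_pos.sum())
    return rows + cols + retests

# Toy comparison: 100 specimens at 2% prevalence vs. 100 individual tests.
rng = np.random.default_rng(0)
status = rng.random(100) < 0.02
print(matrix_pooling_tests(status, 10, 10), "tests instead of 100")
```

At low prevalence the row/column pools are mostly negative, so few intersections need retesting and the total test count stays well below one test per specimen, which is the efficiency gain the abstract refers to.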

: 孔新兵 (Kong Xinbing), Professor, Institute of Statistical Science and Big Data, Nanjing Audit University

The stochastic block model is widely used to model community structure in network data. In this paper, to detect community structure change, we propose a two-sample test for the stochastic block model with two observed adjacency matrices. The test statistic is based on the $l_{\infty}$ norm of contrast matrices constructed by smoothing the adjacency matrices in local neighborhoods. Under the null hypothesis that the two stochastic block models are the same, the test statistic converges to the type I extreme value distribution; otherwise, it diverges quickly, at a rate that can reach $n$ in the strong-signal case, where $n$ is the size of the network, guaranteeing high detection power. Motivated by the construction of the two-sample test statistic, and to obtain a consistent prior estimate of the number of communities, we present a new sequential testing procedure based on the locally smoothed adjacency matrix and extreme value theory. This method is simple to use and serves as an alternative to the method of Lei (2016), which is based on random matrix theory.
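The statistic's construction can be sketched schematically: average the entrywise difference of the two adjacency matrices over local (within-block) neighborhoods, standardize, and take the maximum. The code below is a simplified illustration under known community labels, not the paper's exact statistic or standardization; the function name and the plug-in variance estimate are assumptions.

```python
import numpy as np

def smoothed_contrast_stat(A1, A2, labels):
    """Schematic l_inf-type statistic for comparing two SBM adjacency
    matrices: for each pair of communities, average the entrywise
    difference A1 - A2 over that block, standardize by a plug-in
    variance estimate, and return the maximum absolute z-value."""
    k = int(labels.max()) + 1
    D = (A1 - A2).astype(float)
    P = (A1 + A2) / 2.0          # pooled edge-probability estimate
    stat = 0.0
    for a in range(k):
        for b in range(k):
            ia = np.where(labels == a)[0]
            ib = np.where(labels == b)[0]
            block = D[np.ix_(ia, ib)]
            pbar = P[np.ix_(ia, ib)].mean()
            var = 2.0 * pbar * (1.0 - pbar)   # Var(A1_ij - A2_ij) under H0
            if var <= 0:
                continue                      # degenerate block, no signal
            z = abs(block.mean()) * np.sqrt(block.size / var)
            stat = max(stat, float(z))
    return stat
```

Under the null the block means concentrate around zero, so the maximum stays bounded; when one block's edge probability changes between the two networks, the corresponding z-value grows with the block size, mirroring the divergence described in the abstract.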

: 钟威 (Zhong Wei), Professor and Doctoral Supervisor, Department of Statistics, Wang Yanan Institute for Studies in Economics and School of Economics, Xiamen University

High-dimensional data have frequently been collected in many scientific areas, including genome-wide association studies, biomedical imaging, tomography, tumor classification, and finance. Analysis of high-dimensional data poses many challenges for statisticians, for which feature selection and variable selection are fundamental. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and deemed useful in the analysis of high-dimensional data. Following this general principle, a large number of variable selection approaches via penalized least squares or likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. While penalized variable selection methods have been successfully applied in many high-dimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality of data to an even larger scale, where the dimension may grow exponentially with the sample size; such data have been called ultrahigh-dimensional in the literature. This work presents a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on insights into how to construct marginal utilities for feature screening under specific models and on the motivation for model-free feature screening procedures.
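A concrete instance of the marginal-utility idea is sure independence screening (Fan and Lv, 2008), which ranks predictors by absolute marginal Pearson correlation with the response and keeps the top few. The sketch below illustrates that one choice of marginal utility; it is not the speaker's overview itself, and the function name and cutoff are assumptions.

```python
import numpy as np

def sis_screen(X, y, d):
    """Sure independence screening: rank the columns of X by absolute
    marginal Pearson correlation with y and return the indices of the
    top d. Model-free screening procedures replace this correlation
    with utilities such as distance correlation."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # marginal correlation of each predictor with the response
    num = Xc.T @ yc
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.abs(num / denom)
    return np.argsort(corr)[::-1][:d]

# Toy ultrahigh-dimensional setting: only columns 0 and 1 are active.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(200)
print(sorted(sis_screen(X, y, 5)[:2]))
```

Because only one pass of marginal correlations is computed, the cost is linear in the number of predictors, which is what makes screening feasible when the dimension grows exponentially with the sample size.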

: 胡涛 (Hu Tao), Associate Professor, School of Mathematical Sciences, Capital Normal University
