Schedule | SDA Workshop 2026

Sessions

Venue: CPD-3.28, Central Podium Levels – Three, The Jockey Club Tower, Centennial Campus, HKU.
A PDF version of this page will be provided later.
New on June 5th: More details are available now!

Day 1: 2026 HKU Summer Workshop
08:30 - 09:00	Registration
09:00 - 09:10	Opening Remarks
Session 1 Chair: Haipeng Shen
09:10 - 09:45	Sequential Monte Carlo for Diffusion Models	Jun Liu
09:45 - 10:20	Representation-Driven Diffusion for Semi-Supervised Conditional Generation	Jian Huang
10:20 - 10:30	Photo
10:30 - 11:00	Coffee Break
Session 2 Chair: Jing Ouyang
11:00 - 11:35	Latent Variable Models and Methods for Educational and Psychological Measurement	Zhiliang Ying
11:35 - 12:10	Dynamic Survival Prediction for Breast Cancer Using Mammogram Imaging Data	Jiguo Cao
12:10 - 14:00	Lunch
Session 3 Chair: Zhanrui Cai
14:00 - 14:35	Transfer Conformal Predictive Inference for Regression	Linglong Kong
14:35 - 15:10	Estimation Techniques for Measurement Error Problems with Excess Zeros	Aurore Delaigle
15:10 - 15:40	Coffee Break
Session 4 Chair: Dan Yang
15:40 - 16:15	Understanding CNN Efficiency: Statistical Generative Models for Unstructured Image Data	Hongtu Zhu
16:15 - 16:30	Coffee Break
16:30 - 17:30	Panel Discussion 1: Career Development (Moderator: Weichen Wang)
18:00 - 20:30	Banquet (invitation only)

Day 2: 1st IMS New Researchers Conference Asia
Session 1 Chair: Eardi Lila
09:00 - 09:35	Reflections on a Research Journey: Lessons for New Researchers	Tony Cai
09:35 - 10:35	Lightning Talks by New Researchers
10:35 - 11:05	Coffee Break
Session 2 Chair: Xin Tong
11:05 - 12:05	Lightning Talks by New Researchers
12:05 - 14:00	Lunch
Session 3 Chair: Xinghao Qiao
14:00 - 15:00	Lightning Talks by New Researchers
15:00 - 15:30	Coffee Break
Session 4 Chair: Armeen Taeb
15:30 - 16:30	Poster Session
16:30 - 17:30	Panel Discussion 2: Research (Moderator: Armeen Taeb)
17:30 - 17:40	Conference Closing

Abstracts of the talks

(In alphabetical order by speaker's surname)

Reflections on a Research Journey: Lessons for New Researchers

Tony Cai (University of Pennsylvania)

Abstract: In this talk, I will reflect on my personal research journey and share insights, lessons, and advice for new researchers based on my own experiences.

Dynamic Survival Prediction for Breast Cancer Using Mammogram Imaging Data

Jiguo Cao (Simon Fraser University)

Abstract: With mammography as the primary strategy for breast cancer screening, it is essential to fully leverage imaging data to better identify women at higher or lower than average risk. The primary objective of this study is to extract mammogram-based features that complement established breast cancer risk factors and improve prediction accuracy. We propose a supervised functional principal component analysis over triangulations method to extract features that are explicitly ordered by their association with failure time outcomes. The proposed approach effectively addresses the irregular boundary of the breast region in mammographic images by employing flexible bivariate splines over triangulations. We further develop a computationally efficient algorithm based on eigenvalue decomposition. We apply the method to data from the Joanne Knight Breast Health Cohort at Siteman Cancer Center. Our approach not only delivers superior predictive performance relative to unsupervised FPCA and other benchmark models, but also identifies meaningful risk patterns within mammographic images.

Estimation Techniques for Measurement Error Problems with Excess Zeros

Aurore Delaigle (The University of Melbourne)

Abstract: How can we recover the distribution of a latent long-term behavior from repeated short-term measurements that are a mixture of zero and noisy data? This question arises in settings involving intermittent phenomena, ranging from episodically consumed nutrients to intermittent environmental exposures. For example, to assess dietary adequacy in population groups and guide nutrition policy, agencies compare estimated long-term nutrient intake distributions with national Nutrient Reference Values. To estimate these curves, many agencies still use methods based on a two-part model developed at the US National Cancer Institute (NCI). However, these methods impose strong, hard-to-check parametric assumptions that can often yield biased curve estimators and erroneous conclusions. In this talk, we develop more flexible semiparametric and nonparametric estimators of these distributions, and more generally of the distribution of semi-continuous variables measured with error. This is joint work with Félix Camirand Lemyre and Raymond Carroll.

Representation-Driven Diffusion for Semi-Supervised Conditional Generation

Jian Huang (The Hong Kong Polytechnic University)

Abstract: Conditional generative modeling remains a challenging problem in semi-supervised settings where labeled data is scarce but unlabeled samples are abundant. To effectively leverage structural information embedded within the unlabeled dataset and compensate for sparse conditioning signals, we propose RepG, a semi-supervised framework combining conditional stochastic interpolation with low-dimensional latent representations. RepG decomposes generation into two stages: label-dependent latent sampling and high-dimensional reconstruction. This isolates the supervised learning of conditional dependencies to a low-dimensional space, requiring few labels while utilizing the abundant unlabeled data purely for reconstruction. Theoretically, we establish an error decomposition showing that the Kullback-Leibler divergence of RepG comprises stage-wise estimation errors and a structural bias quantified by conditional mutual information. For deep neural network estimators, we derive non-asymptotic convergence rates proving that RepG significantly improves sample complexity. By confining the supervised estimation burden to the low intrinsic dimension of the latent representation, RepG achieves a strictly faster convergence rate. Complemented by a minimax lower bound, our theoretical results demonstrate that this method effectively mitigates the curse of dimensionality inherent in direct ambient-space generative modeling.

Transfer Conformal Predictive Inference for Regression

Linglong Kong (University of Alberta)

Abstract: Conformal prediction, a powerful framework for constructing prediction intervals for response variables using any regression function estimators, often faces the challenge of producing overly broad intervals with limited target data. In this paper, we study the transfer learning problem in conformal prediction, aiming to improve the precision of the prediction interval of the target data with insufficient data by leveraging related auxiliary source datasets. Allowing for the potential non-exchangeability between source and target datasets, we propose two transfer conformal prediction algorithms designed for scenarios where knowledge of informative source data is either present or absent. Our approach uses conditional Kullback-Leibler divergence to effectively identify relevant source datasets for transfer. A comprehensive theoretical analysis of the non-asymptotic properties of the proposed algorithms is provided, including lower and upper bounds, and the prediction interval width. These results illustrate the potential to achieve more efficient, narrower intervals without compromising coverage accuracy. Empirical results from extensive simulations and real-world data confirm the efficacy of our methods, demonstrating significant improvements in prediction interval precision by leveraging source data, achieving narrower intervals while maintaining desired coverage levels.

Sequential Monte Carlo for Diffusion Models

Jun Liu (Tsinghua University)

Abstract: Sequential Monte Carlo (aka particle filtering) has been widely used as a powerful tool for Bayesian inference in both static and dynamical systems. Two key steps in sequential Monte Carlo are (a) finding a good recursive particle sampling distribution (or a good way to guide the particle generation); and (b) resampling, which serves to allocate sufficient resources to promising directions. We will review some approaches for conducting these two main steps and show how they may be modified and adapted to do conditional generation in diffusion models. We also propose Wasserstein-Dirichlet resampling (WDR). WDR first constructs an empirical measure that optimally approximates the weighted particle distribution in the Wasserstein sense by solving a free-support Wasserstein barycenter problem. To balance geometric fidelity with Monte Carlo variability, WDR further employs a Dirichlet mixing mechanism that randomizes the optimal coupling. This talk is based on joint work with Qianqian Qu, Mengyu Li and Cheng Meng.

Latent Variable Models and Methods for Educational and Psychological Measurement

Zhiliang Ying (Columbia University)

Abstract: Measurement theory has long served as a cornerstone of educational and psychological assessment, providing rigorous frameworks for understanding human abilities, traits, and learning processes. This talk gives an overview of latent variable/class models that are widely used in educational and psychological measurement. In particular, multivariate item response theory (MIRT) models and their extensions will be covered. We will also discuss the latent class models for cognitive diagnosis, which aim to characterize individuals’ mastery of underlying skills and knowledge components. If time permits, we will also cover statistical learning methods for the analysis of process data from PSTRE (Problem Solving in Technology-Rich Environments) items. Together, these methodologies show the evolving role of modern psychometrics and data science in advancing educational and psychological assessment.

Understanding CNN Efficiency: Statistical Generative Models for Unstructured Image Data

Hongtu Zhu (The University of North Carolina at Chapel Hill)

Abstract: Convolutional Neural Networks (CNNs) are foundational in modern image analysis due to their ability to efficiently learn feature representations. However, theoretical understanding of their efficiency remains limited, largely due to inadequate modeling of image structures and their interaction with CNNs. To address this, we introduce novel statistical generative models (SGMs) that decompose images into task-relevant signals and noise, capturing the complexities of natural image data. Based on these SGMs, we propose a feature mapping approach (FMA) to characterize the transformation from raw image data to feature vectors. We analyze CNNs' approximation capabilities, their adaptation to low-dimensional structures, and their efficiency in vision tasks, ultimately developing statistical learning theories for CNN-based image analysis. Our findings reveal the challenges inherent in vision tasks and highlight CNNs' remarkable efficiency in addressing them, providing new insights into their theoretical and practical capabilities. This is based on joint work with Dr. Guohao Shen.

HKU Workshop Committee: Weichen Wang (Chair), Zhanrui Cai, Xinghao Qiao
IMS NRC-Asia Conference Organizers: Armeen Taeb, Eardi Lila, Yan Shuo Tan