Publications

Thesis

Dongjun Kim, A Study on the Score-based Diffusion Model for Improved Training, Flexible Inference, and Efficient Sampling, PhD Dissertation, Department of Industrial and Systems Engineering, KAIST, 2023



Abstract

Learning a data distribution and sampling from it are key to creative generation. In previous decades, however, human-level generation in a high-dimensional space was out of reach for two reasons. First, computational resources were lacking. Second, no generative model was scalable to high dimensions. As a result, models that seemed to conquer the MNIST dataset failed to generate recognizable natural images such as CIFAR-10. In this thesis, we introduce recent developments in score-based diffusion models, which have emerged as a strong candidate to replace previous modeling frameworks. Diffusion models have three components [4]: the forward-time data diffusion process [5], the reverse-time generative diffusion process [6], and the score training objective [7]. A few works [8, 9, 10] provide a deep understanding of each component, and we aim to understand each component more deeply by answering fundamental questions that arise from the nature of diffusion models in three chapters.

First, we observe that the previous training objective entails a trade-off between sample quality and model likelihood. We explain this trade-off through the contribution of the diffusion loss at each time: the large-time diffusion loss takes only an extremely minor portion of the model log-likelihood. Because of this imbalanced contribution between small and large times, log-likelihood training leaves the score estimation at large times inaccurate, and this inaccuracy deteriorates sample quality. We introduce Soft Truncation, which successfully mitigates the trade-off. Soft Truncation relaxes the truncation bound at every mini-batch from a hyper-parameter ε to a random variable τ, and trains the score network for the batch on [τ, T] instead of [ε, T]. This forces batch updates with large τ to focus on the range of large diffusion times, so the large-time score is well trained with Soft Truncation.

Second, we extend the scope of the forward-time data diffusion process from linear SDEs to nonlinear SDEs. So far, the forward-time data diffusion process has been fixed throughout the training procedure so as to constrain the final density to a Gaussian distribution. Intuitively, however, there should be promising diffusion patterns, adaptive to the given data distribution, that train diffusion models more efficiently. We therefore introduce Implicit Nonlinear Diffusion Models (INDM), which model the nonlinearity in an implicit way. We find that explicit nonlinearity modeling fails because of its intractable transition probability, and we introduce a normalizing flow to detour the intractability issue.

Third, we aim to adjust the score estimation to improve sample quality. This work is motivated by the difference between the local optimum and the global optimum. At the global optimum of the training objective, the score network perfectly estimates the data score, but global optimality is hardly achieved in practice. Instead, the score network (at a local optimum) is merely an approximation of the data score, so there is a gap between the estimate and the true data score. We introduce a neural estimator of this gap, trained as a discriminator. After training, we augment the generative process with the gap estimate to adjust the score part. Throughout the chapters, we validate our work on vision-oriented datasets such as CIFAR-10.
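To make the Soft Truncation idea concrete, here is a minimal PyTorch sketch of a per-batch randomly truncated denoising score-matching loss. All names (`score_net`, `marginal_std`, `log_uniform_tau`) are hypothetical, the VE-style perturbation and the log-uniform τ distribution are assumptions for illustration, and this is not the exact objective from the thesis.

```python
import torch

EPS, T = 1e-5, 1.0

def log_uniform_tau():
    # Hypothetical tau distribution: log-uniform on [EPS, T]. The thesis
    # studies specific choices of p(tau); this is only an illustration.
    u = torch.rand(())
    return (EPS * (T / EPS) ** u).item()

def soft_truncation_loss(score_net, x0, marginal_std, tau_sampler=log_uniform_tau):
    """One mini-batch denoising score-matching loss, truncated at a random tau."""
    tau = tau_sampler()                          # resampled for every mini-batch
    t = tau + (T - tau) * torch.rand(x0.shape[0], device=x0.device)
    std = marginal_std(t).view(-1, 1, 1, 1)      # sigma_t of the forward SDE (VE style)
    noise = torch.randn_like(x0)
    xt = x0 + std * noise                        # perturb data to diffusion time t
    score = score_net(xt, t)
    # The score of p(x_t | x_0) is -noise / sigma_t; weighting by sigma_t^2
    # yields the loss on [tau, T] instead of the fixed range [EPS, T].
    return ((std * score + noise) ** 2).mean()
```

Batches that draw a large τ spend their entire gradient signal on large diffusion times, which is exactly the regime the abstract identifies as under-trained by likelihood-weighted objectives.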
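The INDM construction can likewise be sketched schematically: a normalizing flow maps data into a latent space where an ordinary linear diffusion runs, so the induced data-space diffusion is nonlinear while the latent transition probability stays tractable. The joint loss below (hypothetical names; a simplification, not the exact variational bound from the thesis) combines latent denoising score matching with the flow's change-of-variables term.

```python
import torch

def indm_loss(flow, score_net, x0, marginal_std, eps=1e-5, T=1.0):
    """Latent-space score matching plus the flow's log-determinant term."""
    z0, logdet = flow(x0)                        # invertible map and log|det dh/dx|
    t = eps + (T - eps) * torch.rand(z0.shape[0], device=z0.device)
    std = marginal_std(t).view(-1, 1, 1, 1)
    noise = torch.randn_like(z0)
    zt = z0 + std * noise                        # linear, tractable diffusion in latent space
    score = score_net(zt, t)
    dsm = ((std * score + noise) ** 2).mean()
    # Training flow and score jointly: the flow carries the nonlinearity
    # implicitly, sidestepping the intractable nonlinear transition density.
    return dsm - logdet.mean()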
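Finally, the discriminator-based score adjustment from the third chapter can be illustrated as follows. A time-conditioned discriminator trained to separate diffused data from diffused model samples estimates a density log-ratio; its input gradient approximates the gap between the data score and the learned score and is added to the pretrained score during reverse-time sampling. The names below are hypothetical, and the assumption that `discriminator` outputs the logit log d/(1-d) is mine.

```python
import torch

def adjusted_score(score_net, discriminator, xt, t):
    """Pretrained score plus the gradient of the discriminator's log-ratio."""
    xt = xt.detach().requires_grad_(True)
    log_ratio = discriminator(xt, t)             # assumed to output log d/(1-d)
    gap = torch.autograd.grad(log_ratio.sum(), xt)[0]
    # The gradient estimates the data-score-minus-model-score gap at time t;
    # adding it corrects the drift used by the reverse-time sampler.
    return score_net(xt, t).detach() + gap
```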


@phdthesis{Kim:2023,
  author  = {Dongjun Kim},
  advisor = {Il-Chul Moon},
  title   = {A Study on the Score-based Diffusion Model for Improved Training, Flexible Inference, and Efficient Sampling},
  school  = {KAIST},
  year    = {2023}
}