Journal of Beijing Polytechnic College (北京工业职业技术学院学报)

2025, Vol. 24, No. 03, pp. 23-28

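VVC Reference Frame Generation Algorithm Based on a Spatiotemporally Enhanced Diffusion Model

1. North China University of Technology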


Abstract:

To address the low prediction efficiency of Versatile Video Coding (VVC) in large-motion scenes, this paper proposes a reference frame generation algorithm based on a spatiotemporally enhanced diffusion model. In the temporal dimension, the algorithm introduces a temporal convolutional network to capture long-range dependencies between video frames. In the spatial dimension, a residual hybrid perception module combines residual networks with multiple receptive fields to capture both local detail features and global motion trends, which strengthens the model's spatiotemporal modeling of complex motion. Experimental results show that, compared with the VVC reference software VTM 13.0 under the low-delay P configuration, the proposed algorithm achieves average BD-rate savings of 1.99%, 3.94%, and 2.48% on the Y, U, and V components, respectively.

Keywords: video coding; diffusion model; reference frame generation
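
The abstract does not describe the architecture of the temporal convolutional network. As a rough illustration of why such networks can capture long-range temporal dependencies, the sketch below assumes a generic stack of causal dilated 1-D convolutions applied to one scalar feature per frame. The function names, kernel weights, and dilation schedule are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float],
                        dilation: int) -> List[float]:
    """1-D causal convolution: y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of the sequence count as zero (left padding),
    so the output keeps the input length and never reads future frames.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def tcn_stack(x: Sequence[float], weights: Sequence[float],
              dilations: Sequence[int]) -> List[float]:
    """Apply one causal dilated convolution per entry in `dilations`.

    Assumption for illustration: every layer shares the same kernel and
    there are no nonlinearities or residual connections.
    """
    out = list(x)
    for d in dilations:
        out = causal_dilated_conv(out, weights, d)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of time steps (current one included) that one output can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    dilations = [1, 2, 4]          # hypothetical doubling schedule
    weights = [1.0, 1.0]           # hypothetical kernel of size 2
    impulse = [1.0] + [0.0] * 9    # a single "event" in frame 0
    response = tcn_stack(impulse, weights, dilations)
    # Count how many later frames are still influenced by frame 0.
    reach = max(t for t, v in enumerate(response) if v != 0.0) + 1
    print(receptive_field(len(weights), dilations), reach)
```

With a kernel of size 2 and dilations 1, 2, 4, three layers already cover 8 time steps, whereas three undilated layers would cover only 4. This growth in temporal reach with depth is the property usually cited when temporal convolutional networks are used for long-range dependencies.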



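Basic information:

CLC number: TN919.81

Citation:

[1] SUN Jing. VVC reference frame generation algorithm based on a spatiotemporally enhanced diffusion model[J]. Journal of Beijing Polytechnic College, 2025, 24(03): 23-28.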
