2025, No. 03, Vol. 24, pp. 23-28
VVC Reference Frame Generation Algorithm Based on a Spatiotemporally Enhanced Diffusion Model
Abstract:

To address the low prediction efficiency of Versatile Video Coding (VVC) in large-motion scenes, this paper proposes a reference frame generation algorithm based on a spatiotemporally enhanced diffusion model. In the temporal dimension, the algorithm introduces a temporal convolutional network to capture long-range dependencies between video frames. In the spatial dimension, a residual hybrid perception module combines a residual network with multiple receptive fields to better capture both local detail features and global motion trends, strengthening the model's spatiotemporal modeling of complex motion. Experimental results show that, compared with the VVC reference software VTM13.0 under the low-delay P (LDP) configuration, the proposed algorithm achieves average BD-rate savings of 1.99%, 3.94%, and 2.48% on the Y, U, and V components, respectively.
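The abstract does not give the architecture of the temporal convolutional network, so the following is only a minimal sketch of the general technique it names: stacked causal dilated 1-D convolutions, whose receptive field grows with the sum of the dilations and can therefore cover long-range dependencies across frames. The function names, the kernel size of 2, and the dilations (1, 2, 4) are illustrative assumptions, not values taken from the paper, and each "frame" is reduced to a single scalar feature for readability.

```python
from typing import List, Sequence


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float],
                        dilation: int) -> List[float]:
    """1-D causal convolution: out[t] uses only x[t], x[t-d], x[t-2d], ..."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # positions before the sequence start act as zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


def tcn_stack(x: Sequence[float], weights: Sequence[float],
              dilations: Sequence[int]) -> List[float]:
    """Apply one causal dilated convolution per dilation, in order.

    Hypothetical simplification: the same weights are reused in every layer,
    and no nonlinearity or residual connection is applied.
    """
    h = list(x)
    for d in dilations:
        h = causal_dilated_conv(h, weights, d)
    return h


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of past time steps (including the current one) seen by the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # An impulse at frame 0 shows how far back in time the stack can "see".
    impulse = [1.0] + [0.0] * 9
    print(receptive_field(2, [1, 2, 4]))
    print(tcn_stack(impulse, [1.0, 1.0], [1, 2, 4]))
```

With these assumed settings, three layers already span eight frames, whereas three undilated layers with the same kernel size would span only four. This is the property that makes dilated temporal convolutions a common choice for modeling motion over many frames.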

Basic information:

CLC number: TN919.81

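Citation:

[1] SUN Jing (孙靖). VVC reference frame generation algorithm based on a spatiotemporally enhanced diffusion model[J]. Journal of Beijing Polytechnic College (北京工业职业技术学院学报), 2025, 24(03): 23-28.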
