Evaluating Compliance with Visualization Guidelines in Diagrams for Scientific Publications Using Large Vision Language Models
Authors: Johannes Rückert, Louise Bloch, Christoph M. Friedrich
Abstract: Diagrams are widely used to visualize data in publications. The research field of data visualization deals with defining principles and guidelines for the creation and use of these diagrams, which are often not known or adhered to by researchers, leading to misinformation caused by providing inaccurate or incomplete information. In this work, large Vision Language Models (VLMs) are used to analyze diagrams in order to identify potential problems with regard to selected data visualization principles and guidelines. To determine the suitability of VLMs for these tasks, five open source VLMs and five prompting strategies are compared using a set of questions derived from selected data visualization guidelines. The results show that the employed VLMs work well to accurately analyze diagram types (F1-score 82.49 %), 3D effects (F1-score 98.55 %), axes labels (F1-score 76.74 %), lines (RMSE 1.16), colors (RMSE 1.60) and legends (F1-score 96.64 %, RMSE 0.70), while they cannot reliably provide feedback about the image quality (F1-score 0.74 %) and tick marks/labels (F1-score 46.13 %). Among the employed VLMs, Qwen2.5VL performs best, and the summarizing prompting strategy performs best for most of the experimental questions. It is shown that VLMs can be used to automatically identify a number of potential issues in diagrams, such as missing axes labels, missing legends, and unnecessary 3D effects. The approach laid out in this work can be extended for further aspects of data visualization.
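The abstract scores some questions with the F1-score and others with the RMSE. The sketch below shows how these two metrics are computed. It assumes binary yes/no ground-truth labels for F1, with `True` (e.g. "axis labels present") as the positive class, and numeric counts for RMSE. The toy data is invented for illustration and does not come from the paper. The abstract does not say how F1 is aggregated for multi-class questions such as diagram type; an averaged variant (e.g. macro-F1) would be needed there. The paper reports F1 as a percentage, so 0.6667 below would correspond to 66.67 %.

```python
import math
from typing import Sequence


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Binary F1 score, treating True as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(t and p for t, p in zip(y_true, y_pred))            # true positives
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))      # false positives
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))      # false negatives
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, e.g. between true and predicted counts of lines."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy data: does each diagram have axis labels? (ground truth vs. VLM answer)
truth = [True, True, False, True, False]
answers = [True, False, False, True, True]
print(f"F1 = {f1_score(truth, answers):.4f}")  # F1 = 0.6667

# Hypothetical toy data: number of lines drawn in each diagram (truth vs. VLM answer)
print(f"RMSE = {rmse([3, 1, 4, 2], [3, 2, 4, 4]):.4f}")  # RMSE = 1.1180
```

In the toy F1 example there are two true positives, one false positive and one false negative. Precision and recall are therefore both 2/3, and so is F1.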
Q&A:
Q: What problem does this paper try to solve?
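A: The paper addresses how to automate the quality assessment of diagrams in scientific publications and the checking of their compliance with visualization guidelines. Specifically:
1) Diagram quality problems: diagrams are widely used for data visualization in scientific publications, but many researchers do not know or do not follow data visualization principles and guidelines. As a result, diagrams convey inaccurate or incomplete information and mislead readers.
2) Limitations of manual assessment: traditionally, diagram quality has to be checked manually by experts. This is time-consuming, prone to subjective bias, and hard to apply at scale.
3) Technical challenges of automated detection: methods are needed that analyze diagrams automatically and identify visualization problems. This requires models that understand a diagram's visual elements, structure, and design principles.
4) Validating the approach: the suitability and accuracy of large vision language models for diagram analysis must be verified, to determine which types of problems can be detected reliably and automatically.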
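Scoring some questions with F1 and others with RMSE suggests that each free-text model reply must first be reduced to either a yes/no label or a number. The sketch below illustrates that reduction step under stated assumptions. The `GuidelineQuestion` class, the question wording, and the regex-based `parse_answer` helper are all hypothetical and are not the authors' method. Replies that cannot be interpreted return `None`, so they can be counted as failures rather than silently guessed.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class GuidelineQuestion:
    """Hypothetical question derived from a visualization guideline."""
    text: str
    kind: str  # "binary" (scored with F1) or "count" (scored with RMSE)


def parse_answer(question: GuidelineQuestion, reply: str) -> Optional[Union[bool, int]]:
    """Reduce a free-text model reply to a scorable label, or None if unparseable."""
    reply = reply.strip().lower()
    if question.kind == "binary":
        # Accept replies that start with a standalone "yes" or "no" ("none" does not match).
        if re.match(r"^yes\b", reply):
            return True
        if re.match(r"^no\b", reply):
            return False
        return None
    if question.kind == "count":
        # Take the first integer mentioned in the reply.
        match = re.search(r"\d+", reply)
        return int(match.group()) if match else None
    raise ValueError(f"unknown question kind: {question.kind!r}")


# Hypothetical questions, loosely following aspects named in the abstract
has_axis_labels = GuidelineQuestion("Are both axes of the diagram labeled?", "binary")
num_lines = GuidelineQuestion("How many lines are drawn in the diagram?", "count")

print(parse_answer(has_axis_labels, "Yes, the x-axis and y-axis are labeled."))  # True
print(parse_answer(num_lines, "The diagram contains 3 lines."))  # 3
print(parse_answer(has_axis_labels, "The image is too blurry to tell."))  # None
```

This naive parser has known limits. Numbers written as words ("three lines") return `None`, and a reply like "a 2D chart with 3 lines" would yield 2. A real pipeline would more likely constrain the output format through the prompt itself.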