NL Dataset Generation Framework for Visualizations by LLMs

A Large Language Model (LLM) framework that generates rich and diverse NL datasets using only Vega-Lite specifications as input

Hyung-Kwon Ko, Hyeon Jeon, Gwanmo Park, Dae Hyun Kim, Nam Wook Kim, Juho Kim, and Jinwook Seo / 2024

PARTICIPANTS

Hyung-Kwon Ko, KAIST
Hyeon Jeon, Seoul National University
Gwanmo Park, Seoul National University
Dae Hyun Kim, KAIST
Nam Wook Kim, KAIST
Juho Kim, KAIST
Jinwook Seo, Seoul National University

ABSTRACT

We introduce VL2NL, a Large Language Model (LLM) framework that generates rich and diverse NL datasets using only Vega-Lite specifications as input, thereby streamlining the development of Natural Language Interfaces (NLIs) for data visualization. To synthesize relevant chart semantics accurately and enhance syntactic diversity in each NL dataset, we leverage 1) guided discovery incorporated into prompting, so that LLMs can steer themselves to create faithful NL datasets in a self-directed manner, and 2) score-based paraphrasing to augment NL syntax along four language axes. We also present a new collection of 1,981 real-world Vega-Lite specifications that is more diverse and complex than existing chart collections. When tested on our chart collection, VL2NL extracted chart semantics and generated L1/L2 captions with 89.4% and 76.0% accuracy, respectively. It also generated and paraphrased utterances and questions with greater diversity than the benchmarks. Last, we discuss how our NL datasets and framework can be utilized in real-world scenarios.
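
To make the two ideas in the abstract more concrete, here is a minimal Python sketch. It is not the VL2NL implementation. It assumes a single-view Vega-Lite specification with top-level `mark` and `encoding` keys, and the function names `summarize_spec` and `build_paraphrase_prompt`, the axis names, the 1 to 5 score range, and the prompt wording are all hypothetical placeholders chosen for illustration.

```python
from typing import Any


def summarize_spec(spec: dict[str, Any]) -> dict[str, Any]:
    """Collect the mark type and per-channel encoding info from a Vega-Lite spec."""
    mark = spec.get("mark")
    # Vega-Lite allows "mark" to be a string or an object with a "type" key.
    mark_type = mark.get("type") if isinstance(mark, dict) else mark

    channels: dict[str, dict[str, Any]] = {}
    for channel, enc in spec.get("encoding", {}).items():
        # Some channels (e.g. tooltip) may hold a list; this sketch skips those.
        if not isinstance(enc, dict):
            continue
        channels[channel] = {
            "field": enc.get("field"),
            "type": enc.get("type"),
            "aggregate": enc.get("aggregate"),
        }
    return {"mark": mark_type, "channels": channels}


def build_paraphrase_prompt(sentence: str, scores: dict[str, int]) -> str:
    """Build a paraphrasing instruction with a target score per axis.

    The axis names and the 1-5 range are placeholders, not the paper's axes.
    """
    for axis, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {axis!r} must be between 1 and 5")
    targets = ", ".join(f"{axis}={score}" for axis, score in sorted(scores.items()))
    return (
        "Paraphrase the sentence to match these target scores "
        f"(1 = low, 5 = high): {targets}.\n"
        f"Sentence: {sentence}"
    )
```

A summary produced by `summarize_spec` could be placed in a prompt as grounding for caption or question generation, and `build_paraphrase_prompt` shows how target scores might steer the style of a paraphrase. The actual prompts and axes are in the supplemental materials linked below.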

Supplemental Materials

https://github.com/hyungkwonko/chart-llm

Related Publications

CHI '24 | Conference Full Paper

Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models

Hyung-Kwon Ko, Hyeon Jeon, Gwanmo Park, Dae Hyun Kim, Nam Wook Kim, Juho Kim, and Jinwook Seo

In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24), Honolulu, HI, USA.
