Interactive Data Extraction from Chart Images
Daekyoung Jung, Wonjae Kim, Hyunjoo Song, Jeong-in Hwang, Bongshin Lee, Bohyoung Kim, and Jinwook Seo / 2017
- Daekyoung Jung, Seoul National University, Seoul, Republic of Korea
- Wonjae Kim, Seoul National University, Seoul, Republic of Korea
- Hyunjoo Song, Seoul National University, Seoul, Republic of Korea
- Jeong-in Hwang, Seoul National University, Seoul, Republic of Korea
- Bongshin Lee, Microsoft Research, Redmond, WA, USA
- Bohyoung Kim, Hankuk University of Foreign Studies, Yongin-si, Republic of Korea
- Jinwook Seo, Seoul National University, Seoul, Republic of Korea
Charts are commonly used to present data in digital documents such as web pages, research papers, or presentation slides. When the underlying data is not available, it must be extracted from the chart image before it can be used for further analysis or to redesign the chart for more accurate perception. In this paper, we present ChartSense, an interactive chart data extraction system. ChartSense first determines the chart type of a given chart image using a deep-learning-based classifier, and then extracts the underlying data from the chart image using semi-automatic, interactive extraction algorithms optimized for each chart type. To evaluate chart type classification accuracy, we compared ChartSense with ReVision, a system with a state-of-the-art chart type classifier. We found that ChartSense was more accurate than ReVision. In addition, to evaluate data extraction performance, we conducted a user study comparing ChartSense with WebPlotDigitizer, one of the most effective publicly available chart data extraction tools. Our results showed that ChartSense outperformed WebPlotDigitizer in task completion time, error rate, and subjective preference.
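The two-stage design described above (classify the chart type, then hand off to a type-specific extractor guided by user input) can be illustrated with a minimal sketch. This is not ChartSense's implementation: every name here (`extract_bar`, `EXTRACTORS`, `extract_data`, the `bar_tops` and `axis_calibration` keys) is hypothetical, the classifier is passed in as a plain function rather than a trained model, and the bar extractor assumes a simple linear mapping from pixel rows to axis values calibrated by two user-supplied points.

```python
from typing import Callable, Dict, List

# All names below are hypothetical; none come from ChartSense itself.
Extractor = Callable[[dict, dict], List[float]]


def extract_bar(image: dict, hints: dict) -> List[float]:
    """Convert bar-top pixel rows to data values.

    Assumption: the user has calibrated the value axis by marking two
    (pixel_row, value) pairs, and the axis is linear between them.
    `image["bar_tops"]` stands in for bar tops already located in the image.
    """
    (p0, v0), (p1, v1) = hints["axis_calibration"]
    scale = (v1 - v0) / (p1 - p0)  # data units per pixel
    return [v0 + (top - p0) * scale for top in image["bar_tops"]]


# Registry mapping a chart type label to its extraction routine.
EXTRACTORS: Dict[str, Extractor] = {"bar": extract_bar}


def extract_data(image: dict,
                 classify: Callable[[dict], str],
                 hints: dict) -> List[float]:
    # Stage 1: determine the chart type (a stand-in for a learned classifier).
    kind = classify(image)
    # Stage 2: dispatch to the extractor registered for that type.
    try:
        extractor = EXTRACTORS[kind]
    except KeyError:
        raise ValueError(f"no extractor registered for chart type {kind!r}")
    return extractor(image, hints)
```

In this sketch, supporting another chart type would mean registering one more function in `EXTRACTORS`; the `hints` dictionary is where user interaction would enter the otherwise automatic pipeline.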
This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIP) (No. NRF-2014R1A2A2A03006998 and NRF-2016R1A2B2007153). The ICT at Seoul National University provided research facilities for this study.