The Role of Text Simplification Operations in Evaluation.
Published in Current Trends in Text Simplification (CTTS 2021), co-located with SEPLN 2021 (Online), 2021
Research in Text Simplification (TS) has relied mostly on Wikipedia-based datasets and the SARI evaluation metric as the preferred means for creating and evaluating new simplification methods. Previous studies have pointed out flaws in data evaluation resources, including incorrect alignment of simple/complex sentence pairs, sentences with no simplifications, and a dearth of variety in simplification operations. However, there has been no further analysis of the impact of the original data distribution with regard to the type of simplification operations performed. In this paper, we set up a systematic benchmark of the most common TS datasets, basing our evaluation on different protocols for split selection (e.g., random or Monte Carlo selection). We perform an operation-based investigation, demonstrating in detail the limitations of existing simplification datasets. Further, we make recommendations for future standardised practices in the design, creation and evaluation of TS resources.
Recommended citation: Vásquez-Rodríguez, L., Shardlow, M., Przybyła, P., and Ananiadou, S. (2021). The Role of Text Simplification Operations in Evaluation. In Current Trends in Text Simplification (CTTS 2021), co-located with SEPLN 2021. (Online).
Download Paper | Download Slides