3.2.2. Real Font Datasets
Using a small set of 46 standard fonts in MF-Font (Campbell and Kautz 2014), Campbell and Kautz were the first to raise the problem of sequence regularisation, and they proposed normalisation techniques to match the number of sequences for each glyph across the dataset. Instead of applying normalisation techniques to existing fonts, we propose creating a font library with a regularised number of Bézier curves.
Later, with growing confidence in generative methods, Lopes et al. (2019) introduced the SVG-Fonts Dataset, a large-scale dataset of 14 M font characters. Since the dataset was collected from the internet, the quality of the fonts is questionable, as are their licensing and authorship. Instead of collecting a large-scale dataset with such implicit problems, we aim for a smaller dataset that is still sufficient for training and that consists of font examples of consistently high quality.
With more complex architectures, DeepSVG (Lopes et al. 2019) and later DeepVecFont (Wang and Lian 2021) used only a subset of the original SVG-Fonts Dataset (Lopes et al. 2019). Working with the SVG-Fonts Dataset, which consists of arbitrary fonts collected from the internet, Wang and Lian (2021) were among the first to mention the curve dislocation problem, which is caused by seemingly arbitrary lengths of curve sequences. Instead of providing a font library in which topologically similar structures are drawn with an arbitrary number of curves, we aim to provide a dataset with a regularised number of sequences.
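To make the notion of arbitrary sequence length concrete, the following minimal sketch shows that the same outline can be encoded with different numbers of cubic Bézier curves. It uses standard de Casteljau subdivision; the helper names (`split_cubic`, `point_at`, `pad_to_length`) and the round-robin padding strategy are illustrative assumptions and do not reproduce the normalisation method of any of the cited works.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Cubic = Tuple[Point, Point, Point, Point]  # four control points


def lerp(a: Point, b: Point, t: float) -> Point:
    """Linear interpolation between two points."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


def split_cubic(curve: Cubic, t: float = 0.5) -> Tuple[Cubic, Cubic]:
    """Split a cubic Bezier at parameter t (de Casteljau).

    The two returned curves trace exactly the same shape as the input.
    """
    p0, p1, p2, p3 = curve
    a, b, c = lerp(p0, p1, t), lerp(p1, p2, t), lerp(p2, p3, t)
    d, e = lerp(a, b, t), lerp(b, c, t)
    m = lerp(d, e, t)  # point on the curve at parameter t
    return (p0, a, d, m), (m, e, c, p3)


def point_at(curve: Cubic, t: float) -> Point:
    """Evaluate the curve at t by reusing the subdivision."""
    left, _ = split_cubic(curve, t)
    return left[3]


def pad_to_length(curves: List[Cubic], target: int) -> List[Cubic]:
    """Hypothetical padding step: split curves in turn until the
    outline has exactly `target` segments. The shape is unchanged."""
    if not curves or len(curves) > target:
        raise ValueError("cannot pad: outline is empty or already longer than target")
    out = list(curves)
    i = 0
    while len(out) < target:
        left, right = split_cubic(out[i])
        out[i:i + 1] = [left, right]
        i = (i + 2) % len(out)  # skip the two halves just created
    return out


# One arch-like segment, re-encoded as four segments of the same shape.
arch: Cubic = ((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0))
padded = pad_to_length([arch], 4)
```

Here `arch` and `padded` describe an identical outline with sequence lengths of one and four, which is the kind of variation that two independently drawn fonts can exhibit for the same glyph.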
Among open-source libraries, Google Fonts (Google, n.d.) is integral to providing accessible datasets for font generation and classification projects (Nagata, Iwana, and Uchida 2023; Cao et al. 2023; Cho, Lee, and Choi 2022; Nagata et al. 2022; Yuan et al. 2021; Xie, Fujita, and Miyata 2021; Srivatsan et al. 2021). Similar libraries include NAVER Fonts (Fonts, n.d.), which focuses on the Hangul alphabet, and Fontworks (Inc., n.d.), which focuses on Japanese. Although these libraries offer curated, high-quality fonts that are accessible to the research community, their fonts are designed by many different authors, so the libraries cannot enforce a regular sequence length.