3.2. Fonts Datasets

If the survey of applications served as the white rabbit at the beginning of the journey, the survey of datasets marks the point where the subject enters the rabbit hole.

Machine learning is data-driven by nature: models learn a domain from the examples they are given. In type design, that data is the discipline's tangible output, fonts. This study examines the datasets employed in the projects documented in the literature database, revealing an intricate dialogue between type designers and machine learning practitioners. Their mutual fascination with letterforms suggests a shared territory waiting to be explored.

The investigation charts two distinct data landscapes: bitmap datasets, where glyphs are stored as raster images, and real-font datasets, where typefaces retain their vector outlines.
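The difference between the two landscapes can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed projects: the `L_OUTLINE` shape, the 8-unit grid, and the helper names `inside` and `rasterize` are assumptions made for illustration, and the outline uses straight segments where real fonts use Bézier curves. The sketch stores one glyph as a vector outline and samples it into bitmaps of different sizes.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical glyph: an "L" drawn as a closed polygon on an 8 x 8 unit grid.
# Real font outlines are made of Bezier curves; straight edges keep this short.
L_OUTLINE: List[Point] = [(1, 0), (7, 0), (7, 2), (3, 2), (3, 8), (1, 8)]


def inside(point: Point, outline: List[Point]) -> bool:
    """Even-odd ray casting: True if the point lies inside the closed outline."""
    x, y = point
    hit = False
    # Pair every vertex with the next one, wrapping around to close the shape.
    for (x1, y1), (x2, y2) in zip(outline, outline[1:] + outline[:1]):
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


def rasterize(outline: List[Point], size: int, units: float = 8.0) -> List[List[int]]:
    """Sample the outline at pixel centres to produce a size x size bitmap."""
    step = units / size
    bitmap = []
    for row in range(size):
        # Row 0 is the top of the image, while font coordinates have y pointing up.
        y = units - (row + 0.5) * step
        bitmap.append(
            [1 if inside(((col + 0.5) * step, y), outline) else 0 for col in range(size)]
        )
    return bitmap


if __name__ == "__main__":
    for line in rasterize(L_OUTLINE, 8):
        print("".join("#" if pixel else "." for pixel in line))
```

The same outline can be sampled at any resolution, for example `rasterize(L_OUTLINE, 16)`, whereas a stored bitmap is fixed at the resolution chosen when the dataset was built.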

The table presents the datasets employed in AI font generation projects (Ko et al. 2021; Fontworks Inc., n.d.; Livezey et al. 2021; Naver Fonts, n.d.; Gao et al. 2019; Google [2015] 2023; Magre and Brown 2022; Virkus [2022] 2022; Liu et al. 2011; Lopes et al. 2019).
Fontworks Inc. n.d. “Fontworks Fonts.” GitHub. Accessed January 13, 2023. https://github.com/fontworks-fonts.
Gao, Yue, Yuan Guo, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao. 2019. “AGIS-Net: Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning.” ACM Transactions on Graphics 38 (6): 185:1–12. https://doi.org/10.1145/3355089.3356574.
Google. (2015) 2023. “Google Fonts Files.” Google. https://github.com/google/fonts.
Ko, Debbie Honghee, Hyunsoo Lee, Jungjae Suk, Ammar Ul Hassan, and Jaeyoung Choi. 2021. “Hangul Font Dataset for Korean Font Research Based on Deep Learning.” KIPS Transactions on Software and Data Engineering 10 (2): 73–78. https://doi.org/10.3745/KTSDE.2021.10.2.73.
Liu, Cheng-Lin, Fei Yin, Da-Han Wang, and Qiufeng Wang. 2011. “CASIA Online and Offline Chinese Handwriting Databases.” In 2011 International Conference on Document Analysis and Recognition, 37–41. https://doi.org/10.1109/ICDAR.2011.17.
Livezey, Jesse A., Ahyeon Hwang, Jacob Yeung, and Kristofer E. Bouchard. 2021. “Hangul Fonts Dataset: A Hierarchical and Compositional Dataset for Investigating Learned Representations.” June 9, 2021. https://doi.org/10.48550/arXiv.1905.13308.
Lopes, Raphael Gontijo, David Ha, Douglas Eck, and Jonathon Shlens. 2019. “SVG-VAE: A Learned Representation for Scalable Vector Graphics.” In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 7929–38. https://doi.org/10.1109/ICCV.2019.00802.
Magre, Nimish, and Nicholas Brown. 2022. “Typography-MNIST: An MNIST-Style Image Dataset to Categorize Glyphs and Font-Styles.” February 12, 2022. https://doi.org/10.48550/arXiv.2202.08112.
Naver Fonts. n.d. “Naver Fonts.” Accessed January 13, 2023. https://hangeul.naver.com/font.
Virkus, Dusk. (2022) 2022. “Dafonts Free Dataset.” https://github.com/duskvirkus/dafonts-free.

Citation

If this work is useful for your research, please cite it as:

@phdthesis{paldia2025generative,
  title={Research and development of generative neural networks for type design},
  author={Paldia, Filip},
  year={2025},
  school={Academy of Fine Arts and Design in Bratislava},
  address={Bratislava, Slovakia},
  type={Doctoral thesis},
  url={https://lttrface.com/doctoral-thesis/},
  note={Department of Visual Communication, Studio Typo}
}