3.2.1. Bitmap Image Font Datasets

In AI’s illustrious journey, reading and generating letters has been its adorable kindergarten project. Early datasets such as MNIST (LeCun et al. 1998; Deng 2012) and its Chinese handwriting counterpart CASIA (Liu et al. 2011) consisted of images of handwritten digits and characters. Because such data is easy to collect and prepare, and because the shapes have a seemingly uncomplicated topological structure, images of characters and fonts have been used generously to advance computer vision.

Despite their vital role in image recognition, in font and glyph classification with TMNIST (Magre and Brown 2022), and even in glyph image generation (Gao et al. 2019), such datasets do not represent real fonts. Even recent attempts that introduce the hierarchical structure and compositional aspects of the Korean alphabet Hangul (Ko et al. 2021) still treat glyph generation as a bitmap image generation task.

Since current font industry standards (Wright 1998; Korpela 2006; Haralambous 2007) encode glyph contours as splines, predominantly Bézier curves, we do not consider bitmap datasets suitable for training font generation models.
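To make the contrast concrete, the following minimal sketch evaluates a single cubic Bézier segment and samples it onto pixel grids of two different sizes. It is an illustration under stated assumptions, not a font rasterizer: the control points `CTRL` and the helper names are hypothetical, only the curve outline is marked (real rasterizers fill closed contours and apply hinting), and the segment stands in for one piece of a glyph contour.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    u = 1.0 - t
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_curve(ctrl: Sequence[Point], n: int) -> List[Point]:
    """Return n points spaced uniformly in t along the curve (n >= 2)."""
    return [cubic_bezier(ctrl[0], ctrl[1], ctrl[2], ctrl[3], i / (n - 1))
            for i in range(n)]


def rasterize(points: Sequence[Point], size: int) -> List[List[int]]:
    """Mark every cell of a size x size grid that contains a sampled point.

    Points are assumed to lie in the unit square [0, 1] x [0, 1].
    """
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] = 1
    return grid


# Hypothetical control points of one contour segment, in the unit square.
CTRL: Tuple[Point, Point, Point, Point] = (
    (0.1, 0.1), (0.1, 0.9), (0.9, 0.9), (0.9, 0.1),
)

# The same four control points serve any target resolution.
low = rasterize(sample_curve(CTRL, 200), 8)
high = rasterize(sample_curve(CTRL, 2000), 64)
```

The four control points are the entire description of the segment, so the shape can be re-sampled at any resolution. A bitmap such as `low` stores only the result of one such sampling, and the control points cannot be read back from it directly.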

Deng, Li. 2012. “The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web].” IEEE Signal Processing Magazine 29 (6): 141–42. https://doi.org/10.1109/MSP.2012.2211477.

Gao, Yue, Yuan Guo, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao. 2019. “AGIS-Net: Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning.” ACM Transactions on Graphics 38 (6): 185:1–12. https://doi.org/10.1145/3355089.3356574.

Ko, Debbie Honghee, Hyunsoo Lee, Jungjae Suk, Ammar Ul Hassan, and Jaeyoung Choi. 2021. “Hangul Font Dataset for Korean Font Research Based on Deep Learning.” KIPS Transactions on Software and Data Engineering 10 (2): 73–78. https://doi.org/10.3745/KTSDE.2021.10.2.73.

Korpela, Jukka K. 2006. Unicode Explained. O’Reilly Media, Inc. https://books.google.com?id=lxndiWaFMvMC.

LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324. https://doi.org/10.1109/5.726791.

Liu, Cheng-Lin, Fei Yin, Da-Han Wang, and Qiufeng Wang. 2011. “CASIA Online and Offline Chinese Handwriting Databases.” In 2011 International Conference on Document Analysis and Recognition, 37–41. https://doi.org/10.1109/ICDAR.2011.17.

Magre, Nimish, and Nicholas Brown. 2022. “Typography-MNIST: An MNIST-Style Image Dataset to Categorize Glyphs and Font-Styles.” February 12, 2022. https://doi.org/10.48550/arXiv.2202.08112.

Wright, T. 1998. “History and Technology of Computer Fonts.” IEEE Annals of the History of Computing 20 (2): 30–34. https://doi.org/10.1109/85.667294.

Haralambous, Yannis. 2007. Fonts & Encodings. O’Reilly Media, Inc. https://books.google.com?id=qrElYgVLDwYC.
