4.5.1. Heuristics
The systematic evaluation of generated fonts requires a set of heuristics that bring rigour to the assessment process. These heuristics provide consistent benchmarks across two distinct dimensions: training effort and inference effort. Each dimension is examined to determine its influence on the quality of the generated fonts, which turns a subjective visual assessment into a structured analytical framework.
Training effort corresponds to the model’s Trained Epochs parameter, and inference effort corresponds to the Generated Samples parameter.
Detail precision indicates which effort synthesised more precise results. In the visual evaluation, smooth results were ranked higher than glitchy ones.
Reference matching indicates which effort synthesised glyph shapes that better matched the reference style. The synthesised style that appeared closer to the reference style is ranked higher.
Alphabet consistency indicates which effort synthesised a more consistent style across the glyphs of the generated font. An alphabet with higher consistency across glyphs is ranked higher.
Benchmark results

LTTR24: base model
LTTR/SET: our dataset
DVF-2: original DeepVecFont-2 model (Wang et al. 2023)
✖️: not applicable
≈: no significant difference
>: better results than
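To make the structure of a benchmark cell concrete, the following Python sketch shows one possible way to record a pairwise visual judgement and render it with the legend symbols. It is hypothetical: the names `Effort`, `Heuristic`, `Judgement` and `legend_symbol`, and the encoding of each visual verdict as +1, -1, 0 or `None`, are illustrative assumptions, not part of the evaluation described in this section. The verdict itself still comes from visual inspection; the code only maps it onto the legend.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Effort(Enum):
    # Each evaluation dimension and the model parameter it is tied to.
    TRAINING = "Trained Epochs"
    INFERENCE = "Generated Samples"


class Heuristic(Enum):
    DETAIL_PRECISION = "detail precision"          # smooth ranks above glitchy
    REFERENCE_MATCHING = "reference matching"      # closer to reference style ranks higher
    ALPHABET_CONSISTENCY = "alphabet consistency"  # more consistent glyph style ranks higher


@dataclass(frozen=True)
class Judgement:
    heuristic: Heuristic
    effort: Effort
    # Hypothetical encoding of a visual verdict:
    # +1 if the first configuration ranked higher, -1 if the second did,
    # 0 if there was no significant difference, None if not applicable.
    verdict: Optional[int]


def legend_symbol(first: str, second: str, judgement: Judgement) -> str:
    """Render one judgement as a benchmark cell using the legend symbols."""
    if judgement.verdict is None:
        return "✖️"  # not applicable
    if judgement.verdict == 0:
        return f"{first} ≈ {second}"  # no significant difference
    if judgement.verdict > 0:
        return f"{first} > {second}"  # first gives better results
    return f"{second} > {first}"      # second gives better results
```

For example, a verdict of 0 under reference matching for two hypothetical configurations `A` and `B` renders as `A ≈ B`. Keeping the verdict ordinal rather than numeric reflects that the heuristics rank results visually instead of scoring them.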