4.2. Framework: Round Trip Through Dataset Factory
Regularisation of command sequences is a data engineering activity that aims to reduce sequential irregularity. The irregularities occur in the usual font datasets collected from the internet or standard font libraries as the opposite of sequential regularity. The objective is to engineer font datasets in such a manner that data clearly represents the pattern of best practices in the type design. In other words, provide an indication of the optimal number of Bézier curve segments of the letter shapes in a given typographic style and letter construction 1.
In contemporary type design, type designers create letterforms by drawing the outlines and surrounding negative shapes. Bézier curves are traced along the borders of letters or their components. This method is known as the outline approach.
An alternative methodology emerges from the act of writing itself – the stroke approach. Whilst this method finds limited application amongst professional type designers, it has gained notable traction within algorithmic approaches to type design.
Noordzij had a strong argument for the stroke approach: “The stroke is the fundamental artefact. Nothing goes further back than the shape of a single stroke”(Noordzij 2006). The reason for this statement is that the letter shapes originate in writing. At least the letter shapes he is referring to.
Crossland explained it as a method where each letter is constructed by specifying points along the path of a pen’s stroke and the attributes of the pen’s nib at those points (Crossland 2008).
How many shapes in the typographic universe are made of strokes remains an unanswered question 2. For now, it is fair to say there is a scale between two extremes – letter shapes that are explicitly defined by strokes and letter shapes that aren’t defined by strokes – outliners. Based on this logic, it can be argued that a significant amount of type forms are defined by strokes. The continuum of these letter shapes can be represented parametrically and used to generate a solid dataset for training AI.
Some letters have multiple constructions, e.g. letter ‘a’ in single or double storey; and the shape of the middle stroke. The similar letter ‘g’ comes with two variants, single or double storey. Contemporary typefaces usually offer these construction variants↩︎
During this project was not discovered a study that would try to analyse the number of stroke shapes in a font library, which could lead to a statistical conclusion on how many letter shapes are stroke-based and how many are not. Therefore, until then, the thesis uses logical induction: There is a scale between two extremes – letter shapes that are explicitly defined by strokes and letter shapes that aren’t defined by strokes – outliners.↩︎