3.3.3. Spatial Representation

The spatial encoding modality in font generation represents glyphs as raster pixel matrices, rather like a pointillist deconstructing letterforms into countless meticulous dots. Because each pixel occupies a fixed position in two-dimensional space, its relation to every other pixel is inherently spatial. This approach loosely mirrors the way the human visual system processes shapes through discrete neural receptive fields, and it aligns with established computer vision methodologies.

The technical implementation involves converting vector outlines into fixed-dimension pixel grids, a step known as rasterisation. Each pixel position in the matrix holds either a binary value (0 or 1) indicating the absence or presence of the glyph (much like a set of lights that are either off or on) or a greyscale value (0.0 to 1.0) capturing edge anti-aliasing (akin to dimming some of the lights for an evening party). A typical encoding might use a 28×28 pixel matrix, rather like a miniature chess board where each square either contains or lacks a piece, resulting in 28 × 28 = 784 numerical values representing a single glyph. This standardised format is particularly amenable to processing by Convolutional Neural Networks (CNNs), which excel at identifying spatial patterns through learnt hierarchical feature detection.
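The conversion from outline to grid can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production rasteriser: the outline is a hypothetical straight-edged polygon in the unit square (real glyph contours contain Bézier curves and would first need flattening), and the names `point_in_polygon` and `rasterise` are invented for this example. Sampling each pixel once at its centre gives binary values; averaging several sub-pixel samples gives greyscale coverage that approximates anti-aliasing.

```python
GRID = 28  # grid size used in the text; any size works the same way


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: is (x, y) inside the closed polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(polygon, size=GRID, samples=1):
    """Sample a unit-square polygon onto a size x size grid.

    samples=1 tests each pixel centre once (values 0.0 or 1.0);
    samples>1 averages samples*samples sub-pixel tests (values in 0.0..1.0).
    """
    grid = []
    for row in range(size):
        line = []
        for col in range(size):
            hits = 0
            for sy in range(samples):
                for sx in range(samples):
                    x = (col + (sx + 0.5) / samples) / size
                    y = (row + (sy + 0.5) / samples) / size
                    hits += point_in_polygon(x, y, polygon)
            line.append(hits / (samples * samples))
        grid.append(line)
    return grid


# Hypothetical outline: a triangle standing in for a real glyph contour.
triangle = [(0.5, 0.1), (0.9, 0.9), (0.1, 0.9)]

binary = rasterise(triangle)           # on/off pixels
grey = rasterise(triangle, samples=4)  # fractional coverage along the edges

# Flatten the grid into the vector of values described above.
flat = [value for row in binary for value in row]
print(len(flat))  # 784
```

In practice a font library would perform this step; the sketch only makes explicit what "converting a vector outline into a fixed-dimension pixel grid" involves.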

An original array of pixel data:

pixel_data = [
    [0,   0,   76,  204, 255],
    [0,   51,  229, 255, 178],
    [25,  204, 255, 229, 102],
    [0,   76,  204, 255, 51],
    [0,   0,   51,  102, 0]
]

The pixel data is a 2D array of integers, where each integer represents the intensity of one pixel, here on a scale from 0 to 255. This data could be used directly for computation, but it is usually normalised to the range 0.0 to 1.0 by dividing each value by 255.

normalised_data = [
    [0.0, 0.0, 0.3, 0.8, 1.0],  
    [0.0, 0.2, 0.9, 1.0, 0.7],
    [0.1, 0.8, 1.0, 0.9, 0.4],
    [0.0, 0.3, 0.8, 1.0, 0.2],
    [0.0, 0.0, 0.2, 0.4, 0.0]
]
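The step from the integer array to the normalised one can be sketched as follows. The sketch assumes 8-bit intensities, so the divisor is 255; the helper name `normalise` is introduced here for illustration only, and the rounding to one decimal place is purely for display.

```python
pixel_data = [
    [0,   0,   76,  204, 255],
    [0,   51,  229, 255, 178],
    [25,  204, 255, 229, 102],
    [0,   76,  204, 255, 51],
    [0,   0,   51,  102, 0],
]

MAX_INTENSITY = 255  # assumption: 8-bit greyscale input


def normalise(pixels, max_value=MAX_INTENSITY):
    """Scale integer intensities into the range 0.0 to 1.0."""
    return [[value / max_value for value in row] for row in pixels]


normalised = normalise(pixel_data)

# Round only for printing; keep full precision for actual computation.
rounded = [[round(value, 1) for value in row] for row in normalised]
print(rounded[0])  # [0.0, 0.0, 0.3, 0.8, 1.0]
```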

The normalised data is a 2D array of floats (shown here rounded to one decimal place), where each float represents the intensity of the pixel. This data is directly usable for computation.
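As one illustration of such computation, the sketch below slides a 3×3 kernel over the normalised grid, which is the basic operation a convolutional layer repeats with many learnt kernels. The kernel here is hand-picked (a Sobel-style filter that responds to left-to-right intensity change) rather than learnt, and the function name `convolve2d_valid` is an assumption made for this example.

```python
normalised_data = [
    [0.0, 0.0, 0.3, 0.8, 1.0],
    [0.0, 0.2, 0.9, 1.0, 0.7],
    [0.1, 0.8, 1.0, 0.9, 0.4],
    [0.0, 0.3, 0.8, 1.0, 0.2],
    [0.0, 0.0, 0.2, 0.4, 0.0],
]

# Hand-picked kernel: positive where intensity rises from left to right.
kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def convolve2d_valid(image, kernel):
    """Cross-correlate a 2D image with a square kernel, without padding."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    output = []
    for r in range(rows):
        output_row = []
        for c in range(cols):
            total = 0.0
            # Weighted sum of the k x k neighbourhood whose top-left is (r, c).
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            output_row.append(total)
        output.append(output_row)
    return output


feature_map = convolve2d_valid(normalised_data, kernel)
print(len(feature_map), len(feature_map[0]))  # 3 3
```

Each output value summarises a small neighbourhood of pixels, which is why the spatial arrangement of the grid matters: shuffling the pixels would change every response even though the set of intensity values stays the same.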

Because this conversion is simple, images are the most popular representation in font generation. The popularity of spatial encoding has been further bolstered by recent advances in image generation, as exemplified by DALL-E and Stable Diffusion. This success in the broader field has influenced font generation research, where approximately 80% of published papers employ raster-based inputs: a somewhat ironic prevalence of pixel matrices in a domain fundamentally concerned with vector graphics.

