3.3.4. Sequential Representation
The sequential encoding modality treats glyph data in a distinctly different way from spatial encoding. Whilst spatial encoding considers the relative positioning of points in two-dimensional space, sequential encoding processes points as ordered instructions – rather like the difference between learning a dance by observation and following written choreography.
In sequential encoding, glyph data is interpreted as a series of ordered commands, whether as polylines, coordinate points, SVG directives, or Bézier path specifications. This methodology mirrors the inherent structure of vector graphics, which are fundamentally stored as notational sequences. The approach shares significant commonality with natural language processing techniques, where textual data is similarly encoded as sequential patterns.
The technical implementation involves transforming Bézier path data into numerical sequences suitable for machine learning applications. This transformation process can be demonstrated through two standard font file formats: .glyphs and .otf.
In both cases, the vector path data – comprising coordinates and control points – is converted into ordered sequences of numerical values.
The encoding process requires normalising the data before machine learning computations can be performed. This normalisation transforms the raw coordinate values into a consistent range of 0.0-1.0, which lets the neural network learn its weights from inputs on a stable scale. The process involves dividing coordinate values by the em-square size – typically 1000 units per em (UPM) for PostScript-based OpenType fonts or 2048 UPM for TrueType-based fonts. Whilst the valid UPM range theoretically extends from 16 to 16384 units, such extremes are rarely encountered in typical font designs. Additionally, the command types undergo one-hot encoding, whereby each instruction type is transformed into a vector of floating-point values, creating a standardised numerical representation suitable for neural network processing.
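As a minimal sketch of these two steps, the following Python fragment assumes a three-command vocabulary (l, o and c, as in the .glyphs example below) and takes the em size as a parameter:

COMMANDS = ["l", "o", "c"]  # assumed vocabulary: line, off-curve, curve

def normalise(value: float, upm: int = 1000) -> float:
    """Scale a raw font-unit coordinate into the 0.0-1.0 range."""
    return value / upm

def one_hot(command: str) -> list[float]:
    """Encode a command type as a vector of floating-point values."""
    return [1.0 if command == c else 0.0 for c in COMMANDS]

print(one_hot("o"))          # [0.0, 1.0, 0.0]
print(normalise(318, 1000))  # 0.318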
After normalisation, the data is transformed into a sequence of numerical values. Each sequence element consists of a command type identifier followed by a coordinate pair, creating a standardised format that preserves the path construction logic whilst being compatible with machine learning architectures.

Native .glyphs representation:
nodes = (
(38,0,l),    // line point
(131,233,o), // off-curve control point
(225,467,o), // off-curve control point
(318,700,c), // curve point
(411,467,o), // off-curve control point
(505,233,o), // off-curve control point
(598,0,c)    // curve point
);
In one sequence, commands like l, o, and c can be represented with three positions (e.g. 1.0, 0.0, 0.0), while coordinates are represented with two positions (e.g. 0.037, 0.000). Each element of the sequence therefore has five positions. Note that the coordinate values in this example correspond to an em size of 1024 units (38/1024 ≈ 0.037).
tensor([[1.0, 0.0, 0.0, 0.037, 0.000],
[0.0, 1.0, 0.0, 0.128, 0.228],
[0.0, 1.0, 0.0, 0.220, 0.456],
[0.0, 0.0, 1.0, 0.311, 0.684],
# ...
])
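The construction of this tensor from the node list can be sketched as follows; the 1024-unit em is an assumption inferred from the example values:

import torch

NODE_TYPES = ["l", "o", "c"]
UPM = 1024  # assumed em size, inferred from 38/1024 ≈ 0.037

nodes = [(38, 0, "l"), (131, 233, "o"), (225, 467, "o"), (318, 700, "c"),
         (411, 467, "o"), (505, 233, "o"), (598, 0, "c")]

rows = []
for x, y, node_type in nodes:
    # Three one-hot positions for the command, two for the coordinates.
    one_hot = [1.0 if node_type == t else 0.0 for t in NODE_TYPES]
    rows.append(one_hot + [x / UPM, y / UPM])

sequence = torch.tensor(rows)
print(sequence[:4])  # matches the tensor above, up to rounding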
Native .otf representation:
38 0 moveto % starting point
131 233 225 467 318 700 % control points and destination
curveto % curve command
411 467 505 233 598 0 % control points and destination
curveto % curve command
In one sequence, commands such as moveto and curveto can be represented with three positions (e.g. 1.0, 0.0, 0.0), whilst coordinates are represented with two positions (e.g. 0.037, 0.000). Thus, each sequence element again has five positions. Because a curveto carries three coordinate pairs (two control points and the destination), it expands into three sequence elements, one per pair – which is why the second, third, and fourth rows below share the same command encoding.
tensor([[1.0, 0.0, 0.0, 0.037, 0.000],
[0.0, 0.0, 1.0, 0.128, 0.228],
[0.0, 0.0, 1.0, 0.220, 0.456],
[0.0, 0.0, 1.0, 0.311, 0.684],
# ...
])
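A corresponding sketch for the charstring-style commands is given below; the three-operator vocabulary (moveto, lineto, curveto) and the 1024-unit em are again assumptions chosen to match the example:

import torch

OPERATORS = ["moveto", "lineto", "curveto"]  # assumed vocabulary
UPM = 1024  # assumed em size, as in the previous example

path = [("moveto", [(38, 0)]),
        ("curveto", [(131, 233), (225, 467), (318, 700)]),
        ("curveto", [(411, 467), (505, 233), (598, 0)])]

rows = []
for operator, points in path:
    one_hot = [1.0 if operator == op else 0.0 for op in OPERATORS]
    for x, y in points:  # a curveto expands into one row per coordinate pair
        rows.append(one_hot + [x / UPM, y / UPM])

sequence = torch.tensor(rows)
print(sequence[:4])  # matches the tensor above, up to rounding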
The sequential nature of vector data representation exhibits notable similarities to natural language processing methodologies. Contemporary approaches leverage established architectures from language modelling, including recurrent neural networks (RNNs), variational autoencoders, and transformer models.
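As an illustration of how such sequences could be consumed by one of these architectures, the following sketch passes the five-position elements through a small LSTM; the hidden size is an arbitrary choice, not a value prescribed by the text:

import torch
import torch.nn as nn

# One glyph: a batch of 1 sequence with 7 elements of 5 positions each.
sequence = torch.rand(1, 7, 5)

# A small recurrent encoder; hidden size 64 is an arbitrary choice.
lstm = nn.LSTM(input_size=5, hidden_size=64, batch_first=True)

outputs, (hidden, cell) = lstm(sequence)
print(outputs.shape)  # torch.Size([1, 7, 64]), one state per element
print(hidden.shape)   # torch.Size([1, 1, 64]), final summary of the glyph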