Glossary

This document defines various terms related to text processing and layout.

Advance
Advance width
Advance width refers to the distance from the beginning of one character on a printed display to the beginning of the next character on the display.
Analysis An analysis is a Microsoft term (though not a Microsoft concept) which describes the properties of a run of text. In particular, it is used to store information regarding the bidirectional embedding levels of a run of text. A first step in the text layout pipeline is to take a paragraph and identify runs of like directionality and embedding level. An analysis stores this information.
See http://www.unicode.org/reports/tr9/
Ascent
The ascent of a glyph is the distance from the baseline to the topmost portion of the glyph. The ascent of a font is the distance from the baseline to the topmost portion of any glyph.
See Descent.
See http://en.wikipedia.org/wiki/Typeface for some related typographical concepts.
Baseline
The baseline is the line upon which a string of text is drawn. For horizontal text, it is at the bottom of the glyphs; for vertical text, it goes through the center of glyphs.
Bidirectional
Bidi
Refers to text that when viewed flows in two opposite directions, usually left-to-right and right-to-left.
See http://en.wikipedia.org/wiki/Bi-directional_text
See http://www.unicode.org/reports/tr9/
Break A point within text that is a natural splitting point. There are paragraph breaks which split paragraphs, line breaks which split lines, sentence breaks which split sentences, word breaks which split words, and character breaks which split divisible characters.
Caps height The height of capital letters in a font above the baseline, not including any diacritics that may be above them.
Character A character is an atomic unit of a written language. Characters include letters, Asian ideograms, numerals, punctuation, etc. In computer systems, the term 'character' is somewhat ambiguous.
See http://en.wikipedia.org/wiki/Grapheme
Character set A character set is an assignment of characters to numerical values (e.g. A == 65). ASCII is a character set, as is Chinese Big 5. Unicode specifies a single unified character set with each character called a "code point."
See http://en.wikipedia.org/wiki/Character_encoding
ClearType ClearType is the name of a Microsoft technology used to improve the appearance of text drawn on LCD displays. It works by using the individual red, green, and blue components of an LCD screen pixel as if they were independent pixels.
Cluster A cluster is a contiguous set of characters that combine into a single glyph cell and are considered one indivisible unit. A simple example is the cluster of a and ` together as à. Complex scripts have more complex examples.
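As a minimal sketch of the a plus grave accent example, Python's standard unicodedata module can compose a base letter and a combining mark into one precomposed character. The sketch assumes the accent is U+0300 COMBINING GRAVE ACCENT rather than the spacing ` character; whether a given cluster is actually drawn as one glyph depends on the font and the shaping engine, which this does not model.

```python
import unicodedata

# Two code points: 'a' followed by U+0300 COMBINING GRAVE ACCENT (an assumption
# made for this sketch; the spacing grave accent would not compose).
cluster = "a\u0300"

# NFC normalization composes the pair into the single code point U+00E0.
composed = unicodedata.normalize("NFC", cluster)

print(len(cluster), len(composed))
print(unicodedata.name(composed))
```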
Condensed Type A narrower version of the normal width of a typeface. For example, Arial Narrow is a condensed variation of Arial.
Code page
A code page is for all practical purposes the same as a character set. The term "code page" is Microsoft-specific.
Code point
A Unicode character value. A code point is a specific term used by the Unicode standard which serves to avoid confusion over the difference between what is a character and what is a glyph.
See http://en.wikipedia.org/wiki/ISO_10646
Complex script
A complex script has at least one of the following attributes:
  • Bidirectional character ordering
  • Contextual shaping
  • Combining characters
  • Specialized word-breaking and justification rules
  • Illegal character combinations

Bidirectional rendering refers to the script's ability to handle text that reads both left-to-right and right-to-left. For example, in the bidirectional rendering of Arabic, the default reading direction for text is right-to-left, but for some numbers it is left-to-right. Processing a complex script must account for the difference between the logical (keystroke) order and the displayed order of the glyphs on the screen.

Additionally, processing must properly deal with caret movement and cursor hit testing. The mapping between screen position and a character index requires knowledge of font metrics and the layout algorithm.

Contextual shaping occurs when a script's characters change shape depending on the characters that surround them. This occurs in English cursive writing ("handwriting") when a lowercase "l" changes shape depending on the character that precedes it such as an "a" (connects low to the "l") or an "o" (connects high). Arabic is a script that exhibits contextual shaping.

Combining characters (ligatures) are characters that join into one character when placed together. One example is the "ae" combination in English; it is sometimes represented by a single character.
Arabic is a script that has many combining characters.

Specialized word break and justification refers to scripts that have complex rules for dividing words between lines or justifying text on a line. Thai is such a script.

Filtering out invalid character combinations occurs when a language does not allow certain character combinations. Thai is such a script.
Coverage
Coverage refers to the support a font has for the gamut of characters. While a font can have glyphs for any Unicode code point, it is common that fonts support only certain scripts. Basic fonts have coverage for only Latin scripts, whereas a Chinese font will usually have support for (or coverage of) Chinese script and basic Latin script.
CSS
Cascading Style Sheets
CSS is the standard mechanism for defining styles in the HTML family of page description languages.
See http://www.w3.org/Style/CSS.
Decomposition
Decomposition is the process of breaking a composed or shaped character down into its basic parts, often for the purpose of doing proper searching and comparison. For example, if you search for the digit 1 in a paragraph of text and there is a 1 which is in superscript form, it should be found. Similarly, a superscript 1 should compare as equal to a regular 1 in terms of sorting. Other kinds of composition exist as well, such as the application of diacriticals to base characters.
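The superscript search case can be sketched with Python's standard unicodedata module. The helper name fold is hypothetical, and dropping the combining marks after decomposition is an extra assumption made here for search purposes; decomposition by itself only separates the parts.

```python
import unicodedata

def fold(text: str) -> str:
    """Decompose text (NFKD) and drop combining marks so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Contains U+00B9 SUPERSCRIPT ONE and U+00E9 (e with acute accent).
paragraph = "x\u00b9 + caf\u00e9"

print("1" in paragraph)        # search against the raw text
print("1" in fold(paragraph))  # search against the decomposed text
```

Searching the folded text finds the superscript 1 because NFKD replaces it with the plain digit.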
Descent
The descent of a glyph is the distance between the baseline and the lowest part of a glyph. The descent of a font is the largest distance between the baseline and the bottom of any glyph. For most Western scripts, the descent is a value less than or equal to zero, as the bottoms of such glyphs are usually at the baseline or below it.
See Ascent.
See http://en.wikipedia.org/wiki/Typeface for some related typographical concepts.
Devanagari The script that covers numerous Indic languages such as Hindi, Nepali, and Sanskrit.
Diacritic A diacritic is an accent mark added to a letter. Examples include (but are not limited to) accent, macron, umlaut, breve, caron, circumflex, cedilla, and ogonek.
See http://en.wikipedia.org/wiki/Diacritical
EFIGS A mnemonic for English, French, Italian, German, Spanish.
Em
A relative unit of usually horizontal measurement equal to the type size currently in use. For example, an em in 12-point type is equal to 12 points. Originally derived from the width of the upper-case M.
En A relative unit of usually horizontal measurement equal to half of one em.
Embedding Embedding refers to the inclusion of a run of opposing direction text within a run of text. If you have an English sentence and put some RTL text in the middle of it, the RTL text is embedded within the primary LTR text. If you then embed a run of LTR text within the RTL text then you have an additional level of embedding in place. An embedding level of 0 means LTR, a level of 1 means RTL alone or within LTR 0, a level of 2 means LTR within RTL 1, a level of 3 means RTL within LTR of level 2, etc.
See http://www.unicode.org/reports/tr9/
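The level numbering follows a simple parity rule: even levels are LTR and odd levels are RTL. A minimal sketch, using a hypothetical helper name:

```python
def direction_for_level(level: int) -> str:
    """Even embedding levels are LTR, odd embedding levels are RTL."""
    if level < 0:
        raise ValueError("embedding level must be non-negative")
    return "RTL" if level % 2 else "LTR"

for level in range(4):
    print(level, direction_for_level(level))
```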
Extended Type Typeface with wider body relative to its height.
Family
A family is a class of related fonts. A family is very similar to a typeface, though strictly speaking it is possible to define a family which has multiple typefaces in it. However, it often occurs that a family has a single typeface in it and the two terms become synonymous.
FFS Font Fusion Stroke font. See StrokeFont.
Fixed-Pitch Also known as monospaced. Refers to fonts in which each character has the same standard width. An example is Courier.
Font A font is a particular incarnation of a typeface (e.g. 10 pt courier bold). It is common to confuse fonts with their superset -- typefaces.
gasp gasp (Grid-fitting and Scan-conversion Procedure) refers to information stored in a TrueType font which identifies which font sizes are best for applying hinting and anti-aliasing. A very large font (>= 64 pixels) generally doesn't need to have hinting applied. On the other hand, a small font (~10 pixels) almost always needs hinting in order to look good. TrueType fonts often have a gasp table, which simply says what sizes the font author thinks should have hinting and anti-aliasing applied. The gasp information itself provides no hinting or anti-aliasing functionality.
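A gasp lookup might look like the sketch below. The flag names and values follow the OpenType specification's gasp table (GASP_GRIDFIT and GASP_DOGRAY), but the size ranges are hypothetical values chosen to match the description above, and gasp_flags is an illustrative helper, not part of any library.

```python
GASP_GRIDFIT = 0x0001  # the font author recommends hinting (grid-fitting)
GASP_DOGRAY = 0x0002   # the font author recommends anti-aliasing

# Hypothetical table: (largest pixels-per-em covered, behavior flags),
# sorted by size. 0xFFFF is the sentinel covering all remaining sizes.
EXAMPLE_GASP = [
    (20, GASP_GRIDFIT),
    (63, GASP_GRIDFIT | GASP_DOGRAY),
    (0xFFFF, GASP_DOGRAY),
]

def gasp_flags(ppem, table=EXAMPLE_GASP):
    """Return the flags of the first range whose upper bound covers ppem."""
    for max_ppem, flags in table:
        if ppem <= max_ppem:
            return flags
    return 0  # no range matched: no recommendation

for size in (10, 32, 64):
    flags = gasp_flags(size)
    print(size, bool(flags & GASP_GRIDFIT), bool(flags & GASP_DOGRAY))
```

The lookup only reports a recommendation; the rasterizer still has to perform the hinting and anti-aliasing itself.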
Glyph A glyph is a graphical representation of a character. In terms of computer text processing and display, a glyph may be a combination of multiple code points, as would be the case with separate accents or other diacriticals. There is a distinction between glyphs and characters that is subtle but significant. Characters are what the user types on their keyboard, whereas glyphs are the physical manifestation of such characters. The ratio between characters and glyphs is not necessarily 1:1, though it is often so for simple English text.
See http://en.wikipedia.org/wiki/Glyph
Grapheme A grapheme is the same thing as a character.
See http://en.wikipedia.org/wiki/Grapheme
Hints Hints are instructions stored in some fonts which allow them to draw well on a low-resolution display such as a computer monitor. The font you are reading now is hinted. An unhinted font will display poorly on a computer screen below about 15 pixels in height.
IME (Input Method Editor) An Input Method Editor is a system for entering text into a word processor when the keyboard alone is insufficient for the task. The most common usage of an IME is to enter Asian glyphs on a system with a 101-key keyboard.
Italic Letterform with a pronounced diagonal slant and often some cursiveness as well. This is similar to oblique, but oblique refers to a font that is only slanted. See Oblique.
Itemize
Itemization
Itemization is the process of separating a string of text into runs of like direction and script. Sometimes this includes like font style as well.
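A much simplified sketch of itemization by direction, using the bidirectional classes exposed by Python's standard unicodedata module. The helper names are hypothetical, neutral characters simply join the current run, and script and font style are ignored; a real implementation would follow the full Unicode Bidirectional Algorithm.

```python
import unicodedata

def strong_direction(ch):
    """Return 'LTR' or 'RTL' for strongly directional characters, else None."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "LTR"
    if bidi_class in ("R", "AL"):
        return "RTL"
    return None  # digits, spaces, punctuation, and other non-strong classes

def itemize(text, base="LTR"):
    """Split text into (substring, direction) runs of like direction."""
    runs = []
    current = base
    for ch in text:
        # Simplification: neutral characters inherit the current direction.
        direction = strong_direction(ch) or current
        if runs and runs[-1][1] == direction:
            runs[-1] = (runs[-1][0] + ch, direction)
        else:
            runs.append((ch, direction))
        current = direction
    return runs

# Latin text with three Hebrew letters (alef, bet, gimel) in the middle.
for run in itemize("abc \u05d0\u05d1\u05d2 def"):
    print(run)
```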
Jamo Jamo are the Korean equivalent of letters. They are combined in square two-dimensional patterns of two to four jamo to form what are known as Hangul.
Justification
Justification is the process of fitting horizontal text within its left and right boundaries such that it meets both edges. It is a common mistake to confuse alignment with justification. There is left alignment, but there is no such thing as left justification, as that's like saying something meets both sides on the left side.
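A minimal sketch of justification, assuming monospaced text measured in character columns rather than real glyph advances. The function name justify is hypothetical. Extra space is spread across the gaps between words so that the line meets both edges.

```python
def justify(words, width):
    """Pad the gaps between words so the line spans exactly `width` columns."""
    if len(words) < 2:
        return "".join(words)  # a single word cannot be stretched to both edges
    gaps = len(words) - 1
    extra = width - sum(len(word) for word in words)
    if extra < gaps:
        raise ValueError("words do not fit in the given width")
    base, remainder = divmod(extra, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        # The leftmost gaps absorb any space that does not divide evenly.
        parts.append(word + " " * (base + (1 if i < remainder else 0)))
    parts.append(words[-1])
    return "".join(parts)

line = justify(["text", "meets", "both", "edges"], 24)
print(repr(line), len(line))
```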
Hangul The name of the script used to write the Korean language. There are 11,172 encoded characters in the Hangul Syllables character block, AC00..D7A3. Looks like this: 한류. See http://en.wikipedia.org/wiki/Hangul
Kanji Kanji are Chinese characters used in the modern Japanese writing system alongside Hiragana, Katakana, and Arabic numerals. The Japanese term kanji literally means "Han characters". Looks like this: 榊 辻 働. See http://en.wikipedia.org/wiki/Kanji
Kerning Kerning is the process of altering the space between specific pairs of glyphs. Normally kerning is found only with variably spaced fonts and not with monospaced fonts. A kerning value for a font is an adjustment of the normal advance vector for a given glyph to the next glyph. Thus, a zero value means that there is no adjustment and the glyphs are regularly drawn; a negative value means the glyphs are made closer together than normal, and a positive value means the glyphs are farther apart than normal.
See http://en.wikipedia.org/wiki/Kerning
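The effect of kerning values on glyph placement can be sketched as follows. The advance widths and kerning pairs are hypothetical values in font units, and pen_positions is an illustrative helper rather than any library's API.

```python
# Hypothetical advance widths and kerning pairs, in font units.
ADVANCES = {"A": 600, "V": 600, "T": 550, "o": 500}
KERNING = {("A", "V"): -80, ("T", "o"): -60}

def pen_positions(text):
    """Return the x position of each glyph and the total width of the run."""
    positions = []
    x = 0
    for i, ch in enumerate(text):
        positions.append(x)
        x += ADVANCES[ch]  # normal advance to the next glyph
        if i + 1 < len(text):
            # Zero means no adjustment; negative pulls the pair closer together.
            x += KERNING.get((ch, text[i + 1]), 0)
    return positions, x

print(pen_positions("AVA"))
```

Here only the A, V pair is kerned; the V, A pair has no entry in the hypothetical table, so it uses the normal advance.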
Layout Layout is the process of arranging glyphs from a string of text. Layout of some scripts (e.g. Latin) is relatively simple, whereas the layout of others (e.g. Arabic, Thai) is complex.
Leading
Pronounced “ledding.” Leading refers to extra space that is placed between lines of text to increase the vertical space between lines. Sometimes leading is described as being the vertical space from baseline to baseline of successive lines of text, but this is not correct.
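The difference between leading and the baseline-to-baseline distance can be made concrete with a small calculation. This is a sketch under one common convention, using this glossary's definitions of ascent and descent; the function name and the metric values are hypothetical.

```python
def baseline_to_baseline(ascent, descent, leading):
    """Distance between the baselines of successive lines.

    Uses this glossary's sign convention: descent is <= 0 when glyphs
    extend below the baseline, so it is subtracted.
    """
    return ascent - descent + leading

# Hypothetical font metrics, in pixels.
print(baseline_to_baseline(ascent=12, descent=-4, leading=2))
```

With these values the leading is 2 pixels, while the baseline-to-baseline distance is 18 pixels.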
Letter spacing Extra space inserted between letters in a word. Often used in words set in capitals to improve visual appearance. See Tracking.
Ligature A ligature is a single glyph which is derived from two otherwise independent side-by-side glyphs. The combination of a and e to form æ is an example of a ligature.
See http://en.wikipedia.org/wiki/Ligature_(typography)
LTR
Left to Right
LTR refers to text display directionality. Most scripts use the left to right direction to display their text. See RTL.
Monospaced Also known as fixed-pitch. Refers to fonts in which each character has the same standard width. An example is Courier.
Oblique Letterform with a pronounced diagonal slant but otherwise similar to its non slanted sibling. This is similar to italic, but italic refers to a font that is often additionally stylistically embellished. See Italic.
OpenType
A typographical system that consists of a font specification and a layout assistance specification. OpenType fonts are similar to TrueType fonts but have additional information and can have PostScript outlines. OpenType font files usually use the .otf file extension.
See http://en.wikipedia.org/wiki/OpenType
Outline font An outline font is a font that is primarily defined by filled curves such as Bezier curves. Such fonts are usually scalable; a single outline specification allows for various sizes of rendered glyphs. TrueType, PostScript, and OpenType fonts are examples of outline fonts.
PFR Portable Font Resource, also known as a WebFont or TrueDoc font. It is a compressed TrueType font variation.
Pitch Size designation of monospaced fonts – based on typewriter and line printer technology. Also refers to the measurement of the number of characters to an inch.
Point
Pica
Didot
A point is a unit of measure in typography. There are 3 sorts of points, though most computer based typography uses only the first:
  • PostScript point or computer point (now the universal point in computers): 1 pt = 0.35277 mm = 0.01388889 in = 1/72 in.
  • Didot point (continental European point system): 1 dd = 0.375 mm = 0.014831 in, approx. 1/72 French royal inch (pouce).
  • American printer's point (Anglo-American point system): 1 pp = 0.3514598 mm = 0.013837 in, ca. 1/72 in.
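The PostScript point conversions can be checked with a few lines of arithmetic. This sketch covers only the PostScript (computer) point; the function names are hypothetical, and the 96 dpi display resolution is an assumption used for illustration.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # PostScript (computer) point: 1 pt = 1/72 in

def points_to_inches(pt):
    return pt / POINTS_PER_INCH

def points_to_mm(pt):
    return pt * MM_PER_INCH / POINTS_PER_INCH

def points_to_pixels(pt, dpi):
    """Convert points to device pixels at the given dots per inch."""
    return pt * dpi / POINTS_PER_INCH

print(round(points_to_inches(1), 8))  # inches per point
print(round(points_to_mm(1), 5))      # millimeters per point
print(points_to_pixels(12, dpi=96))   # 12 pt text on an assumed 96 dpi display
```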
PostScript
PostScript font
PostScript is a programming language; it is designed specifically for specifying how to print a page of graphics and/or text. A PostScript font is an outline font specification used by the PostScript language to display text. PostScript font files use the .pfa or .pfb file extension.
See http://en.wikipedia.org/wiki/PostScript
Proportionally spaced fonts Fonts in which each character has its own width, as opposed to monospaced fonts. For example, the “i” in a proportionally spaced font is usually much narrower than the “M.”
Roman Used to distinguish upright letterforms from sloped, oblique or italic. Also used to refer to the upright, normal weight and width variant of a typeface. Infrequently used to refer to serif as opposed to sans serif types. Also refers to types based on traditional Roman letterform proportions and design characteristics.
RTL
Right to Left
RTL refers to text display directionality. Middle-Eastern scripts such as Arabic, Hebrew, and Farsi are right to left. See LTR.
Sans serif type Fonts without horizontal strokes at the top and bottom of characters. Sans serif typefaces include Arial and Helvetica.
SBIT Refers to Scalar Bitmaps, which are bitmapped (1 bit per pixel) versions of glyphs embedded into TrueType fonts. SBIT glyphs are an alternative to hinting as a means of producing sharp glyphs at small sizes.
Script A script is a writing system, such as Latin, Arabic, Hangul. Western languages such as English, French, and Spanish all fall under the category of Latin script.
See http://en.wikipedia.org/wiki/Writing_system
Serif type Fonts with a horizontal stroke projecting from the main strokes of a character. Some popular serif typefaces are Times Roman and Century Schoolbook.
Shaping Shaping is the process of converting a string of keyboard derived characters into a string of glyphs properly designed for display. This may include adding, removing, and changing glyphs from the original string. For example, a two character string of a` may be shaped into à. Similarly, Arabic text shaping causes characters to change representation depending on their position within a word (beginning, middle, end, or alone).
Small capitals
small caps
Capital letters with the same optical height and proportions as the lower case letters of the same typeface. In PostScript and TrueType formats, small caps are often contained in separate small caps fonts and are designated with an SC after the font name. They contain the normal capitals and small caps instead of the lower case letters. Small caps have their own forms and proportions and, unlike false small caps, are not simply the capital letters at a reduced size. False small caps are recognizable by stroke thickness which seems too thin – the weights are scaled when the size is reduced – and characters which appear too light in comparison to the lower case letters. OpenType has no need for separate small caps fonts, because the small caps can be integrated in the normal font.
Stretch Defines a font's horizontal extent type, such as condensed, normal, expanded, ultra expanded, etc.
Stroke font
A stroke font is a font that is primarily defined by line strokes. East Asian kanji glyphs conceptually are stroked glyphs and sometimes draw better and compress better when represented as strokes instead of outlines. Such fonts are usually scalable; a single stroke specification allows for various sizes of rendered glyphs. There is no major standard for the computerized implementation of stroked fonts, but sometimes stroked fonts are embedded into the TrueType or OpenType file format.
Style For fonts, this refers to normal, italic, or oblique. It does not refer to weight (light, medium, bold, black, heavy), nor does it refer to variant (small caps or regular caps), nor does it refer to stretch (condensed, expanded). See Weight, Variant, and Stretch.

For HTML, style refers to the CSS (cascading style sheet) style, which is a broad description of both a font and how text is laid out in that font. See CSS.

Subscript Small character which is set at or slightly below the baseline.
Superscript Small character which is set above the x-height.
Title case

Unicode defines three kinds of case mapping: lowercase, uppercase, and titlecase. The difference between uppercasing and titlecasing a character or character sequence can be seen in compound characters (that is, a single character that represents a compound of two characters).

Some letters in some languages are compound letters. For example, in Unicode, character U+01F3 is LATIN SMALL LETTER DZ; it is a single Unicode character, but it looks like two characters. Much like with regular characters, it looks like dz when in lower case, looks like Dz when it begins a capitalized word, and looks like DZ when it is upper case or in an all-caps word. But since this is a single Unicode character, it needs to be represented with three Unicode code points and not two. These three points are called the lower case, title case, and upper case versions of the character. For characters like this, the title case version of it (Dz) is not the same as the upper case version of it (DZ).
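The three case forms of the DZ character can be inspected with Python's built-in string methods and the standard unicodedata module. This is only an illustration of the mappings described above.

```python
import unicodedata

dz = "\u01f3"  # LATIN SMALL LETTER DZ

forms = {
    "lower": dz.lower(),
    "title": dz.title(),
    "upper": dz.upper(),
}

for label, ch in forms.items():
    # Each form is a single code point with its own Unicode name.
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The title case form is a distinct code point from both the lower case and upper case forms.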

Tracking Altering the spacing between characters of a word or words. Tracking is distinct from kerning, which alters the spacing between character pairs only. There is no standard definition for tracking, though usually it is implemented as some sort of percentage-based horizontal stretching of the advance widths between characters. See Letter Spacing.
TrueType TrueType is an outline font specification. TrueType font files use the .ttf or .ttc file extension.
See http://en.wikipedia.org/wiki/TrueType
Typeface A typeface is a typographical character family classification (e.g. Courier). A typeface is very similar to a family, though strictly speaking it is a subset of a family. However, typeface and family are often used somewhat interchangeably because usually a typeface and its family are the same thing (i.e. the typeface is the only one in its family and thus the typeface name essentially becomes its family name as well).
See http://en.wikipedia.org/wiki/Typeface
See http://en.wikipedia.org/wiki/Typography
Typesetting Typesetting is the process of laying out text on a page, taking into account styles, line wrapping, leading, flow around graphics, underlining, etc.
Unicode Unicode is a character set that attempts to encompass all known written languages. Additionally Unicode defines a number of standardized conventions for the basic computational processing of text from these languages.
See http://en.wikipedia.org/wiki/Unicode
See http://en.wikipedia.org/wiki/ISO_10646
UCS
Universal Character Set
UCS is simply a way to directly map a Unicode code point to a numerical integer type such as C++'s short or int. Usually when we think of wide Unicode strings, we are talking about UCS2 (two byte representation). UTF16 is nearly the same as UCS2. The C wchar_t data type usually refers to UCS2 or UCS4, depending on the compiler and/or platform.
See http://en.wikipedia.org/wiki/ISO_10646
UTF
Unicode Transformation Format.
UTF is simply a way to assign the full set of Unicode code points to various byte or multi-byte encodings. Examples include UTF8 and UTF16. UTF is different from UCS in that UTF is an encoding that is more complicated because it doesn't treat the values as numbers but as combinatorial sequences. The purpose of UTF is to compress code point representations.
See http://en.wikipedia.org/wiki/UTF-8
See http://en.wikipedia.org/wiki/UTF-16
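The size differences between the encodings can be seen by encoding a few sample characters with Python's built-in codecs. The sample characters are arbitrary choices for illustration, and encoded_sizes is a hypothetical helper.

```python
def encoded_sizes(ch):
    """Return the number of bytes ch occupies in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}

samples = {
    "LATIN CAPITAL LETTER A": "A",
    "HANGUL SYLLABLE HAN": "\ud55c",
    "MUSICAL SYMBOL G CLEF": "\U0001d11e",  # outside the 16-bit range
}

for name, ch in samples.items():
    print(f"U+{ord(ch):04X}", name, encoded_sizes(ch))
```

The last sample lies beyond the 16-bit range, so UTF16 represents it with two 16-bit units, which a fixed two-byte UCS2 representation cannot do.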
Weight The thickness or relative amount of blackness of the main strokes of characters. Typical weights are regular, light, medium, demi bold, bold, etc.
XHTML XHTML is essentially the successor to HTML 4.
See http://www.w3.org/TR/xhtml1
x-height
The height of short lower-case glyphs such as 'x' in Latin fonts.
See http://en.wikipedia.org/wiki/Typeface for some related typographical concepts.

 
