This document provides definitions of various terms related to text processing and layout.
| Advance Advance width |
Advance width refers to the distance from
the beginning of one character on a printed display to the beginning of
the next character on the display. |
| Analysis | An analysis is a Microsoft term (though
not a Microsoft concept) which describes the properties of a run of
text. In particular, it is used to store information regarding the
bidirectional embedding levels of a run of text. A first step in the
text layout pipeline is to take a paragraph and identify runs of like
directionality and embedding level. An analysis stores this information. See http://www.unicode.org/reports/tr9/ |
| Ascent |
The ascent of a glyph is the distance from the baseline to the
topmost portion of the glyph. The ascent of a font is the distance from
the baseline to the topmost portion of any glyph. See Descent. See http://en.wikipedia.org/wiki/Typeface for some related typographical concepts. |
| Baseline |
The baseline is the line upon which a string of text is drawn. For
horizontal text, it is at the bottom of the glyphs; for vertical text,
it goes through the center of glyphs. |
| Bidirectional Bidi |
Refers to text that when viewed flows in
two opposite directions, usually left-to-right and right-to-left. See http://en.wikipedia.org/wiki/Bi-directional_text See http://www.unicode.org/reports/tr9/ |
| Break | A point within text that is a natural splitting point. There are paragraph breaks which split paragraphs, line breaks which split lines, sentence breaks which split sentences, word breaks which split words, and character breaks which split divisible characters. |
| Caps height | The height of capital letters in a font above the baseline, not including any diacritics that may be above them. |
| Character | A character is an atomic unit of a
written language. Consists of letters, Asian ideograms, numerals,
punctuation, etc. In computer systems, the term 'character' is somewhat
ambiguous. See http://en.wikipedia.org/wiki/Grapheme |
| Character set | A character set is an assignment of
characters to numerical values (e.g. A == 65). ASCII is a character set, as
is Chinese Big 5. Unicode specifies a single unified character set with
each character called a "code point." See http://en.wikipedia.org/wiki/Character_encoding |
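The mapping between characters and numbers can be seen directly in most programming languages. The following is a minimal Python sketch; the sample character 中 and the variable names are illustrative choices, not part of the glossary.

```python
# A character set assigns a number to each character, as in "A == 65".
code_for_a = ord("A")   # numeric value of 'A' in ASCII and Unicode
char_for_65 = chr(65)   # the reverse mapping

# The same character is stored as different byte values depending on
# which character set is used to encode it.
sample = "\u4e2d"  # the Chinese character 中, chosen only as an illustration
big5_bytes = sample.encode("big5")    # Chinese Big 5
utf8_bytes = sample.encode("utf-8")   # a Unicode encoding

print(code_for_a, char_for_65, big5_bytes.hex(), utf8_bytes.hex())
```

Decoding the bytes with the wrong character set produces different characters or an error, which is why text must be labeled with the character set it uses.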
| ClearType | ClearType is the name of a Microsoft technology used to improve the appearance of text drawn on LCD displays. It works by using the individual red, green, and blue components of an LCD screen pixel as if they were independent pixels. |
| Cluster | A cluster is a contiguous set of characters that combine together in a single glyph cell and are considered one indivisible unit. A simple example is the cluster of a and ` together as à. Complex scripts have more complex examples. |
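As a rough illustration of clustering, the sketch below attaches each combining mark to the character before it. It assumes the accent is stored as a Unicode combining character (U+0300) rather than the ASCII backquote, and the function name `simple_clusters` is hypothetical. Real cluster segmentation for complex scripts involves considerably more rules than this.

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    clusters: list[str] = []
    for ch in text:
        # unicodedata.combining() returns a nonzero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# 'a' followed by COMBINING GRAVE ACCENT forms one cluster, 'b' another.
print(simple_clusters("a\u0300b"))
```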
| Condensed Type | A narrower version of the normal width of a typeface. For example, Arial Narrow is a condensed variation of Arial. |
| Code page |
A code page is for all practical purposes the same as a character set.
The term "code page" is Microsoft-specific. |
| Code point |
A Unicode character value. A code point is a specific term used by the
Unicode standard which serves to avoid confusion over the difference
between what is a character and what is a glyph. See http://en.wikipedia.org/wiki/ISO_10646 |
| Complex script |
A complex script has at least one of the following attributes:
Bidirectional rendering refers to the script's ability to handle text that reads both left-to-right and right-to-left. For example, in the bidirectional rendering of Arabic, the default reading direction for text is right-to-left, but for some numbers it is left-to-right. Processing a complex script must account for the difference between the logical (keystroke) order and the displayed order of the glyphs on the screen. Additionally, processing must properly deal with caret movement and cursor hit testing. The mapping between screen position and a character index requires knowledge of font metrics and the layout algorithm.
Contextual shaping occurs when a script's characters change shape depending on the characters that surround them. This occurs in English cursive writing ("handwriting") when a lowercase "l" changes shape depending on the character that precedes it such as an "a" (connects low to the "l") or an "o" (connects high). Arabic is a script that exhibits contextual shaping.
Combining characters (ligatures) are characters that join into one character when placed together. One example is the "ae" combination in English; it is sometimes represented by a single character. Arabic is a script that has many combining characters.
Specialized word break and justification refers to scripts that have complex rules for dividing words between lines or justifying text on a line. Thai is such a script.
Filtering out invalid character combinations occurs when a language does not allow certain character combinations. Thai is such a script. |
| Coverage |
Coverage refers to the support a font has for the gamut of characters.
While a font can have glyphs for any Unicode code point, it is common
that fonts support only certain scripts. Basic fonts have coverage for
only Latin scripts, whereas a Chinese font will usually have support for
(or coverage of) Chinese script and basic Latin script. |
| CSS Cascading Style Sheets |
CSS is the standard mechanism for defining styles in the HTML family
of page description languages. See http://www.w3.org/Style/CSS. |
| Decomposition |
Decomposition is the process of breaking a composed or shaped
character down into its basic parts, often for the purpose of doing
proper searching and comparison. For example, if you search for the
digit 1 in a paragraph of text and there is a 1 which is in superscript
form, it should be found. Similarly, a superscript 1 should compare as
equal to a regular 1 in terms of sorting. Other kinds of composition
exist as well, such as the application of diacriticals to base
characters. |
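The search and comparison cases described for decomposition can be approximated with Unicode normalization. This is a minimal sketch using Python's standard `unicodedata` module; the helper name `fold_for_search` is hypothetical, and a real search implementation would likely apply further folding such as case folding.

```python
import unicodedata


def fold_for_search(text: str) -> str:
    """Compatibility-decompose text (NFKD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


superscript_one = "\u00b9"  # SUPERSCRIPT ONE
a_grave = "\u00e0"          # LATIN SMALL LETTER A WITH GRAVE

# The superscript digit decomposes to a plain '1', so a search for "1" finds it.
print("1" in fold_for_search("x" + superscript_one))

# Canonical decomposition (NFD) splits the letter into base plus diacritic.
print([hex(ord(ch)) for ch in unicodedata.normalize("NFD", a_grave)])
```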
| Descent |
The descent of a glyph is the distance between the baseline and the
lowest part of a glyph. The descent of a font is the largest distance
between the baseline and the bottom of any glyph. For most Western
scripts, the descent is a value less than or equal to zero, as the bottoms of
such glyphs are usually at the baseline or below it. See Ascent. See http://en.wikipedia.org/wiki/Typeface for some related typographical concepts. |
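The relationship between glyph metrics and font metrics can be sketched as follows. The sketch assumes the sign convention used in the Descent entry (positive values above the baseline, zero or negative below it); the `GlyphExtents` type and the sample numbers are hypothetical and not taken from any real font.

```python
from dataclasses import dataclass


@dataclass
class GlyphExtents:
    top: float     # topmost point relative to the baseline (positive is up)
    bottom: float  # lowest point relative to the baseline (<= 0 when at or below it)


def font_ascent(glyphs: dict[str, GlyphExtents]) -> float:
    """Font ascent: the topmost extent of any glyph."""
    return max(g.top for g in glyphs.values())


def font_descent(glyphs: dict[str, GlyphExtents]) -> float:
    """Font descent: the lowest extent of any glyph (a negative value here)."""
    return min(g.bottom for g in glyphs.values())


# Hypothetical extents in font units.
glyphs = {
    "x": GlyphExtents(top=500, bottom=0),
    "g": GlyphExtents(top=500, bottom=-200),
    "h": GlyphExtents(top=700, bottom=0),
}
print(font_ascent(glyphs), font_descent(glyphs))
```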
| Devanagari | The script that covers numerous Indic languages such as Hindi, Nepali, and Sanskrit. |
| Diacritic | A diacritic is an accent mark added to a letter. Examples include (but
are not limited to) accent, macron, umlaut, breve, caron, circumflex,
cedilla, and ogonek. See http://en.wikipedia.org/wiki/Diacritical |
| EFIGS | A mnemonic for English, French, Italian, German, Spanish. |
| Em |
A relative unit of usually horizontal measurement equal to the type size currently in use. For example, an em in 12-point type is equal to 12 points. Originally derived from the width of the upper-case M. |
| En | A relative unit of usually horizontal measurement equal to half of one em. |
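Because the em is relative to the type size, font measurements are commonly stored as fractions of an em and scaled at display time. The sketch below assumes a font whose metrics are expressed in design units with a fixed number of units per em; the value 2048 and the function name `units_to_points` are illustrative assumptions.

```python
def units_to_points(units: float, units_per_em: int, point_size: float) -> float:
    """Scale a measurement in font design units to points at a given type size."""
    return units * point_size / units_per_em


UNITS_PER_EM = 2048  # assumed design grid, for illustration only

# One em in 12-point type is 12 points; an en is half of that.
print(units_to_points(UNITS_PER_EM, UNITS_PER_EM, 12))
print(units_to_points(UNITS_PER_EM / 2, UNITS_PER_EM, 12))
```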
| Embedding | Embedding refers to the inclusion of a run of opposing-direction text within a run of text. If you have an English sentence and put some RTL text in the middle of it, the RTL text is embedded within the primary LTR text. If you then embed a run of LTR text within the RTL text, then you have an additional level of embedding in place. An embedding level of 0 means LTR, a level of 1 means RTL alone or within LTR of level 0, a level of 2 means LTR within RTL of level 1, a level of 3 means RTL within LTR of level 2, etc. See http://www.unicode.org/reports/tr9/ |
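The level scheme above means that even levels are LTR and odd levels are RTL. The following sketch shows that rule and how a sequence of per-character levels might be grouped into runs. It assumes the levels have already been computed by a bidirectional algorithm; the function names are hypothetical.

```python
from itertools import groupby


def direction_of_level(level: int) -> str:
    """Even embedding levels are left-to-right, odd levels are right-to-left."""
    if level < 0:
        raise ValueError("embedding levels are non-negative")
    return "LTR" if level % 2 == 0 else "RTL"


def level_runs(levels: list[int]) -> list[tuple[int, int, int, str]]:
    """Group per-character levels into (start, end, level, direction) runs."""
    runs = []
    start = 0
    for level, group in groupby(levels):
        length = len(list(group))
        runs.append((start, start + length, level, direction_of_level(level)))
        start += length
    return runs


# LTR text containing an RTL run, which in turn contains one LTR character.
print(level_runs([0, 0, 1, 1, 2, 1, 0]))
```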
| Extended Type | Typeface with wider body relative to its height. |
| Family |
A family is a class of related fonts. A family is very
similar to a typeface, though strictly speaking it is possible to
define a family which has multiple typefaces in it. However, it often
occurs that a family has a single typeface in it and the two terms
become synonymous. |
| FFS | Font Fusion Stroke font. See StrokeFont. |
| Fixed-Pitch | Also known as monospaced. Refers to fonts in which each character has the same standard width. An example is Courier. |
| Font | A font is a particular incarnation of a typeface (e.g. 10 pt Courier Bold). It is common to confuse fonts with their superset, typefaces. |
| gasp | gasp (Grid-fitting and Scan-conversion Procedure) refers to information stored in a TrueType font which identifies which font sizes are best for applying hinting and anti-aliasing. A very large font (>= 64 pixels) generally doesn't need to have hinting applied. On the other hand, a small font (~10 pixels) almost always needs hinting in order to look good. TrueType fonts often have a gasp table, which simply says what sizes the font author thinks should have hinting and anti-aliasing applied. The gasp information itself provides no hinting or anti-aliasing functionality. |
| Glyph | A glyph is a graphical
representation of a character. In terms of computer text processing and
display, a glyph may be a combination of multiple code points, as would
be the case with separate accents or other
diacriticals. There is a distinction between glyphs and characters that
is subtle but significant. Characters are what the user types on their
keyboard, whereas glyphs are the physical manifestation of such
characters. The ratio between characters and glyphs is not necessarily
1:1, though it is often so for simple English text. See http://en.wikipedia.org/wiki/Glyph |
| Grapheme | A grapheme is the same thing as a character. See http://en.wikipedia.org/wiki/Grapheme |
| Hints | Hints are instructions stored in some fonts which allow them to draw well on a low-resolution display such as a computer monitor. The font you are reading now is hinted. An unhinted font will display poorly on a computer screen below about 15 pixels in height. |
| IME (Input Method Editing) | Input Method Editing refers to a system for entering text into a word processor when the keyboard alone is insufficient for the task. The most common usage of IME is to enter Asian glyphs on a system with a 101-key keyboard. |
| Italic | Letterform with a pronounced diagonal slant and often some cursiveness as well. This is similar to oblique, but oblique refers to a font that is only slanted. See Oblique. |
| Itemize Itemization |
Itemization is the process of separating a string of text into runs of like direction and script. Sometimes this includes like font style as well. |
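A heavily simplified itemization by direction can be sketched with the bidirectional classes exposed by Python's `unicodedata` module. The sketch attaches neutral characters such as spaces to the preceding run, which is an assumption made for brevity and not what the Unicode bidirectional algorithm specifies; it also ignores script and font style. The function names are hypothetical.

```python
import unicodedata


def strong_direction(ch: str):
    """Return "LTR" or "RTL" for strongly directional characters, else None."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "LTR"
    if bidi in ("R", "AL"):
        return "RTL"
    return None


def itemize_by_direction(text: str, base: str = "LTR") -> list[tuple[str, str]]:
    """Split text into (substring, direction) runs of like direction."""
    runs = []
    current = base
    start = 0
    for i, ch in enumerate(text):
        direction = strong_direction(ch)
        if direction is not None and direction != current:
            if i > start:
                runs.append((text[start:i], current))
            start = i
            current = direction
    if start < len(text):
        runs.append((text[start:], current))
    return runs


# Latin text with an embedded Hebrew word.
print(itemize_by_direction("abc \u05d0\u05d1\u05d2 def"))
```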
| Jamo | Jamo are the Korean equivalent of letters. They are combined in square two-dimensional patterns of two to four jamo to form what are known as Hangul. |
| Justification |
Justification is the process of fitting horizontal text within its left and right boundaries such that it meets both edges. It is a common mistake to confuse alignment with justification. There is left alignment, but there is no such thing as left justification, as that's like saying something meets both sides on the left side. |
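One simple way to make a line meet both edges is to spread the leftover width evenly over the gaps between words. The sketch below assumes that approach; scripts with specialized justification rules need something more involved. The function name and the sample widths are hypothetical.

```python
def justified_gaps(word_widths: list[float], space_width: float,
                   line_width: float) -> list[float]:
    """Return the width of each inter-word gap so the line fills line_width."""
    gap_count = len(word_widths) - 1
    if gap_count <= 0:
        return []  # a single word cannot be justified by widening gaps
    natural = sum(word_widths) + gap_count * space_width
    if natural >= line_width:
        return [space_width] * gap_count  # no room to stretch
    extra = (line_width - natural) / gap_count
    return [space_width + extra] * gap_count


# Three words of widths 30, 20, and 40 with 5-unit spaces on a 120-unit line.
gaps = justified_gaps([30, 20, 40], space_width=5, line_width=120)
print(gaps, sum([30, 20, 40]) + sum(gaps))
```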
| Hangul | The name of the script used to write the Korean language. There are 11,172 encoded characters in the Hangul Syllables character block, AC00..D7A3. Looks like this: 한류. See http://en.wikipedia.org/wiki/Hangul |
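The figure of 11,172 syllables follows from the arithmetic the Unicode Standard uses to arrange the block: 19 leading consonants times 21 vowels times 28 trailing positions (27 trailing consonants plus "none"). The sketch below applies that arithmetic; the function names are hypothetical, and the indices are positions in the standard's jamo ordering.

```python
S_BASE = 0xAC00                          # first code point of the Hangul Syllables block
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # leading, vowel, and trailing counts


def compose_syllable(lead: int, vowel: int, trail: int = 0) -> str:
    """Compose a Hangul syllable from jamo indices (trail 0 means no final)."""
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT and 0 <= trail < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + trail)


def decompose_syllable(syllable: str) -> tuple[int, int, int]:
    """Recover the (lead, vowel, trail) indices of a Hangul syllable."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, trail = divmod(rest, T_COUNT)
    return lead, vowel, trail


print(L_COUNT * V_COUNT * T_COUNT)
print(decompose_syllable("\ud55c"))  # the first syllable of the sample text above
```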
| Kanji | Kanji are Chinese characters used in the modern Japanese writing system alongside Hiragana, Katakana, and Arabic numerals. The Japanese term kanji literally means "Han characters". Looks like this: 榊 辻 働. See http://en.wikipedia.org/wiki/Kanji |
| Kerning | Kerning is the process of altering
the space between specific pairs of glyphs. Normally kerning is found
only with variably spaced fonts and not with monospaced fonts. A
kerning value for a font is an adjustment of the normal advance vector
for a given glyph to the next glyph. Thus, a zero value means that
there is no adjustment and the glyphs are regularly drawn; a negative
value means the glyphs are made closer together than normal, and a
positive value means the glyphs are farther apart than normal. See http://en.wikipedia.org/wiki/Kerning |
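The effect of kerning values on glyph placement can be sketched as an adjustment added to the pen position between specific pairs. The advance widths and kerning pairs below are made-up numbers for illustration, not metrics from a real font, and the function name is hypothetical.

```python
# Hypothetical metrics in font units.
ADVANCE = {"A": 600, "V": 600, "o": 500}
KERNING = {("A", "V"): -80, ("V", "A"): -80}  # negative: pair is drawn closer


def pen_positions(text: str) -> tuple[list[int], int]:
    """Return the x position of each glyph and the total width of the string."""
    positions = []
    x = 0
    for i, ch in enumerate(text):
        if i > 0:
            # A missing pair means a kerning value of zero (no adjustment).
            x += KERNING.get((text[i - 1], ch), 0)
        positions.append(x)
        x += ADVANCE[ch]
    return positions, x


print(pen_positions("AVA"))
print(pen_positions("Ao"))
```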
| Layout | Layout is the process of arranging glyphs from a string of text. Layout of some scripts (e.g. Latin) is relatively simple, whereas the layout of others (e.g. Arabic, Thai) is complex. |
| Leading |
Pronounced “ledding.” Leading refers to extra space that is placed between lines of text to increase the vertical space between lines. Sometimes leading is described as being the vertical space from baseline to baseline of successive lines of text, but this is not correct. |
| Letter spacing | Extra space inserted between letters in a word. Often used in words set in capitals to improve visual appearance. See Tracking. |
| Ligature | A ligature is a single
glyph which is derived from two otherwise independent side-by-side
glyphs. The combination of a and e to form æ is an example of
a ligature. See http://en.wikipedia.org/wiki/Ligature_(typography) |
| LTR Left to Right |
LTR refers to text display directionality. Most scripts use the left to right direction to display their text. See RTL. |
| Monospaced | Also known as fixed-pitch. Refers to fonts in which each character has the same standard width. An example is Courier. |
| Oblique | Letterform with a pronounced diagonal slant but otherwise similar to its non slanted sibling. This is similar to italic, but italic refers to a font that is often additionally stylistically embellished. See Italic. |
| OpenType |
A typographical system that consists of a font specification and a layout assistance specification. OpenType fonts are similar to TrueType fonts but have additional information and can have PostScript outlines. OpenType font files usually use the .otf file extension. See http://en.wikipedia.org/wiki/OpenType |
| Outline font | An outline font is a font that is primarily defined by filled curves such as Bezier curves. Such fonts are usually scalable; a single outline specification allows for various sizes of rendered glyphs. TrueType, PostScript, and OpenType fonts are examples of outline fonts. |
| PFR | Portable Font Resource, also known as a WebFont or TrueDoc font. It is a compressed TrueType font variation. |
| Pitch | Size designation of monospaced fonts – based on typewriter and line printer technology. Also refers to the measurement of the number of characters to an inch. |
| Point Pica Didot |
A point is a unit of measure in typography. There are 3 sorts of
points, though most computer based typography uses only the first:
|
| PostScript PostScript font |
PostScript is a programming language; it is designed
specifically for specifying how to print a page of graphics and/or
text. A PostScript font is an outline font
specification used by the PostScript language to display text.
PostScript font files use the .pfa or
.pfb file extension. See http://en.wikipedia.org/wiki/PostScript |
| Proportional spaced fonts | Fonts in which each character has its own width, as opposed to monospaced fonts. For example, the “i” in a proportionally spaced font is usually much narrower than the “M.” |
| Roman | Used to distinguish upright letterforms from sloped, oblique or italic. Also used to refer to the upright, normal weight and width variant of a typeface. Infrequently used to refer to serif as opposed to sans serif types. Also refers to types based on traditional Roman letterform proportions and design characteristics. |
| RTL Right to Left |
RTL refers to text display directionality. Middle-Eastern scripts such as Arabic, Hebrew, and Farsi are right to left. See LTR. |
| Sans serif type | Fonts without horizontal strokes at the top and bottom of characters. Sans serif typefaces include Arial and Helvetica. |
| SBIT | Refers to Scalar Bitmaps, which are bitmapped (1 bit per pixel) versions of glyphs embedded into TrueType fonts. SBIT glyphs are an alternative to hinting as a means of producing sharp glyphs at small sizes. |
| Script | A script is a writing system, such
as Latin, Arabic, Hangul. Western languages
such as English, French, and Spanish all fall under the category of
Latin
script. See http://en.wikipedia.org/wiki/Writing_system |
| Serif type | Fonts with a horizontal stroke projecting from the main strokes of a character. Some popular serif typefaces are Times Roman and Century Schoolbook. |
| Shaping | Shaping is the process of converting a string of keyboard derived characters into a string of glyphs properly designed for display. This may include adding, removing, and changing glyphs from the original string. For example, a two character string of a` may be shaped into à. Similarly, Arabic text shaping causes characters to change representation depending on their position within a word (beginning, middle, end, or alone). |
| Small capitals small caps |
Capital letters with the same optical height and proportions as the lower case letters of the same typeface. In PostScript and TrueType formats, small caps are often contained in separate small caps fonts and are designated with an SC after the font name. They contain the normal capitals and small caps instead of the lower case letters. Small caps have their own forms and proportions and, unlike false small caps, are not simply the capital letters at a reduced size. False small caps are recognizable by stroke thickness which seems too thin (the weights are scaled when the size is reduced) and by characters which appear too light in comparison to the lower case letters. OpenType has no need for separate small caps fonts, because the small caps can be integrated in the normal font. |
| Stretch | Defines a font's horizontal extent type, such as condensed, normal, expanded, ultra expanded, etc. |
| Stroke font |
A stroke font is a font that is primarily defined by line
strokes. East Asian kanji glyphs conceptually are stroked glyphs and
sometimes draw better and compress better when represented as strokes
instead of outlines. Such fonts are usually scalable; a single outline
specification allows for various sizes of rendered glyphs. There is no
major standard for the computerized implementation of stroked fonts,
but sometimes stroked fonts are embedded into the TrueType or OpenType
file format. |
| Style | For fonts, this refers to normal, italic, or oblique. It does not
refer to weight (light, medium, bold, black, heavy), nor does it refer
to variant (small caps or regular caps), nor does it refer to stretch
(condensed, expanded). See Weight, Variant, and Stretch. For HTML, style refers to the CSS (cascading style sheet) style, which is a broad description of both a font and how text is laid out in that font. See CSS. |
| Subscript | Small character which appears at or below the baseline. |
| Superscript | Small character which appears above the x-height. |
| Title case | Unicode defines three kinds of case mapping: lowercase, uppercase, and titlecase. The difference between uppercasing and titlecasing a character or character sequence can be seen in compound characters (that is, a single character that represents a compound of two characters). Some letters in some languages are compound letters. For example, in Unicode, character U+01F3 is LATIN SMALL LETTER DZ; it is a single Unicode character, but it looks like two characters. Much like with regular characters, it looks like dz when in lower case, looks like Dz when it begins a capitalized word, and looks like DZ when it is upper case or in an all-caps word. But since this is a single Unicode character, it needs to be represented with three Unicode code points and not two. These three points are called the lower case, title case, and upper case versions of the character. For characters like this, the title case version of it (Dz) is not the same as the upper case version of it (DZ). |
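The three case forms of the DZ character can be observed with Python's built-in string methods, which apply the Unicode case mappings. This is a small sketch for illustration; the variable names are arbitrary.

```python
import unicodedata

dz_lower = "\u01f3"            # LATIN SMALL LETTER DZ
dz_title = dz_lower.title()    # title case mapping
dz_upper = dz_lower.upper()    # upper case mapping

for ch in (dz_lower, dz_title, dz_upper):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The title case and upper case versions are different code points.
print(dz_title != dz_upper)
```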
| Tracking | Altering the spacing between characters of a word or words. Tracking is distinct from kerning, which alters the spacing between character pairs only. There is no standard definition for tracking, though usually it is implemented as some sort of percentage-based horizontal stretching of the advance widths between characters. See Letter Spacing. |
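Since tracking has no standard definition, the following is only one possible interpretation: a percentage applied uniformly to every advance width, in contrast to kerning's pair-specific adjustments. The function name and sample advances are hypothetical.

```python
def apply_tracking(advances: list[int], percent: int) -> list[float]:
    """Scale every advance width by the given percentage (may be negative)."""
    return [advance * (100 + percent) / 100 for advance in advances]


# Loosen by 10 percent, then tighten by 5 percent.
print(apply_tracking([600, 500], 10))
print(apply_tracking([600, 500], -5))
```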
| TrueType | TrueType is an outline font
specification. TrueType font files use the .ttf or
.ttc file extension. See http://en.wikipedia.org/wiki/TrueType |
| Typeface | A typeface is a typographical character
family classification (e.g. Courier). A typeface is very similar to a
family, though strictly speaking it is a subset of a family. However,
typeface and family are often used somewhat interchangeably because
usually a typeface and its family are the same thing (i.e. the typeface
is the only one in its family and thus the typeface name essentially
becomes its family name as well). See http://en.wikipedia.org/wiki/Typeface See http://en.wikipedia.org/wiki/Typography |
| Typesetting | Typesetting is the process of laying out text on a page, taking into account styles, line wrapping, leading, flow around graphics, underlining, etc. |
| Unicode | Unicode is a character set that
attempts to encompass all known written
languages. Additionally Unicode defines a number of standardized
conventions for the basic computational processing of text from these
languages. See http://en.wikipedia.org/wiki/Unicode See http://en.wikipedia.org/wiki/ISO_10646 |
| UCS Universal Character Set |
UCS is simply a way to directly map a Unicode code point to a numerical integer type such as C++'s short or int. Usually when we think of wide Unicode strings, we are talking about UCS2 (two-byte representation). UTF16 is nearly the same as UCS2, differing in that it uses pairs of 16-bit values (surrogates) to represent code points above U+FFFF. The C wchar_t data type usually refers to UCS2 or UCS4, depending on the compiler and/or platform. See http://en.wikipedia.org/wiki/ISO_10646 |
| UTF Unicode Transformation Format |
UTF is simply a way to assign the full set of Unicode code points to various byte or multi-byte encodings. Examples include UTF8 and UTF16. UTF is different from UCS in that UTF is a more complicated encoding: it doesn't treat the values as plain numbers but as combinatorial sequences. The purpose of UTF is to compress code point representations. See http://en.wikipedia.org/wiki/UTF-8 See http://en.wikipedia.org/wiki/UTF-16 |
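The difference between a code point's numeric value and its encoded forms can be seen by encoding a few characters. The sketch below uses Python's built-in codecs; the helper name `encoded_sizes` and the sample characters are illustrative choices.

```python
def encoded_sizes(ch: str) -> dict[str, int]:
    """Report a character's code point and its size in bytes per encoding."""
    return {
        "code_point": ord(ch),
        "utf8": len(ch.encode("utf-8")),
        "utf16": len(ch.encode("utf-16-le")),
        "utf32": len(ch.encode("utf-32-le")),  # fixed four bytes, like UCS4
    }


for sample in ("A", "\ud55c", "\U0001F600"):
    print(encoded_sizes(sample))
```

The last sample lies above U+FFFF, so UTF16 needs two 16-bit units for it, which is the case a plain two-byte UCS2 value cannot represent.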
| Weight | The thickness or relative amount of blackness of the main strokes of characters. Typical weights are regular, light, medium, demi bold, bold, etc. |
| XHTML | XHTML is essentially the successor to HTML 4. See http://www.w3.org/TR/xhtml1 |
| x-height |
The height of short lower-case glyphs such as 'x' in Latin
fonts. See http://en.wikipedia.org/wiki/Typeface for some related typographical concepts. |