Commonly Confused Characters

This document is adapted from Greg Baker's Commonly Confused Characters document.

With the advent of desktop publishing in the 1980s, typesetting suddenly changed from a spectator sport to a participatory one. In the process, many of its subtleties were lost. Many people didn't know they were missing these details, and many programs gave you no way to act on them even if you did. With the creation of Unicode, this has changed for many applications. Unicode includes many characters that are traditionally conflated in computer-generated works. This is a list of some of them, along with their proper usage.

The following character classes are covered: quotation marks, hyphens, spaces, and a few miscellaneous characters.

Quotation Marks

Apostrophe

This is the character that you type on a standard (US layout) keyboard with the key beside the semicolon. It has no place in proper typography, but it is often used because it's easy to type and well supported. It is superseded by one of the characters below, depending on the context.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
' | APOSTROPHE | 0027 | 39 | &#x27; | &#39; | (none given)
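
The columns used in the tables throughout this document can all be derived from the character itself. The following is a minimal Python sketch, not part of the original document. The function name describe is hypothetical, and html.entities.codepoint2name only knows the HTML 4 entity names, so names introduced later are not reported.

```python
import html.entities
import unicodedata


def describe(char: str) -> dict:
    """Build the seven table columns for a single character."""
    code_point = ord(char)
    # codepoint2name covers HTML 4 named entities only (an assumption to keep in mind).
    entity_name = html.entities.codepoint2name.get(code_point)
    return {
        "glyph": char,
        "name": unicodedata.name(char),
        "hex": f"{code_point:04X}",
        "decimal": str(code_point),
        "entity_hex": f"&#x{code_point:X};",
        "entity_decimal": f"&#{code_point};",
        "entity_name": f"&{entity_name};" if entity_name else None,
    }
```

Note that the hexadecimal entity form needs the x after the number sign; without it, the digits are read as a decimal code point and a different character is produced.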

Backtick

This is the character that typically shares a key with the tilde (~) on a US layout keyboard. It shouldn't be used in place of the opening single quote, or for any other discernible typographic purpose.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
` | GRAVE ACCENT | 0060 | 96 | &#x60; | &#96; | (none given)

Opening Single Quote

This is the symbol that should be used to start a quotation that's delimited with single quotes. For example, ‘Over here!’

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
‘ | LEFT SINGLE QUOTATION MARK | 2018 | 8216 | &#x2018; | &#8216; | &lsquo;

Closing Single Quote

This is the symbol that should be used to end a quotation that's delimited with single quotes. For example, ‘Over here!’ This is also the preferred character to use as an apostrophe, as in I’m coming, or He’s with me.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
’ | RIGHT SINGLE QUOTATION MARK | 2019 | 8217 | &#x2019; | &#8217; | &rsquo;

Grave Accent

This character is provided so it can be placed over others, as an accent. It should take up zero horizontal space, because it's designed to overlap the previous character.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
◌̀ | COMBINING GRAVE ACCENT | 0300 | 768 | &#x300; | &#768; | (none given)

Acute Accent

This character is provided so it can be placed over others, as an accent. It should take up zero horizontal space, because it's designed to overlap the previous character.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
◌́ | COMBINING ACUTE ACCENT | 0301 | 769 | &#x301; | &#769; | (none given)
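
Because a combining accent attaches to the character before it, the base letter comes first in the text and the accent second. The sketch below, which is an illustration rather than part of the original document, shows this in Python; attach_accent is a hypothetical helper name, and NFC normalization is used to fold the pair into a single precomposed character where one exists.

```python
import unicodedata

GRAVE = "\u0300"  # COMBINING GRAVE ACCENT
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def attach_accent(base: str, accent: str) -> str:
    """Put a combining accent after its base letter, then compose (NFC)."""
    return unicodedata.normalize("NFC", base + accent)
```

Before normalization, "e" followed by the combining acute accent is two code points; after it, the result is the single character é.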

ASCII Double Quote

This is the character that you type on a standard (US layout) keyboard by holding Shift and pressing the key beside the semicolon. It has no place in proper typography, but it is often used because it's easy to type and well supported. It is superseded by one of the characters below, depending on meaning.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
" | QUOTATION MARK | 0022 | 34 | &#x22; | &#34; | &quot;

Opening Double Quote

This is the symbol that should be used to start a quotation that's delimited with double quotes. For example, “Over here!”

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
“ | LEFT DOUBLE QUOTATION MARK | 201C | 8220 | &#x201C; | &#8220; | &ldquo;

Closing Double Quote

This is the symbol that should be used to end a quotation that's delimited with double quotes. For example, “Over here!”

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
” | RIGHT DOUBLE QUOTATION MARK | 201D | 8221 | &#x201D; | &#8221; | &rdquo;
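
The opening and closing rules for single and double quotes described above can be approximated in code. This is a rough Python sketch under a stated assumption: a straight quote is treated as opening when it starts the text or follows whitespace, an opening bracket, or another opening quote, and as closing (or as an apostrophe) otherwise. The name curl_quotes is hypothetical, and the heuristic gets cases such as a leading apostrophe in ’90s wrong.

```python
OPEN_DOUBLE, CLOSE_DOUBLE = "\u201c", "\u201d"
OPEN_SINGLE, CLOSE_SINGLE = "\u2018", "\u2019"

# Characters after which a straight quote is assumed to be an opening quote.
OPENING_CONTEXT = set(" \t\n([{") | {OPEN_DOUBLE, OPEN_SINGLE}


def curl_quotes(text: str) -> str:
    """Replace ASCII quotes with directional quotes using a simple heuristic."""
    out = []
    for ch in text:
        if ch not in "'\"":
            out.append(ch)
            continue
        # Treat the start of the text like whitespace.
        previous = out[-1] if out else " "
        opening = previous in OPENING_CONTEXT
        if ch == '"':
            out.append(OPEN_DOUBLE if opening else CLOSE_DOUBLE)
        else:
            out.append(OPEN_SINGLE if opening else CLOSE_SINGLE)
    return "".join(out)
```

Since the closing single quote is also the preferred apostrophe, contractions such as I'm come out correctly without a separate rule.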

Prime

Used in mathematics, as in xx′ + yy′. It's also used as the symbol for feet, as in I am 6′ tall.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
′ | PRIME | 2032 | 8242 | &#x2032; | &#8242; | (none given)

Double Prime

A doubled version of the prime symbol. Used in mathematics, as in xx′x″ + yy′y″. It is also used to indicate inches, as in The table is 5′ 6″ long. Unicode also defines a triple prime symbol at hex 2034.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
″ | DOUBLE PRIME | 2033 | 8243 | &#x2033; | &#8243; | (none given)

Ditto Mark

Used to indicate that the text or other material is identical to that above it.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
〃 | DITTO MARK | 3003 | 12291 | &#x3003; | &#12291; | (none given)

Hyphens

ASCII Hyphen

The hyphen produced by the key on your keyboard. It can be used in place of the hyphen or minus sign (as described below), but the more specific characters are preferred.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
- | HYPHEN-MINUS | 002D | 45 | &#x2D; | &#45; | (none given)

Hyphen

Used in hyphenating a word, as in pre-fabricated or e-mail. This is also the symbol that should be used when breaking a word across a line.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
‐ | HYPHEN | 2010 | 8208 | &#x2010; | &#8208; | (none given)

En-Dash

The en-dash is used to indicate a range, like I'll need 100–150 units or John Doe, 1914–2001.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
– | EN DASH | 2013 | 8211 | &#x2013; | &#8211; | &ndash;

Em-Dash

This is used to break up a sentence. Two hyphens (--) are often typed in its place, but the em-dash is the proper character. For example, “He said he had it under control—I could see that wasn’t true.”

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
— | EM DASH | 2014 | 8212 | &#x2014; | &#8212; | &mdash;

Minus Sign

Used to indicate subtraction, as in 4 − 1 = 3 or z = x − y. The matching plus sign needs no special character: it is just the regular ASCII + sign on the keyboard.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
− | MINUS SIGN | 2212 | 8722 | &#x2212; | &#8722; | &minus;
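
The distinctions among the em-dash, en-dash, and minus sign can be sketched as a small substitution pass over ASCII text. This is only an illustration in Python, and fix_dashes is a hypothetical name. It assumes that a doubled hyphen always stands for an em-dash, that a spaced hyphen between digits is subtraction, and that an unspaced hyphen between digits is a range. Those assumptions fail on input such as ISO dates or phone numbers.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"
MINUS = "\u2212"


def fix_dashes(text: str) -> str:
    """Replace ASCII hyphen stand-ins with more specific characters."""
    # "--" typed in place of an em-dash.
    text = text.replace("--", EM_DASH)
    # A spaced hyphen between digits is assumed to be subtraction.
    text = re.sub(r"(?<=\d) - (?=\d)", f" {MINUS} ", text)
    # An unspaced hyphen between digits is assumed to be a range.
    text = re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)
    return text
```

Hyphens inside ordinary words, as in pre-fabricated, are left untouched because neither pattern matches them.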

Soft Hyphen

This isn't quite a character in the usual sense. The soft hyphen marks a place where a word can be broken if it falls near the end of a line. Normally the character is invisible; it displays as a hyphen only if the line is actually broken at that point. A related character is the zero-width space, which acts like a soft hyphen but doesn't result in a hyphen being drawn if the line is broken at that point.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(invisible) | SOFT HYPHEN | 00AD | 173 | &#xAD; | &#173; | &shy;

Non-breaking Hyphen

The appearance of this character is exactly the same as the regular hyphen. The difference is that the line should not be broken at this point. A related character is the non-breaking space character, which prevents a break at a given location.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
‑ | NON-BREAKING HYPHEN | 2011 | 8209 | &#x2011; | &#8209; | (none given)

Hyphen Bullet

This character is intended to be used as the bullet in a bulleted list. Related characters are Unicode 2022 (BULLET), and Unicode 2023 (TRIANGULAR BULLET).

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
⁃ | HYPHEN BULLET | 2043 | 8259 | &#x2043; | &#8259; | (none given)

Spaces

Space

This is the regular space that should appear between words. It's what you get when you press the space-bar on your keyboard.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(blank) | SPACE | 0020 | 32 | &#x20; | &#32; | (none given)

Non-breaking Space

This is used to indicate a space where the line should not be broken. For example, all of the spaces in Mr. John Doe should be non-breaking, since it's incorrect to split a line in the middle of someone's name. Spaces between groups of digits in a number (used by European languages) should also be non-breaking, as with 125 000. This character is sometimes called a "hard space" by users.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(blank) | NO-BREAK SPACE | 00A0 | 160 | &#xA0; | &#160; | (none given)
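
Both uses of the non-breaking space mentioned above, keeping a name together and grouping digits, are simple substitutions. The following Python sketch is illustrative only; bind_name and group_digits are hypothetical helper names, and grouping in threes is an assumption about the desired number format.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def bind_name(name: str) -> str:
    """Replace ordinary spaces so the name is never split across lines."""
    return name.replace(" ", NBSP)


def group_digits(number: int) -> str:
    """Group digits in threes, separated by non-breaking spaces."""
    # Format with commas first, then swap each comma for a no-break space.
    return f"{number:,}".replace(",", NBSP)
```

The results look identical to text typed with ordinary spaces; the difference only shows up when a renderer decides where to break the line.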

Zero-width Space

Zero-width space acts just like regular space, except it visually has no width. It can be used, for example, to indicate breakable points in a Thai word.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(invisible) | ZERO WIDTH SPACE | 200B | 8203 | &#x200B; | &#8203; | (none given)

Zero-width Joiner

This is an invisible character which acts to join the characters before and after it. It prevents line breaking but also has effects on the glyphs of complex scripts such as Arabic. It is similar to other Unicode characters such as Zero-Width No-Break Space, Word Joiner, and Zero-Width Non-Joiner, and the distinctions between these closely related characters are not always obvious.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(invisible) | ZERO WIDTH JOINER | 200D | 8205 | &#x200D; | &#8205; | (none given)

Ideographic Space

Ideographic space is the space character used between ideographic (East Asian) characters. It is wider than a regular space character and matches the width of the ideographic characters surrounding it, which keeps the text monospaced.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(blank) | IDEOGRAPHIC SPACE | 3000 | 12288 | &#x3000; | &#12288; | (none given)

Em-Space

A space with the same width as the height of the font, approximately the width of a capital letter 'M'.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(blank) | EM SPACE | 2003 | 8195 | &#x2003; | &#8195; | &emsp;

En-Space

Half an em-space. This is about the amount of space that's commonly put between sentences by typesetters, although it's typically a little less in modern texts.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(blank) | EN SPACE | 2002 | 8194 | &#x2002; | &#8194; | &ensp;
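
Since the visible spaces in this section are hard to tell apart on screen, it can help to identify them programmatically. The Python sketch below is an illustration with hypothetical function names. It relies on the Unicode general category Zs (space separator), which covers the regular, non-breaking, ideographic, em, and en spaces but not the zero-width characters.

```python
import unicodedata


def space_names(text: str) -> list:
    """List the Unicode names of all space separators found in the text."""
    return [unicodedata.name(ch) for ch in text if unicodedata.category(ch) == "Zs"]


def normalize_spaces(text: str) -> str:
    """Replace every space separator (category Zs) with a plain U+0020 space."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in text)
```

Normalizing this way discards the intent behind a non-breaking or fixed-width space, so it suits tasks such as searching and comparison better than typesetting.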

Miscellaneous

Ellipsis

The ellipsis is used to indicate an omission. For example, 1, 2, …, 10. The dots in an ellipsis should be spaced a little further apart than three periods would typically be, so they have been assigned this character.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
… | HORIZONTAL ELLIPSIS | 2026 | 8230 | &#x2026; | &#8230; | &hellip;

Black Circle

The black circle is used for password character entry. It yields a more professional look than do characters like * or -.

Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
● | BLACK CIRCLE | 25CF | 9679 | &#x25CF; | &#9679; | (none given)
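
As a small illustration of the password use described above, a display string can be built by repeating the black circle once per typed character. This Python sketch is not from the original document, and mask is a hypothetical name; real input widgets usually handle the masking themselves.

```python
BLACK_CIRCLE = "\u25cf"


def mask(secret: str) -> str:
    """Return one black circle per character of the secret."""
    return BLACK_CIRCLE * len(secret)
```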