This document is adapted from Greg Baker's Commonly Confused Characters document.
With the advent of desktop publishing in the 1980s, typesetting suddenly changed from a spectator sport to a participatory one. Along the way, many of the subtleties got lost. A lot of people didn't know they were missing these details, and a lot of programs didn't let you do anything about it even if you did. With the creation of Unicode, this has changed for many applications. Unicode includes many characters that are traditionally conflated in computer-generated works. This is a list of some of those, including their proper usage. The entries below cover quotation marks and accents, primes, hyphens and dashes, spaces, and a few other symbols.
This is the character that you type on a standard (US layout) keyboard with the key that's beside the semicolon. It shouldn't really ever be used in proper typography, but is often used because it's easy to type and well supported. It is superseded by one of the characters below, depending on the context.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
' | APOSTROPHE | 0027 | 39 | &#x0027; | &#39; |
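The hex and decimal entity columns in these tables follow mechanically from the code point, so they can be generated rather than typed by hand. The following is a minimal Python sketch of that idea. The function name `describe` is hypothetical, and the named-entity lookup assumes that the HTML 4.01 entity set from the standard library (`html.entities.codepoint2name`) is the one wanted.

```python
import unicodedata
from html.entities import codepoint2name  # HTML 4.01 named entities only


def describe(ch: str) -> dict:
    """Return the table columns for a single character (hypothetical helper)."""
    cp = ord(ch)
    entity = codepoint2name.get(cp)  # None when HTML 4.01 defines no name
    return {
        "name": unicodedata.name(ch),
        "hex": f"{cp:04X}",
        "decimal": cp,
        # The hex form needs the "x" prefix; without it the digits are
        # read as a decimal number and a different character is produced.
        "entity_hex": f"&#x{cp:04X};",
        "entity_decimal": f"&#{cp};",
        "entity_name": f"&{entity};" if entity else None,
    }
```

For example, `describe("\u2019")` reports the name RIGHT SINGLE QUOTATION MARK, hex 2019, decimal 8217, and the named entity `&rsquo;`.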
This is the character that's typically on a US layout keyboard below the tilde (~). It shouldn't be used in place of the opening single quote, or for any other discernible typographic purpose.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
` | GRAVE ACCENT | 0060 | 96 | &#x0060; | &#96; |
This is the symbol that should be used to start a quotation that's delimited with single quotes. For example, ‘Over here!’
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
‘ | LEFT SINGLE QUOTATION MARK | 2018 | 8216 | &#x2018; | &#8216; | &lsquo;
This is the symbol that should be used to end a quotation that's delimited with single quotes. For example, ‘Over here!’ This is also the preferred character to use as an apostrophe, as in I’m coming, or He’s with me.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
’ | RIGHT SINGLE QUOTATION MARK | 2019 | 8217 | &#x2019; | &#8217; | &rsquo;
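Replacing typewriter apostrophes with the preferred character can be partly automated. The sketch below is a deliberately narrow heuristic, not a complete quote converter: it only touches an apostrophe that sits between two word characters, as in I'm or He's. The function name `curl_apostrophes` is hypothetical, and telling opening quotes from closing ones needs more context than this handles.

```python
import re

RIGHT_SINGLE_QUOTE = "\u2019"


def curl_apostrophes(text: str) -> str:
    """Replace ' with U+2019 only when it sits between two word characters."""
    return re.sub(r"(?<=\w)'(?=\w)", RIGHT_SINGLE_QUOTE, text)
```

Apostrophes at the start or end of a word are left alone, since those may be quotation marks.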
This character is provided so it can be placed over others, as an accent. It should take up zero horizontal space, because it's designed to overlap the previous character.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
_̀ | COMBINING GRAVE ACCENT | 0300 | 768 | &#x0300; | &#768; |
This character is provided so it can be placed over others, as an accent. It should take up zero horizontal space, because it's designed to overlap the previous character.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
_́ | COMBINING ACUTE ACCENT | 0301 | 769 | &#x0301; | &#769; |
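Because a combining accent is a separate code point that attaches to the previous character, the same visible letter can be stored in two ways. The following Python sketch uses the standard `unicodedata` module to convert between the two forms; the variable names are only illustrative.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"

# "e" followed by the combining accent: two code points, one visible letter.
decomposed = "e" + COMBINING_ACUTE

# NFC normalization merges the pair into the single precomposed letter U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# NFD normalization splits it back into base letter plus combining accent.
round_trip = unicodedata.normalize("NFD", composed)

# A non-zero combining class marks a character that attaches to its neighbor.
is_combining = unicodedata.combining(COMBINING_ACUTE) != 0
```

Normalizing both strings to the same form before comparing them avoids treating the two spellings as different text.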
This is the character that you type on a standard (US layout) keyboard with Shift and the key that's beside the semicolon. It shouldn't really ever be used in proper typography, but is often used because it's easy to type and well supported. It is superseded by one of the characters below, depending on meaning.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
" | QUOTATION MARK | 0022 | 34 | &#x0022; | &#34; | &quot;
This is the symbol that should be used to start a quotation that's delimited with double quotes. For example, “Over here!”
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
“ | LEFT DOUBLE QUOTATION MARK | 201C | 8220 | &#x201C; | &#8220; | &ldquo;
This is the symbol that should be used to end a quotation that's delimited with double quotes. For example, “Over here!”
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
” | RIGHT DOUBLE QUOTATION MARK | 201D | 8221 | &#x201D; | &#8221; | &rdquo;
Used in mathematics, as in xx′ + yy′. It's also used as the symbol for feet, as in I am 6′ tall.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
′ | PRIME | 2032 | 8242 | &#x2032; | &#8242; |
A doubled version of the prime symbol. Used in mathematics like xx′x″ + yy′y″. It is also used to indicate inches, as in The table is 5′ 6″ long. Unicode also defines a triple prime symbol at hex 2034.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
″ | DOUBLE PRIME | 2033 | 8243 | &#x2033; | &#8243; |
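As a small illustration of the feet and inches usage, the sketch below formats a length with the prime and double prime characters instead of the typewriter quote marks. The function name `format_feet_inches` is hypothetical.

```python
PRIME = "\u2032"         # feet
DOUBLE_PRIME = "\u2033"  # inches


def format_feet_inches(total_inches: int) -> str:
    """Format a whole number of inches as feet and inches, e.g. 66 inches."""
    feet, inches = divmod(total_inches, 12)
    return f"{feet}{PRIME} {inches}{DOUBLE_PRIME}"
```

Calling `format_feet_inches(66)` yields the same 5′ 6″ shown in the table example.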
Used to indicate that the text or other material is identical to that above it.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
〃 | DITTO MARK | 3003 | 12291 | &#x3003; | &#12291; |
The hyphen produced by the key on your keyboard. It can be used in place of the hyphen or minus sign (as described below), but the more specific characters are preferred.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
- | HYPHEN-MINUS | 002D | 45 | &#x002D; | &#45; |
Used in hyphenating a word, as in pre-fabricated or e-mail. This is also the symbol that should be used when breaking a word across a line.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
‐ | HYPHEN | 2010 | 8208 | &#x2010; | &#8208; |
The en-dash is used to indicate a range, like I'll need 100–150 units or John Doe, 1914–2001.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
– | EN DASH | 2013 | 8211 | &#x2013; | &#8211; | &ndash;
This is used to break up a sentence. Two dashes (--) are often used in its place when typing, but this is the proper way to do it. For example, “He said he had it under control—I could see that wasn’t true.”
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
— | EM DASH | 2014 | 8212 | &#x2014; | &#8212; | &mdash;
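The two typing habits described above, a doubled hyphen for an em dash and a plain hyphen in a numeric range, can be cleaned up with a couple of substitutions. This is a rough Python sketch under simple assumptions: the function name `fix_dashes` is hypothetical, and the range rule treats any hyphen between two digits as a range, so it would also rewrite phone numbers and ISO dates.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def fix_dashes(text: str) -> str:
    """Replace '--' with an em dash and digit-hyphen-digit with an en dash."""
    # Handle the doubled hyphen first so it is not seen as two single hyphens.
    text = text.replace("--", EM_DASH)
    # Assumption: a hyphen directly between two digits marks a range.
    text = re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)
    return text
```

Ordinary hyphenated words such as e-mail are left untouched because neither rule matches them.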
Used to indicate subtraction, as in 4 − 1 = 3 or z = x − y. Along with the minus sign is the Unicode + sign, which is just the regular ASCII + sign on the keyboard.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
− | MINUS SIGN | 2212 | 8722 | &#x2212; | &#8722; | &minus;
This isn't quite a character in the usual sense. The soft hyphen is used to indicate a place that a word can be broken if it's near the end of a line. Usually, this character shouldn't visually appear. It displays as a hyphen if the line is broken at that point. A related character is the zero-width space character, which acts like soft hyphen but doesn't result in a hyphen being drawn if the line is broken at that point.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(invisible) | SOFT HYPHEN | 00AD | 173 | &#x00AD; | &#173; | &shy;
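Since the soft hyphen is normally invisible, it is easy to forget that it is still part of the string. The Python sketch below marks break opportunities in a word and strips them again, which is useful before searching or comparing text. Both function names are hypothetical, and the syllable split is supplied by the caller rather than computed.

```python
SOFT_HYPHEN = "\u00ad"


def with_soft_hyphens(syllables: list[str]) -> str:
    """Join caller-supplied syllables with soft hyphens as break opportunities."""
    return SOFT_HYPHEN.join(syllables)


def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphens so the text compares equal to the plain word."""
    return text.replace(SOFT_HYPHEN, "")


marked = with_soft_hyphens(["pre", "fab", "ri", "cat", "ed"])
```

The marked word looks identical to the plain one when no break occurs, but it is four characters longer.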
The appearance of this character is exactly the same as the regular hyphen. The difference is that the line should not be broken at this point. A related character is the non-breaking space character, which prevents a break at a given location.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
‑ | NON-BREAKING HYPHEN | 2011 | 8209 | &#x2011; | &#8209; |
This character is intended to be used as the bullet in a bulleted list. Related characters are Unicode 2022 (BULLET), and Unicode 2023 (TRIANGULAR BULLET).
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
⁃ | HYPHEN BULLET | 2043 | 8259 | &#x2043; | &#8259; |
This is the regular space that should appear between words. It's what you get when you press the space-bar on your keyboard.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(blank) | SPACE | 0020 | 32 | &#x0020; | &#32; |
This is used to indicate a space where the line should not be broken. For example, all of the spaces in Mr. John Doe should be non-breaking since it's incorrect to split a line in the middle of someone's name. Spaces between digits in a number (used by European languages) should also use non-breaking space, as with 125 000. This character is sometimes called "hard space" by users.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(blank) | NO-BREAK SPACE | 00A0 | 160 | &#x00A0; | &#160; | &nbsp;
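Both uses described above, keeping a name together and grouping digits, come down to swapping in the no-break space. The following Python sketch shows one way to do that; the function names `group_digits` and `keep_together` are hypothetical.

```python
NBSP = "\u00a0"


def group_digits(n: int) -> str:
    """Group digits in threes, separated by no-break spaces."""
    # Format with commas first, then swap each comma for a no-break space.
    return f"{n:,}".replace(",", NBSP)


def keep_together(phrase: str) -> str:
    """Replace ordinary spaces so the phrase is never split across lines."""
    return phrase.replace(" ", NBSP)
```

Here `group_digits(125000)` gives 125 000 with a no-break space in the middle, so the number cannot wrap.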
Zero-width space acts just like regular space, except it visually has no width. It can be used, for example, to indicate breakable points in a Thai word.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(invisible) | ZERO WIDTH SPACE | 200B | 8203 | &#x200B; | &#8203; |
This is an invisible character which acts to join the characters before and after it. It prevents line breaking but also has effects on the glyphs of complex scripts such as Arabic. It is similar to other invisible Unicode characters such as Zero-Width No-Break Space and Word Joiner. Those only prevent a line break, whereas the zero-width joiner also requests a connected or ligated rendering of its neighbors.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(invisible) | ZERO WIDTH JOINER | 200D | 8205 | &#x200D; | &#8205; |
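Invisible characters such as the zero-width space and zero-width joiner are hard to spot in running text. The Python sketch below lists them by position, relying on the fact that Unicode assigns them the general category Cf (format). The function name `find_format_chars` is hypothetical.

```python
import unicodedata


def find_format_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every format-category character."""
    return [
        (i, unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]
```

The same check also catches the soft hyphen, which belongs to the same category.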
Ideographic space is the space char that can be used between ideographic (Asian symbol) characters. It is wider than a regular space char and matches the width of the ideographic characters surrounding it. It thus allows for monospaced text.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(blank) | IDEOGRAPHIC SPACE | 3000 | 12288 | &#x3000; | &#12288; |
A space with the same width as the height of the font, approximately the width of a capital letter 'M'.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(blank) | EM SPACE | 2003 | 8195 | &#x2003; | &#8195; | &emsp;
Half an em-space. This is about the amount of space that's commonly put between sentences by typesetters, although it's typically a little less in modern texts.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
(blank) | EN SPACE | 2002 | 8194 | &#x2002; | &#8194; | &ensp;
The ellipsis is used to indicate an omission. For example, 1, 2, …, 10. The dots in an ellipsis should be spaced a little further apart than three periods would typically be, so they have been assigned this character.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
… | HORIZONTAL ELLIPSIS | 2026 | 8230 | &#x2026; | &#8230; | &hellip;
The black circle is used for password character entry. It yields a more professional look than do characters like * or -.
Unicode glyph | Unicode name | Unicode hex | Unicode decimal | HTML entity hex | HTML entity decimal | HTML entity name
● | BLACK CIRCLE | 25CF | 9679 | &#x25CF; | &#9679; |