Recently I busied myself with finding a list of brackets and quotation characters, and the correspondence between them, in the Unicode Character Database. By “brackets” I mean all kinds of brackets: round, square, curly, etc. Though the database provides the necessary information, it has some quirks.
Using the character properties Bidi_Mirrored and Bidi_Mirroring_Glyph, as suggested somewhere on the web, is misguided. I suppose that these properties were created to aid text rendering. In right-to-left text, a Bidi_Mirrored character X should be rendered the way Bidi_Mirroring_Glyph of X is rendered in left-to-right text; that is, the glyph of Bidi_Mirroring_Glyph of X should be the reflection of the glyph of X across the vertical axis. The properties thus describe glyph shapes, not bracket pairing, and many characters that are not brackets, like U+003C LESS-THAN SIGN and U+2208 ELEMENT OF (set membership), are Bidi_Mirrored. Even worse, some Bidi_Mirrored characters, for example U+2260 NOT EQUAL TO and U+222B INTEGRAL, have no mirrored counterpart at all: the database lists no Bidi_Mirroring_Glyph for them, and some libraries report the character itself instead.
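A quick way to see the problem is to query Bidi_Mirrored for the characters mentioned above. The sketch below uses Python's standard `unicodedata` module, which exposes Bidi_Mirrored through `unicodedata.mirrored` (it does not expose Bidi_Mirroring_Glyph). The helper name `is_bidi_mirrored` is made up for this illustration.

```python
import unicodedata

# Characters named in the text; none of them is a bracket or a quotation mark.
samples = ["\u003C", "\u2208", "\u2260", "\u222B"]


def is_bidi_mirrored(ch: str) -> bool:
    """Return True if the character has the Bidi_Mirrored property."""
    return unicodedata.mirrored(ch) == 1


for ch in samples:
    name = unicodedata.name(ch)
    print(f"U+{ord(ch):04X} {name}: Bidi_Mirrored={is_bidi_mirrored(ch)}")
```

Filtering on this property alone would therefore pull in mathematical operators together with the brackets.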
The property Quotation_Mark gives quotation characters. There are no surprises here. Beware that U+0022 QUOTATION MARK (the straight double quotation character) and U+0027 APOSTROPHE have the Quotation_Mark property too.
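Quotation_Mark is a binary property listed in the database file PropList.txt, where each data line has the form `code point or range ; property name # comment`. The following is a minimal sketch of reading such lines. The function name `parse_binary_property` is hypothetical, and the sample is a shortened excerpt written in the file's layout, not a verbatim copy of it.

```python
def parse_binary_property(lines, prop):
    """Collect the code points that carry `prop` from PropList.txt-style lines."""
    code_points = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank line or comment-only line
        cp_field, name = (field.strip() for field in data.split(";"))
        if name != prop:
            continue
        # A field is either a single code point or a range "XXXX..YYYY".
        first, _, last = cp_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        code_points.update(range(start, end + 1))
    return code_points


# Assumption: an illustrative excerpt in the PropList.txt layout.
SAMPLE = """\
0009..000D    ; White_Space # control characters
0022          ; Quotation_Mark # QUOTATION MARK
0027          ; Quotation_Mark # APOSTROPHE
2018..201F    ; Quotation_Mark # curly and low-9 quotation marks
""".splitlines()

quotes = parse_binary_property(SAMPLE, "Quotation_Mark")
print(sorted(f"U+{cp:04X}" for cp in quotes))
```

With the real file, the same function could be fed the lines of a locally downloaded PropList.txt.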
The property Bidi_Paired_Bracket_Type has three values: Open, Close, and None. Open marks opening brackets and Close marks closing brackets. Bidi_Paired_Bracket of X is the bracket that pairs with X. There are no quotation characters here. For an unknown reason,
- U+FD3E ORNATE LEFT PARENTHESIS and U+FD3F ORNATE RIGHT PARENTHESIS from the Arabic_Presentation_Forms_A block,
- some rare forms of brackets like U+2E02 LEFT SUBSTITUTION BRACKET,
- presentation forms for brackets like U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
are not included.
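Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type are stored in the database file BidiBrackets.txt, one bracket per line in the form `code point; paired code point; o or c # name`. Below is a small sketch of turning such lines into a lookup table. The function name `parse_bidi_brackets` is made up for this example, and the sample holds only four entries, so the final check illustrates the lookup rather than proving anything about the full file.

```python
def parse_bidi_brackets(lines):
    """Map each bracket to (paired bracket, type) from BidiBrackets.txt-style lines."""
    pairs = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank line or comment-only line
        cp, paired, kind = (field.strip() for field in data.split(";"))
        pairs[chr(int(cp, 16))] = (chr(int(paired, 16)), kind)
    return pairs


# Assumption: a four-entry excerpt in the BidiBrackets.txt layout.
SAMPLE = """\
# code point; Bidi_Paired_Bracket; Bidi_Paired_Bracket_Type
0028; 0029; o # LEFT PARENTHESIS
0029; 0028; c # RIGHT PARENTHESIS
005B; 005D; o # LEFT SQUARE BRACKET
005D; 005B; c # RIGHT SQUARE BRACKET
""".splitlines()

pairs = parse_bidi_brackets(SAMPLE)

# Keep only the opening brackets, mapped to their closing counterparts.
opening = {ch: mate for ch, (mate, kind) in pairs.items() if kind == "o"}
print(opening)

# Characters absent from the table have Bidi_Paired_Bracket_Type None.
print("\uFD3E" in pairs)
```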
The most abstruse way to find brackets and quotation characters is General_Category. The relevant values of this property are Open_Punctuation, Close_Punctuation, Initial_Punctuation, and Final_Punctuation.
Initial_Punctuation and Final_Punctuation mark quotation characters. The classification of quotation characters is not clear-cut: U+0022 QUOTATION MARK serves as both an opening and a closing character, some quotation characters occur in more than one pair („…” and „…“), and guillemets are sometimes used with swapped meaning (»…«). Hence not all quotation marks are Initial_Punctuation or Final_Punctuation; U+0022 QUOTATION MARK, for example, is neither. U+201E DOUBLE LOW-9 QUOTATION MARK is Open_Punctuation, again for an unknown reason. Interestingly, U+2E20 LEFT VERTICAL BAR WITH QUILL and U+2E21 RIGHT VERTICAL BAR WITH QUILL are considered Initial_Punctuation and Final_Punctuation respectively. Perhaps because they resemble straight brackets (|…|)?
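Brackets and other similar characters are Open_Punctuation or Close_Punctuation. The characters I listed as missing from Bidi_Paired_Bracket_Type are covered by General_Category, although not all of them land in these two values: U+2E02 LEFT SUBSTITUTION BRACKET, for instance, is Initial_Punctuation.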
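General_Category is the one property here that Python's standard `unicodedata` module exposes directly, through `unicodedata.category`, which returns the two-letter abbreviations (Ps, Pe, Pi, Pf). The sketch below groups all code points by the four relevant values and then prints the category of a few characters discussed in this post. The results depend on the Unicode version bundled with the Python interpreter, and the names `WANTED` and `chars_by_category` are made up for this example.

```python
import sys
import unicodedata
from collections import defaultdict

# Two-letter abbreviations of the four General_Category values of interest.
WANTED = {
    "Ps": "Open_Punctuation",
    "Pe": "Close_Punctuation",
    "Pi": "Initial_Punctuation",
    "Pf": "Final_Punctuation",
}


def chars_by_category():
    """Group every code point whose General_Category is in WANTED."""
    groups = defaultdict(list)
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        cat = unicodedata.category(ch)
        if cat in WANTED:
            groups[cat].append(ch)
    return groups


groups = chars_by_category()
for cat, long_name in WANTED.items():
    print(f"{long_name}: {len(groups[cat])} characters")

# Spot-check some of the characters mentioned in the text.
for ch in '"\u201E\u2E20\u2E21\u2E02\uFD3E':
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")
```

Because the grouping only records membership, pairing an opening character with its closing counterpart still has to come from elsewhere, for example from Bidi_Paired_Bracket where it is defined.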