Copyright | (c) The University of Glasgow 2003 |
---|---|
License | see libraries/base/LICENSE |
Maintainer | cvs-ghc@haskell.org |
Stability | internal |
Portability | non-portable (GHC extensions) |
Safe Haskell | Trustworthy |
Language | Haskell2010 |
GHC.Unicode
Description
Implementations for the character predicates (isLower, isUpper, etc.) and the conversions (toUpper, toLower). The implementation uses libunicode on Unix systems if that is available.
Synopsis
- data GeneralCategory
- = UppercaseLetter
- | LowercaseLetter
- | TitlecaseLetter
- | ModifierLetter
- | OtherLetter
- | NonSpacingMark
- | SpacingCombiningMark
- | EnclosingMark
- | DecimalNumber
- | LetterNumber
- | OtherNumber
- | ConnectorPunctuation
- | DashPunctuation
- | OpenPunctuation
- | ClosePunctuation
- | InitialQuote
- | FinalQuote
- | OtherPunctuation
- | MathSymbol
- | CurrencySymbol
- | ModifierSymbol
- | OtherSymbol
- | Space
- | LineSeparator
- | ParagraphSeparator
- | Control
- | Format
- | Surrogate
- | PrivateUse
- | NotAssigned
- generalCategory :: Char -> GeneralCategory
- isAscii :: Char -> Bool
- isLatin1 :: Char -> Bool
- isControl :: Char -> Bool
- isAsciiUpper :: Char -> Bool
- isAsciiLower :: Char -> Bool
- isPrint :: Char -> Bool
- isSpace :: Char -> Bool
- isUpper :: Char -> Bool
- isLower :: Char -> Bool
- isAlpha :: Char -> Bool
- isDigit :: Char -> Bool
- isOctDigit :: Char -> Bool
- isHexDigit :: Char -> Bool
- isAlphaNum :: Char -> Bool
- isPunctuation :: Char -> Bool
- isSymbol :: Char -> Bool
- toUpper :: Char -> Char
- toLower :: Char -> Char
- toTitle :: Char -> Char
- wgencat :: Int -> Int
Documentation
data GeneralCategory #
Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard (the Unicode Character Database, in particular).
Examples
Basic usage:
>>> :t OtherLetter
OtherLetter :: GeneralCategory
Eq instance:
>>> UppercaseLetter == UppercaseLetter
True
>>> UppercaseLetter == LowercaseLetter
False
Ord instance:
>>> NonSpacingMark <= MathSymbol
True
Enum instance:
>>> enumFromTo ModifierLetter SpacingCombiningMark
[ModifierLetter,OtherLetter,NonSpacingMark,SpacingCombiningMark]
Read instance:
>>> read "DashPunctuation" :: GeneralCategory
DashPunctuation
>>> read "17" :: GeneralCategory
*** Exception: Prelude.read: no parse
Show instance:
>>> show EnclosingMark
"EnclosingMark"
Bounded instance:
>>> minBound :: GeneralCategory
UppercaseLetter
>>> maxBound :: GeneralCategory
NotAssigned
Ix instance:
>>> import Data.Ix ( index )
>>> index (OtherLetter,Control) FinalQuote
12
>>> index (OtherLetter,Control) Format
*** Exception: Error in array index
Constructors
UppercaseLetter | Lu: Letter, Uppercase |
LowercaseLetter | Ll: Letter, Lowercase |
TitlecaseLetter | Lt: Letter, Titlecase |
ModifierLetter | Lm: Letter, Modifier |
OtherLetter | Lo: Letter, Other |
NonSpacingMark | Mn: Mark, Non-Spacing |
SpacingCombiningMark | Mc: Mark, Spacing Combining |
EnclosingMark | Me: Mark, Enclosing |
DecimalNumber | Nd: Number, Decimal |
LetterNumber | Nl: Number, Letter |
OtherNumber | No: Number, Other |
ConnectorPunctuation | Pc: Punctuation, Connector |
DashPunctuation | Pd: Punctuation, Dash |
OpenPunctuation | Ps: Punctuation, Open |
ClosePunctuation | Pe: Punctuation, Close |
InitialQuote | Pi: Punctuation, Initial quote |
FinalQuote | Pf: Punctuation, Final quote |
OtherPunctuation | Po: Punctuation, Other |
MathSymbol | Sm: Symbol, Math |
CurrencySymbol | Sc: Symbol, Currency |
ModifierSymbol | Sk: Symbol, Modifier |
OtherSymbol | So: Symbol, Other |
Space | Zs: Separator, Space |
LineSeparator | Zl: Separator, Line |
ParagraphSeparator | Zp: Separator, Paragraph |
Control | Cc: Other, Control |
Format | Cf: Other, Format |
Surrogate | Cs: Other, Surrogate |
PrivateUse | Co: Other, Private Use |
NotAssigned | Cn: Other, Not Assigned |
generalCategory :: Char -> GeneralCategory #
The Unicode general category of the character. This relies on the Enum instance of GeneralCategory, which must remain in the same order as the categories are presented in the Unicode standard.
Examples
Basic usage:
>>> generalCategory 'a'
LowercaseLetter
>>> generalCategory 'A'
UppercaseLetter
>>> generalCategory '0'
DecimalNumber
>>> generalCategory '%'
OtherPunctuation
>>> generalCategory '♥'
OtherSymbol
>>> generalCategory '\31'
Control
>>> generalCategory ' '
Space
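As a small illustration of working with generalCategory, the sketch below tallies the categories that occur in a string. The helper name categoryHistogram is hypothetical and not part of this module; it relies only on the Ord instance of GeneralCategory, which sorts constructors in declaration order.

```haskell
import Data.List (group, sort)
import GHC.Unicode (GeneralCategory (..), generalCategory)

-- | Hypothetical helper (not part of GHC.Unicode): count how many
-- characters of each general category occur in a string. The result
-- is ordered by the derived Ord instance of GeneralCategory.
categoryHistogram :: String -> [(GeneralCategory, Int)]
categoryHistogram s =
  [ (c, length g) | g@(c : _) <- group (sort (map generalCategory s)) ]

main :: IO ()
main = mapM_ print (categoryHistogram "Hi, 42!")
```

Sorting before grouping keeps the helper inside base; a map-based tally would be a reasonable alternative for long inputs.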
isAscii :: Char -> Bool #
Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.
isLatin1 :: Char -> Bool #
Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.
isControl :: Char -> Bool #
Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.
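The range predicates combine naturally. The following is a minimal sketch, assuming a hypothetical helper printableAscii (not part of this module) that keeps only ASCII characters that are not control characters.

```haskell
import GHC.Unicode (isAscii, isControl, isLatin1)

-- | Hypothetical helper: drop everything outside ASCII, and drop
-- ASCII control characters such as tab and newline.
printableAscii :: String -> String
printableAscii = filter (\c -> isAscii c && not (isControl c))

main :: IO ()
main = do
  -- '\233' (e with acute accent) is Latin-1 but not ASCII.
  print (isAscii '\233', isLatin1 '\233')
  putStrLn (printableAscii "a\tb\233c")
```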
isAsciiUpper :: Char -> Bool #
isAsciiLower :: Char -> Bool #
isPrint :: Char -> Bool #
Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).
isSpace :: Char -> Bool #
Returns True for any Unicode space character, and the control characters \t, \n, \r, \f, \v.
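A common use of isSpace is trimming whitespace from both ends of a string. The sketch below assumes a hypothetical helper named trim; it is not provided by this module.

```haskell
import Data.List (dropWhileEnd)
import GHC.Unicode (isSpace)

-- | Hypothetical helper: remove leading and trailing whitespace,
-- where "whitespace" means any character accepted by isSpace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

main :: IO ()
main = print (trim " \t hello world\n")
```

Interior whitespace is left untouched, since only the two ends are scanned.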
isUpper :: Char -> Bool #
Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.
isAlpha :: Char -> Bool #
Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to isLetter.
isOctDigit :: Char -> Bool #
Selects ASCII octal digits, i.e. '0'..'7'.
isHexDigit :: Char -> Bool #
Selects ASCII hexadecimal digits, i.e. '0'..'9', 'a'..'f', 'A'..'F'.
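Because isHexDigit accepts exactly the ASCII hexadecimal digits, it can guard a simple parser. The sketch below assumes a hypothetical helper parseHex and uses digitToInt from Data.Char, which this module does not export.

```haskell
import Data.Char (digitToInt)
import GHC.Unicode (isHexDigit)

-- | Hypothetical helper: parse a non-empty string of hexadecimal
-- digits into an Int, returning Nothing on any other input.
-- Overflow for very long inputs is not handled in this sketch.
parseHex :: String -> Maybe Int
parseHex s
  | null s || not (all isHexDigit s) = Nothing
  | otherwise = Just (foldl (\acc c -> acc * 16 + digitToInt c) 0 s)

main :: IO ()
main = mapM_ (print . parseHex) ["ff", "1A", "xyz", ""]
```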
isAlphaNum :: Char -> Bool #
Selects alphabetic or numeric Unicode characters.
Note that numeric digits outside the ASCII range, as well as numeric characters which aren't digits, are selected by this function but not by isDigit. Such characters may be part of identifiers but are not used by the printer and reader to represent numbers.
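The difference between isAlphaNum and isDigit matters when validating numeric input. The sketch below assumes a hypothetical helper asciiDigitsOnly; the sample characters are U+0663 (ARABIC-INDIC DIGIT THREE) and U+00BD (VULGAR FRACTION ONE HALF), written as escapes to avoid source-encoding issues.

```haskell
import GHC.Unicode (isAlphaNum, isDigit)

-- | Hypothetical helper: accept only non-empty strings made of the
-- ASCII digits '0'..'9', which is what isDigit selects.
asciiDigitsOnly :: String -> Bool
asciiDigitsOnly s = not (null s) && all isDigit s

main :: IO ()
main = do
  print (asciiDigitsOnly "2024")
  -- Both characters are numeric, so isAlphaNum accepts them,
  -- but neither is an ASCII digit, so isDigit rejects them.
  print (map isAlphaNum "\x0663\x00BD", map isDigit "\x0663\x00BD")
```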
isPunctuation :: Char -> Bool #
Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.
This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:
- ConnectorPunctuation
- DashPunctuation
- OpenPunctuation
- ClosePunctuation
- InitialQuote
- FinalQuote
- OtherPunctuation
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Punctuation".
Examples
Basic usage:
>>> isPunctuation 'a'
False
>>> isPunctuation '7'
False
>>> isPunctuation '♥'
False
>>> isPunctuation '"'
True
>>> isPunctuation '?'
True
>>> isPunctuation '—'
True
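One way to apply isPunctuation is to strip punctuation before further text processing. The sketch below assumes a hypothetical helper stripPunctuation; note that characters classified as symbols, such as '+' and '=', are not removed.

```haskell
import GHC.Unicode (isPunctuation)

-- | Hypothetical helper: remove every character that isPunctuation
-- accepts, leaving letters, digits, spaces and symbols in place.
stripPunctuation :: String -> String
stripPunctuation = filter (not . isPunctuation)

main :: IO ()
main = mapM_ (putStrLn . stripPunctuation) ["Hello, world!", "1+1=2?"]
```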
isSymbol :: Char -> Bool #
Selects Unicode symbol characters, including mathematical and currency symbols.
This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:
- MathSymbol
- CurrencySymbol
- ModifierSymbol
- OtherSymbol
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Symbol".
Examples
Basic usage:
>>> isSymbol 'a'
False
>>> isSymbol '6'
False
>>> isSymbol '='
True
The definition of "math symbol" may be a little counter-intuitive depending on one's background:
>>> isSymbol '+'
True
>>> isSymbol '-'
False
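The result for '-' follows from its general category: the hyphen-minus is classified as DashPunctuation rather than MathSymbol. The sketch below, using a hypothetical helper describe, prints the category next to the isSymbol result for a few ASCII characters.

```haskell
import GHC.Unicode (GeneralCategory (..), generalCategory, isSymbol)

-- | Hypothetical helper: pair a character with its general category
-- and with the result of isSymbol.
describe :: Char -> (Char, GeneralCategory, Bool)
describe c = (c, generalCategory c, isSymbol c)

main :: IO ()
main = mapM_ (print . describe) "+-<$^"
```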
toUpper :: Char -> Char #
Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.
toLower :: Char -> Char #
Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.
toTitle :: Char -> Char #
Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.
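The conversions work one character at a time, so capitalising a word is a matter of treating the first character separately. The sketch below assumes a hypothetical helper capitalize; it also shows the ligature case using U+01C9 (small lj), whose title-case form U+01C8 differs from its upper-case form U+01C7.

```haskell
import GHC.Unicode (toLower, toTitle, toUpper)

-- | Hypothetical helper: title-case the first character of a word
-- and lower-case the rest. Works per character only; it does not
-- handle mappings that expand to several characters.
capitalize :: String -> String
capitalize [] = []
capitalize (c : cs) = toTitle c : map toLower cs

main :: IO ()
main = do
  putStrLn (capitalize "hELLO")
  -- For the lj ligature, title case and upper case are different.
  print (toTitle '\x01C9' == '\x01C8', toUpper '\x01C9' == '\x01C7')
```

Using toTitle rather than toUpper for the first character is what keeps ligatures such as Lj in their title-case form.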