Copyright	(c) The University of Glasgow 2003
License	see libraries/base/LICENSE
Maintainer	cvs-ghc@haskell.org
Stability	internal
Portability	non-portable (GHC extensions)
Safe Haskell	Trustworthy
Language	Haskell2010

GHC.Unicode

Description

Implementations for the character predicates (isLower, isUpper, etc.) and the conversions (toUpper, toLower). The implementation uses libunicode on Unix systems if that is available.

Synopsis

data GeneralCategory
- = UppercaseLetter
- | LowercaseLetter
- | TitlecaseLetter
- | ModifierLetter
- | OtherLetter
- | NonSpacingMark
- | SpacingCombiningMark
- | EnclosingMark
- | DecimalNumber
- | LetterNumber
- | OtherNumber
- | ConnectorPunctuation
- | DashPunctuation
- | OpenPunctuation
- | ClosePunctuation
- | InitialQuote
- | FinalQuote
- | OtherPunctuation
- | MathSymbol
- | CurrencySymbol
- | ModifierSymbol
- | OtherSymbol
- | Space
- | LineSeparator
- | ParagraphSeparator
- | Control
- | Format
- | Surrogate
- | PrivateUse
- | NotAssigned
generalCategory :: Char -> GeneralCategory
isAscii :: Char -> Bool
isLatin1 :: Char -> Bool
isControl :: Char -> Bool
isAsciiUpper :: Char -> Bool
isAsciiLower :: Char -> Bool
isPrint :: Char -> Bool
isSpace :: Char -> Bool
isUpper :: Char -> Bool
isLower :: Char -> Bool
isAlpha :: Char -> Bool
isDigit :: Char -> Bool
isOctDigit :: Char -> Bool
isHexDigit :: Char -> Bool
isAlphaNum :: Char -> Bool
isPunctuation :: Char -> Bool
isSymbol :: Char -> Bool
toUpper :: Char -> Char
toLower :: Char -> Char
toTitle :: Char -> Char
wgencat :: Int -> Int

Documentation

data GeneralCategory #

Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard (the Unicode Character Database, in particular).

Examples

Expand

Basic usage:

>>> :t OtherLetter
OtherLetter :: GeneralCategory

Eq instance:

>>> UppercaseLetter == UppercaseLetter
True
>>> UppercaseLetter == LowercaseLetter
False

Ord instance:

>>> NonSpacingMark <= MathSymbol
True

Enum instance:

>>> enumFromTo ModifierLetter SpacingCombiningMark
[ModifierLetter,OtherLetter,NonSpacingMark,SpacingCombiningMark]

Read instance:

>>> read "DashPunctuation" :: GeneralCategory
DashPunctuation
>>> read "17" :: GeneralCategory
*** Exception: Prelude.read: no parse

Show instance:

>>> show EnclosingMark
"EnclosingMark"

Bounded instance:

>>> minBound :: GeneralCategory
UppercaseLetter
>>> maxBound :: GeneralCategory
NotAssigned

Ix instance:

>>> import Data.Ix ( index )
>>> index (OtherLetter,Control) FinalQuote
12
>>> index (OtherLetter,Control) Format
*** Exception: Error in array index

Constructors

UppercaseLetter	Lu: Letter, Uppercase
LowercaseLetter	Ll: Letter, Lowercase
TitlecaseLetter	Lt: Letter, Titlecase
ModifierLetter	Lm: Letter, Modifier
OtherLetter	Lo: Letter, Other
NonSpacingMark	Mn: Mark, Non-Spacing
SpacingCombiningMark	Mc: Mark, Spacing Combining
EnclosingMark	Me: Mark, Enclosing
DecimalNumber	Nd: Number, Decimal
LetterNumber	Nl: Number, Letter
OtherNumber	No: Number, Other
ConnectorPunctuation	Pc: Punctuation, Connector
DashPunctuation	Pd: Punctuation, Dash
OpenPunctuation	Ps: Punctuation, Open
ClosePunctuation	Pe: Punctuation, Close
InitialQuote	Pi: Punctuation, Initial quote
FinalQuote	Pf: Punctuation, Final quote
OtherPunctuation	Po: Punctuation, Other
MathSymbol	Sm: Symbol, Math
CurrencySymbol	Sc: Symbol, Currency
ModifierSymbol	Sk: Symbol, Modifier
OtherSymbol	So: Symbol, Other
Space	Zs: Separator, Space
LineSeparator	Zl: Separator, Line
ParagraphSeparator	Zp: Separator, Paragraph
Control	Cc: Other, Control
Format	Cf: Other, Format
Surrogate	Cs: Other, Surrogate
PrivateUse	Co: Other, Private Use
NotAssigned	Cn: Other, Not Assigned

Instances

Instances details

Bounded GeneralCategory #	Since: base-2.1
Instance details Defined in GHC.Unicode Methods minBound :: GeneralCategory # maxBound :: GeneralCategory #
Enum GeneralCategory #	Since: base-2.1
Instance details Defined in GHC.Unicode Methods succ :: GeneralCategory -> GeneralCategory # pred :: GeneralCategory -> GeneralCategory # toEnum :: Int -> GeneralCategory # fromEnum :: GeneralCategory -> Int # enumFrom :: GeneralCategory -> [GeneralCategory] # enumFromThen :: GeneralCategory -> GeneralCategory -> [GeneralCategory] # enumFromTo :: GeneralCategory -> GeneralCategory -> [GeneralCategory] # enumFromThenTo :: GeneralCategory -> GeneralCategory -> GeneralCategory -> [GeneralCategory] #
Eq GeneralCategory #	Since: base-2.1
Instance details Defined in GHC.Unicode Methods (==) :: GeneralCategory -> GeneralCategory -> Bool Source # (/=) :: GeneralCategory -> GeneralCategory -> Bool Source #
Ord GeneralCategory #	Since: base-2.1
Instance details Defined in GHC.Unicode Methods compare :: GeneralCategory -> GeneralCategory -> Ordering Source # (<) :: GeneralCategory -> GeneralCategory -> Bool Source # (<=) :: GeneralCategory -> GeneralCategory -> Bool Source # (>) :: GeneralCategory -> GeneralCategory -> Bool Source # (>=) :: GeneralCategory -> GeneralCategory -> Bool Source # max :: GeneralCategory -> GeneralCategory -> GeneralCategory Source # min :: GeneralCategory -> GeneralCategory -> GeneralCategory Source #
Read GeneralCategory #	Since: base-2.1
Instance details Defined in GHC.Read Methods readsPrec :: Int -> ReadS GeneralCategory # readList :: ReadS [GeneralCategory] # readPrec :: ReadPrec GeneralCategory # readListPrec :: ReadPrec [GeneralCategory] #
Show GeneralCategory #	Since: base-2.1
Instance details Defined in GHC.Unicode Methods showsPrec :: Int -> GeneralCategory -> ShowS # show :: GeneralCategory -> String # showList :: [GeneralCategory] -> ShowS #
Ix GeneralCategory #	Since: base-2.1
Instance details Defined in GHC.Unicode Methods range :: (GeneralCategory, GeneralCategory) -> [GeneralCategory] # index :: (GeneralCategory, GeneralCategory) -> GeneralCategory -> Int # unsafeIndex :: (GeneralCategory, GeneralCategory) -> GeneralCategory -> Int # inRange :: (GeneralCategory, GeneralCategory) -> GeneralCategory -> Bool # rangeSize :: (GeneralCategory, GeneralCategory) -> Int # unsafeRangeSize :: (GeneralCategory, GeneralCategory) -> Int #

generalCategory :: Char -> GeneralCategory #

The Unicode general category of the character. This relies on the Enum instance of GeneralCategory, which must remain in the same order as the categories are presented in the Unicode standard.

Examples

Expand

Basic usage:

>>> generalCategory 'a'
LowercaseLetter
>>> generalCategory 'A'
UppercaseLetter
>>> generalCategory '0'
DecimalNumber
>>> generalCategory '%'
OtherPunctuation
>>> generalCategory '♥'
OtherSymbol
>>> generalCategory '\31'
Control
>>> generalCategory ' '
Space

isAscii :: Char -> Bool #

Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

isLatin1 :: Char -> Bool #

Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.

isControl :: Char -> Bool #

Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.

isAsciiUpper :: Char -> Bool #

Selects ASCII upper-case letters, i.e. characters satisfying both isAscii and isUpper.

isAsciiLower :: Char -> Bool #

Selects ASCII lower-case letters, i.e. characters satisfying both isAscii and isLower.

isPrint :: Char -> Bool #

Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).

isSpace :: Char -> Bool #

Returns True for any Unicode space character, and the control characters \t, \n, \r, \f, \v.

isUpper :: Char -> Bool #

Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.

isLower :: Char -> Bool #

Selects lower-case alphabetic Unicode characters (letters).

isAlpha :: Char -> Bool #

Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifiers letters). This function is equivalent to isLetter.

isDigit :: Char -> Bool #

Selects ASCII digits, i.e. '0'..'9'.

isOctDigit :: Char -> Bool #

Selects ASCII octal digits, i.e. '0'..'7'.

isHexDigit :: Char -> Bool #

Selects ASCII hexadecimal digits, i.e. '0'..'9', 'a'..'f', 'A'..'F'.

isAlphaNum :: Char -> Bool #

Selects alphabetic or numeric Unicode characters.

Note that numeric digits outside the ASCII range, as well as numeric characters which aren't digits, are selected by this function but not by isDigit. Such characters may be part of identifiers but are not used by the printer and reader to represent numbers.

isPunctuation :: Char -> Bool #

Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:

These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Punctuation".

Examples

Expand

Basic usage:

>>> isPunctuation 'a'
False
>>> isPunctuation '7'
False
>>> isPunctuation '♥'
False
>>> isPunctuation '"'
True
>>> isPunctuation '?'
True
>>> isPunctuation '—'
True

isSymbol :: Char -> Bool #

Selects Unicode symbol characters, including mathematical and currency symbols.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:

These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Symbol".

Examples

Expand

Basic usage:

>>> isSymbol 'a'
False
>>> isSymbol '6'
False
>>> isSymbol '='
True

The definition of "math symbol" may be a little counter-intuitive depending on one's background:

>>> isSymbol '+'
True
>>> isSymbol '-'
False

toUpper :: Char -> Char #

Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.

toLower :: Char -> Char #

Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.

toTitle :: Char -> Char #

Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.

wgencat :: Int -> Int #