AccessEngine :: AEDevice :: AEOutput :: Word :: Word :: Class Word

Class Word

object --+
         |
        Word

Represents a word in a body of text. Each Word has a main part and a trailing part. The main part is processed according to other flags in the current WordState to improve its presentation to the user via speech or another output device, while the trailing part remains unprocessed. The value of WordDef determines which characters lie in the main and trailing parts of each word. The available values of WordDef are defined as constants in AEConstants.

Characters in the ignore list are considered blank. An AEPor can be associated with a Word to indicate its context in a larger body of text.

Callables may be specified as observers for the characters processed into the main and trail parts of each Word. An observer must take four parameters: this Word instance, the WordState in use, the current character, and the list of all characters in the main or trail part of the word. The observer should return the character to be added. The list may be modified in place to affect the final contents of the word.
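A minimal sketch of an observer following the four-parameter contract described above. The function name and its rule (speaking underscores as spaces, collapsing a run into one space) are hypothetical; only the parameter order, the return-the-character contract, and in-place modification of the list come from this page. Because the observer ignores the Word and WordState arguments, the loop below passes `None` for both instead of real AccessEngine objects.

```python
def underscore_observer(word, state, ch, chars):
    """Hypothetical observer: speak underscores as spaces.

    word  -- the Word instance doing the processing (unused here)
    state -- the WordState in use (unused here)
    ch    -- the character currently being processed
    chars -- characters already in the main or trail part; may be
             modified in place
    """
    if ch == '_':
        # Drop a space added by a previous underscore so that a run of
        # underscores yields only one space.
        if chars and chars[-1] == ' ':
            chars.pop()
        return ' '
    return ch


# Simulate a Word feeding characters through the observer.
part = []
for c in 'snake__case':
    part.append(underscore_observer(None, None, c, part))
print(''.join(part))  # snake case
```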

Instance Methods
__init__(self, state, por, main_ob=None, trail_ob=None)
Stores the WordState and initializes all instance variables.
__eq__(self, other)
Compares this Word to the one provided based on their AEPors and content.
string __unicode__(self)
Gets this Word as a unicode string.
string __str__(self)
Gets this Word as a non-unicode string.
_isMainChar(self, ch)
Determines whether the given character belongs to the main part of this word, based on the word definition given by WordState.
replaceMain(self, text)
Replaces the main part of the word with the given string.
replaceTrail(self, text)
Replaces the trailing part of the word with the given string.
AEPor getPOR(self)
Gets the AEPor associated with the start of this Word.
boolean isBlank(self, ch)
Determines if the given character is blank or ignored.
boolean isAlpha(self, ch)
Determines if the given character is a letter in the current locale.
boolean isNumeric(self, ch)
Determines if the given character is a number in the current locale.
boolean isPunctuation(self, ch)
Determines if the given character is a punctuation mark.
boolean isSymbol(self, ch)
Determines if the given character is a symbol.
boolean isVowel(self, ch)
Determines if the given character is a vowel.
boolean isCap(self, ch)
Determines if the given character is an upper case letter.
string getCharValue(self, ch)
Gets the Unicode hex value of a character, without the 0x prefix.
string getCharName(self, ch)
Gets the Unicode name of the character, as listed in the Unicode character index at http://unicode.org/charts/charindex.html.
string getCharDescription(self, ch)
Gets a localized description of the given character.
string getSource(self)
Gets the unprocessed text of this word as it was seen in the original text source.
integer getSourceLength(self)
Gets the length of the unprocessed source text of this Word.
integer getMainLength(self)
Gets the length of the processed main part of this Word.
boolean moreAvailable(self)
Makes a guess as to whether there are more Words in the body of text from which this Word originated.
boolean hasRepeat(self)
Gets whether this Word has a character repeated more than the maximum number of repetitions allowed.
boolean hasCap(self)
Gets whether this Word contains an uppercase letter.
boolean hasVowel(self)
Gets whether this Word contains a vowel.
boolean isAllCaps(self)
Gets whether this Word is all capitals.
boolean isAllNumeric(self)
Gets whether this Word is all numbers.
boolean isAllBlank(self)
Gets whether this Word is all blanks.
string or None append(self, chunk)
Parses the given chunk of text for characters that should be added to the main_part or trail_part of this Word.
string _processMain(self, ch)
Adds the given character to the source_word.
string _processTrail(self, ch)
Adds the given character to the source_word.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__

Instance Variables
integer curr_repeat
Indicates a character should be considered a repeat iff this value > MaxRepeat.
boolean has_main
Has at least one main character been parsed?
string last_char
Last character appended to this Word
boolean main_done
Is the main_part complete?
callable main_ob
Function to invoke for each character in the main part of a word
list main_part
Part of this Word that will receive extra preparation for output
boolean more
Are there likely more Words after this one in the text source where this Word originated?
AEPor por
Point of regard indicating where this Word originated
list source_word
Original text of this Word without any preparation for output applied
WordState state
Settings that determine the definition of a Word and how it is prepared for output
boolean trail_done
Is the trail_part complete?
callable trail_ob
Function to invoke for each character in the trailing part of a word
list trail_part
Part of the word that will receive little preparation for output

Properties

Inherited from object: __class__

Method Details

__init__(self, state, por, main_ob=None, trail_ob=None)
(Constructor)

Stores the WordState and initializes all instance variables.
Parameters:
  • state (WordState) - State that defines Words and how they are processed
  • por (AEPor) - Point of regard indicating where this Word originated
  • main_ob (callable) - Function to invoke for each character in the main part of a word
  • trail_ob (callable) - Function to invoke for each character in the trailing part of a word
Overrides: object.__init__

__eq__(self, other)
(Equality operator)

Compares this Word to the one provided based on their AEPors and content. If their source_words and AEPors are the same, they are considered equal.
Parameters:
  • other (Word) - Word to compare

__unicode__(self)

Gets this Word as a unicode string.
Returns: string
Main part of the word joined with the trailing part

__str__(self)
(Informal representation operator)

Gets this Word as a non-unicode string.
Returns: string
Main part of the word joined with the trailing part
Overrides: object.__str__

_isMainChar(self, ch)

Determines whether the given character belongs to the main part of this word, based on the word definition given by WordState.
Parameters:
  • ch (string) - Character to test

replaceMain(self, text)

Replaces the main part of the word with the given string.
Parameters:
  • text (string) - Text to use as the main part of the word

replaceTrail(self, text)

Replaces the trailing part of the word with the given string.
Parameters:
  • text (string) - Text to use as the trailing part of the word

getPOR(self)

Gets the AEPor associated with the start of this Word.
Returns: AEPor
Point of regard pointing to the start of this word

isBlank(self, ch)

Determines if the given character is blank or ignored.
Parameters:
  • ch (string) - Character to test
Returns: boolean
Is the character a blank?

isAlpha(self, ch)

Determines if the given character is a letter in the current locale.
Parameters:
  • ch (string) - Character to test
Returns: boolean
Is the character a letter?

isNumeric(self, ch)

Determines if the given character is a number in the current locale.
Parameters:
  • ch (string) - Character to test
Returns: boolean
Is the character a number?

isPunctuation(self, ch)

Determines if the given character is a punctuation mark.
Parameters:
  • ch (string) - Character to test
Returns: boolean
Is the character a punctuation mark?

isSymbol(self, ch)

Determines if the given character is a symbol.
Parameters:
  • ch (string) - Character to test
Returns: boolean
Is the character a symbol?

isVowel(self, ch)

Determines if the given character is a vowel. Relies on a translator to list all vowels in the current locale.
Parameters:
  • ch (string) - Character to test
Returns: boolean
Is the character a Latin vowel?

isCap(self, ch)

Determines if the given character is an upper case letter.
Parameters:
  • ch (string) - Character to test
Returns: boolean
Is the character capitalized?
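The character predicates above can be approximated with Python's standard `unicodedata` module. This is a sketch under stated assumptions, not the class's actual code: the mapping of Unicode general categories (L* letters, N* numbers, P* punctuation, S* symbols, Lu uppercase) onto these methods is an assumption, `IGNORE` is a hypothetical stand-in for the WordState ignore list, and `VOWELS` is a hypothetical English-only stand-in for the translator-supplied vowel list.

```python
import unicodedata

# Hypothetical stand-ins for WordState and translator data.
IGNORE = set('\x00')   # characters treated as blank (assumption)
VOWELS = 'aeiouAEIOU'  # English-only vowel list (assumption)


def is_blank(ch, ignore=IGNORE):
    # Whitespace, or anything on the ignore list, counts as blank.
    return ch.isspace() or ch in ignore


def is_alpha(ch):
    return unicodedata.category(ch).startswith('L')


def is_numeric(ch):
    return unicodedata.category(ch).startswith('N')


def is_punctuation(ch):
    return unicodedata.category(ch).startswith('P')


def is_symbol(ch):
    return unicodedata.category(ch).startswith('S')


def is_vowel(ch, vowels=VOWELS):
    return ch in vowels


def is_cap(ch):
    # 'Lu' is the Unicode general category for uppercase letters.
    return unicodedata.category(ch) == 'Lu'
```

For example, `is_symbol('$')` is true because `$` falls in category Sc (currency symbol), and `is_alpha('é')` is true because `é` is in category Ll.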

getCharValue(self, ch)

Gets the Unicode hex value of a character, without the 0x prefix.
Parameters:
  • ch (string) - Single character
Returns: string
Hex value of the character
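A minimal sketch of the conversion, assuming lowercase hex with no zero padding; whether the real method pads or uppercases its result is not stated on this page.

```python
def get_char_value(ch):
    """Hex code point of ch without the '0x' prefix (sketch)."""
    # format(n, 'x') renders an int as lowercase hex with no prefix,
    # unlike hex(n), which returns e.g. '0x41'.
    return format(ord(ch), 'x')


print(get_char_value('A'))  # 41
print(get_char_value('é'))  # e9
```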

getCharName(self, ch)

Gets the Unicode name of the character, as listed in the Unicode character index at http://unicode.org/charts/charindex.html. If the character cannot be determined from the given string, returns an empty string. Note that these names are not localized.
Parameters:
  • ch (string) - Single character
Returns: string
Name of the character
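A sketch of the documented fallback behavior using the standard `unicodedata.name`, assuming the formal Unicode character names are what this method returns. `unicodedata.name` raises `TypeError` for anything other than a single character and, when given a default, returns it for code points without a name (most control characters).

```python
import unicodedata


def get_char_name(ch):
    """Unicode name of ch, or '' if it cannot be determined (sketch)."""
    try:
        # The default '' is returned when the code point has no name.
        return unicodedata.name(ch, '')
    except TypeError:
        # unicodedata.name() requires exactly one character.
        return ''


print(get_char_name('a'))   # LATIN SMALL LETTER A
print(get_char_name('ab'))  # prints an empty line
```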

getCharDescription(self, ch)

Gets a localized description of the given character. The most detailed description for a character is returned so that, for instance, 'e' is described as a vowel and not just a letter.
Parameters:
  • ch (string) - Character to describe
Returns: string
Localized description of the character according to the processing done by this Word class and based on the current state
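The "most detailed description first" rule amounts to checking the narrowest category before the broader ones. The sketch below shows that ordering only; the English labels and the `VOWELS` list are hypothetical placeholders, whereas the real method returns localized strings and uses translator-supplied vowels.

```python
import unicodedata

VOWELS = 'aeiouAEIOU'  # hypothetical English-only vowel list


def get_char_description(ch):
    """Most specific description of a single character (sketch).

    The English labels are placeholders for localized strings.
    """
    if ch.isspace():
        return 'blank'
    if ch in VOWELS:
        return 'vowel'  # checked before 'letter' so 'e' is a vowel
    category = unicodedata.category(ch)
    if category.startswith('L'):
        return 'letter'
    if category.startswith('N'):
        return 'number'
    if category.startswith('P'):
        return 'punctuation'
    if category.startswith('S'):
        return 'symbol'
    return ''


print(get_char_description('e'))  # vowel
print(get_char_description('t'))  # letter
```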

getSource(self)

Gets the unprocessed text of this word as it was seen in the original text source.
Returns: string
Parsed word without any processing applied

getSourceLength(self)

Gets the length of the unprocessed source text of this Word.
Returns: integer
Length of the source_word

getMainLength(self)

Gets the length of the processed main part of this Word.
Returns: integer
Length of the main_part

moreAvailable(self)

Makes a guess as to whether or not there are more Words in the body of text from which this word originated. This guess is based on whether or not the last chunk passed to append was processed in full.
Returns: boolean
Are there likely more Words in the original body of text

hasRepeat(self)

Gets whether this Word has a character repeated more than the maximum number of repetitions allowed.
Returns: boolean
Does this Word contain a repeated character?
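One reading of "repeated" is a run of identical consecutive characters, tracked with the last_char and curr_repeat variables listed for this class. The sketch below assumes that reading and stops as soon as the run exceeds the limit; `max_repeat` is a hypothetical stand-in for the WordState MaxRepeat setting, and the real class may count differently (curr_repeat is documented as approximate).

```python
def has_repeat(text, max_repeat):
    """True if a character occurs more than max_repeat times in a row."""
    last_char, curr_repeat = None, 0
    for ch in text:
        # Extend the current run, or start a new one on a different char.
        curr_repeat = curr_repeat + 1 if ch == last_char else 1
        last_char = ch
        if curr_repeat > max_repeat:
            return True  # stop early once the threshold is crossed
    return False


print(has_repeat('hello', 2))  # False: 'll' is a run of 2, not more
print(has_repeat('----', 3))   # True
```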

hasCap(self)

Gets whether this Word contains an uppercase letter.
Returns: boolean
Does this Word contain a capital letter?

hasVowel(self)

Gets whether this Word contains a vowel.
Returns: boolean
Does this Word contain a vowel?

isAllCaps(self)

Gets whether this Word is all capitals.
Returns: boolean
Is this Word all capital letters?

isAllNumeric(self)

Gets whether this Word is all numbers.
Returns: boolean
Is this Word all numbers?

isAllBlank(self)

Gets whether this Word is all blanks.
Returns: boolean
Is this Word all blanks?
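The whole-word checks above reduce to `any()` and `all()` over the characters. A minimal sketch, assuming the checks run over a plain string; which part of the Word (source, main, or trail) the real methods inspect is not stated here, and the treatment of digits in `is_all_caps` and of the empty string in `is_all_blank` are assumptions.

```python
VOWELS = 'aeiouAEIOU'  # hypothetical English-only vowel list


def has_cap(text):
    return any(ch.isupper() for ch in text)


def has_vowel(text):
    return any(ch in VOWELS for ch in text)


def is_all_caps(text):
    # Assumption: only letters must be uppercase, so 'R2D2' qualifies.
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(ch.isupper() for ch in letters)


def is_all_numeric(text):
    return bool(text) and all(ch.isdigit() for ch in text)


def is_all_blank(text):
    # An empty string counts as all blank here (assumption).
    return all(ch.isspace() for ch in text)
```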

append(self, chunk)

Parses the given chunk of text for characters that should be added to the main_part or trail_part of this Word. While neither main_done nor trail_done is set, every character accepted by _isMainChar is added to the main part of this word, up to the first non-main character; when that first non-main character is encountered, main_done is set. While main_done is set and trail_done is unset, all non-main characters are added to the trail part of this word. When another main character is encountered after main_done is set, trail_done is set and the remainder of the given chunk is returned unprocessed so it can be added to another Word. Once trail_done is set, no further text can be appended to this Word.
Parameters:
  • chunk (string) - Chunk of text to parse for words
Returns: string or None
Unprocessed portion of the chunk or None if fully processed
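The main/trail state machine described for append can be sketched with a stripped-down, hypothetical class. It assumes letters and digits are "main" characters (the real class consults WordDef through WordState) and omits source_word, observers, and the _processMain/_processTrail steps. The `more` flag follows the moreAvailable description: it is set when a chunk is not consumed in full.

```python
class SketchWord:
    """Hypothetical, stripped-down Word that models only append()."""

    def __init__(self):
        self.main_part = []
        self.trail_part = []
        self.main_done = False
        self.trail_done = False
        self.more = False  # guess: more Words follow in the source text

    def _is_main_char(self, ch):
        # Assumption: letters and digits form the main part.
        return ch.isalnum()

    def append(self, chunk):
        """Consume chunk; return the unprocessed remainder or None."""
        if self.trail_done:
            # A completed Word accepts no further text.
            return chunk
        for i, ch in enumerate(chunk):
            if self._is_main_char(ch):
                if self.main_done:
                    # A main character after the trail begins the next Word.
                    self.trail_done = True
                    self.more = True
                    return chunk[i:]
                self.main_part.append(ch)
            else:
                # The first non-main character ends the main part.
                self.main_done = True
                self.trail_part.append(ch)
        # The whole chunk fit in this Word, so guess there are no more.
        self.more = False
        return None


w = SketchWord()
rest = w.append('hello, world')
print(''.join(w.main_part), repr(''.join(w.trail_part)), rest)
# hello ', ' world
```

Because the state lives on the instance, a word split across chunks (for example `append('hel')` followed by `append('lo there')`) still ends up with `hello` as its main part.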

_processMain(self, ch)

Adds the given character to the source_word. If Caps is unset, makes the character lowercase. If CapExpand is set and the character is a capital letter, or NumExpand is set and the character is a number, inserts a space in main_part. Finally, inserts the possibly lowercased character in main_part.
Parameters:
  • ch (string) - Character to process
Returns: string
Character inserted in main_part
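A sketch of the steps listed for _processMain as a free function. `caps`, `cap_expand` and `num_expand` are hypothetical stand-ins for the WordState settings Caps, CapExpand and NumExpand, and the observer call is omitted. The capital test is done on the original character, since testing after lowercasing would never find a capital; the resulting main_part is the same as in the order described above.

```python
def process_main(ch, main_part, source_word,
                 caps=False, cap_expand=False, num_expand=False):
    """Sketch of the documented _processMain steps."""
    source_word.append(ch)  # the source keeps the character untouched
    # Test the original character, since lowercasing would hide capitals.
    if (cap_expand and ch.isupper()) or (num_expand and ch.isdigit()):
        main_part.append(' ')
    if not caps:
        ch = ch.lower()
    main_part.append(ch)
    return ch


main, source = [], []
for c in 'camelCase':
    process_main(c, main, source, cap_expand=True)
print(''.join(main))    # camel case
print(''.join(source))  # camelCase
```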

_processTrail(self, ch)

Adds the given character to the source_word. If the character is a blank, inserts a space in trail_part, else inserts the character.
Parameters:
  • ch (string) - Character to process
Returns: string
Character inserted in trail_part
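A minimal sketch of _processTrail as a free function, assuming "blank" means whitespace plus any character on the ignore list; `ignore` is a hypothetical stand-in for the WordState ignore list, and the observer call is omitted.

```python
def process_trail(ch, trail_part, source_word, ignore=''):
    """Sketch of _processTrail: each blank character becomes a space."""
    source_word.append(ch)  # keep the original character in the source
    out = ' ' if (ch.isspace() or ch in ignore) else ch
    trail_part.append(out)
    return out


trail, source = [], []
for c in ',\t':
    process_trail(c, trail, source)
print(repr(''.join(trail)))  # ', '
```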

Instance Variable Details

curr_repeat

Indicates a character should be considered a repeat iff this value > MaxRepeat. It is not the exact number of repetitions of a character, because it is optimized for speed rather than accuracy.
Type:
integer