GetTextNormalize

Description

The GetTextNormalize method returns the text data contained in the current PBDOM_CHARACTERDATA object, with all surrounding whitespace characters removed and each internal run of whitespace characters normalized to a single space.

Syntax

pbdom_chardata_name.GetTextNormalize()

Argument               Description
pbdom_chardata_name    The name of a PBDOM_CHARACTERDATA object


Return value

String.

The following table lists the return values, based on the type of DOM object contained within PBDOM_CHARACTERDATA.

DOM Object

Return Value

PBDOM_TEXT

Suppose you have the following element:

<abc> MY TEXT </abc>

If there is a PBDOM_TEXT object to represent the TEXT NODE "MY TEXT", then calling GetTextNormalize on the PBDOM_TEXT returns the string MY TEXT.

PBDOM_CDATA

Suppose there is the following CDATA:

<![CDATA[   They're saying "x < y" & that "z 
> y" so I guess    that means that z > x ]]>

If there is a PBDOM_CDATA to represent the above CDATA section, then calling GetTextNormalize on it returns the string:

They're saying "x < y" & that "z > y" so I guess that means that z > x

Note that the leading spaces before "They're" and the trailing space after the last "x" are removed. Additionally, the spaces between the words "guess" and "that" are reduced to a single space, and the line break inside "z > y" is likewise replaced by a single space.

PBDOM_COMMENT

Suppose there is the following comment:

<!--This is a comment -->

Calling GetTextNormalize on this comment returns:

This is a comment
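
The whitespace handling described in the table above can be sketched as follows. This is a minimal illustration, not part of the original reference: it assumes that the PBDOM extension is available to the application and that a standalone PBDOM_TEXT can be given text with SetText before it is added to a document. The variable name pbdom_txt and the sample string are hypothetical.

```powerscript
// Sketch only: pbdom_txt and the sample text are illustrative assumptions.
PBDOM_TEXT pbdom_txt

TRY
 pbdom_txt = Create PBDOM_TEXT
 // Leading and trailing spaces, plus spaces and a tab (~t) between the words
 pbdom_txt.SetText("   MY  ~t TEXT  ")

 // Raw text, whitespace untouched
 MessageBox("GetText", "[" + pbdom_txt.GetText() + "]")
 // Surrounding whitespace removed, internal whitespace kept
 MessageBox("GetTextTrim", "[" + pbdom_txt.GetTextTrim() + "]")
 // Surrounding whitespace removed, internal run reduced to one space
 MessageBox("GetTextNormalize", "[" + pbdom_txt.GetTextNormalize() + "]")

CATCH (PBDOM_Exception except)
 MessageBox("Exception Occurred", except.Text)
END TRY
```

Based on the behavior described above, the last message box should display [MY TEXT]. The brackets make any remaining leading or trailing whitespace visible.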

Throws

EXCEPTION_PBDOM_OBJECT_INVALID_FOR_USE -- If this PBDOM_CHARACTERDATA is not a reference to an object derived from PBDOM_CHARACTERDATA.
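
As a hedged illustration of this condition, the sketch below assumes that the exception is raised when the method is called on an object instantiated directly from the PBDOM_CHARACTERDATA base class rather than from PBDOM_TEXT, PBDOM_COMMENT, or PBDOM_CDATA. The variable names pbdom_chardata_base and ls_text are hypothetical.

```powerscript
// Sketch only: assumes a directly instantiated base-class object
// is what triggers EXCEPTION_PBDOM_OBJECT_INVALID_FOR_USE.
PBDOM_CHARACTERDATA pbdom_chardata_base
string ls_text

TRY
 // The base class itself, not one of its derived classes
 pbdom_chardata_base = Create PBDOM_CHARACTERDATA
 // Expected to throw under the assumption stated above
 ls_text = pbdom_chardata_base.GetTextNormalize()

CATCH (PBDOM_Exception except)
 MessageBox("Exception Occurred", except.Text)
END TRY
```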

Examples

This example demonstrates:

  1. Using an external general parsed entity.

  2. Using a single line statement to obtain the children PBDOM_OBJECTs of an element.

  3. Obtaining the text of the three separate types of PBDOM_CHARACTERDATA objects: PBDOM_TEXT, PBDOM_COMMENT, and PBDOM_CDATA.

  4. Obtaining the normalized text of the same three separate types of PBDOM_CHARACTERDATA objects.

  5. The difference between the two types of text retrieved in 3 and 4.

Suppose the file C:\entity_text.txt contains the following string:

&#9;&#32;Some&#32;External&#32;&#32;&#9;&#32;Text&#32;&#9;

The example creates a PBDOM_DOCUMENT pbdom_doc based on the following DOM tree, which is in the file C:\inputfile.txt:

<!DOCTYPE abc [<!ENTITY text1 SYSTEM "c:\entity_text.txt" >]>
<abc>
   <data>
       &text1;
       <!-- &text1;-->
       <![CDATA[&text1;]]>
   </data>
</abc>

The Document Type Declaration defines an external general parsed entity text1.

The example obtains the root element, uses it to obtain the data child element, and then obtains an array of the child element's own children. PBDOM collects all the PBDOM_OBJECTs that are the children of data and stores them in the PBDOM_OBJECT array pbdom_obj_array.

Next, the FOR loop iterates through all the items in pbdom_obj_array and stores each item in the PBDOM_CHARACTERDATA array pbdom_chardata. This step is not required -- pbdom_obj_array can be used directly to manipulate the data element's children. It is done to demonstrate that you can cast each item to a PBDOM_CHARACTERDATA object by assigning it to an element of a PBDOM_CHARACTERDATA array. This is possible if and only if each PBDOM_OBJECT is also derived from PBDOM_CHARACTERDATA. If a PBDOM_OBJECT is not derived from PBDOM_CHARACTERDATA, the PowerBuilder VM throws an exception.

The next FOR loop iterates through all the items of the pbdom_chardata array and calls the GetText and GetTextNormalize methods on each. Each of the returned strings from GetText and GetTextNormalize is delimited by "[" and "]" characters so that the complete text content displays clearly in the message boxes.

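The first child of data is the PBDOM_TEXT &text1;, which has been declared as an external general parsed entity whose content is the content of the file c:\entity_text.txt. The &text1; entity reference and the character references it contains (&#9; for a tab and &#32; for a space) are expanded by the parser. GetText returns the expanded text with all of its whitespace intact. GetTextNormalize removes the leading and trailing whitespace and reduces each internal run of tabs and spaces to a single space, leaving Some External Text.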

The second child of data is the PBDOM_COMMENT <!-- &text1;--> and the third child is the PBDOM_CDATA <![CDATA[&text1;]]>. Entity references within comments and CDATA sections are never expanded, so GetTextNormalize returns &text1; for both. GetText returns the same text, except that the comment's leading space is preserved.

PBDOM_Builder        pbdombuilder_new
pbdom_document       pbdom_doc
PBDOM_CHARACTERDATA  pbdom_chardata[]
PBDOM_OBJECT         pbdom_obj_array[]
long                 l = 0

TRY
 // Parse the input file into a PBDOM_DOCUMENT
 pbdombuilder_new = Create PBDOM_Builder
 pbdom_doc = pbdombuilder_new.BuildFromFile &
    ("C:\inputfile.txt")

 // Collect the children of the data element in a single statement
 pbdom_doc.GetRootElement(). &
    GetChildElement("data"). &
    GetContent(pbdom_obj_array)

 // Cast each child to PBDOM_CHARACTERDATA by assignment
 for l = 1 to UpperBound(pbdom_obj_array)
    pbdom_chardata[l] = pbdom_obj_array[l]
 next 

 // Display the raw and the normalized text of each child,
 // delimited by brackets so that whitespace is visible
 for l = 1 to UpperBound(pbdom_chardata)
    MessageBox(pbdom_chardata[l]. &
      GetObjectClassString() + " GetText()", &
      "[" + pbdom_chardata[l].GetText() + "]")
    MessageBox (pbdom_chardata[l]. &
      GetObjectClassString() + " GetTextNormalize()", &
      "[" + pbdom_chardata[l].GetTextNormalize() + "]")
 next 

 Destroy pbdombuilder_new

CATCH (PBDOM_Exception except)
 MessageBox ("Exception Occurred", except.Text)
END TRY

Usage

If the current PBDOM_CHARACTERDATA object has no text value, or if its text consists only of whitespace characters, GetTextNormalize returns an empty string.
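
The whitespace-only case can be checked with a short sketch such as the one below. It is an illustration under assumptions rather than documented sample code: it assumes a standalone PBDOM_TEXT accepts SetText, and the names pbdom_blank and ls_norm are hypothetical.

```powerscript
// Sketch only: pbdom_blank and ls_norm are illustrative names.
PBDOM_TEXT pbdom_blank
string ls_norm

TRY
 pbdom_blank = Create PBDOM_TEXT
 // Text that consists only of spaces and a tab (~t)
 pbdom_blank.SetText("  ~t  ")

 ls_norm = pbdom_blank.GetTextNormalize()
 // A length of zero indicates that an empty string was returned
 IF Len(ls_norm) = 0 THEN
    MessageBox("GetTextNormalize", "Empty string returned")
 ELSE
    MessageBox("GetTextNormalize", "[" + ls_norm + "]")
 END IF

CATCH (PBDOM_Exception except)
 MessageBox("Exception Occurred", except.Text)
END TRY
```

Testing the length rather than displaying the string directly avoids confusing an empty result with one that contains only whitespace.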

See also

GetText

GetTextTrim

SetText