
Simplified String Parsing


Namespace: VFP
String Parsing Made Easy. VFP 6.0, 5.0, 3.0
Requirements: Visual FoxPro and VBScript 5.0 or later, which provides the VBScript.RegExp object
See Also: Reg Exp
Key words: Strings, AT(), SUBSTR(), COM, VBScript, Regular Expressions, VBScript.RegExp

Ever have a very complex string that you needed to parse? Have you strung AT() and SUBSTR() together until the code was virtually unreadable? I found a COM object that makes that kind of string manipulation much easier. I discovered this little gem while reading a VB magazine, and it is just as easy to use from FoxPro. It's called a “Regular Expression” object, and you create one with CreateObject(“VBScript.RegExp”). Here is how you use it.

** Create the Regular expression
oReg = createobject("VBScript.RegExp")

** Give it a pattern
oReg.Pattern = "fo+"

** Set its properties as desired
oReg.IgnoreCase = .t.    && .T. = case-insensitive matching. Defaults to .F.
oReg.Global = .t.   && .T. = find all matches, .F. = first match only. Defaults to .F.

** Supply the string
txtString = [f foo food fool {a1b2c3} booooooooo]

** And execute.
oMtchColl = oReg.Execute(txtString)


Execute returns a collection of Match objects. You can loop through the matches and read each one's position (FirstIndex), text (Value) and length (Length), like this:
For each Match in oMtchColl
	? match.FirstIndex, match.Value, match.Length
Endfor

Keep in mind that FirstIndex is zero-based, like a C array index, so the first character of the string is at position 0. VFP's own AT() and SUBSTR() count from 1.
The example above prints:
2 foo 3
6 foo 3
11 foo 3
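Because FirstIndex is zero-based while SUBSTR() is one-based, add 1 whenever you hand a match position back to VFP's native string functions. A minimal sketch, reusing the pattern and sample string from above; oMatches, oMatch and lcValue are just illustrative variable names:

```foxpro
** Same regular expression and string as the first example
oReg = CREATEOBJECT("VBScript.RegExp")
oReg.Pattern = "fo+"
oReg.IgnoreCase = .T.
oReg.Global = .T.
txtString = [f foo food fool {a1b2c3} booooooooo]

oMatches = oReg.Execute(txtString)
FOR EACH oMatch IN oMatches
	** SUBSTR() counts from 1, so shift the zero-based FirstIndex by one
	lcValue = SUBSTR(txtString, oMatch.FirstIndex + 1, oMatch.Length)
	** Prints the one-based position, the text, and .T. (SUBSTR() agrees with Value)
	? oMatch.FirstIndex + 1, lcValue, lcValue == oMatch.Value
ENDFOR
```

The positions printed are 3, 7 and 12, the same numbers AT() would give you.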

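It also has pattern replacement. Replace() returns a copy of the string with every match replaced by the new text (only the first match when Global is .F.):

? oReg.Replace(txtString, "bar")   && f bar bard barl {a1b2c3} booooooooo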

It's that easy.
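Replace() can also reuse text captured by parenthesized groups (see Alternation & Grouping below): $1, $2 and so on in the replacement string stand for the first, second, ... group. A sketch that reorders ISO-style dates; the date format and sample text are only an illustration:

```foxpro
oReg = CREATEOBJECT("VBScript.RegExp")
oReg.Global = .T.
** Three groups capture year, month and day
oReg.Pattern = "(\d{4})-(\d{2})-(\d{2})"
lcText = "Shipped 2005-04-27, billed 2005-05-02"
** $2/$3/$1 puts month, day and year back together in a new order
lcResult = oReg.Replace(lcText, "$2/$3/$1")
? lcResult   && Shipped 04/27/2005, billed 05/02/2005
```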

I use regular expressions to parse XML strings when I don't want to go through the trouble of using the XMLDOM. For small XML files I find it easier.
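A sketch of that approach, assuming a simple, non-nested element with no attributes; the <name> tag and the sample XML are hypothetical. The parentheses capture the text between the tags, and each Match exposes it through its SubMatches collection (SubMatches requires VBScript 5.5 or later):

```foxpro
oReg = CREATEOBJECT("VBScript.RegExp")
oReg.Global = .T.
oReg.IgnoreCase = .T.
** [^<]* stops at the next tag; ( ) captures the element's text
oReg.Pattern = "<name>([^<]*)</name>"
lcXML = "<people><name>Ann</name><name>Bob</name></people>"

oMatches = oReg.Execute(lcXML)
FOR EACH oMatch IN oMatches
	** SubMatches is zero-based, like FirstIndex
	? oMatch.SubMatches.Item(0)   && Ann, then Bob
ENDFOR
```

This does not decode entities such as &amp; or cope with nested or attribute-bearing tags; for those, the XMLDOM is the safer tool.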

Here is a list of the character patterns you can use. It came from http://msdn.microsoft.com/workshop/languages/clinic/scripting051099.asp, which has a great article on regular expressions, complete with examples.
Position Matching
Position matching uses ^ and $ to match the beginning or end of a string. Setting the Pattern property to "^VBScript" will match "VBScript is cool." but not "I like VBScript."

Symbol Function
^ Only match the beginning of a string.
"^A" matches the first "A" in "An A+ for Anita."
$ Only match the ending of a string.
"t$" matches the last "t" in "A cat in the hat"
\b Matches any word boundary
"ly\b" matches "ly" in "possibly tomorrow."
\B Matches any non-word boundary
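A quick way to try these anchors is the RegExp object's Test() method, which returns .T. or .F. instead of a collection of matches. A small sketch using the examples from this section; the extra test strings are made up:

```foxpro
oReg = CREATEOBJECT("VBScript.RegExp")
** ^ anchors the pattern to the start of the string
oReg.Pattern = "^VBScript"
? oReg.Test("VBScript is cool.")   && .T.
? oReg.Test("I like VBScript.")    && .F.

** \b requires a word boundary after "ly"
oReg.Pattern = "ly\b"
? oReg.Test("possibly")   && .T.
? oReg.Test("lyrics")     && .F. ("ly" is followed by another letter)
```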

Literals
Literals can be alphanumeric characters, ASCII, octal or hexadecimal character codes, Unicode characters, or escaped special characters. Because some characters have special meanings in a pattern, you must escape them: to match one of these characters literally, precede it with a "\".
Symbol Function
Alphanumeric Matches alphabetical and numerical characters literally.
\n Matches a new line
\f Matches a form feed
\r Matches carriage return
\t Matches horizontal tab
\v Matches vertical tab
\? Matches ?
\* Matches *
\+ Matches +
\. Matches .
\| Matches |
\{ Matches {
\} Matches }
\\ Matches \
\[ Matches [
\] Matches ]
\( Matches (
\) Matches )
\xxx Matches the ASCII character expressed by the octal number xxx.
"\50" matches "(" or CHR(40).
\xdd Matches the ASCII character expressed by the hex number dd.
"\x28" matches "(" or CHR(40).
\uxxxx Matches the Unicode character expressed by the hex number xxxx.
"\u00A3" matches "£".
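Escaping matters as soon as your data contains one of the special characters. A sketch; the sample strings are illustrative:

```foxpro
oReg = CREATEOBJECT("VBScript.RegExp")
oReg.Global = .T.
** \( and \) match literal parentheses; \d+ matches the digits between them
oReg.Pattern = "\(\d+\)"
oMatches = oReg.Execute("CHR(40) and CHR(41)")
FOR EACH oMatch IN oMatches
	? oMatch.Value   && (40), then (41)
ENDFOR

** Unescaped, "." matches any character; "\." matches only a period
oReg.Pattern = "\."
? oReg.Replace("1.2.3", "-")   && 1-2-3
```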

Character Classes
Character classes let you define a custom set of characters by placing them inside [] square brackets. A negated character class is created by placing ^ as the first character inside the brackets, and a dash specifies a range of characters. For example, the regular expression "[^a-zA-Z0-9]" matches everything except alphanumeric characters. In addition, some common character sets are available as a backslash plus a letter.
Symbol Function
[xyz] Match any one character enclosed in the character set.
"[a-e]" matches "b" in "basketball".
[^xyz] Match any one character not enclosed in the character set.
"[^a-e]" matches "s" in "basketball".
. Match any character except \n.
\w Match any word character. Equivalent to [a-zA-Z_0-9].
\W Match any non-word character. Equivalent to [^a-zA-Z_0-9].
\d Match any digit. Equivalent to [0-9].
\D Match any non-digit. Equivalent to [^0-9].
\s Match any whitespace character. Equivalent to [ \t\r\n\v\f].
\S Match any non-whitespace character. Equivalent to [^ \t\r\n\v\f].
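Character classes are handy both for pulling pieces out of a string and, combined with Replace(), for stripping unwanted characters. A sketch using the sample string from the top of the page:

```foxpro
oReg = CREATEOBJECT("VBScript.RegExp")
oReg.Global = .T.
txtString = [f foo food fool {a1b2c3} booooooooo]

** \d matches a single digit, so each digit is its own match
oReg.Pattern = "\d"
oMatches = oReg.Execute(txtString)
? oMatches.Count   && 3

** Negated class: remove anything that is not a letter, digit or space
oReg.Pattern = "[^a-zA-Z0-9 ]"
lcClean = oReg.Replace(txtString, "")
? lcClean   && f foo food fool a1b2c3 booooooooo
```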

Repetition
Repetition lets you specify how many times an element of a regular expression may occur in a row.
Symbol Function
{x} Match exactly x occurrences of a regular expression.
"\d{5}" matches 5 digits.
{x,} Match x or more occurrences of a regular expression.
"\s{2,}" matches at least 2 space characters.
{x,y} Matches x to y number of occurrences of a regular expression.
"\d{2,3}" matches at least 2 but no more than 3 digits.
? Match zero or one occurrences. Equivalent to {0,1}.
"a\s?b" matches "ab" or "a b".
* Match zero or more occurrences. Equivalent to {0,}.
+ Match one or more occurrences. Equivalent to {1,}.
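Repetition symbols work well with both Replace() and Test(). A sketch, assuming you want to tidy spacing and validate a five-digit code; the input strings are made up:

```foxpro
oReg = CREATEOBJECT("VBScript.RegExp")
oReg.Global = .T.
** Collapse every run of two or more whitespace characters into one space
oReg.Pattern = "\s{2,}"
lcTidy = oReg.Replace("a   b    c", " ")
? lcTidy   && a b c

** ^ and $ make the whole string be exactly five digits
oReg.Pattern = "^\d{5}$"
? oReg.Test("12345")    && .T.
? oReg.Test("123456")   && .F.
```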

Alternation & Grouping
Alternation and grouping are used to build more complex regular expressions. They let you create intricate clauses within a regular expression and give you more flexibility and control.
Symbol Function
() Groups a clause so it acts as a single unit, for example so a repetition symbol applies to the whole clause. May be nested.
"(ab)?(c)" matches "abc" or "c".
| Alternation combines clauses into one regular expression and then matches any of the individual clauses.
"(ab)|(cd)|(ef)" matches "ab" or "cd" or "ef".
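A sketch of both table entries; ^ and $ are added in the first pattern so that it must match the whole string, and the test strings are illustrative:

```foxpro
oReg = CREATEOBJECT("VBScript.RegExp")
** (ab)? is optional, (c) is required
oReg.Pattern = "^(ab)?(c)$"
? oReg.Test("abc"), oReg.Test("c"), oReg.Test("ac")   && .T. .T. .F.

** Alternation: each match is whichever clause fits at that position
oReg.Global = .T.
oReg.Pattern = "(ab)|(cd)|(ef)"
oMatches = oReg.Execute("xxcdyyefzzab")
FOR EACH oMatch IN oMatches
	? oMatch.Value, oMatch.FirstIndex
ENDFOR
```

The loop finds "cd" at position 2, "ef" at 6 and "ab" at 10, in the order they appear in the string rather than the order of the clauses.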

Contributors Jordan Baumgardner
( Topic last updated: 2005.04.27 05:53:59 PM )