vs $ vs ASCAN
[2004.04.05] Updated all charts to reflect VFP8. Note that it seems tha AT() has been improved in VFP8 and seems to nominally beat $ in some of these tests where AT() and $ were equivalent in previous versions. YMMV.
If the task at hand is to determine if a string is contained in a list of candidates, you have at least seven ways to perform this check in VFP. Let's take a look at three of them: the
See below for an updated test that uses VFP8's Collection Class for this task.
The following code:
lcList = "one,two,three,four"
? "one" $ lcList
Should produce three .T. responses.
Of course, the obvious question is "All things being equal, which is fastest?"
The quick answer follows from the TANSTAAFL principle (see Acronyms). Since AT and ASCAN give you more information, namely the position of the matching candidate, they will take longer. Thus,
$ wins. Q.E.D.
But, just for fun, here are some charts showing how much faster. These were produced using VFP8. We are testing lists of length 1000,200 and 20 with different test iterations to damp the variance between tests. Note that the Y-axis scale is not the same for these charts. The acutal time values will vary from machine to machine, and are not relevant to the discussion.
Here is the code used to produce these reports: _ ATvs $ vs ASCAN _ code
There are several interesting things here. First, in all cases the position of the candidate in the list/array is linearly related to the time it will take to find it indicating a simple front-to-back search in all cases. For candidates early in a longer list, ASCAN wins. However, in general, one can expect
AT() be faster. Interestingly, in VFP7,
$ are pretty much the same, but in VFP8
AT seems to have improved. To be the consistent winner in these tests.
It would be interesting to see if
ASCAN() could be beaten by a hand-written search algorithm, perhaps combined with pre-sorting the array.
Since the Binary Search algorithm is typically O(log n), it will eventually outstrip any linear ie O(n) process. Keep in mind however, that we have to pre-sort the array, and if
ASORT() uses something like the quick sort algorithm, it will likely be something like O(n log n) which is Super-Linear. So, this approach (using a sorted array) will likely only be feasible if you are able to pre-sort the array and use it numerous times to check strings. Also note that ASCAN is a native function while our Binary Search function will be written in VFP, thus there will likely be some efficiency lost in the compiling.
The results below pit Binary Search against ASCAN and $. The cost of sorting the array is not included here, but for a 600 element array of this type, the time is <0.001 seconds, so it may well be negligible in many situations.
The code for the binary search has been added to _ ATvs $ vs ASCAN _ code.
You'll note the search time is highly uncorrelated with position for the Binary Search. This is because, well, it's a binary search :-) and sometimes it gets lucky. Overall, Binary Search will produce a much lower variance given a uniform distribution of search key requests. This is an important consideration if you are implementing something like this in a web-server or any other application where user's requests are likely to be queued under high load. The overall variance will be the main driver of the user's perception of response time, so you may want to consider using a Binary Search approach especially if you have high variance in the list length. Finally, not so obvious here, is the average response times for each of the three methods. Which over all 1000 postions is: 0.202, 0.111, 0.173, 0.273 for $, AT, BIN and ASCAN respectively. Again, the improvements to AT() in VFP7 have caused it to beat BIN until a list lenght of about 900 in this example. Previously, the crossover point was about 500.
Recall that the scenario at hand was a simple 1 dimentional list lookup. If you want to translate your key location to some other data, a multi-dimensional array can be a good structure, and of course ASCAN becomes your only option. Prior to VFP7 ASCAN would search the entire array. You can always check to see if the hit was in the index column, but the cost of searching the entire array is there. VFP7 supports an nSearchCol parameter in the ASCAN() function which restricts the scan to a particular column. This is not a wrapper around the old ASCAN(), thankfully, since most of us have already written that. Rather, ASCAN() in VFP7 takes advantage of the reduced work of looking down one column of the array rather than searching all of it. This corresponds to time reduction factor of approximately 1/#cols.
The tests above seem to indicate that implementing a Binary search and pre-sorting your arrays can be an improvement over ASCAN() after the total array size is > 400-450. And, until VFP7's column-specific ASCAN() comes into play, this has some implications for applications using multi-column arrays since a homespun binary search can easily restrict itself to the appropriate column. Thus, a 100 row, five column array can be more efficiently searched on the key column via a binary search function rather than ASCANing the whole thing.
VFP8 gives us a new Collection Class that might be useful for this task. Note however, that using the collection class for a simple lookup like this is NOT a good example of the power behind this class. However, it turns out this approach is extremely fast. Here are two charts demonstrating the same test used above. Also a cursor-based approach using SEEK has been added for comparison.
Some things to note:
1. SEEK and Collection approaches are very fast, but remember the TANSTAAFL principle. We are not measuring the cost of loading the candidate strings in the first place.
2. On a similar note, if your candidate string list changes frequently, be aware that adding and removing elements will cost more with the SEEK and Collection approaches (and the BINSEEK approach).
3. While SEEK and Collection are both about the same speed, Collection has an advantage in that it has no table/workarea dependency.
4. I don't know, but it seems the Collection key lookup may be using similar hash-table technology to SEEK which outperforms a simple binary search.
Again, this is a poor example of the power of collections as performance is probably not a major consideration where collections are under consideration. However, I think it demonstrates that the cost of looking up elements in a collection is probably not something to worry about. - ?lc
[2002.11.20] _ ATvs $ vs ASCAN _ code has been updated to include tests for collections and the CPU Performance Counter API is now used to measure time much more accurately. - ?lc
Would you please include OCCURS in these tests? -- Mike Yearwood
See: SEEKvs ASCAN
Category VFP Commands Category 3 Star Topics Category Performance Category VFP Command Comparison
Contributors: Lauren Clarke
( Topic last updated: 2005.12.21 09:47:39 AM )