Wiki Home

CDXCorruption Checklist

Namespace: WIN_COM_API
(With thanks to Bill Anderson, Jim BoothOffsite link to
, LisaSlaterNichols, Andrew MacNeill, Dan Freeman, Drew Georgopulos, Tamar Granor, YAlan Griver, Doug Hennig, Whil Hentzen, Rick Hodder, Rick Strahl, EdZiv and JayvanSanten.)

Note: A while ago I was asked to consult for a client who was experiencing "severe index corruption problems". Before going to their site I enlisted help from some friends to produce a list of things to check. Here is that checklist.

Note that, in this case, none of the items below was the cause of "index corruption". Here's what was happening: One machine on their 50++ user network had the wrong was 8-hours ahead of the clock time. Therefore all its transactions after 4:00 PM were being allocated to the next day. The result: things didn't balance.

The next day they would reindex, run for an hour, then run the calculation process again (before 4:00 PM). Everything in that hour works! Reindexing solves the problem!! Then the clock would tick beyond 4:00 pm and the next morning things wouldn't balance. So they would reindex, run for a while, everything done in that time balances, and so on...

IMHO this looks suspiciously like the first item in the chart below: corrupt data ( an incorrect time-of-day ) that made the data look out-of-scope.

Well, there is a notable semantic difference between corrupt and incorrect data. In this case, the data was definitely not corrupt. As a result of one workstation's clock synchronization, incorrect data was being posted. The problem was several layers removed from the Fox application...-- Steven Black

In my experience almost all "index corruption" problems are not due to index corruption at all. -- Steven Black

Possible cause Area Action
Corrupt data in the dbf's that corrupts the indexes. Data Inspect the indexed fields for causal or collateral corruption.
Audit for changes in SET NEAR/SET EXACT/SET COLLATE (IOW it may not be an index problem) Environment Audit the application for SET NEAR, SET EXACT, SET COLLATE
GPFs on workstations causing havoc with server-side data Environment Inquire about workstation GPFs.
A legacy FP 2.6 app somewhere,
or perhaps. Excel, Access applications accessing or modifying data?
Environment Inspect all applications connecting directly or through ODBC.
Virus: several client systems
had a new strain of the Monkey B virus that caused index corruption but didn't seem to display any other symptoms
Environment Scan for viruses, especially the Monkey B virus, everywhere
Video or printer driver conflicts in memory. Hardware Audit all video and printer drivers, and get and install the latest versions from manufacturer's websites.
Hardware not NT-Approved Hardware Make sure that machines running NT have tested and approved components.
Bad memory chips. Hardware Run hardware diagnostics on all machines.
A faulty NIC (just one) can cause index corruption. Hardware Run NIC and network diagnostics from all machines. Network faults could be the cause, but have to be traced systematically and physically for nic as well as cable faults.
Bad sectors on the disk where
temp files are being written.
Hardware Scan and defrag the important disk drives
I remember back from when I was teaching our Advanced VFP 2.6 course that we told people (depending on the size of the database) to do a "DELETE TAG ALL" and rebuild the indices manually rather than performing a REINDEX, because VFP stores the index expression in the table, a corruption in the table header would replicate that corruption into the index, giving them a brand-spanking-new-and-completely-corrupted index file. Index Make sure reindexing routines are using INDEX ON expressions and not simply issuing REINDEX. Also consider Stonefield Database Toolkit for DBC with RI
UDFs that return dodgy results on an intermittent basis (i.e. not debugged) Indexes Check indexes for UDFs.
Index expressions need to be constant length.

In Msg Svc on a system I was working on, there was an index tag on the expression:


Whenever I did an APPEND FROM into the Msg Svc table the index would be corrupted. I removed that tag and the problem went away

This may mean that index expressions that have the potential to evaluate to a null string "", may cause errors during APPEND FROMs.

Indexes Make sure they are using fixed-length index expressions. Check any UDFs and functions used in index expressions.
Make sure they use CDX indexes, and aren't using .IDX files

Also Consider code that creates temporary indexes being created using IDXs in conjunction with the CDxs and then deletes the IDX afterwards. If they converted this to CDXs, they definitely would have crashed it out.

Indexes Make sure they don't (or ever did) use IDX files. Audit the application for SET INDEX TO and INDEX...TO statements
We saw duplicate primary key values under the following circumstances:

1. From User 1, add a record via a local view and do a TABLEUPDATE().

2. From User 2, add another record to the same table via a local view, then TABLEUPDATE().

3. From User 1, make a modification to the record added in step 1 and issue another TABLEUPDATE(). At the point User 1 makes the 2nd TABLEUPDATE(), the record added by User 2 will have its PK value changed in the base table. The index is not updated with the changed value.

This "overlap" is required for the corruption (duplicates) to occur. It
always or almost always occurs when the overlap is present, and never when the overlap is absent. The structure of the application and human nature (i.e. getting a cup of coffee, or taking a lunch break) dictate that this overlap will happen frequently with normal use of the application.

Additional work at MS has determined the following:

1. The behavior also occurs in VFP 6.

2. The corruption is really an overwriting of the first byte of the first field of the local view's base table in User 2's record with a CHR(26) (or "end of file" marker). If the first field is an integer field, its least significant byte will be changed to a CHR(26). This will cause a duplicate if the value already exists in the table, or will cause a duplicate later when the value is added correctly. Under certain circumstances, the first 2 bytes may be overwritten, the first with what appears to be a random character, then the second with a CHR(26). I was able to cause this by having 3 users, the with User 1 adding a record, and User 2 and User 3 adding a record, then User 1 doing a save at approximately the same time User 2 or User 3 does a save.

3. If an extra "dummy" field is added to the base table, and set to be the first field in the table structure, the extraneous character(s) will appear in the dummy field rather than the CID field (which was formerly the first field in the table structure). This seems to alleviate the corruption and requires no changes to the application. Note that the extra field does not need to be added to the local view(s) based on that table and the views will require no modification unless an error occurs like 1568 "View fields have been changed. The Update and Key field properties have been reset to the default values" when the views are used. This seems to be dependent on the view structure.

5. This has been submitted to the Visual FoxPro Product Group as a bug. Pending review by the Product Group, I would say it could be a Visual FoxPro bug, but could perhaps also be OS related, such as a network caching issue.

Known bug Check the application for the use of updatable views. If so , try the trick suggested here.
Overloaded networks. If they are on 10Base2 (Coax) you might be better off moving to 10BASET. Network Ask LAN administrator to comment on and measure the peak network loading.
Another thing to rule out (though I don't see how this could corrupt indexes) is that the temporary directories for the clients exist, write priviliges exist and there's enough room to write this temporary inormation. Network Check for rights, diskspace on
temporary directories
Since this is a "mixed" environment, is there a NET.CFG? If so, it should probably have show dots = on, to handle relative directory references. Not sure what this could do to an index problem, unless we're talking about some tables that are not in the same directory as their DBC, but it could happen! Network Check NET.CFG for SHOW DOTS=
Too few resources defined in
Network Check NET.CFG for sufficient FILES=

Set all the files and buffers to the max, and free as much ram as necessary, to not run out of locks. (Don't forget the many handles necessary for dbfs, memo files, cdxes and even external one shot (ugh) idxes could necessitate a phased opening of tables rather than opening all the app tables at once, as we would probably like.

Make sure the servers have enough memory, HD space etc. Network Determine the available network resources.
Network cables and PCs positioned close to sources of induction noise. Network Inspect all the network wiring and the location of all PCs.
Maybe Novell and NT aren't playing well together in the same sandbox. Network Make sure the latest versions of each, and their services, are running everywhere.

FWIW, we're running mixed Netware/NT at CAT and we haven't had a single index corruption problem.

Deep Novell Thoughts Network My first instinct would be to see who owns the writes for the data. If the writes are managed by Novell, then I've seen the following behavior; the reason I think this may be germane is that Novell may be handling the *names* of the objects is a similar way, and in so doing, prevent vfp from using new or reindexed files.

The scenario is this, when a modi stru or a pack happens, vfp makes a copy of the table under an internal temporary name (a new file from the perspective of the NOS), so once the records are transferred to the new structure, vfp copies them to the pre-existing table name, but the NOS interprets this as a new file name, and as such, resets the flags on the new file to defaults, which may be prohibiting access.

You'll need to reset the flags on the 'new' table to get access to it (R and Sh seems to ring a bell, but its been a long time, do FLAGS /? ) .

The reason I feel this may have some legs is that if for example indexes are reindexed, I'll bet they are also managed by some vfp internal temporary name, and when vfp finishes reindexing and is ready to rewrite the index under its original name, the NOS interprets it as a new file and sets default flags that prevent its access. The flags may have to be reset on the cdxs or dbc or whatever has been touched by a file operation in vfp.

The workaround is to shell to the os and reset the flags after file operations that change file system names internally, even though they have the same file names at the start and end of the process. see?

Dirty power for the server or the workstations. What about brownouts and surges? Power Check for UPS units on all machines (at best) and good quality surge supressors (at least)
Sabotage Soft Issues Don't rule out the possibility
If it's hardware-related, it'll
be a BEAR to fix because hardware is the last thing people will be willing to replace.
Soft issues Promote the idea that the solution might be hardware related.
Was the app ported up from Fox 2 or written from scratch? Are there BROWSEs in there? Any exciting BROWSE Valid/When UDFs? I don't know what this app does, but it is possible to do great things with SET REPROCESS and BROWSE When -- unfortunately these great things have a tendancy to backfire when brought forward into a VFP/modeless environment. Software Audit for BROWSE commands
Do they "declutch" cursors? (that's the practice of doing a USE AGAIN to make a SELECTEd cursor editable) Do they use SYS(3) to get unique table names when building temporary indexes or SELECTs? (Do they check for real uniqueness when they build temporary filenames, even with SYS(2015) or other methods?) Software Audit for the editing of cursors using the USE AGAIN trick.
Could it be something that seems entirely unrelated to indexes, like nested modal dialogs with OCXs that are fighting for "supremacy"? If something like this is bringing down the app with regularity, and they're working with unbuffered data on these screens, who knows. Software Audit for the use of ActiveX controls.
Do they decrypt/encrypt data on the fly with complex UDFs? I suppose that it's possible to have a failure in the middle of such an activity, and this may affect primary/candidate keys. Look for any lengthy UDFs on a Default value for a field, look for control tables that are incorrectly handled in such a UDF. Software Audit for the use of encryption.
Is their use of transactions appropriate, if any? Software Audit for the use of transactions.
Check also for use of the
UNIQUE index tag. Some people are confused about this one, still.
Software AUDIT for UNIQUE
Some people use SET COLLATE TO
GENERAL to provide case-insensitive indexes.The documentation even suggests this as a good idea. However this has led to index corruption in my experience especially when used within a report
Software Audit the application for SET COLLATE TO "GENERAL"
Watch for STR() with fewer than three parameters in index expressions. I've seen that cause all manner of weirdness. (The docs claim that 3rd param is optional. THEY LIE, but it only shows up on some configurations.) Software Audit the application for STR()
in index expressions.
Check for a cyclical reference
in the persistent relations. One of my clients had this corrupt his indexes big time, although I don't know the details.
Software Audit the data model for possibilities of cyclical relations.
Bad install of VFP Software Check for compatible VERSION() information. Reinstall VFP/application setup everywhere.
Appending large amounts of data while the data is being used for reading and writing by other users. This could be a reason to switch to SQL Server Software Determine if any block appends are happening, and reorganize or resequence that activity if necessary
Fast inserts from multiple simultaneous machines.

The problem here is that separate users/sessions are not seeing the inserts of the others without an\an explicit USE IN / USE x basically forcing the table closed and to repopen. This can also cause people to have to the wrong image and
VFP subsequently wrtes invalid data to the DBF.

You can produce duplicate keys if data is added too quickly (i.e. APPEND FROM, INSERT INTO, etc.). If they've got a function that creates a surrogate key, (like NEWID()) depending on frequency of writes in this environment, there could be duplicate keys getting created.

Software If a NEWID()-type function is being used, suggest a strategy to flush the buffers or validate the recent write or open-and-close the table as needed. Make sure this table is unbuffered
If they are using RI stuff, are they screwing up the error handling somehow? Software Inspect stored procedures.
Is the DBC okay? IOW, could there be corrupt meta table information in there somehow? Software Inspect the DBC for corruption.
Invocation stack problems: How about udfs in the indexes? If an index is relying on a prg up high in the stack for its resolution, and the prg can't be read, or a function with the same name is found sooner, that could bomb the index. Software Make sure that all UDFs used in indexes are singularly and properly resolved.
What's the general error handling like? Are the indexes REINDEXed or recreated from scratch? What happens after a
crash, in general?
Software What is the error message or error indication?

Inspect the error handling and reindexing routines.

Have they changed the VFP INT
file? What (V)FP version are they running? I remember that missing INT files could do some damage. Also, some people try to mess with that file to create new collating sequences. But I don't think a lot of people do that, actually. .
Software Inspect the INT file to make sure it isn't missing and to make sure it isn't a custom version use for custom collating sequences.
Make sure that the version of the runtime libraries has been updated to match the development/compiled one. I have seen a couple of cases where they have applied the service packs to the dev version but the ESL or whatever is still at the original VFP release. This can create very weird errors including things that look like data and index corruption Software Audit to make sure the versions of the runtime libraries on the workstations match the version used by the developers.
In the case of DTOC() indexes, if they are using SET CENTURY and SET DATE. This causes major problems and could therefore possible create corrupt indices. Software Audit the application for DTOC() indexes.
Users powering down improperly.

(Rick Strahl says: FWIW, I don't think people shutting down will cause the indexes to corrupt, unless the user shutting down is the last user. As long as someone on the network has the file open an improperly left VFP table won't kill the header.

Users Check procedures
Services running on the server. I encountered an installation that had a defragging application running on the server during the day when a Fox application was running. Although the software was advertised to be able to work with applications running, I don't think it had been tested with a database application. Corrupt indexes and missing data were the result. Server services. Reschedule such services.
Under Vista, Windows 7 & Server 2008, SMB2 is switched on by default, and this provides opportunistic locking ("Op Locks") which can result in data corruption in non-client-server environments. A fuller explanation is available at Server configuration Disable Op Locks:

Category VFP Troubleshooting Category 3 Star Topics Category Data Corruption

( Topic last updated: 2011.07.04 05:57:20 AM )