Wiki Home

Recursive Data


Namespace: WIN_COM_API
Recursive Data - It's A Natural!
Some suggestions for this topic
  • See also Configuration Semantics for ideas for more generally standardized field nomenclature for the control fields.
  • Joe Celko's Trees and Hierarchies in SQL for Smarties ISBN 1558609202, though not entirely Fox-applicable, covers this and many other facets as well.
  • The solution below, while good insofar as Fox is concerned, doesn't scale to back-end databases, which may or may not be a consideration.
-- Steven Black


For this discussion, "Recursive Data" is generally one table whose records are related to each other in a hierarchy like:
-Parent
-Child
-GrandChild
-GreatGrandChild
-etc.

In a "Normalized" world, each level above would be a separate table, but I have found that "Normal" rarely exists in the real world. The real world is about many things that are different, but only in a slight way. Normalization is black-and-white, recursion is gray.

I would like to take a moment to point out the misuse of the word "Normalized" here. The data design being discussed is not a violation of data-normalization. When defining a table to hold self referencing information it is actually in compliance with the concepts of normalization to insure that one single table holds all of the data. Artificially creating additional tables to hold parts of the data is in violation of a normalized design.

For some reason many folks enjoy striking out at normalization as if it is something bad or difficult to understand. In fact, a fully normalized design is usually a common sense design as well. The rule of normalization that supports the self-referencing data situation is the third normal form which states that all of the data in a table must be dependant on ONLY the primary key for its identity. In the situation where multiple tables would be used to avoid the self-referencing situation the data in these additional tables is not only dependant on the primary key but also on which table it is in, as these tables are created based on some artificial criteria.

So, in conclusion to my little diatribe here, using a self-join table to represent self-referencing data is the "normalized" design, and it is not in conflict with normalization. -- Jim BoothOffsite link to http://www.jamesbooth.com


We often end up using "Reports" to display the recursive data we have forced into normalized related tables. Think about the lowly "Invoice" report. On an Invoice, you might have a "line item" with multiple child "line items" and each of those may in turn have it's own child "line items". This is especially true if you are invoicing for Professional Services or a complex Built-To-Order item like a computer. Each component of a computer may contain several sub-components and so on.

If you use normalized tables to hold this data, you are forced to create a "fixed" number of "broad" categories. In other words, you end up telling the user that they can have 3 levels of detail. They force their data to fit your rules, then come up with a new product that needs 5 levels - time for a rewrite!

In a recursive table, the Parent records knows nothing about the Children. The Child records must maintain their own link to the Parent. The "relationship" is defined using a "Parent-Child-Link" field that will exist in the child record and will hold the "Primary Key" value of that child's Parent record. So, a Parent can easily "find" all it's children, and a child can always "find" it's parent.

This sounds cool!! So, why don't we design stuff this way more often? The answer as I see it is the SQL language. The only way to pull out more than 1 level of records is to recurse down the "family tree" with code. However, even this does not need to be as bad as it sounds! If you are using small (I prefer Integer) key fields and you have them indexed properly, recursively pulling out small subsets of data can be relatively quick.

VFP8 and above make this process much easier with their support of "READWRITE" cursors. SQL seems to support recursive data, but I think there is a basic scan-append-scan-append loop going on there, just like in the code I have written below.


Generic Recursive Data Routine

This code will retreive all related records in a "recursive" table given just the starting record.

Lparameters iPK as Integer, cPKField as String, cParentPKField as String, cAlias as String, cFinalSortTag as String

*---Parm Notes
* iPK	         = The "key" value of the first parent record
* cPKField	 = The name of the "field" used as the "primary key".
* cParentPKField = The name of the "field" used to "relate a child back to it's parent".
* cAlias	 = The name of the "table" to get records from.
* cFinalSortTag	 = The field(s) that will "sort" the final results.

*---Clean Up Parms (You should change this to substitute your own "Default" values).
If Type('iPK') <> 'N'
	=MessageBox("Please pass a beginning Primary Key Value to process.",48,"Error",5000)
	Return
EndIf
If Type('cPKField') <> 'C' or Empty(cPKField)
	cPKField = "inl_pk"
EndIf
If Type('cAlias') <> 'C' or Empty(cAlias)
	cAlias = "invline"
Endif
If Type('cParentPKField') <> 'C' or Empty(cParentPKField)
	cParentPKField = 'inl_parent_pk'
EndIf
If Type('cFinalSortTag') <> 'C' or Empty(cFinalSortTag)
	cFinalSortTag = 'inl_show_order'
Endif

If !Used(cAlias)
	Use (cAlias) In 0
EndIf


*---Setup cursors.
If Used('csrAll')
	Use In csrAll
Endif
If Used('csrChildRows')
	Use In csrChildRows
Endif

*Get original Parent Record and store them in our "master" cursor csrAll.
Select * From &cAlias Where &cPKField = iPK Into Cursor csrAll Readwrite

*Initialize this csrChildRows cursor with original parent set.
Select * From csrAll Into Cursor csrChildRows Readwrite


*---Get Child Records.
*Drill down through all levels of child, grandchild, etc. to extract related records.
If !Eof('csrAll')
	Do While .T.
		*Get all children for the current set of parent records.
		Select * From &cAlias Where &cParentPKField In ;
                    (Select &cPKField From csrChildRows) Into Cursor csrNewRows Readwrite

		If _Tally = 0
			Exit
		EndIf
		
		*Add the found children records to our "Master" table.
		Select csrAll
		Append From Dbf('csrNewRows') For !Deleted()

		*Make this set of children, the next set of parents (this is the 'recursive' part).
		Select * From csrNewRows Into Cursor csrChildRows Readwrite
	Enddo
Endif

*---At this point, our "master" cursor (csrAll) contains the original
*Parent records and ALL related children, grandchildren, and so on.
Select csrAll
Select * from csrAll order by &cFinalSortTag into cursor csrSorted
Browse normal title "All Related Records..."


*---Cleanup
If Used('csrSorted')
	Use in csrSorted
Endif
If Used('csrAllRows')
	Use In csrAllRows
Endif
If Used('csrChildRows')
	Use In csrChildRows
Endif
If Used('csrNewRows')
	Use In csrNewRows
Endif
If Used('csrAll')
	Use in csrAll
EndIf


Here is a good article on hierarchical data in SQL. http://dbazine.com/tropashko4.shtml
See also an article by Andy Kramek and Marcia Akins in the July 2003 edition of Fox Talk titled "Let 'er rip", searching for which will get you an error on he Pinnacle Publishing website. -- Steven Black
Contributors: Paul James

Categories: Category Data Modeling - Category Design Patterns

See Also: Relational Myths - Parent Child Terms
( Topic last updated: 2007.01.19 12:32:18 AM )