Wiki Home

De Vs Norm Speed


Namespace: WIN_COM_API
This program shows that queries can execute faster using normalized data than denormalized data. The reason behind this is simple: Disk IO is much slower than manipulating memory. If a sufficient amount of the parent data is cached, the extra time needed to join it to the child data will offset the amount of time it would have taken to read it from disk had it been stored redundantly in the child record.

The question now is: How much is "a sufficient amount of parent data"? I don't think it is much. It depends significantly on the various components: disk throughput, bus bandwidth, amount of memory, number of CPU's, OS performance, etc.

My experiences have led me to believe that in most situations (including data warehousing), normalized data is more efficient.

In the tests below, I create two databases with the equivilant data in each. then I run two different types of queries: select all, and a summary. The summary goes faster because it doesn't have to write out the result set to a temp file.

I would like it if someone would run this against an SQL server.

I need to augment my study on query speed: for data warehousing you break one of the normal forms, but not the one I used here as an example.

If you have a 3 or more level data structure (grand parent, parent, child) you put a foreign key in the child record that references the grandparent. The descriptions are still in the grandparent, but the query engine has a pointer directly to it, not indirectly via the parent. What you end up with is FactTables and DomainTables. The DomainTable hold things like "Person's Name", "Product Description", "Person's Title". The Fact Table holds the detail data that you would typically find in the child record.

#DEFINE PNTRECS 10
#DEFINE CLDRECS 2000000

do linit

?
? "Using Normalized Data"
lTime( "lMkNormData()" )
* lFileSize()
lTime( "lNormQueryA()" )
lTime( "lNormQueryB()" )
?
? "Using DeNormalized Data"
lTime( "lMkDeNormData()" )
* lFileSize()
lTime( "lDeNormQueryA()" )
lTime( "lDeNormQueryB()" )

do lCleanUp

return

function lMkNormData()

local ;
	lnCtr

create database Test

create table TblA ;
	( pkTbla i primary key, ;
	cNam c(50), ;
	cAdr c(50), ;
	cCty c(20), ;
	cGov c(20), ; 
	cPst c(15), ;
	nQty f )
	
create table TblB ;
	( pkTblb i primary key, ;
	fkTbla i references Tbla )
	
for lnCtr = 1 to PNTRECS
	insert into TblA ;
		( pkTblA ) ;
		values ( lnCtr )
endfor

for lnCtr = 1 to CLDRECS
	insert into TblB ;
		( pkTblB, fkTblA ) ;
		values ( lnCtr, lnCtr%PNTRECS+1 )
endfor

close databases

return
	
function lMkDeNormData()

local ;
	lnCtr

create database Test

create table TblB ;
	( pkTblB i primary key, ;
	cNam c(50), ;
	cAdr c(50), ;
	cCty c(20), ;
	cGov c(20), ; 
	cPst c(15), ;
	nQty f )

for lnCtr = 1 to CLDRECS
	insert into TblB ;
		( pkTblB ) ;
		values ( lnCtr )
endfor

close databases
return
	

function lNormQueryA()

open database test

select * ;
	from TblB;
		join TblA on fkTbla = pkTbla ;
	nofilter ;		
	into cursor Temp

close databases
return

function lNormQueryB()

open database test

select SUM( nQty ) ;
	from TblB;
		join TblA on fkTbla = pkTbla ;
	nofilter ;		
	into cursor Temp

close databases
return

function lDeNormQueryA()

open database test

select * ;
	from TblB;
	nofilter ;	
	into cursor Temp

close databases
return

function lDeNormQueryB()

open database test

select SUM( nQty ) ;
	from TblB;
	nofilter ;	
	into cursor Temp

close databases
return

function lInit
clear
set escape on
set safety off
close data all
close database

set alternate to temp
set alternate on

return

function lFileSize()
local ;
	laDir(1), ;
	lnCtr
	
adir( laDir, "Tbl?.*" )
for lnCtr= 1 to alen( laDir, 1 )
	? laDir(lnCtr,1), trans( laDir(lnCtr,2), "999,999,999,999" )
endfor
return

function lDskSpc()
return "Free Space: " + transform( diskspace(), "999,999,999,999" )

function lCleanUp
set alternate to
_clipText = FileToStr( "temp.txt" )

close data all


function lTime( tcFunc )
local ;
	lnSrt, ;
	lnEnd, ;
	lnDlt
	
? tcFunc
lnSrt = seconds()
&tcFunc
lnEnd = seconds()
lnDlt = lnEnd - lnSrt
?? lShowTim( lnDlt )
* ? lDskSpc()
* ?
return lnDlt

function lShowTim( tcSec )
local ;
	lcRet

do case
	case tcSec < 60
		lcRet = " " + trans( round( tcSec, 2 ) ) + " Secconds"
	case tcSec < 60^2
		lcRet = " " + trans( round( tcSec/60, 2 ) ) + " Minutes"
	otherwise
		lcRet = " " + trans( round( tcSec/60^2, 2 ) ) + " Hours"
endcase

return lcRet



Results:
Local files

Using Normalized Data
lMkNormData() 4.35 Minutes
lNormQueryA() 2.86 Minutes
lNormQueryB() 37.61 Secconds

Using DeNormalized Data
lMkDeNormData() 3.9 Minutes
lDeNormQueryA() 5.18 Minutes
lDeNormQueryB() 6.38 Minutes

10Mb ethernet to NT server, (no sql server)
Using Normalized Data
lMkNormData() 8.83 Minutes
lNormQueryA() 2.85 Minutes
lNormQueryB() 1.13 Minutes

Using DeNormalized Data
lMkDeNormData() 8.27 Minutes
lDeNormQueryA() 9.99 Minutes
lDeNormQueryB() 9.14 Minutes
Contributors Carl Karsten
Category Data Category Modeling
( Topic last updated: 1999.10.15 09:15:25 PM )