Wiki Home

Tuning Issue


Namespace: B2B
Sql Tuning Issue

We're importing data from four different workstations into Sql Server. When all but one finish, that last machine seems to run at the same speed or slower than when they were all running. Total time to this point is about 24 hours. We extrapolate time to finish on the last box to be another 24 hours.

We stop the last machine and distribute what's left to the four machines finishing in a couple of hours instead of 24.

I'd like to keep from chopping up the data. I'd like to tell sql, "Hey, when there's only one workstation dishing you data, give it all you've got."

Where do I go in Sql Server to begin resolving this? Joe Kuhn
How are you "importing" your data? Remote views, sql pass through? If so, I highly recommend you look at Data Transformation Services. DTS, in my experience, can bulk load from an ODBC source MUCH faster than pushing the data in using other methods. -- Randy Jean
Hmmm. That would mean a re-write of the last third of my importer. I'd rather not do that at the moment, but I'll keep DTS in mind.
Does the Server use parallel I/O? SCSI
Does it have multiprocessors? Yes. 1.4GHz
Do you have indexes established on the target database? Yes
Where is your log pointing to?

Is the data pre-sorted? Sometimes yes, sometimes no.

Are you using BCP, DTS, VFP or other to get data into the server box? VFP

Is the database the right size to handle the data coming in? What is 'right size'?

Val Matison
This is a very large Fox importer. The first part brings the data in from a text file into Fox 'temporary' tables. The second part generates cross table keys that keeps the data intact (sometimes a single line of data from the source file needs to go to serveral different sql tables). The third part moves the data from the temp tables to sql. Each of these three major steps can have pre and post process code.

The text file is a "mixed line" format. Different lines have different data items on them. Each line has an ID that tells me the format for the line. So for a specific account I get a bunch of lines that may be in different formats. I have a data map for each type of line which then goes into a different temporary table...

Sounds like a nightmare, but Fox objects make it do-able. I just need to be pointed in the right direction for this sql issue.

Joe Kuhn
1.) The log referred to is the SQL Server log. Is it located on the same drive as the data drives?

The hardware guy tells me it's on the level 1 mirror.

2.) What Raid level are you using?

Raid 5 for the data. Drives are SCSI.

3.) How much data are we talking about here? 480Meg per cd. 11 cds altogether in this specific import with a range of 3 to 20 cds for other imports.

4.) Pre sorting the data in index order and then loading it this way into an empty database is very fast. Pre sorting does NOT make a lopsided index.

Great info, thanks.

5.) Do you have a clustered index on the tables?

We put one in (an index on two fields) and it improved that process by 10 fold. We also indexed two other fields and improved our speed by 100 fold. But these are import specific issues that don't apply to what is left running at the very end on that one machine.

6.) Where does the post processing take place?

It runs on the workstation in Fox with sql pass through and does all the usual actions - select, update, insert.

7.) Do you have any triggers on the SQL Server side? Yes, I'll have to check and see what they do.

Val Matison
Val,

I'm content with the speed when all workstations are loaded and running. It's just when they all finish but one. That last one seems to run at the same pace as when it was competing for services.

Is there a way to ask sql to "Give all you've got to the user that remains?"

I do appreciate your questions. I have other optimizations I can and will do...but I'd to work on the sql side as well.

Joe Kuhn
Are the workstations disconnecting from the server when they finish. It is possible (although I don't know for sure...anyone?) that SQL Server is allocating resources to the various connections and not re-allocating them after the processes on those connections finish because the server does not know if the workstation is finished. Disconnecting from the server may help in this case. It's worth a try. - Raykirk
Disconnected yes. Tried rebooting the machines just to be sure. No gain. Joe
SQL Server will allocate all resources it can to the process where applicable. So it's always on full speed. You can't say to SQL, stop another process and do this one until you are finished. I was going somewhere with my questions - not knowing your system - the slowdown could be one of many issues. From your latest comments, it sounds like something to do with workstation itself. Have you tried that one on it's own?

Yep. It is on it's own when the others finish. All workstations seem to run at about the same speed regardless of how many are running.

Your hardware sounds like its done properly.

Connecting and disconnecting will not cause the type of bottleneck you describe.

FYI, Importing lots of data into tables with indexes can be slower than if you drop the indexes first, load the data, then build the clustered index and then build all the other indexes.

Is the final workstation importing the same data as the other workstations?

Yes, same stuff, just more of it.

Val Matison
Val, Joe

Is it correct to assume a single machine will run faster when the other machines are not sending data to SQL Server? If the server was the bottle neck, then data should flow faster when the server is less stressed. Since the server could handle 10 PCs simultaneously, it may not be the bottle neck. IMO, each PC can only shovel data at a certain rate. That rate is apparently far less than the bandwidth of the network and apparently far less than the server's capacity.

Mike Yearwood
Mike, When we run the task manager performance monitor on the workstations we see CPU Usage floats at about 20-40% regardless of the number of workstations feeding data in. If one workstation finishes early I do expect the others to bump up a little because the pipe to sql and sql should be less stressed. This I don't see in practice.

Joe Kuhn
Have you looked at the packet size specified in the connection? If it is small, you may want to try boosting it up to 4k or 8k. - Ray Kirk
Ah, in 'SqlSetProp' Good speed improvement. I still want more. Joe
I don't think, at least in this scenario with only a few workstations, that it is correct to assume that if 4 stations require X amount of time, one working alone should process at 4X the speed. SQL Server is like Token Ring networks. Native VFP, for smaller numbers of users, is significantly faster. So is ethernet. But add huge numbers of users, and it slows down considerably. SQL Server, like Token Ring, is medium fast with only a few stations, but it also degrades slower as more and more users attach. Eventually, the one that was slower at the start is actually faster because it doesn't degrade as quickly as the alternative. Now, if you said the process started with, say 100 workstations feeding data, and then all but one finish and the last one doesn't go any faster, I'd be concerned. I think the real issue here is that 4 or one, the SQL Server is processing at the exact same speed, and the meaning to be derived from that is not that it doesn't get any faster with only 1 station, but that with all 4 stations active it doesn't even feel the hit.
- Randy Rinker
Ok, I get it. Thanks for the viewpoint. But I don't want that feature during my data load. There aren't lots of users then. I've got to get tons of data loaded fast and four machines to do the job...

If I open Fox tables exclusive things go alot faster. Is there something like this in Sql Server?

Joe Kuhn
That was exactly the point I was trying to make -- Mike Yearwood

I think it was mentioned up near the top. What you probably should be doing in this case is handling all the dirty processing tasks in VFP, and then saving the data that you end up importing. Use the SQL Server bulk copy program BCP, or create a DTS package on the server, to do a bulk data load. Bulk copies transfer large amounts of data in big chunks between commits, compared to a record by rcord copy that probably is committing each INSERT. There is another wiki topic on call DTS services from within VFP, so you can still have the entire process controlled by your VFP app. The downside to the bulk operations is that if there is a problem and it has to roll back the transaction, it could be discarding a lot of records, however many got copied since the last commit. Try it, even if you run one manually, I think you will be pleasantly suprised at how fast a bulk copy can move data.

-RandyRinker


Have you tried increasing the SQL database size above the total data you expect?

If your SQL Server increases the file size for the database in small increments as the final machine is importing it can cause the import to go slower. We overcame a similar slow importing problem simply by increasing the SQL server database size to an amount of MB slightly larger than the expected total. Then the server doesn't need to do all that resizing work little by little as you import, you do it all at once before the import starts.

- Dave Noal
( Topic last updated: 2002.05.31 12:27:58 PM )