Saturday, May 22, 2010

Calling all C/C++ programmers?

I need to store an array in a file. The array length is 1000. I want the fastest way to write it to disk. I know that reading one big block from the hard disk is faster because it involves only one seek. I seem to have two options:





Option 1) for (i = 0; i < 1000; i++)
              fll << Array[i];





Option 2) Store the array contents in a single string str and then fll << str;





where fll is the file pointer. Which would be faster? I am concerned about performance because I need to write numerical data to a file of around 4 GB.





http://cs.dal.ca/~koul/

I'm a C programmer, but from the documentation I can find this looks to be true for C++ as well.





The file functions in C and C++ are buffered. This means it doesn't matter whether you write one character at a time to the file or 1000, the file won't really be written to until you fill the file buffer, flush the buffer or close the file.
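To make the buffering point concrete, here is a minimal sketch of Option 1 written against a default (buffered) std::ofstream. The names write_elements and read_elements are made up for illustration, and the element type is assumed to be int. Note the use of '\n' rather than std::endl, since std::endl flushes the stream on every call.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write every element with operator<<, one at a time (Option 1 in the question).
// '\n' is used instead of std::endl because std::endl would flush the buffer
// after every element and defeat the buffering described above.
// Returns true if every write, including the final flush on close, succeeded.
bool write_elements(const std::string& path, const std::vector<int>& values) {
    std::ofstream fll(path);            // buffered by default
    if (!fll) return false;
    for (std::size_t i = 0; i < values.size(); ++i) {
        fll << values[i] << '\n';       // goes into the stream buffer first
    }
    fll.close();                        // flushes whatever is still buffered
    return !fll.fail();
}

// Read the values back, to check that the file holds what was written.
std::vector<int> read_elements(const std::string& path) {
    std::ifstream in(path);
    std::vector<int> values;
    int v;
    while (in >> v) values.push_back(v);
    return values;
}
```

Calling write_elements("out.txt", data) with a 1000-element vector issues 1000 insertions, but the stream decides when the bytes actually reach the operating system.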





Probably the best way to optimize this is to write the entire string and let C++ handle it. If the size of the array will always be the same, you could set the buffer to that size; however, it's improbable that your program will be that neat. Another option would be to be sure that the buffer is the optimal size for your hardware; i.e. if your hard drive prefers blocks of 1K, set your buffer to 1K.
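As a rough sketch of setting the buffer size yourself, the C stdio call setvbuf can be used from C++ through <cstdio>. The function name write_with_buffer and its buffer_size parameter are hypothetical, and the right size for a given drive is an assumption you would have to measure. (C++ file streams have a similar hook in pubsetbuf, but its effect on a file buffer is largely implementation-defined, so the stdio version is shown here.)

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Write `count` integers as text through a stdio stream whose buffer size is
// chosen by the caller. buffer_size is a hypothetical tuning parameter and is
// assumed to be greater than zero; a good value depends on the hardware.
// Returns true on success.
bool write_with_buffer(const char* path, std::size_t buffer_size, int count) {
    std::FILE* f = std::fopen(path, "w");
    if (!f) return false;

    // The buffer must stay alive until fclose. It is a local object, and every
    // return path below closes the file before the buffer is destroyed.
    std::vector<char> buffer(buffer_size);

    // setvbuf must be called after opening and before any other operation.
    if (std::setvbuf(f, buffer.data(), _IOFBF, buffer.size()) != 0) {
        std::fclose(f);
        return false;
    }

    bool ok = true;
    for (int i = 0; i < count && ok; ++i) {
        ok = std::fprintf(f, "%d\n", i) > 0;   // lands in `buffer` first
    }

    // fclose flushes the remaining buffered data; it returns 0 on success.
    if (std::fclose(f) != 0) ok = false;
    return ok;
}
```

For example, write_with_buffer("out.txt", 1024, 1000) would use the 1K buffer mentioned above.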
Reply:Almost certainly option 2 will be faster, since you only have to do one write to disk, and so the operating system can do all sorts of optimizations. If you write them one at a time, there's not much optimization that can be done.





But, I'd try it on a small example on the real system to make sure :-)
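A small test along those lines might look like the sketch below. All of the function names are invented for illustration, the array is assumed to hold doubles, and no timings are claimed here, since they depend entirely on the machine and file system.

```cpp
#include <chrono>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Option 1: insert each element into the file stream individually.
void write_one_by_one(const std::string& path, const std::vector<double>& a) {
    std::ofstream fll(path);
    for (double x : a) fll << x << '\n';
}

// Option 2: build one string in memory, then write it with a single insertion.
void write_as_single_string(const std::string& path, const std::vector<double>& a) {
    std::ostringstream str;
    for (double x : a) str << x << '\n';
    std::ofstream fll(path);
    fll << str.str();
}

// Total wall-clock time, in milliseconds, of calling `write` `rounds` times.
template <typename Writer>
double time_ms(Writer write, const std::string& path,
               const std::vector<double>& a, int rounds) {
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) write(path, a);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Read a whole file into a string, to confirm both options write the same bytes.
std::string slurp(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}
```

Comparing time_ms(write_one_by_one, "a.txt", data, 100) with time_ms(write_as_single_string, "b.txt", data, 100) on the real system gives a first impression. Checking that slurp returns the same contents for both files confirms the two options really write the same data.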
Reply:If it is 4 gigabytes of data you had better not try to store it in an array or a string before writing it out!





Maybe that will be no problem in 2015 but in 2005, most computers do not have 4 GB of memory and even the ones that do would really be unnecessarily burdened by that approach.





Instead, why don't you just write the data out using the plain old C++ insertion operator as you describe in Option 2 - but without storing the contents in a single string beforehand?





By default, your file I/O is going to be buffered. Let the C++ runtime library and the operating system take care of buffering.





With both of the approaches you mention in options 1 and 2, they are going to be doing that anyway. There is no sense duplicating their work and unnecessarily staging the entire contents of a 4 GB file in memory. That 4 GB of virtual memory is going to wind up being paged out to disk, since your process will not have that much real memory available to it. And _that_ is going to be *really* slow.
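One way to avoid staging everything in memory is to produce and write the data a chunk at a time. The following is only a sketch under assumptions: next_value is a hypothetical stand-in for whatever computes the real numbers, and write_in_chunks and its parameters are invented names.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the real computation that yields element `index`.
double next_value(std::size_t index) {
    return 0.25 * static_cast<double>(index);
}

// Write `total` values to `path`, producing them `chunk` at a time so that
// only one chunk is ever held in memory. Returns true on success.
bool write_in_chunks(const std::string& path, std::size_t total, std::size_t chunk) {
    if (chunk == 0) return false;
    std::ofstream fll(path);                 // buffered by default
    if (!fll) return false;

    std::vector<double> block;
    block.reserve(chunk);
    for (std::size_t start = 0; start < total; start += chunk) {
        std::size_t end = std::min(total, start + chunk);
        block.clear();                       // reuse the same storage each round
        for (std::size_t i = start; i < end; ++i) block.push_back(next_value(i));
        for (double x : block) fll << x << '\n';
        if (!fll) return false;              // stop early on a write error
    }
    fll.close();                             // flush the tail of the buffer
    return !fll.fail();
}
```

With a chunk of 1000 elements, memory use stays at one array's worth no matter how large the output file grows.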





Just make sure your I/O is buffered, not unbuffered, and you should be fine. Benchmark it if you like, just to be sure. Don't be surprised, though, if you run into a lot of performance problems or something aborts when you try to create a 4 GB string.

