Humm ... Are you sure we haven't discussed all this already?
@JLWest
Absolutely. However, in this sample code the array length is incremented by 1. This means the array may have to be copied to another place whenever a new line is read, which becomes very slow over time with large files. I would pre-allocate a larger block of array items and use a line counter to set the final length at the end:
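A minimal sketch of that idea (the BLOCK_SIZE value and file name are placeholders, not from the thread):

```pascal
program GrowDemo;
{$mode objfpc}{$H+}

const
  BLOCK_SIZE = 100000;     // grow the array in large blocks, not by 1

var
  Lines: array of string;
  Count: Integer;
  F: TextFile;
  S: string;
begin
  Count := 0;
  SetLength(Lines, BLOCK_SIZE);        // pre-allocate a first block
  AssignFile(F, 'data.txt');           // hypothetical input file
  Reset(F);
  while not Eof(F) do
  begin
    ReadLn(F, S);
    if Count = Length(Lines) then
      SetLength(Lines, Length(Lines) + BLOCK_SIZE);  // rare, large reallocation
    Lines[Count] := S;
    Inc(Count);
  end;
  CloseFile(F);
  SetLength(Lines, Count);             // trim to the exact number of lines read
  WriteLn(Count, ' lines read');
end.
```

Growing in blocks keeps reallocations rare; the final SetLength trims away the unused tail.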
I do not understand why you and others keep insisting on using Memos or Listboxes for processing data. This is ridiculous. You should switch to dynamic arrays; they are fast and can handle much more than 7M records. Loading 7M lines into a memo makes no sense anyway, the user cannot process that much information. Lots of bad advice in my opinion.
Although I'm not familiar with the exact requirements, you should try something like this:
This means that the array must be copied to another place whenever a new line is read which will become very slow after some time in case of large files. I would pre-allocate a larger block of array items and use a line counter to set the final dimension at the end
OK. Good point :), however after running a few tests the results are not so convincing. For a 10M line text file (Lazarus 32 bit, FPC 3.0.4, Win10):
In order to have something comparable, can you post how you create the 10M lines file?
http://forum.lazarus.freepascal.org/index.php/topic,43806.msg307101.html#msg307101
BLOCK_SIZE Time to read
----------- ------------
1 42.8 s
1000 42.5 s
10,000 36.1 s
100,000 11.7 s
1,000,000 8.5 s
10,000,000 8.3 s
ReadLn only 7.8 s
BUFFER_SIZE BLOCK_SIZE Time to read
----------- ---------- ------------
1 kB 10,000,000 5.1 s
1 MB 10,000,000 4.0 s
16 MB 10,000,000 3.9 s
1 kB ReadLn only 4.7 s
1 MB ReadLn only 3.6 s
16 MB ReadLn only 3.6 s
Pick another airport or close the file and quit. Tricky, may have to close the file and start all over at the first record again depending on airport picked.
No screen is capable of displaying 10 million datasets at once.
As CCRDude points out, when trying to edit huge amounts of data, it is essential to decouple the data storage, data editing and data display functions.
No user is capable of viewing 10 million datasets at once.
Which means they don't need to be in memory at the same time.
@wp
It looks like for BLOCK_SIZE larger than 100,000 the speed increase is significant; however, you can overshoot the length of the array by a large amount. I found a solution which can count the records fast (3 seconds on my computer), so I can set the length to the exact value:
function GetRecordCount(const AFileName: string): Integer;
var
  DataFile: TextFile;
begin
  Result := 0;
  AssignFile(DataFile, AFileName);
  try
    Reset(DataFile);
    while not Eof(DataFile) do
    begin
      ReadLn(DataFile); // this line has changed; it reads nothing into a variable but is needed, otherwise we go into an endless loop
      Inc(Result);
    end;
    CloseFile(DataFile);
  except
    //...
  end;
end;
PS: SetTextBuf indeed helps; the whole process (counting + loading the data) takes 5-6 seconds on my computer.
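For reference, SetTextBuf is the RTL routine being discussed; it must be applied after AssignFile and before Reset. A minimal sketch (buffer size and file name are illustrative):

```pascal
var
  DataFile: TextFile;
  Buf: array[0..(1 shl 20) - 1] of Byte;  // 1 MB I/O buffer
begin
  AssignFile(DataFile, 'apt.dat');
  SetTextBuf(DataFile, Buf);    // install the larger buffer before Reset
  Reset(DataFile);
  // ... ReadLn loop as in the examples above ...
  CloseFile(DataFile);
end;
```

The default text-file buffer is only 128 bytes, which is why a larger buffer speeds up sequential ReadLn so noticeably in the timings above.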
PS1: I think OP has enough info now to code his application.
// set some consts and variables up
const
  MyMaxArraySize = 8000000;
var
  MyFileHandle: TextFile;
  MyCounter: LongInt = 0;
  MyLine: string = '';
  MyGetOut: Boolean = False;
  MyArray: array of String;
  MyFileName: AnsiString;
// code to read a text file into a dynamic array, and trim the array afterwards
// with some minimal tests during the routine
// please note I would not do this in the OnCreate event; if this routine takes some time
// to complete on an old slow machine with a standard HD, users will not see anything until it has finished.
// best to add it to the form's OnShow event, with a global variable so that it is only run once.
// that way your application will show whilst it's working.
SetLength(MyArray, MyMaxArraySize);
if FileExists(MyFileName) then
begin
  AssignFile(MyFileHandle, MyFileName);
  Reset(MyFileHandle);
  MyGetOut := False;
  while (not Eof(MyFileHandle)) and (not MyGetOut) do
  begin
    ReadLn(MyFileHandle, MyLine);
    if MyLine <> '' then
    begin
      MyArray[MyCounter] := MyLine;
      Inc(MyCounter);
      if MyCounter >= MyMaxArraySize then
      begin
        ShowMessage('Too Much Data, exceeded: ' + IntToStr(MyMaxArraySize));
        MyGetOut := True;
      end;
    end;
  end;
  CloseFile(MyFileHandle);
  SetLength(MyArray, MyCounter);
end;
Or use the code posted in reply #12 or #13
I put this in FormCreate and on my machine, by the time the form shows, it has read the file and posted the number to BlockSize, which is a global. I didn't time it but it takes about 2 seconds.
Don't do that; on a slower computer it might take much longer to count the records. Please drop a TTimer on your form, set Interval to 100 and Enabled to false, create an OnTimer event, then:
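The snippet that followed is not reproduced here; a minimal sketch of that setup, assuming the GetRecordCount function from reply 29 and the form/control names from the thread:

```pascal
procedure TForm1.FormCreate(Sender: TObject);
begin
  Timer1.Enabled := True;   // defer the slow work until after the form is shown
end;

procedure TForm1.Timer1Timer(Sender: TObject);
begin
  Timer1.Enabled := False;               // make sure it runs only once
  BlockSize := GetRecordCount(FILENAME); // the slow counting happens here
end;
```

This way the form paints first and the counting runs roughly 100 ms later, so the user sees the window immediately.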
Getting it loaded is the problem right this min. The current proposal is a dynamic array.
What is the exact problem? Do you get an error? Please be more specific. If there is no error, add the following code (after the array is loaded):
Wow, a dynamic array. I'll give it a try. Have a few questions, like how to view the data after it is loaded into the array.
You opened Pandora's box. :D Now OP will load the big file in VTV, which of course will work; VTV can easily handle 10M records. Still the wrong approach in my opinion, the array must be filtered first. He mentioned somewhere that after filtering only a few thousand items remain, and these can easily be shown in any control. I intentionally avoided VTV, although it's my favorite component.
Certainly. I only wanted to demonstrate usage of TListView in virtual mode. The code is not meant to be a solution to the OP's question.
When I tried to run I was getting an out of bounds error immediately. So I did the record count, thinking to size it in one step. However I'm not sure where or how to do that.
GetRecordCount returns a value, namely BlockSize (see my previous example, reply 29). You must use that to set the length of the array. Please try the following function to load the array:
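The function that followed is not reproduced here; a minimal sketch of such a loader, assuming the GetRecordCount function from reply 29 (the TStringArr type name is an assumption):

```pascal
type
  TStringArr = array of string;

function LoadRecords(const AFileName: string): TStringArr;
var
  DataFile: TextFile;
  I, N: Integer;
begin
  N := GetRecordCount(AFileName);   // size the array once, to the exact count
  SetLength(Result, N);
  AssignFile(DataFile, AFileName);
  Reset(DataFile);
  try
    I := 0;
    while (not Eof(DataFile)) and (I < N) do
    begin
      ReadLn(DataFile, Result[I]);  // no per-line SetLength, no out-of-bounds
      Inc(I);
    end;
  finally
    CloseFile(DataFile);
  end;
end;
```

Because the array is sized to the counted record number before loading, writes can never run past the end of the array.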
The line PData = ^TData; What does the ^ do? Have to run that down [...]
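Briefly: in a type declaration, ^ makes a pointer type; after a pointer variable, ^ dereferences it. A minimal illustration (names are made up for the example):

```pascal
program PointerDemo;
{$mode objfpc}{$H+}

type
  TData = record
    ID: Integer;
  end;
  PData = ^TData;   // PData means "pointer to a TData record"

var
  D: TData;
  P: PData;
begin
  D.ID := 42;
  P := @D;          // @ takes the address of D
  WriteLn(P^.ID);   // P^ dereferences the pointer: prints 42
end.
```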
As it turns out, with the free edition of Dropbox you would only be able to view, no download or copy.
You've got a nice son.
i7 GTX 1080 32 GB RAM 1.5 TB SSD 3 TB HDD => very expensive >:D
@JLWest
As it turns out, with the free edition of Dropbox you would only be able to view, no download or copy.
There are lots of Dropbox alternatives:
https://en.wikipedia.org/wiki/Comparison_of_online_backup_services
Not all services in the list above are free, pick carefully.
Yea, I found that out and dropped Dropbox.
I'll give it a go and try to get it on the web somewhere. It will take me a few days.
If you have a problem choosing one, I recommend Box.com. I once used its free plan for sharing a 30 MB file; it's easy and worked well.
If you still have problems, you can contact me. I can temporarily host your file on my blog. I have a reseller web hosting account.
This is probably going to take me a couple of days.
:o Don't forget that the dat file is mostly text; archived it should be under 100 MB. You can even send it to me via mail.
@wp
Loading only the filtered data into memory will make it more difficult to write a valid file back after editing some records. In order to save an edited record you must read the original file line by line and write each line immediately to the new file; when the line which was edited is reached, the modified content must be written to the new file instead of the original line.
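A minimal sketch of that save-back scheme for a single edited line (procedure and parameter names are assumptions; a real editor would track a set of edited records):

```pascal
procedure SaveEdited(const SrcName, DstName: string;
                     EditedIndex: Integer; const EditedText: string);
var
  Src, Dst: TextFile;
  Line: string;
  I: Integer;
begin
  AssignFile(Src, SrcName);
  AssignFile(Dst, DstName);
  Reset(Src);
  Rewrite(Dst);
  try
    I := 0;
    while not Eof(Src) do
    begin
      ReadLn(Src, Line);
      if I = EditedIndex then
        WriteLn(Dst, EditedText)   // substitute the modified record
      else
        WriteLn(Dst, Line);        // copy the original line unchanged
      Inc(I);
    end;
  finally
    CloseFile(Src);
    CloseFile(Dst);
  end;
end;
```

Only one line of the file is ever held in memory, so this works no matter how large the file is.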
You opened Pandora's box. :D Now OP will load the big file in VTV, which of course will work, VTV easily can handle 10M records. Still wrong approach in my opinion, the array must be filtered first. He mentioned somewhere that after filtering, only a few thousand item remains, this can be easily shown in any control.
Yes, Correct to that point in your narrative.
But the lines that need to be edited come in pairs of text lines: a 1301 and a corresponding 1302 text line. The 1301 is the gate number (Gate A-12) and the 1302 is the gate specifications: Cargo, Airline, Military, A gate, Tie-down, and who owns the right to park at the gate (SWA, DAL, AAL). Often the gates are co-leased in large airports.
There are 35,000+ airports, seaplane bases and helipads. The only way I would feel comfortable with this approach is to break each out into a separate file: F00001.txt, F00002.txt . . . F35187.txt.
And it's maybe the way to go.
Laminar Research's X-Plane reads the file to verify integrity when you start the program. If it's wrong they give a corrupted-file message and exit to the Desktop.
Breaking the large file into separate files will make updating and maintaining your program rather cumbersome. If you do that, every time there is a new version of the file you'll have to break it into separate files for your program to use them. That's a headache you don't want.
<snip>
There are 35,000+ airports, seaplane bases and helipads. The only way I would feel comfortable with this approach is to break each out into a separate file: F00001.txt, F00002.txt . . . F35187.txt.
using a component such as VTV (which wp used)
I think it's worth putting this right: I did not use VTV - which is hard to learn - but a plain old TListView. Using virtual mode in TListView is just three simple actions:
Thank you for clarifying that. The comments about VTV after the example you posted left me with that mistaken impression. Virtual mode is definitely the way to go which, I believe, was the point you were making with the example you posted.
using a component such as VTV (which wp used)
I think it's worth putting this right: I did not use VTV - which is hard to learn - but a plain old TListView. Using virtual mode in TListView is just three simple actions:
- Activate virtual mode (ListView.OwnerData := true)
- Define how many items are contained in the List (ListView.Items.Count := <number>);
- Write an event handler OnData to populate the TListItem passed as a parameter with the strings to be displayed
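Put together, the three steps look like this (the form name, ListView1 and the FLines array are assumptions for the sketch):

```pascal
procedure TForm1.FormCreate(Sender: TObject);
begin
  ListView1.OwnerData := True;              // 1. activate virtual mode
  ListView1.Items.Count := Length(FLines);  // 2. report how many items exist
end;

procedure TForm1.ListView1Data(Sender: TObject; Item: TListItem);
begin
  // 3. OnData handler: fill in only the rows the control actually asks for
  Item.Caption := FLines[Item.Index];
end;
```

The control never stores the 10M strings itself; it requests rows on demand as the user scrolls, which is why virtual mode stays fast regardless of the record count.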
@JLWest
I couldn't figure out how to attach a file. So I tried to abandon the email.
I received an empty mail, no attachment.
PS: If you have a gmail account you can add the file to the google drive then paste the link here.
My demo program of reply #30 loads and displays all records of apt.dat within 10 seconds. Just change the const FILENAME near line 50 to the path to apt.dat.
Is the last record '99'?
The (0-based) index of the last record is 7,970,311 and the first number (I call it "ID") is 99.