
Author Topic: Simple file with only one single record, but of HUGE size  (Read 9857 times)

wp

  • Hero Member
  • *****
  • Posts: 11858
Re: Simple file with only one single record, but of HUGE size
« Reply #15 on: December 12, 2018, 11:54:59 am »
The file will be filled sequentially character per character, no other choice.
Who made this insane requirement? Why don't you write to a buffer first and write that to disk only when it is full?

Look at the following code which writes 1 billion random characters ('A'..'Z') to disk. The first test measures the time for the loop and the creation of that many characters - it takes 9 seconds on my machine. The second test collects up to 10 million characters in a buffer - this takes 15 sec. But writing character by character takes almost 3 minutes. And reading the discussion I recognize that 1 GB may not be that big for you...
Code: Pascal
program Project1;

uses
  SysUtils;

const
  MILLION = 1000*1000;
  N = 1000*MILLION;
  BUFFER_SIZE = 10*MILLION;

var
  F: File of char;
  Ft: TextFile;
  i, j: Int64;
  ch: Char;
  buffer: array[0..BUFFER_SIZE-1] of char;
  t: TDateTime;

begin
  WriteLn('Creating a file of ', N, ' bytes...');

  { Test 1: only generate the random characters, no file I/O at all }
  t := now;
  i := 0;
  while i < N do begin
    ch := char(ord('A') + random(26));
    inc(i);
  end;
  WriteLn('Test 1 (loop and random data only): ', FormatDateTime('n:ss.zzz', now-t));

  { Test 2: collect the characters in a buffer and write it with BlockWrite }
  t := now;
  AssignFile(F, 'test-file.txt');
  Rewrite(F, 1);
  i := 0;
  j := 0;
  while i < N do begin
    ch := char(ord('A') + random(26));
    if j < BUFFER_SIZE then begin
      buffer[j] := ch;
      inc(j);
    end else begin
      BlockWrite(F, buffer, BUFFER_SIZE);
      buffer[0] := ch;  // keep the character that triggered the flush
      j := 1;
    end;
    inc(i);
  end;
  if j > 0 then BlockWrite(F, buffer, j);
  CloseFile(F);
  WriteLn('Test 2 (block): ', FormatDateTime('n:ss.zzz', now-t));

  { Test 3: write every single character individually }
  t := now;
  AssignFile(Ft, 'test-file.txt');
  Rewrite(Ft);
  i := 0;
  while i < N do begin
    Write(Ft, char(ord('A') + random(26)));
    inc(i);
  end;
  CloseFile(Ft);
  WriteLn('Test 3 (single bytes): ', FormatDateTime('n:ss.zzz', now-t));

  WriteLn('Press ENTER to close.');
  ReadLn;
end.

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Simple file with only one single record, but of HUGE size
« Reply #16 on: December 12, 2018, 12:41:29 pm »
The file will be filled sequentially character per character, no other choice. When full it will be read sequentially; that reading can be done in chunks, no problem.
There will be no random access.

This is one of the clearest cases for using (buffered) streams I have ever seen. :)
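For instance, a minimal, untested sketch using the FCL's BufStream unit (assuming TWriteBufStream; the file name and the 64 KB buffer size are just examples) could look like this - freeing the buffered stream flushes whatever is still pending:
Code: Pascal
program BufferedStreamSketch;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils, BufStream;
var
  raw: TFileStream;
  buffered: TWriteBufStream;
  i: Integer;
  ch: Char;
begin
  raw := TFileStream.Create('test-stream.txt', fmCreate);
  try
    // put a 64 KB write buffer in front of the real file stream
    buffered := TWriteBufStream.Create(raw, 64 * 1024);
    try
      for i := 1 to 100000 do
      begin
        ch := Chr(Ord('A') + Random(26));
        buffered.WriteBuffer(ch, SizeOf(ch));  // cheap buffered single-character write
      end;
    finally
      buffered.Free;  // flushes the remaining buffer contents into raw
    end;
  finally
    raw.Free;
  end;
end.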
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

Epand

  • New Member
  • *
  • Posts: 25
Re: Simple file with only one single record, but of HUGE size
« Reply #17 on: December 12, 2018, 07:02:25 pm »
But do you need ALL of the file in memory?

No, I don't. The crux of the matter is to gather/retrieve the characters (which will be done character per character) and to safely store them together in one single huge line in the file.



The file will be filled sequentially character per character, no other choice.
Who made this insane requirement? Why don't you write to a buffer first and write that to disk only when it is full?

What may look insane at first glance is not always insane  ;) .
The characters that fill the file are gathered/retrieved character per character, with time intervals between the actions that obtain each single character, and those intervals are neither known nor predictable. The fear is that during those intervals anything can happen; the worst case would be a power failure - although the software may run for weeks, the machine is not connected to an uninterruptible power supply. This whole combination led to the conclusion to fill the file character per character - to be on the safe side, as each gathered character is important.

wp

  • Hero Member
  • *****
  • Posts: 11858
Re: Simple file with only one single record, but of HUGE size
« Reply #18 on: December 12, 2018, 07:38:14 pm »
But then you must close the file after writing each character and reopen it when the next character is to be written. And hope that the OS buffers have already been flushed. In total, this will make it even slower (what is the rate at which the characters arrive?)
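Just to illustrate what that would mean, a minimal (untested) sketch of the close/reopen-per-character approach with plain RTL calls - the file name and the demo loop are arbitrary:
Code: Pascal
program AppendPerChar;
{$mode objfpc}{$H+}
uses
  SysUtils;

procedure AppendChar(const FileName: string; ch: Char);
var
  t: TextFile;
begin
  AssignFile(t, FileName);
  if FileExists(FileName) then
    Append(t)
  else
    Rewrite(t);
  Write(t, ch);
  CloseFile(t);   // closing flushes the RTL text buffer and releases the handle
end;

var
  i: Integer;
begin
  // every character pays the full open/write/close price
  for i := 1 to 100 do
    AppendChar('percharacter.txt', Chr(Ord('A') + Random(26)));
end.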

I think it would be much better to invest in a UPS.
« Last Edit: December 12, 2018, 07:42:12 pm by wp »

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: Simple file with only one single record, but of HUGE size
« Reply #19 on: December 12, 2018, 08:33:19 pm »
But then you must close the file after writing each character and reopen it when the next character is to be written. And hope that the OS buffers have already been flushed. In total, this will make it even slower (what is the rate at which the characters arrive?)

I think it would be much better to invest in a UPS.
I agree, a UPS sounds like a better and more reasonable alternative.

If investing in a UPS isn't in the cards, then he should open the file specifying FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH; that would eliminate the need to either call FlushFileBuffers or close and re-open the file after every character is written to it (both inefficient and time-consuming operations.)
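A bare-bones, untested sketch of that Windows-only idea is below. Note that FILE_FLAG_NO_BUFFERING additionally requires sector-aligned buffer addresses and write sizes, so for single-character writes this sketch uses only FILE_FLAG_WRITE_THROUGH; the file name is arbitrary:
Code: Pascal
program WriteThroughSketch;
{$mode objfpc}{$H+}
uses
  Windows, SysUtils;
var
  h: THandle;
  ch: AnsiChar;
  written: DWORD;
begin
  h := CreateFile('test-writethrough.txt', GENERIC_WRITE, FILE_SHARE_READ, nil,
                  CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL or FILE_FLAG_WRITE_THROUGH, 0);
  if h = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    ch := 'A';
    // each WriteFile call is pushed through the OS cache to the device
    if not WriteFile(h, ch, SizeOf(ch), written, nil) then
      RaiseLastOSError;
  finally
    CloseHandle(h);
  end;
end.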
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

garlar27

  • Hero Member
  • *****
  • Posts: 652
Re: Simple file with only one single record, but of HUGE size
« Reply #20 on: December 12, 2018, 08:49:26 pm »
I think it would be much better to invest in a UPS.
I think the same, since I don't know whether a solid-state drive would be of any help.


The characters which fill the file are gathered/retrieved character per character with time intervals between actions getting each single character, where the intervals are neither known nor predictable. It's the fear that during the mentioned time intervals anything can happen, worst case may be power failure - although the software may run weeks the machine is not connected to an uninterruptible power supply. This whole combination led to the conclusion to fill the file character per character - to be on the safe side as each gathered character is important.

I had to deal with data loss due to power failure. Because of that, you fear losing some chars, but you can actually lose the whole file if it gets corrupted !!!  :o

What we did to keep as much data as possible was to start using SQLite, and it slowed down execution while saving data. SQLite is very robust against power failures because every transaction goes directly to the hard drive, with the penalty that entails (spin up the discs in your HD, move the heads to the location of the file/record, open the file for writing, set the new data, close the file), and though it is robust you can still have some data loss, but you will not have a corrupt file.
I know you are not trying to use a DB, but look at these links: https://www.sqlite.org/about.html and https://www.sqlite.org/transactional.html. Reading those pages might show you what problems you are facing in the "anything can happen" part of your problem.
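If you want to play with that idea, a rough (untested) sketch using FPC's sqldb units might look like this - the database and table names are made up, and committing per character is exactly the slow-but-durable trade-off described above:
Code: Pascal
program SqliteSketch;
{$mode objfpc}{$H+}
uses
  SysUtils, sqldb, sqlite3conn;

procedure StoreChar(conn: TSQLite3Connection; trans: TSQLTransaction; ch: Char);
var
  s: string;
begin
  s := ch;  // single-character payload
  trans.StartTransaction;
  conn.ExecuteDirect('INSERT INTO chars(c) VALUES (' + QuotedStr(s) + ')');
  trans.Commit;  // the row has been committed to disk once this returns
end;

var
  conn: TSQLite3Connection;
  trans: TSQLTransaction;
begin
  conn := TSQLite3Connection.Create(nil);
  trans := TSQLTransaction.Create(nil);
  try
    conn.DatabaseName := 'chars.db';
    conn.Transaction := trans;
    conn.Open;
    trans.StartTransaction;
    conn.ExecuteDirect('CREATE TABLE IF NOT EXISTS chars(c TEXT)');
    trans.Commit;
    // one transaction per incoming character: slow, but power-failure safe
    StoreChar(conn, trans, Chr(Ord('A') + Random(26)));
  finally
    trans.Free;
    conn.Free;
  end;
end.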



It looks like you are getting data from a sensor in a remote place (where no one is around). If not, the problem is basically the same. If the data is generated inside the PC (e.g. random numbers, a simulation or whatever you can imagine) then you will lose something; if it comes from external hardware, and that hardware has a persistent buffer, then maybe you can recover something.

I would go for a partitioned file model like the one described before.
But before making a decision, I think it is important to know when, how and at what rate the data is acquired:
 . How many chars per second? (average bytes per second, minute, hour)
 . Is it sort of "regular"? E.g. more or less every 50 milliseconds.
 . Is it VERY irregular? E.g. hours of low or no activity and then a burst of 50 million chars per second.
Those are just the first questions that come to my mind. The rate at which the data is acquired might help you to choose the best approach.

wp

  • Hero Member
  • *****
  • Posts: 11858
Re: Simple file with only one single record, but of HUGE size
« Reply #21 on: December 12, 2018, 10:10:36 pm »
you can actually lose the whole file if it gets corrupted !!!
Yes, this is another important point. To minimize the effect of damage I'd even give up the idea of a single file and write smaller, separate files. If your precious data were money, you'd also spread the amount among several banks. Having smaller files would also allow you to copy all files except the current one to a backup location.
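A minimal sketch of that "many small files" idea (untested; the chunk size and file-name pattern are arbitrary) - a completed chunk can be backed up while the next one is being written:
Code: Pascal
program ChunkedWriter;
{$mode objfpc}{$H+}
uses
  SysUtils;
const
  CHUNK_SIZE = 4096;        // characters per file, arbitrary for the sketch
var
  f: TextFile;
  chunkNo: Integer = 0;
  inChunk: Integer = 0;

procedure StartNewChunk;
begin
  AssignFile(f, Format('data_%.5d.txt', [chunkNo]));
  Rewrite(f);
  inChunk := 0;
end;

procedure WriteChar(ch: Char);
begin
  Write(f, ch);
  Flush(f);                 // hand the character to the OS right away
  Inc(inChunk);
  if inChunk >= CHUNK_SIZE then
  begin
    CloseFile(f);           // this chunk is complete and can be backed up
    Inc(chunkNo);
    StartNewChunk;
  end;
end;

var
  i: Integer;
begin
  StartNewChunk;
  for i := 1 to 10000 do
    WriteChar(Chr(Ord('A') + Random(26)));
  CloseFile(f);
end.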

ASBzone

  • Hero Member
  • *****
  • Posts: 678
  • Automation leads to relaxation...
    • Free Console Utilities for Windows (and a few for Linux) from BrainWaveCC
Re: Simple file with only one single record, but of HUGE size
« Reply #22 on: December 12, 2018, 11:05:24 pm »
you can actually lose the whole file if it gets corrupted !!!
Yes, this is another important point. To minimize the effect of damage I'd even give up the idea of a single file and write smaller, separate files. If your precious data were money, you'd also spread the amount among several banks. Having smaller files would also allow you to copy all files except the current one to a backup location.


And this is why it is vital to provide the WHY of technology requests, and not just the specific HOW...

You'll almost always get a better, more suitable response when people know what you are trying to accomplish...
-ASB: https://www.BrainWaveCC.com/

Lazarus v2.2.7-ada7a90186 / FPC v3.2.3-706-gaadb53e72c
(Windows 64-bit install w/Win32 and Linux/Arm cross-compiles via FpcUpDeluxe on both instances)

My Systems: Windows 10/11 Pro x64 (Current)

Epand

  • New Member
  • *
  • Posts: 25
Re: Simple file with only one single record, but of HUGE size
« Reply #23 on: December 13, 2018, 07:54:09 am »
I had to deal with data loss due to power failure. Because of that, you fear losing some chars, but you can actually lose the whole file if it gets corrupted !!!  :o

Good point. Thanks for putting your finger on that weak spot.

Quote
I know you are not trying to use a DB, but look at these links: https://www.sqlite.org/about.html and https://www.sqlite.org/transactional.html. Reading those pages might show you what problems you are facing in the "anything can happen" part of your problem.

I mentioned "no DB" as I always thought it would be more time-consuming. Looking at your linked SQLite info there is a sub-link, https://www.sqlite.org/fasterthanfs.html, which claims that under certain circumstances an SQLite DB can be faster than the filesystem; I am surprised.

Quote
the rate at which the data is acquired might help you to choose the best approach.

That rate is unknown.
« Last Edit: December 13, 2018, 08:04:42 am by Epand »

damieiro

  • Full Member
  • ***
  • Posts: 200
Re: Simple file with only one single record, but of HUGE size
« Reply #24 on: December 13, 2018, 08:32:29 am »
If I read the requirements correctly:
- Sequential write
- Robust against power failure

For me it's clear that you need some UPS-like hardware capability to save the file and/or to keep the data input alive (if it's some kind of external sensor), or both.

There is no reliable solution without that. A power failure can always happen at the same moment as a disk write and then lose data or damage the file or the drive. And data arriving while the computer is off will not be recorded (the sensor would be OK, but the computer is off).

If not (and not using DB capabilities, as you say), it's better to have files that are each filled for (for example) 10 minutes, written and closed, and then move on to the next file; that way there will be only limited damage (one file with 10 minutes of data, for example), assuming a good "landing" of the storage drive at power-off. You will have a lot of files, but limited damage.

On the other hand, I think you must specify the problem in more detail. It makes no sense that the data rate is unknown, for example. And if the data is critical, then protection against power loss is critical too (like a flight control system).

The current specification is like asking for a car that goes 100 mph without brakes. And the solution (a UPS, for example in this case, or the brakes) is discarded for no reason.

« Last Edit: December 13, 2018, 08:36:02 am by damieiro »

User137

  • Hero Member
  • *****
  • Posts: 1791
    • Nxpascal home
Re: Simple file with only one single record, but of HUGE size
« Reply #25 on: December 13, 2018, 04:25:41 pm »
Filling the huge file, writing into it will be done only character per character, curious, but a must.

A totally different point is that inspecting the content of the single-record file with BlockRead in certain chunks will probably result in a speedup of the inspection.
Yes, it will result in a speedup. And it would also speed things up if you collected the characters you want to write in a memory buffer and wrote them to the file in one batch.
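For completeness, an untested sketch of reading the file back in chunks with BlockRead - the chunk size and the counting of 'A' characters are just placeholders for whatever the real inspection does:
Code: Pascal
program ChunkedReadSketch;
{$mode objfpc}{$H+}
uses
  SysUtils;
const
  CHUNK = 64 * 1024;
var
  f: File;                           // untyped file, record size 1
  buf: array[0..CHUNK-1] of AnsiChar;
  got, i: Integer;
  total, countA: Int64;
begin
  total := 0;
  countA := 0;
  AssignFile(f, 'test-file.txt');
  Reset(f, 1);
  while not Eof(f) do
  begin
    BlockRead(f, buf, CHUNK, got);   // reads up to CHUNK bytes, got = actual count
    for i := 0 to got - 1 do
      if buf[i] = 'A' then           // placeholder for the real per-character inspection
        Inc(countA);
    Inc(total, got);
  end;
  CloseFile(f);
  WriteLn('Read ', total, ' bytes, ', countA, ' of them are ''A''.');
end.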

garlar27

  • Hero Member
  • *****
  • Posts: 652
Re: Simple file with only one single record, but of HUGE size
« Reply #26 on: December 13, 2018, 05:26:44 pm »
Quote
the rate at which the data is acquired might help you to choose the best approach.

That rate is unknown.

Well, prepare yourself for some hard work!

The solution you must implement if the rate is 5 B/s (bytes per second) or lower is VERY different from the one you would implement if the rate could go up to 20 B/s or more.
And if the rate can go up to many MB/s, you probably need a different operating system and/or hardware!!!

The first case is very easy to tackle.

The other two... well you need to know more.

The main cause of system development failure:
   90% bad requirements

If the requirements are excessive you can end up OVER-ENGINEERING: too much work and development time for nothing.
If the requirements are too low you can fall short and produce something totally useless.

Quote
“If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.”

― Sun Tzu, The Art of War

Do a test for yourself; it is a way to get an idea of what you are dealing with. You don't need to know exactly how many B/s you'll receive, but whether it can go beyond a threshold and for how long (remember that HDs have a speed limit; you can buffer the excess data, but system memory is finite AND everything you put in a buffer will vanish on a power-off, hence so many UPS suggestions in the responses).

A dedicated high-rate data-saving app goes beyond my knowledge ATM. But I'm curious about what you will do and how you will tackle the problem.

Good luck!!!

Epand

  • New Member
  • *
  • Posts: 25
Re: Simple file with only one single record, but of HUGE size
« Reply #27 on: December 13, 2018, 10:33:10 pm »
And the solution (a UPS, for example in this case, or the brakes) is discarded for no reason.

Ah, I see; above I used the wrong words. I didn't want to state that the machine will never be connected to a UPS; I should rather have written that at this stage of planning I did not aim for one. Sorry, but English isn't my native tongue.



Well, prepare yourself for some hard work!

Yes, I will do so. I'm sure it will be no walk in the park.

Quote
But I'm curious about what you will do and how you will tackle the problem.

At the moment everything is still at the planning stage; the machine for that setup is not even unpacked yet. Well, after setting it up, and after the planned config has gone through some minor tests (the "final" file size may be huge, but first I have to know that the underlying software is running correctly), after all that and after the first files have been produced with real data, then I'll drop a note here.
I can't promise a fixed date, but it may be in the first half of January.
« Last Edit: December 13, 2018, 10:35:13 pm by Epand »

garlar27

  • Hero Member
  • *****
  • Posts: 652
Re: Simple file with only one single record, but of HUGE size
« Reply #28 on: December 13, 2018, 11:14:22 pm »
Ah, I see; above I used the wrong words. I didn't want to state that the machine will never be connected to a UPS ....
I'm glad to read that!!!
 :D

At the moment everything is still at the planning stage; the machine for that setup is not even unpacked yet. Well, after setting it up, and after the planned config has gone through some minor tests (the "final" file size may be huge, but first I have to know that the underlying software is running correctly), after all that and after the first files have been produced with real data...
Just repeating: that possible/probable power-off IS WHAT COMPLICATES EVERY POSSIBLE SOLUTION. If you can get rid of that problem, everything will be OK (but you won't escape the hard work, though).

... then I'll drop a note here.
I can't promise a fixed date, but it may be in the first half of January.

I hope not to miss your feedback!!
 :)

Epand

  • New Member
  • *
  • Posts: 25
Re: Simple file with only one single record, but of HUGE size
« Reply #29 on: December 14, 2018, 06:51:51 pm »
Look at the following code which writes 1 billion random characters ('A'..'Z') to disk. The first test measures the time for the loop and the creation of that many characters - it takes 9 seconds on my machine. The second test collects up to 10 million characters in a buffer - this takes 15 sec. But writing character by character takes almost 3 minutes. And reading the discussion I recognize that 1 GB may not be that big for you...

Just for fun I ran that code on an old test machine, a notebook:
OS: Windows 7 64-bit
CPU: Intel Core i5 M520 at 2.40 GHz
MEMORY: 4 GB

Your code produced the following output:
Creating a file of 1000000000 bytes...
Test 1 (loop and random data only): 0:10.858
Test 2 (block): 0:16.146
Test 3 (single bytes): 1:33.116


The results look pretty similar except for the single-byte test. My test machine is about 9 years old, still with its original hard disk.
I am surprised that Test 3 took only half the time here. On what kind of machine did you get your results?
« Last Edit: December 14, 2018, 07:17:47 pm by Epand »

 
