
Author Topic: File Search is there a way to speed up ?  (Read 5454 times)

TomTom

  • Full Member
  • ***
  • Posts: 170
File Search is there a way to speed up ?
« on: March 16, 2018, 03:54:24 pm »
Hi.
Currently I'm using the cyFileSearch component from the Cindy package. It works just fine, except it's quite slow :(.
I'm wondering what I can do to speed up this process. I need to search a large number of files (300k+).
Will using threads help with my problem? Or will it just make my program more responsive while searching for files?



marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: File Search is there a way to speed up ?
« Reply #1 on: March 16, 2018, 04:05:49 pm »
File searching is slow on Windows. Worse, you can't even move the files to a separate volume and tweak mount options, since NTFS options are global.

Just going through 1 million files in 30000 dirs takes up to half an hour on a 7200rpm platter.

File searching is a bit faster on Linux, and of course an SSD might also help (still a bit too pricey for what I use this for, though).

Thaddy

  • Hero Member
  • *****
  • Posts: 14210
  • Probably until I exterminate Putin.
Re: File Search is there a way to speed up ?
« Reply #2 on: March 16, 2018, 04:44:04 pm »
Quote
Will using threads help with my problem? Or will it just make my program more responsive while searching for files?
Not really: file access is sequential. Some *really* modern drivers support it, though. For example, first search for directories and subsequently start a thread that finds the files.
But if that were the general case, the OS would have supported it....
Better to look at the algorithm: is it recursive? Replace it with a stack-based algorithm (usually faster, less memory-intensive and safer). Collect the files before you output them, etc.
Specialize a type, not a var.
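
Editor's sketch of the stack-based approach suggested above, under stated assumptions: it uses only FindFirst/FindNext from SysUtils, keeps pending directories in a TStringList used as a stack, and collects all paths before printing anything. The names CollectFiles and Pending are made up for illustration and do not come from any project in this thread. It does not skip links or reparse points.

```pascal
program StackWalk;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

// Collects the full paths of all files below Root into Files,
// using an explicit stack of pending directories instead of recursion.
// CollectFiles and Pending are hypothetical names for this sketch.
procedure CollectFiles(const Root: string; Files: TStrings);
var
  Pending: TStringList;   // used as a LIFO stack of directories
  Dir: string;
  Info: TSearchRec;
begin
  Pending := TStringList.Create;
  try
    Pending.Add(IncludeTrailingPathDelimiter(Root));
    while Pending.Count > 0 do
    begin
      // Pop the most recently pushed directory.
      Dir := Pending[Pending.Count - 1];
      Pending.Delete(Pending.Count - 1);

      if FindFirst(Dir + AllFilesMask, faAnyFile, Info) = 0 then
      try
        repeat
          // Skip the self and parent entries.
          if (Info.Name = '.') or (Info.Name = '..') then
            Continue;
          if (Info.Attr and faDirectory) <> 0 then
            Pending.Add(Dir + Info.Name + PathDelim)  // visit later
          else
            Files.Add(Dir + Info.Name);               // collect, no output yet
        until FindNext(Info) <> 0;
      finally
        FindClose(Info);
      end;
    end;
  finally
    Pending.Free;
  end;
end;

var
  Files: TStringList;
begin
  Files := TStringList.Create;
  try
    // Start directory: first command-line argument, else the current dir.
    if ParamCount > 0 then
      CollectFiles(ParamStr(1), Files)
    else
      CollectFiles(GetCurrentDir, Files);
    WriteLn(Files.Count, ' files found');
  finally
    Files.Free;
  end;
end.
```

Whether this is noticeably faster than a recursive version depends on the disk and the OS cache; the main gain is that output happens only once, after the walk.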

balazsszekely

  • Guest
Re: File Search is there a way to speed up ?
« Reply #3 on: March 16, 2018, 05:42:25 pm »
The attached project compares two directories or drives by creating a hash list. It's not exactly what you are looking for, but it is threaded and is fast on my computer; of course, your experience may differ. Please run a few tests...

PS: I only tested on Windows.

TomTom

  • Full Member
  • ***
  • Posts: 170
Re: File Search is there a way to speed up ?
« Reply #4 on: March 16, 2018, 06:34:52 pm »
I found a program written in Free Pascal/Lazarus. It's a file manager called Double Commander https://sourceforge.net/projects/doublecmd/
It searches files REAAALY FAST. Unfortunately I'm not so good at reading other people's code, so I don't know how it works... For example, on the same Windows PC:

My program found 14k files in about 30 sec
Double Commander found those files in 3 sec

Another program written in Object Pascal/Delphi 7, the surely well-known Ant Renamer, is also very fast... How? :D
 

balazsszekely

  • Guest
Re: File Search is there a way to speed up ?
« Reply #5 on: March 16, 2018, 06:49:19 pm »
Did you try my example? (300 000 files in 10 sec). Take a look at the uFileSearch unit. All you have to do is add a file mask and remove the hash part.

ASerge

  • Hero Member
  • *****
  • Posts: 2223
Re: File Search is there a way to speed up ?
« Reply #6 on: March 16, 2018, 06:58:50 pm »
@TomTom, what does "search" mean here: only the name and/or attributes, or the contents of the files?
@GetMem's good example shows that searching by name is fast.
Long file operations are best run in a separate thread, because they are thread-safe.
I recommend that you skip reparse points when searching, because they are used frequently in Windows and failing to skip them can lead to an endless loop.

@GetMem, I have a few questions about the TFileSearch class:
1. Why do we need the variable FDone?
2. Maybe the property NeedToBreak is superfluous, since there is already Terminated?
3. Why do you first search only for names, then only for directories? On Windows, this is a double search. True, the OS caches well, and the repeated search is many times faster.
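
Editor's sketch of the reparse-point advice above. It assumes that your FPC version reports reparse points and symbolic links through the faSymLink attribute bit (marked platform-specific in SysUtils); please verify this on your own system. ShouldDescend is a hypothetical helper name, not part of any library.

```pascal
program SkipReparse;
{$mode objfpc}{$H+}

uses
  SysUtils;

// True when Info is a real subdirectory that is safe to descend into.
// Assumption: the RTL flags reparse points/symlinks with faSymLink.
// ShouldDescend is a hypothetical helper for this sketch.
function ShouldDescend(const Info: TSearchRec): Boolean;
begin
  Result := ((Info.Attr and faDirectory) <> 0)
    and ((Info.Attr and faSymLink) = 0)
    and (Info.Name <> '.') and (Info.Name <> '..');
end;

var
  Info: TSearchRec;
  Dir: string;
begin
  Dir := IncludeTrailingPathDelimiter(GetCurrentDir);
  // Ask for links explicitly so that they are reported, not hidden.
  if FindFirst(Dir + AllFilesMask, faAnyFile or faSymLink, Info) = 0 then
  try
    repeat
      if ShouldDescend(Info) then
        WriteLn('descend: ', Info.Name)
      else if (Info.Attr and faSymLink) <> 0 then
        WriteLn('skip link: ', Info.Name);
    until FindNext(Info) <> 0;
  finally
    FindClose(Info);
  end;
end.
```

In a full directory walk, the same check would replace the plain faDirectory test before a subdirectory is queued.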

TomTom

  • Full Member
  • ***
  • Posts: 170
Re: File Search is there a way to speed up ?
« Reply #7 on: March 16, 2018, 07:04:15 pm »
I'm looking at it right now. I don't know if I'll manage to adapt it for my app. :) I'm kind of a newbie :P

Quote
Did you try my example? (300 000 files in 10 sec). Take a look at the uFileSearch unit. All you have to do is add a file mask and remove the hash part.

@ASerge All I need is to get file paths

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: File Search is there a way to speed up ?
« Reply #8 on: March 16, 2018, 07:19:38 pm »
Note that searching the same files a second time is ALWAYS fast (with simple findfirst/findnext I do a million in 3-4s). Only the first time is slow. (30min)

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: File Search is there a way to speed up ?
« Reply #9 on: March 16, 2018, 07:23:45 pm »
It depends on the OS and the underlying file system. For example, on NTFS disks there is a way to speed things up considerably by using MFTs.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: File Search is there a way to speed up ?
« Reply #10 on: March 16, 2018, 07:41:41 pm »
Quote
It depends on the OS and the underlying file system. For example, on NTFS disks there is a way to speed things up considerably by using MFTs.

Which NTFS disks don't use MFTs ? :-)

TomTom

  • Full Member
  • ***
  • Posts: 170
Re: File Search is there a way to speed up ?
« Reply #11 on: March 16, 2018, 07:54:30 pm »
Hm, I ran some tests... Adding found files to the list while searching slows the whole thing down (10 times slower) :P.
So my real question is... how do I speed up displaying the found files? :P For example in a StringGrid. Threads?

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: File Search is there a way to speed up ?
« Reply #12 on: March 16, 2018, 08:07:50 pm »
Adding to sorted TStringLists gets slower for > 100000 items. I made my own container for that.
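
Editor's sketch of one common workaround, which is not the custom container mentioned above (that one is not shown in the thread): append to an unsorted TStringList while collecting, then call Sort once at the end. The generated file names are made-up stand-ins for real search results.

```pascal
program SortOnce;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

var
  List: TStringList;
  i: Integer;
begin
  List := TStringList.Create;
  try
    // Keep the list unsorted while collecting: each Add is a plain append
    // instead of an insert that has to shift the items behind it.
    List.Sorted := False;
    for i := 200000 downto 1 do
      List.Add(Format('file%.6d.txt', [i]));  // stand-in for found paths

    // Sort once, after everything has been collected.
    List.Sort;
    WriteLn(List.Count, ' items collected and sorted');
  finally
    List.Free;
  end;
end.
```

When the result is then copied into a visual control, wrapping the copy in BeginUpdate/EndUpdate is the usual way to avoid a repaint per row.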

TomTom

  • Full Member
  • ***
  • Posts: 170
Re: File Search is there a way to speed up ?
« Reply #13 on: March 16, 2018, 08:30:39 pm »
Can you give me some hints about making one? Honestly, I don't really understand what this container is.

Quote
Adding to sorted TStringLists gets slower for > 100000 items. I made my own container for that.

balazsszekely

  • Guest
Re: File Search is there a way to speed up ?
« Reply #14 on: March 16, 2018, 08:34:30 pm »
@ASerge
1-2 years ago a forum user (I cannot recall his name) asked for a tool that can compare two folders/drives. He was only interested in files, which is why sub-directories are ignored. I did it in 30 minutes or so, so no wonder it contains bugs.

Quote
1. Why do we need the variable FDone?
2. Maybe the property NeedToBreak is superfluous, since there is already Terminated?
Usually, when one or more threads need to be terminated immediately, for example when the application closes, users prefer the following method (especially if they come from a Delphi/Windows background):
Code: Pascal
  //...
  Thread.Terminate;
  Thread.WaitFor;
  //...
In my experience this works well on Windows; however, it will cause a deadlock on other OSes like Linux/OSX. I prefer the following:
a. Set FreeOnTerminate to true right after you create the thread.
b. In the Execute method, use a while loop and do the task in small chunks.
c. When the thread needs to be terminated, set a boolean value to true (NeedToBreak, for example), then immediately exit the while loop/Execute method, and the thread will take its natural course. This way you can run a lot of threads without external SIGSEGVs, memory leaks, deadlocks, etc...
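
Editor's sketch of steps a to c above as a small console program. TWorker and WorkerFinished are hypothetical names; the real TFileSearch class is not shown in this thread, and the Sleep call stands in for one small chunk of work. The WorkerFinished flag exists only so that the demo can wait before exiting.

```pascal
program CoopThread;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  SysUtils, Classes;

type
  // Hypothetical worker class for illustration only.
  TWorker = class(TThread)
  private
    FNeedToBreak: Boolean;
  protected
    procedure Execute; override;
  public
    property NeedToBreak: Boolean read FNeedToBreak write FNeedToBreak;
  end;

var
  WorkerFinished: Boolean = False;  // demo only: lets the main program wait

procedure TWorker.Execute;
begin
  // Step b: do the work in small chunks and check the flag in between.
  while not FNeedToBreak do
    Sleep(10);  // stand-in for one small chunk of work
  // Step c: leaving Execute lets the thread end and free itself.
  WorkerFinished := True;
end;

var
  W: TWorker;
begin
  W := TWorker.Create(True);   // create suspended
  W.FreeOnTerminate := True;   // step a: the thread frees itself
  W.Start;

  Sleep(100);                  // let it run for a moment
  W.NeedToBreak := True;       // step c: ask the thread to stop
  // W must not be touched after this point: it may already be freed.

  while not WorkerFinished do
    Sleep(10);
  WriteLn('worker stopped');
end.
```

Because the thread frees itself, no WaitFor is called on it; that is the trade-off of this pattern.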

FDone with FNeedToBreak is indeed redundant, and we can get rid of FDone. Moreover, in this particular case the while loop is also not necessary. I do not understand why I put it there in the first place.  :-[

Quote
3. Why do you first search only for names, then only for directories? On Windows, this is a double search. True, the OS caches well, and the repeated search is many times faster.
You're right, but I doubt your method will be much faster.

 
