Monitor Pre-existing and new files in a directory with bash











I have a script that uses inotify-tools.

This script is notified when a new file arrives in a folder. It performs some work with the file and, when done, moves the file to another folder. It looks something along these lines:



inotifywait -m -e modify "${path}" |
while read NEWFILE
do
    # work on/with NEWFILE
    # move NEWFILE to a new directory
done


By using inotifywait, one can only monitor new files. A similar procedure using for OLDFILE in "${path}"/* instead of inotifywait works for existing files:



for OLDFILE in "${path}"/*
do
    # work on/with OLDFILE
    # move OLDFILE to a new directory
done
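For reference, a runnable version of that existing-files loop might look like the bash sketch below. process_existing, SRC and DEST are illustrative names, and the actual "work on/with" step is left as a placeholder comment. nullglob keeps the loop from running once with a literal "*" when the folder is empty.

```bash
# Bash sketch: process the files already in SRC, moving each one to DEST afterwards.
process_existing() {
    local src=$1 dest=$2 file restore
    restore=$(shopt -p nullglob || true)   # remember the caller's nullglob setting
    shopt -s nullglob                      # an empty SRC gives zero iterations, not a literal '*'
    for file in "$src"/*; do
        [[ -f $file ]] || continue         # skip subdirectories and other non-regular files
        # ... work on/with "$file" here (placeholder) ...
        mv -- "$file" "$dest"/
    done
    eval "$restore"                        # put nullglob back the way it was
}
```

This only covers files present when the glob is expanded, which is exactly the gap the question is about.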


I tried combining the two loops, running the second loop first. But if files arrive quickly and in large numbers, there is a chance that the files will arrive while the second loop is running. These files will then be captured by neither loop.



Given that files already exist in a folder, and that new files will arrive quickly inside the folder, how can one make sure that the script will catch all files?







bash unix inotify inotifywait














edited Nov 22 at 9:49

























asked Nov 22 at 8:56









J.doe













  • Just move the "old" files out before you run your inotifywait script?
    – Red Cricket
    Nov 22 at 9:02






  • @redCricket I think that is what I was doing. The problem is that the files arrive too quickly: given that X files are already in the folder, Y more files arrive while I am moving those, and inotifywait does not detect the Y files.
    – J.doe
    Nov 22 at 9:05












  • Just blow away the whole directory and recreate it beforehand.
    – Red Cricket
    Nov 22 at 9:06






























2 Answers














"By using inotifywait, one can only monitor new files."




I would ask for a definition of a "new file". man inotifywait lists many events, including create, delete and delete_self, and inotifywait can also watch "old files" (defined as files that existed before inotifywait started) as well as directories. You specified only a single event, -e modify, which reports modifications of files within ${path}; that covers both files that existed beforehand and files created after inotifywait started.
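To see what a watch actually reports, it helps to look at inotifywait's default output. As I understand the man page, it is one line per event of the form watched_filename EVENT_NAMES event_filename, with several event names joined by commas (e.g. CLOSE_WRITE,CLOSE). The bash sketch below keeps only the lines whose event list contains a given event. filter_events is a hypothetical helper, and like the loops in this answer it assumes the watched path contains no spaces.

```bash
# Bash sketch: read inotifywait's default output ("watched_path EVENT[,EVENT...] filename")
# on stdin and print the full path of every file whose event list contains EVENT.
# Assumes the watched path itself contains no spaces; file names may.
filter_events() {
    local want=$1 dir events file
    while IFS=' ' read -r dir events file; do
        # Wrap both sides in commas so that e.g. WRITE does not match CLOSE_WRITE.
        case ",$events," in
            *",$want,"*) printf '%s/%s\n' "${dir%/}" "$file" ;;
        esac
    done
}
```

For example, inotifywait -m -e modify -e close_write "${path}" | filter_events CLOSE_WRITE would print only the files that were closed after being written.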




"... how can one make sure that the script will catch all files?"




Your script is enough to catch all the events that happen inside the path. But if there is no synchronization between the part that generates the files and the part that receives them, there is nothing you can do: there will always be a race condition. What if your script gets 0% of the CPU time while the part that generates the files gets 100%? There is no guarantee of CPU time between processes (unless you use a certified real-time system). Implement a synchronization between them.
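One common way to build such a synchronization point into the generating side (a standard technique, not something this answer prescribes) is to write each file under a temporary name in the same directory and rename it to its final name only when it is complete. A rename within one file system is atomic, so a consumer watching moved_to never sees a half-written file under its final name. publish_atomically and the .incoming. prefix are hypothetical, and the consumer would have to ignore names with that prefix.

```bash
# Bash sketch (producer side): write the data under a temporary name in the
# watched directory, then rename it to its final name only once it is complete.
# mv within one file system is a rename(2), so the final name never refers
# to a half-written file.
publish_atomically() {
    local dest_dir=$1 name=$2 tmp
    tmp=$(mktemp "$dest_dir/.incoming.XXXXXX") || return 1
    if ! cat >"$tmp"; then                 # the file body comes from stdin
        rm -f -- "$tmp"
        return 1
    fi
    mv -f -- "$tmp" "$dest_dir/$name"
}
```

Note that mktemp creates the file with mode 0600, so a chmod may be needed if another user has to read it.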



You can watch some other event instead. If the generating side closes each file when it is done with it, watch for the close event (close_write in particular). You could also run the work on/with NEWFILE in the background, in parallel, to speed up execution and keep reading new files. But if the receiving side is slower than the sending side, i.e. if your script processes NEWFILEs more slowly than the generating part creates them, there is nothing you can do.
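Running the work in the background without a limit can start an unbounded number of processes when files arrive in bursts. A minimal bash sketch of a capped version follows; run_bounded is a hypothetical name, it expects one path per line on stdin (for example from find or from inotifywait with a suitable --format), and it needs bash 4.3 or newer for wait -n. The counter can only overestimate the number of running jobs, so the limit is never exceeded.

```bash
# Bash sketch (needs bash >= 4.3 for `wait -n`): read one path per line on stdin
# and run FUNC on each in the background, with at most MAX_JOBS running at once.
run_bounded() {
    local max_jobs=$1 func=$2 file running=0
    while IFS= read -r file; do
        if (( running >= max_jobs )); then
            wait -n                        # block until any one background job exits
            running=$((running - 1))
        fi
        "$func" "$file" &                  # without job control, stdin is /dev/null
        running=$((running + 1))
    done
    wait                                   # let the remaining jobs finish
}
```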



If your filenames contain no special characters or spaces, I would go with:



inotifywait -m -e modify "${path}" |
while IFS=' ' read -r dir event file; do
    lock "${dir}"                          # placeholder for your locking mechanism
    # work on "${dir}/${file}" here
    mv -- "${dir}/${file}" "${new_location}"
    unlock "${dir}"                        # placeholder for your locking mechanism
done


where lock and unlock stand for some locking mechanism shared between your script and the part that generates the files; in other words, you need a channel of communication between the file-creating process and the file-processing process.
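lock and unlock are left abstract above. One minimal way to implement them, assuming the producer and the consumer can agree on a lock path, is a lock directory: mkdir either creates the directory or fails, as one atomic step, so at most one process holds the lock. acquire_lock, release_lock and the lock path are hypothetical names. A lock directory stays behind if its holder dies, so flock(1) from util-linux, which releases the lock when its file descriptor is closed, is a sturdier choice where it is available.

```bash
# Bash sketch of hypothetical lock/unlock helpers based on an atomic mkdir.
acquire_lock() {
    local lockdir=$1 tries=${2:-50}
    until mkdir -- "$lockdir" 2>/dev/null; do
        tries=$((tries - 1))
        (( tries > 0 )) || return 1        # give up after the last attempt
        sleep 0.1                          # fractional sleep: GNU coreutils and most others
    done
}

release_lock() {
    rmdir -- "$1"
}
```

Both sides would wrap their work in acquire_lock "$lockdir" ... release_lock "$lockdir" with the same lock path.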



I think you could also use a transactional file system that lets you "lock" a directory against other scripts until you are done working on it, but I have no experience in that field.




"I tried combining the two loops. But if files arrive quickly and in large numbers, there is a chance that the files will arrive while the second loop is running."




Run process_new_files_loop in the background before running process_old_files_loop. It would also be good to make sure (i.e. synchronize) that inotifywait has successfully started before you continue to the existing-files loop, so that there is no race condition between the two either.



Maybe a simple example and/or starting point would be:



work() {
    local file="$1"
    # Both loops can report the same file; skip it if it has already been moved.
    [[ -e "$file" ]] || return 0
    # some work "$file"   (placeholder for the real processing)
    mv -- "$file" "$predefined_path"
}

process_new_files_loop() {
    # let's work on modified files in parallel, so that it is faster
    trap 'wait' INT
    inotifywait -m -e modify "${path}" |
    while IFS=' ' read -r dir event file; do
        work "${dir}/${file}" &
    done
}

process_old_files_loop() {
    # maybe we should process in parallel here too? e.g.
    # export -f work; export predefined_path
    # find "${path}" -type f -print0 | xargs -0 -P0 -n1 bash -c 'work "$1"' _
    find "${path}" -type f |
    while IFS= read -r file; do
        work "${file}"
    done
}

process_new_files_loop &
child=$!

# crude synchronization: give inotifywait a moment to set up its watches
sleep 1

if ! ps -p "$child" >/dev/null 2>&1; then
    echo "ERROR running process_new_files_loop" >&2
    exit 1
fi
process_old_files_loop
wait # wait for process_new_files_loop
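The sleep 1 in the example above only gives inotifywait a head start; it does not prove that the watches exist. A slightly stronger handshake, sketched below under the assumption that the background loop's stderr is redirected to a log file, is to poll that file for the Watches established. line that inotifywait prints once it is ready. wait_for_line and the log file are hypothetical; grep -qxF matches the whole line literally.

```bash
# Bash sketch: poll FILE until a line exactly equal to LINE appears in it,
# trying at most TRIES times, 0.1 s apart. Returns 1 on timeout.
wait_for_line() {
    local file=$1 line=$2 tries=$3
    while (( tries > 0 )); do
        grep -qxF -- "$line" "$file" 2>/dev/null && return 0
        sleep 0.1
        tries=$((tries - 1))
    done
    return 1
}

# Usage idea with the loops above (all stderr then goes to the log):
#   log=$(mktemp)
#   process_new_files_loop 2>"$log" &
#   wait_for_line "$log" 'Watches established.' 50 || exit 1
#   process_old_files_loop
```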


If you really care about execution speed and want this to run faster, switch to Python or C (or anything but shell). Bash is not fast: it is a shell, best used to interconnect processes (passing the stdout of one to the stdin of another). Parsing a stream line by line with while IFS= read -r line is very slow in bash (when reading from a pipe, read consumes one byte at a time so that it never reads past the newline) and should generally be a last resort. Using xargs, e.g. xargs -P0 -n1 sh -c 'work on "$1"; mv "$1" "$new_location"' -- (single quotes, so that the inner sh expands $1; new_location must be exported), or GNU parallel may speed things up, but an average Python or C program will probably still be many times faster.
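Following the xargs idea (and the export -f hint in process_old_files_loop), here is a bash sketch that handles the existing files with a bounded number of parallel workers. work, DEST_DIR and process_existing_parallel are hypothetical names. -maxdepth and -r are GNU options, and -print0 with -0 keeps filenames containing spaces intact.

```bash
# Hypothetical worker: replace the body with the real "work on/with" step.
work() {
    local file=$1
    [[ -e $file ]] || return 0             # already handled by another worker or loop
    mv -- "$file" "$DEST_DIR"/
}

# Bash sketch: process the files directly inside SRC with up to JOBS workers.
process_existing_parallel() {
    local src=$1 jobs=${2:-4}
    [[ -n ${DEST_DIR-} ]] || { echo "DEST_DIR is not set" >&2; return 1; }
    export -f work                         # make the function visible to child bash processes
    export DEST_DIR                        # and the destination it uses
    find "$src" -maxdepth 1 -type f -print0 |
        xargs -0 -r -n1 -P "$jobs" bash -c 'work "$1"' _
}
```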









    accepted




















    Once inotifywait is up and waiting, it will print the message Watches established. to standard error. So you need to go through existing files after that point.



    So, one approach is to write something that will process standard error and, when it sees that message, list all the existing files. You can wrap that functionality in a function for convenience:



    function list-existing-and-follow-modify() {
        local path="$1"
        inotifywait --monitor \
                    --event modify \
                    --format %f \
                    -- \
                    "$path" \
          2> >( while IFS= read -r line ; do
                  printf '%s\n' "$line" >&2
                  if [[ "$line" = 'Watches established.' ]] ; then
                    for file in "$path"/* ; do
                      if [[ -e "$file" ]] ; then
                        basename "$file"
                      fi
                    done
                    break
                  fi
                done
                cat >&2
              )
    }


    and then write:



    list-existing-and-follow-modify "$path" \
      | while IFS= read -r file ; do
          # ... work on/with "$path/$file"
          # move "$path/$file" to a new directory
        done
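To try the mechanism without inotifywait, here is the same stderr-tapping pattern with the command and the marker line turned into parameters. run_then_list is a hypothetical name (and the test uses a stand-in command in place of inotifywait); the logic mirrors the function above.

```bash
# Bash sketch of the same pattern: run CMD, forward its stderr, and as soon as
# CMD prints MARKER on stderr, list the names already in DIR on CMD's stdout.
# Usage: run_then_list MARKER DIR CMD [ARGS...]
run_then_list() {
    local marker=$1 dir=$2
    shift 2
    "$@" 2> >(
        while IFS= read -r line; do
            printf '%s\n' "$line" >&2      # pass CMD's stderr through unchanged
            if [[ $line == "$marker" ]]; then
                for f in "$dir"/*; do      # CMD is now ready: list what already exists
                    if [[ -e $f ]]; then
                        printf '%s\n' "${f##*/}"
                    fi
                done
                break
            fi
        done
        cat >&2                            # keep forwarding any later stderr
    )
}
```

The process substitution inherits the function's stdout, which is why the listed names and CMD's own output end up in the same stream.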


    Notes:




    • If you're not familiar with the >(...) notation that I used, it's called "process substitution"; see https://www.gnu.org/software/bash/manual/bash.html#Process-Substitution for details.

    • The above will now have the opposite race condition from your original one: if a file is created shortly after inotifywait starts up, then list-existing-and-follow-modify may list it twice. But you can easily handle that inside your while-loop by using if [[ -e "$path/$file" ]] to make sure the file still exists before you operate on it (the names are printed relative to $path, because of --format %f).

    • I'm a bit skeptical that your inotifywait options are really quite what you want; modify, in particular, seems like the wrong event. But I'm sure you can adjust them as needed. The only change I've made above, other than switching to long options for clarity/explicitness and adding -- for robustness, is to add --format %f so that you get the filenames without extraneous details.

    • There doesn't seem to be any way to tell inotifywait to use a separator other than newlines, so I just rolled with that. Make sure to avoid filenames that include newlines.
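Following the notes on duplicate listings and bare names, a consuming loop that tolerates a name arriving twice might look like this bash sketch. consume, SRC and DEST are hypothetical names, and the names it reads are relative to SRC because of --format %f.

```bash
# Bash sketch of the consuming loop: reads bare file names on stdin, skips
# names that were already handled, and moves each remaining file to DEST.
consume() {
    local src=$1 dest=$2 name
    while IFS= read -r name; do
        # A file created just after inotifywait started can be reported twice:
        # once by the startup listing and once as an event. If it is gone,
        # an earlier iteration already moved it.
        [[ -e $src/$name ]] || continue
        # ... work on/with "$src/$name" here (placeholder) ...
        mv -- "$src/$name" "$dest"/
    done
}
```

It would be used as list-existing-and-follow-modify "$path" | consume "$path" "$done_dir", where done_dir is a hypothetical destination.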















    edited Nov 26 at 16:47

























    answered Nov 22 at 18:58









    ruakh













    • Hi. Thank you. Processing standard error seems to work. I do not see the point of cat. However, listening to standard error and then looping over the existing files seems to work.
      – J.doe
      Nov 26 at 8:37












    • @J.doe: The point of cat is to continue forwarding inotifywait's standard error to list-existing-and-follow-modify's standard error even after we've processed the Watches established. message. (Without that, any subsequent error- or warning-messages would get silently dropped.)
      – ruakh
      Nov 26 at 16:46










    • I have tested your code, and indeed it solves the problem. However, running several processes that read from the same folder makes every process read and process the same files. But your answer does answer my initial question.
      – J.doe
      Nov 27 at 8:44


















    • hi. Thank you. Processing stadrard error seems to work. I do not see the point of cat. However, listeting to standard error, en then loop over existing files seems to work.
      – J.doe
      Nov 26 at 8:37












    • @J.doe: The point of cat is to continue forwarding inotifywait's standard error to list-existing-and-follow-modify's standard error even after we've processed the Watches established. message. (Without that, any subsequent error- or warning-messages would get silently dropped.)
      – ruakh
      Nov 26 at 16:46










    • I have tested your code, and indeed it solves the problem. However, running everal processes, reading from the same folder, makes every process read and process the same files. But your answer do answer my initial question.
      – J.doe
      Nov 27 at 8:44
















    hi. Thank you. Processing stadrard error seems to work. I do not see the point of cat. However, listeting to standard error, en then loop over existing files seems to work.
    – J.doe
    Nov 26 at 8:37






    hi. Thank you. Processing stadrard error seems to work. I do not see the point of cat. However, listeting to standard error, en then loop over existing files seems to work.
    – J.doe
    Nov 26 at 8:37














    @J.doe: The point of cat is to continue forwarding inotifywait's standard error to list-existing-and-follow-modify's standard error even after we've processed the Watches established. message. (Without that, any subsequent error- or warning-messages would get silently dropped.)
    – ruakh
    Nov 26 at 16:46




    @J.doe: The point of cat is to continue forwarding inotifywait's standard error to list-existing-and-follow-modify's standard error even after we've processed the Watches established. message. (Without that, any subsequent error- or warning-messages would get silently dropped.)
    – ruakh
    Nov 26 at 16:46












    I have tested your code, and indeed it solves the problem. However, running everal processes, reading from the same folder, makes every process read and process the same files. But your answer do answer my initial question.
    – J.doe
    Nov 27 at 8:44




    I have tested your code, and indeed it solves the problem. However, running everal processes, reading from the same folder, makes every process read and process the same files. But your answer do answer my initial question.
    – J.doe
    Nov 27 at 8:44












    up vote
    1
    down vote














    By using inotifywait, one can only monitor new files.




    I would ask for a definition of a "new file". The man inotifywait specifies a list of events, which also lists events like create and delete and delete_self and inotifywait can also watch "old files" (beeing defined as files existing prior to inotifywait execution) and directories. You specified only a single event -e modify which notifies about modification of files within ${path}, it includes modification of both preexisting files and created after inotify execution.




    ... how can one make sure that the script will catch all files?




    Your script is just enough to catch all the events that happen inside the path. If you have no means of synchronization between the part that generates files and the part that receives, there is nothing you can do and there always be a race condition. What if you script receives 0% of CPU time and the part that generates the files will get 100% of CPU time? There is no guarantee of cpu time between processes (unless using certified real time system...). Implement a synchronization between them.



    You can watch for a different event. If the generating side closes each file when it is done with it, watch for the close_write event. You could also run the work on NEWFILE in the background, in parallel, to speed up processing and keep reading new events. But if the receiving side is slower than the sending side, i.e. if your script processes NEWFILEs more slowly than the other part generates them, there is nothing you can do...
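A minimal sketch of the close_write variant, assuming the producer closes each file once it has finished writing it (or renames a finished file into the directory, which inotifywait reports as moved_to). `--format '%w%f'` makes inotifywait print the full path of each file on its own line, and each path is handed to a handler in the background. The names `dispatch`, `watch_dir` and `handle_file` are hypothetical, and file names containing newlines are not supported by this line-based reading.

```bash
#!/usr/bin/env bash

# Hypothetical per-file handler: replace with your real processing.
handle_file() {
    printf 'processing %s\n' "$1"
}

# Read one full path per line from stdin and handle each in the background.
dispatch() {
    local file
    while IFS= read -r file; do
        handle_file "$file" &
    done
    wait    # when the input ends, wait for the background handlers
}

# Watch a directory and emit a path only once a writer has closed it
# (close_write) or a finished file has been moved in (moved_to).
watch_dir() {
    inotifywait -m -q -e close_write -e moved_to --format '%w%f' "$1" | dispatch
}

# Usage (runs until interrupted):
# watch_dir /path/to/incoming
```

Splitting the reading loop (`dispatch`) from the watcher keeps the loop testable by feeding it paths on stdin without inotifywait.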



    If there are no special characters or spaces in your file names, I would go with:



    inotifywait -m -e modify "${path}" |
    while IFS=' ' read -r dir event file; do
        lock "${dir}"                            # your synchronization with the producer
        work "${dir}/${file}"                    # process the file...
        mv "${dir}/${file}" "${new_location}"    # ...e.g. then move it away
        unlock "${dir}"
    done


    where lock and unlock are some locking mechanism implemented between your script and the generating part. In other words, set up communication between the process that creates the files and the process that processes them.
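One possible way to implement `lock`/`unlock`, assuming the producer and this script run on the same machine and can agree on a lock file: a sketch using `flock(1)` from util-linux. The lock file name (`<directory>.lock`, placed next to the watched directory so that opening it does not generate events inside the directory) and file descriptor 9 are arbitrary choices for illustration, not part of the original answer.

```bash
#!/usr/bin/env bash

# lock DIR: take an exclusive lock shared with the producer.
# The lock file sits next to DIR (not inside it) so that it generates no
# inotify events in the watched directory. fd 9 is an arbitrary choice.
lock() {
    exec 9>"${1%/}.lock" || return 1
    flock -x 9               # blocks until the producer releases the lock
}

# unlock: release the lock and close the descriptor.
unlock() {
    flock -u 9
    exec 9>&-
}

# The producer must take the same lock while it writes a file, e.g.
# (generate_file is hypothetical):
#   ( flock -x 9; generate_file >"$dir/new_file" ) 9>"${dir%/}.lock"
```

Because the lock is held for the whole body of the loop, the producer waits while a file is being processed; that is the trade-off of this simple scheme.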



    I think you could use a transactional file system, which would let you "lock" a directory against other scripts until you are done working on it, but I have no experience in that field.




    I tried combining the two loops. But if files arrive quickly and in large numbers there is a chance that the files will arrive while the second loop is running.




    Run process_new_files_loop in the background before running process_old_files_loop. It would also be good to make sure (i.e. synchronize) that inotifywait has started successfully before you continue to the loop over existing files, so that there is no race condition between them either.



    A simple example and/or starting point could be:



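    work() {
        local file="$1"
        # a file created after the watch starts may be seen by both loops
        [ -e "$file" ] || return 0
        some_work "$file"    # placeholder for your actual processing
        mv -- "$file" "$predefined_path"
    }

    process_new_files_loop() {
        # let's work on modified files in parallel, so that it is faster
        # (consider -e close_write instead, so a file is handled once it is complete)
        trap 'wait' INT
        inotifywait -m -e modify "${path}" |
        while IFS=' ' read -r dir event file; do
            work "${dir}/${file}" &
        done
    }

    process_old_files_loop() {
        # maybe we should process in parallel here too? e.g.:
        #   export -f work (and the functions it calls); export predefined_path
        #   find "${path}" -maxdepth 1 -type f -print0 | xargs -0 -P0 -n1 bash -c 'work "$1"' _

        # -maxdepth 1: inotifywait (without -r) only watches the top-level directory
        find "${path}" -maxdepth 1 -type f |
        while IFS= read -r file; do
            work "${file}"
        done
    }

    process_new_files_loop &
    child=$!

    sleep 1    # crude: give inotifywait time to set up its watch

    if ! ps -p "$child" >/dev/null 2>&1; then
        echo "ERROR running process_new_files_loop" >&2
        exit 1
    fi
    process_old_files_loop
    wait # wait for process_new_files_loop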
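The `sleep 1` above only guesses that inotifywait is ready. A sketch of a more deterministic check, following the idea from the comments: unless `-q` is given, inotifywait prints `Setting up watches.` and then `Watches established.` on standard error, so the script can read that stream through a FIFO until the message appears, then keep forwarding later messages with `cat`. `start_watcher`, `handle_path` and the global `watcher_pid` are hypothetical names; the idea is to call process_old_files_loop only after `start_watcher` returns successfully.

```bash
#!/usr/bin/env bash

# Hypothetical per-file handler: replace with your real processing.
handle_path() { printf 'new file: %s\n' "$1"; }

# start_watcher DIR: start inotifywait in the background and return 0 once its
# watches are in place, or 1 if it exits before reporting that.
start_watcher() {
    local dir=$1 fifo line
    fifo=$(mktemp -u) || return 1
    mkfifo -- "$fifo" || return 1

    # Only inotifywait's stderr goes into the FIFO; stdout feeds the loop.
    inotifywait -m -e close_write -e moved_to --format '%w%f' "$dir" 2>"$fifo" |
        while IFS= read -r f; do handle_path "$f"; done &
    watcher_pid=$!   # global: the background job

    exec 3<"$fifo"   # blocks until inotifywait's side has opened the FIFO
    rm -f -- "$fifo" # the open descriptors keep the FIFO usable

    while IFS= read -r line <&3; do
        if [[ $line == 'Watches established.' ]]; then
            # keep forwarding later warnings/errors so they are not lost
            cat <&3 >&2 &
            exec 3<&-
            return 0
        fi
        printf '%s\n' "$line" >&2
    done
    exec 3<&-        # EOF before the message: inotifywait failed to start
    return 1
}

# Usage:
# start_watcher "$path" || exit 1
# process_old_files_loop
# wait "$watcher_pid"
```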


    If you really care about execution speed and want it faster, switch to Python or C (or anything but shell). Bash is not fast; it is a shell and should be used to interconnect processes (passing the stdout of one to the stdin of another). Parsing a stream line by line with while IFS= read -r line is extremely slow in bash and should generally be used as a last resort. Using xargs, as in xargs -P0 -n1 sh -c 'work on "$1"; mv "$1" "$path"' _ (with path exported), or GNU parallel may speed things up, but an average Python or C program will probably be many times faster.
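A sketch of the xargs idea for the files that already exist, assuming GNU find and xargs (`-maxdepth`, `-print0`, `-0`, `-r` and `-P` are not all POSIX). NUL-separated names keep file names with spaces or newlines intact, and the inline `bash -c` script stands in for the real work. `process_existing` and its arguments are hypothetical.

```bash
#!/usr/bin/env bash

# process_existing SRC DEST [JOBS]: handle every regular file directly in SRC,
# up to JOBS files at a time (default 4), moving each into DEST when done.
process_existing() {
    local src=$1 dest=$2 jobs=${3:-4}
    # -print0 / -0: NUL-separated names survive spaces and newlines.
    # -n1: one file per bash process; -P: how many run in parallel;
    # -r: do not run anything when SRC has no files.
    # xargs appends the file name after "_ $dest", so $1=dest and $2=file.
    find "$src" -maxdepth 1 -type f -print0 |
        xargs -0 -r -n1 -P "$jobs" bash -c '
            dest=$1 file=$2
            # ... the real work on "$file" would go here ...
            mv -- "$file" "$dest/"
        ' _ "$dest"
}

# Usage:
# process_existing /path/to/incoming /path/to/done 8
```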
















        answered Nov 22 at 10:26









        Kamil Cuk
