Script Getting Stuck While Sorting Files Using Grep

Issue

I'm currently working on a matching script that does a few things:

  1. Takes a list of keywords
  2. For each keyword, searches the directory for grep matches
  3. For each match, copies the file into a "Sorted/{keyword}" directory

The functionality seems fine, apart from a couple of issues:

  1. When I run the script, it seems to get stuck on the first iteration of the loop until I press Ctrl-C. Only then does it print many of the console messages I expected to see throughout the process.
  2. It takes absurdly long to finish. There may be no way around that, but any optimization advice would be greatly appreciated.

A small note: I'm using pdfgrep rather than grep. It seems to work essentially the same way, but I thought it was worth mentioning.

I’m pretty new to scripting, so please feel free to critique and correct.

Thanks!

#!/bin/bash

# Keyword list
keywords=(
    "Keyword1"
    "Keyword2"
    "Keyword3"
);

mkdir "$HOME/Sorted";
echo "Matching list of keywords/phrases ... (${#keywords[@]}) in length...";
for ((i = 0; i < ${#keywords[@]}; i++))
do
    echo "Matching ${keywords[$i]}...";
    mkdir "$HOME/Sorted/${keywords[$i]}";
    pdfgrep -lir "${keywords[$i]}" $HOME/PDFs/* | xargs -I{} cp {} -t $HOME/Sorted/"${keywords[$i]}";
done

echo "Finished that matching session... ";
echo "###########################";
echo "Unable to match:"
find $HOME/Sorted/ -type d -empty -printf "%P\n";
find $HOME/Sorted/ -type d -empty -delete;

Solution

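xargs is probably the culprit. Add the --no-run-if-empty (aka -r) option, so that xargs does not run cp at all when pdfgrep finds no matches, and make xargs split its input on NUL bytes (-0) while pdfgrep ends each file name with a NUL byte (pdfgrep -lZ), so that file names containing spaces, quotes or newlines are passed through intact: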

#!/bin/bash

keywords=(
    "Keyword1"
    "Keyword2"
    "Keyword3"
)

for kw in "${keywords[@]}"
do
    printf 'Matching keyword: %q\n' "$kw"
    folder="$HOME"/Sorted/"$kw"
    mkdir -p "$folder" || exit 1
    pdfgrep -irlZ "$kw" "$HOME"/PDFs/ | xargs -0 -r cp -t "$folder/"
done

echo "Unmatched keywords:"
find "$HOME"/Sorted/ -mindepth 1 -maxdepth 1 -type d -empty -delete -printf "\t%P\n"
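To see what the two xargs options change, here is a small experiment that can be pasted into a terminal. It assumes GNU xargs as found on Linux (BSD-derived versions, such as the one on macOS, already skip the command on empty input); the sample file name is made up, and echo/printf stand in for cp.

```shell
#!/bin/bash
# Experiment with the xargs options used in the answer.
# ASSUMPTION: GNU xargs. The file name "my report.pdf" is hypothetical.

# 1. Empty input (pdfgrep found nothing): GNU xargs still runs the command
#    once unless -r (--no-run-if-empty) is given.
without_r=$(printf '' | xargs echo 'cp would run')
with_r=$(printf '' | xargs -r echo 'cp would run')
printf 'without -r: [%s]\n' "$without_r"
printf 'with -r:    [%s]\n' "$with_r"

# 2. A name containing a space, newline-terminated (like pdfgrep -l output)
#    versus NUL-terminated (like pdfgrep -lZ output).
newline_split=$(printf 'my report.pdf\n' | xargs -n 1 printf '<%s>')
nul_split=$(printf 'my report.pdf\0' | xargs -0 -n 1 printf '<%s>')
printf 'newline-delimited: %s\n' "$newline_split"   # split into two items
printf 'NUL-delimited:     %s\n' "$nul_split"       # kept as one item
```

Without -0, xargs splits on blanks and would hand cp two nonexistent files, "my" and "report.pdf"; with -0 it receives the single name it expects.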

Aside: you could create symbolic or even hard links to the PDFs (for symbolic links: ... | xargs -0 -r ln -s -t "$folder/") instead of copying them; that would be faster and save disk space.
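A minimal end-to-end sketch of the symbolic-link variant, run in a throwaway directory so it can be tried safely. It assumes GNU grep and coreutils, and uses plain-text files with grep as a stand-in for PDFs and pdfgrep (grep's -i, -r, -l and -Z flags mean the same thing here); the `base` directory, file names and keywords are hypothetical.

```shell
#!/bin/bash
# Symlink variant of the answer's loop, run against a temporary directory.
# ASSUMPTION: text files + GNU grep stand in for PDFs + pdfgrep.

base=$(mktemp -d)                 # hypothetical stand-in for $HOME
trap 'rm -rf "$base"' EXIT        # remove the sandbox when the script exits

mkdir -p "$base/PDFs/sub"
printf 'alpha and Beta\n' > "$base/PDFs/first report.txt"
printf 'only beta here\n' > "$base/PDFs/sub/second.txt"

keywords=("alpha" "beta" "gamma")  # "gamma" matches nothing on purpose

for kw in "${keywords[@]}"
do
    folder="$base/Sorted/$kw"
    mkdir -p "$folder" || exit 1
    # -Z ends each file name with a NUL byte, xargs -0 splits on it,
    # and -r skips ln entirely when grep prints no names.
    grep -irlZ "$kw" "$base/PDFs/" | xargs -0 -r ln -s -t "$folder/"
done

# Delete the folders of keywords that matched nothing and record their names.
unmatched=$(find "$base/Sorted/" -mindepth 1 -maxdepth 1 -type d -empty -delete -printf '%P\n')
printf 'Unmatched keywords:\n%s\n' "$unmatched"
```

Here "alpha" links only the first file, the case-insensitive "beta" links both, and the empty "gamma" folder is removed and reported. Dropping -s from ln gives hard links instead; those only work when the Sorted folders are on the same filesystem as the PDFs.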

Answered By – Fravadona

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
