Bash is my favorite shell by far. I have known an occasional csh or tcsh aficionado, and I’m sure that those shells have their own virtues, but for me, bash is the end-all be-all of command line interfaces. Come with me now as I wax nerdy all over bash’s pattern matching capabilities.
At work, I often have to make changes to a number of files that all have the same name, but that are stored in a variety of different folders. The challenge is quickly uploading all of these changed files to the server. Because they are in different folders, I would normally have to drag them, one at a time, into the FTP client.
My solution was to write a bash script that would pull these files together into a “staging” folder on my local drive, mirroring the directory structure they live in so that I can drag the entire tree into my FTP client and let it do its work. I know ahead of time what the directory names will be; they are numbers representing the sites. Here is what a typical folder structure might look like:
ROOT
    folder_1
        660
            folder_2
                file.txt
        671
            folder_2
                file.txt
What I want to be able to do is simply indicate that I want all of the file.txt files to be staged onto my hard drive and have bash go and find the 660, 671, etc. folders based on their numbers and copy them. This is probably more background information than you need to appreciate the bash tricks I’ve used, but it helps to know what on Earth the script is supposed to do.
The concept of the script is to take a list of files (with relative paths) and, for each one, find which of the site numbers from our predefined list appears in its path. Once we know that number, we can loop through all of the numbers, replacing the original number with each one in turn and using those generated paths to copy the files to my local drive. That way, we end up with a copy of the file for every site in our list, each living in its own mirrored directory structure.
For me, the challenge was knowing how to evaluate the matching and stem the paths directly within bash. I wanted to avoid writing this in Ruby and I also wanted to avoid doing a lot of shell execution to run things like tr or awk. Here is the code (implemented as a bash function):
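export SITES="660 671 672 685 730 761"
function stagefile() {
    if [ -z "$1" ]; then
        echo "You must provide a file or files to stage." >&2
        return 1
    fi
    local file site FILESITE NEWFILE NEWDIR
    # "$@" keeps each argument intact, even if it contains spaces.
    for file in "$@"; do
        # Find which known site number appears in this path.
        FILESITE=""
        for site in $SITES; do
            if [[ "$file" == *"$site"* ]]; then
                FILESITE="$site"
            fi
        done
        if [ -z "$FILESITE" ]; then
            echo "No known site number in $file; skipping." >&2
            continue
        fi
        # Swap in every site number and copy each resulting path.
        for site in $SITES; do
            NEWFILE="${file//$FILESITE/$site}"
            # Directory portion of the path (cp --parents itself does not need it).
            NEWDIR="${NEWFILE%/*}"
            cp -v --parents "$NEWFILE" "/c/Staging"
        done
    done
}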
The first little bash gem that I discovered is the ability to use the [[ ... ]] syntax to perform a string matching test. When you use the == operator, the unquoted right side is treated as a pattern and matched against the left side using bash “glob” syntax (any quoted portion is matched literally). In this example, I am just checking each site number to see if it occurs anywhere within my original path. If and when I find one of them, I save it in FILESITE.
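To make the matching concrete, here is a minimal sketch of the same test in isolation. The helper name contains_site is made up for illustration, and the sample path is borrowed from the folder structure shown earlier:

```bash
# Hypothetical helper: succeeds if the path contains the given site number.
contains_site() {
    local path="$1" site="$2"
    # Quoting "$site" makes it match literally; the bare * on each side
    # is a glob wildcard that matches any run of characters.
    [[ "$path" == *"$site"* ]]
}

path="folder_1/660/folder_2/file.txt"
for site in 660 671; do
    if contains_site "$path" "$site"; then
        echo "$site: found"
    else
        echo "$site: not found"
    fi
done
```

One caveat worth keeping in mind: this is a plain substring match, so a site number would also "match" a longer directory name that happens to contain it (660 inside 6600, say).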
The next step is to loop through all of the site numbers. With each one, I replace all occurrences of the originally found number with the current one, use that path to go find a file and then copy it to my local drive using cp’s --parents switch (an option of GNU cp), which causes cp to create directories as necessary to duplicate the structure.
The trick to --parents is that the destination must be an existing directory. I also use bash replacement to trim the filename (along with the forward slash in front of it) off the generated path, which leaves just the directory portion in NEWDIR.
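Since --parents only exists in GNU cp, the directory portion of the path comes in handy on systems without it. The following is only a sketch of one possible fallback, not part of my script: the function name stage_one and its two arguments (a relative source path and a staging root) are assumptions made for the example.

```bash
# Hypothetical fallback for systems whose cp lacks --parents:
# copy one file into a staging root, recreating its relative directories.
stage_one() {
    local src="$1" dest_root="$2"
    local dir="${src%/*}"      # strip the filename and the slash before it
    if [[ "$dir" == "$src" ]]; then
        dir="."                # no slash in the path: the file has no directory part
    fi
    # Create the mirrored directory, then copy the file into it.
    mkdir -p "$dest_root/$dir" && cp "$src" "$dest_root/$dir/"
}
```

Calling stage_one folder_1/660/folder_2/file.txt /c/Staging should leave a copy at /c/Staging/folder_1/660/folder_2/file.txt, creating the intermediate folders if they are missing.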
The replacement trick I used is ${file//$FILESITE/$site}, which will take the contents of $file and replace all occurrences of $FILESITE with the value of $site. The syntax is reminiscent of good old fashioned s//, and the first double slash simply means “replace all.” A single slash there would only replace the first occurrence (from left to right).
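The difference between the single and double slash is easiest to see side by side. This is a small sketch using a made-up path that contains the site number twice:

```bash
path="660/archive/660/file.txt"   # made-up path for illustration

echo "${path/660/671}"    # single slash: replaces the first occurrence only
echo "${path//660/671}"   # double slash: replaces every occurrence
```

The first echo prints 671/archive/660/file.txt and the second prints 671/archive/671/file.txt. Note that the search term is a glob pattern rather than a regular expression, so characters like * and ? are wildcards there.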
Then, to remove the file from the end of the path, I used ${NEWFILE%/*}, which takes the contents of the variable $NEWFILE and removes the trailing part that matches the glob pattern /*. There are two forms of this removal operation: the first uses a percent sign (as in my example) and removes a match from the end of the string, whereas the second uses a number (or pound or hash) sign and removes a match from the beginning. Using a single symbol is non-greedy (it removes the shortest match) and using two of them is greedy (it removes the longest match). So, to summarize, %/* removes the shortest match from the end of the string. The pattern starts with a forward slash, so the part that gets removed begins at the last forward slash in the string.
I can demonstrate some of these techniques using a little bash interactive example:
$ MYFILE="/usr/local/share/temp.txt"
$ echo $MYFILE
/usr/local/share/temp.txt
$ echo "${MYFILE%/*}"
/usr/local/share
$ echo "${MYFILE%%/*}"    # prints an empty line: the greedy match starts at the leading slash and removes everything
$ MYFILE="usr/local/share/temp.txt"
$ echo "${MYFILE%%/*}"
usr
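The same idea works from the front of the string with the hash sign. Here is a short sketch along the same lines, reusing the relative path from the session above:

```bash
MYFILE="usr/local/share/temp.txt"

echo "${MYFILE#*/}"    # shortest prefix matching */ removed
echo "${MYFILE##*/}"   # longest prefix matching */ removed (leaves the filename)
echo "${MYFILE%.*}"    # shortest suffix matching .* removed (drops the extension)
```

These print local/share/temp.txt, temp.txt and usr/local/share/temp respectively.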
I think this replacement functionality is one of the least-known and most commonly useful tricks in anyone’s bash toolbox and I’m surprised at how little coverage it receives in the “bash scripting primers” out there. Most useful is the ability to make sure that a given path does or doesn’t end with a trailing slash, which is instrumental in concatenating paths, especially when dealing with user-supplied arguments.
Let’s say you will receive a path fragment as an argument and you need to append a filename to the end of it, but you don’t know whether the fragment will end with a slash. Here’s how you might do it:
function appendpath() {
    echo "${1%/}/file.txt"
}
Of course, this function only removes a single slash, so if the user-supplied path ends with a whole string of forward slashes, the extras are left in place (most Unix systems treat repeated slashes as one, so the result is untidy rather than broken). It handles the two main use cases cleanly: one slash or no slashes. The replacement simply says “remove a slash from the end of the variable called 1,” which does nothing if the slash isn’t there. Then you can confidently add your own slash and the filename. Give it a try!
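If a run of trailing slashes does bother you, one way to handle it is to keep trimming until none are left. This is just a sketch, and the name appendpath_strict is made up for the example:

```bash
# Hypothetical variant of appendpath that tolerates any number of trailing slashes.
appendpath_strict() {
    local dir="$1"
    # Remove one trailing slash at a time until the path no longer ends in one.
    while [[ "$dir" == */ ]]; do
        dir="${dir%/}"
    done
    echo "$dir/file.txt"
}

appendpath_strict "some/dir///"
```

That call prints some/dir/file.txt. Be aware that an empty argument (or a bare /) comes out as /file.txt, which points at the filesystem root, so you may want to check for that case separately.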