I want to share my personal notes from the edX course Unix Tools: Data, Software and Production Engineering, by Prof. Diomidis Spinellis. I attended this course from March to June 2020. It was my first MOOC experience. I learned a lot in this course, and it made me take online education very seriously, as it provides an excellent way of learning from top courses, given by top universities and taught by top experts in the world.
These notes are not organized in any specific manner, so they are really a loosely ordered collection of Unix command-line hacks and tricks.
Display elapsed time from the start to the end of the process
Compute the average number of characters per line for each file in a directory.
ls |
while read name ; do                            # For every entry
  if [ -f "$name" -a -r "$name" ] ; then        # If it is a regular, readable file
    echo -n "$name "
    expr $(wc -c <"$name") / $(wc -l <"$name")  # Display average characters per line
  fi
done |
head
Conditionals
If
Create two files, and check whether sourcefile is newer than destfile.
touch sourcefile
touch destfile
# Refresh the destination only if the source is newer
if [ sourcefile -nt destfile ] ; then
  cp sourcefile destfile
  echo Refreshed destfile
fi
If-else
if [ sourcefile -nt destfile ] ; then
  cp sourcefile destfile
  echo Refreshed destfile
else
  echo destfile is up to date
fi
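To see which branch is taken, the timestamps can be set explicitly. This is only a minimal sketch: the temporary directory, the 2020-01-01 timestamp, and the status variable are illustrative assumptions, and -nt is a widely supported extension of test rather than something every shell guarantees.

```shell
# Work in a throwaway directory (illustrative; any directory would do)
workdir=$(mktemp -d)

# Give destfile an old timestamp, then create sourcefile "now"
touch -t 202001010000 "$workdir/destfile"
touch "$workdir/sourcefile"

if [ "$workdir/sourcefile" -nt "$workdir/destfile" ] ; then
  cp "$workdir/sourcefile" "$workdir/destfile"
  status=refreshed
else
  status=up-to-date
fi
echo "destfile is $status"

rm -rf "$workdir"
```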
xargs
xargs runs a command repeatedly, passing the lines it reads from its standard input as arguments to that command. The following pipeline counts the total number of lines in the files under the current directory.
find . -type f |   # Output the name of all files
xargs cat |        # Combine them by applying cat
wc -l              # Count number of lines
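File names that contain spaces are split into several arguments by a plain xargs. A possible variant, assuming a find and xargs that support -print0 and -0 (both GNU and BSD versions do), is sketched here; the temporary directory and file names are made up for the example.

```shell
# Build a small sample tree, including a file name with a space
workdir=$(mktemp -d)
printf 'one\ntwo\n' >"$workdir/plain.txt"
printf 'three\n' >"$workdir/with space.txt"

# NUL-separated names survive spaces and other odd characters
total=$(find "$workdir" -type f -print0 | xargs -0 cat | wc -l)
echo "Total lines: $total"

rm -rf "$workdir"
```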
case
Allows running specific command based on pattern matching.
case $(uname) in Linux
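A complete case statement has one branch per pattern, each ending in ;;, and is closed with esac. The sketch below is illustrative only: the classify_os function and the labels it prints are hypothetical names, not taken from the course.

```shell
# Map a kernel name (as printed by uname) to a friendlier label
classify_os() {
  case $1 in
    Linux)  echo "GNU/Linux" ;;
    Darwin) echo "macOS" ;;
    *BSD)   echo "BSD" ;;       # FreeBSD, OpenBSD, NetBSD
    *)      echo "unknown" ;;   # Default branch
  esac
}

classify_os "$(uname)"
```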
Data processing flow
git clone gitrepo ; cd gitrepo
git log --pretty=format:%aD master |  # Fetch commit dates
cut -d, -f1 |                         # Select the weekday
sort |                                # Bring all weekdays together
uniq -c |                             # Count all weekday occurrences
sort -rn                              # Order by descending popularity
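The same select, sort, count, rank flow can be tried without a repository. In this sketch the three date lines are invented sample data in the shape that %aD produces.

```shell
# Feed three sample commit dates through the same pipeline
weekday_counts=$(printf '%s\n' \
  'Mon, 6 Apr 2020 10:00:00 +0200' \
  'Tue, 7 Apr 2020 11:30:00 +0200' \
  'Mon, 13 Apr 2020 09:15:00 +0200' |
  cut -d, -f1 |   # Select the weekday
  sort |          # Bring equal weekdays together
  uniq -c |       # Count each weekday
  sort -rn)       # Most frequent first
echo "$weekday_counts"
```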
Append a timestamp to a log file
echo $(date): operation failed >>log-file
Fetching data
From the web
We can invoke a web service and pipe its response to jq to pretty-print the resulting JSON.
curl -s "http://api.currencylayer.com/live?access_key=$API_KEY&source=USD&currencies=EUR" | jq .
From a MySQL database
echo "SELECT COUNT(*) FROM projects" |  # SQL query
mysql -ughtorrent -p ghtorrent          # MySQL client and database
echo 'select url from projects limit 3' |  # Obtain URL of first three projects
mysql -ughtorrent -p ghtorrent |           # Invoke MySQL client
while read url ; do
  curl -s $url |                           # Fetch project's details
  jq -r '{owner: .owner.login, name: .name, pushed: .pushed_at}'  # Print owner, project, and last push time
done
Archives
List the contents of an archive on the web without storing it on disk.
curl -Ls https://github.com/castor-software/depclean/archive/1.0.0.tar.gz |  # Download tar file
tar -tzvf - |  # -t lists the contents, -z selects gzip compression, -v is verbose, -f - reads the archive from standard input (the output of curl)
head -10       # List first 10 entries
Extract the archive to disk.
curl -Ls https://github.com/castor-software/depclean/archive/1.0.0.tar.gz |  # Download tar file
tar -xzf -     # Extract its contents
git log --pretty=format:%ae |  # List each commit's author email
sort |                         # Bring emails together
uniq -c |                      # Count occurrences
sort -rn |                     # Order by number
head
Which file has the largest number of changes?
find . -type f -print |
while read f ; do                          # For each file
  echo -n "$f "                            # Print its name without a newline
  git log --follow --oneline "$f" | wc -l  # Count the number of changes
done |
sort -k 2nr |                              # Sort by the second field in reverse numerical order
head
What are the changes made to a file?
git blame --line-porcelain src/main/java/spoon/Launcher.java |  # Obtain line metadata
head -15
Which author has contributed the most to a file?
git blame --line-porcelain src/main/java/spoon/Launcher.java |
grep "^author " |  # Show each line's author
sort |             # Order by author
uniq -c |          # Count author instances
sort -rn |         # Order by count
head
What is the average date of all lines in a file?
date +%s             # Show the current date as seconds since the epoch
date -d @1585990273  # Convert epoch seconds back to a date
date -d @$(
  git blame --line-porcelain src/main/java/spoon/Launcher.java |
  awk '/^author-time / {sum += $2; count++}  # Maintain sum and count of commit times
       END {printf("%d", sum / count)}'
)
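The averaging step can be checked in isolation. In this sketch the input lines are invented, shaped like git blame --line-porcelain output. The awk program is single-quoted so that the shell leaves $2 for awk to interpret as the second field.

```shell
# Average the author-time values of some sample porcelain lines
average_epoch=$(printf '%s\n' \
  'author Alice' \
  'author-time 1585990000' \
  'author Bob' \
  'author-time 1585990200' \
  'author-time 1585990400' |
  awk '/^author-time / {sum += $2; count++}   # Only timestamp lines
       END {printf("%d", sum / count)}')
echo "$average_epoch"
```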
What is the evolution of the file size?
file=src/main/java/spoon/Launcher.java  # File to examine
git log --pretty=format:%H -3 $file     # Show SHA of the last three commits
git log --pretty=format:%H $file |      # Obtain commits' SHA
while read sha ; do                     # For each SHA
  git show $sha:$file |                 # Show the file's state at that commit
  wc -l                                 # Count its lines
done |
head -15                                # First 15 entries
System administration
Unix stores administrative data in /etc (historically short for "et cetera").
Generators
for i in $(seq 50) ; do
  echo -n "."  # Displays 50 dots
done
Regular expressions
grep
cd /usr/share/dict
grep baba words     # All lines (words) containing baba
grep "^baba" words  # All lines (words) starting with baba
grep "baba$" words  # All lines (words) ending with baba
grep a.a.a.a words  # Words containing four a's, each separated by any single character
grep "^t.y$" words                  # Three-letter words starting with t, ending with y
grep "^....$" words | wc -l         # Number of four-letter words
grep "^k.*d.*k$" words              # Words starting with k, ending with k, and with a d in between
grep "^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$" words | wc -l  # Words whose letters follow alphabetical order
grep "[0-9]" words                  # Lines containing a digit
grep "^k.[bdgkpt].k$" words         # Words with a hard consonant between ks
grep "^[A-Z]" words | wc -l         # Number of proper nouns
grep "[^A-Za-z]" words              # Lines with non-alphabetic characters
find ~/Downloads | grep "[[:space:]]"  # List files with space characters
egrep (or grep -E)
grep -E "s{2}" words                   # Words with two consecutive s characters
grep -E "[^aeiouy]{7}" words           # Words with seven consecutive consonants
grep -E "^.{,15}$" words | wc -l       # Words with a length up to 15
grep -E "^.{15,}$" words | wc -l       # Words with at least 15 characters
grep -E "^(.).*\1$" words | head       # Words beginning and ending with the same character (the character in parentheses is referenced with \1)
grep -E "^(.)(.)((.)\4)?\2\1$" words   # Find 4- and 6-letter palindromes
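Back-references are easier to follow on a handful of words than on the whole dictionary. A small sketch, using an invented word list in place of /usr/share/dict/words; back-references in extended regular expressions are an extension, though both GNU and BSD grep accept them.

```shell
# \1 and \2 repeat the first two characters in reverse order;
# the optional group ((.)\4)? allows a doubled middle character
palindromes=$(printf '%s\n' noon level abba redder hello |
  grep -E '^(.)(.)((.)\4)?\2\1$')
echo "$palindromes"
```

The five-letter word level is not matched, because the optional group only accepts a doubled character in the middle.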
Alternative matches.
grep -E "^(aba|ono).*(ly|ne)$" words  # Words with alternative start/end parts
grep -l vfs *.c                       # List C files containing vfs
Matches in files (grep -F)
grep -rl read_iter . | head -5  # Recursively list the files that contain the string read_iter
grep -F ... *.c | head          # With -F the pattern is a fixed string, not a regular expression
Other tools
cut
cd /etc
cut -d: -f 1 /etc/passwd | head -4  # Output the first field
awk
awk '/bash/' /etc/passwd                      # Output lines containing "bash"
awk -F: '$3 > 1000' /etc/passwd               # Lines where field 3 > 1000
awk -F: '{print $1}' /etc/passwd | head -5    # Output field 1
awk '!/^#/ {print $1}' /etc/services | head   # Print the first field of lines that don't match the regular expression
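A pattern and an action can be combined in one rule. The sketch below runs on three invented lines in /etc/passwd format, so the result does not depend on the local system. The program is single-quoted so that the shell does not expand $3 and $1 before awk sees them.

```shell
# Print the login name (field 1) of accounts whose UID (field 3) exceeds 1000
regular_users=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'alice:x:1001:1001:Alice:/home/alice:/bin/bash' |
  awk -F: '$3 > 1000 {print $1}')
echo "$regular_users"
```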
ack
ack is a grep-like text finder.
ack --ruby foo  # Find occurrences of foo in Ruby files
ack abc -l      # List files containing abc
Processing
Sorting
sort -k 2 dates | head -5              # Sort by second and subsequent fields (space separated)
sort -k 4r dates | head                # Sort by the fourth field in reverse order
sort -k 3M -k 2n dates | head          # Sort 3rd field (month) in chronological order, then second field (day of month) in numerical order
sort -t : -k 4n /etc/passwd | head     # Sort by numeric group ID
sort -u /etc/passwd | head             # Sort and keep only unique lines
sort dates | sort -C && echo "Sorted"  # Check whether the input is sorted
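Multi-key sorting can be tried on a few inline lines. This sketch assumes a sort that supports the M (month) key modifier, as GNU and BSD sort do; the three sample lines are invented, and LC_ALL=C pins the month names to English.

```shell
# Sort by month name (field 3), then by day of month (field 2)
sorted=$(printf '%s\n' 'Tue 7 Apr' 'Mon 2 Mar' 'Wed 1 Apr' |
  LC_ALL=C sort -k 3,3M -k 2,2n)
echo "$sorted"
```

Writing the keys as 3,3 and 2,2 limits each key to a single field instead of extending it to the end of the line.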
The comm command selects or rejects lines common to two files. Both files must be sorted.
comm linux.bin freebsd.bin
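By default comm prints three columns: lines only in the first file, lines only in the second, and lines in both. The options -1, -2, and -3 suppress the corresponding column. A minimal sketch with two invented, already sorted lists:

```shell
workdir=$(mktemp -d)
printf '%s\n' cat ls ps  >"$workdir/linux.txt"    # Sample list, sorted
printf '%s\n' cat ls top >"$workdir/freebsd.txt"  # Sample list, sorted

# Suppress columns 1 and 2: keep only lines common to both files
common=$(comm -12 "$workdir/linux.txt" "$workdir/freebsd.txt")
# Suppress columns 2 and 3: keep only lines unique to the first file
only_first=$(comm -23 "$workdir/linux.txt" "$workdir/freebsd.txt")

echo "Common: $common"
echo "Only in first: $only_first"
rm -rf "$workdir"
```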
sed
substitutions
Create JSON from list of items.
vim tojson.sed  # Create the script with the following contents
#!/bin/sed -f
# Insert [ at the beginning
1i\
[
# Convert lines into strings; every line but the last gets a trailing comma
$!s/.*/  "&",/
$s/.*/  "&"/
# Append ] at the end
$a\
]
ls /usr | sed -f tojson.sed
awk
Summarize the size of the files in a directory.
ls -l > contents.txt
awk '
NF >= 9 { size += $5; n++ }  # Sum size and number of files (skips the "total" line)
END {                        # Print summary
  print "Total size " size
  print "Number of files " n
  print "Average file size " size/n
}
' contents.txt
Tally the total file size per file extension.
ls -l > contents.txt
awk '
{
  sub(".*/", "", $9)          # Remove path
  if (!sub(".*\\.", "", $9))  # Keep only extension
    next                      # Skip files without extension
  size[$9] += $5              # Tally size of extension
}
END {
  for (i in size)
    print i, size[i]
}' contents.txt |
sort -k 2nr |
head
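The tally relies on awk's associative arrays, indexed here by extension. A reduced sketch on invented "name size" pairs, rather than real ls -l output, makes the mechanics visible:

```shell
# Sum sizes per extension; names without a dot are skipped
sizes_by_ext=$(printf '%s\n' \
  'notes.txt 120' 'main.c 300' 'util.c 200' 'README 50' |
  awk '{
    ext = $1
    if (!sub(/.*\./, "", ext))  # sub returns 0 when there is no dot
      next
    size[ext] += $2             # Array element springs into existence on first use
  }
  END {
    for (ext in size)
      print ext, size[ext]
  }' |
  sort -k 2nr)                  # Largest total first
echo "$sizes_by_ext"
```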
diff
diff -u a b
patch
patch john.c <mary.patch  # Patch John's copy with Mary's patch
Testing and expressions
test
test -d / && echo Directory           # Test if root is a directory
test -f / && echo File                # Test if root is a file
test hi = there && echo Same          # Test if strings are equal
test hi != hi && echo Different       # Test if strings are different
test -z "" && echo Empty              # Test if string is empty
test -n "a string" && echo Non-empty  # Test if string is non-empty
test 32 -eq 42 && echo Equal          # Test if integers are equal
test 32 -lt 50 && echo Less than      # Test if integer is less than another
test . -nt / && echo . is newer than /  # Test if a file is newer than another
test -w / && echo Writable            # Test if a file is writable
expr
expr 1 + 2
expr 2 \* 10
expr 12 \% 5
expr 10 \< 50
expr 5 = 12                           # Test of equality
expr John \> Mary                     # Compare strings
expr \( 1 + 20 \) \* 2
expr length "To be or not to be"      # String length
expr substr "To be or not to be" 4 2  # Substring of length 2 starting at position 4
expr "" \| b                          # Short-circuit OR (first part failed)
i=0
while [ $i -lt 10 ] ; do
  echo $i
  i=$((i + 1))
done
Dealing with characters
echo 'This is a test' | tr ' ' -    # Replace space with -
echo 'This is a test' | tr a-z A-Z  # Replace a-z with A-Z
shuf -n 5 /usr/share/dict/words     # Output five random words
shuf -n 1 -e heads tails            # Throw a single coin
split
split -l 10000 -d /usr/share/dict/words  # Split the dictionary into 10000-line files
rs
head /etc/passwd | rs -c: -C: -T  # Transpose the output
Graphs
echo 'digraph talks {
  bob [gender="male"];
  eliza [gender="female"];
  fred [gender="male"];
  john [gender="male"];
  mary [gender="female"];
  steve [gender="male"];
  sue [gender="female"];
  mark [gender="male"];
  john -> mary;
  john -> bob;
  mary -> sue;
  sue -> bob;
  sue -> mary;
  fred -> bob;
  eliza -> steve;
}' > talk.dot
Count nodes
gvpr 'N {clone($O, $)}' talk.dot  # Clone each node to the output graph
gvpr
Images & sound
Create a symbolic link to a file or directory:
tifftopnm image.tiff | pnmtopng >image.png  # Convert from TIFF to PNG
for i in *.tiff ; do                        # For each TIFF file
  pngname=$(basename $i .tiff).png          # Compute name of PNG file
  tifftopnm $i |                            # Convert image to PNM
  pnmtopng >$pngname                        # Convert and write to PNG file
done
tifftopnm image.tiff |   # Convert TIFF to PNM
pamscale -width=1024 |   # Scale to 1024 pixels
pnmtopng >image.png      # Write the result in PNG format
jpegtopnm plate.jpeg |
pamflip -r90 |           # Rotate the image by 90 degrees
pamscale -width=1024 |   # Scale to 1024 pixels
pnmtojpeg >rplate.jpeg   # Write the result in JPEG format
play -q sox-orig.wav
sox sox-orig.wav sox-orig.mp3 # Convert between file formats
sox sox-orig.wav sox-low.wav pitch -600 # Lower pitch by 600 cents
play -q sox-low.wav
sox sox-orig.wav sox-fast.wav tempo 1.5 # Increase tempo by 50%
play -q sox-fast.wav
sox sox-orig.wav sox-chorus.wav chorus 0.5 0.9 50 0.4 0.25 2 -t \
60 0.32 0.4 2.3 -t 40 0.3 0.3 1.3 -s # Apply chorus effect
play -q sox-chorus.wav
wget -q -O persephone.mp3 \
http://ccmixter.org/content/hansatom/hansatom_-_Persephone.mp3 # By Hans Atom (CC BY 2.5)
sox persephone.mp3 persephone-trimmed.mp3 fade 0 0:06 1 # Trim to 6s with 1s fade-out
play -q persephone-trimmed.mp3
sox --combine mix -v 0.2 persephone-trimmed.mp3 sox-orig.wav \
sox-persephone.mp3 # Mix the two audio files
play -q sox-persephone.mp3
Good practices
Output error
echo Error >&2 # Send output to standard error
Clean up temporary files when script execution finishes
cat >tmpdir.sh <<\EOF
#!/bin/sh
TMPDIR="${TMP:-/tmp}/$$" # Create temporary directory name
trap 'rm -rf "$TMPDIR"' 0 # Remove it when exiting
trap 'exit 2' 1 2 15 # Exit when the program is interrupted
mkdir "$TMPDIR" # Create the directory
# Do some work in $TMPDIR
EOF
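The effect of the exit trap can be observed from outside the script. This is only a sketch under some assumptions: it uses mktemp -d instead of a name built from $$, the EXIT spelling of signal 0, and a throwaway script file whose name is chosen by mktemp.

```shell
# Write a tiny script that creates a directory and removes it on exit
script=$(mktemp)
cat >"$script" <<'EOF'
#!/bin/sh
workdir=$(mktemp -d)             # Create the temporary directory
trap 'rm -rf "$workdir"' EXIT    # Remove it when exiting
trap 'exit 2' HUP INT TERM       # Exit (and so clean up) when interrupted
echo "$workdir"                  # Report the name to the caller
EOF

created_dir=$(sh "$script")      # Run it and capture the directory name
rm -f "$script"

[ -d "$created_dir" ] || echo "Temporary directory was cleaned up"
```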
Prefer redirection to pipes
command <file # A redirection is all that's needed
Test command, not its exit code
if ! command ; then  # A simple negation will do
  echo Error >&2
fi
grep can recurse directories
grep -r pattern .  # Modern recursive search
Prefer wildcards to ls
for i in * ; do  # ls can be replaced by a wildcard
  ...
done
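One reason to prefer the wildcard is that each matching name arrives as a single word, even when it contains spaces. A minimal sketch, with an invented temporary directory and file names:

```shell
workdir=$(mktemp -d)
touch "$workdir/a file.txt" "$workdir/b.txt"

count=0
for f in "$workdir"/* ; do   # The shell expands the wildcard itself
  [ -f "$f" ] || continue    # Skip anything that is not a regular file
  count=$((count + 1))
done
echo "Regular files: $count"

rm -rf "$workdir"
```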
Replace awk with cut
cut -d : -f 1,7              # More efficient way to print fields 1 and 7
expr "$LANG" : '.*\.\(.*\)'  # More efficient way to isolate encoding
UTF-8