Advanced awk Exercises

Challenge yourself with these complex awk programming tasks

Tip:

These exercises are designed to push your awk skills to the next level. Don't hesitate to consult the awk manual (man awk) or experiment with different approaches.

Exercise 1: Custom CSV Parser

Create an awk script that can parse a CSV file, correctly handling fields that contain commas within quoted strings.

Sample Input:

Name,Age,"Address, City",Country
John Doe,30,"123 Main St, Anytown",USA
Jane Smith,28,"456 Elm St, Somewhere",Canada

Solution:

gawk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" }
{
    for (i = 1; i <= NF; i++) {
        gsub(/^"|"$/, "", $i)   # Remove surrounding quotes
        printf "%s%s", $i, (i == NF ? "\n" : "|")
    }
}' input.csv

This script uses FPAT, a gawk extension, to describe what a field looks like rather than what separates fields, so a quoted field containing commas is parsed as a single field. It then removes the surrounding quotes and prints the fields separated by '|'.
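FPAT is not available in other awk implementations such as mawk or BusyBox awk. As a rough, portable sketch of the same idea, the line can be scanned one character at a time while tracking whether the scanner is inside quotes. The function name csv_to_pipe is made up for this illustration, and the sketch assumes quotes are never escaped inside a field (a doubled "" would simply be dropped).

```shell
# csv_to_pipe is a hypothetical helper name, not part of the original exercise.
csv_to_pipe() {
    awk '{
        out = ""; field = ""; inq = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\"") {
                inq = !inq            # toggle quoted state; the quote itself is dropped
            } else if (c == "," && !inq) {
                out = out field "|"   # an unquoted comma ends the current field
                field = ""
            } else {
                field = field c       # any other character belongs to the field
            }
        }
        print out field
    }' "$@"
}

printf '%s\n' 'John Doe,30,"123 Main St, Anytown",USA' | csv_to_pipe
# prints: John Doe|30|123 Main St, Anytown|USA
```

With no file argument the function reads standard input; with one, it behaves like the solution above, for example csv_to_pipe input.csv.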

Exercise 2: Advanced Log Analysis

Analyze a web server log to generate a report of unique visitors per hour, sorted by the hour with the most visitors.

Sample Input (simplified log format):

2023-05-01 08:30:45 192.168.1.1
2023-05-01 08:45:30 192.168.1.2
2023-05-01 09:15:20 192.168.1.1
2023-05-01 09:30:10 192.168.1.3

Solution:

gawk '{
    split($2, time, ":")
    hour = time[1]
    ip = $3
    visits[hour][ip]++
}
END {
    for (h in visits) {
        unique = length(visits[h])
        print h, unique
    }
}' log_file.txt | sort -k2,2nr

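This script extracts the hour and IP from each log entry, records each IP under its hour in an array of arrays (a gawk feature, available from gawk 4.0), and prints the number of distinct IPs per hour. The sort command then orders the output by visitor count, highest first.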
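Arrays of arrays such as visits[hour][ip] are specific to gawk. A portable variant, sketched here under the same simplified log format, uses a composite key (hour, ip) to remember which pairs have already been seen and keeps a separate per-hour counter. The function name unique_per_hour is an assumption made for this example.

```shell
# unique_per_hour is a hypothetical helper name used only for this sketch.
unique_per_hour() {
    awk '{
        hour = substr($2, 1, 2)          # "08:30:45" -> "08"
        if (!((hour, $3) in seen)) {     # first time this IP appears in this hour
            seen[hour, $3] = 1
            count[hour]++
        }
    }
    END {
        for (h in count)
            print h, count[h]
    }' "$@" | sort -k2,2nr -k1,1
}

printf '%s\n' \
    '2023-05-01 08:30:45 192.168.1.1' \
    '2023-05-01 08:45:30 192.168.1.2' \
    '2023-05-01 09:15:20 192.168.1.1' \
    '2023-05-01 09:30:10 192.168.1.3' | unique_per_hour
# prints:
# 08 2
# 09 2
```

The second sort key (-k1,1) only breaks ties between hours with the same visitor count, so the output order is stable across runs.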

Exercise 3: Data Normalization

Normalize a dataset by calculating the z-score for each value in a column.

Sample Input:

Name Score
Alice 85
Bob 92
Charlie 78
David 88

Solution:

awk 'NR == 1 { print $0, "Z-Score"; next }
{
    sum += $2
    sqsum += $2 ^ 2
    data[NR] = $0
}
END {
    n = NR - 1
    mean = sum / n
    stddev = sqrt((sqsum - n * (mean ^ 2)) / (n - 1))
    for (i = 2; i <= NR; i++) {
        split(data[i], fields)
        zscore = (fields[2] - mean) / stddev
        printf "%s %.2f\n", data[i], zscore
    }
}' input.txt

This script calculates the mean and the sample standard deviation (n - 1 in the denominator) of the scores, then computes and appends the z-score for each entry.
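For reference, the quantities the script computes can be written out as follows, where x_i are the scores and n is the number of data rows (the header excluded). The symbols are chosen here for illustration and do not appear in the script itself.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{\sum_{i=1}^{n} x_i^{2} - n\,\bar{x}^{2}}{n-1}}, \qquad
z_i = \frac{x_i - \bar{x}}{s}

% Worked through for the sample input (85, 92, 78, 88):
n = 4, \quad \sum x_i = 343, \quad \bar{x} = 85.75, \quad \sum x_i^{2} = 29517

s = \sqrt{\frac{29517 - 4 \cdot 85.75^{2}}{3}}
  = \sqrt{\frac{104.75}{3}} \approx 5.909

z_{\text{Alice}} \approx -0.13, \quad
z_{\text{Bob}} \approx 1.06, \quad
z_{\text{Charlie}} \approx -1.31, \quad
z_{\text{David}} \approx 0.38
```

The four z-scores sum to zero, which is a quick sanity check on the result.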

Exercise 4: Advanced Text Processing

Create an awk script that can find and highlight the longest common substring between two lines of text.

Sample Input:

The quick brown fox jumps over the lazy dog
A quick brown dog jumps over the lazy fox

Solution:

awk '
# Return the longest common substring of X and Y.
# The names after the extra spaces are local variables.
function lcs(X, Y,    m, n, i, j, L, best, endpos) {
    m = length(X)
    n = length(Y)
    best = 0
    endpos = 0
    for (i = 1; i <= m; i++)
        for (j = 1; j <= n; j++)
            if (substr(X, i, 1) == substr(Y, j, 1)) {
                # L[i,j] is the length of the common run ending at X[i] and Y[j]
                L[i,j] = L[i-1,j-1] + 1
                if (L[i,j] > best) {
                    best = L[i,j]
                    endpos = i
                }
            }
    return substr(X, endpos - best + 1, best)
}
NR == 1 { line1 = $0 }
NR == 2 {
    line2 = $0
    common = lcs(line1, line2)
    print line1
    print line2
    print "Longest common substring: " common
}' input.txt

This script finds the longest common substring using dynamic programming: L[i,j] holds the length of the common run of characters ending at position i of the first line and position j of the second, and the longest run seen is returned. Unlike a longest common subsequence, the matching characters must be contiguous. For the sample input the result is " jumps over the lazy " (including the surrounding spaces).
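The exercise also asks for the match to be highlighted. One possible approach, sketched below, is to locate the common substring in each line with index() and wrap it in markers. The function names highlight_lcs and mark and the [[ ]] markers are arbitrary choices for this illustration; terminal color codes could be substituted.

```shell
# highlight_lcs is a hypothetical helper name; the [[ ]] markers are an arbitrary choice.
highlight_lcs() {
    awk '
    # Longest common substring of X and Y; names after the gap are locals.
    function lcs(X, Y,    m, n, i, j, L, best, endpos) {
        m = length(X); n = length(Y)
        best = 0; endpos = 0
        for (i = 1; i <= m; i++)
            for (j = 1; j <= n; j++)
                if (substr(X, i, 1) == substr(Y, j, 1)) {
                    L[i, j] = L[i-1, j-1] + 1   # extend the run ending at i-1, j-1
                    if (L[i, j] > best) { best = L[i, j]; endpos = i }
                }
        return substr(X, endpos - best + 1, best)
    }
    # Wrap the first occurrence of s in line with [[ and ]].
    function mark(line, s,    p) {
        if (s == "") return line
        p = index(line, s)
        if (p == 0) return line
        return substr(line, 1, p - 1) "[[" s "]]" substr(line, p + length(s))
    }
    NR == 1 { line1 = $0 }
    NR == 2 {
        common = lcs(line1, $0)
        print mark(line1, common)
        print mark($0, common)
    }' "$@"
}

printf '%s\n' \
    'The quick brown fox jumps over the lazy dog' \
    'A quick brown dog jumps over the lazy fox' | highlight_lcs
# prints:
# The quick brown fox[[ jumps over the lazy ]]dog
# A quick brown dog[[ jumps over the lazy ]]fox
```

Only the first occurrence in each line is marked; if the substring repeats, a loop over index() would be needed.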

Challenge Yourself:

Try modifying these exercises or combining techniques from different exercises to create even more complex awk scripts. The more you practice, the more proficient you'll become with awk's powerful features.

Further Learning

To continue improving your awk skills, consider exploring:





