Text Processing Exercises

Tip:

Try to solve each exercise on your own before looking at the solution. This will help reinforce your learning and improve your problem-solving skills.

Exercise 1: sed - Selective Replacement

Use sed to replace all occurrences of "apple" with "orange" in a file, but only on lines that contain the word "fruit".

Solution:

sed '/fruit/s/apple/orange/g' input_file.txt

Explanation: The address /fruit/ restricts the command to lines containing "fruit". On those lines, s/apple/orange/g replaces every "apple", not just the first one, because of the g flag.
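You can check the behavior by piping a few made-up lines through the same command. The function name replace_on_fruit_lines is hypothetical. Note that /fruit/ is a substring match, so the "grapefruit" line below is changed too. GNU sed (not POSIX sed) accepts \<fruit\> if you need whole-word matching.

```shell
# Hypothetical wrapper around the Exercise 1 command; reads stdin.
replace_on_fruit_lines() {
  # /fruit/ selects lines containing "fruit" (substring match);
  # s/apple/orange/g rewrites every "apple" on those lines.
  sed '/fruit/s/apple/orange/g'
}

# Made-up sample input.
printf '%s\n' \
  'fruit: apple, apple pie' \
  'vegetable: apple' \
  'grapefruit and apple' | replace_on_fruit_lines
```

This prints "fruit: orange, orange pie", leaves "vegetable: apple" untouched, and turns the last line into "grapefruit and orange".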

Exercise 2: awk - Calculate Average

Use awk to calculate the average of numbers in the third column of a tab-separated file.

Solution:

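awk -F'\t' '{sum += $3; count++} END {if (count > 0) print "Average:", sum/count}' input_file.txt

Explanation: This awk command sets the field separator to a tab, adds the third field of each line to a running sum, counts the lines, and prints the average in the END block. The count > 0 test prevents a division by zero when the file is empty. If the file has a header row, put the pattern NR > 1 in front of the first block so the header is not counted as a zero.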
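A slightly more defensive sketch, assuming you would rather skip a header row and non-numeric values such as "n/a" than count them as zero. The helper name avg_col3 and the number pattern (optional sign, digits, optional decimal part, no exponents) are assumptions.

```shell
# Hypothetical helper: average of numeric values in column 3 of TSV on stdin.
avg_col3() {
  awk -F'\t' '
    # Only count fields that look like plain decimal numbers;
    # a header such as "score" or a value such as "n/a" is skipped.
    $3 ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)$/ { sum += $3; count++ }
    END {
      if (count > 0) printf "Average: %.2f\n", sum / count
      else print "No numeric values found"
    }
  '
}

# Made-up sample: header row, two numbers, one missing value.
printf 'id\tname\tscore\n1\tann\t10\n2\tbob\t20\n3\tcy\tn/a\n' | avg_col3
```

The sample prints "Average: 15.00", since only 10 and 20 are counted.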

Exercise 3: Combining sed and awk - Email Domain Count

Combine sed and awk to extract email addresses from a log file and count how many times each domain appears.

Solution:

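sed -n 's/.*\([a-zA-Z0-9._%+-]\+@[a-zA-Z0-9.-]\+\.[a-zA-Z]\{2,\}\).*/\1/p' log_file.txt | awk -F'@' '{print $2}' | sort | uniq -c | sort -nr

Explanation: sed -n suppresses automatic printing, and the p flag prints a line only when the substitution succeeds, replacing the whole line with the captured address (\( and \) define the group, \1 recalls it). awk splits that address on "@" and prints the domain, and sort | uniq -c | sort -nr counts each domain and lists the most frequent first. Note that \+ is a GNU extension in basic regular expressions, and that \{2,\} also accepts top-level domains longer than four letters. Because the leading .* is greedy, only the last address on each line is captured and its local part is cut down to a single character, which does not affect the domain count.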
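The same pipeline can be written with extended regular expressions (sed -E, supported by current GNU and BSD sed), which avoids the backslash-heavy basic syntax. This sketch also lowercases domains with awk's tolower(), on the assumption that "Example.COM" and "example.com" should be counted together. The function name count_email_domains is hypothetical.

```shell
# Hypothetical helper: count e-mail domains in log text on stdin.
count_email_domains() {
  # sed keeps only the captured address from each matching line
  # (the last one on the line, because the leading .* is greedy).
  sed -nE 's/.*([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}).*/\1/p' |
    awk -F'@' '{ print tolower($2) }' |  # domain part, lowercased
    sort | uniq -c | sort -nr             # count, most frequent first
}

# Made-up log lines.
printf '%s\n' \
  'user=alice@example.com status=ok' \
  'user=bob@Example.COM status=ok' \
  'mail from carol@test.org' \
  'no address here' | count_email_domains
```

The sample reports example.com twice and test.org once; the line without an address produces no output.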

Exercise 4: sed - Multi-line Replacement

Use sed to replace all occurrences of "START" with "BEGIN" and "END" with "FINISH", but only within blocks of text between "SECTION" and "ENDSECTION".

Solution:

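sed '/^SECTION/,/^ENDSECTION/{/^ENDSECTION/!{s/START/BEGIN/g; s/END/FINISH/g;};}' input_file.txt

Explanation: The address range /^SECTION/,/^ENDSECTION/ limits the substitutions to the lines from each SECTION marker through the next ENDSECTION marker. The ^ anchors matter: an unanchored /SECTION/ also matches "ENDSECTION" and could start a range in the wrong place. The inner /^ENDSECTION/! excludes the closing marker line, which s/END/FINISH/g would otherwise rewrite to "FINISHSECTION". This assumes each marker begins its own line.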
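Here is a self-contained run on made-up input, with the sed script spread over several lines for readability. The function name rewrite_sections is hypothetical, and the script assumes each SECTION/ENDSECTION marker begins its own line.

```shell
# Hypothetical helper: range-restricted replacement on stdin.
rewrite_sections() {
  # ^SECTION cannot match "ENDSECTION"; the inner /^ENDSECTION/! keeps
  # s/END/FINISH/ from rewriting the closing marker into "FINISHSECTION".
  sed '/^SECTION/,/^ENDSECTION/{
    /^ENDSECTION/!{
      s/START/BEGIN/g
      s/END/FINISH/g
    }
  }'
}

# Made-up sample: only the middle line lies inside a section.
printf '%s\n' \
  'START outside' \
  'SECTION' \
  'START here, END there' \
  'ENDSECTION' \
  'END outside' | rewrite_sections
```

Only the third line changes, to "BEGIN here, FINISH there"; both markers and the lines outside the section are printed unchanged.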

Exercise 5: awk - CSV to JSON Conversion

Use awk to convert a CSV file with headers to JSON format.

Solution:

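awk -F',' 'NR==1 {n=split($0,h); next} {printf "%s{", (NR==2 ? "[\n" : ",\n"); for (i=1; i<=n; i++) printf "\"%s\": \"%s\"%s", h[i], $i, (i<n ? ", " : ""); printf "}"} END {if (NR>1) print "\n]"; else print "[]"}' input_file.csv

Explanation: The first line's fields are stored as keys in the array h, and n records how many there are. Each later line is printed as a JSON object that pairs its fields with those keys, objects are separated by commas, and the whole output is wrapped in [ ] so it forms a single valid JSON array. The script splits naively on commas, so quoted fields containing commas are not supported, and values are not escaped: a double quote inside a field produces invalid JSON.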
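A longer sketch in the same style, still assuming simple CSV (no quoted fields with embedded commas or newlines). It additionally strips Windows CR line endings and escapes backslashes and double quotes so that each value is a valid JSON string; other control characters such as tabs are still not escaped. The names csv_to_json and esc are hypothetical.

```shell
# Hypothetical helper: simple CSV (header + rows) on stdin -> JSON array on stdout.
csv_to_json() {
  awk -F',' '
    # Prefix each backslash and double quote with a backslash (JSON escaping).
    function esc(s,    out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\" || c == "\"") out = out "\\"
        out = out c
      }
      return out
    }
    { sub(/\r$/, "") }                  # drop a trailing CR (CRLF files)
    NR == 1 { n = split($0, h); next }  # header: remember the key names
    {
      prefix = (NR == 2) ? "[\n  {" : ",\n  {"
      printf "%s", prefix
      for (i = 1; i <= n; i++) {
        sep = (i < n) ? ", " : ""
        printf "\"%s\": \"%s\"%s", esc(h[i]), esc($i), sep
      }
      printf "}"
    }
    END { if (NR > 1) print "\n]"; else print "[]" }
  '
}

printf '%s\n' 'name,quote' 'alice,say "hi"' 'bob,a\b' | csv_to_json
```

The escaping is done character by character rather than with gsub(), because the handling of backslashes in gsub() replacement strings differs between awk implementations.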

Exercise 6: sed - XML Tag Removal

Use sed to remove all XML tags from a file, leaving only the text content.

Solution:

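sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d' input_file.xml

Explanation: The first expression deletes every run of characters from < to the next >, and the second deletes lines left empty or containing only whitespace (typically the indentation of the original XML). Because sed works line by line, a tag split across two lines is not removed, and entities such as &amp; are left undecoded.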
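A variant that also trims leading and trailing whitespace from the remaining text, assuming indentation in the XML is not meaningful. It shares the limits of any line-by-line sed approach: tags that span lines, comments, CDATA sections, and a > inside an attribute value are not handled, and entities stay encoded, so a real XML parser is the safer choice for production data. The function name strip_xml_tags is hypothetical.

```shell
# Hypothetical helper: strip XML tags from stdin, one line at a time.
# Expression 1 deletes tags, 2 and 3 trim whitespace, 4 drops empty lines.
strip_xml_tags() {
  sed -e 's/<[^>]*>//g' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d'
}

# Made-up sample document.
printf '%s\n' '<note>' '  <to>Tove</to>' '  <body>Remember <b>this</b></body>' '</note>' | strip_xml_tags
```

The sample prints two lines, "Tove" and "Remember this".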

Exercise 7: awk - Log Analysis

Use awk to analyze an Apache access log and report the top 5 IP addresses by number of requests.

Solution:

awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -n 5

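Explanation: awk prints the first field of each line, which is the client IP address in Apache's common and combined log formats. sort groups identical addresses, uniq -c counts each group, sort -nr orders the counts from highest to lowest, and head -n 5 keeps the top five.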
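The counting can also be done inside awk with an associative array, so only the distinct addresses are sorted rather than every log line. The secondary sort key puts tied addresses in a stable alphabetical order. This sketch assumes the client address is the first field, as in the common and combined log formats; the function name top_ips is hypothetical.

```shell
# Hypothetical helper: top 5 client addresses; reads the files given, or stdin.
top_ips() {
  # count[$1]++ tallies requests per address; END prints "count address".
  awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' "$@" |
    sort -k1,1nr -k2,2 |  # highest count first, ties alphabetical
    head -n 5
}

# Made-up log lines: 10.0.0.1 three times, 10.0.0.2 twice, five others once.
for ip in 10.0.0.1 10.0.0.1 10.0.0.1 10.0.0.2 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5 10.0.0.6 10.0.0.7; do
  printf '%s - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512\n' "$ip"
done | top_ips
```

Usage on a real file is simply top_ips access.log.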

Practice Tip:

Create sample input files to test these solutions.
Modify the exercises to work with your own data or create variations to further challenge yourself.
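One way to set up throwaway test data, as a sketch: the directory comes from mktemp -d, and all file names and contents are made up. Exercises 1, 2, and 4 all read input_file.txt in the solutions, so they get separate hypothetical files here; substitute the file name when you run those commands.

```shell
# Create a scratch directory with small, made-up sample files.
dir=$(mktemp -d)

printf '%s\n' 'fruit: apple, apple pie' 'snack: apple' > "$dir/fruits.txt"         # Exercise 1
printf 'id\tname\tscore\n1\tann\t10\n2\tbob\t20\n' > "$dir/scores.tsv"              # Exercise 2
printf '%s\n' 'login alice@example.com' 'login bob@test.org' > "$dir/log_file.txt"  # Exercise 3
printf '%s\n' 'START' 'SECTION' 'START and END' 'ENDSECTION' > "$dir/sections.txt"  # Exercise 4
printf '%s\n' 'name,age' 'alice,30' 'bob,25' > "$dir/input_file.csv"                # Exercise 5
printf '%s\n' '<note>' '  <to>Tove</to>' '</note>' > "$dir/input_file.xml"          # Exercise 6
# Exercise 7: printf reuses its format for each address argument.
printf '%s - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512\n' \
  10.0.0.1 10.0.0.2 10.0.0.1 > "$dir/access.log"

echo "Sample files written to $dir"
```

For example, run sed '/fruit/s/apple/orange/g' "$dir/fruits.txt", and remove the directory with rm -r "$dir" when you are done.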
