Advanced Text Processing with sed and awk

Master powerful text manipulation techniques in Linux

1. Introduction to sed (Stream Editor)

sed is a powerful utility for parsing and transforming text. It's particularly useful for batch editing files.

Basic sed Usage:

sed 's/old_text/new_text/' file.txt       # Replace first occurrence in each line
sed 's/old_text/new_text/g' file.txt      # Replace all occurrences
sed -i 's/old_text/new_text/g' file.txt   # Edit file in-place
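To see the effect of the g flag without touching a real file, the commands above can be tried on a line of test data. This is a minimal sketch; the sample text and the variable names are made up for illustration.

```shell
# Made-up sample line containing "cat" twice
line='cat and cat'

# Without the g flag, only the first match on the line is replaced
first_only=$(printf '%s\n' "$line" | sed 's/cat/dog/')

# With the g flag, every match on the line is replaced
all_matches=$(printf '%s\n' "$line" | sed 's/cat/dog/g')

echo "$first_only"    # dog and cat
echo "$all_matches"   # dog and dog
```

A portability note on in-place editing: GNU sed accepts -i with no suffix, while BSD/macOS sed expects a suffix argument (possibly empty, as in -i ''). Writing -i.bak works on both and keeps a backup copy of the original file.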

Advanced sed Techniques:
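
A few techniques that commonly fall under this heading are selective printing with -n and p, capture groups, pattern-based deletion, and line-range addresses. The sketch below shows each on made-up input; the sample data and variable names are assumptions chosen for illustration.

```shell
# Made-up input: a few lines of key=value text with one comment
input='name=alpha
# comment line
port=8080
name=beta'

# 1. Print only matching lines: -n suppresses default output, p prints matches
only_names=$(printf '%s\n' "$input" | sed -n '/^name=/p')

# 2. Swap key and value using capture groups (-E enables extended regex)
swapped=$(printf '%s\n' "$input" | sed -E 's/^([a-z]+)=(.*)$/\2=\1/')

# 3. Delete every line that matches a pattern (here, comment lines)
no_comments=$(printf '%s\n' "$input" | sed '/^#/d')

# 4. Apply a substitution only to lines 1 through 3
ranged=$(printf '%s\n' "$input" | sed '1,3s/name/id/')

echo "$only_names"
echo "$swapped"
echo "$no_comments"
echo "$ranged"
```

In the last command the fourth line keeps its "name" key, because the address 1,3 limits the substitution to the first three lines.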

2. Deep Dive into awk

awk is a versatile programming language designed for text processing and typically used as a data extraction and reporting tool.

Basic awk Usage:

awk '{print $1}' file.txt           # Print first field of each line
awk -F: '{print $1}' /etc/passwd    # Use colon as field separator
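
Field splitting is easier to follow on a couple of known records. The sketch below uses made-up passwd-style lines rather than the real /etc/passwd, so the names and paths are purely illustrative.

```shell
# Made-up passwd-style records (not read from the real /etc/passwd)
records='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# First field of each record, splitting on colons
users=$(printf '%s\n' "$records" | awk -F: '{print $1}')

# First and last field; the comma in print inserts a space by default,
# and $NF refers to the last field of the record
user_shell=$(printf '%s\n' "$records" | awk -F: '{print $1, $NF}')

echo "$users"
echo "$user_shell"
```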

Advanced awk Techniques:
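
Techniques that commonly fall under this heading include pattern-action rules, associative arrays, and the built-in variables NR and NF. The sketch below tries each on made-up sales data; the data and variable names are assumptions for illustration.

```shell
# Made-up whitespace-separated data: region, amount
sales='east 10
west 5
east 7
west 3'

# 1. Pattern plus action: print only rows whose second field exceeds 6
big=$(printf '%s\n' "$sales" | awk '$2 > 6 {print $1, $2}')

# 2. Associative array: total per region; awk does not guarantee the order
#    of "for (r in t)", so the result is sorted for stable output
totals=$(printf '%s\n' "$sales" | awk '{t[$1] += $2} END {for (r in t) print r, t[r]}' | sort)

# 3. Built-in variables: NR is the record number, NF the field count
numbered=$(printf '%s\n' "$sales" | awk '{printf "%d:%d:%s\n", NR, NF, $1}')

echo "$big"
echo "$totals"
echo "$numbered"
```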

3. Combining sed and awk

sed and awk can be combined in powerful ways using pipes:

sed 's/,/ /g' data.csv | awk '{sum += $2} END {print "Total:", sum}'

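This command replaces every comma in a CSV file with a space, then sums the values in the second column. Because awk then splits on whitespace, it only works when no field contains spaces or quoted commas.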
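
The pipeline can be checked on a few made-up rows before running it on a real file. This sketch feeds the data on standard input instead of reading data.csv, and it also shows an awk-only variant that splits on commas directly; the sample rows and variable names are assumptions.

```shell
# Made-up CSV rows: item name, amount
csv='apples,10
pears,20
plums,12'

# sed turns commas into spaces, then awk sums the second column
piped=$(printf '%s\n' "$csv" | sed 's/,/ /g' | awk '{sum += $2} END {print "Total:", sum}')

# awk alone gives the same sum by using the comma as field separator
direct=$(printf '%s\n' "$csv" | awk -F, '{sum += $2} END {print "Total:", sum}')

echo "$piped"    # Total: 42
echo "$direct"   # Total: 42
```

The awk-only form is usually the safer choice when fields may contain spaces, since it never rewrites the separators.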

4. Real-world Examples

Example 1: Log File Analysis

awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head -10

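This command lists the 10 URLs that most often return a 404 error in an Apache access log. It assumes the common or combined log format, where field 7 is the request path and field 9 is the status code.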
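
To check which field holds what, the same pipeline can be run on a few made-up log lines in the common log format. The addresses, timestamps, and paths below are invented for illustration, and the final awk stage is an addition that only tidies the padded counts printed by uniq -c.

```shell
# Made-up access log lines in Apache common log format
log='10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /missing.html HTTP/1.1" 404 209
10.0.0.2 - - [10/Oct/2023:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 1043
10.0.0.3 - - [10/Oct/2023:13:56:01 +0000] "GET /missing.html HTTP/1.1" 404 209
10.0.0.1 - - [10/Oct/2023:13:56:12 +0000] "GET /old-page HTTP/1.1" 404 209'

# Keep paths of 404 responses, count each path, sort by count (descending),
# keep the top 10, then reprint "count path" without the leading padding
top404=$(printf '%s\n' "$log" \
  | awk '$9 == 404 {print $7}' \
  | sort | uniq -c | sort -rn | head -10 \
  | awk '{print $1, $2}')

echo "$top404"
```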

Example 2: Data Cleaning

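sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' data.txt | awk '!seen[$0]++'

This pipeline trims leading and trailing whitespace, removes empty lines, and drops duplicate lines while keeping the first occurrence of each. The POSIX class [[:space:]] is used instead of [ \t] because not every sed treats \t inside brackets as a tab.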
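
The cleaning steps can be tried on a small made-up input that contains each kind of mess. This is a sketch under the assumption that the portable [[:space:]] class is acceptable in place of [ \t]; the sample lines and variable names are invented.

```shell
# Made-up messy input: a padded line, a blank line, a whitespace-only line,
# and a duplicate of the first line
messy=$(printf '  alpha  \n\nbeta\n\t \nalpha\n')

# Trim both ends, drop lines that are empty after trimming,
# then keep only the first occurrence of each remaining line
cleaned=$(printf '%s\n' "$messy" \
  | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
  | awk '!seen[$0]++')

echo "$cleaned"
```

The awk expression works because seen[$0]++ evaluates to 0 the first time a line appears, so the negation is true and the line is printed; later copies evaluate to a nonzero count and are skipped.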

Example 3: CSV to JSON Conversion

awk -F, 'BEGIN {print "["} {printf "%s {\"name\": \"%s\", \"age\": %d, \"city\": \"%s\"}", (NR > 1 ? ",\n" : ""), $1, $2, $3} END {print "\n]"}' data.csv

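This awk script converts a simple three-column CSV file (name, age, city) into a JSON array of objects. It assumes there is no header row and that no field contains commas, quotes, or other characters that would need escaping.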
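
The comma between objects is the part that is easy to get wrong: JSON does not allow a comma after the last element, and awk does not know which record is last until the input ends. One way around this, sketched here on made-up records, is to print the separator before every record except the first. The names and values below are invented for illustration.

```shell
# Made-up CSV records: name, age, city
csv='Alice,30,Paris
Bob,25,Berlin'

json=$(printf '%s\n' "$csv" | awk -F, '
  BEGIN { print "[" }
  {
    # Emit the separator before every record except the first,
    # so the last object is not followed by a comma
    printf "%s  {\"name\": \"%s\", \"age\": %d, \"city\": \"%s\"}", (NR > 1 ? ",\n" : ""), $1, $2, $3
  }
  END { print "\n]" }
')

echo "$json"
```

For CSV data with quoted fields or embedded commas, a dedicated CSV or JSON tool is likely a better fit than a one-line awk script.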

Tip:

When working with complex sed or awk commands, it's often helpful to build them up incrementally and test each part separately. Use echo or cat to pipe test data into your commands for quick iterations.
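
As a small illustration of that workflow, the sketch below builds a two-stage pipeline one stage at a time on made-up test data, checking the intermediate result before adding the next command. The sample data and variable names are assumptions.

```shell
# Made-up test data piped in with printf instead of a real file
sample='x,1
y,2'

# Step 1: run the sed stage on its own and inspect its output
step1=$(printf '%s\n' "$sample" | sed 's/,/ /g')
echo "$step1"

# Step 2: add the awk stage only once step 1 looks right
step2=$(printf '%s\n' "$sample" | sed 's/,/ /g' | awk '{print $2}')
echo "$step2"
```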





