1. Introduction to Text Processing in Bash
Bash, together with the standard Unix text utilities, provides powerful tools for processing and manipulating text data. These tools are essential for system administration, log analysis, and data processing tasks.
2. grep: Pattern Matching and Searching
grep searches files (or standard input) for lines that match a pattern.
# Search for a pattern in a file
grep "error" logfile.txt
# Case-insensitive search
grep -i "warning" logfile.txt
# Recursive search in directories
grep -r "TODO" /path/to/project/
# Show line numbers
grep -n "important" document.txt
# Invert match (show lines that don't match)
grep -v "exclude" data.txt
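These options combine freely. Two variations worth knowing, reusing the file names above: -rin merges the recursive, case-insensitive, and line-number flags, and -c prints a count of matching lines instead of the lines themselves.
# Case-insensitive recursive search with line numbers
grep -rin "todo" /path/to/project/
# Count matching lines rather than printing them
grep -c "error" logfile.txt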
3. sed: Stream Editor
sed, the stream editor, transforms text as it passes through a pipeline or file; substitution is its most common use.
# Replace first occurrence of a pattern in each line
sed 's/old/new/' file.txt
# Replace all occurrences of a pattern
sed 's/old/new/g' file.txt
# Delete lines matching a pattern
sed '/pattern/d' file.txt
# Insert text before line 2 (GNU sed syntax)
sed '2i\Inserted text' file.txt
# Apply multiple sed commands
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt
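The commands above write to standard output and leave file.txt untouched. GNU sed also supports in-place editing with -i; supplying a backup suffix, as sketched here, keeps a copy of the original:
# Edit in place, saving the original as file.txt.bak
sed -i.bak 's/old/new/g' file.txt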
4. awk: Text Processing and Data Extraction
awk is a powerful tool for processing structured text data. It splits each input line into fields (columns), which the examples below reference as $1, $2, and so on.
# Print specific columns (fields)
awk '{print $1, $3}' file.txt
# Use custom field separator
awk -F':' '{print $1}' /etc/passwd
# Sum values in a column
awk '{sum += $3} END {print sum}' data.txt
# Filter rows based on a condition
awk '$3 > 100 {print $0}' data.txt
# Replace field contents
awk '{$2 = "new_value"; print}' file.txt
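The summing idiom extends to averages via awk's built-in NR, the number of records (lines) read. A minimal sketch, assuming data.txt has at least one line so the division is safe:
# Average the values in the third column
awk '{sum += $3} END {print sum / NR}' data.txt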
5. cut: Extract Sections from Lines
cut is used to extract specific parts of lines from files.
# Extract characters by position
cut -c 1-5 file.txt
# Extract fields using a delimiter
cut -d':' -f1,3 /etc/passwd
# Print every field except field 2 (--complement is a GNU extension)
cut -d',' --complement -f2 data.csv
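cut also fits naturally into pipelines. As an illustration, the seventh field of /etc/passwd holds each account's login shell; sort -u (covered in the next section) removes duplicates:
# List the distinct login shells in use
cut -d':' -f7 /etc/passwd | sort -u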
6. sort: Sort Lines of Text
sort orders the lines of text files, alphabetically by default.
# Sort lines alphabetically
sort file.txt
# Sort numerically
sort -n numbers.txt
# Sort by the second field onward (use -k2,2 to sort by field 2 alone)
sort -k2 data.txt
# Remove duplicate lines after sorting
sort -u file.txt
# Sort in reverse order
sort -r file.txt
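When fields are separated by something other than whitespace, -t names the delimiter and -k can be pinned to a single field. For example, sorting /etc/passwd numerically by its third field, the user ID:
# Sort accounts numerically by UID (the third colon-separated field)
sort -t':' -k3,3 -n /etc/passwd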
7. uniq: Report or Omit Repeated Lines
uniq compares only adjacent lines, so it is almost always combined with sort when handling duplicates.
# Remove all duplicate lines (sorting first makes duplicates adjacent)
sort file.txt | uniq
# Count occurrences of lines
sort file.txt | uniq -c
# Display only duplicate lines
sort file.txt | uniq -d
# Display only unique lines
sort file.txt | uniq -u
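To see why sorting matters, compare uniq on unsorted and sorted input; printf here just generates three sample lines:
# Only adjacent duplicates collapse: output is a, b, a
printf 'a\nb\na\n' | uniq
# Sorting first makes all duplicates adjacent: output is a, b
printf 'a\nb\na\n' | sort | uniq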
8. tr: Translate or Delete Characters
tr translates or deletes characters. It reads standard input only, so it is typically fed through a pipe.
# Convert lowercase to uppercase
echo "hello" | tr 'a-z' 'A-Z'
# Delete specific characters
echo "hello 123" | tr -d '0-9'
# Squeeze repeating characters
echo "hello world" | tr -s ' '
Tip: Combining Commands
These tools are most powerful when combined using pipes (|). For example:
grep "error" logfile.txt | sort | uniq -c | sort -nr
This pipeline finds every line containing "error" in logfile.txt, sorts those lines, counts each distinct line, and lists the results in descending order of frequency.
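Appending head (a standard utility not covered above) trims the report to the most frequent entries:
# Show only the five most frequent error lines
grep "error" logfile.txt | sort | uniq -c | sort -nr | head -n 5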
9. Regular Expressions
Many of these tools support regular expressions for powerful pattern matching:
- . (dot): Matches any single character
- * (asterisk): Matches zero or more occurrences of the previous character
- ^ (caret): Matches the start of a line
- $ (dollar): Matches the end of a line
- [ ]: Matches any one of the enclosed characters
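A few grep invocations tie these metacharacters together; the patterns and the reuse of logfile.txt are purely illustrative:
# Lines that start with "ERROR"
grep '^ERROR' logfile.txt
# Lines that end with a digit
grep '[0-9]$' logfile.txt
# "ab" followed by zero or more "c" characters, then "d"
grep 'abc*d' logfile.txt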