1. Introduction to Text Processing in Bash
Bash, together with the standard Unix text utilities, provides powerful tools for processing and manipulating text data. These tools are essential for system administration, log analysis, and data processing tasks.
2. grep: Pattern Matching and Searching
grep searches files (or standard input) for lines that match a pattern.
# Search for a pattern in a file
grep "error" logfile.txt
# Case-insensitive search
grep -i "warning" logfile.txt
# Recursive search in directories
grep -r "TODO" /path/to/project/
# Show line numbers
grep -n "important" document.txt
# Invert match (show lines that don't match)
grep -v "exclude" data.txt
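These options combine freely. Two variations worth knowing, reusing the file names above: -rin merges the recursive, case-insensitive, and line-number flags, and -c prints a count of matching lines instead of the lines themselves.
# Case-insensitive recursive search with line numbers
grep -rin "todo" /path/to/project/
# Count matching lines rather than printing them
grep -c "error" logfile.txt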
3. sed: Stream Editor
sed, the stream editor, transforms text as it passes through a pipeline or file; substitution is its most common use.
# Replace first occurrence of a pattern in each line
sed 's/old/new/' file.txt
# Replace all occurrences of a pattern
sed 's/old/new/g' file.txt
# Delete lines matching a pattern
sed '/pattern/d' file.txt
# Insert text before line 2 (GNU sed syntax)
sed '2i\Inserted text' file.txt
# Apply multiple sed commands
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt
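The commands above write to standard output and leave file.txt untouched. GNU sed also supports in-place editing with -i; supplying a backup suffix, as sketched here, keeps a copy of the original:
# Edit in place, saving the original as file.txt.bak
sed -i.bak 's/old/new/g' file.txt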
4. awk: Text Processing and Data Extraction
awk is a powerful tool for processing structured text data. It splits each input line into fields (columns), which the examples below reference as $1, $2, and so on.
# Print specific columns (fields)
awk '{print $1, $3}' file.txt
# Use custom field separator
awk -F':' '{print $1}' /etc/passwd
# Sum values in a column
awk '{sum += $3} END {print sum}' data.txt
# Filter rows based on a condition
awk '$3 > 100 {print $0}' data.txt
# Replace field contents
awk '{$2 = "new_value"; print}' file.txt
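The summing idiom extends to averages via awk's built-in NR, the number of records (lines) read. A minimal sketch, assuming data.txt has at least one line so the division is safe:
# Average the values in the third column
awk '{sum += $3} END {print sum / NR}' data.txt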
5. cut: Extract Sections from Lines
cut is used to extract specific parts of lines from files.
# Extract characters by position
cut -c 1-5 file.txt
# Extract fields using a delimiter
cut -d':' -f1,3 /etc/passwd
# Print every field except field 2 (--complement is a GNU extension)
cut -d',' --complement -f2 data.csv
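cut also fits naturally into pipelines. As an illustration, the seventh field of /etc/passwd holds each account's login shell; sort -u (covered in the next section) removes duplicates:
# List the distinct login shells in use
cut -d':' -f7 /etc/passwd | sort -u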
6. sort: Sort Lines of Text
sort orders the lines of text files, alphabetically by default.
# Sort lines alphabetically
sort file.txt
# Sort numerically
sort -n numbers.txt
# Sort by the second field onward (use -k2,2 to sort by field 2 alone)
sort -k2 data.txt
# Remove duplicate lines after sorting
sort -u file.txt
# Sort in reverse order
sort -r file.txt
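When fields are separated by something other than whitespace, -t names the delimiter and -k can be pinned to a single field. For example, sorting /etc/passwd numerically by its third field, the user ID:
# Sort accounts numerically by UID (the third colon-separated field)
sort -t':' -k3,3 -n /etc/passwd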
7. uniq: Report or Omit Repeated Lines
uniq compares only adjacent lines, so it is almost always combined with sort when handling duplicates.
# Remove all duplicate lines (sorting first makes duplicates adjacent)
sort file.txt | uniq
# Count occurrences of lines
sort file.txt | uniq -c
# Display only duplicate lines
sort file.txt | uniq -d
# Display only unique lines
sort file.txt | uniq -u
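To see why sorting matters, compare uniq on unsorted and sorted input; printf here just generates three sample lines:
# Only adjacent duplicates collapse: output is a, b, a
printf 'a\nb\na\n' | uniq
# Sorting first makes all duplicates adjacent: output is a, b
printf 'a\nb\na\n' | sort | uniq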
8. tr: Translate or Delete Characters
tr translates or deletes characters. It reads standard input only, so it is typically fed through a pipe.
# Convert lowercase to uppercase
echo "hello" | tr 'a-z' 'A-Z'
# Delete specific characters
echo "hello 123" | tr -d '0-9'
# Squeeze repeating characters
echo "hello world" | tr -s ' '
Tip: Combining Commands
These tools are most powerful when combined using pipes (|). For example:
grep "error" logfile.txt | sort | uniq -c | sort -nr
This pipeline finds every line containing "error" in logfile.txt, sorts those lines, counts each distinct line, and lists the results in descending order of frequency.
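Appending head (a standard utility not covered above) trims the report to the most frequent entries:
# Show only the five most frequent error lines
grep "error" logfile.txt | sort | uniq -c | sort -nr | head -n 5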
9. Regular Expressions
Many of these tools support regular expressions for powerful pattern matching:
- . (dot): Matches any single character
- * (asterisk): Matches zero or more occurrences of the previous character
- ^ (caret): Matches the start of a line
- $ (dollar): Matches the end of a line
- [ ]: Matches any one of the enclosed characters
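A few grep invocations tie these metacharacters together; the patterns and the reuse of logfile.txt are purely illustrative:
# Lines that start with "ERROR"
grep '^ERROR' logfile.txt
# Lines that end with a digit
grep '[0-9]$' logfile.txt
# "ab" followed by zero or more "c" characters, then "d"
grep 'abc*d' logfile.txt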