File Operations and Text Processing in Shell Script: Mastering Bash File Handling

Last updated at: September 08, 2024

Written by: Abdul

Introduction to File Operations and Text Processing

File operations and text processing are crucial skills for any shell script developer. These capabilities allow you to manipulate files, extract information, and transform data efficiently. In this article, we'll explore various techniques for handling files and processing text in Bash scripts.

Basic File Operations

Reading Files

#!/bin/bash

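# Reading a file line by line
# IFS= preserves leading/trailing whitespace; -r keeps backslashes literal.
# The || test handles a final line that lacks a trailing newline.
while IFS= read -r line || [[ -n "$line" ]]
do
    printf '%s\n' "$line"
done < "input.txt"

# Reading entire file content ($(<file) avoids spawning cat;
# command substitution strips trailing newlines)
content=$(<file.txt)
echo "$content"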
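The same read loop can also split each line into fields by setting IFS for the duration of the read command. The following is a minimal sketch; the file scores.txt, its colon-separated layout, and the variable names are hypothetical and exist only for illustration.

```shell
#!/bin/bash

# Hypothetical sample data, created only for this illustration
tmp_dir=$(mktemp -d)
scores_file="$tmp_dir/scores.txt"
printf 'alice:90\nbob:75\ncarol:82' > "$scores_file"   # no trailing newline on purpose

# Split each line on ':' into two variables.
# The '|| [ -n "$name" ]' part keeps the final line even without a trailing newline.
total=0
count=0
while IFS=: read -r name score || [ -n "$name" ]
do
    echo "$name scored $score"
    total=$((total + score))
    count=$((count + 1))
done < "$scores_file"

echo "Lines read: $count, total score: $total"
rm -rf "$tmp_dir"
```

Because the loop reads from a redirection rather than a pipe, it runs in the current shell, so count and total are still set after the loop ends.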

Writing to Files

# Overwriting a file
echo "New content" > file.txt

# Appending to a file
echo "Additional content" >> file.txt

# Using a here-document
cat << EOF > newfile.txt
This is line 1
This is line 2
EOF
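One detail of here-documents worth seeing in action: whether the delimiter is quoted decides whether the shell expands variables in the body. The sketch below writes the same text both ways; the temporary directory and file names are placeholders chosen for this example.

```shell
#!/bin/bash

tmp_dir=$(mktemp -d)
name="World"

# Unquoted delimiter: variables and command substitutions are expanded
cat << EOF > "$tmp_dir/expanded.txt"
Hello, $name
EOF

# Quoted delimiter: the body is written literally
cat << 'EOF' > "$tmp_dir/literal.txt"
Hello, $name
EOF

expanded=$(cat "$tmp_dir/expanded.txt")
literal=$(cat "$tmp_dir/literal.txt")
echo "$expanded"   # Hello, World
echo "$literal"    # Hello, $name
rm -rf "$tmp_dir"
```

The quoted form is the safer choice when generating scripts or config files that contain dollar signs of their own.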

File Permissions and Ownership

# Changing permissions (755 = rwxr-xr-x: owner full access, others read/execute)
chmod 755 script.sh

# Changing ownership (usually requires root privileges)
chown user:group file.txt
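Scripts often need to check permissions before acting on a file, not only change them. As a rough sketch, the example below sets a symbolic mode on a temporary file and then uses the test operators -r, -w, and -x; the variable names are illustrative.

```shell
#!/bin/bash

tmp_file=$(mktemp)

# Symbolic mode: owner gets read/write, group and others get read only (same as 644)
chmod u=rw,go=r "$tmp_file"

# The test operators report what the current user can actually do with the file
if [ -r "$tmp_file" ] && [ -w "$tmp_file" ]; then
    access="read-write"
else
    access="restricted"
fi

if [ -x "$tmp_file" ]; then
    executable="yes"
else
    executable="no"
fi

echo "Access: $access, executable: $executable"
rm -f "$tmp_file"
```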

Text Processing with sed

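'sed' (stream editor) applies editing commands to each line of its input and writes the result to standard output. The input file itself is left unchanged unless you use the -i option: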

# Replacing text
sed 's/old/new/g' file.txt

# Deleting lines
sed '/pattern/d' file.txt

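# Inserting a line before line 2 (this one-line form is a GNU sed extension)
sed '2i\New line' file.txt

# Using sed with variables (double quotes let the shell expand them;
# values containing / or regex metacharacters must be escaped first)
pattern="hello"
replacement="world"
sed "s/$pattern/$replacement/g" file.txt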
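When the values being substituted contain slashes, such as file paths, a different delimiter keeps the expression readable. The sketch below also edits the file in place with a backup copy. The file app.conf and both paths are made up for this example, and the -i.bak form is an extension supported by GNU and BSD sed rather than part of POSIX.

```shell
#!/bin/bash

tmp_dir=$(mktemp -d)
config="$tmp_dir/app.conf"
printf 'path=/usr/local/bin\nmode=fast\n' > "$config"

old_path="/usr/local/bin"
new_path="/opt/tools/bin"

# '|' as the delimiter avoids clashing with the slashes in the variables.
# -i.bak edits the file in place and keeps the original as app.conf.bak.
sed -i.bak "s|$old_path|$new_path|g" "$config"

updated=$(grep '^path=' "$config")
original=$(grep '^path=' "$config.bak")
echo "Updated:  $updated"
echo "Original: $original"
rm -rf "$tmp_dir"
```

Note that the search value is still treated as a regular expression, so characters such as the dot keep their special meaning.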

Advanced Text Processing with awk

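'awk' splits each input line into fields ($1, $2, and so on, separated by whitespace by default) and runs pattern-action rules against them, which makes it a good fit for column-oriented data: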

# Printing specific fields
awk '{print $1, $3}' file.txt

# Filtering rows (a pattern with no action prints every matching line)
awk '$3 > 50' data.txt

# Calculating sums
awk '{sum += $2} END {print sum}' numbers.txt

# Using awk with custom field separators
awk -F':' '{print $1}' /etc/passwd
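Beyond a single running sum, awk's associative arrays allow grouping by a key. Here is a small sketch that totals the second field per distinct value of the first; the department names and amounts are invented sample data.

```shell
#!/bin/bash

tmp_file=$(mktemp)
# Hypothetical data: department and amount
printf 'sales 100\nops 40\nsales 50\nops 10\n' > "$tmp_file"

# Sum the second field per distinct value of the first field.
# 'for (key in array)' iterates in an unspecified order, so the output is sorted.
totals=$(awk '{sum[$1] += $2} END {for (dept in sum) print dept, sum[dept]}' "$tmp_file" | sort)

echo "$totals"
rm -f "$tmp_file"
```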

Using grep for Pattern Matching

'grep' prints the lines of its input that match a regular expression:

# Basic search
grep "pattern" file.txt

# Case-insensitive search
grep -i "pattern" file.txt

# Recursive search in directories
grep -r "pattern" /path/to/directory

# Inverting the match
grep -v "pattern" file.txt
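A few more grep options come up often in scripts: -c to count matches, -n to show line numbers, -E for extended regular expressions, and -q to test for a match silently. The sketch below runs them against a small made-up log; the log contents are assumptions for illustration.

```shell
#!/bin/bash

tmp_file=$(mktemp)
printf 'INFO start\nERROR disk full\nWARN slow\nERROR timeout\n' > "$tmp_file"

# -c prints the number of matching lines instead of the lines themselves
error_count=$(grep -c '^ERROR' "$tmp_file")

# -n prefixes each match with its line number; -E enables extended regex
problem_lines=$(grep -nE '^(ERROR|WARN)' "$tmp_file")

# -q prints nothing and only sets the exit status, which suits 'if'
if grep -q 'FATAL' "$tmp_file"; then
    has_fatal="yes"
else
    has_fatal="no"
fi

echo "Errors: $error_count"
echo "$problem_lines"
echo "Fatal entries present: $has_fatal"
rm -f "$tmp_file"
```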

Regular Expressions in Bash

Bash's [[ ... =~ ... ]] operator matches a string against a POSIX extended regular expression and succeeds when the pattern matches:

#!/bin/bash

# Using regex with if statement
if [[ "string" =~ ^str ]]; then
    echo "String starts with 'str'"
fi

# Validating an email address with regex (a simplified pattern, not full RFC 5322)
email="user@example.com"
if [[ $email =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
    echo "Valid email address"
fi
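To actually extract parts of a match, Bash stores the text captured by parenthesized groups in the BASH_REMATCH array after a successful =~ test. The sketch below uses a simplified email pattern with two capture groups added; the variable names email_regex, local_part, and domain are chosen for this example.

```shell
#!/bin/bash

email="user@example.com"

# Keeping the pattern in a variable avoids quoting surprises inside [[ ]]
email_regex='^([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})$'

if [[ $email =~ $email_regex ]]; then
    local_part="${BASH_REMATCH[1]}"   # first parenthesized group
    domain="${BASH_REMATCH[2]}"       # second parenthesized group
    echo "Local part: $local_part"
    echo "Domain: $domain"
else
    echo "Not a valid email address: $email"
fi
```

BASH_REMATCH[0] holds the whole matched string, and the numbered entries follow the order of the opening parentheses in the pattern.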

Practical Example: Log File Analyzer

Let's create a script that analyzes a log file, extracting and summarizing information:

#!/bin/bash

log_file="access.log"

# Count total number of entries
total_entries=$(wc -l < "$log_file")

# Count unique IP addresses
unique_ips=$(awk '{print $1}' "$log_file" | sort -u | wc -l)

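# Find most common HTTP status code (field 9 in the common/combined log format)
most_common_status=$(awk '{print $9}' "$log_file" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

# Extract 404 errors (match the status field, not any " 404 " in the line)
not_found_errors=$(awk '$9 == 404' "$log_file")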

# Output results
echo "Log File Analysis"
echo "-----------------"
echo "Total entries: $total_entries"
echo "Unique IP addresses: $unique_ips"
echo "Most common HTTP status: $most_common_status"
echo "404 Errors:"
echo "$not_found_errors"

# Save 404 errors to a separate file
echo "$not_found_errors" > 404_errors.log
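Two refinements are worth sketching here: finding the busiest client, and counting 404 responses by the status field instead of by a text search. The example below builds a tiny log in the combined log format; every entry in it is invented, and the variable names are illustrative.

```shell
#!/bin/bash

tmp_log=$(mktemp)
# Hypothetical entries in combined log format (status is field 9, size is field 10)
cat << 'EOF' > "$tmp_log"
10.0.0.1 - - [08/Sep/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"
10.0.0.2 - - [08/Sep/2024:10:00:01 +0000] "GET /missing HTTP/1.1" 404 128 "-" "curl/8.0"
10.0.0.1 - - [08/Sep/2024:10:00:02 +0000] "GET /about HTTP/1.1" 200 404 "-" "curl/8.0"
10.0.0.1 - - [08/Sep/2024:10:00:03 +0000] "GET /gone HTTP/1.1" 404 128 "-" "curl/8.0"
EOF

# IP address with the most requests
top_ip=$(awk '{print $1}' "$tmp_log" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

# A plain text search also counts the 200 response whose size happens to be 404
naive_count=$(grep -c ' 404 ' "$tmp_log")

# Comparing field 9 counts only real 404 responses
field_count=$(awk '$9 == 404 {n++} END {print n + 0}' "$tmp_log")

echo "Busiest IP: $top_ip"
echo "Text search for ' 404 ': $naive_count lines"
echo "Status field equals 404: $field_count lines"
rm -f "$tmp_log"
```

On this sample the text search reports three lines while only two requests returned 404, which is why matching on the field is the more reliable choice.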

Conclusion

With redirection, sed, awk, grep, and Bash's built-in regex matching, a shell script can read, transform, and summarize text files using only standard tools. Combined, as in the log analyzer above, these techniques cover most everyday file and text processing tasks.

In our next article, we'll explore advanced topics in shell scripting, including error handling, debugging techniques, and best practices for writing robust scripts. Stay tuned!