The commands grep, echo, df and so on print some
output to the screen. In fact, what is happening on a lower level is that they
are printing characters one by one into a theoretical data stream (also
called a pipe) called the stdout pipe. The shell itself performs
the action of reading those characters one by one and displaying them on the
screen. The word pipe itself means exactly that: a program places data
in one end of a funnel while another program reads that data from the other
end. Pipes exist to allow two separate programs to perform simple
communications with each other. In this case, the program is merely communicating
with the shell in order to display some output.
|
| |
The same is true of the cat command explained previously. This command,
run with no arguments, reads from the stdin pipe. By default this is the
keyboard. One further pipe is the stderr pipe, to which a program writes
error messages. It is usually not possible to tell from the screen alone whether a
message was written to stderr or to stdout, because both are normally
directed to the screen. Good programs, however, always write to the appropriate
pipe so that output can be separated for diagnostic purposes if need
be.
|
| |
|
| |
Create a text file with lots of lines that contain the word GNU and
one line that contains the word GNU as well as the word Linux.
Then do grep GNU myfile.txt. The result is printed to stdout as usual.
Now try grep GNU myfile.txt > gnu_lines.txt. What is happening here
is that the output of the grep command is being redirected into
a file. The > gnu_lines.txt tells the shell to create a new file gnu_lines.txt
and fill it with any output from stdout, instead of displaying the output as
it usually does. If the file already exists, it will be truncated (shortened to zero length) first.
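
As a minimal sketch of this redirection, the following creates a small sample file in a scratch directory and redirects grep's stdout into gnu_lines.txt. The scratch directory, the sample lines, and the "old contents" placeholder are assumptions made for illustration only.

```shell
# Work in a throwaway directory so nothing of yours is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A sample myfile.txt: three lines mention GNU, one of those also mentions Linux.
printf '%s\n' 'GNU one' 'GNU two' 'GNU and Linux' 'nothing here' > myfile.txt

# Give gnu_lines.txt some earlier contents to show that > truncates the file.
echo "old contents" > gnu_lines.txt

# stdout of grep goes into the file instead of onto the screen.
grep GNU myfile.txt > gnu_lines.txt

cat gnu_lines.txt
```

After this runs, gnu_lines.txt holds only the three matching lines; the earlier contents are gone.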
|
| |
Now suppose you want to append further output to this file. Using >>
instead of > will not truncate the file but append any output
to it. Try this: echo "morestuff" >>
gnu_lines.txt. Then view the contents of gnu_lines.txt.
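
A short sketch of appending, assuming a scratch directory and a one-line starting file (both are illustrative choices, not part of the exercise above):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Start with a file that already has one line in it.
echo "A line with the word GNU in it" > gnu_lines.txt

# >> keeps the existing contents and adds the new output at the end.
echo "morestuff" >> gnu_lines.txt

cat gnu_lines.txt
```

The file now contains the original line followed by morestuff.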
|
| |
|
| |
The real power of pipes is when one program can read from the output of another
program. Consider the grep command, which reads from stdin when given
no file names. Run grep with just one argument, the pattern, on the command line:
|
| |
|
# grep GNU
A line without that word in it
Another line without that word in it
A line with the word GNU in it
A line with the word GNU in it
I have the idea now
^C
#
|
|
| |
grep's default is to read from stdin when no files are given. As you
can see, it is doing its usual work of printing out lines that have the word
GNU in them. Hence lines containing GNU will be printed twice
- as you type them in and again when grep reads them and decides that
they contain GNU.
|
| |
Now try grep GNU myfile.txt | grep Linux. The first grep outputs
all lines with the word GNU in them to stdout. The | specifies
that all stdout is to be typed as stdin (as we just did above) into the next
command, which is also a grep command. The second grep command
scans that data for lines with the word Linux in them. grep
is often used this way as a filter and can be used multiple times, e.g. grep L myfile.txt | grep i | grep n
| grep u | grep x.
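
The filtering chain can be tried on a small sample file. This is only a sketch: the scratch directory and the three sample lines are assumptions for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'GNU is not UNIX' 'GNU tools run on Linux' 'Linux is a kernel' > myfile.txt

# The first grep keeps lines containing GNU; its stdout becomes the stdin
# of the second grep, which keeps only those that also contain Linux.
both=$(grep GNU myfile.txt | grep Linux)

echo "$both"
```

Only the line containing both words survives the two filters.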
|
| |
|
| |
In a previous chapter we used grep on a dictionary to demonstrate regular expressions.
This is how a dictionary of words can be created:
|
| |
|
cat /usr/lib/ispell/english.hash | strings | tr 'A-Z' 'a-z' \
| grep '^[a-z]' | sort -u > mydict
|
The file english.hash contains the UNIX dictionary normally used for
spell checking. With a bit of filtering you can create a dictionary that will
make solving crossword puzzles a breeze. First we use the command strings,
explained previously, to extract readable bits of text. Here we are using its
alternate mode of operation where it reads from stdin when no files are specified
on its command line. The command tr (abbreviated from translate;
see the tr man page) then converts upper to lower case. The grep
command then filters out lines that do not start with a letter. Finally the
sort command sorts the words in alphabetical order. The -u
option stands for unique, and specifies that there should be no duplicate
lines of text. Now try less mydict.
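
The same tr, grep and sort stages can be tried without the dictionary file. The sketch below feeds a handful of made-up words through them; the word list is an assumption for illustration, and the strings stage is left out because the input is already plain text.

```shell
# Mixed-case words, a duplicate, and one entry that starts with a digit.
words=$(printf '%s\n' 'Zebra' 'apple' '42nd' 'Apple' 'zebra' 'mango' \
        | tr 'A-Z' 'a-z' \
        | grep '^[a-z]' \
        | sort -u)

echo "$words"
```

The result is apple, mango and zebra, one per line: the case is folded, the line starting with a digit is dropped, and duplicates are removed.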
|
| |
|
| |
Try the command ls nofile.txt > A. ls should give an error
message if the file doesn't exist. The error message is, however, displayed on the screen and
not written into the file A. This is because ls has written
its error message to stderr while > has only redirected stdout. The
way to get both stdout and stderr to go to the same file is to use a redirection
operator. As far as the shell is concerned, stdout is called 1 and stderr
is called 2, and commands can be appended with a redirection
like 2>&1 to dictate that stderr is to be mixed into the output of
stdout. The actual words stderr and stdout are only used in C programming. Try
the following:
|
| |
|
touch existing_file
rm -f non-existing_file
ls existing_file non-existing_file
|
ls will output two lines: a line containing a listing for the file
existing_file and a line containing an error message to explain that
the file non-existing_file does not exist. The error message would
have been written to stderr or file descriptor number 2, and
the remaining line would have been written to stdout or file descriptor
number 1. Next we try
|
| |
|
ls existing_file non-existing_file 2>A
cat A
|
Now A contains the error message, while the remaining output came to
the screen. Now try,
|
| |
|
ls existing_file non-existing_file 1>A
cat A
|
The notation 1>A is the same as >A because the shell assumes
that you are referring to file descriptor 1 when you don't specify
any. Now A contains the stdout output, while the error message has
been redirected to the screen. Now try,
|
| |
|
ls existing_file non-existing_file 1>A 2>&1
cat A
|
Now A contains both the error message and the normal output. The >&
is called a redirection operator. x>&y tells the shell
to make pipe x write to wherever pipe y currently points. Redirections are processed
from left to right on the command line. Hence the above command first
redirects stdout to the file A, and then sends stderr to the same place as stdout, so both end up in A.
Finally,
|
| |
|
ls existing_file non-existing_file 2>A 1>&2
cat A
|
We notice that this has the same effect, except that here we are doing the reverse:
first redirecting stderr into the file A, and then redirecting stdout to wherever stderr now points.
To see what happens if we give the redirections in the opposite order, we can try,
|
| |
|
ls existing_file non-existing_file 2>&1 1>A
cat A
|
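which means to first point stderr at wherever stdout currently goes (the screen), and only then to
redirect stdout into the file A. This will therefore not mix stderr and stdout: the error message
still appears on the screen, because stderr was copied before stdout was redirected to A.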
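
The effect of ordering can be checked without ls. The sketch below uses a hypothetical shell function, emit, that writes one line to stdout and one to stderr; the function, the file names and the scratch directory are all assumptions for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical helper: one line to stdout (1), one line to stderr (2).
emit() {
    echo "normal output"
    echo "error message" >&2
}

# stdout goes to both.txt first, then stderr is pointed at the same place.
emit 1>both.txt 2>&1

# Here stderr is first pointed at the current stdout (captured into $screen
# by the command substitution), and only then is stdout sent to the file.
screen=$(emit 2>&1 1>only_out.txt)

cat both.txt
cat only_out.txt
echo "$screen"
```

both.txt receives both lines, only_out.txt receives just the normal output, and the error message lands wherever stdout pointed before it was redirected.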
|
| |
|
| |
ed used to be the standard text editor for UNIX. It is cryptic
to use, but is compact and programmable. sed stands for stream
editor, and is the only incarnation of ed that is commonly used today.
sed allows editing of files non-interactively. In the way that grep
can search for words and filter lines of text, sed can do search-replace
operations and insert and delete lines in text files. sed is one
of those programs with no man page to speak of. Do info sed to see
sed's comprehensive info pages with examples. The most common usage
of sed is to replace words in a stream with alternative words. sed
reads from stdin and writes to stdout. Like grep, it is line buffered,
which means that it reads one line in at a time and then writes that line out
again after performing whatever editing operations were requested. Replacements are typically
done with:
|
| |
|
cat <file> | sed -e 's/<search-regexp>/<replace-text>/<option>' \
> <resultfile>
|
where search-regexp is a regular expression, replace-text is the
text you would like to replace each found occurrence with, and option
is nothing or g, which means to replace every occurrence in the same
line (usually sed just replaces the first occurrence of the regular
expression in each line). (There are other options; see the sed info
page.) For demonstration, type
|
| |
and type out a few lines of English text.
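
A minimal sketch of the difference the g option makes, assuming a made-up sample sentence and a substitution of the letter o by the digit 0 (both chosen purely for illustration):

```shell
# Without g, only the first match on each line is replaced.
first=$(echo "the quick brown fox" | sed -e 's/o/0/')

# With g, every match on the line is replaced.
all=$(echo "the quick brown fox" | sed -e 's/o/0/g')

echo "$first"
echo "$all"
```

The first command changes only the o in brown; the second also changes the o in fox.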
|
| |
sed is actually an extremely powerful and important system of editing.
A complete overview will be done later. Here we will concentrate on searching
and replacing regular expressions.
|
| |
|
| |
This section explains how to do the apparently complex task of moving text around
within lines. Consider for example the output of ls: now say you want
to automatically strip out only the size column -- sed can do this
sort of editing using the special \( \)
notation to group parts of the regular expression together. Consider the following
example:
|
| |
|
sed -e 's/\(\<[^ ]*\>\)\([ ]*\)\(\<[^ ]*\>\)/\3\2\1/g'
|
Here sed is searching for the expression \<[^ ]*\>[ ]*\<[^ ]*\>.
From the chapter on regular expressions,
we can see that it matches a whole word, an arbitrary amount of whitespace,
and then another whole word. The \( \)
groups these three so that they can be referred to in replace-text. Each
part of the regular expression inside \( \)
is called a sub-expression of the regular expression. Each sub-expression
is numbered -- namely \1, \2
etc. Hence \1 in replace-text is the first \<[^ ]*\>,
\2 is [ ]*, and finally, \3
is the second \<[^ ]*\>. Now
test to see what happens when you run this:
|
| |
|
sed -e 's/\(\<[^ ]*\>\)\([ ]*\)\(\<[^ ]*\>\)/\3\2\1/g'
GNU Linux is cool
Linux GNU cool is
|
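
The same grouping idea can be run non-interactively. The sketch below is a simplified variant, not the command above: it drops the \< \> word markers (which not every sed supports), requires at least one space between the words, and omits the g option so that only the first pair of words on the line is swapped.

```shell
# \1 = first word, \2 = the whitespace between, \3 = second word.
swapped=$(echo "GNU Linux is cool" \
          | sed -e 's/\([^ ]*\)\([ ][ ]*\)\([^ ]*\)/\3\2\1/')

echo "$swapped"
```

Only the first two words change places; the rest of the line is left alone because there is no g option.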
To return to our ls example (note that this is just an example; to
count file sizes you should rather use the du command), suppose
we would like to sum the byte sizes of all the files in a directory:
|
| |
|
expr 0 `ls -l | grep '^-' | \
sed 's/^\([^ ]*[ ]*\)\\{4,4\\}\([0-9]*\).*$/ + \2/'`
|
We know that ls -l output lines start with - for ordinary
files. So we use grep to strip lines not starting with -.
If we do an ls -l, we see the output is divided into four columns of
stuff we are not interested in, and then a number indicating the size of the
file. A column (or field) can be described by the regular expression
[^ ]*[ ]*, i.e. a length of text with no whitespace,
followed by a length of whitespace. There are four of these, so we bracket it
with \( \), and then use the \{
\} notation to indicate that we want exactly 4. After
that comes our number [0-9]*, and then any trailing characters
which we are not interested in, .*$. Notice here that we have neglected
to use \< \> notation to indicate whole
words. This is because sed tries to match the maximum number of characters
legally allowed, which in the situation we have here has exactly the same effect.
|
| |
If you haven't yet figured it out, we are trying to get that column of byte
sizes into a format like,
|
| |
|
+ 438
+ 1525
+ 76
+ 92146
|
...
so that expr can understand it. Hence we replace each line with sub-expression
\2 and a leading + sign. Backquotes give the
output of this to expr, which sums them studiously, ignoring any newline
characters as though the summation were typed in on a single line. There is
one minor problem here: the first line contains a + with nothing before
it, which will cause expr to complain. To get around this, we can just
add a 0 to the expression, so that it becomes 0 + ....
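
The whole calculation can be checked against a fixed listing instead of a live directory. In the sketch below, the listing text, the file names and the variable names are assumptions for illustration. It also uses $( ) rather than backquotes, so the braces need only a single backslash.

```shell
# A made-up ls -l style listing: three ordinary files and one directory.
listing='-rw-r--r-- 1 root root 438 Jan 1 12:00 a.txt
-rw-r--r-- 1 root root 1525 Jan 1 12:00 b.txt
drwxr-xr-x 2 root root 4096 Jan 1 12:00 subdir
-rw-r--r-- 1 root root 76 Jan 1 12:00 c.txt'

# Keep ordinary files, skip four columns, and turn each size into " + size".
terms=$(printf '%s\n' "$listing" | grep '^-' \
        | sed 's/^\([^ ]*[ ]*\)\{4,4\}\([0-9]*\).*$/ + \2/')

# $terms is deliberately unquoted so that expr sees: 0 + 438 + 1525 + 76
total=$(expr 0 $terms)

echo "$total"
```

The directory line is filtered out by grep, and the three file sizes sum to 2039.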
|
| |
|