Linux tools - wc and cut¶
Learning objectives
- learn about wc
- try some examples with wc
- learn about cut
- try some examples with cut
wc¶
The Linux wc command counts the lines, words, and characters or bytes in a file. Without options it prints the line, word, and byte counts, in that order from left to right.
Syntax¶
Some common options
- -l: list number of lines per file
- -m: list number of characters per file
- -w: list number of words per file
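As a minimal sketch of these options, the commands below build a small throwaway file and count it in three ways. The file name sample.txt and its content are made up for illustration and are not part of the exercise directory.

```shell
# Create a scratch directory and a small sample file (hypothetical, for illustration only)
demo=$(mktemp -d)
printf 'one two three\nfour five\nsix\n' > "$demo/sample.txt"

wc -l "$demo/sample.txt"   # number of lines
wc -w "$demo/sample.txt"   # number of words
wc -m "$demo/sample.txt"   # number of characters
```

For this sample the counts are 3 lines, 6 words, and 28 characters (newlines are counted as characters).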
Examples¶
To run the examples, go to the “exercises” -> “piping-wc-cut” directory where there are files that are suitable to run these examples on.
Hint
Type along!
wc on a file
Output:
bbrydsoe@enterprise:~/exercises/piping-wc-cut$ wc myfile1.txt
4 15 80 myfile1.txt
wc counted the number of lines, words, and characters in the file “myfile1.txt”. It says there are 4 lines, 15 words, and 80 characters.
wc on several files
Let us run wc on all files with the suffix .txt:
Output:
bbrydsoe@enterprise:~/exercises/piping-wc-cut$ wc *.txt
1 9 45 fil2.txt
1 9 43 fil3.txt
2 10 48 fil4.txt
4 22 128 file.txt
1 6 34 fil.txt
0 0 0 myfile0.txt
4 15 80 myfile1.txt
2 10 48 myfile2.txt
7 38 203 myfile3.txt
0 0 0 myfiles.txt
4 22 128 newfile.txt
12 12 33 numbers.txt
0 0 0 thisfile0.txt
0 0 0 thisfile1.txt
0 0 0 thisfile2.txt
0 0 0 thisfile3.txt
0 0 0 thisfile4.txt
0 0 0 thisfile5.txt
0 0 0 thisfile6.txt
0 0 0 thisfile7.txt
0 0 0 thisfile8.txt
0 0 0 thisfile9.txt
0 0 0 thisfile.txt
38 153 790 total
wc lists the number of lines, words, and characters for each file with the extension .txt, and sums them up in the final “total” line.
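wc -l to get only the number of lines in a file
Output (for example, for the file “myfile1.txt”):
bbrydsoe@enterprise:~/exercises/piping-wc-cut$ wc -l myfile1.txt
4 myfile1.txt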
wc combined with a pipe and sort -n to list the files with suffix .txt in ascending order of line count
Output:
bbrydsoe@enterprise:~/exercises/piping-wc-cut$ wc *.txt | sort -n
0 0 0 myfile0.txt
0 0 0 myfiles.txt
0 0 0 thisfile0.txt
0 0 0 thisfile1.txt
0 0 0 thisfile2.txt
0 0 0 thisfile3.txt
0 0 0 thisfile4.txt
0 0 0 thisfile5.txt
0 0 0 thisfile6.txt
0 0 0 thisfile7.txt
0 0 0 thisfile8.txt
0 0 0 thisfile9.txt
0 0 0 thisfile.txt
1 6 34 fil.txt
1 9 43 fil3.txt
1 9 45 fil2.txt
2 10 48 fil4.txt
2 10 48 myfile2.txt
4 15 80 myfile1.txt
4 22 128 file.txt
4 22 128 newfile.txt
7 38 203 myfile3.txt
12 12 33 numbers.txt
38 153 790 total
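The same pattern can be tried on a few made-up files. This is only a sketch: the file names a.txt, b.txt, and c.txt are hypothetical, and sort -nr is added to show the reverse (descending) order.

```shell
# Scratch directory with three small files (hypothetical names and content)
demo=$(mktemp -d)
cd "$demo"
printf 'x\ny\nz\n' > a.txt   # 3 lines
printf 'x\n'       > b.txt   # 1 line
printf 'x\ny\n'    > c.txt   # 2 lines

wc -l *.txt | sort -n    # ascending: fewest lines first
wc -l *.txt | sort -nr   # descending: most lines first
```

Note that the “total” line is sorted together with the file entries, so it ends up last with sort -n and first with sort -nr.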
wc with no input
If you run wc without giving any files as input, it reads from standard input and waits for you to type something. To get out of this, press CTRL-C (press the CTRL key and hold it down, then press the C key). Pressing CTRL-D instead ends the input, and wc then prints the counts for what you typed.
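Because wc reads standard input when no file is named, it also works at the end of a pipe. A short sketch, where the text passed to echo is just an example:

```shell
# Count the words in a line of text sent through a pipe
echo "hello world, this is a test" | wc -w

# Count the entries in the current directory (ls prints one name per line into a pipe)
ls | wc -l
```

The first command reports 6 words; the result of the second depends on the directory you are in.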
wc - capturing output
Assume you have a large number of files that you want to run wc on. Printing all the output to the screen is then impractical. It is much better to send the output to a file, which you can do by redirecting it with >.
The first command below writes the number of lines in each file to the file “filelength.txt”. The second command shows the content of that file:
bbrydsoe@enterprise:~/exercises/piping-wc-cut$ wc -l *.txt > filelength.txt
bbrydsoe@enterprise:~/exercises/piping-wc-cut$ cat filelength.txt
1 fil2.txt
1 fil3.txt
2 fil4.txt
4 file.txt
1 fil.txt
0 myfile0.txt
4 myfile1.txt
2 myfile2.txt
7 myfile3.txt
0 myfiles.txt
4 newfile.txt
12 numbers.txt
0 thisfile0.txt
0 thisfile1.txt
0 thisfile2.txt
0 thisfile3.txt
0 thisfile4.txt
0 thisfile5.txt
0 thisfile6.txt
0 thisfile7.txt
0 thisfile8.txt
0 thisfile9.txt
0 thisfile.txt
38 total
If you have a lot of files, and so a lot of entries in “filelength.txt”, it is better to use something like “less” to page through the file instead of getting it all printed to the screen.
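A minimal sketch of capturing and inspecting the output, using made-up files (one.txt, two.txt) and a hypothetical output name filelength.out. The output name deliberately does not end in .txt: a file that matches *.txt would itself be picked up by the pattern the next time the command is run.

```shell
# Scratch directory with two small files (hypothetical names and content)
demo=$(mktemp -d)
cd "$demo"
printf 'a\nb\n' > one.txt   # 2 lines
printf 'a\n'    > two.txt   # 1 line

# Write the line counts to a file instead of the screen
wc -l *.txt > filelength.out

head -n 2 filelength.out   # look at the first two entries only
tail -n 1 filelength.out   # look at the last entry, the "total" line
```

head and tail are handy for a quick look at either end of a long result file, while less lets you scroll through all of it.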
Exercise¶
Exercise
The “exercises” -> “piping-wc-cut” directory is where there are files that are suitable to run these examples on.
- Use the correct option to wc to count the number of words in “file.txt”
- Use the correct option to wc to count the number of characters in “numbers.txt”
- How many lines are there in total in all the files in the directory “piping-wc-cut”?
cut¶
cut is a command which is used to extract sections from each line of input.
Syntax¶
Extraction of line segments is typically done with the options/flags for
- bytes (-b)
- characters (-c)
- fields (-f), separated by a delimiter (-d; the tab character by default).
A range must be provided in each case which consists of one of N, N-M, N- (N to the end of the line), or -M (beginning of the line to M), where N and M are counted from 1 (there is no zeroth value).
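The four range forms can be tried on a single line of text. This is a small sketch using -c on a made-up string:

```shell
line="abcdefghij"

echo "$line" | cut -c 3     # N   : only character 3         -> c
echo "$line" | cut -c 3-5   # N-M : characters 3 to 5        -> cde
echo "$line" | cut -c 3-    # N-  : character 3 to the end   -> cdefghij
echo "$line" | cut -c -5    # -M  : characters 1 to 5        -> abcde
```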
Other options:
- -n in combination with -b suppresses splits of multi-byte characters.
- -s in combination with -f skips lines which contain no field delimiter. Without -s, such lines are printed unchanged.
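The effect of -s is easiest to see on a file where one line lacks the delimiter. A minimal sketch, where the file pairs.txt and its content are made up for illustration:

```shell
# Sample file: the second line contains no ":" (hypothetical content)
demo=$(mktemp -d)
printf 'name:value\nno delimiter here\nkey:data\n' > "$demo/pairs.txt"

# Without -s the line without ":" is printed in full:
# value, no delimiter here, data
cut -d ":" -f 2 "$demo/pairs.txt"

# With -s the line without ":" is skipped:
# value, data
cut -s -d ":" -f 2 "$demo/pairs.txt"
```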
Examples¶
We are again going to use the directory “exercises” -> “piping-wc-cut” as a source of files that are suitable to run these examples on.
cut with the -b flag
The option -b N returns only byte number N of each line. To get the first N bytes of each line, give a range instead: -b -N (or -b 1-N).
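A minimal sketch of the -b option, using a made-up file (greeting.txt and its content are hypothetical) so that the difference between a single byte and a range is visible:

```shell
# Sample file with two short lines (hypothetical content)
demo=$(mktemp -d)
printf 'Hello world\nLinux tools\n' > "$demo/greeting.txt"

cut -b 5  "$demo/greeting.txt"   # byte 5 only:   o / x
cut -b -5 "$demo/greeting.txt"   # first 5 bytes: Hello / Linux
```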
cut with the -c flag
A list following -c specifies the range of characters that is returned for each line.
Note!
In GNU cut, which is the version found on most Linux systems, there is currently no difference between the -b and -c options.
However, multibyte support is in progress and may make these two options behave differently in the future!
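For plain ASCII text, where every character is one byte, the two options can be compared directly. A small sketch on a made-up file (plain.txt is hypothetical):

```shell
# Sample file containing only ASCII characters (hypothetical content)
demo=$(mktemp -d)
printf 'first line\nsecond line\n' > "$demo/plain.txt"

cut -c 1-4 "$demo/plain.txt"   # characters 1 to 4: firs / seco
cut -b 1-4 "$demo/plain.txt"   # bytes 1 to 4

# Compare the two results
if [ "$(cut -c 1-4 "$demo/plain.txt")" = "$(cut -b 1-4 "$demo/plain.txt")" ]; then
    echo "same output"
fi
```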
cut with the -f and delimiter flags
Content of the file thisfile8.txt:
bbrydsoe@enterprise:~/exercises/piping-wc-cut$ cat thisfile8.txt
Hello:helloe:hello:hi there!
What is this! Is this a list: yes, this, is, a, list
Weird list? Normal list: 1, 2, 3, 4, 5, 6, 7, 8
Why not? I need a tab
I will write a longer sentence: there is a delimiter colon in this line
One more line that has a tab and one more and another hahahaha
Delimiter “ ” (space) and fields 2-4:
bbrydsoe@enterprise:~/exercises/piping-wc-cut$ cut -f 2-4 -d " " thisfile8.txt
there!
is this! Is
list? Normal list:
not? I need a
will write a
more line that
Delimiter “:” (colon) and fields 2-4:
bbrydsoe@enterprise:~/exercises/piping-wc-cut$ cut -f 2-4 -d ":" thisfile8.txt
helloe:hello:hi there!
yes, this, is, a, list
1, 2, 3, 4, 5, 6, 7, 8
Why not? I need a tab
there is a delimiter colon in this line
One more line that has a tab and one more and another hahahaha
Delimiter “:” (colon) and fields 3- (from 3 to the end):
bbrydsoe@enterprise:~/exercises/piping-wc-cut$ cut -f 3- -d ":" thisfile8.txt
hello:hi there!
Why not? I need a tab
One more line that has a tab and one more and another hahahaha
Default delimiter (tab) and fields 3-4 (no -d option is needed; lines that contain no tab are printed unchanged):
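A minimal sketch of the default tab delimiter, using a made-up file (tabs.txt and its content are hypothetical) in which the last line has no tab at all:

```shell
# Sample file: two tab-separated lines and one line without tabs (hypothetical content)
demo=$(mktemp -d)
printf 'a\tb\tc\td\n1\t2\t3\t4\nno tabs on this line\n' > "$demo/tabs.txt"

# No -d given, so fields are split on tabs
cut -f 3-4 "$demo/tabs.txt"
```

The first two lines are reduced to their third and fourth fields (still separated by a tab), while the last line is printed in full because it contains no delimiter.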
Info
- The -c option is useful for fixed-length lines.
- Most unix files don’t have fixed-length lines. To extract the useful information you need to cut by fields rather than by character columns.
- A list of field numbers must be separated by commas (for example -f 1,4). Ranges such as -f 2-4 are also accepted, and the two forms can be combined.
- cut uses tab as the default field delimiter but can also work with another delimiter by using the -d option.
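Field lists and ranges can be tried on a single colon-separated line. The line below is made up for illustration:

```shell
# A colon-separated record (hypothetical content)
line="alice:x:1001:1001:Alice Example:/home/alice"

echo "$line" | cut -d ":" -f 1,6     # fields 1 and 6         -> alice:/home/alice
echo "$line" | cut -d ":" -f 1,3-4   # field 1 and fields 3-4 -> alice:1001:1001
```

The selected fields are printed joined by the same delimiter that was used to split the line.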
Example: columns of data
Here we work with the file “data.dat” which is in the same directory as the other files.
Cutting columns 2 and 3:
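A minimal sketch of cutting columns, using a made-up stand-in for the data file. The name sample.dat, its values, and the assumption that the columns are separated by tabs are all hypothetical:

```shell
# Tab-separated sample data with a header line (hypothetical content)
demo=$(mktemp -d)
printf 'id\tx\ty\tz\n1\t0.5\t1.5\t2.5\n2\t0.7\t1.7\t2.7\n' > "$demo/sample.dat"

# Columns 2 and 3; tab is the default delimiter, so -d is not needed
cut -f 2,3 "$demo/sample.dat"
```

If the columns in a file are separated by a single space instead, add -d " ". Be aware that cut treats every delimiter character as a separator, so columns padded with several spaces produce empty fields.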
Exercises¶
Exercise
The “exercises” -> “piping-wc-cut” directory is where there are files that are suitable to run these examples on.
- Use cut and suitable option(s) to print columns 1 and 4 of the file “data.dat”
- Create a file in which you use “:” as delimiter. Use cut with different combinations of fields and this delimiter
- Check that you get the same output with the options -c and -b for the files you try
- Output the first 4 characters of each line for a file you pick
Summary¶
Keypoints
- we have learned about wc
- and tried the options -l (lines), -m (characters), and -w (words)
- we have learned about cut
- and tried the options for selecting by bytes, characters, and fields, and used delimiters