Revision 1.6, Wed Jan 15 04:53:50 2020 UTC, by meyering

GNU 'sed'
1 Introduction
2 Running sed
  2.1 Overview
  2.2 Command-Line Options
  2.3 Exit status
3 'sed' scripts
  3.1 'sed' script overview
  3.2 'sed' commands summary
  3.3 The 's' Command
  3.4 Often-Used Commands
  3.5 Less Frequently-Used Commands
  3.6 Commands for 'sed' gurus
  3.7 Commands Specific to GNU 'sed'
  3.8 Multiple commands syntax
    3.8.1 Commands Requiring a newline
4 Addresses: selecting lines
  4.1 Addresses overview
  4.2 Selecting lines by numbers
  4.3 selecting lines by text matching
  4.4 Range Addresses
5 Regular Expressions: selecting text
  5.1 Overview of regular expression in 'sed'
  5.2 Basic (BRE) and extended (ERE) regular expression
  5.3 Overview of basic regular expression syntax
  5.4 Overview of extended regular expression syntax
  5.5 Character Classes and Bracket Expressions
  5.6 regular expression extensions
  5.7 Back-references and Subexpressions
  5.8 Escape Sequences - specifying special characters
    5.8.1 Escaping Precedence
  5.9 Multibyte characters and Locale Considerations
    5.9.1 Invalid multibyte characters
    5.9.2 Upper/Lower case conversion
    5.9.3 Multibyte regexp character classes
6 Advanced 'sed': cycles and buffers
  6.1 How 'sed' Works
  6.2 Hold and Pattern Buffers
  6.3 Multiline techniques - using D,G,H,N,P to process multiple lines
  6.4 Branching and Flow Control
    6.4.1 Branching and Cycles
    6.4.2 Branching example: joining lines
7 Some Sample Scripts
  7.1 Joining lines
  7.2 Centering Lines
  7.3 Increment a Number
  7.4 Rename Files to Lower Case
  7.5 Print 'bash' Environment
  7.6 Reverse Characters of Lines
  7.7 Text search across multiple lines
  7.8 Line length adjustment
  7.9 Reverse Lines of Files
  7.10 Numbering Lines
  7.11 Numbering Non-blank Lines
  7.12 Counting Characters
  7.13 Counting Words
  7.14 Counting Lines
  7.15 Printing the First Lines
  7.16 Printing the Last Lines
  7.17 Make Duplicate Lines Unique
  7.18 Print Duplicated Lines of Input
  7.19 Remove All Duplicated Lines
  7.20 Squeezing Blank Lines
8 GNU 'sed''s Limitations and Non-limitations
9 Other Resources for Learning About 'sed'
10 Reporting Bugs
Appendix A GNU Free Documentation License
Concept Index
Command and Option Index
70 GNU 'sed'
71 *********
72
73 This file documents version 4.8 of GNU 'sed', a stream editor.
74
75 Copyright (C) 1998-2020 Free Software Foundation, Inc.
76
77 Permission is granted to copy, distribute and/or modify this
78 document under the terms of the GNU Free Documentation License,
79 Version 1.3 or any later version published by the Free Software
80 Foundation; with no Invariant Sections, no Front-Cover Texts, and
81 no Back-Cover Texts. A copy of the license is included in the
82 section entitled "GNU Free Documentation License".
83
84 1 Introduction
85 **************
86
87 'sed' is a stream editor. A stream editor is used to perform basic text
88 transformations on an input stream (a file or input from a pipeline).
89 While in some ways similar to an editor which permits scripted edits
90 (such as 'ed'), 'sed' works by making only one pass over the input(s),
91 and is consequently more efficient. But it is 'sed''s ability to filter
92 text in a pipeline which particularly distinguishes it from other types
93 of editors.
94
95 2 Running sed
96 *************
97
98 This chapter covers how to run 'sed'. Details of 'sed' scripts and
99 individual 'sed' commands are discussed in the next chapter.
100
101 2.1 Overview
102 ============
103
104 Normally 'sed' is invoked like this:
105
106 sed SCRIPT INPUTFILE...
107
For example, to replace the first occurrence of 'hello' on each line
with 'world' in the file 'input.txt':
110
111 sed 's/hello/world/' input.txt > output.txt
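
Note that without the 'g' flag, 's' replaces only the first match on
each line. The following is a minimal sketch of the difference; the
sample text is invented for illustration:

```shell
# Hypothetical sample input containing two matches on one line.
first=$(echo 'hello hello' | sed 's/hello/world/')
all=$(echo 'hello hello' | sed 's/hello/world/g')
echo "$first"   # world hello
echo "$all"     # world world
```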
112
113 If you do not specify INPUTFILE, or if INPUTFILE is '-', 'sed'
114 filters the contents of the standard input. The following commands are
115 equivalent:
116
117 sed 's/hello/world/' input.txt > output.txt
118 sed 's/hello/world/' < input.txt > output.txt
119 cat input.txt | sed 's/hello/world/' - > output.txt
120
121 'sed' writes output to standard output. Use '-i' to edit files
122 in-place instead of printing to standard output. See also the 'W' and
123 's///w' commands for writing output to other files. The following
124 command modifies 'file.txt' and does not produce any output:
125
126 sed -i 's/hello/world/' file.txt
127
128 By default 'sed' prints all processed input (except input that has
129 been modified/deleted by commands such as 'd'). Use '-n' to suppress
130 output, and the 'p' command to print specific lines. The following
131 command prints only line 45 of the input file:
132
133 sed -n '45p' file.txt
134
135 'sed' treats multiple input files as one long stream. The following
136 example prints the first line of the first file ('one.txt') and the last
137 line of the last file ('three.txt'). Use '-s' to reverse this behavior.
138
139 sed -n '1p ; $p' one.txt two.txt three.txt
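
The effect of '-s' can be seen in a small sketch. The three temporary
files below are stand-ins created only for this illustration:

```shell
# Temporary files stand in for one.txt, two.txt and three.txt.
dir=$(mktemp -d)
printf '1a\n1b\n' > "$dir/one.txt"
printf '2a\n2b\n' > "$dir/two.txt"
printf '3a\n3b\n' > "$dir/three.txt"

# One long stream: first line of one.txt, last line of three.txt.
joined=$(sed -n '1p ; $p' "$dir/one.txt" "$dir/two.txt" "$dir/three.txt")

# With -s each file is a separate stream: first and last line of every file.
separate=$(sed -s -n '1p ; $p' "$dir/one.txt" "$dir/two.txt" "$dir/three.txt")

echo "$joined"
echo "$separate"
rm -r "$dir"
```

Without '-s' only '1a' and '3b' are printed; with '-s' all six lines
are printed, because each two-line file has its own first and last line.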
140
Without '-e' or '-f' options, 'sed' uses the first non-option
parameter as the SCRIPT, and the following non-option parameters as
input files. If '-e' or '-f' options are used to specify a SCRIPT, all
non-option parameters are taken as input files. Options '-e' and '-f'
can be combined, and can appear multiple times (in which case the final
effective SCRIPT will be the concatenation of all the individual SCRIPTs).
147
148 The following examples are equivalent:
149
150 sed 's/hello/world/' input.txt > output.txt
151
152 sed -e 's/hello/world/' input.txt > output.txt
153 sed --expression='s/hello/world/' input.txt > output.txt
154
155 echo 's/hello/world/' > myscript.sed
156 sed -f myscript.sed input.txt > output.txt
157 sed --file=myscript.sed input.txt > output.txt
158
159 2.2 Command-Line Options
160 ========================
161
162 The full format for invoking 'sed' is:
163
164 sed OPTIONS... [SCRIPT] [INPUTFILE...]
165
166 'sed' may be invoked with the following command-line options:
167
168 '--version'
169 Print out the version of 'sed' that is being run and a copyright
170 notice, then exit.
171
172 '--help'
173 Print a usage message briefly summarizing these command-line
174 options and the bug-reporting address, then exit.
175
176 '-n'
177 '--quiet'
178 '--silent'
179 By default, 'sed' prints out the pattern space at the end of each
180 cycle through the script (*note How 'sed' works: Execution Cycle.).
181 These options disable this automatic printing, and 'sed' only
182 produces output when explicitly told to via the 'p' command.
183
184 '--debug'
185 Print the input sed program in canonical form, and annotate program
186 execution.
187 $ echo 1 | sed '\%1%s21232'
188 3
189
190 $ echo 1 | sed --debug '\%1%s21232'
191 SED PROGRAM:
192 /1/ s/1/3/
193 INPUT: 'STDIN' line 1
194 PATTERN: 1
195 COMMAND: /1/ s/1/3/
196 PATTERN: 3
197 END-OF-CYCLE:
198 3
199
200 '-e SCRIPT'
201 '--expression=SCRIPT'
202 Add the commands in SCRIPT to the set of commands to be run while
203 processing the input.
204
205 '-f SCRIPT-FILE'
206 '--file=SCRIPT-FILE'
207 Add the commands contained in the file SCRIPT-FILE to the set of
208 commands to be run while processing the input.
209
210 '-i[SUFFIX]'
211 '--in-place[=SUFFIX]'
This option specifies that files are to be edited in-place. GNU
'sed' does this by creating a temporary file and sending output to
this file rather than to the standard output.(1)

This option implies '-s'.

When the end of the file is reached, the temporary file is renamed
to the output file's original name. The extension, if supplied, is
used to modify the name of the old file before renaming the
temporary file, thereby making a backup copy.(2)
222
223 This rule is followed: if the extension doesn't contain a '*', then
224 it is appended to the end of the current filename as a suffix; if
225 the extension does contain one or more '*' characters, then _each_
226 asterisk is replaced with the current filename. This allows you to
227 add a prefix to the backup file, instead of (or in addition to) a
228 suffix, or even to place backup copies of the original files into
229 another directory (provided the directory already exists).
230
231 If no extension is supplied, the original file is overwritten
232 without making a backup.
233
234 Because '-i' takes an optional argument, it should not be followed
235 by other short options:
236 'sed -Ei '...' FILE'
237 Same as '-E -i' with no backup suffix - 'FILE' will be edited
238 in-place without creating a backup.
239
240 'sed -iE '...' FILE'
241 This is equivalent to '--in-place=E', creating 'FILEE' as
242 backup of 'FILE'
243
244 Be cautious of using '-n' with '-i': the former disables automatic
245 printing of lines and the latter changes the file in-place without
246 a backup. Used carelessly (and without an explicit 'p' command),
247 the output file will be empty:
248 # WRONG USAGE: 'FILE' will be truncated.
249 sed -ni 's/foo/bar/' FILE
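
The backup rules and the '-n' caveat can be tried out safely on a
scratch file. This is only a sketch; the directory, the file name
'file.txt' and the suffixes '.bak' and 'orig_*' are arbitrary choices
made for the illustration:

```shell
dir=$(mktemp -d)
printf 'foo\nbaz\n' > "$dir/file.txt"

# Suffix without '*': the original content is kept as file.txt.bak.
sed -i.bak 's/foo/bar/' "$dir/file.txt"
edited=$(cat "$dir/file.txt")
backup=$(cat "$dir/file.txt.bak")

# Suffix containing '*': the '*' is replaced by the file name,
# so 'orig_*' produces the backup orig_file.txt.
(cd "$dir" && sed -i'orig_*' 's/baz/qux/' file.txt)
prefixed=$(cat "$dir/orig_file.txt")

# -n together with -i: the explicit 'p' keeps the substituted lines;
# without it the file would end up empty.
sed -n -i 's/bar/BAR/p' "$dir/file.txt"
kept=$(cat "$dir/file.txt")
rm -r "$dir"
```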
250
251 '-l N'
252 '--line-length=N'
253 Specify the default line-wrap length for the 'l' command. A length
254 of 0 (zero) means to never wrap long lines. If not specified, it
255 is taken to be 70.
256
257 '--posix'
258 GNU 'sed' includes several extensions to POSIX sed. In order to
259 simplify writing portable scripts, this option disables all the
260 extensions that this manual documents, including additional
261 commands. Most of the extensions accept 'sed' programs that are
262 outside the syntax mandated by POSIX, but some of them (such as the
263 behavior of the 'N' command described in *note Reporting Bugs::)
264 actually violate the standard. If you want to disable only the
265 latter kind of extension, you can set the 'POSIXLY_CORRECT'
266 variable to a non-empty value.
267
268 '-b'
269 '--binary'
270 This option is available on every platform, but is only effective
271 where the operating system makes a distinction between text files
272 and binary files. When such a distinction is made--as is the case
273 for MS-DOS, Windows, Cygwin--text files are composed of lines
274 separated by a carriage return _and_ a line feed character, and
275 'sed' does not see the ending CR. When this option is specified,
276 'sed' will open input files in binary mode, thus not requesting
277 this special processing and considering lines to end at a line
278 feed.
279
280 '--follow-symlinks'
281 This option is available only on platforms that support symbolic
282 links and has an effect only if option '-i' is specified. In this
283 case, if the file that is specified on the command line is a
284 symbolic link, 'sed' will follow the link and edit the ultimate
285 destination of the link. The default behavior is to break the
286 symbolic link, so that the link destination will not be modified.
287
288 '-E'
289 '-r'
290 '--regexp-extended'
291 Use extended regular expressions rather than basic regular
292 expressions. Extended regexps are those that 'egrep' accepts; they
293 can be clearer because they usually have fewer backslashes.
294 Historically this was a GNU extension, but the '-E' extension has
295 since been added to the POSIX standard
296 (http://austingroupbugs.net/view.php?id=528), so use '-E' for
297 portability. GNU sed has accepted '-E' as an undocumented option
298 for years, and *BSD seds have accepted '-E' for years as well, but
299 scripts that use '-E' might not port to other older systems. *Note
300 Extended regular expressions: ERE syntax.
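
As an illustration of the "fewer backslashes" point, the two commands
in this sketch perform the same substitution; the input string is
made up for the example:

```shell
# Basic syntax: groups are \( \) and the GNU extension \+ means "one or more".
bre=$(echo 'aaa bbb' | sed 's/\(a\+\) \(b\+\)/\2 \1/')

# Extended syntax (-E): ( ) and + need no backslashes.
ere=$(echo 'aaa bbb' | sed -E 's/(a+) (b+)/\2 \1/')

echo "$bre"   # bbb aaa
echo "$ere"   # bbb aaa
```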
301
302 '-s'
303 '--separate'
304 By default, 'sed' will consider the files specified on the command
305 line as a single continuous long stream. This GNU 'sed' extension
306 allows the user to consider them as separate files: range addresses
307 (such as '/abc/,/def/') are not allowed to span several files, line
308 numbers are relative to the start of each file, '$' refers to the
309 last line of each file, and files invoked from the 'R' commands are
310 rewound at the start of each file.
311
312 '--sandbox'
313 In sandbox mode, 'e/w/r' commands are rejected - programs
314 containing them will be aborted without being run. Sandbox mode
315 ensures 'sed' operates only on the input files designated on the
316 command line, and cannot run external programs.
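
A short sketch of the rejection follows. '/dev/null' is used only as
a harmless target for the 'w' command:

```shell
# In sandbox mode the program is rejected before any input is processed.
if echo x | sed --sandbox 'w /dev/null' 2>/dev/null; then
    sandbox_result=accepted
else
    sandbox_result=rejected
fi

# The same program runs normally without --sandbox.
plain=$(echo x | sed 'w /dev/null')

echo "$sandbox_result"   # rejected
echo "$plain"            # x
```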
317
318 '-u'
319 '--unbuffered'
320 Buffer both input and output as minimally as practical. (This is
321 particularly useful if the input is coming from the likes of 'tail
322 -f', and you wish to see the transformed output as soon as
323 possible.)
324
325 '-z'
326 '--null-data'
327 '--zero-terminated'
328 Treat the input as a set of lines, each terminated by a zero byte
329 (the ASCII 'NUL' character) instead of a newline. This option can
330 be used with commands like 'sort -z' and 'find -print0' to process
331 arbitrary file names.
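
A minimal sketch of NUL-terminated processing; the two records are
invented, and 'tr' is used only to make the result readable:

```shell
# Each record ends in a zero byte; sed -z edits every record separately
# and writes the records back out NUL-terminated.
out=$(printf 'a b\0c d\0' | sed -z 's/ /_/' | tr '\0' '\n')
echo "$out"
```

Each record has its first space replaced, giving 'a_b' and 'c_d'.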
332
333 If no '-e', '-f', '--expression', or '--file' options are given on
334 the command-line, then the first non-option argument on the command line
335 is taken to be the SCRIPT to be executed.
336
337 If any command-line parameters remain after processing the above,
338 these parameters are interpreted as the names of input files to be
339 processed. A file name of '-' refers to the standard input stream. The
340 standard input will be processed if no file names are specified.
341
342 ---------- Footnotes ----------
343
(1) This applies to commands such as '=', 'a', 'c', 'i', 'l', 'p'.
You can still write to the standard output by using the 'w' or 'W'
commands together with the '/dev/stdout' special file.
347
348 (2) Note that GNU 'sed' creates the backup file whether or not any
349 output is actually changed.
350
351 2.3 Exit status
352 ===============
353
354 An exit status of zero indicates success, and a nonzero value indicates
355 failure. GNU 'sed' returns the following exit status error values:
356
357 0
358 Successful completion.
359
360 1
361 Invalid command, invalid syntax, invalid regular expression or a
362 GNU 'sed' extension command used with '--posix'.
363
364 2
One or more of the input files specified on the command line could
not be opened (e.g. if a file is not found, or read permission is
denied). Processing continued with other files.
368
369 4
An I/O error or a serious processing error occurred at runtime; GNU
'sed' aborted immediately.
372
373 Additionally, the commands 'q' and 'Q' can be used to terminate 'sed'
374 with a custom exit code value (this is a GNU 'sed' extension):
375
376 $ echo | sed 'Q42' ; echo $?
377 42
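
Exit statuses 1 and 2 can be observed with the following sketch. The
unknown command 'k' and the file names are chosen only for the
illustration:

```shell
dir=$(mktemp -d)
echo data > "$dir/present.txt"

# 'k' is not a sed command, so the script is invalid.
bad_script=0
echo x | sed 'k' 2>/dev/null || bad_script=$?

# One input file is missing; the readable file is still processed.
bad_input=0
out=$(sed -n 'p' "$dir/missing.txt" "$dir/present.txt" 2>/dev/null) || bad_input=$?

rm -r "$dir"
echo "$bad_script $bad_input $out"   # 1 2 data
```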
378
379 3 'sed' scripts
380 ***************
381
382 3.1 'sed' script overview
383 =========================
384
A 'sed' program consists of one or more 'sed' commands, passed in by one
or more of the '-e', '-f', '--expression', and '--file' options, or the
first non-option argument if none of these options is used. This
document will refer to "the" 'sed' script; this is understood to mean
the in-order concatenation of all of the SCRIPTs and SCRIPT-FILEs passed
in. *Note Overview::.
391
392 'sed' commands follow this syntax:
393
394 [addr]X[options]
395
396 X is a single-letter 'sed' command. '[addr]' is an optional line
397 address. If '[addr]' is specified, the command X will be executed only
398 on the matched lines. '[addr]' can be a single line number, a regular
399 expression, or a range of lines (*note sed addresses::). Additional
400 '[options]' are used for some 'sed' commands.
401
402 The following example deletes lines 30 to 35 in the input. '30,35'
403 is an address range. 'd' is the delete command:
404
405 sed '30,35d' input.txt > output.txt
406
407 The following example prints all input until a line starting with the
408 word 'foo' is found. If such line is found, 'sed' will terminate with
409 exit status 42. If such line was not found (and no other error
410 occurred), 'sed' will exit with status 0. '/^foo/' is a
411 regular-expression address. 'q' is the quit command. '42' is the
412 command option.
413
414 sed '/^foo/q42' input.txt > output.txt
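
The behavior can be checked with a small sketch. The input lines are
invented; note that the matching line itself is printed before 'sed'
quits:

```shell
# A line starting with 'foo' is present: output stops after it, status is 42.
status=0
out=$(printf 'one\nfoo two\nthree\n' | sed '/^foo/q42') || status=$?

# No such line: all input is processed and the status is 0.
nomatch=0
printf 'one\ntwo\n' | sed '/^foo/q42' >/dev/null || nomatch=$?

echo "$out"
echo "$status $nomatch"   # 42 0
```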
415
416 Commands within a SCRIPT or SCRIPT-FILE can be separated by
417 semicolons (';') or newlines (ASCII 10). Multiple scripts can be
418 specified with '-e' or '-f' options.
419
The following examples are all equivalent. They perform two 'sed'
operations: deleting any lines matching the regular expression '/^foo/',
and replacing the first occurrence of the string 'hello' on each line
with 'world':
423
424 sed '/^foo/d ; s/hello/world/' input.txt > output.txt
425
426 sed -e '/^foo/d' -e 's/hello/world/' input.txt > output.txt
427
428 echo '/^foo/d' > script.sed
429 echo 's/hello/world/' >> script.sed
430 sed -f script.sed input.txt > output.txt
431
432 echo 's/hello/world/' > script2.sed
433 sed -e '/^foo/d' -f script2.sed input.txt > output.txt
434
435 Commands 'a', 'c', 'i', due to their syntax, cannot be followed by
436 semicolons working as command separators and thus should be terminated
437 with newlines or be placed at the end of a SCRIPT or SCRIPT-FILE.
438 Commands can also be preceded with optional non-significant whitespace
439 characters. *Note Multiple commands syntax::.
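
The following sketch shows what happens when a semicolon is placed
after the one-line (GNU) form of 'a', and one way to avoid it:

```shell
# The text of 'a' runs to the end of the line, so '; 2d' becomes
# part of the appended text instead of a second command.
wrong=$(seq 2 | sed '1a hello ; 2d')

# Separate -e options end the 'a' command properly.
right=$(seq 2 | sed -e '1a hello' -e '2d')

echo "$wrong"
echo "$right"
```

The first command prints '1', 'hello ; 2d' and '2'; the second prints
'1' and 'hello'.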
440
441 3.2 'sed' commands summary
442 ==========================
443
The following commands are supported in GNU 'sed'. Some are standard
POSIX commands, while others are GNU extensions. Details and examples
for each command are in the following sections. (Mnemonics) are shown
in parentheses.
448
449 'a\'
450 'TEXT'
451 Append TEXT after a line.
452
453 'a TEXT'
454 Append TEXT after a line (alternative syntax).
455
456 'b LABEL'
457 Branch unconditionally to LABEL. The LABEL may be omitted, in
458 which case the next cycle is started.
459
460 'c\'
461 'TEXT'
462 Replace (change) lines with TEXT.
463
464 'c TEXT'
465 Replace (change) lines with TEXT (alternative syntax).
466
467 'd'
468 Delete the pattern space; immediately start next cycle.
469
470 'D'
471 If pattern space contains newlines, delete text in the pattern
472 space up to the first newline, and restart cycle with the resultant
473 pattern space, without reading a new line of input.
474
475 If pattern space contains no newline, start a normal new cycle as
476 if the 'd' command was issued.
477
478 'e'
479 Executes the command that is found in pattern space and replaces
480 the pattern space with the output; a trailing newline is
481 suppressed.
482
483 'e COMMAND'
484 Executes COMMAND and sends its output to the output stream. The
485 command can run across multiple lines, all but the last ending with
486 a back-slash.
487
488 'F'
489 (filename) Print the file name of the current input file (with a
490 trailing newline).
491
492 'g'
493 Replace the contents of the pattern space with the contents of the
494 hold space.
495
496 'G'
497 Append a newline to the contents of the pattern space, and then
498 append the contents of the hold space to that of the pattern space.
499
500 'h'
501 (hold) Replace the contents of the hold space with the contents of
502 the pattern space.
503
504 'H'
505 Append a newline to the contents of the hold space, and then append
506 the contents of the pattern space to that of the hold space.
507
508 'i\'
509 'TEXT'
Insert TEXT before a line.
511
512 'i TEXT'
Insert TEXT before a line (alternative syntax).
514
515 'l'
516 Print the pattern space in an unambiguous form.
517
518 'n'
519 (next) If auto-print is not disabled, print the pattern space,
520 then, regardless, replace the pattern space with the next line of
521 input. If there is no more input then 'sed' exits without
522 processing any more commands.
523
524 'N'
525 Add a newline to the pattern space, then append the next line of
526 input to the pattern space. If there is no more input then 'sed'
527 exits without processing any more commands.
528
529 'p'
530 Print the pattern space.
531
532 'P'
533 Print the pattern space, up to the first <newline>.
534
535 'q[EXIT-CODE]'
536 (quit) Exit 'sed' without processing any more commands or input.
537
538 'Q[EXIT-CODE]'
539 (quit) This command is the same as 'q', but will not print the
540 contents of pattern space. Like 'q', it provides the ability to
541 return an exit code to the caller.
542
543 'r filename'
544 Reads file FILENAME.
545
546 'R filename'
547 Queue a line of FILENAME to be read and inserted into the output
548 stream at the end of the current cycle, or when the next input line
549 is read.
550
551 's/REGEXP/REPLACEMENT/[FLAGS]'
552 (substitute) Match the regular-expression against the content of
553 the pattern space. If found, replace matched string with
554 REPLACEMENT.
555
556 't LABEL'
557 (test) Branch to LABEL only if there has been a successful
558 's'ubstitution since the last input line was read or conditional
559 branch was taken. The LABEL may be omitted, in which case the next
560 cycle is started.
561
562 'T LABEL'
563 (test) Branch to LABEL only if there have been no successful
564 's'ubstitutions since the last input line was read or conditional
565 branch was taken. The LABEL may be omitted, in which case the next
566 cycle is started.
567
568 'v [VERSION]'
569 (version) This command does nothing, but makes 'sed' fail if GNU
570 'sed' extensions are not supported, or if the requested version is
571 not available.
572
573 'w filename'
574 Write the pattern space to FILENAME.
575
576 'W filename'
Write to the given filename the portion of the pattern space up to
the first newline.
579
580 'x'
581 Exchange the contents of the hold and pattern spaces.
582
'y/SOURCE-CHARS/DEST-CHARS/'
Transliterate any characters in the pattern space which match any
of the SOURCE-CHARS with the corresponding character in DEST-CHARS.
586
587 'z'
588 (zap) This command empties the content of pattern space.
589
590 '#'
591 A comment, until the next newline.
592
593 '{ CMD ; CMD ... }'
594 Group several commands together.
595
596 '='
597 Print the current input line number (with a trailing newline).
598
599 ': LABEL'
600 Specify the location of LABEL for branch commands ('b', 't', 'T').
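
As a brief illustration of how the hold-space commands in this
summary combine, the following sketch reverses the order of three
input lines using 'G', 'h' and 'p':

```shell
# 1!G : on every line except the first, append the hold space
#       to the pattern space
# h   : copy the pattern space to the hold space
# $p  : on the last line, print the accumulated text
reversed=$(seq 3 | sed -n '1!G;h;$p')
echo "$reversed"
```

The lines come out as '3', '2', '1'.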
601
602 3.3 The 's' Command
603 ===================
604
605 The 's' command (as in substitute) is probably the most important in
606 'sed' and has a lot of different options. The syntax of the 's' command
607 is 's/REGEXP/REPLACEMENT/FLAGS'.
608
609 Its basic concept is simple: the 's' command attempts to match the
610 pattern space against the supplied regular expression REGEXP; if the
611 match is successful, then that portion of the pattern space which was
612 matched is replaced with REPLACEMENT.
613
614 For details about REGEXP syntax *note Regular Expression Addresses:
615 Regexp Addresses.
616
617 The REPLACEMENT can contain '\N' (N being a number from 1 to 9,
618 inclusive) references, which refer to the portion of the match which is
619 contained between the Nth '\(' and its matching '\)'. Also, the
620 REPLACEMENT can contain unescaped '&' characters which reference the
621 whole matched portion of the pattern space.
622
623 The '/' characters may be uniformly replaced by any other single
624 character within any given 's' command. The '/' character (or whatever
625 other character is used in its stead) can appear in the REGEXP or
626 REPLACEMENT only if it is preceded by a '\' character.
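
The '\N' references, '&', and alternative delimiters can be seen in
this short sketch; all input strings are made up for the example:

```shell
# \1 and \2 refer to the first and second \( ... \) group.
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')

# & stands for the whole matched text.
bracketed=$(echo 'hello' | sed 's/l\+/[&]/')

# With '|' as the delimiter, '/' needs no escaping ...
piped=$(echo '/usr/local/bin' | sed 's|/usr/local|/opt|')

# ... whereas with '/' as the delimiter each '/' must be written as '\/'.
escaped=$(echo '/usr/local/bin' | sed 's/\/usr\/local/\/opt/')

echo "$swapped"     # world hello
echo "$bracketed"   # he[ll]o
echo "$piped"       # /opt/bin
echo "$escaped"     # /opt/bin
```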
627
628 Finally, as a GNU 'sed' extension, you can include a special sequence
629 made of a backslash and one of the letters 'L', 'l', 'U', 'u', or 'E'.
630 The meaning is as follows:
631
632 '\L'
633 Turn the replacement to lowercase until a '\U' or '\E' is found,
634
635 '\l'
636 Turn the next character to lowercase,
637
638 '\U'
639 Turn the replacement to uppercase until a '\L' or '\E' is found,
640
641 '\u'
642 Turn the next character to uppercase,
643
644 '\E'
645 Stop case conversion started by '\L' or '\U'.
646
647 When the 'g' flag is being used, case conversion does not propagate
648 from one occurrence of the regular expression to another. For example,
649 when the following command is executed with 'a-b-' in pattern space:
650 s/\(b\?\)-/x\u\1/g
651
652 the output is 'axxB'. When replacing the first '-', the '\u' sequence
653 only affects the empty replacement of '\1'. It does not affect the 'x'
654 character that is added to pattern space when replacing 'b-' with 'xB'.
655
656 On the other hand, '\l' and '\u' do affect the remainder of the
657 replacement text if they are followed by an empty substitution. With
658 'a-b-' in pattern space, the following command:
659 s/\(b\?\)-/\u\1x/g
660
661 will replace '-' with 'X' (uppercase) and 'b-' with 'Bx'. If this
662 behavior is undesirable, you can prevent it by adding a '\E'
663 sequence--after '\1' in this case.
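
The case-conversion sequences can be tried with the following sketch.
The first two inputs are invented; the last two commands are the
'a-b-' examples discussed above:

```shell
# \U ... \E uppercases a span; \u uppercases only the next character.
title=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\U\1\E \u\2/')

# \L lowercases the rest of the replacement.
lower=$(echo 'HELLO' | sed 's/.*/\L&/')

# \u acts on the (possibly empty) \1 and does not reach back to 'x'.
first=$(echo 'a-b-' | sed 's/\(b\?\)-/x\u\1/g')

# When \1 is empty, \u carries over to the following 'x'.
second=$(echo 'a-b-' | sed 's/\(b\?\)-/\u\1x/g')

echo "$title"    # HELLO World
echo "$lower"    # hello
echo "$first"    # axxB
echo "$second"   # aXBx
```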
664
665 To include a literal '\', '&', or newline in the final replacement,
666 be sure to precede the desired '\', '&', or newline in the REPLACEMENT
667 with a '\'.
668
669 The 's' command can be followed by zero or more of the following
670 FLAGS:
671
672 'g'
673 Apply the replacement to _all_ matches to the REGEXP, not just the
674 first.
675
676 'NUMBER'
677 Only replace the NUMBERth match of the REGEXP.
678
Note: the POSIX standard does not specify what should happen when
you mix the 'g' and NUMBER modifiers, and currently there is no
widely agreed upon meaning across 'sed' implementations. For GNU
'sed', the interaction is defined to be: ignore matches before the
NUMBERth, and then match and replace all matches from the NUMBERth on.
685
686 'p'
687 If the substitution was made, then print the new pattern space.
688
689 Note: when both the 'p' and 'e' options are specified, the relative
690 ordering of the two produces very different results. In general,
691 'ep' (evaluate then print) is what you want, but operating the
692 other way round can be useful for debugging. For this reason, the
693 current version of GNU 'sed' interprets specially the presence of
694 'p' options both before and after 'e', printing the pattern space
695 before and after evaluation, while in general flags for the 's'
696 command show their effect just once. This behavior, although
697 documented, might change in future versions.
698
699 'w FILENAME'
700 If the substitution was made, then write out the result to the
701 named file. As a GNU 'sed' extension, two special values of
702 FILENAME are supported: '/dev/stderr', which writes the result to
703 the standard error, and '/dev/stdout', which writes to the standard
704 output.(1)
705
706 'e'
707 This command allows one to pipe input from a shell command into
708 pattern space. If a substitution was made, the command that is
709 found in pattern space is executed and pattern space is replaced
710 with its output. A trailing newline is suppressed; results are
711 undefined if the command to be executed contains a NUL character.
712 This is a GNU 'sed' extension.
713
714 'I'
715 'i'
716 The 'I' modifier to regular-expression matching is a GNU extension
717 which makes 'sed' match REGEXP in a case-insensitive manner.
718
719 'M'
720 'm'
721 The 'M' modifier to regular-expression matching is a GNU 'sed'
722 extension which directs GNU 'sed' to match the regular expression
723 in 'multi-line' mode. The modifier causes '^' and '$' to match
724 respectively (in addition to the normal behavior) the empty string
725 after a newline, and the empty string before a newline. There are
726 special character sequences ('\`' and '\'') which always match the
727 beginning or the end of the buffer. In addition, the period
728 character does not match a new-line character in multi-line mode.
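
Several of the flags above can be compared side by side in a short
sketch; the input strings are invented for the illustration:

```shell
# NUMBER alone replaces only that match; NUMBER with 'g' replaces
# that match and every later one (GNU sed behavior).
second=$(echo 'a a a a' | sed 's/a/X/2')
from2=$(echo 'a a a a' | sed 's/a/X/2g')

# 'p' with -n prints only the lines where a substitution was made.
printed=$(printf 'a\nb\n' | sed -n 's/a/X/p')

# 'I' makes the match case-insensitive.
nocase=$(echo 'Hello' | sed 's/hello/bye/I')

# 'N' joins both lines into one pattern space; with 'M', '^' also
# matches just after the embedded newline.
multi=$(printf 'a\nb\n' | sed 'N;s/^b/X/M')
single=$(printf 'a\nb\n' | sed 'N;s/^b/X/')

echo "$second"    # a X a a
echo "$from2"     # a X X X
echo "$printed"   # X
echo "$nocase"    # bye
```

With 'M' the second line becomes 'X'; without it the pattern space
is left unchanged.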
729
730 ---------- Footnotes ----------
731
732 (1) This is equivalent to 'p' unless the '-i' option is being used.
733
734 3.4 Often-Used Commands
735 =======================
736
737 If you use 'sed' at all, you will quite likely want to know these
738 commands.
739
740 '#'
741 [No addresses allowed.]
742
743 The '#' character begins a comment; the comment continues until the
744 next newline.
745
746 If you are concerned about portability, be aware that some
747 implementations of 'sed' (which are not POSIX conforming) may only
748 support a single one-line comment, and then only when the very
749 first character of the script is a '#'.
750
751 Warning: if the first two characters of the 'sed' script are '#n',
752 then the '-n' (no-autoprint) option is forced. If you want to put
753 a comment in the first line of your script and that comment begins
754 with the letter 'n' and you do not want this behavior, then be sure
755 to either use a capital 'N', or place at least one space before the
756 'n'.
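
The '#n' behavior can be demonstrated with two tiny script files.
The file names 'quiet.sed' and 'loud.sed' are hypothetical:

```shell
dir=$(mktemp -d)

# The first line is exactly '#n': auto-print is turned off.
printf '#n\n2p\n' > "$dir/quiet.sed"

# A space after '#' makes it an ordinary comment.
printf '# n is not the second character here\n2p\n' > "$dir/loud.sed"

quiet=$(seq 3 | sed -f "$dir/quiet.sed")
loud=$(seq 3 | sed -f "$dir/loud.sed")
rm -r "$dir"

echo "$quiet"   # 2
echo "$loud"
```

The second script prints every line, and line 2 twice (once from 'p'
and once from auto-print).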
757
758 'q [EXIT-CODE]'
759 Exit 'sed' without processing any more commands or input.
760
761 Example: stop after printing the second line:
762 $ seq 3 | sed 2q
763 1
764 2
765
This command accepts only one address. Note that the current
pattern space is printed if auto-print is not disabled with the
'-n' option. The ability to return an exit code from the 'sed'
script is a GNU 'sed' extension.
770
771 See also the GNU 'sed' extension 'Q' command which quits silently
772 without printing the current pattern space.
773
774 'd'
775 Delete the pattern space; immediately start next cycle.
776
777 Example: delete the second input line:
778 $ seq 3 | sed 2d
779 1
780 3
781
782 'p'
783 Print out the pattern space (to the standard output). This command
784 is usually only used in conjunction with the '-n' command-line
785 option.
786
787 Example: print only the second input line:
788 $ seq 3 | sed -n 2p
789 2
790
791 'n'
792 If auto-print is not disabled, print the pattern space, then,
793 regardless, replace the pattern space with the next line of input.
794 If there is no more input then 'sed' exits without processing any
795 more commands.
796
797 This command is useful to skip lines (e.g. process every Nth
798 line).
799
800 Example: perform substitution on every 3rd line (i.e. two 'n'
801 commands skip two lines):
802 $ seq 6 | sed 'n;n;s/./x/'
803 1
804 2
805 x
806 4
807 5
808 x
809
810 GNU 'sed' provides an extension address syntax of FIRST~STEP to
811 achieve the same result:
812
813 $ seq 6 | sed '0~3s/./x/'
814 1
815 2
816 x
817 4
818 5
819 x
820
821 '{ COMMANDS }'
822 A group of commands may be enclosed between '{' and '}' characters.
823 This is particularly useful when you want a group of commands to be
824 triggered by a single address (or address-range) match.
825
826 Example: perform substitution then print the second input line:
827 $ seq 3 | sed -n '2{s/2/X/ ; p}'
828 X
829
830 3.5 Less Frequently-Used Commands
831 =================================
832
833 Though perhaps less frequently used than those in the previous section,
834 some very small yet useful 'sed' scripts can be built with these
835 commands.
836
837 'y/SOURCE-CHARS/DEST-CHARS/'
838 Transliterate any characters in the pattern space which match any
839 of the SOURCE-CHARS with the corresponding character in DEST-CHARS.
840
841 Example: transliterate 'a-j' into '0-9':
842 $ echo hello world | sed 'y/abcdefghij/0123456789/'
843 74llo worl3
844
845 (The '/' characters may be uniformly replaced by any other single
846 character within any given 'y' command.)
847
Instances of the '/' (or whatever other character is used in its
stead), '\', or newlines can appear in the SOURCE-CHARS or
DEST-CHARS lists, provided that each instance is escaped by a '\'.
The SOURCE-CHARS and DEST-CHARS lists _must_ contain the same
number of characters (after de-escaping).
853
854 See the 'tr' command from GNU coreutils for similar functionality.
855
856 'a TEXT'
857 Appending TEXT after a line. This is a GNU extension to the
858 standard 'a' command - see below for details.
859
860 Example: Add the word 'hello' after the second line:
861 $ seq 3 | sed '2a hello'
862 1
863 2
864 hello
865 3
866
867 Leading whitespace after the 'a' command is ignored. The text to
868 add is read until the end of the line.
869
870 'a\'
871 'TEXT'
872 Append TEXT after a line.
873
874 Example: Add 'hello' after the second line (-| indicates printed
875 output lines):
876 $ seq 3 | sed '2a\
877 hello'
878 -|1
879 -|2
880 -|hello
881 -|3
882
883 The 'a' command queues the lines of text which follow this command
884 (each but the last ending with a '\', which are removed from the
885 output) to be output at the end of the current cycle, or when the
886 next input line is read.
887
888 As a GNU extension, this command accepts two addresses.
889
890 Escape sequences in TEXT are processed, so you should use '\\' in
891 TEXT to print a single backslash.
892
893 The commands resume after the last line without a backslash ('\') -
894 'world' in the following example:
895 $ seq 3 | sed '2a\
896 hello\
897 world
898 3s/./X/'
899 -|1
900 -|2
901 -|hello
902 -|world
903 -|X
904
905 As a GNU extension, the 'a' command and TEXT can be separated into
906 two '-e' parameters, enabling easier scripting:
907 $ seq 3 | sed -e '2a\' -e hello
908 1
909 2
910 hello
911 3
912
913 $ sed -e '2a\' -e "$VAR"
914
915 'i TEXT'
916 Insert TEXT before a line. This is a GNU extension to the standard
917 'i' command - see below for details.
918
919 Example: Insert the word 'hello' before the second line:
920 $ seq 3 | sed '2i hello'
921 1
922 hello
923 2
924 3
925
926 Leading whitespace after the 'i' command is ignored. The text to
927 add is read until the end of the line.
928
929 'i\'
930 'TEXT'
931 Immediately output the lines of text which follow this command.
932
933 Example: Insert 'hello' before the second line (-| indicates
934 printed output lines):
935 $ seq 3 | sed '2i\
936 hello'
937 -|1
938 -|hello
939 -|2
940 -|3
941
942 As a GNU extension, this command accepts two addresses.
943
944 Escape sequences in TEXT are processed, so you should use '\\' in
945 TEXT to print a single backslash.
946
947 The commands resume after the last line without a backslash ('\') -
948 'world' in the following example:
949 $ seq 3 | sed '2i\
950 hello\
951 world
952 s/./X/'
953 -|X
954 -|hello
955 -|world
956 -|X
957 -|X
958
959 As a GNU extension, the 'i' command and TEXT can be separated into
960 two '-e' parameters, enabling easier scripting:
961 $ seq 3 | sed -e '2i\' -e hello
962 1
963 hello
964 2
965 3
966
967 $ sed -e '2i\' -e "$VAR"
968
969 'c TEXT'
970 Replaces the line(s) with TEXT. This is a GNU extension to the
971 standard 'c' command - see below for details.
972
973 Example: Replace the 2nd to 9th lines with the word 'hello':
974 $ seq 10 | sed '2,9c hello'
975 1
976 hello
977 10
978
979 Leading whitespace after the 'c' command is ignored. The text to
980 add is read until the end of the line.
981
982 'c\'
983 'TEXT'
984 Delete the lines matching the address or address-range, and output
985 the lines of text which follow this command.
986
987 Example: Replace 2nd to 4th lines with the words 'hello' and
988 'world' (-| indicates printed output lines):
989 $ seq 5 | sed '2,4c\
990 hello\
991 world'
992 -|1
993 -|hello
994 -|world
995 -|5
996
997 If no addresses are given, each line is replaced.
998
999 A new cycle is started after this command is done, since the
1000 pattern space will have been deleted. In the following example,
1001 the 'c' starts a new cycle and the substitution command is not
1002 performed on the replaced text:
1003
1004 $ seq 3 | sed '2c\
1005 hello
1006 s/./X/'
1007 -|X
1008 -|hello
1009 -|X
1010
1011 As a GNU extension, the 'c' command and TEXT can be separated into
1012 two '-e' parameters, enabling easier scripting:
1013 $ seq 3 | sed -e '2c\' -e hello
1014 1
1015 hello
1016 3
1017
1018 $ sed -e '2c\' -e "$VAR"
1019
1020 '='
1021 Print out the current input line number (with a trailing newline).
1022
1023 $ printf '%s\n' aaa bbb ccc | sed =
1024 1
1025 aaa
1026 2
1027 bbb
1028 3
1029 ccc
1030
1031 As a GNU extension, this command accepts two addresses.
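
A common use of '=' is counting input lines: combined with '-n' and the
'$' address, only the number of the last line is printed. A small sketch
(the variable name 'count' is illustrative):

```shell
# Print only the line number of the last input line, i.e. the line count.
count=$(seq 5 | sed -n '$=')
printf '%s\n' "$count"
```

This prints '5'.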
1032
1033 'l N'
1034 Print the pattern space in an unambiguous form: non-printable
1035 characters (and the '\' character) are printed in C-style escaped
1036 form; long lines are split, with a trailing '\' character to
1037 indicate the split; the end of each line is marked with a '$'.
1038
1039 N specifies the desired line-wrap length; a length of 0 (zero)
1040 means to never wrap long lines. If omitted, the default as
1041 specified on the command line is used. The N parameter is a GNU
1042 'sed' extension.
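
The following sketch shows the escaped form for a tab and a backslash.
The sample input is made up for illustration; 'printf' turns '\t' into a
real tab and '\\' into a single backslash before 'sed' sees the line:

```shell
# Input line is: a<TAB>b\c
visible=$(printf 'a\tb\\c\n' | sed -n 'l')
printf '%s\n' "$visible"
```

The tab is shown as '\t', the backslash as '\\', and the end of the line
is marked with '$'.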
1043
1044 'r FILENAME'
1045
1046 Reads file FILENAME. Example:
1047
1048 $ seq 3 | sed '2r/etc/hostname'
1049 1
1050 2
1051 fencepost.gnu.org
1052 3
1053
1054 Queue the contents of FILENAME to be read and inserted into the
1055 output stream at the end of the current cycle, or when the next
1056 input line is read. Note that if FILENAME cannot be read, it is
1057 treated as if it were an empty file, without any error indication.
1058
1059 As a GNU 'sed' extension, the special value '/dev/stdin' is
1060 supported for the file name, which reads the contents of the
1061 standard input.
1062
1063 As a GNU extension, this command accepts two addresses. The file
1064 will then be reread and inserted on each of the addressed lines.
1065
1066 'w FILENAME'
1067 Write the pattern space to FILENAME. As a GNU 'sed' extension, two
1068 special values of FILENAME are supported: '/dev/stderr', which
1069 writes the result to the standard error, and '/dev/stdout', which
1070 writes to the standard output.(1)
1071
1072 The file will be created (or truncated) before the first input line
1073 is read; all 'w' commands (including instances of the 'w' flag on
1074 successful 's' commands) which refer to the same FILENAME are
1075 output without closing and reopening the file.
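
A minimal sketch of 'w', assuming 'mktemp' is available to create a
scratch file (the file and the variable names are illustrative). Because
the file name extends to the end of the line, 'w' is the last command in
the script:

```shell
# Save the lines containing 2 or 4 to a scratch file; -n keeps stdout empty.
tmp=$(mktemp)
shown=$(seq 5 | sed -n "/[24]/w $tmp")
saved=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$saved"
```

The scratch file receives the lines '2' and '4', and nothing is written
to the standard output.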
1076
1077 'D'
1078 If pattern space contains no newline, start a normal new cycle as
1079 if the 'd' command was issued. Otherwise, delete text in the
1080 pattern space up to the first newline, and restart cycle with the
1081 resultant pattern space, without reading a new line of input.
1082
1083 'N'
1084 Add a newline to the pattern space, then append the next line of
1085 input to the pattern space. If there is no more input then 'sed'
1086 exits without processing any more commands.
1087
1088 When '-z' is used, a zero byte (the ASCII 'NUL' character) is added
1089 between the lines (instead of a newline).
1090
1091 If there is no next input line, GNU 'sed' prints the pattern space
1092 before exiting; most other versions exit without printing. This can
1093 be disabled with '--posix'. *Note N command on the last line: N_command_last_line.
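
As a sketch of typical 'N' usage, the commands below join each pair of
input lines with a comma. The second command has an odd number of input
lines; the result shown assumes GNU 'sed' running without '--posix' (the
variable names are illustrative):

```shell
# Join every two lines: N appends the next line, s replaces the newline.
pairs=$(seq 4 | sed 'N; s/\n/,/')
# With 3 input lines, N has nothing to read on line 3.
odd=$(seq 3 | sed 'N; s/\n/,/')
printf '%s\n' "$pairs" "$odd"
```

The first command prints '1,2' and '3,4'. The second prints '1,2' and
then the unpaired '3'.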
1094
1095 'P'
1096 Print out the portion of the pattern space up to the first newline.
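
The 'N', 'P' and 'D' commands are often combined to slide a two-line
window over the input. The sketch below removes consecutive duplicate
lines, similar to 'uniq'; the sample input and the variable name are
illustrative:

```shell
# $!N  : append the next line (except on the last line)
# !P   : print the first line unless both lines in the window are equal
# D    : drop the first line and restart the cycle with the remainder
deduped=$(printf 'a\na\nb\nb\nb\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$deduped"
```

This prints 'a', 'b' and 'c', each once.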
1097
1098 'h'
1099 Replace the contents of the hold space with the contents of the
1100 pattern space.
1101
1102 'H'
1103 Append a newline to the contents of the hold space, and then append
1104 the contents of the pattern space to that of the hold space.
1105
1106 'g'
1107 Replace the contents of the pattern space with the contents of the
1108 hold space.
1109
1110 'G'
1111 Append a newline to the contents of the pattern space, and then
1112 append the contents of the hold space to that of the pattern space.
1113
1114 'x'
1115 Exchange the contents of the hold and pattern spaces.
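
As an illustration of the hold-space commands, the following sketch
prints the input lines in reverse order, similar to 'tac' (the variable
name is illustrative):

```shell
# 1!G : on every line but the first, append the hold space to the pattern space
# h   : copy the accumulated text back into the hold space
# $p  : on the last line, print the accumulated (reversed) text
reversed=$(seq 3 | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```

This prints '3', '2' and '1'.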
1116
1117 ---------- Footnotes ----------
1118
1119 (1) This is equivalent to 'p' unless the '-i' option is being used.
1120
1121 3.6 Commands for 'sed' gurus
1122 ============================
1123
1124 In most cases, use of these commands indicates that you are probably
1125 better off programming in something like 'awk' or Perl. But
1126 occasionally one is committed to sticking with 'sed', and these commands
1127 can enable one to write quite convoluted scripts.
1128
1129 ': LABEL'
1130 [No addresses allowed.]
1131
1132 Specify the location of LABEL for branch commands. In all other
1133 respects, a no-op.
1134
1135 'b LABEL'
1136 Unconditionally branch to LABEL. The LABEL may be omitted, in
1137 which case the next cycle is started.
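
A short sketch of 'b' without a label: line 2 skips the rest of the
script, so the substitution is applied only to the other lines (the
variable name is illustrative):

```shell
# Line 2 branches to the end of the script and is printed unchanged.
skipped=$(seq 3 | sed -e '2b' -e 's/^/x/')
printf '%s\n' "$skipped"
```

This prints 'x1', '2' and 'x3'.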
1138
1139 't LABEL'
1140 Branch to LABEL only if there has been a successful 's'ubstitution
1141 since the last input line was read or conditional branch was taken.
1142 The LABEL may be omitted, in which case the next cycle is started.
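
The 't' command is typically used to build loops. The sketch below
repeats a substitution until it no longer succeeds, collapsing runs of
spaces into a single space. The label name 'again', the sample input and
the variable name are illustrative; separate '-e' options are used so
that the labels are also parsed correctly by non-GNU implementations:

```shell
# Replace two spaces with one, and loop back while the substitution succeeds.
squeezed=$(printf 'a    b\n' | sed -e ':again' -e 's/  / /' -e 't again')
printf '%s\n' "$squeezed"
```

This prints 'a b'.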
1143
1144 3.7 Commands Specific to GNU 'sed'
1145 ==================================
1146
1147 These commands are specific to GNU 'sed', so you must use them with care
1148 and only when you are sure that sacrificing portability is acceptable. They
1149 allow you to check for GNU 'sed' extensions or to do tasks that are
1150 required quite often, yet are unsupported by standard 'sed's.
1151
1152 'e [COMMAND]'
1153 This command allows one to pipe input from a shell command into
1154 pattern space. Without parameters, the 'e' command executes the
1155 command that is found in pattern space and replaces the pattern
1156 space with the output; a trailing newline is suppressed.
1157
1158 If a parameter is specified, instead, the 'e' command interprets it
1159 as a command and sends its output to the output stream. The
1160 command can run across multiple lines, all but the last ending with
1161 a back-slash.
1162
1163 In both cases, the results are undefined if the command to be
1164 executed contains a NUL character.
1165
1166 Note that, unlike the 'r' command, the output of the command will
1167 be printed immediately; the 'r' command instead delays the output
1168 to the end of the current cycle.
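
The two forms of 'e' can be sketched as follows, assuming GNU 'sed' is
not running in '--sandbox' mode (the shell commands being executed and
the variable names are illustrative):

```shell
# With a parameter: the output of 'echo start' is printed before line 1.
with_arg=$(seq 2 | sed '1e echo start')
# Without a parameter: the pattern space itself is executed as a command.
from_space=$(echo 'echo hi' | sed 'e')
printf '%s\n' "$with_arg" "$from_space"
```

The first command prints 'start', '1' and '2'. The second prints 'hi'.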
1169
1170 'F'
1171 Print out the file name of the current input file (with a trailing
1172 newline).
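
A minimal sketch of 'F', assuming 'mktemp' is available to create a
scratch input file (the file and the variable names are illustrative):

```shell
# Print the name of the input file once for its single line.
tmp=$(mktemp)
echo data > "$tmp"
name=$(sed -n 'F' "$tmp")
rm -f "$tmp"
printf '%s\n' "$name"
```

The printed name is the path that was passed on the command line.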
1173
1174 'Q [EXIT-CODE]'
1175 This command accepts only one address.
1176
1177 This command is the same as 'q', but will not print the contents of
1178 pattern space. Like 'q', it provides the ability to return an exit
1179 code to the caller.
1180
1181 This command can be useful because the only alternative ways to
1182 accomplish this apparently trivial function are to use the '-n'
1183 option (which can unnecessarily complicate your script) or
1184 resorting to the following snippet, which wastes time by reading
1185 the whole file without any visible effect:
1186
1187 :eat
1188 $d Quit silently on the last line
1189 N Read another line, silently
1190 g Overwrite pattern space each time to save memory
1191 b eat
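
The difference between 'q' and 'Q', and the optional exit code, can be
sketched as follows (the exit code '7' and the variable names are
arbitrary choices for illustration):

```shell
with_q=$(seq 5 | sed '3q')   # line 3 is printed before quitting
with_Q=$(seq 5 | sed '3Q')   # line 3 is not printed
status=0
seq 5 | sed -n '2Q7' || status=$?   # exit code 7 is returned to the caller
printf '%s\n' "$with_q" "$with_Q" "$status"
```

The 'q' form prints '1', '2' and '3', while the 'Q' form prints only '1'
and '2'. The last command sets 'status' to 7.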
1192
1193 'R FILENAME'
1194 Queue a line of FILENAME to be read and inserted into the output
1195 stream at the end of the current cycle, or when the next input line
1196 is read. Note that if FILENAME cannot be read, or if its end is
1197 reached, no line is appended, without any error indication.
1198
1199 As with the 'r' command, the special value '/dev/stdin' is
1200 supported for the file name, which reads a line from the standard
1201 input.
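
Because 'R' reads one line per invocation, it can interleave two inputs.
A sketch, assuming 'mktemp' is available for the scratch file (the file
contents and the variable names are illustrative):

```shell
# After each input line, queue the next unread line of the scratch file.
tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"
merged=$(seq 3 | sed "R $tmp")
rm -f "$tmp"
printf '%s\n' "$merged"
```

This prints '1', 'a', '2', 'b' and '3'. Nothing is appended after line 3
because the end of the scratch file has been reached.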
1202
1203 'T LABEL'
1204 Branch to LABEL only if there have been no successful
1205 's'ubstitutions since the last input line was read or conditional
1206 branch was taken. The LABEL may be omitted, in which case the next
1207 cycle is started.
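
A short sketch of 'T' without a label: lines on which the first
substitution failed skip the rest of the script (the sample input and the
variable name are illustrative):

```shell
# Append '!' only to lines where 's/foo/X/' succeeded.
marked=$(printf 'foo\nbar\n' | sed -e 's/foo/X/' -e 'T' -e 's/$/!/')
printf '%s\n' "$marked"
```

This prints 'X!' and 'bar'.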
1208
1209 'v VERSION'
1210 This command does nothing, but makes 'sed' fail if GNU 'sed'
1211 extensions are not supported, simply because other versions of
1212 'sed' do not implement it. In addition, you can specify the
1213 version of 'sed' that your script requires, such as '4.0.5'. The
1214 default is '4.0' because that is the first version that implemented
1215 this command.
1216
1217 This command enables all GNU extensions even if 'POSIXLY_CORRECT'
1218 is set in the environment.
1219
1220 'W FILENAME'
1221 Write to the given filename the portion of the pattern space up to
1222 the first newline. Everything said under the 'w' command about
1223 file handling holds here too.
1224
1225 'z'
1226 This command empties the content of pattern space. It is usually
1227 the same as 's/.*//', but is more efficient and works in the
1228 presence of invalid multibyte sequences in the input stream. POSIX
1229 mandates that such sequences are _not_ matched by '.', so that
1230 there is no portable way to clear 'sed''s buffers in the middle of
1231 the script in most multibyte locales (including UTF-8 locales).
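
A minimal sketch of 'z' (the variable name is illustrative): the second
line is emptied, and the now-empty pattern space is still printed as a
blank line:

```shell
# Empty the pattern space on line 2 only.
blanked=$(seq 3 | sed '2z')
printf '%s\n' "$blanked"
```

This prints '1', an empty line, and '3'.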
1232
1233 3.8 Multiple commands syntax
1234 ============================
1235
1236 There are several methods to specify multiple commands in a 'sed'
1237 program.
1238
1239 Using newlines is most natural when running a sed script from a file
1240 (using the '-f' option).
1241
1242 On the command line, all 'sed' commands may be separated by newlines.
1243 Alternatively, you may specify each command as an argument to an '-e'
1244 option:
1245
1246 $ seq 6 | sed '1d
1247 3d
1248 5d'
1249 2
1250 4
1251 6
1252
1253 $ seq 6 | sed -e 1d -e 3d -e 5d
1254 2
1255 4
1256 6
1257
1258 A semicolon (';') may be used to separate most simple commands:
1259
1260 $ seq 6 | sed '1d;3d;5d'
1261 2
1262 4
1263 6
1264
1265 The '{','}','b','t','T',':' commands can be separated with a
1266 semicolon (this is a non-portable GNU 'sed' extension).
1267
1268 $ seq 4 | sed '{1d;3d}'
1269 2
1270 4
1271
1272 $ seq 6 | sed '{1d;3d};5d'
1273 2
1274 4
1275 6
1276
1277 Labels used in 'b','t','T',':' commands are read until a semicolon.
1278 Leading and trailing whitespace is ignored. In the examples below the
1279 label is 'x'. The first example works with GNU 'sed'. The second is a
1280 portable equivalent. For more information about branching and labels
1281 *note Branching and flow control::.
1282
1283 $ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d'
1284 1
1285 =2
1286
1287 $ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d'
1288 1
1289 =2
1290
1291 3.8.1 Commands Requiring a newline
1292 ----------------------------------
1293
1294 The following commands cannot be separated by a semicolon and require a
1295 newline:
1296
1297 'a','c','i' (append/change/insert)
1298
1299 All characters following 'a','c','i' commands are taken as the text
1300 to append/change/insert. Using a semicolon leads to undesirable
1301 results:
1302
1303 $ seq 2 | sed '1aHello ; 2d'
1304 1
1305 Hello ; 2d
1306 2
1307
1308 Separate the commands using '-e' or a newline:
1309
1310 $ seq 2 | sed -e 1aHello -e 2d
1311 1
1312 Hello
1313
1314 $ seq 2 | sed '1aHello
1315 2d'
1316 1
1317 Hello
1318
1319 Note that specifying the text to add ('Hello') immediately after
1320 'a','c','i' is itself a GNU 'sed' extension. A portable,
1321 POSIX-compliant alternative is:
1322
1323 $ seq 2 | sed '1a\
1324 Hello
1325 2d'
1326 1
1327 Hello
1328
1329 '#' (comment)
1330
1331 All characters following '#' until the next newline are ignored.
1332
1333 $ seq 3 | sed '# this is a comment ; 2d'
1334 1
1335 2
1336 3
1337
1338
1339 $ seq 3 | sed '# this is a comment
1340 2d'
1341 1
1342 3
1343
1344 'r','R','w','W' (reading and writing files)
1345
1346 The 'r','R','w','W' commands parse the filename until end of the
1347 line. If whitespace, comments or semicolons are found, they will
1348 be included in the filename, leading to unexpected results:
1349
1350 $ seq 2 | sed '1w hello.txt ; 2d'
1351 1
1352 2
1353
1354 $ ls -log
1355 total 4
1356 -rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d
1357
1358 $ cat 'hello.txt ; 2d'
1359 1
1360
1361 Note that 'sed' silently ignores read/write errors in
1362 'r','R','w','W' commands (such as missing files). In the following
1363 example, 'sed' tries to read a file named ''hello.txt ; N''. The
1364 file is missing, and the error is silently ignored:
1365
1366 $ echo x | sed '1rhello.txt ; N'
1367 x
1368
1369 'e' (command execution)
1370
1371 Any characters following the 'e' command until the end of the line
1372 will be sent to the shell. If whitespace, comments or semicolons
1373 are found, they will be included in the shell command, leading to
1374 unexpected results:
1375
1376 $ echo a | sed '1e touch foo#bar'
1377 a
1378
1379 $ ls -1
1380 foo#bar
1381
1382 $ echo a | sed '1e touch foo ; s/a/b/'
1383 sh: 1: s/a/b/: not found
1384 a
1385
1386 's///[we]' (substitute with 'e' or 'w' flags)
1387
1388 In a substitution command, the 'w' flag writes the substitution
1389 result to a file, and the 'e' flag executes the substitution result
1390 as a shell command. As with the 'r/R/w/W/e' commands, these must
1391 be terminated with a newline. If whitespace, comments or
1392 semicolons are found, they will be included in the shell command or
1393 filename, leading to unexpected results:
1394
1395 $ echo a | sed 's/a/b/w1.txt#foo'
1396 b
1397
1398 $ ls -1
1399 1.txt#foo
1400
1401 4 Addresses: selecting lines
1402 ****************************
1403
1404 4.1 Addresses overview
1405 ======================
1406
1407 Addresses determine on which line(s) the 'sed' command will be executed.
1408 The following command replaces the word 'hello' with 'world' only on
1409 line 144:
1410
1411 sed '144s/hello/world/' input.txt > output.txt
1412
1413 If no addresses are given, the command is performed on all lines.
1414 The following command replaces the word 'hello' with 'world' on all
1415 lines in the input file:
1416
1417 sed 's/hello/world/' input.txt > output.txt
1418
1419 Addresses can contain regular expressions to match lines based on
1420 content instead of line numbers. The following command replaces the
1421 word 'hello' with 'world' only in lines containing the word 'apple':
1422
1423 sed '/apple/s/hello/world/' input.txt > output.txt
1424
1425 An address range is specified with two addresses separated by a comma
1426 (','). Addresses can be numeric, regular expressions, or a mix of both.
1427 The following command replaces the word 'hello' with 'world' only in
1428 lines 4 to 17 (inclusive):
1429
1430 sed '4,17s/hello/world/' input.txt > output.txt
1431
1432 Appending the '!' character to the end of an address specification
1433 (before the command letter) negates the sense of the match. That is, if
1434 the '!' character follows an address or an address range, then only
1435 lines which do _not_ match the addresses will be selected. The
1436 following command replaces the word 'hello' with 'world' only in lines
1437 _not_ containing the word 'apple':
1438
1439 sed '/apple/!s/hello/world/' input.txt > output.txt
1440
1441 The following command replaces the word 'hello' with 'world' only in
1442 lines 1 to 3 and 18 through the last line of the input file (i.e.
1443 excluding lines 4 to 17):
1444
1445 sed '4,17!s/hello/world/' input.txt > output.txt
1446
1447 4.2 Selecting lines by numbers
1448 ==============================
1449
1450 Addresses in a 'sed' script can be in any of the following forms:
1451 'NUMBER'
1452 Specifying a line number will match only that line in the input.
1453 (Note that 'sed' counts lines continuously across all input files
1454 unless '-i' or '-s' options are specified.)
1455
1456 '$'
1457 This address matches the last line of the last file of input, or
1458 the last line of each file when the '-i' or '-s' options are
1459 specified.
1460
1461 'FIRST~STEP'
1462 This GNU extension matches every STEPth line starting with line
1463 FIRST. In particular, lines will be selected when there exists a
1464 non-negative N such that the current line-number equals FIRST + (N
1465 * STEP). Thus, one would use '1~2' to select the odd-numbered
1466 lines and '0~2' for even-numbered lines; to pick every third line
1467 starting with the second, '2~3' would be used; to pick every fifth
1468 line starting with the tenth, use '10~5'; and '50~0' is just an
1469 obscure way of saying '50'.
1470
1471 The following commands demonstrate the step address usage:
1472
1473 $ seq 10 | sed -n '0~4p'
1474 4
1475 8
1476
1477 $ seq 10 | sed -n '1~3p'
1478 1
1479 4
1480 7
1481 10
1482
1483 4.3 selecting lines by text matching
1484 ====================================
1485
1486 GNU 'sed' supports the following regular expression addresses. The
1487 default regular expression is *note Basic Regular Expression (BRE): BRE
1488 syntax. If the '-E' or '-r' option is used, the regular expression should
1489 be in *note Extended Regular Expression (ERE): ERE syntax.
1490 *Note BRE vs ERE::.
1491
1492 '/REGEXP/'
1493 This will select any line which matches the regular expression
1494 REGEXP. If REGEXP itself includes any '/' characters, each must be
1495 escaped by a backslash ('\').
1496
1497 The following command prints lines in '/etc/passwd' which end with
1498 'bash'(1):
1499
1500 sed -n '/bash$/p' /etc/passwd
1501
1502 The empty regular expression '//' repeats the last regular
1503 expression match (the same holds if the empty regular expression is
1504 passed to the 's' command). Note that modifiers to regular
1505 expressions are evaluated when the regular expression is compiled,
1506 thus it is invalid to specify them together with the empty regular
1507 expression.
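
The empty regular expression is handy for avoiding repetition: in the
sketch below the 's' command reuses the regular expression from the
address (the sample input and the variable name are illustrative):

```shell
# '//' in the s command repeats the last used regexp, here 'hello'.
reused=$(echo 'hello world' | sed '/hello/s//bye/')
printf '%s\n' "$reused"
```

This prints 'bye world'.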
1508
1509 '\%REGEXP%'
1510 (The '%' may be replaced by any other single character.)
1511
1512 This also matches the regular expression REGEXP, but allows one to
1513 use a different delimiter than '/'. This is particularly useful if
1514 the REGEXP itself contains a lot of slashes, since it avoids the
1515 tedious escaping of every '/'. If REGEXP itself includes any
1516 delimiter characters, each must be escaped by a backslash ('\').
1517
1518 The following commands are equivalent. They print lines which
1519 start with '/home/alice/documents/':
1520
1521 sed -n '/^\/home\/alice\/documents\//p'
1522 sed -n '\%^/home/alice/documents/%p'
1523 sed -n '\;^/home/alice/documents/;p'
1524
1525 '/REGEXP/I'
1526 '\%REGEXP%I'
1527 The 'I' modifier to regular-expression matching is a GNU extension
1528 which causes the REGEXP to be matched in a case-insensitive manner.
1529
1530 In many other programming languages, a lower case 'i' is used for
1531 case-insensitive regular expression matching. However, in 'sed'
1532 the 'i' is used for the insert command (*note insert command::).
1533
1534 Observe the difference between the following examples.
1535
1536 In this example, '/b/I' is the address: regular expression with 'I'
1537 modifier. 'd' is the delete command:
1538
1539 $ printf "%s\n" a b c | sed '/b/Id'
1540 a
1541 c
1542
1543 Here, '/b/' is the address: a regular expression. 'i' is the
1544 insert command. 'd' is the value to insert. A line with 'd' is
1545 then inserted above the matched line:
1546
1547 $ printf "%s\n" a b c | sed '/b/id'
1548 a
1549 d
1550 b
1551 c
1552
1553 '/REGEXP/M'
1554 '\%REGEXP%M'
1555 The 'M' modifier to regular-expression matching is a GNU 'sed'
1556 extension which directs GNU 'sed' to match the regular expression
1557 in 'multi-line' mode. The modifier causes '^' and '$' to match
1558 respectively (in addition to the normal behavior) the empty string
1559 after a newline, and the empty string before a newline. There are
1560 special character sequences ('\`' and '\'') which always match the
1561 beginning or the end of the buffer. In addition, the period
1562 character does not match a new-line character in multi-line mode.
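
The effect of 'M' is only visible when the pattern space holds more than
one line. In the following sketch, 'N' builds a two-line pattern space;
the sample input and the variable names are illustrative, and GNU 'sed'
is assumed:

```shell
# Without M, '^foo$' must match the whole two-line pattern space: no match.
plain=$(printf 'x\nfoo\n' | sed -n 'N; /^foo$/p')
# With M, '^' and '$' also match around the embedded newline.
multi=$(printf 'x\nfoo\n' | sed -n 'N; /^foo$/Mp')
printf '%s\n' "$multi"
```

The first command prints nothing. The second prints both lines, 'x' and
'foo'.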
1563
1564 Regex addresses operate on the content of the current pattern space.
1565 If the pattern space is changed (for example with 's///' command) the
1566 regular expression matching will operate on the changed text.
1567
1568 In the following example, automatic printing is disabled with '-n'.
1569 The 's/2/X/' command changes lines containing '2' to 'X'. The command
1570 '/[0-9]/p' matches lines with digits and prints them. Because the
1571 second line is changed before the '/[0-9]/' regex, it will not match and
1572 will not be printed:
1573
1574 $ seq 3 | sed -n 's/2/X/ ; /[0-9]/p'
1575 1
1576 3
1577
1578 ---------- Footnotes ----------
1579
1580 (1) There are of course many other ways to do the same, e.g.
1581 grep 'bash$' /etc/passwd
1582 awk -F: '$7 == "/bin/bash"' /etc/passwd
1583
1584 4.4 Range Addresses
1585 ===================
1586
1587 An address range can be specified by specifying two addresses separated
1588 by a comma (','). An address range matches lines starting from where
1589 the first address matches, and continues until the second address
1590 matches (inclusively):
1591
1592 $ seq 10 | sed -n '4,6p'
1593 4
1594 5
1595 6
1596
1597 If the second address is a REGEXP, then checking for the ending match
1598 will start with the line _following_ the line which matched the first
1599 address: a range will always span at least two lines (except of course
1600 if the input stream ends).
1601
1602 $ seq 10 | sed -n '4,/[0-9]/p'
1603 4
1604 5
1605
1606 If the second address is a NUMBER less than (or equal to) the line
1607 matching the first address, then only the one line is matched:
1608
1609 $ seq 10 | sed -n '4,1p'
1610 4
1611
1612 GNU 'sed' also supports some special two-address forms; all these are
1613 GNU extensions:
1614 '0,/REGEXP/'
1615 A line number of '0' can be used in an address specification like
1616 '0,/REGEXP/' so that 'sed' will try to match REGEXP in the first
1617 input line too. In other words, '0,/REGEXP/' is similar to
1618 '1,/REGEXP/', except that if ADDR2 matches the very first line of
1619 input the '0,/REGEXP/' form will consider it to end the range,
1620 whereas the '1,/REGEXP/' form will match the beginning of its range
1621 and hence make the range span up to the _second_ occurrence of the
1622 regular expression.
1623
1624 Note that this is the only place where the '0' address makes sense;
1625 there is no 0-th line and commands which are given the '0' address
1626 in any other way will give an error.
1627
1628 The following examples demonstrate the difference between starting
1629 with address 1 and 0:
1630
1631 $ seq 10 | sed -n '1,/[0-9]/p'
1632 1
1633 2
1634
1635 $ seq 10 | sed -n '0,/[0-9]/p'
1636 1
1637
1638 'ADDR1,+N'
1639 Matches ADDR1 and the N lines following ADDR1.
1640
1641 $ seq 10 | sed -n '6,+2p'
1642 6
1643 7
1644 8
1645
1646 ADDR1 can be a line number or a regular expression.
1647
1648 'ADDR1,~N'
1649 Matches ADDR1 and the lines following ADDR1 until the next line
1650 whose input line number is a multiple of N. The following command
1651 prints starting at line 6, until the next line which is a multiple
1652 of 4 (i.e. line 8):
1653
1654 $ seq 10 | sed -n '6,~4p'
1655 6
1656 7
1657 8
1658
1659 ADDR1 can be a line number or a regular expression.
1660
1661 5 Regular Expressions: selecting text
1662 *************************************
1663
1664 5.1 Overview of regular expression in 'sed'
1665 ===========================================
1666
1667 To know how to use 'sed', people should understand regular expressions
1668 ("regexp" for short). A regular expression is a pattern that is matched
1669 against a subject string from left to right. Most characters are
1670 "ordinary": they stand for themselves in a pattern, and match the
1671 corresponding characters. Regular expressions in 'sed' are specified
1672 between two slashes.
1673
1674 The following command prints lines containing the word 'hello':
1675
1676 sed -n '/hello/p'
1677
1678 The above example is equivalent to this 'grep' command:
1679
1680 grep 'hello'
1681
1682 The power of regular expressions comes from the ability to include
1683 alternatives and repetitions in the pattern. These are encoded in the
1684 pattern by the use of "special characters", which do not stand for
1685 themselves but instead are interpreted in some special way.
1686
1687 The character '^' (caret) in a regular expression matches the
1688 beginning of the line. The character '.' (dot) matches any single
1689 character. The following 'sed' command matches and prints lines which
1690 start with the letter 'b', followed by any single character, followed by
1691 the letter 'd':
1692
1693 $ printf "%s\n" abode bad bed bit bid byte body | sed -n '/^b.d/p'
1694 bad
1695 bed
1696 bid
1697 body
1698
1699 The following sections explain the meaning and usage of special
1700 characters in regular expressions.
1701
1702 5.2 Basic (BRE) and extended (ERE) regular expression
1703 =====================================================
1704
1705 Basic and extended regular expressions are two variations on the syntax
1706 of the specified pattern. Basic Regular Expression (BRE) syntax is the
1707 default in 'sed' (and similarly in 'grep'). Use the POSIX-specified
1708 '-E' option ('-r', '--regexp-extended') to enable Extended Regular
1709 Expression (ERE) syntax.
1710
1711 In GNU 'sed', the only difference between basic and extended regular
1712 expressions is in the behavior of a few special characters: '?', '+',
1713 parentheses, braces ('{}'), and '|'.
1714
1715 With basic (BRE) syntax, these characters do not have special meaning
1716 unless prefixed with a backslash ('\'), while with extended (ERE) syntax
1717 it is reversed: these characters are special unless they are prefixed
1718 with a backslash ('\').
1719
1720 Desired pattern      Basic (BRE) Syntax          Extended (ERE) Syntax
1721
1722 --------------------------------------------------------------------------
1723 literal '+' (plus    $ echo 'a+b=c' > foo        $ echo 'a+b=c' > foo
1724 sign)                $ sed -n '/a+b/p' foo       $ sed -E -n '/a\+b/p' foo
1725                      a+b=c                       a+b=c
1726
1727 One or more 'a'      $ echo aab > foo            $ echo aab > foo
1728 characters           $ sed -n '/a\+b/p' foo      $ sed -E -n '/a+b/p' foo
1729 followed by 'b'      aab                         aab
1730 (plus sign as
1731 special
1732 meta-character)
1733
1734 5.3 Overview of basic regular expression syntax
1735 ===============================================
1736
1737 Here is a brief description of regular expression syntax as used in
1738 'sed'.
1739
1740 'CHAR'
1741 A single ordinary character matches itself.
1742
1743 '*'
1744 Matches a sequence of zero or more instances of matches for the
1745 preceding regular expression, which must be an ordinary character,
1746 a special character preceded by '\', a '.', a grouped regexp (see
1747 below), or a bracket expression. As a GNU extension, a postfixed
1748 regular expression can also be followed by '*'; for example, 'a**'
1749 is equivalent to 'a*'. POSIX 1003.1-2001 says that '*' stands for
1750 itself when it appears at the start of a regular expression or
1751 subexpression, but many non-GNU implementations do not support this
1752 and portable scripts should instead use '\*' in these contexts.
1753 '.'
1754 Matches any character, including newline.
1755
1756 '^'
1757 Matches the null string at beginning of the pattern space, i.e.
1758 what appears after the circumflex must appear at the beginning of
1759 the pattern space.
1760
1761 In most scripts, pattern space is initialized to the content of
1762 each line (*note How 'sed' works: Execution Cycle.). So, it is a
1763 useful simplification to think of '^#include' as matching only
1764 lines where '#include' is the first thing on line--if there are
1765 spaces before, for example, the match fails. This simplification
1766 is valid as long as the original content of pattern space is not
1767 modified, for example with an 's' command.
1768
1769 '^' acts as a special character only at the beginning of the
1770 regular expression or subexpression (that is, after '\(' or '\|').
1771 Portable scripts should avoid '^' at the beginning of a
1772 subexpression, though, as POSIX allows implementations that treat
1773 '^' as an ordinary character in that context.
1774
1775 '$'
1776 Like '^', but refers to the end of the pattern space. '$'
1777 also acts as a special character only at the end of the regular
1778 expression or subexpression (that is, before '\)' or '\|'), and its
1779 use at the end of a subexpression is not portable.
1780
1781 '[LIST]'
1782 '[^LIST]'
1783 Matches any single character in LIST: for example, '[aeiou]'
1784 matches all vowels. A list may include sequences like
1785 'CHAR1-CHAR2', which matches any character between (inclusive)
1786 CHAR1 and CHAR2. *Note Character Classes and Bracket
1787 Expressions::.
1788
1789 '\+'
1790 As '*', but matches one or more. It is a GNU extension.
1791
1792 '\?'
1793 As '*', but only matches zero or one. It is a GNU extension.
1794
1795 '\{I\}'
1796 As '*', but matches exactly I sequences (I is a decimal integer;
1797 for portability, keep it between 0 and 255 inclusive).
1798
1799 '\{I,J\}'
1800 Matches between I and J sequences, inclusive.
1801
1802 '\{I,\}'
1803 Matches I or more sequences.
1804
1805 '\(REGEXP\)'
1806 Groups the inner REGEXP as a whole. This is used to:
1807
1808 * Apply postfix operators, like '\(abcd\)*': this will search
1809 for zero or more whole sequences of 'abcd', while 'abcd*'
1810 would search for 'abc' followed by zero or more occurrences of
1811 'd'. Note that support for '\(abcd\)*' is required by POSIX
1812 1003.1-2001, but many non-GNU implementations do not support
1813 it and hence it is not universally portable.
1814
1815 * Use back references (see below).
1816
1817 'REGEXP1\|REGEXP2'
1818 Matches either REGEXP1 or REGEXP2. Use parentheses to use complex
1819 alternative regular expressions. The matching process tries each
1820 alternative in turn, from left to right, and the first one that
1821 succeeds is used. It is a GNU extension.
1822
1823 'REGEXP1REGEXP2'
1824 Matches the concatenation of REGEXP1 and REGEXP2. Concatenation
1825 binds more tightly than '\|', '^', and '$', but less tightly than
1826 the other regular expression operators.
1827
1828 '\DIGIT'
1829 Matches the DIGIT-th '\(...\)' parenthesized subexpression in the
1830 regular expression. This is called a "back reference".
1831 Subexpressions are implicitly numbered by counting occurrences of
1832 '\(' left-to-right.
1833
1834 '\n'
1835 Matches the newline character.
1836
1837 '\CHAR'
1838 Matches CHAR, where CHAR is one of '$', '*', '.', '[', '\', or '^'.
1839 Note that the only C-like backslash sequences that you can portably
1840 assume to be interpreted are '\n' and '\\'; in particular '\t' is
1841 not portable, and matches a 't' under most implementations of
1842 'sed', rather than a tab character.
1843
1844 Note that the regular expression matcher is greedy, i.e., matches are
1845 attempted from left to right and, if two or more matches are possible
1846 starting at the same character, it selects the longest.
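
The leftmost-longest rule described above can be seen in a short sketch. It assumes a POSIX shell with 'sed' on the PATH; the variable names are illustrative only.

```shell
# Greedy: '.*' extends to the last 'X', not the first.
greedy=$(echo 'aXbXc' | sed 's/a.*X/-/')
echo "$greedy"    # -c

# A negated bracket expression stops at the first 'X' instead.
first_x=$(echo 'aXbXc' | sed 's/a[^X]*X/-/')
echo "$first_x"   # -bXc
```

Replacing '.' with a negated bracket expression is the usual way to get a shorter match, since 'sed' has no non-greedy operators.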
1847
1848 Examples:
1849 'abcdef'
1850 Matches 'abcdef'.
1851
1852 'a*b'
1853 Matches zero or more 'a's followed by a single 'b'. For example,
1854 'b' or 'aaaaab'.
1855
1856 'a\?b'
1857 Matches 'b' or 'ab'.
1858
1859 'a\+b\+'
1860 Matches one or more 'a's followed by one or more 'b's: 'ab' is the
1861 shortest possible match, but other examples are 'aaaab' or 'abbbbb'
1862 or 'aaaaaabbbbbbb'.
1863
1864 '.*'
1865 '.\+'
1866 These two both match all the characters in a string; however, the
1867 first matches every string (including the empty string), while the
1868 second matches only strings containing at least one character.
1869
1870 '^main.*(.*)'
1871 This matches a string starting with 'main', followed by an opening
1872 and closing parenthesis. The 'n', '(' and ')' need not be
1873 adjacent.
1874
1875 '^#'
1876 This matches a string beginning with '#'.
1877
1878 '\\$'
1879 This matches a string ending with a single backslash. The regexp
1880 contains two backslashes for escaping.
1881
1882 '\$'
1883 In contrast, this matches a literal dollar sign, because the '$'
1884 is escaped.
1885
1886 '[a-zA-Z0-9]'
1887 In the C locale, this matches any ASCII letters or digits.
1888
1889 '[^ '<TAB>']\+'
1890 (Here '<TAB>' stands for a single tab character.) This matches a
1891 string of one or more characters, none of which is a space or a
1892 tab. Usually this means a word.
1893
1894 '^\(.*\)\n\1$'
1895 This matches a string consisting of two equal substrings separated
1896 by a newline.
1897
1898 '.\{9\}A$'
1899 This matches nine characters followed by an 'A' at the end of a
1900 line.
1901
1902 '^.\{15\}A'
1903 This matches the start of a string that contains 16 characters, the
1904 last of which is an 'A'.
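
Two of the patterns above, tried against sample input. This is a minimal sketch assuming a POSIX shell; the input strings and variable names are made up for illustration.

```shell
# '^#' selects only lines whose first character is '#'.
starts=$(printf '#one\n two #\n#three\n' | sed -n '/^#/p')
echo "$starts"    # prints '#one' and '#three'

# '.\{9\}A$' needs at least nine characters before a final 'A'.
ends=$(printf '0123456789A\nshortA\n' | sed -n '/.\{9\}A$/p')
echo "$ends"      # 0123456789A
```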
1905
1906 5.4 Overview of extended regular expression syntax
1907 ==================================================
1908
1909 The only difference between basic and extended regular expressions is in
1910 the behavior of a few characters: '?', '+', parentheses, braces ('{}'),
1911 and '|'. While basic regular expressions require these to be escaped if
1912 you want them to behave as special characters, when using extended
1913 regular expressions you must escape them if you want them _to match a
1914 literal character_. '|' is special here because '\|' is a GNU extension
1915 - standard basic regular expressions do not provide its functionality.
1916
1917 Examples:
1918 'abc?'
1919 becomes 'abc\?' when using extended regular expressions. It
1920 matches the literal string 'abc?'.
1921
1922 'c\+'
1923 becomes 'c+' when using extended regular expressions. It matches
1924 one or more 'c's.
1925
1926 'a\{3,\}'
1927 becomes 'a{3,}' when using extended regular expressions. It
1928 matches three or more 'a's.
1929
1930 '\(abc\)\{2,3\}'
1931 becomes '(abc){2,3}' when using extended regular expressions. It
1932 matches either 'abcabc' or 'abcabcabc'.
1933
1934 '\(abc*\)\1'
1935 becomes '(abc*)\1' when using extended regular expressions.
1936 Backreferences must still be escaped when using extended regular
1937 expressions.
1938
1939 'a\|b'
1940 becomes 'a|b' when using extended regular expressions. It matches
1941 'a' or 'b'.
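
The correspondence between the two syntaxes can be checked directly. The sketch below assumes a 'sed' that accepts '-E' (GNU 'sed' does); variable names are illustrative only.

```shell
# The same interval expression written in BRE and in ERE syntax.
bre=$(echo 'abcabc' | sed 's/\(abc\)\{2,3\}/X/')
ere=$(echo 'abcabc' | sed -E 's/(abc){2,3}/X/')
echo "$bre $ere"    # X X

# In ERE mode an unescaped '?' is an operator; escape it to match
# a literal question mark.
literal=$(echo 'abc?' | sed -E 's/abc\?/X/')
echo "$literal"     # X
```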
1942
1943 5.5 Character Classes and Bracket Expressions
1944 =============================================
1945
1946 A "bracket expression" is a list of characters enclosed by '[' and ']'.
1947 It matches any single character in that list; if the first character of
1948 the list is the caret '^', then it matches any character *not* in the
1949 list. For example, the following command replaces the words 'gray' or
1950 'grey' with 'blue':
1951
1952 sed 's/gr[ae]y/blue/'
1953
1954 Bracket expressions can be used in both *note basic: BRE syntax. and
1955 *note extended: ERE syntax. regular expressions (that is, with or
1956 without the '-E'/'-r' options).
1957
1958 Within a bracket expression, a "range expression" consists of two
1959 characters separated by a hyphen. It matches any single character that
1960 sorts between the two characters, inclusive. In the default C locale,
1961 the sorting sequence is the native character order; for example, '[a-d]'
1962 is equivalent to '[abcd]'.
1963
1964 Finally, certain named classes of characters are predefined within
1965 bracket expressions, as follows.
1966
1967 These named classes must be used _inside_ brackets themselves.
1968 Correct usage:
1969 $ echo 1 | sed 's/[[:digit:]]/X/'
1970 X
1971
1972 Incorrect usage is rejected by newer 'sed' versions. Older versions
1973 accepted it but treated it as a single bracket expression (which is
1974 equivalent to '[dgit:]', that is, only the characters D/G/I/T/:):
1975 # current GNU sed versions - incorrect usage rejected
1976 $ echo 1 | sed 's/[:digit:]/X/'
1977 sed: character class syntax is [[:space:]], not [:space:]
1978
1979 # older GNU sed versions
1980 $ echo 1 | sed 's/[:digit:]/X/'
1981 1
1982
1983 '[:alnum:]'
1984 Alphanumeric characters: '[:alpha:]' and '[:digit:]'; in the 'C'
1985 locale and ASCII character encoding, this is the same as
1986 '[0-9A-Za-z]'.
1987
1988 '[:alpha:]'
1989 Alphabetic characters: '[:lower:]' and '[:upper:]'; in the 'C'
1990 locale and ASCII character encoding, this is the same as
1991 '[A-Za-z]'.
1992
1993 '[:blank:]'
1994 Blank characters: space and tab.
1995
1996 '[:cntrl:]'
1997 Control characters. In ASCII, these characters have octal codes
1998 000 through 037, and 177 (DEL). In other character sets, these are
1999 the equivalent characters, if any.
2000
2001 '[:digit:]'
2002 Digits: '0 1 2 3 4 5 6 7 8 9'.
2003
2004 '[:graph:]'
2005 Graphical characters: '[:alnum:]' and '[:punct:]'.
2006
2007 '[:lower:]'
2008 Lower-case letters; in the 'C' locale and ASCII character encoding,
2009 this is 'a b c d e f g h i j k l m n o p q r s t u v w x y z'.
2010
2011 '[:print:]'
2012 Printable characters: '[:alnum:]', '[:punct:]', and space.
2013
2014 '[:punct:]'
2015 Punctuation characters; in the 'C' locale and ASCII character
2016 encoding, this is '! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \
2017 ] ^ _ ` { | } ~'.
2018
2019 '[:space:]'
2020 Space characters: in the 'C' locale, this is tab, newline, vertical
2021 tab, form feed, carriage return, and space.
2022
2023 '[:upper:]'
2024 Upper-case letters: in the 'C' locale and ASCII character encoding,
2025 this is 'A B C D E F G H I J K L M N O P Q R S T U V W X Y Z'.
2026
2027 '[:xdigit:]'
2028 Hexadecimal digits: '0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f'.
2029
2030 Note that the brackets in these class names are part of the symbolic
2031 names, and must be included in addition to the brackets delimiting the
2032 bracket expression.
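
A short sketch of named classes in use, assuming a POSIX shell; the sample text and variable names are illustrative only.

```shell
# One class inside a bracket expression.
digits=$(echo 'Room 42b' | sed 's/[[:digit:]]/N/g')
echo "$digits"    # Room NNb

# Several classes and ordinary characters can share one bracket expression.
mixed=$(echo 'Room 42b' | sed 's/[[:upper:][:digit:]_]/-/g')
echo "$mixed"     # -oom --b
```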
2033
2034 Most meta-characters lose their special meaning inside bracket
2035 expressions:
2036
2037 ']'
2038 ends the bracket expression if it's not the first list item. So,
2039 if you want to make the ']' character a list item, you must put it
2040 first.
2041
2042 '-'
2043 represents the range if it's not first or last in a list or the
2044 ending point of a range.
2045
2046 '^'
2047 represents the characters not in the list. If you want to make the
2048 '^' character a list item, place it anywhere but first.
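
The placement rules for ']', '-' and '^' can be tried with a small sketch (POSIX shell assumed; variable names are illustrative only):

```shell
# ']' placed first is literal; '^' anywhere but first is literal.
both=$(echo 'a]b^c' | sed 's/[]^]/_/g')
echo "$both"      # a_b_c

# '-' placed last is literal rather than a range operator.
dash=$(echo 'a-b' | sed 's/[a-]/_/g')
echo "$dash"      # __b
```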
2049
2050 TODO: incorporate this paragraph (copied verbatim from BRE section).
2051
2052 The characters '$', '*', '.', '[', and '\' are normally not special
2053 within LIST. For example, '[\*]' matches either '\' or '*', because the
2054 '\' is not special here. However, strings like '[.ch.]', '[=a=]', and
2055 '[:space:]' are special within LIST and represent collating symbols,
2056 equivalence classes, and character classes, respectively, and '[' is
2057 therefore special within LIST when it is followed by '.', '=', or ':'.
2058 Also, when not in 'POSIXLY_CORRECT' mode, special escapes like '\n' and
2059 '\t' are recognized within LIST. *Note Escapes::.
2060
2061 '[.'
2062 represents the open collating symbol.
2063
2064 '.]'
2065 represents the close collating symbol.
2066
2067 '[='
2068 represents the open equivalence class.
2069
2070 '=]'
2071 represents the close equivalence class.
2072
2073 '[:'
2074 represents the open character class symbol, and should be followed
2075 by a valid character class name.
2076
2077 ':]'
2078 represents the close character class symbol.
2079
2080 5.6 regular expression extensions
2081 =================================
2082
2083 The following sequences have special meaning inside regular expressions
2084 (used in *note addresses: Regexp Addresses. and the 's' command).
2085
2086 These can be used in both *note basic: BRE syntax. and *note
2087 extended: ERE syntax. regular expressions (that is, with or without the
2088 '-E'/'-r' options).
2089
2090 '\w'
2091 Matches any "word" character. A "word" character is any letter or
2092 digit or the underscore character.
2093
2094 $ echo "abc %-= def." | sed 's/\w/X/g'
2095 XXX %-= XXX.
2096
2097 '\W'
2098 Matches any "non-word" character.
2099
2100 $ echo "abc %-= def." | sed 's/\W/X/g'
2101 abcXXXXXdefX
2102
2103 '\b'
2104 Matches a word boundary; that is, it matches if the character to the
2105 left is a "word" character and the character to the right is a
2106 "non-word" character, or vice-versa.
2107
2108 $ echo "abc %-= def." | sed 's/\b/X/g'
2109 XabcX %-= XdefX.
2110
2111 '\B'
2112 Matches everywhere but on a word boundary; that is, it matches if
2113 the character to the left and the character to the right are either
2114 both "word" characters or both "non-word" characters.
2115
2116 $ echo "abc %-= def." | sed 's/\B/X/g'
2117 aXbXc X%X-X=X dXeXf.X
2118
2119 '\s'
2120 Matches whitespace characters (spaces and tabs). Newlines embedded
2121 in the pattern/hold spaces will also match:
2122
2123 $ echo "abc %-= def." | sed 's/\s/X/g'
2124 abcX%-=Xdef.
2125
2126 '\S'
2127 Matches non-whitespace characters.
2128
2129 $ echo "abc %-= def." | sed 's/\S/X/g'
2130 XXX XXX XXXX
2131
2132 '\<'
2133 Matches the beginning of a word.
2134
2135 $ echo "abc %-= def." | sed 's/\</X/g'
2136 Xabc %-= Xdef.
2137
2138 '\>'
2139 Matches the end of a word.
2140
2141 $ echo "abc %-= def." | sed 's/\>/X/g'
2142 abcX %-= defX.
2143
2144 '\`'
2145 Matches only at the start of pattern space. This is different from
2146 '^' in multi-line mode.
2147
2148 Compare the following two examples:
2149
2150 $ printf "a\nb\nc\n" | sed 'N;N;s/^/X/gm'
2151 Xa
2152 Xb
2153 Xc
2154
2155 $ printf "a\nb\nc\n" | sed 'N;N;s/\`/X/gm'
2156 Xa
2157 b
2158 c
2159
2160 '\''
2161 Matches only at the end of pattern space. This is different from
2162 '$' in multi-line mode.
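
A sketch contrasting '$' with the end-of-pattern-space anchor in multi-line mode, assuming GNU 'sed' (both the anchor and the 'm' flag are GNU extensions). The second script is written in double quotes only so that it can contain a single quote; variable names are illustrative.

```shell
# With the 'm' flag, '$' matches before every embedded newline...
every=$(printf 'a\nb\nc\n' | sed 'N;N;s/$/X/gm')
echo "$every"     # aX, bX, cX on three lines

# ...while \' matches only at the very end of the pattern space.
last=$(printf 'a\nb\nc\n' | sed "N;N;s/\\'/X/gm")
echo "$last"      # a, b, cX on three lines
```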
2163
2164 5.7 Back-references and Subexpressions
2165 ======================================
2166
2167 "Back-references" are regular expression commands which refer to a
2168 previous part of the matched regular expression. Back-references are
2169 specified with backslash and a single digit (e.g. '\1'). The part of
2170 the regular expression they refer to is called a "subexpression", and is
2171 designated with parentheses.
2172
2173 Back-references and subexpressions are used in two cases: in the
2174 regular expression search pattern, and in the REPLACEMENT part of the
2175 's' command (*note Regular Expression Addresses: Regexp Addresses. and
2176 *note The "s" Command::).
2177
2178 In a regular expression pattern, back-references are used to match
2179 the same content as a previously matched subexpression. In the
2180 following example, the subexpression is '.' - any single character
2181 (being surrounded by parentheses makes it a subexpression). The
2182 back-reference '\1' asks to match the same content (same character) as
2183 the sub-expression.
2184
2185 The command below matches words starting with any character, followed
2186 by the letter 'o', followed by the same character as the first.
2187
2188 $ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words
2189 bob
2190 mom
2191 non
2192 pop
2193 sos
2194 tot
2195 wow
2196
2197 Multiple subexpressions are automatically numbered from
2198 left-to-right. This command searches for 6-letter palindromes (the
2199 first three letters are 3 subexpressions, followed by 3 back-references
2200 in reverse order):
2201
2202 $ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words
2203 redder
2204
2205 In the 's' command, back-references can be used in the REPLACEMENT
2206 part to refer back to subexpressions in the REGEXP part.
2207
2208 The following example uses two subexpressions in the regular
2209 expression to match two space-separated words. The back-references in
2210 the REPLACEMENT part print the words in a different order:
2211
2212 $ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
2213 The name is Bond, James Bond.
2214
2215 When used with alternation, if the group does not participate in the
2216 match then the back-reference makes the whole match fail. For example,
2217 'a(.)|b\1' will not match 'ba'. When multiple regular expressions are
2218 given with '-e' or from a file ('-f FILE'), back-references are local to
2219 each expression.
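
Both uses of back-references fit in one short sketch. It assumes GNU 'sed' with '-E'; the sample strings and variable names are illustrative only.

```shell
# In the regexp: '\1' must repeat whatever group 1 matched.
dedup=$(echo 'hello hello world' | sed -E 's/([a-z]+) \1/\1/')
echo "$dedup"     # hello world

# In the replacement: swap two colon-separated fields.
swap=$(echo 'key:value' | sed -E 's/([^:]*):(.*)/\2:\1/')
echo "$swap"      # value:key
```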
2220
2221 5.8 Escape Sequences - specifying special characters
2222 ====================================================
2223
2224 Until this chapter, we have only encountered escapes of the form '\^',
2225 which tell 'sed' not to interpret the circumflex as a special character,
2226 but rather to take it literally. For example, '\*' matches a single
2227 asterisk rather than zero or more backslashes.
2228
2229 This chapter introduces another kind of escape(1)--that is, escapes
2230 that are applied to a character or sequence of characters that
2231 ordinarily are taken literally, and that 'sed' replaces with a special
2232 character. This provides a way of encoding non-printable characters in
2233 patterns in a visible manner. There is no restriction on the appearance
2234 of non-printing characters in a 'sed' script but when a script is being
2235 prepared in the shell or by text editing, it is usually easier to use
2236 one of the following escape sequences than the binary character it
2237 represents.
2238
2239 The list of these escapes is:
2240
2241 '\a'
2242 Produces or matches a BEL character, that is an "alert" (ASCII 7).
2243
2244 '\f'
2245 Produces or matches a form feed (ASCII 12).
2246
2247 '\n'
2248 Produces or matches a newline (ASCII 10).
2249
2250 '\r'
2251 Produces or matches a carriage return (ASCII 13).
2252
2253 '\t'
2254 Produces or matches a horizontal tab (ASCII 9).
2255
2256 '\v'
2257 Produces or matches a so called "vertical tab" (ASCII 11).
2258
2259 '\cX'
2260 Produces or matches 'CONTROL-X', where X is any character. The
2261 precise effect of '\cX' is as follows: if X is a lower case letter,
2262 it is converted to upper case. Then bit 6 of the character (hex
2263 40) is inverted. Thus '\cz' becomes hex 1A, but '\c{' becomes hex
2264 3B, while '\c;' becomes hex 7B.
2265
2266 '\dXXX'
2267 Produces or matches a character whose decimal ASCII value is XXX.
2268
2269 '\oXXX'
2270 Produces or matches a character whose octal ASCII value is XXX.
2271
2272 '\xXX'
2273 Produces or matches a character whose hexadecimal ASCII value is
2274 XX.
2275
2276 '\b' (backspace) was omitted because of the conflict with the
2277 existing "word boundary" meaning.
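
A few of these escapes used on the regexp side, as a sketch assuming GNU 'sed' (all of them except '\n' are GNU extensions). Variable names are illustrative only.

```shell
# '\t' matches a tab character.
tab=$(printf 'a\tb\n' | sed 's/\t/,/')
echo "$tab"             # a,b

# 'A' is decimal 65, octal 101, hexadecimal 41.
dec=$(echo 'ABC' | sed 's/\d065/a/')
oct=$(echo 'ABC' | sed 's/\o101/a/')
hex=$(echo 'ABC' | sed 's/\x41/a/')
echo "$dec $oct $hex"   # aBC aBC aBC
```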
2278
2279 5.8.1 Escaping Precedence
2280 -------------------------
2281
2282 GNU 'sed' processes escape sequences _before_ passing the text onto the
2283 regular-expression matching of the 's///' command and Address matching.
2284 Thus the following two commands are equivalent ('0x5e' is the hexadecimal
2285 ASCII value of the character '^'):
2286
2287 $ echo 'a^c' | sed 's/^/b/'
2288 ba^c
2289
2290 $ echo 'a^c' | sed 's/\x5e/b/'
2291 ba^c
2292
2293 As are the following ('0x5b','0x5d' are the hexadecimal ASCII values
2294 of '[',']', respectively):
2295
2296 $ echo abc | sed 's/[a]/x/'
2297 xbc
2298 $ echo abc | sed 's/\x5ba\x5d/x/'
2299 xbc
2300
2301 However it is recommended to avoid such special characters due to
2302 unexpected edge-cases. For example, the following are not equivalent:
2303
2304 $ echo 'a^c' | sed 's/\^/b/'
2305 abc
2306
2307 $ echo 'a^c' | sed 's/\\\x5e/b/'
2308 a^c
2309
2310 ---------- Footnotes ----------
2311
2312 (1) All the escapes introduced here are GNU extensions, with the
2313 exception of '\n'. In basic regular expression mode, setting
2314 'POSIXLY_CORRECT' disables them inside bracket expressions.
2315
2316 5.9 Multibyte characters and Locale Considerations
2317 ==================================================
2318
2319 GNU 'sed' processes valid multibyte characters in multibyte locales
2320 (e.g. 'UTF-8'). (1)
2321
2322 The following example uses the Greek letter Capital Sigma (U+03A3,
2323 Unicode code point '0x03A3'). In a 'UTF-8' locale, 'sed' correctly
2324 processes the Sigma as one character despite it being 2 octets (bytes):
2325
2326 $ locale | grep LANG
2327 LANG=en_US.UTF-8
2328
2329 $ printf 'a\u03A3b'
2330 aΣb
2331
2332 $ printf 'a\u03A3b' | sed 's/./X/g'
2333 XXX
2334
2335 $ printf 'a\u03A3b' | od -tx1 -An
2336 61 ce a3 62
2337
2338 To force 'sed' to process octets separately, use the 'C' locale (also
2339 known as the 'POSIX' locale):
2340
2341 $ printf 'a\u03A3b' | LC_ALL=C sed 's/./X/g'
2342 XXXX
2343
2344 5.9.1 Invalid multibyte characters
2345 ----------------------------------
2346
2347 'sed''s regular expressions _do not_ match invalid multibyte sequences
2348 in a multibyte locale.
2349
2350 In the following examples, the octet '0xCE' is an incomplete
2351 multibyte character (shown here as U+FFFD). The regular expression '.'
2352 does not match it:
2353
2354 $ printf 'a\xCEb\n'
2355 aU+FFFDb
2356
2357 $ printf 'a\xCEb\n' | sed 's/./X/g'
2358 XU+FFFDX
2359
2360 $ printf 'a\xCEc\n' | sed 's/./X/g' | od -tx1c -An
2361 58 ce 58 0a
2362 X X \n
2363
2364 Similarly, the 'catch-all' regular expression '.*' does not match the
2365 entire line:
2366
2367 $ printf 'a\xCEc\n' | sed 's/.*//' | od -tx1c -An
2368 ce 63 0a
2369 c \n
2370
2371 GNU 'sed' offers the special 'z' command to clear the current pattern
2372 space regardless of invalid multibyte characters (i.e. it works like
2373 's/.*//' but also removes invalid multibyte characters):
2374
2375 $ printf 'a\xCEc\n' | sed 'z' | od -tx1c -An
2376 0a
2377 \n
2378
2379 Alternatively, force the 'C' locale to process each octet separately
2380 (every octet is a valid character in the 'C' locale):
2381
2382 $ printf 'a\xCEc\n' | LC_ALL=C sed 's/.*//' | od -tx1c -An
2383 0a
2384 \n
2385
2386 'sed''s inability to process invalid multibyte characters can be used
2387 to detect such invalid sequences in a file. In the following examples,
2388 the '\xCE\xCE' is an invalid multibyte sequence, while '\xCE\xA3' is a
2389 valid multibyte sequence (of the Greek Sigma character).
2390
2391 The following 'sed' program removes all valid characters using 's/.//g'.
2392 Any content left in the pattern space (the invalid characters) is added
2393 to the hold space using the 'H' command. On the last line ('$'), the
2394 hold space is retrieved ('x'), newlines are removed ('s/\n//g'), and any
2395 remaining octets are printed unambiguously ('l'). Thus, any invalid
2396 multibyte sequences are printed as octal values:
2397
2398 $ printf 'ab\nc\n\xCE\xCEde\n\xCE\xA3f\n' > invalid.txt
2399
2400 $ cat invalid.txt
2401 ab
2402 c
2403 U+FFFDU+FFFDde
2404 Σf
2405
2406 $ sed -n 's/.//g ; H ; ${x;s/\n//g;l}' invalid.txt
2407 \316\316$
2408
2409 With a few more commands, 'sed' can print the exact line number
2410 corresponding to each invalid character (line 3). These characters can
2411 then be removed by forcing the 'C' locale and using octal escape
2412 sequences:
2413
2414 $ sed -n 's/.//g;=;l' invalid.txt | paste - - | awk '$2!="$"'
2415 3 \316\316$
2416
2417 $ LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt
2418
2419 5.9.2 Upper/Lower case conversion
2420 ---------------------------------
2421
2422 GNU 'sed''s substitute command ('s') supports upper/lower case
2423 conversions using '\U','\L' codes. These conversions support multibyte
2424 characters:
2425
2426 $ printf 'ABC\u03a3\n'
2427 ABCΣ
2428
2429 $ printf 'ABC\u03a3\n' | sed 's/.*/\L&/'
2430 abcσ
2431
2432 *Note The "s" Command::.
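
The same conversions work on plain ASCII input, which makes them easy to try in any locale. A minimal sketch, assuming GNU 'sed'; '\u' (convert only the next character) is another GNU case-conversion code, and the variable names are illustrative.

```shell
upper=$(echo 'hello world' | sed 's/.*/\U&/')
echo "$upper"     # HELLO WORLD

lower=$(echo 'HELLO World' | sed 's/.*/\L&/')
echo "$lower"     # hello world

# '\u' converts only the next character; '\<' anchors at each word start.
title=$(echo 'hello world' | sed 's/\<./\u&/g')
echo "$title"     # Hello World
```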
2433
2434 5.9.3 Multibyte regexp character classes
2435 ----------------------------------------
2436
2437 In other locales, the sorting sequence is not specified, and '[a-d]'
2438 might be equivalent to '[abcd]' or to '[aBbCcDd]', or it might fail to
2439 match any character, or the set of characters that it matches might even
2440 be erratic. To obtain the traditional interpretation of bracket
2441 expressions, you can use the 'C' locale by setting the 'LC_ALL'
2442 environment variable to the value 'C'.
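
As a sketch of the 'C' locale behaviour described above (POSIX shell assumed, variable name illustrative), '[a-z]' covers only the ASCII lower-case letters:

```shell
# In the C locale the range follows the native character order,
# so the upper-case 'B' is left alone.
ranged=$(echo 'aBc' | LC_ALL=C sed 's/[a-z]/-/g')
echo "$ranged"    # -B-
```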
2443
2444 # TODO: is there any real-world system/locale where 'A'
2445 # is replaced by '-' ?
2446 $ echo A | sed 's/[a-z]/-/'
2447 A
2448
2449 The interpretation of named character classes depends on the 'LC_CTYPE'
2450 locale; for example, '[[:alnum:]]' means the character class of numbers
2451 and letters in the current locale.
2452
2453 TODO: show example of collation
2454
2455 # TODO: this works on glibc systems, not on musl-libc/freebsd/macosx.
2456 $ printf 'cliché\n' | LC_ALL=fr_FR.utf8 sed 's/[[=e=]]/X/g'
2457 clichX
2458
2459 ---------- Footnotes ----------
2460
2461 (1) Some regexp edge-cases depend on the operating system and libc
2462 implementation. The examples shown are known to work as-expected on
2463 GNU/Linux systems using glibc.
2464
2465 6 Advanced 'sed': cycles and buffers
2466 ************************************
2467
2468 6.1 How 'sed' Works
2469 ===================
2470
2471 'sed' maintains two data buffers: the active _pattern_ space, and the
2472 auxiliary _hold_ space. Both are initially empty.
2473
2474 'sed' operates by performing the following cycle on each line of
2475 input: first, 'sed' reads one line from the input stream, removes any
2476 trailing newline, and places it in the pattern space. Then commands are
2477 executed; each command can have an address associated to it: addresses
2478 are a kind of condition code, and a command is only executed if the
2479 condition is verified before the command is to be executed.
2480
2481 When the end of the script is reached, unless the '-n' option is in
2482 use, the contents of pattern space are printed out to the output stream,
2483 adding back the trailing newline if it was removed.(1) Then the next
2484 cycle starts for the next input line.
2485
2486 Unless special commands (like 'D') are used, the pattern space is
2487 deleted between two cycles. The hold space, on the other hand, keeps
2488 its data between cycles (see commands 'h', 'H', 'x', 'g', 'G' to move
2489 data between both buffers).
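
Because the hold space survives from one cycle to the next, it can accumulate input. The following sketch (POSIX shell assumed, variable name illustrative) reverses the order of lines: every line after the first has the hold space appended to it ('1!G'), the result is copied back to the hold space ('h'), and only the last cycle prints ('$p').

```shell
reversed=$(printf '%s\n' 1 2 3 | sed -n '1!G;h;$p')
echo "$reversed"  # 3, 2, 1 on three lines
```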
2490
2491 ---------- Footnotes ----------
2492
2493 (1) Actually, if 'sed' prints a line without the terminating newline,
2494 it will nevertheless print the missing newline as soon as more text is
2495 sent to the same output stream, which gives the "least expected
2496 surprise" even though it does not make commands like 'sed -n p' exactly
2497 identical to 'cat'.
2498
2499 6.2 Hold and Pattern Buffers
2500 ============================
2501
2502 TODO
2503
2504 6.3 Multiline techniques - using D,G,H,N,P to process multiple lines
2505 ====================================================================
2506
2507 Multiple lines can be processed as one buffer using the commands
2508 'D', 'G', 'H', 'N', and 'P'. They are similar to their lowercase counterparts
2509 ('d','g', 'h','n','p'), except that these commands append or subtract
2510 data while respecting embedded newlines - allowing adding and removing
2511 lines from the pattern and hold spaces.
2512
2513 They operate as follows:
2514 'D'
2515 _deletes_ line from the pattern space until the first newline, and
2516 restarts the cycle.
2517
2518 'G'
2519 _appends_ line from the hold space to the pattern space, with a
2520 newline before it.
2521
2522 'H'
2523 _appends_ line from the pattern space to the hold space, with a
2524 newline before it.
2525
2526 'N'
2527 _appends_ line from the input file to the pattern space.
2528
2529 'P'
2530 _prints_ line from the pattern space until the first newline.
2531
2532 The following example illustrates the operation of 'N' and 'D'
2533 commands:
2534
2535 $ seq 6 | sed -n 'N;l;D'
2536 1\n2$
2537 2\n3$
2538 3\n4$
2539 4\n5$
2540 5\n6$
2541
2542 1. 'sed' starts by reading the first line into the pattern space (i.e.
2543 '1').
2544 2. At the beginning of every cycle, the 'N' command appends a newline
2545 and the next line to the pattern space (i.e. '1', '\n', '2' in the
2546 first cycle).
2547 3. The 'l' command prints the content of the pattern space
2548 unambiguously.
2549 4. The 'D' command then removes the content of pattern space up to the
2550 first newline (leaving '2' at the end of the first cycle).
2551 5. At the next cycle the 'N' command appends a newline and the next
2552 input line to the pattern space (e.g. '2', '\n', '3').
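
Two further sketches add 'P' to the picture. They assume a POSIX shell and are illustrations only (variable names are made up). The first prints every odd line: 'N' pairs each line with its successor and 'P' prints just the first of the pair. The second behaves like 'uniq': 'P' prints the first line of the pair only when the two lines differ, and 'D' drops it so that the second line is compared with the next one.

```shell
odd=$(printf '%s\n' 1 2 3 4 5 6 | sed -n 'N;P')
echo "$odd"         # 1, 3, 5 on three lines

uniq_like=$(printf '%s\n' a a b b b c | sed '$!N;/^\(.*\)\n\1$/!P;D')
echo "$uniq_like"   # a, b, c on three lines
```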
2553
2554 A common technique to process blocks of text such as paragraphs
2555 (instead of line-by-line) is using the following construct:
2556
2557 sed '/./{H;$!d} ; x ; s/REGEXP/REPLACEMENT/'
2558
2559 1. The first expression, '/./{H;$!d}' operates on all non-empty lines,
2560 and adds the current line (in the pattern space) to the hold space.
2561 On all lines except the last, the pattern space is deleted and the
2562 cycle is restarted.
2563
2564 2. The other expressions 'x' and 's' are executed only on empty lines
2565 (i.e. paragraph separators). The 'x' command fetches the
2566 accumulated lines from the hold space back to the pattern space.
2567 The 's///' command then operates on all the text in the paragraph
2568 (including the embedded newlines).
2569
2570 The following example demonstrates this technique:
2571 $ cat input.txt
2572 a a a aa aaa
2573 aaaa aaaa aa
2574 aaaa aaa aaa
2575
2576 bbbb bbb bbb
2577 bb bb bbb bb
2578 bbbbbbbb bbb
2579
2580 ccc ccc cccc
2581 cccc ccccc c
2582 cc cc cc cc
2583
2584 $ sed '/./{H;$!d} ; x ; s/^/\nSTART-->/ ; s/$/\n<--END/' input.txt
2585
2586 START-->
2587 a a a aa aaa
2588 aaaa aaaa aa
2589 aaaa aaa aaa
2590 <--END
2591
2592 START-->
2593 bbbb bbb bbb
2594 bb bb bbb bb
2595 bbbbbbbb bbb
2596 <--END
2597
2598 START-->
2599 ccc ccc cccc
2600 cccc ccccc c
2601 cc cc cc cc
2602 <--END
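
The same construct can select whole paragraphs instead of editing them. In this sketch (GNU 'sed' assumed, sample input and variable name illustrative), a paragraph is printed only if it contains 'b1'. The output begins with an empty line because 'H' puts a newline before the first line it stores.

```shell
para=$(printf 'a1\na2\n\nb1\nb2\n' | sed -n '/./{H;$!d};x;/b1/p')
echo "$para"      # an empty line, then b1 and b2
```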
2603
2604 For more annotated examples, *note Text search across multiple
2605 lines:: and *note Line length adjustment::.
2606
2607 6.4 Branching and Flow Control
2608 ==============================
2609
2610 The branching commands 'b', 't', and 'T' enable changing the flow of
2611 'sed' programs.
2612
2613 By default, 'sed' reads an input line into the pattern buffer, then
2614 continues to process all commands in order. Commands without
2615 addresses affect all lines. Commands with addresses affect only
2616 matching lines. *Note Execution Cycle:: and *note Addresses overview::.
2617
2618 'sed' does not support a typical 'if/then' construct. Instead, some
2619 commands can be used as conditionals or to change the default flow
2620 control:
2621
2622 'd'
2623 delete (clear) the current pattern space, and restart the program
2624 cycle without processing the rest of the commands and without
2625 printing the pattern space.
2626
2627 'D'
2628 delete the contents of the pattern space _up to the first newline_,
2629 and restart the program cycle without processing the rest of the
2630 commands and without printing the pattern space.
2631
2632 '[addr]X'
2633 '[addr]{ X ; X ; X }'
2634 '/regexp/X'
2635 '/regexp/{ X ; X ; X }'
2636 Addresses and regular expressions can be used as an 'if/then'
2637 conditional: If [ADDR] matches the current pattern space, execute
2638 the command(s). For example: The command '/^#/d' means: _if_ the
2639 current pattern matches the regular expression '^#' (a line
2640 starting with a hash), _then_ execute the 'd' command: delete the
2641 line without printing it, and restart the program cycle
2642 immediately.
2643
2644 'b'
2645 branch unconditionally (that is: always jump to a label, skipping
2646 or repeating other commands, without restarting a new cycle).
2647 Combined with an address, the branch can be conditionally executed
2648 on matched lines.
2649
2650 't'
2651 branch conditionally (that is: jump to a label) _only if_ a 's///'
2652 command has succeeded since the last input line was read or another
2653 conditional branch was taken.
2654
2655 'T'
2656 similar but opposite to the 't' command: branch only if there have
2657 been _no_ successful substitutions since the last input line was
2658 read.
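
The conditional branches can be sketched as follows. The examples are deliberately contrived, use separate '-e' fragments so that each label stands alone, and assume GNU 'sed' for the 'T' command; variable names are illustrative.

```shell
# 't' loops back to ':a' for as long as the substitution keeps succeeding.
spaced=$(echo 'a,b,c' | sed -e ':a' -e 's/,/ /' -e 'ta')
echo "$spaced"    # a b c

# 'T' with no label skips the rest of the script when nothing was
# substituted.
marked=$(printf '%s\n' apple banana | sed -e 's/an/AN/' -e 'T' -e 's/$/ (changed)/')
echo "$marked"    # apple, then bANana (changed)
```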
2659
2660 The following two 'sed' programs are equivalent. The first
2661 (contrived) example uses the 'b' command to skip the 's///' command on
2662 lines containing '1'. The second example uses an address with negation
2663 ('!') to perform substitution only on desired lines. The 'y///' command
2664 is still executed on all lines:
2665
2666 $ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
2667 a4
2668 z5
2669 z6
2670
2671 $ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
2672 a4
2673 z5
2674 z6
2675
2676 6.4.1 Branching and Cycles
2677 --------------------------
2678
2679 The 'b','t' and 'T' commands can be followed by a label (typically a
2680 single letter). Labels are defined with a colon followed by one or more
2681 letters (e.g. ':x'). If the label is omitted the branch commands
2682 restart the cycle. Note the difference between branching to a label and
2683 restarting the cycle: when a cycle is restarted, 'sed' first prints the
2684 current content of the pattern space, then reads the next input line
2685 into the pattern space; jumping to a label (even if it is at the
2686 beginning of the program) does not print the pattern space and does not
2687 read the next input line.
2688
2689 The following program is a no-op. The 'b' command (the only command
2690 in the program) does not have a label, and thus simply restarts the
2691 cycle. On each cycle, the pattern space is printed and the next input
2692 line is read:
2693
2694 $ seq 3 | sed b
2695 1
2696 2
2697 3
2698
2699 The following example is an infinite loop - it doesn't terminate and
2700 doesn't print anything. The 'b' command jumps to the 'x' label, and a
2701 new cycle is never started:
2702
2703 $ seq 3 | sed ':x ; bx'
2704
2705 # The above command requires GNU sed (which supports additional
2706 # commands following a label, without a newline). A portable equivalent:
2707 # sed -e ':x' -e bx
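
By contrast, a loop built on a conditional branch ends as soon as the
substitution stops matching. The following is a minimal sketch (the
helper name 'zero_pad5' is hypothetical), written in the portable
one-command-per-'-e' form. It left-pads each line with zeros up to five
characters; the looping happens inside a single cycle, so each line is
printed only once, when the cycle ends.

```shell
# Prepend a zero while the line is 1 to 4 characters long, then fall
# through to the end of the script, where the line is printed.
zero_pad5() {
    sed -e ':x' -e 's/^.\{1,4\}$/0&/' -e 'tx'
}

printf '%s\n' 7 123 | zero_pad5
```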
2708
2709 Branching is often complemented with the 'n' or 'N' commands: both
2710 commands read the next input line into the pattern space without waiting
2711 for the cycle to restart. Before reading the next input line, 'n'
2712 prints the current pattern space then empties it, while 'N' appends a
2713 newline and the next input line to the pattern space.
2714
2715 Consider the following two examples:
2716
2717 $ seq 3 | sed ':x ; n ; bx'
2718 1
2719 2
2720 3
2721
2722 $ seq 3 | sed ':x ; N ; bx'
2723 1
2724 2
2725 3
2726
2727 * Neither example loops forever, despite never starting a new cycle.
2728
2729 * In the first example, the 'n' command first prints the content of
2730 the pattern space, empties the pattern space then reads the next
2731 input line.
2732
2733 * In the second example, the 'N' command appends the next input line
2734 to the pattern space (with a newline). Lines are accumulated in
2735 the pattern space until there are no more input lines to read, then
2736 the 'N' command terminates the 'sed' program. When the program
2737 terminates, the end-of-cycle actions are performed, and the entire
2738 pattern space is printed.
2739
2740 * The second example requires GNU 'sed', because it uses the
2741 non-POSIX-standard behavior of 'N'. See the "'N' command on the
2742 last line" paragraph in *note Reporting Bugs::.
2743
2744 * To further examine the difference between the two examples, try the
2745 following commands:
2746 printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
2747 printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
2748 printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
2749 printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
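
As a rough guide to what the first two of these commands show, here is a
sketch wrapping them in shell functions (the function names are
illustrative only). It assumes GNU 'sed', which prints the pattern space
when 'n' or 'N' runs out of input.

```shell
# 'n' prints each line before reading the next one, so the lines and
# the line numbers emitted by '=' are interleaved.
trace_n() {
    sed ':x ; n ; = ; bx'
}

# 'N' accumulates lines in the pattern space, so all line numbers come
# first and the accumulated lines are printed once, when input runs out.
trace_big_n() {
    sed ':x ; N ; = ; bx'
}

printf '%s\n' aa bb cc dd | trace_n
printf '%s\n' aa bb cc dd | trace_big_n
```

With 'n' the output alternates between input lines and line numbers
(aa, 2, bb, 3, cc, 4, dd); with 'N' the numbers 2, 3 and 4 appear first,
followed by the four input lines.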
2750
2751 6.4.2 Branching example: joining lines
2752 --------------------------------------
2753
2754 As a real-world example of using branching, consider the case of
2755 quoted-printable (https://en.wikipedia.org/wiki/Quoted-printable) files,
2756 typically used to encode email messages. In these files long lines are
2757 split and marked with a "soft line break" consisting of a single '='
2758 character at the end of the line:
2759
2760 $ cat jaques.txt
2761 All the wor=
2762 ld's a stag=
2763 e,
2764 And all the=
2765 men and wo=
2766 men merely =
2767 players:
2768 They have t=
2769 heir exits =
2770 and their e=
2771 ntrances;
2772 And one man=
2773 in his tim=
2774 e plays man=
2775 y parts.
2776
2777 The following program uses an address match '/=$/' as a conditional:
2778 If the current pattern space ends with a '=', it reads the next input
2779 line using 'N', replaces all '=' characters which are followed by a
2780 newline, and unconditionally branches ('b') to the beginning of the
2781 program without restarting a new cycle. If the pattern space does not
2782 end with '=', the default action is performed: the pattern space is
2783 printed and a new cycle is started:
2784
2785 $ sed ':x ; /=$/ { N ; s/=\n//g ; bx }' jaques.txt
2786 All the world's a stage,
2787 And all the men and women merely players:
2788 They have their exits and their entrances;
2789 And one man in his time plays many parts.
2790
2791 Here's an alternative program with a slightly different approach: On
2792 all lines except the last, 'N' appends the next line to the pattern space. A
2793 substitution command then removes soft line breaks ('=' at the end of a
2794 line, i.e. followed by a newline) by replacing them with an empty
2795 string. _If_ the substitution was successful (meaning the pattern space
2796 contained a line which should be joined), the conditional branch command
2797 't' jumps to the beginning of the program without completing or
2798 restarting the cycle. If the substitution failed (meaning there were no
2799 soft line breaks), the 't' command will _not_ branch. Then, 'P' will
2800 print the pattern space content until the first newline, and 'D' will
2801 delete the pattern space content until the first newline. (To learn
2802 more about 'N', 'P' and 'D' commands *note Multiline techniques::).
2803
2804 $ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
2805 All the world's a stage,
2806 And all the men and women merely players:
2807 They have their exits and their entrances;
2808 And one man in his time plays many parts.
2809
2810 For more line-joining examples *note Joining lines::.
2811
2812 7 Some Sample Scripts
2813 *********************
2814
2815 Here are some 'sed' scripts to guide you in the art of mastering 'sed'.
2816
2817 7.1 Joining lines
2818 =================
2819
2820 This section uses 'N', 'D' and 'P' commands to process multiple lines,
2821 and the 'b' and 't' commands for branching. *Note Multiline
2822 techniques:: and *note Branching and flow control::.
2823
2824 Join specific lines (e.g. if lines 2 and 3 need to be joined):
2825
2826 $ cat lines.txt
2827 hello
2828 hel
2829 lo
2830 hello
2831
2832 $ sed '2{N;s/\n//;}' lines.txt
2833 hello
2834 hello
2835 hello
2836
2837 Join backslash-continued lines:
2838
2839 $ cat 1.txt
2840 this \
2841 is \
2842 a \
2843 long \
2844 line
2845 and another \
2846 line
2847
2848 $ sed -e ':x /\\$/ { N; s/\\\n//g ; bx }' 1.txt
2849 this is a long line
2850 and another line
2851
2852
2853 #TODO: The above requires gnu sed.
2854 # non-gnu seds need newlines after ':' and 'b'
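
One possible portable spelling of the command above, as a sketch: every
command that must be terminated by a newline (the label, the branch and
the braces) gets its own '-e' option. The function name 'join_continued'
is used only for illustration.

```shell
# Join lines ending in a backslash with the line that follows them.
join_continued() {
    sed -e ':x' -e '/\\$/{' -e 'N' -e 's/\\\n//' -e 'bx' -e '}' "$@"
}

# Inside single quotes the shell keeps each backslash literally.
printf '%s\n' 'this \' 'is \' 'a \' 'long \' 'line' 'and another \' 'line' |
    join_continued
```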
2855
2856 Join lines that start with whitespace (e.g. SMTP headers):
2857
2858 $ cat 2.txt
2859 Subject: Hello
2860 World
2861 Content-Type: multipart/alternative;
2862 boundary=94eb2c190cc6370f06054535da6a
2863 Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
2864 Authentication-Results: mx.gnu.org;
2865 dkim=pass header.i=@gnu.org;
2866 spf=pass
2867 Message-ID: <abcdef@gnu.org>
2868 From: John Doe <jdoe@gnu.org>
2869 To: Jane Smith <jsmith@gnu.org>
2870
2871 $ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt
2872 Subject: Hello World
2873 Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a
2874 Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
2875 Authentication-Results: mx.gnu.org; dkim=pass header.i=@gnu.org; spf=pass
2876 Message-ID: <abcdef@gnu.org>
2877 From: John Doe <jdoe@gnu.org>
2878 To: Jane Smith <jsmith@gnu.org>
2879
2880 # A portable (non-gnu) variation:
2881 # sed -e :a -e '$!N;s/\n */ /;ta' -e 'P;D'
2882
2883 7.2 Centering Lines
2884 ===================
2885
2886 This script centers all lines of a file within an 80-column width. To
2887 change that width, the number in '\{...\}' must be replaced, and the
2888 number of added spaces also must be changed.
2889
2890 Note how the buffer commands are used to separate parts in the
2891 regular expressions to be matched--this is a common technique.
2892
2893 #!/usr/bin/sed -f
2894
2895 # Put 80 spaces in the buffer
2896 1 {
2897 x
2898 s/^$/          /
2899 s/^.*$/&&&&&&&&/
2900 x
2901 }
2902
2903 # delete leading and trailing spaces
2904 y/<TAB>/ /
2905 s/^ *//
2906 s/ *$//
2907
2908 # add a newline and 80 spaces to end of line
2909 G
2910
2911 # keep first 81 chars (80 + a newline)
2912 s/^\(.\{81\}\).*$/\1/
2913
2914 # \2 matches half of the spaces, which are moved to the beginning
2915 s/^\(.*\)\n\(.*\)\2/\2\1/
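
Since changing the width means editing two places, the script can also
be generated from a shell function. The following is a sketch under
stated assumptions, not part of the original script: the name
'center_lines' and its WIDTH argument are hypothetical, and the tab
conversion step is left out for brevity.

```shell
# center_lines WIDTH: center each input line within WIDTH columns.
center_lines() {
    width=$1
    # Build the padding (WIDTH spaces) and the truncation length here,
    # instead of hard-coding 80 spaces and 81 in the sed script.
    pad=$(printf "%${width}s" '')
    keep=$((width + 1))
    sed -e "1{x;s/^\$/$pad/;x;}" \
        -e 's/^ *//' -e 's/ *$//' \
        -e 'G' \
        -e "s/^\\(.\\{$keep\\}\\).*\$/\\1/" \
        -e 's/^\(.*\)\n\(.*\)\2/\2\1/'
}

printf '%s\n' 'hi' '  abcd ' | center_lines 10
```

For a width of 10, 'hi' is preceded by four spaces and 'abcd' by three.
When the amount of padding is odd, one leftover space stays at the end
of the line.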
2916
2917 7.3 Increment a Number
2918 ======================
2919
2920 This script is one of a few that demonstrate how to do arithmetic in
2921 'sed'. This is indeed possible,(1) but must be done manually.
2922
2923 To increment one number you just add 1 to last digit, replacing it by
2924 the following digit. There is one exception: when the digit is a nine
2925 the previous digits must be also incremented until you don't have a
2926 nine.
2927
2928 This solution by Bruno Haible is very clever and smart because it
2929 uses a single buffer; if you don't have this limitation, the algorithm
2930 used in *note Numbering lines: cat -n, is faster. It works by replacing
2931 trailing nines with an underscore, then using multiple 's' commands to
2932 increment the last digit, and then again substituting underscores with
2933 zeros.
2934
2935 #!/usr/bin/sed -f
2936
2937 /[^0-9]/ d
2938
2939 # replace all trailing 9s by _ (any other character except digits could
2940 # be used)
2941 :d
2942 s/9\(_*\)$/_\1/
2943 td
2944
2945 # incr last digit only. The first line adds a most-significant
2946 # digit of 1 if we have to add a digit.
2947
2948 s/^\(_*\)$/1\1/; tn
2949 s/8\(_*\)$/9\1/; tn
2950 s/7\(_*\)$/8\1/; tn
2951 s/6\(_*\)$/7\1/; tn
2952 s/5\(_*\)$/6\1/; tn
2953 s/4\(_*\)$/5\1/; tn
2954 s/3\(_*\)$/4\1/; tn
2955 s/2\(_*\)$/3\1/; tn
2956 s/1\(_*\)$/2\1/; tn
2957 s/0\(_*\)$/1\1/; tn
2958
2959 :n
2960 y/_/0/
2961
2962 ---------- Footnotes ----------
2963
2964 (1) 'sed' guru Greg Ubben wrote an implementation of the 'dc' RPN
2965 calculator! It is distributed together with sed.
2966
2967 7.4 Rename Files to Lower Case
2968 ==============================
2969
2970 This is a pretty strange use of 'sed'. We transform text, and transform
2971 it to be shell commands, then just feed them to shell. Don't worry,
2972 even worse hacks are done when using 'sed'; I have seen a script
2973 converting the output of 'date' into a 'bc' program!
2974
2975 The main body of this is the 'sed' script, which remaps the name from
2976 lower to upper (or vice-versa) and even checks out if the remapped name
2977 is the same as the original name. Note how the script is parameterized
2978 using shell variables and proper quoting.
2979
2980 #! /bin/sh
2981 # rename files to lower/upper case...
2982 #
2983 # usage:
2984 # move-to-lower *
2985 # move-to-upper *
2986 # or
2987 # move-to-lower -R .
2988 # move-to-upper -R .
2989 #
2990
2991 help()
2992 {
2993 cat << eof
2994 Usage: $0 [-n] [-R] [-h] files...
2995
2996 -n do nothing, only see what would be done
2997 -R recursive (use find)
2998 -h this message
2999 files files to remap to lower case
3000
3001 Examples:
3002 $0 -n * (see if everything is ok, then...)
3003 $0 *
3004
3005 $0 -R .
3006
3007 eof
3008 }
3009
3010 apply_cmd='sh'
3011 finder='echo "$@" | tr " " "\n"'
3012 files_only=
3013
3014 while :
3015 do
3016 case "$1" in
3017 -n) apply_cmd='cat' ;;
3018 -R) finder='find "$@" -type f';;
3019 -h) help ; exit 1 ;;
3020 *) break ;;
3021 esac
3022 shift
3023 done
3024
3025 if [ -z "$1" ]; then
3026 echo "Usage: $0 [-h] [-n] [-R] files..."
3027 exit 1
3028 fi
3029
3030 LOWER='abcdefghijklmnopqrstuvwxyz'
3031 UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
3032
3033 case `basename $0` in
3034 *upper*) TO=$UPPER; FROM=$LOWER ;;
3035 *) FROM=$UPPER; TO=$LOWER ;;
3036 esac
3037
3038 eval $finder | sed -n '
3039
3040 # remove all trailing slashes
3041 s/\/*$//
3042
3043 # add ./ if there is no path, only a filename
3044 /\//! s/^/.\//
3045
3046 # save path+filename
3047 h
3048
3049 # remove path
3050 s/.*\///
3051
3052 # do conversion only on filename
3053 y/'$FROM'/'$TO'/
3054
3055 # now line contains original path+file, while
3056 # hold space contains the new filename
3057 x
3058
3059 # add converted file name to line, which now contains
3060 # path/file-name\nconverted-file-name
3061 G
3062
3063 # check if converted file name is equal to original file name,
3064 # if it is, do not print anything
3065 /^.*\/\(.*\)\n\1/b
3066
3067 # escape special characters for the shell
3068 s/["$`\\]/\\&/g
3069
3070 # now, transform path/fromfile\n, into
3071 # mv path/fromfile path/tofile and print it
3072 s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
3073
3074 ' | $apply_cmd
3075
3076 7.5 Print 'bash' Environment
3077 ============================
3078
3079 This script strips the definition of the shell functions from the output
3080 of the 'set' Bourne-shell command.
3081
3082 #!/bin/sh
3083
3084 set | sed -n '
3085 :x
3086
3087 # if no occurrence of "=()" print and load next line
3088 /=()/! { p; b; }
3089 / () $/! { p; b; }
3090
3091 # possible start of functions section
3092 # save the line in case this is a var like FOO="() "
3093 h
3094
3095 # if the next line has a brace, we quit because
3096 # nothing comes after functions
3097 n
3098 /^{/ q
3099
3100 # print the old line
3101 x; p
3102
3103 # work on the new line now
3104 x; bx
3105 '
3106
3107 7.6 Reverse Characters of Lines
3108 ===============================
3109
3110 This script can be used to reverse the position of characters in lines.
3111 The technique moves two characters at a time, hence it is faster than
3112 more intuitive implementations.
3113
3114 Note the 'tx' command before the definition of the label. This is
3115 often needed to reset the flag that is tested by the 't' command.
3116
3117 Imaginative readers will find uses for this script. An example is
3118 reversing the output of 'banner'.(1)
3119
3120 #!/usr/bin/sed -f
3121
3122 /../! b
3123
3124 # Reverse a line. Begin embedding the line between two newlines
3125 s/^.*$/\
3126 &\
3127 /
3128
3129 # Move first character to the end. The regexp matches until
3130 # there are zero or one characters between the markers
3131 tx
3132 :x
3133 s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
3134 tx
3135
3136 # Remove the newline markers
3137 s/\n//g
3138
3139 ---------- Footnotes ----------
3140
3141 (1) This requires another script to pad the output of banner; for
3142 example
3143
3144 #! /bin/sh
3145
3146 banner -w $1 $2 $3 $4 |
3147 sed -e :a -e '/^.\{0,'$1'\}$/ { s/$/ /; ba; }' |
3148 ~/sedscripts/reverseline.sed
3149
3150 7.7 Text search across multiple lines
3151 =====================================
3152
3153 This section uses 'N' and 'D' commands to search for consecutive words
3154 spanning multiple lines. *Note Multiline techniques::.
3155
3156 These examples deal with finding doubled occurrences of words in a
3157 document.
3158
3159 Finding doubled words in a single line is easy using GNU 'grep' and
3160 similarly with GNU 'sed':
3161
3162 $ cat two-cities-dup1.txt
3163 It was the best of times,
3164 it was the worst of times,
3165 it was the the age of wisdom,
3166 it was the age of foolishness,
3167
3168 $ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
3169 it was the the age of wisdom,
3170
3171 $ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
3172 3:it was the the age of wisdom,
3173
3174 $ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt
3175 it was the the age of wisdom,
3176
3177 $ sed -En '/\b(\w+)\s+\1\b/{=;p}' two-cities-dup1.txt
3178 3
3179 it was the the age of wisdom,
3180
3181 * The regular expression '\b\w+\s+' searches for word-boundary
3182 ('\b'), followed by one-or-more word-characters ('\w+'), followed
3183 by whitespace ('\s+'). *Note regexp extensions::.
3184
3185 * Adding parentheses around the '(\w+)' expression creates a
3186 subexpression. The regular expression pattern '(PATTERN)\s+\1'
3187 defines a subexpression (in the parentheses) followed by a
3188 back-reference, separated by whitespace. A successful match means
3189 the PATTERN was repeated twice in succession. *Note
3190 Back-references and Subexpressions::.
3191
3192 * The word-boundary expression ('\b') at both ends ensures partial
3193 words are not matched (e.g. 'the then' is not a desired match).
3194
3195 * The '-E' option enables extended regular expression syntax,
3196 alleviating the need to add backslashes before the parentheses.
3197 *Note ERE syntax::.
3198
3199 When the doubled words span two lines, the above regular expression
3200 will not find them, as 'grep' and 'sed' operate line-by-line.
3201
3202 By using 'N' and 'D' commands, 'sed' can apply regular expressions on
3203 multiple lines (that is, multiple lines are stored in the pattern space,
3204 and the regular expression works on it):
3205
3206 $ cat two-cities-dup2.txt
3207 It was the best of times, it was the
3208 worst of times, it was the
3209 the age of wisdom,
3210 it was the age of foolishness,
3211
3212 $ sed -En '{N; /\b(\w+)\s+\1\b/{=;p} ; D}' two-cities-dup2.txt
3213 3
3214 worst of times, it was the
3215 the age of wisdom,
3216
3217 * The 'N' command appends the next line to the pattern space (thus
3218 ensuring it contains two consecutive lines in every cycle).
3219
3220 * The regular expression uses '\s+' for word separator which matches
3221 both spaces and newlines.
3222
3223 * If the regular expression matches, the entire pattern space is printed
3224 with 'p'. No lines are printed by default due to the '-n' option.
3225
3226 * The 'D' removes the first line from the pattern space (up until the
3227 first newline), readying it for the next cycle.
3228
3229 See the GNU 'coreutils' manual for an alternative solution using 'tr
3230 -s' and 'uniq' at
3231 <https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html>.
3232
3233 7.8 Line length adjustment
3234 ==========================
3235
3236 This section uses 'N' and 'P' commands to process text spanning
3237 multiple lines, and the 'b' command for branching. *Note
3238 Multiline techniques:: and *note Branching and flow control::.
3239
3240 This (somewhat contrived) example deals with formatting and wrapping
3241 lines of text of the following input file:
3242
3243 $ cat two-cities-mix.txt
3244 It was the best of times, it was
3245 the worst of times, it
3246 was the age of
3247 wisdom,
3248 it
3249 was
3250 the age
3251 of foolishness,
3252
3253 The following sed program wraps lines at 40 characters:
3254 $ cat wrap40.sed
3255 # outer loop
3256 :x
3257
3258 # Append a newline followed by the next input line to the pattern buffer
3259 N
3260
3261 # Remove all newlines from the pattern buffer
3262 s/\n/ /g
3263
3264
3265 # Inner loop
3266 :y
3267
3268 # Add a newline after the first 40 characters
3269 s/(.{40,40})/\1\n/
3270
3271 # If there is a newline in the pattern buffer
3272 # (i.e. the previous substitution added a newline)
3273 /\n/ {
3274 # There are newlines in the pattern buffer -
3275 # print the content until the first newline.
3276 P
3277
3278 # Remove the printed characters and the first newline
3279 s/.*\n//
3280
3281 # branch to label 'y' - repeat inner loop
3282 by
3283 }
3284
3285 # No newlines in the pattern buffer - Branch to label 'x' (outer loop)
3286 # and read the next input line
3287 bx
3288
3289 The wrapped output:
3290 $ sed -E -f wrap40.sed two-cities-mix.txt
3291 It was the best of times, it was the wor
3292 st of times, it was the age of wisdom, i
3293 t was the age of foolishness,
3294
3295 7.9 Reverse Lines of Files
3296 ==========================
3297
3298 This one begins a series of totally useless (yet interesting) scripts
3299 emulating various Unix commands. This, in particular, is a 'tac'
3300 workalike.
3301
3302 Note that on implementations other than GNU 'sed' this script might
3303 easily overflow internal buffers.
3304
3305 #!/usr/bin/sed -nf
3306
3307 # reverse all lines of input, i.e. first line becomes last, ...
3308
3309 # from the second line, the buffer (which contains all previous lines)
3310 # is *appended* to current line, so, the order will be reversed
3311 1! G
3312
3313 # on the last line we're done -- print everything
3314 $ p
3315
3316 # store everything on the buffer again
3317 h
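
The same three commands can be tried directly from the shell. The sketch
below wraps them in a function (the name 'reverse_lines' is only for
illustration); the comments trace the hold space for a three-line input.

```shell
# line 1: nothing to append yet;        hold space = "one"
# line 2: pattern = "two\none";         hold space = "two\none"
# line 3: pattern = "three\ntwo\none";  printed by '$p'
reverse_lines() {
    sed -n -e '1!G' -e '$p' -e 'h' "$@"
}

printf '%s\n' one two three | reverse_lines
```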
3318
3319 7.10 Numbering Lines
3320 ====================
3321
3322 This script replaces 'cat -n'; in fact it formats its output exactly
3323 like GNU 'cat' does.
3324
3325 Of course this is completely useless and for two reasons: first,
3326 because somebody else did it in C, second, because the following
3327 Bourne-shell script could be used for the same purpose and would be much
3328 faster:
3329
3330 #! /bin/sh
3331 sed -e "=" "$@" | sed -e '
3332 s/^/      /
3333 N
3334 s/^ *\(......\)\n/\1 /
3335 '
3336
3337 It uses 'sed' to print the line number, then groups lines two by two
3338 using 'N'. Of course, this script does not teach as much as the one
3339 presented below.
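
A sketch of the two-stage pipeline as a shell function, to make the
padding step concrete. The function name 'number_lines' is hypothetical,
and the two spaces placed after the number are a choice made for this
sketch (GNU 'cat' itself separates the number from the text with a tab).

```shell
number_lines() {
    # First sed: print each line number on a line of its own.
    # Second sed: pad the number with six spaces, pull in the text line
    # with N, then keep only the last six characters before the newline.
    sed = "$@" | sed -e 's/^/      /' -e 'N' -e 's/^ *\(......\)\n/\1  /'
}

printf '%s\n' foo bar | number_lines
```

Each output line starts with the line number right-aligned in a
six-character field.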
3340
3341 The algorithm used for incrementing uses both buffers, so the line is
3342 printed as soon as possible and then discarded. The number is split so
3343 that changing digits go in a buffer and unchanged ones go in the other;
3344 the changed digits are modified in a single step (using a 'y' command).
3345 The line number for the next line is then composed and stored in the
3346 hold space, to be used in the next iteration.
3347
3348 #!/usr/bin/sed -nf
3349
3350 # Prime the pump on the first line
3351 x
3352 /^$/ s/^.*$/1/
3353
3354 # Add the correct line number before the pattern
3355 G
3356 h
3357
3358 # Format it and print it
3359 s/^/      /
3360 s/^ *\(......\)\n/\1 /p
3361
3362 # Get the line number from hold space; add a zero
3363 # if we're going to add a digit on the next line
3364 g
3365 s/\n.*$//
3366 /^9*$/ s/^/0/
3367
3368 # separate changing/unchanged digits with an x
3369 s/.9*$/x&/
3370
3371 # keep changing digits in hold space
3372 h
3373 s/^.*x//
3374 y/0123456789/1234567890/
3375 x
3376
3377 # keep unchanged digits in pattern space
3378 s/x.*$//
3379
3380 # compose the new number, remove the newline implicitly added by G
3381 G
3382 s/\n//
3383 h
3384
3385 7.11 Numbering Non-blank Lines
3386 ==============================
3387
3388 Emulating 'cat -b' is almost the same as 'cat -n'--we only have to
3389 select which lines are to be numbered and which are not.
3390
3391 The part that is common to this script and the previous one is not
3392 commented to show how important it is to comment 'sed' scripts
3393 properly...
3394
3395 #!/usr/bin/sed -nf
3396
3397 /^$/ {
3398 p
3399 b
3400 }
3401
3402 # Same as cat -n from now
3403 x
3404 /^$/ s/^.*$/1/
3405 G
3406 h
3407 s/^/      /
3408 s/^ *\(......\)\n/\1 /p
3409 x
3410 s/\n.*$//
3411 /^9*$/ s/^/0/
3412 s/.9*$/x&/
3413 h
3414 s/^.*x//
3415 y/0123456789/1234567890/
3416 x
3417 s/x.*$//
3418 G
3419 s/\n//
3420 h
3421
3422 7.12 Counting Characters
3423 ========================
3424
3425 This script shows another way to do arithmetic with 'sed'. In this case
3426 we have to add possibly large numbers, so implementing this by
3427 successive increments would not be feasible (and possibly even more
3428 complicated to contrive than this script).
3429
3430 The approach is to map numbers to letters, kind of an abacus
3431 implemented with 'sed'. 'a's are units, 'b's are tens and so on: we
3432 simply add the number of characters on the current line as units, and
3433 then propagate the carry to tens, hundreds, and so on.
3434
3435 As usual, running totals are kept in hold space.
3436
3437 On the last line, we convert the abacus form back to decimal. For
3438 the sake of variety, this is done with a loop rather than with some 80
3439 's' commands(1): first we convert units, removing 'a's from the number;
3440 then we rotate letters so that tens become 'a's, and so on until no more
3441 letters remain.
3442
3443 #!/usr/bin/sed -nf
3444
3445 # Add n+1 a's to hold space (+1 is for the newline)
3446 s/./a/g
3447 H
3448 x
3449 s/\n/a/
3450
3451 # Do the carry. The t's and b's are not necessary,
3452 # but they do speed up the thing
3453 t a
3454 : a; s/aaaaaaaaaa/b/g; t b; b done
3455 : b; s/bbbbbbbbbb/c/g; t c; b done
3456 : c; s/cccccccccc/d/g; t d; b done
3457 : d; s/dddddddddd/e/g; t e; b done
3458 : e; s/eeeeeeeeee/f/g; t f; b done
3459 : f; s/ffffffffff/g/g; t g; b done
3460 : g; s/gggggggggg/h/g; t h; b done
3461 : h; s/hhhhhhhhhh//g
3462
3463 : done
3464 $! {
3465 h
3466 b
3467 }
3468
3469 # On the last line, convert back to decimal
3470
3471 : loop
3472 /a/! s/[b-h]*/&0/
3473 s/aaaaaaaaa/9/
3474 s/aaaaaaaa/8/
3475 s/aaaaaaa/7/
3476 s/aaaaaa/6/
3477 s/aaaaa/5/
3478 s/aaaa/4/
3479 s/aaa/3/
3480 s/aa/2/
3481 s/a/1/
3482
3483 : next
3484 y/bcdefgh/abcdefg/
3485 /[a-h]/ b loop
3486 p
3487
3488 ---------- Footnotes ----------
3489
3490 (1) Some implementations have a limit of 199 commands per script
3491
3492 7.13 Counting Words
3493 ===================
3494
3495 This script is almost the same as the previous one, once each of the
3496 words on the line is converted to a single 'a' (in the previous script
3497 each letter was changed to an 'a').
3498
3499 It is interesting that real 'wc' programs have optimized loops for
3500 'wc -c', so they are much slower at counting words rather than
3501 characters. This script's bottleneck, instead, is arithmetic, and hence
3502 the word-counting one is faster (it has to manage smaller numbers).
3503
3504 Again, the common parts are not commented to show the importance of
3505 commenting 'sed' scripts.
3506
3507 #!/usr/bin/sed -nf
3508
3509 # Convert words to a's
3510 s/[ <TAB>][ <TAB>]*/ /g
3511 s/^/ /
3512 s/ [^ ][^ ]*/a /g
3513 s/ //g
3514
3515 # Append them to hold space
3516 H
3517 x
3518 s/\n//
3519
3520 # From here on it is the same as in wc -c.
3521 /aaaaaaaaaa/! bx; s/aaaaaaaaaa/b/g
3522 /bbbbbbbbbb/! bx; s/bbbbbbbbbb/c/g
3523 /cccccccccc/! bx; s/cccccccccc/d/g
3524 /dddddddddd/! bx; s/dddddddddd/e/g
3525 /eeeeeeeeee/! bx; s/eeeeeeeeee/f/g
3526 /ffffffffff/! bx; s/ffffffffff/g/g
3527 /gggggggggg/! bx; s/gggggggggg/h/g
3528 s/hhhhhhhhhh//g
3529 :x
3530 $! { h; b; }
3531 :y
3532 /a/! s/[b-h]*/&0/
3533 s/aaaaaaaaa/9/
3534 s/aaaaaaaa/8/
3535 s/aaaaaaa/7/
3536 s/aaaaaa/6/
3537 s/aaaaa/5/
3538 s/aaaa/4/
3539 s/aaa/3/
3540 s/aa/2/
3541 s/a/1/
3542 y/bcdefgh/abcdefg/
3543 /[a-h]/ by
3544 p
3545
3546 7.14 Counting Lines
3547 ===================
3548
3549 No strange things are done now, because 'sed' gives us 'wc -l'
3550 functionality for free!!! Look:
3551
3552 #!/usr/bin/sed -nf
3553 $=
3554
3555 7.15 Printing the First Lines
3556 =============================
3557
3558 This script is probably the simplest useful 'sed' script. It displays
3559 the first 10 lines of input; the number of displayed lines is right
3560 before the 'q' command.
3561
3562 #!/usr/bin/sed -f
3563 10q
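
Because the line count sits directly in front of 'q', it is easy to
supply from the shell. A minimal sketch (the function name 'first_lines'
and its COUNT argument are hypothetical; COUNT must be at least 1):

```shell
# first_lines COUNT [FILE...]: print the first COUNT lines of input.
first_lines() {
    count=$1
    shift
    sed "${count}q" "$@"
}

printf '%s\n' a b c d | first_lines 2
```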
3564
3565 7.16 Printing the Last Lines
3566 ============================
3567
3568 Printing the last N lines rather than the first is more complex but
3569 indeed possible. N is encoded in the second line, before the bang
3570 character.
3571
3572 This script is similar to the 'tac' script in that it keeps the final
3573 output in the hold space and prints it at the end:
3574
3575 #!/usr/bin/sed -nf
3576
3577 1! {; H; g; }
3578 1,10 !s/[^\n]*\n//
3579 $p
3580 h
3581
3582 Mainly, the script keeps a window of 10 lines and slides it by
3583 adding a line and deleting the oldest (the substitution command on the
3584 second line works like a 'D' command but does not restart the loop).
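
The window size can be passed in from the shell. The following sketch
uses the same commands as the script above; the function name
'last_lines' and its COUNT argument are hypothetical. Like the script,
it relies on GNU 'sed' treating '\n' inside a bracket expression as a
newline.

```shell
# last_lines COUNT [FILE...]: print the last COUNT lines of input.
last_lines() {
    count=$1
    shift
    # After the first COUNT lines, drop the oldest line of the window
    # each time a new line has been appended to it.
    sed -n -e '1!{H;g;}' -e "1,${count}"'!s/[^\n]*\n//' -e '$p' -e 'h' "$@"
}

printf '%s\n' 1 2 3 4 5 | last_lines 2
```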
3585
3586 The "sliding window" technique is a very powerful way to write
3587 efficient and complex 'sed' scripts, because commands like 'P' would
3588 require a lot of work if implemented manually.
3589
3590 To introduce the technique, which is fully demonstrated in the rest
3591 of this chapter and is based on the 'N', 'P' and 'D' commands, here is
3592 an implementation of 'tail' using a simple "sliding window."
3593
3594 This looks complicated but in fact the working is the same as the
3595 last script: after we have kicked in the appropriate number of lines,
3596 however, we stop using the hold space to keep inter-line state, and
3597 instead use 'N' and 'D' to slide pattern space by one line:
3598
3599 #!/usr/bin/sed -f
3600
3601 1h
3602 2,10 {; H; g; }
3603 $q
3604 1,9d
3605 N
3606 D
3607
3608 Note how the first, second and fourth lines are inactive after the
3609 first ten lines of input. After that, all the script does is: exiting
3610 on the last line of input, appending the next input line to pattern
3611 space, and removing the first line.
3612
3613 7.17 Make Duplicate Lines Unique
3614 ================================
3615
3616 This is an example of the art of using the 'N', 'P' and 'D' commands,
3617 probably the most difficult to master.
3618
3619 #!/usr/bin/sed -f
3620 h
3621
3622 :b
3623 # On the last line, print and exit
3624 $b
3625 N
3626 /^\(.*\)\n\1$/ {
3627 # The two lines are identical. Undo the effect of
3628 # the N command.
3629 g
3630 bb
3631 }
3632
3633 # If the N command had added the last line, print and exit
3634 $b
3635
3636 # The lines are different; print the first and go
3637 # back working on the second.
3638 P
3639 D
3640
3641 As you can see, we maintain a 2-line window using 'P' and 'D'. This
3642 technique is often used in advanced 'sed' scripts.
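
For comparison, the same 2-line window can be written without the hold
space. This is a different, shorter formulation than the script above,
shown as a sketch (the function name 'unique_adjacent' is illustrative):
'$!N' fills the window, 'P' prints its first line only when the two
lines differ, and 'D' always drops the first line and restarts.

```shell
# Collapse runs of identical adjacent lines into a single line.
unique_adjacent() {
    sed -e '$!N' -e '/^\(.*\)\n\1$/!P' -e 'D' "$@"
}

printf '%s\n' a a b b b c | unique_adjacent
```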
3643
3644 7.18 Print Duplicated Lines of Input
3645 ====================================
3646
3647 This script prints only duplicated lines, like 'uniq -d'.
3648
3649 #!/usr/bin/sed -nf
3650
3651 $b
3652 N
3653 /^\(.*\)\n\1$/ {
3654 # Print the first of the duplicated lines
3655 s/.*\n//
3656 p
3657
3658 # Loop until we get a different line
3659 :b
3660 $b
3661 N
3662 /^\(.*\)\n\1$/ {
3663 s/.*\n//
3664 bb
3665 }
3666 }
3667
3668 # The last line cannot be followed by duplicates
3669 $b
3670
3671 # Found a different one. Leave it alone in the pattern space
3672 # and go back to the top, hunting its duplicates
3673 D
3674
3675 7.19 Remove All Duplicated Lines
3676 ================================
3677
3678 This script prints only unique lines, like 'uniq -u'.
3679
3680 #!/usr/bin/sed -f
3681
3682 # Search for a duplicate line --- until that, print what you find.
3683 $b
3684 N
3685 /^\(.*\)\n\1$/ ! {
3686 P
3687 D
3688 }
3689
3690 :c
3691 # Got two equal lines in pattern space. At the
3692 # end of the file we simply exit
3693 $d
3694
3695 # Else, we keep reading lines with N until we
3696 # find a different one
3697 s/.*\n//
3698 N
3699 /^\(.*\)\n\1$/ {
3700 bc
3701 }
3702
3703 # Remove the last instance of the duplicate line
3704 # and go back to the top
3705 D
3706
3707 7.20 Squeezing Blank Lines
3708 ==========================
3709
3710 As a final example, here are three scripts, of increasing complexity and
3711 speed, that implement the same function as 'cat -s', that is squeezing
3712 blank lines.
3713
3714 The first leaves a blank line at the beginning and end if there are
3715 some already.
3716
3717 #!/usr/bin/sed -f
3718
3719 # on empty lines, join with next
3720 # Note there is a star in the regexp
3721 :x
3722 /^\n*$/ {
3723 N
3724 bx
3725 }
3726
3727 # now, squeeze all '\n', this can be also done by:
3728 # s/^\(\n\)*/\1/
3729 s/\n*/\
3730 /
3731
3732 This one is a bit more complex and removes all empty lines at the
3733 beginning. It does leave a single blank line at the end if one was there.
3734
3735 #!/usr/bin/sed -f
3736
3737 # delete all leading empty lines
3738 1,/^./{
3739 /./!d
3740 }
3741
3742 # on an empty line we remove it and all the following
3743 # empty lines, but one
3744 :x
3745 /./!{
3746 N
3747 s/^\n$//
3748 tx
3749 }
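
A similar result can be reached with a range address instead of a loop.
This is a different technique from the script above, offered only as a
sketch (the function name 'squeeze_blank' is illustrative): every line
outside a range that starts at a non-empty line and ends at the next
empty line is deleted, so leading empty lines vanish and each run of
empty lines shrinks to one.

```shell
# Keep only lines inside a "non-empty line ... first empty line" range.
squeeze_blank() {
    sed '/./,/^$/!d' "$@"
}

printf '%s\n' '' 'a' '' '' 'b' '' | squeeze_blank
```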
3750
3751 This removes leading and trailing blank lines. It is also the
3752 fastest. Note that loops are completely done with 'n' and 'b', without
3753 relying on 'sed' to restart the script automatically at the end of a
3754 line.
3755
3756 #!/usr/bin/sed -nf
3757
3758 # delete all (leading) blanks
3759 /./!d
3760
3761 # get here: so there is a non empty
3762 :x
3763 # print it
3764 p
3765 # get next
3766 n
3767 # got chars? print it again, etc...
3768 /./bx
3769
3770 # no, don't have chars: got an empty line
3771 :z
3772 # get next, if last line we finish here so no trailing
3773 # empty lines are written
3774 n
3775 # also empty? then ignore it, and get next... this will
3776 # remove ALL empty lines
3777 /./!bz
3778
3779 # all empty lines were deleted/ignored, but we have a non empty. As
3780 # what we want to do is to squeeze, insert a blank line artificially
3781 i\
3782
3783 bx
3784
3785 8 GNU 'sed''s Limitations and Non-limitations
3786 *********************************************
3787
3788 If you want to write portable 'sed' scripts, be aware that some
3789 implementations have been known to limit line lengths (for the pattern
3790 and hold spaces) to no more than 4000 bytes. The POSIX standard
3791 specifies that conforming 'sed' implementations shall support line
3792 lengths of at least 8192 bytes. GNU 'sed' has no built-in limit on line length;
3793 as long as it can 'malloc()' more (virtual) memory, you can feed or
3794 construct lines as long as you like.
3795
3796 However, recursion is used to handle subpatterns and indefinite
3797 repetition. This means that the available stack space may limit the
3798 size of the buffer that can be processed by certain patterns.
3799
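A quick way to see whether a given 'sed' copes with lines far beyond the 8192-byte POSIX minimum is to generate one and edit it. This is only a sketch: the 100000-byte size is an arbitrary choice, 'head -c' is assumed to be available, and the simple pattern used here does not exercise the stack-space limit described above.

```shell
# Build a single line of 100000 'a' characters plus a newline,
# change its last character, and count the bytes that come out.
count=$( { head -c 100000 /dev/zero | tr '\0' 'a'; echo; } \
  | sed 's/a$/b/' | wc -c )
bytes=$((count))   # normalizes any padding that wc may print

# 100001 means the whole line (plus its newline) survived intact.
echo "$bytes"
```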
3800 9 Other Resources for Learning About 'sed'
3801 ******************************************
3802
3803 For up-to-date information about GNU 'sed', please visit
3804 <https://www.gnu.org/software/sed/>.
3805
3806 Send general questions and suggestions to <sed-devel@gnu.org>. Visit
3807 the mailing list archives for past discussions at
3808 <https://lists.gnu.org/archive/html/sed-devel/>.
3809
3810 The following resources provide information about 'sed' (both GNU
3811 'sed' and other variations). Note that these are not maintained by
3812 the GNU 'sed' developers.
3813
3814 * sed '$HOME': <http://sed.sf.net>
3815
3816 * sed FAQ: <http://sed.sf.net/sedfaq.html>
3817
3818 * seder's grabbag: <http://sed.sf.net/grabbag>
3819
3820 * The 'sed-users' mailing list maintained by Sven Guckes:
3821 <http://groups.yahoo.com/group/sed-users/> (note this is _not_ the
3822 GNU 'sed' mailing list).
3823
3824 10 Reporting Bugs
3825 *****************
3826
3827 Email bug reports to <bug-sed@gnu.org>. Also, please include the output
3828 of 'sed --version' in the body of your report if at all possible.
3829
3830 Please do not send a bug report like this:
3831
3832 while building frobme-1.3.4
3833 $ configure
3834 error-> sed: file sedscr line 1: Unknown option to 's'
3835
3836 If GNU 'sed' doesn't configure your favorite package, take a few
3837 extra minutes to identify the specific problem and make a stand-alone
3838 test case. Unlike other programs such as C compilers, making such test
3839 cases for 'sed' is quite simple.
3840
3841 A stand-alone test case includes all the data necessary to perform
3842 the test, and the specific invocation of 'sed' that causes the problem.
3843 The smaller a stand-alone test case is, the better. A test case should
3844 not involve something as far removed from 'sed' as "try to configure
3845 frobme-1.3.4". Yes, that is in principle enough information to look for
3846 the bug, but that is not a very practical prospect.
3847
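A stand-alone test case can be as small as the following sketch. The input and the 's' command are hypothetical stand-ins for whatever actually misbehaves; the point is only that the data, the script, and the invocation all fit in a few lines that anyone can paste into a shell.

```shell
# Everything needed to reproduce: the input, the script, the invocation.
actual=$(printf 'hello world\n' | sed 's/o/0/2')
printf 'got: %s\n' "$actual"

# First line of the version banner, for the body of the report.
# (--version is a GNU option; other implementations may reject it.)
sed --version 2>/dev/null | head -n 1
```

A report would then state what output was expected next to what was actually printed.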
3848 Here are a few commonly reported bugs that are not bugs.
3849
3850 'N' command on the last line
3851
3852 Most versions of 'sed' exit without printing anything when the 'N'
3853 command is issued on the last line of a file. GNU 'sed' prints
3854 pattern space before exiting unless of course the '-n' command
3855 switch has been specified. This choice is by design.
3856
3857 Default behavior (GNU extension, not POSIX-conforming):
3858 $ seq 3 | sed N
3859 1
3860 2
3861 3
3862 To force POSIX-conforming behavior:
3863 $ seq 3 | sed --posix N
3864 1
3865 2
3866
3867 With the traditional behavior, for example, the result of
3868 sed N foo bar
3869 would depend on whether foo has an even or an odd number of
3870 lines(1). Or, when writing a script to read the next few lines
3871 following a pattern match, traditional implementations of 'sed'
3872 would force you to write something like
3873 /foo/{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N }
3874 instead of just
3875 /foo/{ N;N;N;N;N;N;N;N;N; }
3876
3877 In any case, the simplest workaround is to use '$d;N' in scripts
3878 that rely on the traditional behavior, or to set the
3879 'POSIXLY_CORRECT' variable to a non-empty value.
3880
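The two idioms mentioned above can be compared on a three-line input. This is a minimal sketch; the variable names are illustrative. Neither command executes 'N' on the last line, so the results should not depend on which behavior the implementation chose.

```shell
# '$!N' runs N on every line except the last, so the unpaired
# final line is printed normally.
safe=$(printf '1\n2\n3\n' | sed '$!N')

# '$d;N' deletes an unpaired last line instead, reproducing the
# traditional result.
trad=$(printf '1\n2\n3\n' | sed '$d;N')

printf 'with $!N:\n%s\n' "$safe"    # lines 1, 2, 3
printf 'with $d;N:\n%s\n' "$trad"   # lines 1, 2
```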
3881 Regex syntax clashes (problems with backslashes)
3882 'sed' uses the POSIX basic regular expression syntax. According to
3883 the standard, the meaning of some escape sequences is undefined in
3884 this syntax; notable in the case of 'sed' are '\|', '\+', '\?',
3885 '\`', '\'', '\<', '\>', '\b', '\B', '\w', and '\W'.
3886
3887 As in all GNU programs that use POSIX basic regular expressions,
3888 'sed' interprets these escape sequences as special characters. So,
3889 'x\+' matches one or more occurrences of 'x'. 'abc\|def' matches
3890 either 'abc' or 'def'.
3891
3892 This syntax may cause problems when running scripts written for
3893 other 'sed's. Some 'sed' programs have been written with the
3894 assumption that '\|' and '\+' match the literal characters '|' and
3895 '+'. Such scripts must be modified by removing the spurious
3896 backslashes if they are to be used with modern implementations of
3897 'sed', like GNU 'sed'.
3898
3899 On the other hand, some scripts use s|abc\|def||g to remove
3900 occurrences of _either_ 'abc' or 'def'. While this worked until
3901 'sed' 4.0.x, newer versions interpret this as removing the string
3902 'abc|def'. This is again undefined behavior according to POSIX,
3903 and this interpretation is arguably more robust: older 'sed's, for
3904 example, required that the regex matcher parsed '\/' as '/' in the
3905 common case of escaping a slash, which is again undefined behavior;
3906 the new behavior avoids this, and this is good because the regex
3907 matcher is only partially under our control.
3908
3909 In addition, this version of 'sed' supports several escape
3910 characters (some of which are multi-character) to insert
3911 non-printable characters in scripts ('\a', '\c', '\d', '\o', '\r',
3912 '\t', '\v', '\x'). These can cause similar problems with scripts
3913 written for other 'sed's.
3914
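One way to sidestep these clashes, sketched here under the assumption that the target 'sed' accepts '-E' (GNU and BSD implementations do), is to use a bracket expression for literal characters and extended syntax for the operators. The inputs are made up for illustration.

```shell
# A bracket expression matches a literal '+' in both basic and
# extended syntax, with no backslash whose meaning could differ.
lit=$(printf 'a+b\n' | sed 's/a[+]b/X/')

# With -E, '+' and '|' are ordinary operators of the extended syntax.
rep=$(printf 'aaab\n' | sed -E 's/a+/X/')
alt=$(printf 'abc-def-ghi\n' | sed -E 's/abc|def//g')

printf '%s\n' "$lit" "$rep" "$alt"   # X, Xb, --ghi
```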
3915 '-i' clobbers read-only files
3916
3917 In short, 'sed -i' will let you delete the contents of a read-only
3918 file, and in general the '-i' option (*note Invocation: Invoking
3919 sed.) lets you clobber protected files. This is not a bug, but
3920 rather a consequence of how the Unix file system works.
3921
3922 The permissions on a file say what can happen to the data in that
3923 file, while the permissions on a directory say what can happen to
3924 the list of files in that directory. 'sed -i' never opens an
3925 existing file for writing. Instead, it writes its output to a
3926 temporary file and finally renames that file to the original name.
3927 Renaming or deleting a file modifies the contents of the
3928 directory, so the operation depends on the permissions of the
3929 directory, not of the file. For the same reason, 'sed' does not
3930 let you use '-i' on a writable file in a read-only directory, and
3931 using '-i' on a hard or symbolic link breaks the link.
3932
3933
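The effect can be observed in a scratch directory. The following is a sketch that assumes GNU 'sed' (BSD 'sed' requires an argument after '-i') and uses made-up file names.

```shell
dir=$(mktemp -d)
printf 'old\n' > "$dir/f"
ln "$dir/f" "$dir/hard"        # a second name for the same file
chmod 444 "$dir/f"             # read-only file, writable directory

# Succeeds: the result is written to a new file, which is then renamed.
sed -i 's/old/new/' "$dir/f"

f_now=$(cat "$dir/f")          # new: the name points at the rewritten copy
hard_now=$(cat "$dir/hard")    # old: the link still names the original data
printf '%s %s\n' "$f_now" "$hard_now"
rm -rf "$dir"
```

The hard link keeps the original contents because only the directory entry for 'f' was replaced.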
3934 '0a' does not work (gives an error)
3935
3936 There is no line 0. 0 is a special address that is used only to
3937 treat a range like '0,/RE/' as active when the script starts. If
3938 you write '1,/abc/d' and the first line includes the word 'abc',
3939 that match is ignored, because address ranges must span at least
3940 two lines (barring the end of the file). What you probably wanted
3941 is to delete every line up to and including the first one that
3942 contains 'abc', and this is obtained with '0,/abc/d'.
3943
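The difference shows up whenever the first line already matches. A short sketch with made-up input follows; the '0,/RE/' form is a GNU extension.

```shell
input=$(printf 'abc\nx\nabc\ny')

# The range starts on line 1, and the end pattern is first tested on
# line 2, so everything through the second 'abc' is deleted.
one=$(printf '%s\n' "$input" | sed '1,/abc/d')

# The range is already active at line 1, which matches and ends it.
zero=$(printf '%s\n' "$input" | sed '0,/abc/d')

printf '1,/abc/d leaves:\n%s\n' "$one"    # y
printf '0,/abc/d leaves:\n%s\n' "$zero"   # x, abc, y
```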
3944 '[a-z]' is case insensitive
3945
3946 You are encountering problems with locales. POSIX mandates that
3947 '[a-z]' uses the current locale's collation order - in C parlance,
3948 that means using 'strcoll(3)' instead of 'strcmp(3)'. Some locales
3949 have a case-insensitive collation order, others don't.
3950
3951 Another problem is that '[a-z]' tries to use collation symbols.
3952 This only happens if you are on the GNU system, using GNU libc's
3953 regular expression matcher instead of compiling the one supplied
3954 with GNU sed. In a Danish locale, for example, the regular
3955 expression '^[a-z]$' matches the string 'aa', because this is a
3956 single collating symbol that comes after 'a' and before 'b'; 'll'
3957 behaves similarly in Spanish locales, or 'ij' in Dutch locales.
3958
3959 To work around these problems, which may cause bugs in shell
3960 scripts, set the 'LC_COLLATE' and 'LC_CTYPE' environment variables
3961 to 'C'.
3962
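In a shell script the override can be limited to a single command. The sketch below uses 'LC_ALL', which, when set, takes precedence over both 'LC_COLLATE' and 'LC_CTYPE'; if 'LC_ALL' is already set in the environment, setting only the two narrower variables has no effect.

```shell
# In the C locale, [a-z] covers exactly the 26 ASCII lowercase letters.
matched=$(printf 'abc\nABC\nAbc\n' | LC_ALL=C sed -n '/^[a-z]*$/p')
printf '%s\n' "$matched"   # abc
```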
3963 's/.*//' does not clear pattern space
3964
3965 This happens if your input stream includes invalid multibyte
3966 sequences. POSIX mandates that such sequences are _not_ matched by
3967 '.', so that 's/.*//' will not clear pattern space as you would
3968 expect. In fact, there is no way to clear sed's buffers in the
3969 middle of the script in most multibyte locales (including UTF-8
3970 locales). For this reason, GNU 'sed' provides a 'z' command (for
3971 'zap') as an extension.
3972
3973 To work around these problems, which may cause bugs in shell
3974 scripts, set the 'LC_COLLATE' and 'LC_CTYPE' environment variables
3975 to 'C'.
3976
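The 'z' command can be tried on a line that contains a byte which is never valid in UTF-8. This is a sketch assuming a GNU 'sed' recent enough to provide 'z'; the 'EMPTY' marker is only there to make the cleared pattern space visible.

```shell
# \377 is not a valid byte in any UTF-8 sequence. 'z' empties the
# pattern space regardless; the 's' command then marks the empty result.
zapped=$(printf 'a\377b\n' | sed 'z;s/^$/EMPTY/')
printf '%s\n' "$zapped"   # EMPTY
```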
3977 ---------- Footnotes ----------
3978
3979 (1) which is the actual "bug" that prompted the change in behavior
3980
3981 Appendix A GNU Free Documentation License
3982 *****************************************
3983
3984 Version 1.3, 3 November 2008
3985
3986 Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
3987 <https://fsf.org/>
3988
3989 Everyone is permitted to copy and distribute verbatim copies
3990 of this license document, but changing it is not allowed.
3991
3992 0. PREAMBLE
3993
3994 The purpose of this License is to make a manual, textbook, or other
3995 functional and useful document "free" in the sense of freedom: to
3996 assure everyone the effective freedom to copy and redistribute it,
3997 with or without modifying it, either commercially or
3998 noncommercially. Secondarily, this License preserves for the
3999 author and publisher a way to get credit for their work, while not
4000 being considered responsible for modifications made by others.
4001
4002 This License is a kind of "copyleft", which means that derivative
4003 works of the document must themselves be free in the same sense.
4004 It complements the GNU General Public License, which is a copyleft
4005 license designed for free software.
4006
4007 We have designed this License in order to use it for manuals for
4008 free software, because free software needs free documentation: a
4009 free program should come with manuals providing the same freedoms
4010 that the software does. But this License is not limited to
4011 software manuals; it can be used for any textual work, regardless
4012 of subject matter or whether it is published as a printed book. We
4013 recommend this License principally for works whose purpose is
4014 instruction or reference.
4015
4016 1. APPLICABILITY AND DEFINITIONS
4017
4018 This License applies to any manual or other work, in any medium,
4019 that contains a notice placed by the copyright holder saying it can
4020 be distributed under the terms of this License. Such a notice
4021 grants a world-wide, royalty-free license, unlimited in duration,
4022 to use that work under the conditions stated herein. The
4023 "Document", below, refers to any such manual or work. Any member
4024 of the public is a licensee, and is addressed as "you". You accept
4025 the license if you copy, modify or distribute the work in a way
4026 requiring permission under copyright law.
4027
4028 A "Modified Version" of the Document means any work containing the
4029 Document or a portion of it, either copied verbatim, or with
4030 modifications and/or translated into another language.
4031
4032 A "Secondary Section" is a named appendix or a front-matter section
4033 of the Document that deals exclusively with the relationship of the
4034 publishers or authors of the Document to the Document's overall
4035 subject (or to related matters) and contains nothing that could
4036 fall directly within that overall subject. (Thus, if the Document
4037 is in part a textbook of mathematics, a Secondary Section may not
4038 explain any mathematics.) The relationship could be a matter of
4039 historical connection with the subject or with related matters, or
4040 of legal, commercial, philosophical, ethical or political position
4041 regarding them.
4042
4043 The "Invariant Sections" are certain Secondary Sections whose
4044 titles are designated, as being those of Invariant Sections, in the
4045 notice that says that the Document is released under this License.
4046 If a section does not fit the above definition of Secondary then it
4047 is not allowed to be designated as Invariant. The Document may
4048 contain zero Invariant Sections. If the Document does not identify
4049 any Invariant Sections then there are none.
4050
4051 The "Cover Texts" are certain short passages of text that are
4052 listed, as Front-Cover Texts or Back-Cover Texts, in the notice
4053 that says that the Document is released under this License. A
4054 Front-Cover Text may be at most 5 words, and a Back-Cover Text may
4055 be at most 25 words.
4056
4057 A "Transparent" copy of the Document means a machine-readable copy,
4058 represented in a format whose specification is available to the
4059 general public, that is suitable for revising the document
4060 straightforwardly with generic text editors or (for images composed
4061 of pixels) generic paint programs or (for drawings) some widely
4062 available drawing editor, and that is suitable for input to text
4063 formatters or for automatic translation to a variety of formats
4064 suitable for input to text formatters. A copy made in an otherwise
4065 Transparent file format whose markup, or absence of markup, has
4066 been arranged to thwart or discourage subsequent modification by
4067 readers is not Transparent. An image format is not Transparent if
4068 used for any substantial amount of text. A copy that is not
4069 "Transparent" is called "Opaque".
4070
4071 Examples of suitable formats for Transparent copies include plain
4072 ASCII without markup, Texinfo input format, LaTeX input format,
4073 SGML or XML using a publicly available DTD, and standard-conforming
4074 simple HTML, PostScript or PDF designed for human modification.
4075 Examples of transparent image formats include PNG, XCF and JPG.
4076 Opaque formats include proprietary formats that can be read and
4077 edited only by proprietary word processors, SGML or XML for which
4078 the DTD and/or processing tools are not generally available, and
4079 the machine-generated HTML, PostScript or PDF produced by some word
4080 processors for output purposes only.
4081
4082 The "Title Page" means, for a printed book, the title page itself,
4083 plus such following pages as are needed to hold, legibly, the
4084 material this License requires to appear in the title page. For
4085 works in formats which do not have any title page as such, "Title
4086 Page" means the text near the most prominent appearance of the
4087 work's title, preceding the beginning of the body of the text.
4088
4089 The "publisher" means any person or entity that distributes copies
4090 of the Document to the public.
4091
4092 A section "Entitled XYZ" means a named subunit of the Document
4093 whose title either is precisely XYZ or contains XYZ in parentheses
4094 following text that translates XYZ in another language. (Here XYZ
4095 stands for a specific section name mentioned below, such as
4096 "Acknowledgements", "Dedications", "Endorsements", or "History".)
4097 To "Preserve the Title" of such a section when you modify the
4098 Document means that it remains a section "Entitled XYZ" according
4099 to this definition.
4100
4101 The Document may include Warranty Disclaimers next to the notice
4102 which states that this License applies to the Document. These
4103 Warranty Disclaimers are considered to be included by reference in
4104 this License, but only as regards disclaiming warranties: any other
4105 implication that these Warranty Disclaimers may have is void and
4106 has no effect on the meaning of this License.
4107
4108 2. VERBATIM COPYING
4109
4110 You may copy and distribute the Document in any medium, either
4111 commercially or noncommercially, provided that this License, the
4112 copyright notices, and the license notice saying this License
4113 applies to the Document are reproduced in all copies, and that you
4114 add no other conditions whatsoever to those of this License. You
4115 may not use technical measures to obstruct or control the reading
4116 or further copying of the copies you make or distribute. However,
4117 you may accept compensation in exchange for copies. If you
4118 distribute a large enough number of copies you must also follow the
4119 conditions in section 3.
4120
4121 You may also lend copies, under the same conditions stated above,
4122 and you may publicly display copies.
4123
4124 3. COPYING IN QUANTITY
4125
4126 If you publish printed copies (or copies in media that commonly
4127 have printed covers) of the Document, numbering more than 100, and
4128 the Document's license notice requires Cover Texts, you must
4129 enclose the copies in covers that carry, clearly and legibly, all
4130 these Cover Texts: Front-Cover Texts on the front cover, and
4131 Back-Cover Texts on the back cover. Both covers must also clearly
4132 and legibly identify you as the publisher of these copies. The
4133 front cover must present the full title with all words of the title
4134 equally prominent and visible. You may add other material on the
4135 covers in addition. Copying with changes limited to the covers, as
4136 long as they preserve the title of the Document and satisfy these
4137 conditions, can be treated as verbatim copying in other respects.
4138
4139 If the required texts for either cover are too voluminous to fit
4140 legibly, you should put the first ones listed (as many as fit
4141 reasonably) on the actual cover, and continue the rest onto
4142 adjacent pages.
4143
4144 If you publish or distribute Opaque copies of the Document
4145 numbering more than 100, you must either include a machine-readable
4146 Transparent copy along with each Opaque copy, or state in or with
4147 each Opaque copy a computer-network location from which the general
4148 network-using public has access to download using public-standard
4149 network protocols a complete Transparent copy of the Document, free
4150 of added material. If you use the latter option, you must take
4151 reasonably prudent steps, when you begin distribution of Opaque
4152 copies in quantity, to ensure that this Transparent copy will
4153 remain thus accessible at the stated location until at least one
4154 year after the last time you distribute an Opaque copy (directly or
4155 through your agents or retailers) of that edition to the public.
4156
4157 It is requested, but not required, that you contact the authors of
4158 the Document well before redistributing any large number of copies,
4159 to give them a chance to provide you with an updated version of the
4160 Document.
4161
4162 4. MODIFICATIONS
4163
4164 You may copy and distribute a Modified Version of the Document
4165 under the conditions of sections 2 and 3 above, provided that you
4166 release the Modified Version under precisely this License, with the
4167 Modified Version filling the role of the Document, thus licensing
4168 distribution and modification of the Modified Version to whoever
4169 possesses a copy of it. In addition, you must do these things in
4170 the Modified Version:
4171
4172 A. Use in the Title Page (and on the covers, if any) a title
4173 distinct from that of the Document, and from those of previous
4174 versions (which should, if there were any, be listed in the
4175 History section of the Document). You may use the same title
4176 as a previous version if the original publisher of that
4177 version gives permission.
4178
4179 B. List on the Title Page, as authors, one or more persons or
4180 entities responsible for authorship of the modifications in
4181 the Modified Version, together with at least five of the
4182 principal authors of the Document (all of its principal
4183 authors, if it has fewer than five), unless they release you
4184 from this requirement.
4185
4186 C. State on the Title page the name of the publisher of the
4187 Modified Version, as the publisher.
4188
4189 D. Preserve all the copyright notices of the Document.
4190
4191 E. Add an appropriate copyright notice for your modifications
4192 adjacent to the other copyright notices.
4193
4194 F. Include, immediately after the copyright notices, a license
4195 notice giving the public permission to use the Modified
4196 Version under the terms of this License, in the form shown in
4197 the Addendum below.
4198
4199 G. Preserve in that license notice the full lists of Invariant
4200 Sections and required Cover Texts given in the Document's
4201 license notice.
4202
4203 H. Include an unaltered copy of this License.
4204
4205 I. Preserve the section Entitled "History", Preserve its Title,
4206 and add to it an item stating at least the title, year, new
4207 authors, and publisher of the Modified Version as given on the
4208 Title Page. If there is no section Entitled "History" in the
4209 Document, create one stating the title, year, authors, and
4210 publisher of the Document as given on its Title Page, then add
4211 an item describing the Modified Version as stated in the
4212 previous sentence.
4213
4214 J. Preserve the network location, if any, given in the Document
4215 for public access to a Transparent copy of the Document, and
4216 likewise the network locations given in the Document for
4217 previous versions it was based on. These may be placed in the
4218 "History" section. You may omit a network location for a work
4219 that was published at least four years before the Document
4220 itself, or if the original publisher of the version it refers
4221 to gives permission.
4222
4223 K. For any section Entitled "Acknowledgements" or "Dedications",
4224 Preserve the Title of the section, and preserve in the section
4225 all the substance and tone of each of the contributor
4226 acknowledgements and/or dedications given therein.
4227
4228 L. Preserve all the Invariant Sections of the Document, unaltered
4229 in their text and in their titles. Section numbers or the
4230 equivalent are not considered part of the section titles.
4231
4232 M. Delete any section Entitled "Endorsements". Such a section
4233 may not be included in the Modified Version.
4234
4235 N. Do not retitle any existing section to be Entitled
4236 "Endorsements" or to conflict in title with any Invariant
4237 Section.
4238
4239 O. Preserve any Warranty Disclaimers.
4240
4241 If the Modified Version includes new front-matter sections or
4242 appendices that qualify as Secondary Sections and contain no
4243 material copied from the Document, you may at your option designate
4244 some or all of these sections as invariant. To do this, add their
4245 titles to the list of Invariant Sections in the Modified Version's
4246 license notice. These titles must be distinct from any other
4247 section titles.
4248
4249 You may add a section Entitled "Endorsements", provided it contains
4250 nothing but endorsements of your Modified Version by various
4251 parties--for example, statements of peer review or that the text
4252 has been approved by an organization as the authoritative
4253 definition of a standard.
4254
4255 You may add a passage of up to five words as a Front-Cover Text,
4256 and a passage of up to 25 words as a Back-Cover Text, to the end of
4257 the list of Cover Texts in the Modified Version. Only one passage
4258 of Front-Cover Text and one of Back-Cover Text may be added by (or
4259 through arrangements made by) any one entity. If the Document
4260 already includes a cover text for the same cover, previously added
4261 by you or by arrangement made by the same entity you are acting on
4262 behalf of, you may not add another; but you may replace the old
4263 one, on explicit permission from the previous publisher that added
4264 the old one.
4265
4266 The author(s) and publisher(s) of the Document do not by this
4267 License give permission to use their names for publicity for or to
4268 assert or imply endorsement of any Modified Version.
4269
4270 5. COMBINING DOCUMENTS
4271
4272 You may combine the Document with other documents released under
4273 this License, under the terms defined in section 4 above for
4274 modified versions, provided that you include in the combination all
4275 of the Invariant Sections of all of the original documents,
4276 unmodified, and list them all as Invariant Sections of your
4277 combined work in its license notice, and that you preserve all
4278 their Warranty Disclaimers.
4279
4280 The combined work need only contain one copy of this License, and
4281 multiple identical Invariant Sections may be replaced with a single
4282 copy. If there are multiple Invariant Sections with the same name
4283 but different contents, make the title of each such section unique
4284 by adding at the end of it, in parentheses, the name of the
4285 original author or publisher of that section if known, or else a
4286 unique number. Make the same adjustment to the section titles in
4287 the list of Invariant Sections in the license notice of the
4288 combined work.
4289
4290 In the combination, you must combine any sections Entitled
4291 "History" in the various original documents, forming one section
4292 Entitled "History"; likewise combine any sections Entitled
4293 "Acknowledgements", and any sections Entitled "Dedications". You
4294 must delete all sections Entitled "Endorsements."
4295
4296 6. COLLECTIONS OF DOCUMENTS
4297
4298 You may make a collection consisting of the Document and other
4299 documents released under this License, and replace the individual
4300 copies of this License in the various documents with a single copy
4301 that is included in the collection, provided that you follow the
4302 rules of this License for verbatim copying of each of the documents
4303 in all other respects.
4304
4305 You may extract a single document from such a collection, and
4306 distribute it individually under this License, provided you insert
4307 a copy of this License into the extracted document, and follow this
4308 License in all other respects regarding verbatim copying of that
4309 document.
4310
4311 7. AGGREGATION WITH INDEPENDENT WORKS
4312
4313 A compilation of the Document or its derivatives with other
4314 separate and independent documents or works, in or on a volume of a
4315 storage or distribution medium, is called an "aggregate" if the
4316 copyright resulting from the compilation is not used to limit the
4317 legal rights of the compilation's users beyond what the individual
4318 works permit. When the Document is included in an aggregate, this
4319 License does not apply to the other works in the aggregate which
4320 are not themselves derivative works of the Document.
4321
4322 If the Cover Text requirement of section 3 is applicable to these
4323 copies of the Document, then if the Document is less than one half
4324 of the entire aggregate, the Document's Cover Texts may be placed
4325 on covers that bracket the Document within the aggregate, or the
4326 electronic equivalent of covers if the Document is in electronic
4327 form. Otherwise they must appear on printed covers that bracket
4328 the whole aggregate.
4329
4330 8. TRANSLATION
4331
4332 Translation is considered a kind of modification, so you may
4333 distribute translations of the Document under the terms of section
4334 4. Replacing Invariant Sections with translations requires special
4335 permission from their copyright holders, but you may include
4336 translations of some or all Invariant Sections in addition to the
4337 original versions of these Invariant Sections. You may include a
4338 translation of this License, and all the license notices in the
4339 Document, and any Warranty Disclaimers, provided that you also
4340 include the original English version of this License and the
4341 original versions of those notices and disclaimers. In case of a
4342 disagreement between the translation and the original version of
4343 this License or a notice or disclaimer, the original version will
4344 prevail.
4345
4346 If a section in the Document is Entitled "Acknowledgements",
4347 "Dedications", or "History", the requirement (section 4) to
4348 Preserve its Title (section 1) will typically require changing the
4349 actual title.
4350
4351 9. TERMINATION
4352
4353 You may not copy, modify, sublicense, or distribute the Document
4354 except as expressly provided under this License. Any attempt
4355 otherwise to copy, modify, sublicense, or distribute it is void,
4356 and will automatically terminate your rights under this License.
4357
4358 However, if you cease all violation of this License, then your
4359 license from a particular copyright holder is reinstated (a)
4360 provisionally, unless and until the copyright holder explicitly and
4361 finally terminates your license, and (b) permanently, if the
4362 copyright holder fails to notify you of the violation by some
4363 reasonable means prior to 60 days after the cessation.
4364
4365 Moreover, your license from a particular copyright holder is
4366 reinstated permanently if the copyright holder notifies you of the
4367 violation by some reasonable means, this is the first time you have
4368 received notice of violation of this License (for any work) from
4369 that copyright holder, and you cure the violation prior to 30 days
4370 after your receipt of the notice.
4371
4372 Termination of your rights under this section does not terminate
4373 the licenses of parties who have received copies or rights from you
4374 under this License. If your rights have been terminated and not
4375 permanently reinstated, receipt of a copy of some or all of the
4376 same material does not give you any rights to use it.
4377
4378 10. FUTURE REVISIONS OF THIS LICENSE
4379
4380 The Free Software Foundation may publish new, revised versions of
4381 the GNU Free Documentation License from time to time. Such new
4382 versions will be similar in spirit to the present version, but may
4383 differ in detail to address new problems or concerns. See
4384 <https://www.gnu.org/copyleft/>.
4385
4386 Each version of the License is given a distinguishing version
4387 number. If the Document specifies that a particular numbered
4388 version of this License "or any later version" applies to it, you
4389 have the option of following the terms and conditions either of
4390 that specified version or of any later version that has been
4391 published (not as a draft) by the Free Software Foundation. If the
4392 Document does not specify a version number of this License, you may
4393 choose any version ever published (not as a draft) by the Free
4394 Software Foundation. If the Document specifies that a proxy can
4395 decide which future versions of this License can be used, that
4396 proxy's public statement of acceptance of a version permanently
4397 authorizes you to choose that version for the Document.
4398
4399 11. RELICENSING
4400
4401 "Massive Multiauthor Collaboration Site" (or "MMC Site") means any
4402 World Wide Web server that publishes copyrightable works and also
4403 provides prominent facilities for anybody to edit those works. A
4404 public wiki that anybody can edit is an example of such a server.
4405 A "Massive Multiauthor Collaboration" (or "MMC") contained in the
4406 site means any set of copyrightable works thus published on the MMC
4407 site.
4408
4409 "CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0
4410 license published by Creative Commons Corporation, a not-for-profit
4411 corporation with a principal place of business in San Francisco,
4412 California, as well as future copyleft versions of that license
4413 published by that same organization.
4414
4415 "Incorporate" means to publish or republish a Document, in whole or
4416 in part, as part of another Document.
4417
4418 An MMC is "eligible for relicensing" if it is licensed under this
4419 License, and if all works that were first published under this
4420 License somewhere other than this MMC, and subsequently
4421 incorporated in whole or in part into the MMC, (1) had no cover
4422 texts or invariant sections, and (2) were thus incorporated prior
4423 to November 1, 2008.
4424
4425 The operator of an MMC Site may republish an MMC contained in the
4426 site under CC-BY-SA on the same site at any time before August 1,
4427 2009, provided the MMC is eligible for relicensing.
4428
4429 ADDENDUM: How to use this License for your documents
4430 ====================================================
4431
4432 To use this License in a document you have written, include a copy of
4433 the License in the document and put the following copyright and license
4434 notices just after the title page:
4435
4436 Copyright (C) YEAR YOUR NAME.
4437 Permission is granted to copy, distribute and/or modify this document
4438 under the terms of the GNU Free Documentation License, Version 1.3
4439 or any later version published by the Free Software Foundation;
4440 with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
4441 Texts. A copy of the license is included in the section entitled ``GNU
4442 Free Documentation License''.
4443
4444 If you have Invariant Sections, Front-Cover Texts and Back-Cover
4445 Texts, replace the "with...Texts." line with this:
4446
4447 with the Invariant Sections being LIST THEIR TITLES, with
4448 the Front-Cover Texts being LIST, and with the Back-Cover Texts
4449 being LIST.
4450
4451 If you have Invariant Sections without Cover Texts, or some other
4452 combination of the three, merge those two alternatives to suit the
4453 situation.
4454
4455 If your document contains nontrivial examples of program code, we
4456 recommend releasing these examples in parallel under your choice of free
4457 software license, such as the GNU General Public License, to permit
4458 their use in free software.
4459
4460 Concept Index
4461 *************
4462
4463 This is a general index of all issues discussed in this manual, with the
4464 exception of the 'sed' commands and command-line options.
4465
4466 * Menu:
4467
4468 * -e, example: Overview. (line 140)
4469 * -e, example <1>: sed script overview.
4470 (line 415)
4471 * -expression, example: Overview. (line 140)
4472 * -f, example: Overview. (line 140)
4473 * -f, example <1>: sed script overview.
4474 (line 415)
4475 * -file, example: Overview. (line 140)
4476 * -i, example: Overview. (line 120)
4477 * -n, example: Overview. (line 127)
4478 * -s, example: Overview. (line 134)
4479 * 0 address: Reporting Bugs. (line 3934)
4480 * ;, command separator: sed script overview.
4481 (line 415)
4482 * a, and semicolons: sed script overview.
4483 (line 434)
4484 * Additional reading about sed: Other Resources. (line 3809)
4485 * ADDR1,+N: Range Addresses. (line 1611)
4486 * ADDR1,~N: Range Addresses. (line 1611)
4487 * address range, example: sed script overview.
4488 (line 401)
4489 * Address, as a regular expression: Regexp Addresses. (line 1492)
4490 * Address, last line: Numeric Addresses. (line 1456)
4491 * Address, numeric: Numeric Addresses. (line 1451)
4492 * addresses, excluding: Addresses overview. (line 1431)
4493 * Addresses, in sed scripts: Numeric Addresses. (line 1449)
4494 * addresses, negating: Addresses overview. (line 1431)
4495 * addresses, numeric: Addresses overview. (line 1406)
4496 * addresses, range: Addresses overview. (line 1424)
4497 * addresses, regular expression: Addresses overview. (line 1418)
4498 * addresses, syntax: sed script overview.
4499 (line 391)
4500 * alphabetic characters: Character Classes and Bracket Expressions.
4501 (line 1988)
4502 * alphanumeric characters: Character Classes and Bracket Expressions.
4503 (line 1983)
4504 * Append hold space to pattern space: Other Commands. (line 1110)
4505 * Append next input line to pattern space: Other Commands. (line 1083)
4506 * Append pattern space to hold space: Other Commands. (line 1102)
4507 * Appending text after a line: Other Commands. (line 871)
4508 * b, joining lines with: Branching and flow control.
4509 (line 2753)
4510 * b, versus t: Branching and flow control.
4511 (line 2753)
4512 * back-reference: Back-references and Subexpressions.
4513 (line 2166)
4514 * Backreferences, in regular expressions: The "s" Command. (line 616)
4515 * blank characters: Character Classes and Bracket Expressions.
4516 (line 1993)
4517 * bracket expression: Character Classes and Bracket Expressions.
4518 (line 1945)
4519 * Branch to a label, if s/// failed: Extended Commands. (line 1203)
4520 * Branch to a label, if s/// succeeded: Programming Commands.
4521 (line 1139)
4522 * Branch to a label, unconditionally: Programming Commands.
4523 (line 1135)
4524 * branching and n, N: Branching and flow control.
4525 (line 2708)
4526 * branching, infinite loop: Branching and flow control.
4527 (line 2698)
4528 * branching, joining lines: Branching and flow control.
4529 (line 2753)
4530 * Buffer spaces, pattern and hold: Execution Cycle. (line 2470)
4531 * Bugs, reporting: Reporting Bugs. (line 3826)
4532 * c, and semicolons: sed script overview.
4533 (line 434)
4534 * case insensitive, regular expression: Regexp Addresses. (line 1526)
4535 * Case-insensitive matching: The "s" Command. (line 715)
4536 * Caveat -- #n on first line: Common Commands. (line 750)
4537 * character class: Character Classes and Bracket Expressions.
4538 (line 1945)
4539 * character classes: Character Classes and Bracket Expressions.
4540 (line 1982)
4541 * classes of characters: Character Classes and Bracket Expressions.
4542 (line 1982)
4543 * Command groups: Common Commands. (line 821)
4544 * Comments, in scripts: Common Commands. (line 742)
4545 * Conditional branch: Programming Commands.
4546 (line 1139)
4547 * Conditional branch <1>: Extended Commands. (line 1203)
4548 * control characters: Character Classes and Bracket Expressions.
4549 (line 1996)
4550 * Copy hold space into pattern space: Other Commands. (line 1106)
4551 * Copy pattern space into hold space: Other Commands. (line 1098)
4552 * cycle, restarting: Branching and flow control.
4553 (line 2678)
4554 * d, example: sed script overview.
4555 (line 401)
4556 * Delete first line from pattern space: Other Commands. (line 1077)
4557 * digit characters: Character Classes and Bracket Expressions.
4558 (line 2001)
4559 * Disabling autoprint, from command line: Command-Line Options.
4560 (line 178)
4561 * empty regular expression: Regexp Addresses. (line 1501)
4562 * Emptying pattern space: Extended Commands. (line 1225)
4563 * Emptying pattern space <1>: Reporting Bugs. (line 3963)
4564 * Evaluate Bourne-shell commands: Extended Commands. (line 1152)
4565 * Evaluate Bourne-shell commands, after substitution: The "s" Command.
4566 (line 706)
4567 * example, address range: sed script overview.
4568 (line 401)
4569 * example, regular expression: sed script overview.
4570 (line 406)
4571 * Exchange hold space with pattern space: Other Commands. (line 1114)
4572 * Excluding lines: Addresses overview. (line 1431)
4573 * exit status: Exit status. (line 353)
4574 * exit status, example: Exit status. (line 372)
4575 * Extended regular expressions, choosing: Command-Line Options.
4576 (line 290)
4577 * Extended regular expressions, syntax: ERE syntax. (line 1908)
4578 * File name, printing: Extended Commands. (line 1170)
4579 * Files to be processed as input: Command-Line Options.
4580 (line 336)
4581 * Flow of control in scripts: Programming Commands.
4582 (line 1128)
4583 * Global substitution: The "s" Command. (line 672)
4584 * GNU extensions, /dev/stderr file: The "s" Command. (line 699)
4585 * GNU extensions, /dev/stderr file <1>: Other Commands. (line 1066)
4586 * GNU extensions, /dev/stdin file: Other Commands. (line 1053)
4587 * GNU extensions, /dev/stdin file <1>: Extended Commands. (line 1193)
4588 * GNU extensions, /dev/stdout file: Command-Line Options.
4589 (line 344)
4590 * GNU extensions, /dev/stdout file <1>: The "s" Command. (line 699)
4591 * GNU extensions, /dev/stdout file <2>: Other Commands. (line 1066)
4592 * GNU extensions, 0 address: Range Addresses. (line 1611)
4593 * GNU extensions, 0 address <1>: Reporting Bugs. (line 3934)
4594 * GNU extensions, 0,ADDR2 addressing: Range Addresses. (line 1611)
4595 * GNU extensions, ADDR1,+N addressing: Range Addresses. (line 1611)
4596 * GNU extensions, ADDR1,~N addressing: Range Addresses. (line 1611)
4597 * GNU extensions, branch if s/// failed: Extended Commands. (line 1203)
4598 * GNU extensions, case modifiers in s commands: The "s" Command.
4599 (line 627)
4600 * GNU extensions, checking for their presence: Extended Commands.
4601 (line 1209)
4602 * GNU extensions, debug: Command-Line Options.
4603 (line 184)
4604 * GNU extensions, disabling: Command-Line Options.
4605 (line 257)
4606 * GNU extensions, emptying pattern space: Extended Commands. (line 1225)
4607 * GNU extensions, emptying pattern space <1>: Reporting Bugs.
4608 (line 3963)
4609 * GNU extensions, evaluating Bourne-shell commands: The "s" Command.
4610 (line 706)
4611 * GNU extensions, evaluating Bourne-shell commands <1>: Extended Commands.
4612 (line 1152)
4613 * GNU extensions, extended regular expressions: Command-Line Options.
4614 (line 290)
4615 * GNU extensions, g and NUMBER modifier: The "s" Command. (line 678)
4616 * GNU extensions, I modifier: The "s" Command. (line 715)
4617 * GNU extensions, I modifier <1>: Regexp Addresses. (line 1526)
4618 * GNU extensions, in-place editing: Command-Line Options.
4619 (line 211)
4620 * GNU extensions, in-place editing <1>: Reporting Bugs. (line 3915)
4621 * GNU extensions, M modifier: The "s" Command. (line 720)
4622 * GNU extensions, M modifier <1>: Regexp Addresses. (line 1554)
4623 * GNU extensions, modifiers and the empty regular expression: Regexp Addresses.
4624 (line 1501)
4625 * GNU extensions, N~M addresses: Numeric Addresses. (line 1461)
4626 * GNU extensions, quitting silently: Extended Commands. (line 1176)
4627 * GNU extensions, R command: Extended Commands. (line 1193)
4628 * GNU extensions, reading a file a line at a time: Extended Commands.
4629 (line 1193)
4630 * GNU extensions, returning an exit code: Common Commands. (line 758)
4631 * GNU extensions, returning an exit code <1>: Extended Commands.
4632 (line 1176)
4633 * GNU extensions, setting line length: Other Commands. (line 1033)
4634 * GNU extensions, special escapes: Escapes. (line 2223)
4635 * GNU extensions, special escapes <1>: Reporting Bugs. (line 3908)
4636 * GNU extensions, special two-address forms: Range Addresses.
4637 (line 1611)
4638 * GNU extensions, subprocesses: The "s" Command. (line 706)
4639 * GNU extensions, subprocesses <1>: Extended Commands. (line 1152)
4640 * GNU extensions, to basic regular expressions: BRE syntax. (line 1743)
4641 * GNU extensions, to basic regular expressions <1>: BRE syntax.
4642 (line 1789)
4643 * GNU extensions, to basic regular expressions <2>: BRE syntax.
4644 (line 1792)
4645 * GNU extensions, to basic regular expressions <3>: BRE syntax.
4646 (line 1807)
4647 * GNU extensions, to basic regular expressions <4>: BRE syntax.
4648 (line 1817)
4649 * GNU extensions, to basic regular expressions <5>: Reporting Bugs.
4650 (line 3881)
4651 * GNU extensions, two addresses supported by most commands: Other Commands.
4652 (line 887)
4653 * GNU extensions, two addresses supported by most commands <1>: Other Commands.
4654 (line 941)
4655 * GNU extensions, two addresses supported by most commands <2>: Other Commands.
4656 (line 1030)
4657 * GNU extensions, two addresses supported by most commands <3>: Other Commands.
4658 (line 1062)
4659 * GNU extensions, unlimited line length: Limitations. (line 3787)
4660 * GNU extensions, writing first line to a file: Extended Commands.
4661 (line 1220)
4662 * Goto, in scripts: Programming Commands.
4663 (line 1135)
4664 * graphic characters: Character Classes and Bracket Expressions.
4665 (line 2004)
4666 * Greedy regular expression matching: BRE syntax. (line 1843)
4667 * Grouping commands: Common Commands. (line 821)
4668 * hexadecimal digits: Character Classes and Bracket Expressions.
4669 (line 2027)
4670 * Hold space, appending from pattern space: Other Commands. (line 1102)
4671 * Hold space, appending to pattern space: Other Commands. (line 1110)
4672 * Hold space, copy into pattern space: Other Commands. (line 1106)
4673 * Hold space, copying pattern space into: Other Commands. (line 1098)
4674 * Hold space, definition: Execution Cycle. (line 2470)
4675 * Hold space, exchange with pattern space: Other Commands. (line 1114)
4676 * i, and semicolons: sed script overview.
4677 (line 434)
4678 * In-place editing: Reporting Bugs. (line 3915)
4679 * In-place editing, activating: Command-Line Options.
4680 (line 211)
4681 * In-place editing, Perl-style backup file names: Command-Line Options.
4682 (line 222)
4683 * infinite loop, branching: Branching and flow control.
4684 (line 2698)
4685 * Inserting text before a line: Other Commands. (line 930)
4686 * joining lines with branching: Branching and flow control.
4687 (line 2753)
4688 * joining quoted-printable lines: Branching and flow control.
4689 (line 2753)
4690 * labels: Branching and flow control.
4691 (line 2678)
4692 * Labels, in scripts: Programming Commands.
4693 (line 1131)
4694 * Last line, selecting: Numeric Addresses. (line 1456)
4695 * Line length, setting: Command-Line Options.
4696 (line 252)
4697 * Line length, setting <1>: Other Commands. (line 1033)
4698 * Line number, printing: Other Commands. (line 1020)
4699 * Line selection: Numeric Addresses. (line 1449)
4700 * Line, selecting by number: Numeric Addresses. (line 1451)
4701 * Line, selecting by regular expression match: Regexp Addresses.
4702 (line 1492)
4703 * Line, selecting last: Numeric Addresses. (line 1456)
4704 * List pattern space: Other Commands. (line 1033)
4705 * lower-case letters: Character Classes and Bracket Expressions.
4706 (line 2007)
4707 * Mixing g and NUMBER modifiers in the s command: The "s" Command.
4708 (line 678)
4709 * multiple files: Overview. (line 134)
4710 * multiple sed commands: sed script overview.
4711 (line 415)
4712 * n, and branching: Branching and flow control.
4713 (line 2708)
4714 * N, and branching: Branching and flow control.
4715 (line 2708)
4716 * named character classes: Character Classes and Bracket Expressions.
4717 (line 1982)
4718 * newline, command separator: sed script overview.
4719 (line 415)
4720 * Next input line, append to pattern space: Other Commands. (line 1083)
4721 * Next input line, replace pattern space with: Common Commands.
4722 (line 791)
4723 * Non-bugs, 0 address: Reporting Bugs. (line 3934)
4724 * Non-bugs, in-place editing: Reporting Bugs. (line 3915)
4725 * Non-bugs, localization-related: Reporting Bugs. (line 3944)
4726 * Non-bugs, localization-related <1>: Reporting Bugs. (line 3963)
4727 * Non-bugs, N command on the last line: Reporting Bugs. (line 3850)
4728 * Non-bugs, regex syntax clashes: Reporting Bugs. (line 3881)
4729 * numeric addresses: Addresses overview. (line 1406)
4730 * numeric characters: Character Classes and Bracket Expressions.
4731 (line 2001)
4732 * omitting labels: Branching and flow control.
4733 (line 2678)
4734 * output: Overview. (line 120)
4735 * output, suppressing: Overview. (line 127)
4736 * p, example: Overview. (line 127)
4737 * paragraphs, processing: Multiline techniques.
4738 (line 2553)
4739 * parameters, script: Overview. (line 140)
4740 * Parenthesized substrings: The "s" Command. (line 616)
4741 * Pattern space, definition: Execution Cycle. (line 2470)
4742 * Portability, comments: Common Commands. (line 745)
4743 * Portability, line length limitations: Limitations. (line 3787)
4744 * Portability, N command on the last line: Reporting Bugs. (line 3850)
4745 * POSIXLY_CORRECT behavior, bracket expressions: Character Classes and Bracket Expressions.
4746 (line 2051)
4747 * POSIXLY_CORRECT behavior, enabling: Command-Line Options.
4748 (line 260)
4749 * POSIXLY_CORRECT behavior, escapes: Escapes. (line 2228)
4750 * POSIXLY_CORRECT behavior, N command: Reporting Bugs. (line 3876)
4751 * Print first line from pattern space: Other Commands. (line 1095)
4752 * printable characters: Character Classes and Bracket Expressions.
4753 (line 2011)
4754 * Printing file name: Extended Commands. (line 1170)
4755 * Printing line number: Other Commands. (line 1020)
4756 * Printing text unambiguously: Other Commands. (line 1033)
4757 * processing paragraphs: Multiline techniques.
4758 (line 2553)
4759 * punctuation characters: Character Classes and Bracket Expressions.
4760 (line 2014)
4761 * Q, example: Exit status. (line 372)
4762 * q, example: sed script overview.
4763 (line 406)
4764 * Quitting: Common Commands. (line 758)
4765 * Quitting <1>: Extended Commands. (line 1176)
4766 * quoted-printable lines, joining: Branching and flow control.
4767 (line 2753)
4768 * range addresses: Addresses overview. (line 1424)
4769 * range expression: Character Classes and Bracket Expressions.
4770 (line 1957)
4771 * Range of lines: Range Addresses. (line 1586)
4772 * Range with start address of zero: Range Addresses. (line 1611)
4773 * Read next input line: Common Commands. (line 791)
4774 * Read text from a file: Other Commands. (line 1045)
4775 * Read text from a file <1>: Extended Commands. (line 1193)
4776 * regex addresses and input lines: Regexp Addresses. (line 1563)
4777 * regex addresses and pattern space: Regexp Addresses. (line 1563)
4778 * regular expression addresses: Addresses overview. (line 1418)
4779 * regular expression, example: sed script overview.
4780 (line 406)
4781 * Replace hold space with copy of pattern space: Other Commands.
4782 (line 1098)
4783 * Replace pattern space with copy of hold space: Other Commands.
4784 (line 1106)
4785 * Replacing all text matching regexp in a line: The "s" Command.
4786 (line 672)
4787 * Replacing only Nth match of regexp in a line: The "s" Command.
4788 (line 676)
4789 * Replacing selected lines with other text: Other Commands. (line 983)
4790 * Requiring GNU sed: Extended Commands. (line 1209)
4791 * restarting a cycle: Branching and flow control.
4792 (line 2678)
4793 * Sandbox mode: Command-Line Options.
4794 (line 312)
4795 * script parameter: Overview. (line 140)
4796 * Script structure: sed script overview.
4797 (line 384)
4798 * Script, from a file: Command-Line Options.
4799 (line 206)
4800 * Script, from command line: Command-Line Options.
4801 (line 201)
4802 * sed commands syntax: sed script overview.
4803 (line 391)
4804 * sed commands, multiple: sed script overview.
4805 (line 415)
4806 * sed script structure: sed script overview.
4807 (line 384)
4808 * Selecting lines to process: Numeric Addresses. (line 1449)
4809 * Selecting non-matching lines: Addresses overview. (line 1431)
4810 * semicolons, command separator: sed script overview.
4811 (line 415)
4812 * Several lines, selecting: Range Addresses. (line 1586)
4813 * Slash character, in regular expressions: Regexp Addresses. (line 1511)
4814 * space characters: Character Classes and Bracket Expressions.
4815 (line 2019)
4816 * Spaces, pattern and hold: Execution Cycle. (line 2470)
4817 * Special addressing forms: Range Addresses. (line 1611)
4818 * standard input: Overview. (line 112)
4819 * Standard input, processing as input: Command-Line Options.
4820 (line 338)
4821 * standard output: Overview. (line 120)
4822 * stdin: Overview. (line 112)
4823 * stdout: Overview. (line 120)
4824 * Stream editor: Introduction. (line 86)
4825 * subexpression: Back-references and Subexpressions.
4826 (line 2166)
4827 * Subprocesses: The "s" Command. (line 706)
4828 * Subprocesses <1>: Extended Commands. (line 1152)
4829 * Substitution of text, options: The "s" Command. (line 668)
4830 * suppressing output: Overview. (line 127)
4831 * syntax, addresses: sed script overview.
4832 (line 391)
4833 * syntax, sed commands: sed script overview.
4834 (line 391)
4835 * t, joining lines with: Branching and flow control.
4836 (line 2753)
4837 * t, versus b: Branching and flow control.
4838 (line 2753)
4839 * Text, appending: Other Commands. (line 871)
4840 * Text, deleting: Common Commands. (line 774)
4841 * Text, insertion: Other Commands. (line 930)
4842 * Text, printing: Common Commands. (line 782)
4843 * Text, printing after substitution: The "s" Command. (line 686)
4844 * Text, writing to a file after substitution: The "s" Command.
4845 (line 699)
4846 * Transliteration: Other Commands. (line 837)
4847 * Unbuffered I/O, choosing: Command-Line Options.
4848 (line 319)
4849 * upper-case letters: Character Classes and Bracket Expressions.
4850 (line 2023)
4851 * Usage summary, printing: Command-Line Options.
4852 (line 172)
4853 * Version, printing: Command-Line Options.
4854 (line 168)
4855 * whitespace characters: Character Classes and Bracket Expressions.
4856 (line 2019)
4857 * Working on separate files: Command-Line Options.
4858 (line 303)
4859 * Write first line to a file: Extended Commands. (line 1220)
4860 * Write to a file: Other Commands. (line 1066)
4861 * xdigit class: Character Classes and Bracket Expressions.
4862 (line 2027)
4863 * Zero, as range start address: Range Addresses. (line 1611)
4864
4865 Command and Option Index
4866 ************************
4867
4868 This is an alphabetical list of all 'sed' commands and command-line
4869 options.
4870
4871 * Menu:
4872
4873 * # (comments): Common Commands. (line 742)
4874 * --binary: Command-Line Options.
4875 (line 269)
4876 * --debug: Command-Line Options.
4877 (line 184)
4878 * --expression: Command-Line Options.
4879 (line 201)
4880 * --file: Command-Line Options.
4881 (line 206)
4882 * --follow-symlinks: Command-Line Options.
4883 (line 280)
4884 * --help: Command-Line Options.
4885 (line 172)
4886 * --in-place: Command-Line Options.
4887 (line 211)
4888 * --line-length: Command-Line Options.
4889 (line 252)
4890 * --null-data: Command-Line Options.
4891 (line 327)
4892 * --posix: Command-Line Options.
4893 (line 257)
4894 * --quiet: Command-Line Options.
4895 (line 178)
4896 * --regexp-extended: Command-Line Options.
4897 (line 290)
4898 * --sandbox: Command-Line Options.
4899 (line 312)
4900 * --separate: Command-Line Options.
4901 (line 303)
4902 * --silent: Command-Line Options.
4903 (line 178)
4904 * --unbuffered: Command-Line Options.
4905 (line 319)
4906 * --version: Command-Line Options.
4907 (line 168)
4908 * --zero-terminated: Command-Line Options.
4909 (line 327)
4910 * -b: Command-Line Options.
4911 (line 269)
4912 * -e: Command-Line Options.
4913 (line 201)
4914 * -E: Command-Line Options.
4915 (line 290)
4916 * -f: Command-Line Options.
4917 (line 206)
4918 * -i: Command-Line Options.
4919 (line 211)
4920 * -l: Command-Line Options.
4921 (line 252)
4922 * -n: Command-Line Options.
4923 (line 178)
4924 * -n, forcing from within a script: Common Commands. (line 750)
4925 * -r: Command-Line Options.
4926 (line 290)
4927 * -s: Command-Line Options.
4928 (line 303)
4929 * -u: Command-Line Options.
4930 (line 319)
4931 * -z: Command-Line Options.
4932 (line 327)
4933 * : (label) command: Programming Commands.
4934 (line 1131)
4935 * = (print line number) command: Other Commands. (line 1020)
4936 * {} command grouping: Common Commands. (line 821)
4937 * a (append text lines) command: Other Commands. (line 871)
4938 * alnum character class: Character Classes and Bracket Expressions.
4939 (line 1983)
4940 * alpha character class: Character Classes and Bracket Expressions.
4941 (line 1988)
4942 * b (branch) command: Programming Commands.
4943 (line 1135)
4944 * blank character class: Character Classes and Bracket Expressions.
4945 (line 1993)
4946 * c (change to text lines) command: Other Commands. (line 983)
4947 * cntrl character class: Character Classes and Bracket Expressions.
4948 (line 1996)
4949 * D (delete first line) command: Other Commands. (line 1077)
4950 * d (delete) command: Common Commands. (line 774)
4951 * digit character class: Character Classes and Bracket Expressions.
4952 (line 2001)
4953 * e (evaluate) command: Extended Commands. (line 1152)
4954 * F (File name) command: Extended Commands. (line 1170)
4955 * G (appending Get) command: Other Commands. (line 1110)
4956 * g (get) command: Other Commands. (line 1106)
4957 * graph character class: Character Classes and Bracket Expressions.
4958 (line 2004)
4959 * H (append Hold) command: Other Commands. (line 1102)
4960 * h (hold) command: Other Commands. (line 1098)
4961 * i (insert text lines) command: Other Commands. (line 930)
4962 * l (list unambiguously) command: Other Commands. (line 1033)
4963 * lower character class: Character Classes and Bracket Expressions.
4964 (line 2007)
4965 * N (append Next line) command: Other Commands. (line 1083)
4966 * n (next-line) command: Common Commands. (line 791)
4967 * P (print first line) command: Other Commands. (line 1095)
4968 * p (print) command: Common Commands. (line 782)
4969 * print character class: Character Classes and Bracket Expressions.
4970 (line 2011)
4971 * punct character class: Character Classes and Bracket Expressions.
4972 (line 2014)
4973 * q (quit) command: Common Commands. (line 758)
4974 * Q (silent Quit) command: Extended Commands. (line 1176)
4975 * r (read file) command: Other Commands. (line 1045)
4976 * R (read line) command: Extended Commands. (line 1193)
4977 * s command, option flags: The "s" Command. (line 668)
4978 * space character class: Character Classes and Bracket Expressions.
4979 (line 2019)
4980 * T (test and branch if failed) command: Extended Commands. (line 1203)
4981 * t (test and branch if successful) command: Programming Commands.
4982 (line 1139)
4983 * upper character class: Character Classes and Bracket Expressions.
4984 (line 2023)
4985 * v (version) command: Extended Commands. (line 1209)
4986 * w (write file) command: Other Commands. (line 1066)
4987 * W (write first line) command: Extended Commands. (line 1220)
4988 * x (eXchange) command: Other Commands. (line 1114)
4989 * xdigit character class: Character Classes and Bracket Expressions.
4990 (line 2027)
4991 * y (transliterate) command: Other Commands. (line 837)
4992 * z (Zap) command: Extended Commands. (line 1225)
4993
