csplit - split files based on context

csplit splits the contents of a file or the text it reads from standard input into smaller sections and writes all or some of these sections to separate output files. The original file is left unaltered.
The way in which csplit divides a file and the sections for which output files are created are specified in the command-line arguments.
csplit creates a maximum of 100 output files per call.


Syntax


csplit[ -ks][ -f name][ -n number] file arg1 ... argn

No option specified:

The output files are named xx00, xx01, and so on.

For each output file that it creates, csplit writes a character count on standard output.
Any files that have already been created are removed if an error occurs.

-f name

The output files are called name00, name01, etc.

-f not specified:
The output files are named xx00, xx01, and so on.

-k

Files that have already been created are retained if an error occurs.

-n number

The numeric suffix in the output file names consists of number digits, where 1 <= number <= 9.

Example

For -n 4, the output files are called xx0000, xx0001 etc.

-n not specified:
The numeric suffix consists of 2 digits.

-s

The output of a character count is suppressed.
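The options can be combined. The following sketch assumes a POSIX-style csplit (for example the one from GNU coreutils) and uses a made-up five-line file sample.txt in a scratch directory; the prefix part and the suffix width 3 are arbitrary choices for illustration.

```shell
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical five-line input file.
printf '%s\n' one two three four five > sample.txt

# Without -s, csplit prints one character count per output file.
counts=$(csplit -f part -n 3 sample.txt 3)
echo "$counts"

# With -s the counts are suppressed; the files are written again.
quiet=$(csplit -s -f part -n 3 sample.txt 3)

# -f part and -n 3 yield part000 and part001 instead of xx00 and xx01.
ls part*
```

The first call reports 8 and 16 here, because part000 holds lines 1 and 2 and part001 holds the remaining three lines, each counted with its newline character.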

file

Name of the input file.
If you use a dash (-) as the name for file, csplit reads from standard input.
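As a minimal sketch of reading from standard input, assuming a POSIX-style csplit and four made-up input lines generated with printf:

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# The dash makes csplit read the generated lines from standard input.
# The argument 3 splits the input before its third line.
printf '%s\n' alpha beta gamma delta | csplit -s - 3

# xx00 now holds the first two lines, xx01 the last two.
cat xx01
```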

arg1 ... argn

You can specify several arguments, each of which references a particular line in the input file. These lines represent the points at which csplit is to split the file into sections. Each dividing line becomes the first line of a new section. If you specify n arguments, csplit divides the file into n+1 sections.

csplit usually writes each section to a separate output file.

This does not apply when the argument %regular_expression%[+number][-number] is used (see below). The last section (section n) is always written to an output file.

The arguments you specify are processed by csplit in the order in which you list them. To begin with, the first line of the input file is the current line. After an argument has been processed, the line referenced by this argument becomes the current line. The line referenced by the next argument must lie in the range between but not including the current line and the end of the input file. Thus the line referenced by the second argument must come after the line referenced by the first argument.

argument can be specified as follows:

/regexp/[+number][-number]

An argument in the form /regexp/ references the next line after the current line that matches the specified regular expression. The section from the current line up to but not including the line that matches the regular expression is written to an output file. The line matching the regular expression now becomes the current line.

The +number or -number offset moves the dividing line to the line that is number lines after (+) or before (-) the line matching the regular expression. That shifted line, rather than the matching line itself, then becomes the current line.

Simple regular expressions (see Tables and directories, Regular POSIX shell expressions) are recognized. If the argument contains blanks or shell metacharacters (see Tables and directories, POSIX shell metanotation), you must either escape every such character with a backslash \ or enclose the whole argument in single quotes '...'. The regular expression must not contain any newline characters.
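The effect of the offsets can be compared side by side. This sketch assumes a POSIX-style csplit; the file doc.txt, its contents, and the prefixes plain, minus and plus are invented for illustration.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical six-line input; "SECTION two" is on line 5.
printf '%s\n' 'title' 'intro text' 'SECTION one' 'text one' \
    'SECTION two' 'text two' > doc.txt

# No offset: the second piece starts at the matching line (line 5).
csplit -s -f plain doc.txt '/^SECTION two/'

# Offset -1: the second piece starts one line earlier (line 4).
csplit -s -f minus doc.txt '/^SECTION two/-1'

# Offset +1: the second piece starts one line later (line 6).
csplit -s -f plus doc.txt '/^SECTION two/+1'

# Show the first line of each second piece.
head -n 1 plain01 minus01 plus01
```

The single quotes keep the shell from interpreting the blank and the ^ inside the regular expression.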

%regexp%[+number][-number]

An argument in the form %regexp% references the next line after the current line that matches the specified regular expression. The line that matches the regular expression becomes the current line. csplit in this case does not create an output file for the relevant section.

If the +number or -number offset is also specified, the current line will be the line that is number lines after (+) or before (-) the line containing the regular expression.

Simple regular expressions (see section "Regular POSIX shell expressions") are recognized. If the argument contains blanks or shell metacharacters (see section "Metacharacters for the POSIX shell"), you must either escape every such character with a backslash \ or enclose the whole argument in single quotes '...'. The regular expression must not contain any newline characters.
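A short sketch of skipping a leading section, assuming a POSIX-style csplit; the file log.txt and the marker line START are made up for illustration.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical input: two header lines, then the part of interest.
printf '%s\n' 'header 1' 'header 2' 'START' 'data 1' 'data 2' > log.txt

# %...% moves the current line to START without writing the header
# lines anywhere. The rest of the file becomes the only output file.
csplit -s log.txt '%^START%'

cat xx00
```

Because no file is created for the skipped section, numbering of the output files still begins at xx00.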

num

This argument references the line with line number num. csplit writes the section from the current line up to but not including the numth line to an output file. The numth line then becomes the current line.
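For illustration, a sketch with two line-number arguments, assuming a POSIX-style csplit and a made-up six-line file data.txt:

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 6 > data.txt

# Split before line 3 and again before line 5.
csplit -s data.txt 3 5

# xx00 holds lines 1-2, xx01 lines 3-4, xx02 lines 5-6.
cat xx01
```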

{n}

This argument is an abbreviation for n arguments of the previous type (see above) and means: "repeat the preceding argument n times", where n is an integer greater than 1.

The {n} argument can be entered after any of the above-mentioned arguments, with a blank to separate them. Thus if it follows an argument in the form /regexp/[+number][-number] or %regexp%[+number][-number], this argument will be repeated n times.

Example

'/regexp/' {2} is an abbreviation for '/regexp/' '/regexp/' '/regexp/'

If {n} follows an argument of the num type, the file will be split n times, from the numth line onward, into sections of num lines each.

Example

100 {2} is an abbreviation for 100 200 300
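The equivalence of the repeated and the written-out form can be checked directly. This sketch assumes a POSIX-style csplit; the ten-line file ten.txt and the prefixes rep and exp are invented for illustration.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > ten.txt

# Repeated form: split before line 3, then twice more every 3 lines.
csplit -s -f rep ten.txt 3 {2}

# Written-out form with the same split points.
csplit -s -f exp ten.txt 3 6 9

# Both calls should produce four pairwise identical pieces.
for i in 00 01 02 03; do
    cmp "rep$i" "exp$i" && echo "piece $i identical"
done
```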

Error

argument - out of range

The line referenced by the specified argument lies outside the permissible range. The legal range is from, but not including, the current line to the end of the file.


100 file limit reached at arg ...

You have specified so many arguments that csplit would need to create more than 100 output files.
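The out-of-range case is easy to provoke with a short file. The following sketch assumes a POSIX-style csplit that returns a non-zero exit status on this error (as GNU coreutils does); the file short.txt is made up, and the error text itself is discarded because its wording differs between implementations.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 > short.txt

# Line 50 does not exist. Without -k, csplit removes the files it
# has already created before it terminates.
if ! csplit -s short.txt 2 50 2>/dev/null; then
    if [ -e xx00 ]; then removed=no; else removed=yes; fi
    echo "without -k: xx00 removed: $removed"
fi

# With -k, the files written before the error are retained.
if ! csplit -s -k short.txt 2 50 2>/dev/null; then
    echo "with -k: xx00 contains: $(cat xx00)"
fi
```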

Locale

The following environment variables affect the execution of csplit:

LANG

Provide a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the implementation-specific default locale will be used. If any of the internationalization variables contains an invalid setting, the utility will behave as if none of the variables had been defined.

LC_ALL

If set to a non-empty string value, override the values of all the other internationalization variables.

LC_COLLATE

Determine the locale for the behavior of ranges, equivalence classes and multicharacter collating elements within regular expressions.

LC_CTYPE

Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters in arguments). LC_CTYPE governs character classes, character conversion (shifting) and the behavior of character classes in regular expressions.

LC_MESSAGES

Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.

NLSPATH

Determine the location of message catalogs for the processing of LC_MESSAGES.

Example 1

The file book contains a text that is subdivided into three chapters. The first chapter is preceded by a preface; an appendix follows the last chapter. Each chapter begins with the title "CHAPTER ..."; the title of the appendix is "APPENDIX".

You now wish to put the preface, the individual chapters, and the appendix into separate files. The output files are to be named chap00, chap01, etc.

$ csplit -f chap book '/CHAPTER/' '/CHAPTER/' '/CHAPTER/' '/APPENDIX/'
1636
15124
32743
20344
2576
$ ls
book
chap00
chap01
chap02
chap03
chap04

The file chap00 contains the preface and consists of 1636 characters. The appendix is located in the file chap04.
The same results could also have been obtained by abbreviating the csplit call as follows:

$ csplit -f chap book '/CHAPTER/' {2} '/APPENDIX/'
:

You can now edit the sections separately, and later you can join them again using cat:

$ cat chap0[0-4] > book
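The split-and-rejoin cycle can be tried on a small stand-in for the book file. This sketch assumes a POSIX-style csplit; the nine-line contents and the copy book.orig are invented for illustration.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical miniature book: preface, three chapters, appendix.
printf '%s\n' 'Preface text' \
    'CHAPTER 1' 'first text' \
    'CHAPTER 2' 'second text' \
    'CHAPTER 3' 'third text' \
    'APPENDIX' 'notes' > book
cp book book.orig

# Same arguments as in the abbreviated call above.
csplit -s -f chap book '/CHAPTER/' {2} '/APPENDIX/'

# Rejoin the five pieces and compare with the untouched copy.
cat chap0[0-4] > book
cmp book book.orig && echo "round trip OK"
```

Since csplit only copies lines and never alters them, concatenating all pieces in order reproduces the original file, provided no section was skipped with a %regexp% argument.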

Example 2

The input file file is to be split into sections every hundred lines. To do this, you enter:

$ csplit file 100 {98}

The argument {98} stands for 98 arguments: 200 300 ... 9900.

If file contains 9900 or more lines, csplit creates 100 output files. The first output file, xx00, includes lines 1 to 99 (inclusive); the last output file, xx99, contains the rest of file from line 9900 onward.
If file contains fewer than 9900 lines, csplit issues the error message "{98} - out of range" and terminates. If you include option -k in the call, the files already created are retained.

$ csplit -k file 100 {98}
If file contains only 9830 lines, for example, then xx98 is the last output file created and includes lines 9800 to 9830.
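The same pattern can be tried at a smaller scale. This sketch assumes a POSIX-style csplit and a made-up 35-line file file35.txt that is split every 10 lines, so that all split points lie within the file.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical input: the numbers 1 to 35, one per line.
i=1
while [ "$i" -le 35 ]; do
    echo "$i"
    i=$((i + 1))
done > file35.txt

# 10 {2} stands for the split points 10, 20 and 30.
csplit -s file35.txt 10 {2}

# xx00 holds lines 1-9; xx03 holds the rest from line 30 onward.
head -n 1 xx03
tail -n 1 xx03
```

As in the full-size example, the first piece is one line shorter than the others because the dividing line itself starts the next piece.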

Example 3

The file prog.c contains a C source program. The program includes a main function and a maximum of 20 further functions. In accordance with C conventions, each function ends with a right brace at the beginning of a line (in column 1). Right braces within a function are not located in the first column of a line.
Each function is now to be written to a separate file. To do this, you enter:

$ csplit -k prog.c '%main(%' '/^}/+1' {19}

If the program contains exactly 20 functions in addition to the main function, csplit splits the file into 22 sections.

Section 0 contains all lines from the beginning of the file up to but not including the start of the main function. This section will not be written to an output file (argument %main(%).

Section 1 contains the main function and is written to the output file xx00 (argument /^}/+1).

Functions 1 to 19 are similarly written to separate output files in succession (argument {19}). The final section, i.e. section 21, contains the rest of the input file (which in this case is function 20) and is written to the output file xx20.

If the program contains fewer than 20 functions, csplit will terminate at the last function and issue the error message "{19} - out of range". Since the -k option has been set, the created files will, however, be retained.
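A reduced version of this example with only two further functions can be run as follows. The sketch assumes a POSIX-style csplit; the contents of prog.c are a made-up stand-in, and {1} replaces {19} because one repetition is enough to reach the last function.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical C source: main plus two further functions.
cat > prog.c <<'EOF'
#include <stdio.h>
int main(void)
{
    return 0;
}
int f1(void)
{
    return 1;
}
int f2(void)
{
    return 2;
}
EOF

# Skip everything before main, then cut after each closing brace in
# column 1. The remainder (function f2) goes into the last file.
csplit -s -k prog.c '%main(%' '/^}/+1' {1}

head -n 1 xx00 xx01 xx02
```

The #include line belongs to the skipped section and therefore does not appear in any output file.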

See also

ed, sh, split