This statement extracts files from the currently opened ZIP file.
EXTRACT-FILE
FILE-NAME = *ALL
All the files included in the container are extracted.
FILE-NAME = <composed-name 1..98 with-under with-wild(132)>
The specified file is extracted from the container. When wildcards are used, all files matching the pattern are extracted from the container.
FILE-NAME = <c-string 1..1024 with-low>
All files which match the specified string are extracted from the ZIP container. Wildcards are permitted according to the SDF rules for wildcard selection (see SDF syntax in the “Commands” manual [2]). Specification as a C string must be used if the container was created in a non-BS2000 system and the file names concerned do not comply with BS2000 syntax (e.g. upper/lower case).
FILE-NAME = *PATH-NAME(...)
The specified file is extracted from the container. The operand will not interpret characters like forward slash, asterisk, square brackets and others as wildcards, but as part of the filename. Use this operand to specify a relative path name inside of the ZIP container.
TO-FILE = *BY-SOURCE / <filename 1..54 without-gen-vers with-wild-constr(80)>
Depending on the origin of the zipped files, the output name is built according to the following rules:
For BS2000 files, the output name is built according to the SDF rules for wildcard construction (see SDF syntax in the “Commands” manual [2]).
For other files (PC, Unix system), the output name is built by replacing the '*' character in the TO-FILE operand by the file name registered in the container. Any path of the file is ignored in the file name construction process.
TO-FILE = | File origin BS2000 | File origin not BS2000 |
---|---|---|
*BY-SOURCE | The output file name will be the file name as registered in the container. If the file has been registered with a catid/userid, it will be extracted under this catid/userid. If the file has been registered without catid/userid, it is extracted under the current catid/userid. | The output file name will be the file name without access path, prefixed by the current catid/userid. File names which do not comply with the syntax in BS2000 are renamed by BS2ZIP (see below). |
<filename 1..54 without-gen-vers with-wild-constr(80)> | Valid format is: a file name without wildcard, or a file name with wildcards, in which case the output name is built according to the SDF rules for wildcard construction. If the new file name does not respect BS2000 syntax, the extraction of the file is rejected. | Valid format is: a file name without wildcard, or [<PREFIX>]*[<SUFFIX>], in which case the output file name will be [<PREFIX>]filename[<SUFFIX>], where filename is the registered file name without directory path. If the resultant file name does not comply with the syntax in BS2000, it is renamed by BS2ZIP (see below). |
If some zipped file names are not BS2000 compliant, it is necessary to extract them one by one, specifying for each of them a valid output name. If extraction results in an invalid file name, the file concerned is assigned the following alternative file name:
FILExxxx.yyyymmdd.hhmmss
where xxxx is a sequence number and yyyymmdd.hhmmss is the current date and time. BS2ZIP shows that the file has been renamed with the message SZP0090. Example:
% SZP0090 Warning. File name 'TEST_KDO_1.HTM' is not BS2000 compliant.
The file will be extracted under the name 'FILE0001.20111108.161442'
The catid/userid of those files is taken from <PREFIX>. If it is not specified, the catid/userid of the current user is used by default.
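The alternative name format described above can be illustrated with a short Python sketch. It is not BS2ZIP code: the function name is hypothetical, and the zero-padded four-digit sequence number is an assumption inferred from the SZP0090 example.

```python
from datetime import datetime


def alternative_file_name(sequence_number: int, now: datetime) -> str:
    """Build a name of the form FILExxxx.yyyymmdd.hhmmss.

    Assumption: xxxx is a zero-padded four-digit sequence number,
    as suggested by the SZP0090 example (FILE0001).
    """
    if not 0 <= sequence_number <= 9999:
        raise ValueError("sequence number must fit into four digits")
    return f"FILE{sequence_number:04d}.{now:%Y%m%d}.{now:%H%M%S}"


# Reproduces the name shown in the SZP0090 example message.
print(alternative_file_name(1, datetime(2011, 11, 8, 16, 14, 42)))
```

With the date and time of the example message, the sketch yields FILE0001.20111108.161442.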
WRITE-MODE =
Controls whether an existing file with the same name as the output file can be replaced. The CODED-CHARACTER-SET attribute of the existing file is used to infer both the current encoding of the file in the archive and the encoding of the extracted file after character conversion. Depending on the CHARACTER-CONVERSION operand, the following conversion is performed:
CHARACTER-CONVERSION operand | C-C-S of existing file | Current encoding of file in archive | Character conversion type |
---|---|---|---|
*BY-CONTAINER-FORMAT | Non-UTF EBCDIC (EDF041, EDF042, ...) | Non-UTF ASCII (ISO88591, ISO88592, ...) | ASCII -> EBCDIC |
*BY-CONTAINER-FORMAT | Non-UTF ASCII (ISO88591, ISO88592, ...) | Same non-UTF ASCII (ISO88591, ISO88592, ...) | No conversion |
*BY-CONTAINER-FORMAT | UTF (UTF8, UTF16, UTFE) | Same encoding | No conversion |
*NO | Any encoding | Same encoding | No conversion |
*TO-WIN-ANSI | Non-UTF ASCII (ISO88591, ISO88592, ...) | Non-UTF EBCDIC (EDF041, EDF042, ...) | EBCDIC -> ASCII |
*TO-WIN-ANSI | Non-UTF EBCDIC (EDF041, EDF042, ...) | Extraction error | Extraction error |
*TO-WIN-ANSI | UTF (UTF8, UTF16, UTFE) | Same UTF (UTF8, UTF16, UTFE) | No conversion |
*TO-EBCDIC | Non-UTF ASCII (ISO88591, ISO88592, ...) | Extraction error | Extraction error |
*TO-EBCDIC | Non-UTF EBCDIC (EDF041, EDF042, ...) | Non-UTF ASCII (ISO88591, ISO88592, ...) | ASCII -> EBCDIC |
*TO-EBCDIC | UTF (UTF8, UTF16, UTFE) | Same UTF (UTF8, UTF16, UTFE) | No conversion |
*BY-PARAMETERS(FROM-CCS,TO-CCS) | Any encoding (existing file is ignored) | FROM-CCS value | FROM-CCS -> TO-CCS |
WRITE-MODE = *CREATE
The output file must not exist and is created. The statement is rejected if the file exists already.
WRITE-MODE = *REPLACE-ONLY
The output file must exist and is replaced. The statement is rejected if the file does not exist. If BS2ZIP extracts a foreign text file, it checks the CODED-CHARACTER-SET attribute of the existing file to be replaced. This attribute is used as the encoding of the extracted file. From this attribute BS2ZIP also infers the CCSNAME of the encoding in which the file is stored inside the archive, as described in the table above.
WRITE-MODE = *ANY
The output file is overwritten if it exists, or created otherwise. If BS2ZIP extracts a foreign text file and replaces an existing file, it checks the CODED-CHARACTER-SET attribute of the replaced file. This attribute is used as the encoding of the extracted file. From this attribute BS2ZIP also infers the CCSNAME of the encoding in which the file is stored inside the archive, as described in the table above.
DATA-TYPE =
This operand controls the record structure of the files to be extracted.
DATA-TYPE | Container format | Container format |
*NOT-SPECIFIED | If no file-info found in ZIP container | If no file-info found |
*CHARACTER | File is extracted as a SAM file | Statement rejected |
*BINARY | File is extracted as a PAM file | Statement rejected |
*SAM-BINARY | File is extracted as a binary SAM file (REC-FORM=U) | Statement rejected |
CHARACTER-CONVERSION =
This operand controls the WIN-ANSI/EBCDIC conversion. It is only supported for the following files:
files extracted with DATA-TYPE=*CHARACTER
files extracted with DATA-TYPE=*NOT-SPECIFIED and which are not indicated as PAM file in the file-info.
The type of conversion is determined by the presence and values of the CODED-CHAR-SET attribute and of the extensible data field. Both can be checked with the SHOW-FILE-ATTRIBUTES command with the operand INFORMATION=*ALL.
CHARACTER-CONVERSION | Conversion behaviour |
---|---|
*BY-CONTAINER-FORMAT | Default value. Behavior determined by the ZIP container open format. Attempts to restore the original file encoding. |
*NO | No conversion is performed. |
*TO-WIN-ANSI | WIN-ANSI conversion is performed (only for extraction to SAM/ISAM file). |
*TO-EBCDIC | EBCDIC conversion is performed (only for extraction to SAM/ISAM file). |
*BY-PARAMETERS(...) | Conversion is made according to the specified source and target character sets (only for extraction to SAM/ISAM file). |
FROM-CCS | Origin coded character set, from which all characters will be converted. |
TO-CCS=*STD | The target coded character set is picked by finding the first fully compatible EBCDIC coded character set with the same ISO code variant number as the origin coded character set. If the origin coded character set is Unicode, then no conversion is performed (for the ISO code variant number, see the XHCS manual [5]). |
TO-CCS | Target coded character set, to which all characters will be converted. |
The type of conversion is determined by:
- CHARACTER-CONVERSION operand value
- DATA-TYPE operand value
- WRITE-MODE operand and CODED-CHARACTER-SET attribute of the replaced file
- the presence and the value of the FCBTYPE attribute inside of the file-info
- the presence and the value of the CODED-CHAR-SET attribute inside of the file-info
- the presence and the value of the extensible data field.
- Scan of the first 32 kB of the file (It is performed only if the CODED-CHAR-SET attribute inside of the file-info is not present)
Metadata of a file in an archive can be checked with the SHOW-FILE-ATTRIBUTES command with the operand INFORMATION=*ALL (COMMENTS for the file-info and CCSNAME for the extensible data field). The metadata of all files added to an archive by BS2ZIP should contain a file-info with the file attributes. The file-info contains the attribute CODED-CHAR-SET, which is the original encoding of the added file.
If the file-info is not present in the metadata of a file in the archive, the file is assumed to be foreign, i.e. it was added by a third-party archive manager on an open system. By default such files are scanned to determine the most likely encoding among ISO8859F, WCP1252, UTF8 and UTF16. The encoding detection algorithm is probabilistic, so it is recommended to set the type of character conversion manually.
Starting from V21.0B10, BS2ZIP adds the extensible data field to the metadata of all files in archives. It contains a field with the name of the encoding in which the file is stored inside the archive.
Character conversion of files in Unicode variants is only available with CHARACTER-CONVERSION=*BY-PARAMETERS(). Extraction of Unicode files with all other values of this operand results in character conversion being skipped. Supported Unicode CCSNAMEs: UTF8, UTFE, UTF16 (Big-endian).
All CCSNAMEs in BS2ZIP are divided into 3 types: single byte EBCDIC, single byte ASCII and Unicode. Below is the table with the list of encodings for each type:
EBCDIC | ASCII | UNICODE |
---|---|---|
EDF03IRV EDF03DRV EDF04DRV EDF041 EDF042 EDF043 EDF044 EDF045 EDF046 EDF047 EDF049 EDF04A EDF04B EDF04C EDF04D EDF04E EDF04F EEHCL2 EEHCLC EEHCLC1 EEHCLAA EEHCLG | ISO88591 ISO88592 ISO88593 ISO88594 ISO88595 ISO88597 ISO88599 ISO8859F WCP1252P | UTF8 UTFE UTF16 |
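The three-way classification in the table above can be expressed as a simple lookup. The following Python sketch is only an illustration: the CCSNAME lists are copied verbatim from the table, while the function name and the case-insensitive comparison are assumptions.

```python
# CCSNAME lists copied verbatim from the table above.
CCS_TYPES = {
    "EBCDIC": (
        "EDF03IRV EDF03DRV EDF04DRV EDF041 EDF042 EDF043 EDF044 EDF045 "
        "EDF046 EDF047 EDF049 EDF04A EDF04B EDF04C EDF04D EDF04E EDF04F "
        "EEHCL2 EEHCLC EEHCLC1 EEHCLAA EEHCLG"
    ).split(),
    "ASCII": (
        "ISO88591 ISO88592 ISO88593 ISO88594 ISO88595 ISO88597 ISO88599 "
        "ISO8859F WCP1252P"
    ).split(),
    "UNICODE": "UTF8 UTFE UTF16".split(),
}


def ccs_type(ccsname: str) -> str:
    """Return 'EBCDIC', 'ASCII' or 'UNICODE' for a CCSNAME listed in the table.

    Assumption: names are compared case-insensitively; unknown names raise ValueError.
    """
    name = ccsname.upper()
    for type_name, names in CCS_TYPES.items():
        if name in names:
            return type_name
    raise ValueError(f"CCSNAME not listed in the table: {ccsname}")


print(ccs_type("EDF04F"))    # EBCDIC
print(ccs_type("ISO8859F"))  # ASCII
print(ccs_type("UTF16"))     # UNICODE
```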
CHARACTER-CONVERSION = *BY-CONTAINER-FORMAT
Default value. Behavior determined by the ZIP container open format. BS2ZIP attempts to restore to the original file encoding.
Files are converted only
- if the archive is opened in winzip compatible format
- if the original file has SAM/ISAM structure
- if DATA-TYPE is set to *NOT-SPECIFIED or *CHARACTER
- if the file is not in Unicode
Condition | Result |
---|---|
| No character conversion is performed. |
File in archive has:
| Character conversion from current encoding to the original file encoding is performed¹. |
File in archive has:
But does not have:
| Conversion from ISO8859F to EDF04F |
File in archive has:
But does not have:
| No character conversion is performed. |
| No character conversion is performed. |
| Character conversion to EBCDIC is performed¹. |
| Character conversion from ASCII to EBCDIC is performed¹. |
| No character conversion is performed. |
¹ See info box below.
CHARACTER-CONVERSION = *NO
No character conversion is performed. The metadata of a file in the archive affects only the CODED-CHARACTER-SET attribute of the output file. If neither the extensible data field nor a file-info with the CODED-CHAR-SET attribute is present, BS2ZIP scans the first 32 kB of the data to determine the most likely encoding among ISO8859F, WCP1252, UTF8 and UTF16.
Condition | Result |
---|---|
| CODED-CHARACTER-SET attribute of the output file is set to CODED-CHAR-SET attribute of the file in the archive. |
| CODED-CHARACTER-SET attribute of the output file is set to CODED-CHAR-SET attribute of the file in the archive. |
| CODED-CHARACTER-SET attribute of the output file is set based on a scan of the first 32 kB of the file in the archive. Possible values: ISO8859F, WCP1252, UTF8, UTF16 |
| CODED-CHARACTER-SET attribute of the output file remains preserved. |
CHARACTER-CONVERSION = *TO-WIN-ANSI
WIN-ANSI conversion is performed. The output file will always be in ASCII, except when file in archive is stored in a Unicode variant.
Condition | Result |
---|---|
| No character conversion is performed. |
File in archive has:
| Character conversion from current encoding to ASCII is performed unless it is already in ASCII (according to extensible data field)¹. |
File in archive has:
But does not have:
| Conversion from EDF04F to ISO8859F. |
File in archive has:
But does not have:
| No character conversion is performed. |
| Conversion from EDF04F to ISO8859F. |
| Character conversion from EBCDIC to ASCII is performed¹. |
| No character conversion is performed. |
¹ See info box below.
CHARACTER-CONVERSION = *TO-EBCDIC
EBCDIC conversion is performed. The output file will always be in EBCDIC, except when file in archive is stored in Unicode variant.
Condition | Result |
---|---|
| No character conversion is performed. |
File in archive has:
| Character conversion from current encoding to EBCDIC is performed unless it is already in EBCDIC (according to extensible data field)¹. |
File in archive has:
But does not have:
| Conversion from ISO8859F to EDF04F. |
File in archive has:
But does not have:
| No character conversion is performed. |
| Conversion from ISO8859F to EDF04F. |
| Character conversion from ASCII to EBCDIC is performed¹. |
| No character conversion is performed. |
¹ See info box below.
CHARACTER-CONVERSION = *BY-PARAMETERS(...)
Conversion will be made according to specified source and target character sets.
Files are converted only
- if the original file has SAM/ISAM structure
- if DATA-TYPE is specified to *NOT-SPECIFIED or *CHARACTER.
This option makes it possible to ignore the metadata of a file in the archive and to set the character conversion type manually. During extraction of a file from the archive, the following are ignored:
- CODED-CHAR-SET attribute inside of the file-info
- Extensible data field
- CODED-CHARACTER-SET attribute of existing file when WRITE-MODE = *REPLACE-ONLY or *ANY that overwrites existing file.
It is particularly useful for files added to the archive by other archive managers, because it skips the foreign file conversion scan of the first 32 kB. FROM-CCS and TO-CCS accept all available CCSNAME values, except for combinations in which both are non-Unicode and have different ISO codes. Specifying the same value for both FROM-CCS and TO-CCS causes BS2ZIP to skip the conversion, but this CCSNAME is still set as the CODED-CHARACTER-SET attribute of the extracted file.
TO-CCS also accepts *STD. If FROM-CCS is a non-Unicode ASCII CCSNAME, BS2ZIP picks the corresponding EBCDIC CCSNAME with the same ISO code and converts to it. In all other cases with TO-CCS=*STD the conversion is skipped, but BS2ZIP still sets the CCSNAME from FROM-CCS as the CODED-CHARACTER-SET attribute of the extracted file.
If XHCS was not able to convert some of the characters, a warning message will be issued and these characters will be set to dot '.' (x'4B').
The metadata of a file in the archive can contain CCSNAMEs that are not currently present on the system. The extensible data field contains information about the ISO code of the encoding and whether it is EBCDIC, ASCII or Unicode. This information is used to substitute an encoding with the best available alternative defined on the system.
For example: according to metadata of a file in an archive its current encoding is WCP1252 (ISO code 15), while the original file encoding was EDF04F, but WCP1252 is not currently defined in XHCS. BS2ZIP calls XHCS to check available CCSNAMEs with ISO code 15; it sees that ISO8859F is available and converts the file from ISO8859F to EDF04F during file extraction.
BLOCK-CONTROL-INFO =
Controls the block control attribute of the resulting file. In particular, this allows an original PAMKEY file to be extracted to an NK disk.
BLOCK-CONTROL-INFO = *KEEP
The resulting file keeps the same block control attribute as the original file.
BLOCK-CONTROL-INFO = *IGNORE
The resulting file is created with the default block control of the disk where it is saved.
PAD-EMPTY-RECORD = *NO / *YES
Controls whether empty lines are padded with a blank character when extracted from a winzip compatible archive. Padding is only applied when DATA-TYPE is *NOT-SPECIFIED or *CHARACTER.
DELIMITER =
Controls which line delimiter is assumed to be separating the lines in the extracted file. This setting is ignored unless DATA-TYPE is *NOT-SPECIFIED or *CHARACTER and the archive is in winzip compatible format.
If this option is set to *STD / *CRLF / *LF / *NL, the delimiter that BS2ZIP looks for depends on the stored encoding of the file in the archive and on the CHARACTER-CONVERSION option. The delimiters associated with the different encodings are listed in the following table:
Delimiter | Single byte ASCII, UTF8 | EBCDIC | UTF16 |
---|---|---|---|
*CRLF | 0D0A | 0D25 | 000D000A |
*LF | 0A | 25 | 000A |
*NL | 0A | 15 | 000A |
DELIMITER = *STD
Files added by BS2ZIP that include an extensible data field will use the delimiter specified within that field for separation. This delimiter can be displayed with SHOW-FILE-ATTRIBUTES INFO=*ALL. For files that were added without an extensible data field (by third-party utilities or by BS2ZIP older than V21.0B10), BS2ZIP will look for all of the delimiters associated with the encoding, e.g. if the file is in EBCDIC, then BS2ZIP will look for 0D25, 25 and 15.
DELIMITER = *CRLF / *LF / *NL
BS2ZIP will look only for the specified delimiter in the form associated with the encoding of the file, e.g. if the file is in EBCDIC and *LF is specified, BS2ZIP will look only for 25.
DELIMITER = *0D0A / *0A / *0D25 / *25 / *15 / *000D000A / *000A
BS2ZIP will ignore the encoding of the file and will look only for the specified line delimiter.
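The delimiter selection described above can be summarized as a table lookup. The Python sketch below is an illustration only, not BS2ZIP code. The byte values come from the delimiter table; the function name and the encoding class labels ("ASCII" standing for single byte ASCII and UTF8, "EBCDIC", "UTF16") are assumptions. For *STD it models only the case of a file without extensible data field, where all delimiters associated with the encoding are searched.

```python
# Hex values copied from the delimiter table above.
DELIMITERS = {
    "*CRLF": {"ASCII": "0D0A", "EBCDIC": "0D25", "UTF16": "000D000A"},
    "*LF": {"ASCII": "0A", "EBCDIC": "25", "UTF16": "000A"},
    "*NL": {"ASCII": "0A", "EBCDIC": "15", "UTF16": "000A"},
}
# Explicit hex delimiters: the encoding of the file is ignored.
EXPLICIT = {"*0D0A", "*0A", "*0D25", "*25", "*15", "*000D000A", "*000A"}


def candidate_delimiters(option: str, encoding_class: str) -> list[bytes]:
    """Return the byte sequences to look for, in table order.

    Assumption: for *STD this covers only files without an extensible
    data field; otherwise the delimiter stored in that field is used.
    """
    if option in EXPLICIT:
        return [bytes.fromhex(option[1:])]
    if option == "*STD":
        found: list[bytes] = []
        for row in DELIMITERS.values():
            delimiter = bytes.fromhex(row[encoding_class])
            if delimiter not in found:  # *LF and *NL coincide for ASCII and UTF16
                found.append(delimiter)
        return found
    if option not in DELIMITERS:
        raise ValueError(f"unknown DELIMITER value: {option}")
    return [bytes.fromhex(DELIMITERS[option][encoding_class])]


print(candidate_delimiters("*STD", "EBCDIC"))  # 0D25, 25 and 15
print(candidate_delimiters("*LF", "EBCDIC"))   # only 25
```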
LOGGING = *MINIMUM / *MAXIMUM
Controls the amount of the message output.
LOGGING = *MINIMUM
Only error messages will be sent.
LOGGING = *MAXIMUM
All messages will be sent. Currently the guaranteed message SZP0122 is sent after each file extraction; further messages may be added in the future.
Notes
If data encryption had been set using the MODIFY-ZIP-OPTIONS statement, encrypted files are decrypted again when they are extracted. The standard Zip 2.0 encryption used here is compatible with WinZip on Windows-based systems.
Files extracted from a container created on BS2000 are created with the same organization characteristics as the original file, except for the padding factor and block control. The padding factor is the default DMS padding value. This implies that the size of extracted SAM and ISAM files can differ from the size of the original files.
Files extracted from a container created in a foreign environment, are created
as SAM files with BUF-LEN=STD(16), provided that DATA-TYPE=*NOT-SPECIFIED or *CHARACTER,
as PAM files with BUF-LEN=STD(16) provided that DATA-TYPE=*BINARY and
as SAM files with REC-FORM=U provided that DATA-TYPE=*SAM-BINARY.
K and NK disks
When a zipped file with PAMKEY block control has to be extracted on a NK disk, use the operand BLOCK-CONTROL-INFO=*IGNORE. The file will be converted into NK format. However, in case of SAM or ISAM files with records occupying all the available space in blocks, data truncation will occur. In this case, an error will be detected and the extract processing is aborted for the current file. The output file is erased.
To extract NK disk files (especially load modules), that have been added to a container, on a PAMKEY disk, the option BLOCK-CONTROL-INFO=*KEEP must be set.
If the K2 key is pressed during the EXTRACT-FILE statement, processing is interrupted with the query message SZP0208:
- The user can simply continue processing.
- The user can terminate processing and return to statement mode (//). The files which had not been extracted by the time the interruption occurred are not extracted. If required, they must be extracted again.
When selecting files from an archive coming from open systems, take into account that the registered file names are case sensitive. Use the c-string format to select file names containing lowercase letters.
Rules for naming extracted files:
BS2000 files:
(please refer to SDF rules for wildcard construction in the “Commands” manual [2])
FILE-NAME | TO-FILE | Registered file names | Resulting file names² |
---|---|---|---|
MYFILE1 | *BY-SOURCE | MYFILE1 | :ccid:$cuid.MYFILE1 |
* | *BY-SOURCE | MYFILE1 | :ccid:$cuid.MYFILE1 |
| | MYFILE2 | :ccid:$cuid.MYFILE2 |
MY* | EXT-* | MYFILE1 | :ccid:$cuid.EXT-FILE1 |
| | MYFILE2 | :ccid:$cuid.EXT-FILE2 |
MYFILE1 | *BY-SOURCE | :XXXX:$UID.MYFILE1 | No file found |
:XXXX:$UID. | *BY-SOURCE | :XXXX:$UID.MYFILE1 | :XXXX:$UID.MYFILE1 |
$UID. | *BY-SOURCE | $UID.MYFILE1 | :ccid:$UID.MYFILE1 |
² where $cuid = current userid and :ccid: = catid of the userid
Not BS2000 files:
FILE-NAME | TO-FILE | Registered file names | Resulting file names³ |
---|---|---|---|
MYFILE1 | *BY-SOURCE | /temp/data/myfile.txt | No file found |
* | *BY-SOURCE | /temp/data/myfile1.txt | :ccid:$cuid.MYFILE1.TXT |
'*myfile*' | EXT-* | /temp/data/myfile1.txt | :ccid:$cuid.EXT-MYFILE1.TXT |
³ where $cuid = current userid and :ccid: = catid of the userid
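The name construction for files that do not originate from BS2000 can be sketched in Python as follows. This is an illustration under assumptions, not BS2ZIP code: the function name is hypothetical, the conversion to upper case is inferred from the examples in the table above, and registered names are assumed to use forward slashes. The catid/userid prefix and the renaming of names that do not comply with BS2000 syntax are not modelled.

```python
import posixpath


def output_name(registered_name: str, to_file: str = "*BY-SOURCE") -> str:
    """Build the output file name (without catid/userid) for a non-BS2000 file.

    Assumptions: the directory path is dropped and the name is converted
    to upper case, as in the examples of the table above.
    """
    base = posixpath.basename(registered_name).upper()
    if to_file == "*BY-SOURCE":
        return base
    if "*" in to_file:
        # [<PREFIX>]*[<SUFFIX>]: '*' is replaced by the registered file name.
        prefix, _, suffix = to_file.partition("*")
        return f"{prefix}{base}{suffix}"
    # A file name without wildcard is used as specified.
    return to_file


print(output_name("/temp/data/myfile1.txt"))           # MYFILE1.TXT
print(output_name("/temp/data/myfile1.txt", "EXT-*"))  # EXT-MYFILE1.TXT
```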