Strings

The Java data type String is provided in JNI as the data type jstring. This type cannot be used directly in C; in particular, it has nothing in common with the C data type char *. To convert the string into a form that can be processed in C, the corresponding JNI interfaces must be used (see JNI documentation).

The Java data type char is available at the JNI interface as the data type jchar. This is compatible with the C data type unsigned short and represents one character in Unicode representation. The first 256 characters in Unicode are identical to the ISO8859-1 encoding. Unicode characters outside this range are not supported in C/C++ in BS2000. Processing of these characters must therefore be undertaken by users themselves.
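
As an illustration of what such user-side processing might look like, the following minimal sketch narrows a sequence of 16-bit characters to ISO8859-1 and substitutes a replacement byte for every character outside the first 256 Unicode code points. The names jchar_t and jchars_to_latin1 are hypothetical and not part of JNI or CRTE; jchar_t merely stands in for jchar so that the sketch compiles without <jni.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the JNI type jchar (compatible with
   unsigned short), so that the sketch compiles without <jni.h>. */
typedef unsigned short jchar_t;

/* Narrow `len` 16-bit characters to ISO8859-1 bytes.
   Characters above 0xFF have no ISO8859-1 equivalent and are replaced
   by `subst`. `dst` must have room for len + 1 bytes; the result is
   terminated with a NULL byte. Returns the number of substitutions. */
size_t jchars_to_latin1(const jchar_t *src, size_t len,
                        unsigned char *dst, unsigned char subst)
{
    size_t i, replaced = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        if (src[i] <= 0xFF) {
            dst[i] = (unsigned char)src[i];   /* same value in ISO8859-1 */
        } else {
            dst[i] = subst;                   /* outside ISO8859-1 */
            replaced++;
        }
    }
    dst[len] = 0;
    return replaced;
}
```

The replacement byte is passed as a numeric value (for example 0x3F, the ASCII question mark) rather than as a character literal, because the value of a literal depends on the literal encoding chosen at compile time. The result is an ISO8859-1 string, not an EBCDIC string.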

The UTF-8 representation of Unicode, which is partially used by Java in the JNI, plays a special role. In UTF-8 representation, Unicode characters are encoded into one, two or three bytes. Under this encoding, Unicode characters with codes 1 to 127 are represented with this value in a single byte, corresponding once again exactly to the ASCII encoding of these characters.

Moreover, UTF-8 byte sequences are always terminated in Java with a NULL byte, which enables them to be processed as C strings. Here, the Unicode NULL character is encoded into two bytes so as to avoid confusion with the string delimiter in C, since, unlike in C, it is perfectly acceptable in Java for strings to contain NULL characters.

The following simple rules apply to the processing of UTF-8 byte sequences in C:

  • The NULL byte marks the end of the byte sequence, and is absolutely essential.

  • Bytes for which the function isascii_ascii() returns the value “true” (1-127) are in fact ASCII characters as per ISO8859-1.

  • To represent Unicode characters outside the range 1 to 127, all the other bytes are treated as if they were part of a multibyte sequence. These have to be interpreted by the user.
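
The rules above can be sketched as a small decoder. This is only an illustration under stated assumptions: the function name utf8_to_units is hypothetical, the input is assumed to be a NULL-terminated sequence with one to three bytes per character as described above, and plain bit tests are used instead of isascii_ascii() so that the sketch also compiles outside BS2000.

```c
#include <assert.h>
#include <stddef.h>

/* Decode a NULL-terminated UTF-8 byte sequence (one to three bytes per
   character) into 16-bit Unicode values.
   Returns the number of characters written to `out`, or (size_t)-1 if
   the input is malformed or `out` (capacity `cap`) is too small. */
size_t utf8_to_units(const unsigned char *s, unsigned short *out, size_t cap)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (*s != 0) {                         /* NULL byte ends the sequence */
        unsigned short u;

        if (*s < 0x80) {                      /* 0xxxxxxx: ASCII, 1-127 */
            u = *s;
            s += 1;
        } else if ((*s & 0xE0) == 0xC0) {     /* 110xxxxx 10xxxxxx */
            if ((s[1] & 0xC0) != 0x80)
                return (size_t)-1;
            u = (unsigned short)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
            s += 2;
        } else if ((*s & 0xF0) == 0xE0) {     /* 1110xxxx 10xxxxxx 10xxxxxx */
            if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
                return (size_t)-1;
            u = (unsigned short)(((s[0] & 0x0F) << 12) |
                                 ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
            s += 3;
        } else {
            return (size_t)-1;                /* not a valid lead byte here */
        }
        if (n >= cap)
            return (size_t)-1;
        out[n++] = u;
    }
    return n;
}
```

The two-byte sequence 0xC0 0x80 decodes to the value 0, which is how a NULL character embedded in a Java string can pass through without being mistaken for the string delimiter.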

As nearly all these conversion functions constitute character sequences at least in a form which is upwardly compatible with ASCII, code conversion from ASCII to EBCDIC and vice versa does not play a special role in BS2000. Naturally, this applies not only to strings but also, for example, to byte arrays or characters (jchar).

References to “ASCII” in this manual always refer to the ISO8859-1 character set (ISO Latin 1) or its 7 bit offshoot (ISO 646). “EBCDIC” refers to the character set DF04-1 (international reference version) with swapped 0x15 and 0x25 or its 7 bit offshoot DF03-1.

In addition to the explicit conversion facilities, compiler and runtime system extensions are available that support ASCII strings and allow you to work directly with ASCII strings and characters in C.

Explicit conversion

The JNI conversion functions (see “Java™ Native Interface” [13]) work in BS2000 exactly as specified. They always return or else expect Unicode or UTF-8.

Some functions are available in CRTE for explicit conversion between ASCII (8859-1) and EBCDIC (DF04-1). These are declared in the header file <ascii_ebcdic.h>, which is part of the CRTE distribution. These conversion functions are described in the manual “CRTE” [3].
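
To give a rough idea of what a duplicating conversion function does, here is a deliberately incomplete sketch. It is not the CRTE implementation: the names a2e_byte and sketch_a2e_dup are hypothetical, and only letters, digits, the underscore and the space are mapped; every other byte is replaced by 0x6F (the EBCDIC question mark). The assumption that the caller releases the result with free() is taken from the usage shown in the example in this section. Real code should use the functions declared in <ascii_ebcdic.h>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Map one ASCII byte to EBCDIC. Numeric values are used throughout so
   that the result does not depend on the literal encoding of the
   compiler. Only a small subset is handled in this sketch. */
static unsigned char a2e_byte(unsigned char c)
{
    if (c >= 0x41 && c <= 0x49) return (unsigned char)(0xC1 + (c - 0x41)); /* A-I */
    if (c >= 0x4A && c <= 0x52) return (unsigned char)(0xD1 + (c - 0x4A)); /* J-R */
    if (c >= 0x53 && c <= 0x5A) return (unsigned char)(0xE2 + (c - 0x53)); /* S-Z */
    if (c >= 0x61 && c <= 0x69) return (unsigned char)(0x81 + (c - 0x61)); /* a-i */
    if (c >= 0x6A && c <= 0x72) return (unsigned char)(0x91 + (c - 0x6A)); /* j-r */
    if (c >= 0x73 && c <= 0x7A) return (unsigned char)(0xA2 + (c - 0x73)); /* s-z */
    if (c >= 0x30 && c <= 0x39) return (unsigned char)(0xF0 + (c - 0x30)); /* 0-9 */
    if (c == 0x5F) return 0x6D;                                            /* _   */
    if (c == 0x20) return 0x40;                                            /* space */
    return 0x6F;                          /* anything else: EBCDIC '?' */
}

/* Return a newly allocated EBCDIC copy of the ASCII string `ascii`,
   or NULL if no memory is available. The caller frees the result. */
char *sketch_a2e_dup(const char *ascii)
{
    size_t i, len;
    char *out;

    assert(ascii != NULL);
    len = strlen(ascii);
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++)
        out[i] = (char)a2e_byte((unsigned char)ascii[i]);
    out[len] = 0;
    return out;
}
```

The point of the sketch is the ownership pattern: the converted string is a separate heap copy, so the original buffer can be released immediately and the copy must be freed once it is no longer needed.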


Example

The following example shows a native method that reads the value of an environment variable and strips the prefix JAVA_ from it. On the Java side, the method is declared as:


public native String get_jenviron(String name);


The associated C program could look like this:


#include <jni.h>
#include ".....h"         // Header generated by javah
#include <stdlib.h>
#include <string.h>       // strncmp
#include <ascii_ebcdic.h> // _a2e_dup, _e2a_dup
JNIEXPORT jstring JNICALL
Java_..._get_jenviron(JNIEnv *env, jobject jthis,
                      jstring name)
{
    const char *utf_name;
    char *ebcdic_name, *ebcdic_value, *utf_value;
    jstring value;
    // Fetch the name as UTF-8 and make an EBCDIC copy for getenv()
    utf_name = (*env)->GetStringUTFChars(env,name,NULL);
    ebcdic_name = _a2e_dup(utf_name);
    (*env)->ReleaseStringUTFChars(env,name,utf_name);
    ebcdic_value = getenv(ebcdic_name);
    free(ebcdic_name);
    if (ebcdic_value == NULL)
      return NULL;
    // Skip the prefix JAVA_ if present and convert the value to ASCII
    if (strncmp(ebcdic_value,"JAVA_",5) == 0)
      utf_value = _e2a_dup(ebcdic_value+5);
    else
      utf_value = _e2a_dup(ebcdic_value);
    value = (*env)->NewStringUTF(env,utf_value);

    free(utf_value);
    return value;
}


The above sample code does not contain any error handling. It is implicitly assumed that in all strings only characters from the 7 bit ASCII character set will occur. Moreover, this code is naturally very much BS2000-specific.

ASCII strings in the C code

As of version V3.0B, the C/C++ compiler allows you to generate an equivalent ASCII code as an alternative to the normal EBCDIC encoding for string and character literals. This setting must apply to a complete compilation unit (source file) and is controlled via the compiler options -Kliteral_encoding_ascii and -Kliteral_encoding_ascii_full. The difference between the two options lies in the treatment of octal and hexadecimal sequences in such literals. With -Kliteral_encoding_ascii such literal parts are not converted.
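
Because the setting applies to a whole compilation unit, it can be useful to check which encoding is in effect for the literals of that unit. The following minimal sketch does this by comparing a character literal with its known numeric code in each encoding. The function names are hypothetical and not part of the compiler or CRTE.

```c
/* Returns 1 if character literals in this compilation unit are encoded
   in ASCII ('A' is 0x41), otherwise 0. */
int literals_are_ascii(void)
{
    return (unsigned char)'A' == 0x41;
}

/* Returns 1 if character literals in this compilation unit are encoded
   in EBCDIC ('A' is 0xC1), otherwise 0. */
int literals_are_ebcdic(void)
{
    return (unsigned char)'A' == 0xC1;
}
```

On a system with a native ASCII compiler, literals_are_ascii() returns 1. Which function returns 1 under BS2000 depends on whether the source file was compiled with one of the options named above.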

ASCII strings in the C runtime system

In addition to the above conversion routines, the C runtime system provides further support for the use of ASCII strings and characters. All key XPG4 functions that work with or return strings or characters are available in a variant for ASCII coding. When one of the compiler options for ASCII use described in the section "ASCII strings in the C code" is set, the corresponding library functions are generally used automatically without the need for user intervention. You can change this behavior for mixed operation (see the manual “CRTE” [3]).

If the compiler option -Kieee_floats is set at the same time, the combined ASCII/IEEE variants are used (e.g. with printf).

As of C Compiler V3.1A and CRTE V2.4C, the arguments of the vector argv[] are passed as ASCII strings when compiling the main program with one of the compiler options described in the section "ASCII strings in the C code". The global variables of the C runtime system tzname and the strings of environ are saved as ASCII strings. Explicit conversion of argv[] is therefore unnecessary.

If the strings of the global variables tzname or environ are accessed explicitly, note that as of JENV V1.4B these are stored as ASCII strings (formerly EBCDIC strings). However, the Technical Standard “The Single UNIX Specification” warns against explicit access to the environ variable (see “X/Open System Interface (XSI) Specification” [16]). Implicit access via the functions getenv() and putenv() works as before and is compatible with previous versions.


Example

If you use these options, the above C program could look like this:


#include <jni.h>
#include ".....h"         // Header generated by javah
#include <stdlib.h>
#include <string.h>       // strncmp

JNIEXPORT jstring JNICALL
Java_..._get_jenviron(JNIEnv *env, jobject jthis,
                      jstring name) 
{ 
   const char *utf_name;
   char *utf_value;
   utf_name = (*env)->GetStringUTFChars(env,name,NULL); 
   utf_value = getenv(utf_name); 
   (*env)->ReleaseStringUTFChars(env,name,utf_name); 
   if (utf_value == NULL) 
       return NULL;
   if (strncmp(utf_value,"JAVA_",5) == 0) 
     return (*env)->NewStringUTF(env,utf_value+5); 
   else 
     return (*env)->NewStringUTF(env,utf_value); 
}


This implementation is exactly the same as one which could also be used on Unix systems. This form is therefore the one most highly recommended for ported code.