iSeries Data Migration Toolkit

Requirements

Sun Java 2 V1.4 SDK or equivalent.

Key Features

extends the usefulness of IBM iSeries and AS/400 data

completely converts iSeries or AS/400 physical file DDS and data to the Java platform

converts physical file DDS into XML

converts field reference file XML into DTD

produces SQL DDL schemas for creating relational tables

produces source code Java programs that convert iSeries binary data into loader format

uses national EBCDIC to Unicode conversion

generated source code for conversion programs is customizable

substitutes user-defined characters for illegal characters in names

substitutes long descriptive field names for abbreviated field names

Description

The utility of data processed and stored on IBM iSeries and AS/400 systems may be extended by copying the data to other platforms, where additional processing capability is available. IBM systems employ proprietary encoding schemes to store data, and reformatting this data for use on other platforms requires a description of its structure to guide conversion. The structure of data stored on IBM systems is described in a proprietary language, which can be interpreted and translated into an equivalent data structure description in a standard language used on other platforms.

The iSeries Data Migration Toolkit (DMK) enables users of these IBM systems to migrate their data files to any other platform, or into the Java environment of their iSeries systems. It is a collection of tools, executing on these other platforms, that translate physical file Data Description Specifications (DDS) into equivalent Extensible Markup Language (XML) descriptions and create Java programs from these descriptions. The generated programs reformat iSeries binary data for loading into files or databases, or for analysis. The XML descriptions give these other platforms the record structure description capability that DDS provides on iSeries systems. The toolkit uses and generates standard Java as distributed freely by Sun in their SDK. The DMK is platform-independent, and no software other than a text editor and the SDK is required. A DMK for System/38 DDS and data may be special ordered.

Copying iSeries Files

DDS source files and binary data files are copied from the iSeries or AS/400 system over the user's local area network (LAN) to a connected system. If LAN connectivity to the AS/400 is not available, the source files and binary data are copied to removable media readable by the target system. Translation of DDS source files from EBCDIC into ASCII is normally performed automatically by the means used for copying between systems. Custom translators for iSeries source files with encodings untranslatable into UTF-8 may be special ordered.

DDS to XML Translation

Physical file DDS source files written in the proprietary IBM specification language are translated into XML, using tags derived from the DDS language. The entire DDS specification is translated. Specifications and overrides referencing previously translated field reference files are substituted when the resulting XML description is next used. An example Java program for reading and processing XML is included. The program uses the Simple API for XML (SAX) to read and parse the XML.
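As an illustration of the SAX approach described above, the following minimal sketch collects field names from an XML record description. It is not the example program shipped with the toolkit. The element name "field", the attribute "name", and the class FieldNameReader are assumptions made for this sketch, since the toolkit's actual tag set is not shown on this page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reads field names from an XML record description.
final class FieldNameReader {

    // Returns the "name" attribute of every <field> element, in document order.
    // The tag and attribute names are assumed for illustration only.
    static List<String> fieldNames(String xml) {
        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("field".equals(qName)) {
                    names.add(attrs.getValue("name"));
                }
            }
        };
        try {
            // SAX streams the document and calls the handler for each element.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse description", e);
        }
        return names;
    }
}
```

Calling FieldNameReader.fieldNames on a description with two field elements named CUSTNO and NAME would return those two names in order. SAX suits this task because the description is read once, front to back, without building a document tree in memory.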

Field Reference Files

DDS field reference files, the data dictionary feature of DDS, are first converted to XML, then translated into Document Type Definition (DTD) files to provide substitution entities for XML file descriptions coded with substitutable references. Changes to the field reference XML are propagated to referencing descriptions when next used. This DMK feature parallels the use of field reference files on iSeries systems.

SQL DDL Schemas

The generated XML descriptions are used by the toolkit to produce ANSI SQL Data Definition Language (DDL) schemas. The schema generator utilizes a commonly-used subset of DDS positional and keyword parameters. Date and time fields are described as plain text fields, because correct date and time translation depends on the combination of source locale and the target database. Custom implementation of these and other keywords may be provided on special order.
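The kind of mapping described above can be sketched as follows. This is an illustrative assumption, not the toolkit's generator: the class DdlSketch, its Field type, and the chosen SQL column types are hypothetical. The single-letter codes follow DDS data type conventions (A character, S zoned decimal, L date, T time, Z timestamp), and date and time fields are emitted as plain text columns, as the paragraph above states.

```java
import java.util.List;

// Hypothetical sketch: builds a CREATE TABLE statement from field descriptions.
final class DdlSketch {

    // Minimal field description; the real XML carries far more detail.
    static final class Field {
        final String name;
        final char type;      // DDS data type code, e.g. 'A', 'S', 'L'
        final int length;
        final int decimals;

        Field(String name, char type, int length, int decimals) {
            this.name = name;
            this.type = type;
            this.length = length;
            this.decimals = decimals;
        }
    }

    // Maps a DDS data type to an SQL column type (assumed mapping).
    static String columnType(Field f) {
        switch (f.type) {
            case 'A':
                return "CHAR(" + f.length + ")";
            case 'S':
                return "NUMERIC(" + f.length + "," + f.decimals + ")";
            case 'L': // date
            case 'T': // time
            case 'Z': // timestamp: all kept as plain text
                return "CHAR(" + f.length + ")";
            default:
                throw new IllegalArgumentException("Unsupported type: " + f.type);
        }
    }

    static String createTable(String table, List<Field> fields) {
        StringBuilder sql = new StringBuilder("CREATE TABLE ").append(table).append(" (\n");
        for (int i = 0; i < fields.size(); i++) {
            Field f = fields.get(i);
            sql.append("  ").append(f.name).append(' ').append(columnType(f));
            sql.append(i < fields.size() - 1 ? ",\n" : "\n");
        }
        return sql.append(")").toString();
    }
}
```

Keeping the type mapping in one method makes it the natural place to customize date and time handling for a particular source locale and target database.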

Data Conversion

The DMK generates Java source code programs from XML file descriptions for conversion of iSeries binary data into Unicode comma-delimited text. The conversion programs translate binary data field-by-field and record-by-record. Fields are translated according to their data types and other specifications from the subset mentioned. Text field translation is dependent on the encoding configuration of the source iSeries system as specified by DDS CCSID keyword parameters or as specified by the user. EBCDIC single-byte character (SBCS) and double-byte (DBCS) to Unicode conversions are supported through Java classes from the Java SDK. Some 40 national EBCDIC encodings and their variants are directly supported. Custom encodings may be special ordered.
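A minimal sketch of field-by-field conversion follows, assuming a fixed-length record held in a byte array. The class RecordFieldDecoder and its method names are hypothetical and do not reproduce the generated programs. The sketch decodes a text field with a Java charset from the table of supported encodings below (Cp037 here) and a zoned decimal field whose sign is carried in the zone nibble of the last byte. It assumes the Java runtime includes the extended EBCDIC charsets and treats only zone 0xD as negative.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.Charset;

// Hypothetical sketch of decoding two field types from an iSeries record.
final class RecordFieldDecoder {

    // Decodes an EBCDIC text field into a Unicode string.
    // "encoding" is a Java charset name such as "Cp037".
    static String decodeText(byte[] record, int offset, int length, String encoding) {
        return new String(record, offset, length, Charset.forName(encoding));
    }

    // Decodes a zoned decimal field: one digit per byte in the low nibble,
    // with the sign in the high (zone) nibble of the last byte.
    static BigDecimal decodeZoned(byte[] record, int offset, int length, int scale) {
        StringBuilder digits = new StringBuilder(length);
        for (int i = offset; i < offset + length; i++) {
            digits.append((char) ('0' + (record[i] & 0x0F)));
        }
        int zone = (record[offset + length - 1] >> 4) & 0x0F;
        BigInteger unscaled = new BigInteger(digits.toString());
        if (zone == 0x0D) { // assumption: only zone D marks a negative value
            unscaled = unscaled.negate();
        }
        return new BigDecimal(unscaled, scale);
    }
}
```

For a record whose first five bytes are C8 C5 D3 D3 D6 and whose next three bytes are F1 F2 D3, decodeText with Cp037 yields HELLO and decodeZoned with two decimal positions yields -1.23. The sketch does not validate that each low nibble is a decimal digit; production code would reject malformed data.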

The Java conversion programs are easily adaptable by programmers to reformat data files into XML for export to other environments, or for direct insertion of records into flat files, or rows into relational tables, without intermediate loader files. An alternative use of the method allows XML-described fixed record length binary data files from any source to be translated, including non-IBM multi-byte encodings. Custom conversion program generators may be special ordered.

Field Name Translation

iSeries programming languages permit the use of special characters (asterisks, etc.) in file, record and field names. Such characters are usually illegal or have special meaning in other programming languages, including Java. The DDS translator provides default or user-defined substitution of such characters.
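The substitution can be sketched as below. The class NameSanitizer is a hypothetical illustration, not the DDS translator's own logic. It replaces every character that is not legal in a Java identifier with a caller-supplied character, and prefixes the same character if the result would not start legally. The substitute is assumed to be a valid identifier start, such as an underscore.

```java
// Hypothetical sketch of replacing characters that are illegal in Java names.
final class NameSanitizer {

    // Returns a name usable as a Java identifier. "substitute" must itself
    // be a legal identifier start character, e.g. '_'.
    static String sanitize(String name, char substitute) {
        StringBuilder out = new StringBuilder(name.length() + 1);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            out.append(Character.isJavaIdentifierPart(c) ? c : substitute);
        }
        // A leading digit (or an empty name) is not a legal identifier start.
        if (out.length() == 0 || !Character.isJavaIdentifierStart(out.charAt(0))) {
            out.insert(0, substitute);
        }
        return out.toString();
    }
}
```

With an underscore as the substitute, CUST# becomes CUST_ and *ALL becomes _ALL. Different source names can collide after substitution (for example CUST# and CUST@), so a real translator would also need a uniqueness check.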

Field Name Substitution

Where the DDS ALIAS keyword parameter is used, the toolkit can optionally substitute long data field names in DDL schemas for the abbreviations imposed by the fixed-form DDS field name specification. Custom logic for file, record and field name substitution may also be developed on special order.

Supported Field Types

Field Type	Data Conversion
Text	yes
Zoned Decimal	yes
Binary	yes
Floating point	yes
Hexadecimal	special order
Date	alphanumeric
Time	alphanumeric
Timestamp	alphanumeric

Supported Keyword Parameters

Keyword Parameter	Description	Translated Into XML	Translated Into DTD	Used In Data Conversion
ABSVAL	Absolute Value	yes
ALIAS	Alternative Name	yes	yes
ALTSEQ	Alternative Collating Sequence	yes
ALWNULL	Allow Null Value	yes
CCSID	Coded Character Set Identifier	yes		yes
CHECK	Check	yes	yes
CHKMSGID	Check Message Identifier	yes	yes
CMP	Comparison	yes	yes
COLHDG	Column Heading	yes	yes
COMP	Comparison	yes	yes
DATFMT	Date Format	yes	yes	special
DATSEP	Date Separator	yes	yes	special
DESCEND	Descend	yes
DFT	Default	yes
DIGIT	Digit	yes
EDTCDE	Edit Code	yes	yes
EDTWRD	Edit Word	yes	yes
FCFO	First-Changed First-Out	yes
FIFO	First-In First-Out	yes
FLTPCN	Floating-Point Precision	yes	yes	yes
FORMAT	Format	yes		yes
LIFO	Last-In First-Out	yes
NOALTSEQ	No Alternative Collating Sequence	yes
RANGE	Range	yes
REF	Reference	yes		yes
REFFLD	Referenced Field	yes		yes
REFSHIFT	Reference Shift	yes	yes
TEXT	Text	yes	no
TIMFMT	Time Format	yes	yes	special
TIMSEP	Time Separator	yes	yes	special
UNIQUE	Unique	yes		yes
UNSIGNED	Unsigned	yes
VALUES	Values	yes
VARLEN	Variable-Length Field	yes	yes	yes
ZONE	Zone	yes

Supported EBCDIC Encodings

CCSID	SDK Support	Description
37	Cp037	US, Canada, Netherlands, Portugal, Brazil, New Zealand, Australia
256		Netherlands
273	Cp273	Austria, Germany
277	Cp277	Denmark, Norway
278	Cp278	Finland, Sweden
280	Cp280	Italy
284	Cp284	Catalan/Spain, Spanish Latin America
285	Cp285	United Kingdom, Ireland
290		Japan Katakana (extended) range
297	Cp297	France
300		Japan, English
420	Cp420	Arabic
423		Greece
424	Cp424	Hebrew
500	Cp500	Belgium, Canada, Switzerland, International Latin-1
833		Korea (extended range)
834		Korea Host DBCS
835		DBCS Traditional Chinese Host
838	Cp838	Thailand extended SBCS
870	Cp870	Latin 2 Multilingual
871	Cp871	Iceland
875	Cp875	Greek
880		Cyrillic Multilingual
892		EBCDIC, OCR A
893		EBCDIC, OCR B
905		Turkey Latin-3
918	Cp918	Pakistan (Urdu)
924		Latin 9
930	Cp930	Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026
933	Cp933	Korean Mixed with 1880 UDC, superset of 5029
935	Cp935	Simplified Chinese Host mixed with 1880 UDC, superset of 5031
937	Cp937	Traditional Chinese Host mixed with 6204 UDC, superset of 5033
939	Cp939	Japanese Latin Kanji mixed with 4370 UDC, superset of 5035
1025	Cp1025	Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR)
1026	Cp1026	Latin-5 Turkey
1027		Japanese (Latin) Extended
1069		Latin 4
1087		Symbol Set (Adobe)
1097	Cp1097	Iran (Farsi)/Persian
1110		Latin 2 Multilingual
1112	Cp1112	Baltic Multilingual
1113		Latin 6
1122	Cp1122	Estonia
1123	Cp1123	Ukraine
1130		Vietnamese
1132		Lao
1136		Hitachi Katakana
1137		Devanagari
1140	Cp1140	Variant of Cp037 with Euro character
1141	Cp1141	Variant of Cp273 with Euro character
1142	Cp1142	Variant of Cp277 with Euro character
1143	Cp1143	Variant of Cp278 with Euro character
1144	Cp1144	Variant of Cp280 with Euro character
1145	Cp1145	Variant of Cp284 with Euro character
1146	Cp1146	Variant of Cp285 with Euro character
1147	Cp1147	Variant of Cp297 with Euro character
1148	Cp1148	Variant of Cp500 with Euro character
1149	Cp1149	Variant of Cp871 with Euro character
1153		Latin 2 Multilingual with euro
1154		Cyrillic Multilingual with euro
1155		Turkey with euro
1156		Baltic Multi with euro
1157		Estonia with euro
1158		Cyrillic, Ukraine with euro
1164		Vietnamese with euro
1165		Latin 2 EBCDIC/Open Systems
1364		EBCDIC
1388		EBCDIC
4396		Japanese Host DB including 1880
5026		EBCDIC, Subset of 933
5035		EBCDIC
5123		EBCDIC
8482		Host SBCS Katakana
9030		Thailand
28709		SBCS Traditional Chinese Host (w/ euro update)

The AS/400 source code examples referenced on this page are from "Application System/400 Application Development by Example", SC41-9852-00, International Business Machines Corporation, 1991.