iSeries Data Migration Toolkit

Requirements

Sun Java 2 V1.4 SDK or equivalent.

Key Features
Description

The utility of data processed and stored on IBM iSeries and AS/400 systems can be extended by copying it to other platforms where additional processing capability is available. IBM systems store data in proprietary encodings, so reformatting the data for use on other platforms requires a description of its structure to guide the conversion. On IBM systems that structure is described in a proprietary language, which can be interpreted and translated into an equivalent data structure description in a standard language common to other platforms.



The iSeries Data Migration Toolkit (DMK) enables users of these IBM systems to easily migrate their data files to any other platform, or into the Java environment of their iSeries systems. It is a collection of tools, executing on these other platforms, that translate physical file Data Description Specifications (DDS) into equivalent Extensible Markup Language (XML) descriptions, and that create Java programs from these data structure descriptions. The generated programs reformat iSeries binary data for loading into files or databases, or for analysis. The XML descriptions give these other platforms the same record structure description capability that DDS provides on iSeries systems. The toolkit uses and generates standard Java as distributed freely by Sun in their SDK. The DMK is platform-independent, and no software other than a text editor and the SDK is required. A DMK for System/38 DDS and data may be special ordered.

Copying iSeries Files

DDS source files and binary data files are copied from the iSeries or AS/400 system over the user's local area network (LAN) to a connected system. If LAN connectivity to the AS/400 is not available, the source files and binary data are copied to removable media readable by the target system. The method used to copy between systems normally translates DDS source files from EBCDIC to ASCII automatically. Custom translators for iSeries source files with encodings untranslatable into UTF-8 may be special ordered.

DDS to XML Translation

Physical file DDS source files written in the proprietary IBM specification language are translated into XML, using tags derived from the DDS language. The entire DDS specification is translated. Specifications and overrides that reference previously translated field reference files are substituted when the resulting XML description is next used. An example Java program for reading and processing XML is included. The program uses the Simple API for XML (SAX) to read and parse the XML.
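As a rough illustration of reading such a description with SAX, the following minimal sketch collects the fields of one record format. The element and attribute names (record, field, name, type, length) are assumptions made for this example only; the toolkit's actual tags are derived from DDS and are not listed on this page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class FieldListReader {

    /** Returns "name:type:length" for every field element (hypothetical tag names). */
    static List<String> readFields(String xml) {
        final List<String> fields = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // The default factory is not namespace-aware, so qName holds the tag name.
                if ("field".equals(qName)) {
                    fields.add(atts.getValue("name") + ":" + atts.getValue("type")
                            + ":" + atts.getValue("length"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse file description", e);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical description of a two-field customer record.
        String xml = "<record name=\"CUSTREC\">"
                + "<field name=\"CUSTNO\" type=\"S\" length=\"6\"/>"
                + "<field name=\"CUSTNAME\" type=\"A\" length=\"30\"/>"
                + "</record>";
        for (String field : readFields(xml)) {
            System.out.println(field);
        }
    }
}
```

SAX suits this job because the description is read once, front to back, and nothing needs to be held in memory beyond the fields collected.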

Field Reference Files

DDS field reference files, the data dictionary feature of DDS, are first converted to XML, then translated into Document Type Definition (DTD) files to provide substitution entities for XML file descriptions coded with substitutable references. Changes to the field reference XML are propagated to referencing descriptions when next used. This DMK feature parallels the use of field reference files on iSeries systems.
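The substitution mechanism can be sketched with a standard XML parser. In this hypothetical example the entity CUSTNO stands in for a field definition taken from a field reference file. It is declared in an internal DTD subset only to keep the example self-contained, whereas the toolkit is described as generating separate DTD files. All tag, attribute and entity names are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class EntitySubstitutionSketch {

    // Hypothetical file description: &CUSTNO; is a substitutable reference
    // whose replacement text is the referenced field definition.
    static final String DESCRIPTION =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE record [\n"
            + "  <!ENTITY CUSTNO \"<field name='CUSTNO' type='S' length='6'/>\">\n"
            + "]>\n"
            + "<record name=\"ORDERS\">&CUSTNO;"
            + "<field name=\"QTY\" type=\"P\" length=\"5\"/></record>";

    /** Returns the field names seen by the parser after entity expansion. */
    static List<String> fieldNames(String xml) {
        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("field".equals(qName)) {
                    names.add(atts.getValue("name"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse file description", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // The parser expands &CUSTNO; before the handler sees the content.
        System.out.println(fieldNames(DESCRIPTION));
    }
}
```

Because the parser expands the entity each time the description is read, editing the entity declaration changes every description that references it on its next use.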

SQL DDL Schemas

The generated XML descriptions are used by the toolkit to produce ANSI SQL Data Definition Language (DDL) schemas. The schema generator utilizes a commonly-used subset of DDS positional and keyword parameters. Date and time fields are described as plain text fields, because correct date and time translation depends on the combination of source locale and the target database. Custom implementation of these and other keywords may be provided on special order.
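The following sketch shows one way a schema generator of this kind could map field descriptions to a CREATE TABLE statement. The Field class, the type mapping and the output layout are assumptions made for illustration, not the toolkit's actual generator. The type codes follow the usual DDS letters (A character, S zoned, P packed, B binary, F floating point, L date, T time, Z timestamp), and date and time fields are emitted as plain text columns, as described above.

```java
import java.util.Arrays;
import java.util.List;

final class DdlSketch {

    /** One field description: name, DDS data type code, length, decimal positions. */
    static final class Field {
        final String name;
        final char type;
        final int length;
        final int decimals;

        Field(String name, char type, int length, int decimals) {
            this.name = name;
            this.type = type;
            this.length = length;
            this.decimals = decimals;
        }
    }

    /** Maps a DDS data type code to an SQL column type (assumed mapping). */
    static String sqlType(Field f) {
        switch (f.type) {
            case 'A': return "CHAR(" + f.length + ")";
            case 'S':
            case 'P': return "DECIMAL(" + f.length + "," + f.decimals + ")";
            case 'B': return f.length <= 4 ? "SMALLINT" : "INTEGER";
            case 'F': return "DOUBLE PRECISION";
            case 'L':
            case 'T':
            case 'Z': return "CHAR(" + f.length + ")"; // dates and times kept as text
            default: throw new IllegalArgumentException("Unsupported type: " + f.type);
        }
    }

    /** Builds a CREATE TABLE statement with one column per field. */
    static String createTable(String table, List<Field> fields) {
        StringBuilder ddl = new StringBuilder("CREATE TABLE ").append(table).append(" (\n");
        for (int i = 0; i < fields.size(); i++) {
            Field f = fields.get(i);
            ddl.append("  ").append(f.name).append(' ').append(sqlType(f));
            ddl.append(i < fields.size() - 1 ? ",\n" : "\n");
        }
        return ddl.append(")").toString();
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.asList(
                new Field("CUSTNO", 'S', 6, 0),
                new Field("CUSTNAME", 'A', 30, 0),
                new Field("ORDDATE", 'L', 10, 0));
        System.out.println(createTable("CUSTOMER", fields));
    }
}
```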

Data Conversion

The DMK generates Java source programs from XML file descriptions to convert iSeries binary data into Unicode comma-delimited text. The conversion programs translate binary data field by field and record by record. Fields are translated according to their data types and the other specifications in the supported subset of DDS parameters. Text field translation depends on the encoding configuration of the source iSeries system, as specified by DDS CCSID keyword parameters or by the user. EBCDIC single-byte character set (SBCS) and double-byte character set (DBCS) to Unicode conversions are supported through classes from the Java SDK. Some 40 national EBCDIC encodings and their variants are directly supported. Custom encodings may be special ordered.
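A minimal sketch of the field-by-field approach for text fields follows. It is not the code the DMK generates. The method name, the array of field lengths and the quoting rules are assumptions, and the example relies on the SDK's Cp037 charset being present in the running JVM.

```java
import java.nio.charset.Charset;

final class RecordToCsv {

    /**
     * Decodes the fixed-width text fields of one record and joins them
     * into a single comma-delimited line.
     */
    static String convert(byte[] record, int[] fieldLengths, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        StringBuilder line = new StringBuilder();
        int offset = 0;
        for (int i = 0; i < fieldLengths.length; i++) {
            // Decode this field's bytes and drop the blank padding.
            String value = new String(record, offset, fieldLengths[i], charset).trim();
            if (i > 0) {
                line.append(',');
            }
            // Quote every value and double any embedded quotes.
            line.append('"').append(value.replace("\"", "\"\"")).append('"');
            offset += fieldLengths[i];
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // "AB  " followed by "HELLO", encoded in EBCDIC code page 037.
        byte[] record = {
            (byte) 0xC1, (byte) 0xC2, (byte) 0x40, (byte) 0x40,
            (byte) 0xC8, (byte) 0xC5, (byte) 0xD3, (byte) 0xD3, (byte) 0xD6
        };
        System.out.println(convert(record, new int[] {4, 5}, "Cp037"));
    }
}
```

A data file would be processed by reading one record length of bytes at a time and calling the method once per record.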

The Java conversion programs are easily adaptable by programmers to reformat data files into XML for export to other environments, or for direct insertion of records into flat files, or rows into relational tables, without intermediate loader files. An alternative use of the method allows XML-described fixed record length binary data files from any source to be translated, including non-IBM multi-byte encodings. Custom conversion program generators may be special ordered.

Field Name Translation

iSeries programming languages permit the use of special characters (asterisks, etc.) in file, record and field names. Such characters are usually illegal or have special meaning in other programming languages, including Java. The DDS translator provides default or user-defined substitution of such characters.
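A default substitution of this kind might look like the sketch below, which replaces every character that is not legal at its position in a Java identifier. The class and method names are hypothetical, and the toolkit's actual substitution rules are not documented on this page.

```java
final class NameSanitizer {

    /** Replaces characters that are illegal in a Java identifier with the substitute. */
    static String toJavaIdentifier(String ddsName, char substitute) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < ddsName.length(); i++) {
            char c = ddsName.charAt(i);
            // The first character has stricter rules than the rest (no digits).
            boolean legal = (i == 0)
                    ? Character.isJavaIdentifierStart(c)
                    : Character.isJavaIdentifierPart(c);
            out.append(legal ? c : substitute);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaIdentifier("CUST#", '_'));
        System.out.println(toJavaIdentifier("@ADDR", '_'));
    }
}
```

Note that Java accepts the dollar sign in identifiers, so a name such as AMT$ passes through unchanged. Other target languages would need their own rules.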

Field Name Substitution

Use of the DDS ALIAS keyword parameter allows the toolkit to optionally substitute long data field names in DDL schemas for the abbreviations imposed by the fixed-form DDS field name specification. Custom logic for file, record and field name substitution may also be developed on special order.

Supported Field Types

Field Type      Data Conversion
Text            yes
Zoned Decimal   yes
Binary          yes
Floating point  yes
Hexadecimal     special order
Date            alphanumeric
Time            alphanumeric
Timestamp       alphanumeric
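
To illustrate what conversion of one of these field types involves, the sketch below decodes a zoned decimal field. It assumes the usual EBCDIC zoned layout: one digit per byte in the low nibble, with the sign carried in the high nibble of the last byte (hexadecimal D or B for negative). The class and method names are hypothetical and this is not the toolkit's generated code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class ZonedDecimal {

    /** Decodes a zoned decimal field with the given number of decimal positions. */
    static BigDecimal decode(byte[] field, int decimals) {
        if (field.length == 0) {
            throw new IllegalArgumentException("Empty field");
        }
        StringBuilder digits = new StringBuilder();
        for (byte b : field) {
            int digit = b & 0x0F; // low nibble holds the digit
            if (digit > 9) {
                throw new IllegalArgumentException("Not a zoned decimal digit");
            }
            digits.append((char) ('0' + digit));
        }
        // High nibble of the last byte holds the sign.
        int signZone = (field[field.length - 1] & 0xF0) >>> 4;
        boolean negative = (signZone == 0xD || signZone == 0xB);
        BigInteger unscaled = new BigInteger(digits.toString());
        return new BigDecimal(negative ? unscaled.negate() : unscaled, decimals);
    }

    public static void main(String[] args) {
        // Bytes F1 F2 D3 with two decimal positions represent minus 1.23.
        byte[] field = {(byte) 0xF1, (byte) 0xF2, (byte) 0xD3};
        System.out.println(decode(field, 2));
    }
}
```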

Supported Keyword Parameters

Keyword Parameter  Description                        Translated Into XML  Translated Into DTD  Used In Data Conversion
ABSVAL             Absolute Value                     yes
ALIAS              Alternative Name                   yes                  yes
ALTSEQ             Alternative Collating Sequence     yes
ALWNULL            Allow Null Value                   yes
CCSID              Coded Character Set Identifier     yes                                       yes
CHECK              Check                              yes                  yes
CHKMSGID           Check Message Identifier           yes                  yes
CMP                Comparison                         yes                  yes
COLHDG             Column Heading                     yes                  yes
COMP               Comparison                         yes                  yes
DATFMT             Date Format                        yes                  yes                  special
DATSEP             Date Separator                     yes                  yes                  special
DESCEND            Descend                            yes
DFT                Default                            yes
DIGIT              Digit                              yes
EDTCDE             Edit Code                          yes                  yes
EDTWRD             Edit Word                          yes                  yes
FCFO               First-Changed First-Out            yes
FIFO               First-In First-Out                 yes
FLTPCN             Floating-Point Precision           yes                  yes                  yes
FORMAT             Format                             yes                                       yes
LIFO               Last-In First-Out                  yes
NOALTSEQ           No Alternative Collating Sequence  yes
RANGE              Range                              yes
REF                Reference                          yes                                       yes
REFFLD             Referenced Field                   yes                                       yes
REFSHIFT           Reference Shift                    yes                  yes
TEXT               Text                               yes                  no
TIMFMT             Time Format                        yes                  yes                  special
TIMSEP             Time Separator                     yes                  yes                  special
UNIQUE             Unique                             yes                                       yes
UNSIGNED           Unsigned                           yes
VALUES             Values                             yes
VARLEN             Variable-Length Field              yes                  yes                  yes
ZONE               Zone                               yes

Supported EBCDIC Encodings

CCSID  SDK Support  Description
37     Cp037        US, Canada, Netherlands, Portugal, Brazil, New Zealand, Australia
256                 Netherlands
273    Cp273        Austria, Germany
277    Cp277        Denmark, Norway
278    Cp278        Finland, Sweden
280    Cp280        Italy
284    Cp284        Catalan/Spain, Spanish Latin America
285    Cp285        United Kingdom, Ireland
290                 Japan Katakana (extended) range
297    Cp297        France
300                 Japan, English
420    Cp420        Arabic
423                 Greece
424    Cp424        Hebrew
500    Cp500        Belgium, Canada, Switzerland, International Latin-1
833                 Korea (extended range)
834                 Korea Host DBCS
835                 DBCS Traditional Chinese Host
838    Cp838        Thailand extended SBCS
870    Cp870        Latin 2 Multilingual
871    Cp871        Iceland
875    Cp875        Greek
880                 Cyrillic Multilingual
892                 EBCDIC, OCR A
893                 EBCDIC, OCR B
905                 Turkey Latin-3
918    Cp918        Pakistan (Urdu)
924                 Latin 9
930    Cp930        Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026
933    Cp933        Korean Mixed with 1880 UDC, superset of 5029
935    Cp935        Simplified Chinese Host mixed with 1880 UDC, superset of 5031
937    Cp937        Traditional Chinese Host mixed with 6204 UDC, superset of 5033
939    Cp939        Japanese Latin Kanji mixed with 4370 UDC, superset of 5035
1025   Cp1025       Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR)
1026   Cp1026       Latin-5 Turkey
1027                Japanese (Latin) Extended
1069                Latin 4
1087                Symbol Set (Adobe)
1097   Cp1097       Iran (Farsi)/Persian
1110                Latin 2 Multilingual
1112   Cp1112       Baltic Multilingual
1113                Latin 6
1122   Cp1122       Estonia
1123   Cp1123       Ukraine
1130                Vietnamese
1132                Lao
1136                Hitachi Katakana
1137                Devanagari
1140   Cp1140       Variant of Cp037 with Euro character
1141   Cp1141       Variant of Cp273 with Euro character
1142   Cp1142       Variant of Cp277 with Euro character
1143   Cp1143       Variant of Cp278 with Euro character
1144   Cp1144       Variant of Cp280 with Euro character
1145   Cp1145       Variant of Cp284 with Euro character
1146   Cp1146       Variant of Cp285 with Euro character
1147   Cp1147       Variant of Cp297 with Euro character
1148   Cp1148       Variant of Cp500 with Euro character
1149   Cp1149       Variant of Cp871 with Euro character
1153                Latin 2 Multilingual with euro
1154                Cyrillic Multilingual with euro
1155                Turkey with euro
1156                Baltic Multi with euro
1157                Estonia with euro
1158                Cyrillic, Ukraine with euro
1164                Vietnamese with euro
1165                Latin 2 EBCDIC/Open Systems
1364                EBCDIC
1388                EBCDIC
4396                Japanese Host DB including 1880
5026                EBCDIC, Subset of 933
5035                EBCDIC
5123                EBCDIC
8482                Host SBCS Katakana
9030                Thailand
28709               SBCS Traditional Chinese Host (w/ euro update)
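
The SDK Support column follows a simple naming pattern: Cp followed by the CCSID, zero-padded to three digits. The sketch below uses that pattern to look up a charset for a CCSID and to check whether the running JVM actually provides it. The class and method names are hypothetical, and the set of charsets available depends on the Java runtime in use, so it may differ from the table.

```java
import java.nio.charset.Charset;

final class CcsidCharsets {

    /**
     * Returns the charset name for a CCSID using the "Cp" + number convention,
     * or null if this JVM does not support a charset of that name.
     */
    static String charsetNameFor(int ccsid) {
        String name = String.format("Cp%03d", ccsid);
        return Charset.isSupported(name) ? name : null;
    }

    public static void main(String[] args) {
        int[] ccsids = {37, 500, 1140};
        for (int ccsid : ccsids) {
            System.out.println(ccsid + " -> " + charsetNameFor(ccsid));
        }
    }
}
```

A CCSID for which the method returns null would need one of the custom encodings mentioned under Data Conversion.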

The AS/400 source code examples referenced on this page are from "Application System/400 Application Development by Example", SC41-9852-00, International Business Machines Corporation, 1991.