
Unicode in SAP NetWeaver

Sebastian Buhlinger SAP Consultant, HP-SAP EMEA CC


© 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Agenda
1. Introduction to Unicode
2. Unicode & SAP in General
3. Technology in Depth
4. Sizing Information for Unicode-based SAP Systems

3/31/2004

Introduction to Unicode


1. Introduction to Unicode
What is text?
History of character encoding
Problem of character encoding
From ASCII to Unicode
What is Unicode exactly?
The Unicode Standard
Where is Unicode used?
The Unicode Consortium
Unicode Encodings

What is text?

Code pages and encodings describe how text is handled and stored in:
  computers
  files
  data structures

Inside a computer program or data file, text is stored as a sequence of numbers, just like everything else.

A character is a letter, digit, period, hyphen, other punctuation mark or math symbol.

Furthermore, there are control characters, which are typically not visible.
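To make "text is stored as a sequence of numbers" concrete, here is a minimal Python sketch. The sample string and the variable names are assumptions chosen for illustration; they are not taken from the slides.

```python
# Each character maps to a number (its code point); an encoding turns
# those numbers into the bytes that are actually stored.
text = "A-1"

code_points = [ord(ch) for ch in text]     # numbers behind the characters
stored_bytes = list(text.encode("ascii"))  # bytes as they would sit in a file

print(code_points)
print(stored_bytes)

# Control characters are characters too, they are just not visible.
tab_code = ord("\t")
print(tab_code)
```

For plain ASCII text the code points and the stored bytes are identical, which is why the distinction between "character" and "byte" only becomes visible with larger character sets.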

History of Character Encoding


Historically, computers were pretty slow, had fairly little memory and were very expensive
Up to the 1960s, I/O meant punching holes into paper tapes
Most character sets date back to the punch-card age and were designed with these cards in mind
In the early days of computers, every hardware manufacturer used proprietary technology (and encodings)
International data interchange was not an issue, so nothing needed to fit together

Problem of character encoding


Which number is assigned to which character?
When an A is typed on the keyboard, the computer uses the character code to look up the shape of A in a font file under the same binary number, and displays or prints it
The character A may also have different integer values in different programs or data files (the A might be in an Arabic font file)
In some instances, no number is available for certain characters (e.g. ä)
All data is encoded in the form of binary numerical codes

Character repertoire
English alphabet with some digits and little more: ~60 characters
Western European standard: ~300 characters for several languages
Korean: ~12,000 syllables
Chinese dictionaries: ~50,000 characters
Hundreds of other characters in common use, such as math and currency symbols

From ASCII to Unicode


Most character sets and encodings in the 70s/80s were modifications or extensions of ASCII
Many of them used 8 bits with a subset of the 94 used ASCII characters
Most common encodings nowadays use a single byte per character (SBCS)
They are all limited to 256 characters
Due to that, none of them can even cover the letters for the Western European languages

From ASCII to Unicode


Consequence: many different 8-bit encodings were created to fulfill the needs of different user communities
Solution for data interchange in a globally networked information society and collaborative business world: a single character set for all languages in use
Unicode: a 32-bit representation can address 4,294,967,296 (2^32) different values for characters, symbols and control characters; the Unicode code space itself runs from U+0000 to U+10FFFF (1,114,112 code points)

What is Unicode exactly?


Unicode = universally encoded character set to store information from any language
Unicode
  defines properties for each character
  standardizes script behavior
  provides a standard algorithm for bidirectional text
  defines cross-mappings to other standards
Unicode defines a unique code value for every character, regardless of platform, program or programming language used

What is Unicode exactly?


The Unicode Standard primarily encodes scripts rather than languages
Scripts comprise several languages that historically share the same set of symbols
In many cases a script may serve to write dozens of languages (e.g. the Latin script)
In other cases one script corresponds to one language (e.g. Hangul)

What is Unicode exactly?


Additionally, it also includes punctuation marks, diacritics, mathematical symbols, technical symbols, musical symbols, arrows, dingbats etc.
In all, the Unicode Standard comprises more than 95,000 characters, ideograph sets and symbols (version 4.0)

The Unicode Standard


The Unicode Standard is a character coding system designed to support the worldwide interchange, processing, and display of written text of the diverse languages and technical disciplines of the modern world
In addition, it supports classical and historical texts of many written languages

Where is Unicode used?


The Unicode Standard has been adopted by many software and hardware vendors
Most operating systems support Unicode
Unicode is required for international document and data interchange, the Internet and the WWW, and therefore by modern standards such as:
  Java, C#, Perl, Python
  Markup languages such as XML, HTML, XHTML, MathML, WML etc.
  JavaScript
  LDAP
  CORBA
  etc.

The Unicode Consortium


The Unicode Consortium is a non-profit organization originally founded to develop, extend, and promote the use of the Unicode Standard
Members of the Consortium include major computer corporations, software producers, database vendors, research institutions, international agencies, various user groups, and interested individuals

The Unicode Consortium


The Consortium cooperates with W3C and ISO and has liaison status "C" with ISO/IEC JTC 1/SC2/WG2, which is responsible for refining the specification and expanding the character set of ISO/IEC 10646

Unicode Encodings
UTF = Unicode Transformation Format
UCS = Universal Character Set
CESU = Compatibility Encoding Scheme
Conversion between different encodings is a simple, bit-wise operation (defined in the standard)
No performance-intensive conversion table necessary!

Unicode Encodings
UTF-8: Unicode Transformation Format based on an 8-bit representation
CESU-8: Compatibility Encoding Scheme for UTF-16 on an 8-bit base
UTF-16: Unicode Transformation Format based on a 16-bit representation

Unicode Encodings
UCS-2: Universal Character Set, 2-byte variation (16-bit)
UTF-32: Unicode Transformation Format based on a 32-bit representation
UCS-4: Universal Character Set, 4-byte variation (32-bit)

Unicode Encodings
Not all Unicode characters are 2 bytes long: no doubling of hardware requirements in the first place
The Unicode encoding determines the length of a character
A character in a Unicode encoding can be longer than 1 byte; therefore Unicode characters can be longer than characters defined in a standard code page

UTF-8
UTF-8 is the 8-bit encoding of Unicode
It is a variable-width encoding and also a strict superset of 7-bit ASCII
Strict superset means that every character in 7-bit ASCII is available in UTF-8 with the same corresponding code point value
One character = 1 to 4 bytes in the encoding
Characters from European scripts: either 1 or 2 bytes
Asian scripts: 3 or 4 bytes
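The byte lengths listed above can be checked with Python's built-in UTF-8 codec. This is only an illustrative sketch; the sample characters are assumptions picked to cover the 1, 2, 3 and 4 byte cases.

```python
# One sample character per UTF-8 length class (samples chosen for illustration).
samples = {
    "A": "Latin letter (7-bit ASCII)",
    "\u00f6": "Latin letter with diaeresis (o-umlaut)",
    "\u65e5": "CJK ideograph",
    "\U0001d11e": "musical symbol (supplementary character)",
}

# Number of bytes each character occupies in UTF-8.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {description}: {utf8_lengths[ch]} byte(s)")

# Strict superset: a 7-bit ASCII character has the same byte in both encodings.
ascii_same = "A".encode("utf-8") == "A".encode("ascii")
```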

UTF-8
UTF-8 is used for UNIX platforms, HTML and most Internet browsers
Main benefits of UTF-8:
  Compact storage requirements for European scripts: in general, European scripts will occupy less storage on disk and in memory
  Ease of migration: since 7-bit ASCII data remains the same in UTF-8, the data conversion effort between ASCII-based character sets and UTF-8 is reduced significantly

UTF-8 / CESU-8 (8-bit encodings)


8-bit encodings are well-suited for data transfer since all 7-bit ASCII and 8-bit ISO characters retain the same code points
Easier communication with legacy and non-Unicode systems
Downside: variable character length

UCS-2

UCS-2 has a fixed width of 16 bits (2 bytes)
UCS-2 is the Unicode encoding for Java & Win NT 4.0
Main benefits of UCS-2:
  More compact storage requirements for Asian scripts (each character represented with 2 bytes only)
  String processing will be faster because all characters are of the same width
  Good compatibility with Java and Microsoft clients
Downside: UCS-2 can support Unicode characters defined up to Unicode 3.0 only (max. 65,536)

UTF-16
UTF-16 is the 16-bit encoding of Unicode
Basically an extension of UCS-2
One Unicode character can be 2 or 4 bytes in the encoding
Characters from European and most Asian scripts are represented in 2 bytes
Supplementary characters are represented in 4 bytes
UTF-16 is the main Unicode encoding from Windows 2K onwards
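How a supplementary character ends up as 4 bytes (two 16-bit code units, a "surrogate pair") can be sketched as follows. The function name is a hypothetical helper, and the sketch skips validation of the input range; it follows the arithmetic defined for UTF-16 in the Unicode Standard.

```python
def to_utf16_code_units(code_point):
    """Return the UTF-16 code units (16-bit values) for one code point."""
    if code_point < 0x10000:
        # Characters of the Basic Multilingual Plane fit into one unit (2 bytes).
        return [code_point]
    # Supplementary characters: subtract 0x10000 and split the remaining
    # 20 bits into a high and a low surrogate (2 x 2 bytes).
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


g_clef = to_utf16_code_units(0x1D11E)   # musical symbol G clef
o_umlaut = to_utf16_code_units(0x00F6)  # fits into a single code unit
print([f"{unit:04X}" for unit in g_clef])
print([f"{unit:04X}" for unit in o_umlaut])
```

UCS-2 has no surrogate mechanism, which is why it cannot represent characters above U+FFFF.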

UTF-16
Main benefits of UTF-16:
  More compact storage requirements for Asian scripts (2 bytes for commonly used characters)
  Ideal if European and Asian scripts are used together: UTF-16 will occupy less storage on disk and in memory than UTF-8 (3 bytes for the Asian part)
  Balance of efficient access to characters and economical use of storage
The points mentioned above are the reason for the use of UTF-16 in SAP Web Application Server
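The storage trade-off between UTF-8 and UTF-16 depends on the mix of scripts. The following sketch compares byte counts for a few sample strings; the strings and the helper name `storage_bytes` are assumptions for illustration, and real sizing depends on the actual data.

```python
def storage_bytes(text):
    """Bytes needed for the text in UTF-8 and in UTF-16 (without byte order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


european = "Gr\u00f6\u00dfe"                             # German word, 5 characters
asian = "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8"  # Japanese text, 8 characters
mixed = european + " " + asian

sizes = {
    "european": storage_bytes(european),
    "asian": storage_bytes(asian),
    "mixed": storage_bytes(mixed),
}
for name, size in sizes.items():
    print(name, size)
```

With these samples, the European word is smaller in UTF-8, while the Japanese and the mixed text are smaller in UTF-16.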

UCS-2 / UTF-16 (16-bit encodings)


16-bit encodings offer a compromise between the pros and cons of the 8-bit and the 32-bit encodings, respectively
They do not need as much memory as 32-bit encodings, but offer quasi-fixed character length
UCS-2 has a fixed character length, but it cannot define more than 2^16 (65,536) characters

UTF-32
32-bit encoding
Popular when memory space is no concern
Fixed width (4 bytes)

UCS-4 / UTF-32 (32-bit encodings)


All 32-bit encodings have a fixed length
This advantage is outweighed by the extensive memory & storage requirements

Example #1
Character      UTF-8          UCS-2   UTF-16
A (U+0041)     41             0041    0041
c (U+0063)     63             0063    0063
Æ (U+00C6)     C3 86          00C6    00C6
ö (U+00F6)     C3 B6          00F6    00F6
٤ (U+0664)     D9 A4          0664    0664
页 (U+9875)    E9 A1 B5       9875    9875
𝄞 (U+1D11E)    F0 9D 84 9E    N/A     D834 DD1E

Example #2 character U+AC00

UTF-8   HEX   E    A    B    0    8    0
        BIN   1110 1010 1011 0000 1000 0000

Byte 1: 1110 1010 = lead byte indicator 1110 + payload 1010
Byte 2: 1011 0000 = trailing byte indicator 10 + payload 11 0000
Byte 3: 1000 0000 = trailing byte indicator 10 + payload 00 0000

Remove lead and trailing byte indicators:   1010  11 0000  00 0000
Regroup bits:                               1010 1100 0000 0000

UTF-16  BIN   1010 1100 0000 0000
        HEX   A    C    0    0
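The bit-wise conversion of Example #2 can be written down in a few lines. This is a sketch for the 3-byte UTF-8 form only; the function name `decode_utf8_3byte` is a hypothetical helper, not part of any SAP or Unicode library.

```python
def decode_utf8_3byte(b1, b2, b3):
    """Strip the UTF-8 indicator bits of a 3-byte sequence and regroup the payload."""
    if b1 >> 4 != 0b1110:
        raise ValueError("first byte does not carry the 3-byte lead indicator 1110")
    if b2 >> 6 != 0b10 or b3 >> 6 != 0b10:
        raise ValueError("trailing bytes must start with the indicator 10")
    # 4 payload bits from the lead byte, 6 from each trailing byte.
    return ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)


code_point = decode_utf8_3byte(0xEA, 0xB0, 0x80)
print(f"U+{code_point:04X}")
```

For a character of the Basic Multilingual Plane such as this one, the resulting 16-bit value is also its UTF-16 code unit, so no lookup table is involved.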

Unicode & SAP in General


2. Unicode & SAP in General


Languages and characters
Characters on Disk/Memory
Code Pages
SAP & Code Pages
Language Combinations before Unicode
Recommendations from SAP (w/o Unicode)
Unicode-compliant SAP products
When/why do customers need Unicode?

Language and characters


Languages are written in fonts
Only a few languages use the same fonts
A font is a group of characters

Characters on Disk/Memory
A character is stored as a byte sequence on disk
A code page defines the mapping between the byte sequence and a character

Code Pages
The code page determines which characters you can see and enter

Code Pages
Different code pages map different characters to the same byte sequence
(Figure: single-byte and double-byte code page examples)
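That one byte sequence stands for different characters in different code pages can be shown with Python's standard codecs. The byte 0xE4 and the three code pages are assumptions chosen for illustration; SAP code page numbers are not used here.

```python
# One stored byte, interpreted through three different single-byte code pages.
raw = bytes([0xE4])

interpretations = {
    code_page: raw.decode(code_page)
    for code_page in ("latin-1", "iso8859_7", "cp1251")
}

for code_page, character in interpretations.items():
    print(f"{code_page}: U+{ord(character):04X}")
```

The byte is read as a Latin letter (U+00E4), a Greek letter (U+03B4) or a Cyrillic letter (U+0434), depending only on which code page the reader assumes.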

SAP & Code Pages


Language Combinations before Unicode

Single Standard Code Pages


  support specific sets of languages
  the number and combination of languages that are supported cannot be altered

Standard code pages and R/3 languages (w/o EBCDIC)

Double-Byte Code Pages

Language Combinations before Unicode


It is also possible to specify a customer-specific language; this language must use one of the code pages that SAP supports; see Note 0112065

Language Combinations before Unicode

Blended Code Pages (Rel. 3.1D)
SAP proprietary code pages that contain characters from one or more standard code pages
  increases the combinations of languages that can be used
  functionally, a Blended Code Page system uses a single code page
  a Blended Code Page is a single code page system
  users can see and enter all characters contained in the code page, regardless of their log-in language

Language Combinations before Unicode

(Table: SAP code page and supported languages)

Language Combinations before Unicode


The availability of SAP blended code pages is platform-dependent, because SAP blended locales need to be created for each platform
(Table: Blended Locale Status per platform)

Language Combinations before Unicode

MDMP (Rel. 3.1I): Multi-Display / Multi-Processing
  allows dynamic code page switching on the application server
  therefore permits any combination of standard code pages on one system
  the log-on language determines the code page that is active for each user
An MDMP system is recommended if:
  1. one or more additional code pages are required to add languages to your existing installation
  2. a blended code page cannot support the combination of languages you need for a new installation
For example, an MDMP system with the code pages 1100 and 8000 allows German and Japanese users to log on to the same R/3 system in their respective languages

Language Combinations before Unicode


Example
(Figure: front ends in Japan (code page 8000, SJIS) and Germany (code page 1100, ISO-1) connected to one application server and database)
Each user can only access one code page at a time: a user who logs in as a Japanese user cannot enter German characters, and German characters in the database will not be displayed correctly

Language Combinations before Unicode


Example
(Figure: Japanese user and German user)

Language Combinations before Unicode


Please note:
It is possible for a user to log on with German and then manipulate the character set and font settings so that he can enter what appear to be Japanese characters; these characters will not be correctly stored in the database, and this data will be corrupt
If a user wants to enter, for instance, Japanese, he/she must log on in Japanese

Language Combinations before Unicode


Please note:
To ensure that no data corruption occurs, the following restrictions must be followed:
  Global data must contain only 7-bit ASCII characters, which are in all code pages
  Users may use only the characters of their log-in language or 7-bit ASCII
  Batch processes must be assigned the correct user ID and language
  EBCDIC code pages are not supported
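The first restriction (global data limited to 7-bit ASCII) is easy to check mechanically. The sketch below is an illustration only; the function name `is_safe_global_data` and the sample values are assumptions and do not correspond to an SAP tool.

```python
def is_safe_global_data(value):
    """True if every character is 7-bit ASCII, i.e. present in all code pages."""
    return all(ord(ch) < 128 for ch in value)


checks = {
    "MATERIAL-4711": is_safe_global_data("MATERIAL-4711"),
    "Gr\u00f6\u00dfe": is_safe_global_data("Gr\u00f6\u00dfe"),  # German umlaut and sharp s
    "\u65e5\u672c": is_safe_global_data("\u65e5\u672c"),          # Japanese characters
}
for value, ok in checks.items():
    print(ascii(value), "OK" if ok else "not 7-bit ASCII")
```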

Recommendations from SAP (w/o Unicode)

In general, using a single standard code page for new installations and upgrades is the optimal decision
If additional languages or language combinations are needed, SAP recommends Unambiguous Blended Code Pages for new installations and MDMP for existing installations
Unambiguous Blended Code Pages only support certain language combinations, and therefore an MDMP setup may be the only possibility for new installations as well

Unicode-compliant SAP products


All Unicode installations are currently
  planned only with written permission of SAP
  carried out as customer projects together with SAP, except for new installations of R/3 Enterprise Extension Set 2.0

Unicode-compliant SAP products (SAP Note 79991)

SAP Web Application Server (6.20)
mySAP Customer Relationship Management (CRM)
  The Unicode version of mySAP CRM 4.0 is available via Ramp-Up
mySAP Supply Chain Management (SCM)
  The Unicode version of mySAP SCM 4.0 is available via Ramp-Up
mySAP Supplier Relationship Management (SRM)
  The Unicode version of mySAP SRM 4.0 is available via Ramp-Up
  conversions (with or without MDMP) of existing SRM installations

Unicode-compliant SAP products (SAP Note 79991)

mySAP Business Intelligence (BW)
  The Unicode version of mySAP BW 3.5 is available via Ramp-Up
  the conversion of existing BW installations as customer project
  SAP Note 643813 has a collection of all relevant SAP notes concerning Unicode-based SAP BW installations
mySAP Product Lifecycle Management (PLM)
  The Unicode version of mySAP PLM 4.0 is available via Ramp-Up
SAP R/3 Enterprise (Ext. 1.10 & higher)
SAP Exchange Infrastructure

When/why do customers need Unicode?

Global businesses that require IT systems to support multilingual data without any restrictions, e.g. customers with one worldwide central SAP system
Web interfaces open the door to a global customer base, and IT systems must consequently be able to support multiple local languages simultaneously

When/why do customers need Unicode?

With J2EE integration, mySAP components fully support web standards, and with Unicode, they can now take full advantage of XML and Java
Only Unicode makes it possible to seamlessly integrate heterogeneous SAP and non-SAP system landscapes (NetWeaver)

Technology in Depth


3. Technology in Depth
Unicode & Operating Systems
Unicode & Databases
SAP Unicode-based Code Pages
How to Unicode-enable a program
Unicode-enabled ABAP
Migrating to Unicode-enabled ABAP
Unicode Conversion, IMIG Lab Test
SAP System-to-System communication
Printing & Output Management

Unicode & Operating Systems HP-UX


HP-UX has been Unicode-enabled since version 10.x
All Unicode locales in the HP-UX operating environment are based on the UTF-8 format
Each locale includes a base language in the UTF-8 code set and the regional data related to this base language
This includes local formatting rules, text messages, help messages, and other related files
Each locale also supports several other scripts for input, display, code conversion, and printing

Unicode & Operating Systems Windows


Some Unicode support has been included in Microsoft Windows since Windows 95 and Windows NT 4
Windows 2000 and Windows XP/2003 are based on Unicode instead of the ANSI or WGL4 character sets
Before Win2K, your version of Windows may have used a different character set if you live in a country such as Egypt, Greece, Israel, Russia or Thailand that uses a non-Latin alphabet

Unicode & Operating Systems Windows


The first 128 characters were the same as in ANSI, but many of the places in the second set of 128 were taken by characters from the Arabic, Greek, Hebrew, Cyrillic or Thai alphabets
This caused and still causes problems when moving documents between operating systems such as DOS, Windows, Mac OS and UNIX, or when electronically exchanging documents that were created on computers using different character sets

Unicode & Operating Systems Linux


Before UTF-8 emerged, Linux users all over the world had to use various language-specific extensions of ASCII
Most popular were ISO 8859-1 and ISO 8859-2 in Europe, ISO 8859-7 in Greece, KOI-8 / ISO 8859-5 / CP1251 in Russia, EUC and Shift-JIS in Japan, BIG5 in Taiwan, etc.
This made the exchange of files difficult, and application software had to worry about various small differences between these encodings

Unicode & Operating Systems Linux


Because of these difficulties, major Linux distributors and application developers have now started to phase out these older legacy encodings in favor of UTF-8
UTF-8 support has improved dramatically over the last few years, and ever more people now use UTF-8 on a daily basis in
  text files (source code, HTML files, email messages, etc.)
  file names
  standard input and standard output, pipes

Unicode & Operating Systems Linux


In UTF-8 mode, terminal emulators (such as xterm) transform every keystroke into the corresponding UTF-8 sequence and send it to the stdin of the foreground process
Similarly, any output of a process on stdout is sent to the terminal emulator, where it is processed with a UTF-8 decoder and then displayed using a 16-bit font

Unicode & Operating Systems Linux


Before you start experimenting with UTF-8 under Linux, update your installation to a recent distribution with up-to-date UTF-8 support
This is particularly the case if you use an installation older than SuSE 8.1 or Red Hat 8.0
Before these, UTF-8 support was far too limited and experimental to be recommendable for daily use

Little vs. Big Endian


UCS and Unicode are first of all just code tables that assign integer numbers to characters
There exist several alternatives for how a sequence of such characters or their respective integer values can be represented as a sequence of bytes
The two most obvious encodings store Unicode text as sequences of either 2-byte or 4-byte units

Little vs. Big Endian


The official terms for these encodings are UCS-2 and UCS-4, respectively
Unless otherwise specified, the most significant byte comes first in these (Big Endian convention)
An ASCII or Latin-1 file can be transformed into a UCS-2 file by simply inserting a 0x00 byte in front of every ASCII byte
If we want to have a UCS-4 file, we have to insert three 0x00 bytes instead before every ASCII byte
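The zero-byte padding described above can be tried out directly. The sketch below pads an ASCII byte string by hand and compares the result with Python's big-endian UTF-16 and UTF-32 codecs, which yield the same bytes as UCS-2 and UCS-4 for such text; the sample string is an assumption.

```python
ascii_bytes = "SAP".encode("ascii")

# UCS-2 (big endian): one 0x00 byte in front of every ASCII byte.
ucs2_be = b"".join(b"\x00" + bytes([b]) for b in ascii_bytes)

# UCS-4 (big endian): three 0x00 bytes in front of every ASCII byte.
ucs4_be = b"".join(b"\x00\x00\x00" + bytes([b]) for b in ascii_bytes)

print(ucs2_be.hex(" "))
print(ucs4_be.hex(" "))
```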

Little vs. Big Endian


Character   Unicode Scalar Value   UTF-8 / CESU-8   UTF-16 [Little Endian]   UTF-16 [Big Endian]
A           U+0041                 41               41 00                    00 41
Ä           U+00C4                 C3 84            C4 00                    00 C4
α           U+03B1                 CE B1            B1 03                    03 B1
א           U+05D0                 D7 90            D0 05                    05 D0
晓          U+6653                 E6 99 93         53 66                    66 53
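The byte orders in the table can be reproduced with Python's codecs, which offer explicit little-endian and big-endian variants of UTF-16. The helper `hex_bytes` and the dictionary layout are assumptions made for this sketch.

```python
def hex_bytes(data):
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


characters = ["A", "\u00c4", "\u03b1", "\u05d0", "\u6653"]

# For each character: (UTF-8, UTF-16 little endian, UTF-16 big endian).
rows = {
    ch: (
        hex_bytes(ch.encode("utf-8")),
        hex_bytes(ch.encode("utf-16-le")),
        hex_bytes(ch.encode("utf-16-be")),
    )
    for ch in characters
}

for ch, (utf8, le, be) in rows.items():
    print(f"U+{ord(ch):04X}  {utf8:<10} {le:<7} {be}")
```

UTF-8 is a byte-oriented encoding and therefore has no byte-order variants; only the 16-bit and 32-bit forms do.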

Unicode & Databases


Supported databases by SAP (WAS 6.20)
Legend: P = available, ? = currently not available, -- = unsupported in general

Win2K HP-UX Solaris AIX OS/400 OS/390 Linux SQL Server Oracle DB2 SAP DB P P P P -P P P -P P P -P P P --P ----P P P

Unicode & Databases


Manufacturer
SQL Server Oracle

Version Encodings
2000 UTF-16 7.2 UTF-8 8 UTF-8 9i UTF-8 / UTF-16 10g UTF-8 / UTF-16

DB2

AIX CESU-8 AS400 UTF-16

SAP DB

7.0 UTF-16 8.0 UTF-8


SAP Unicode-based Code Pages


With the Unicode enablement of mySAP.com components (see chapter 1), the old code page management had to be changed
Instead of using SAP character numbers, all code pages are now based on Unicode character IDs
5-digit SAP character numbers are no longer adequate

This change is valid for both Unicode and non-Unicode systems!

SAP Unicode-based Code Pages


SAP Unicode-based Code Pages

The connection between SAP character number & Unicode character ID is found in table TCP01
You can see the connection in the SPAD character section
NOTE: not every character has a corresponding Unicode character ID!

SAP Unicode-based Code Pages


The migration of all SAP code pages from the old to the new format was done using report RSCP0126
The definition of code pages is still in TCP00

Customers must migrate their own code pages (9xxx) themselves using RSCP0126!

How to Unicode-enable a program


Separate Unicode and non-Unicode versions of R/3

Non-Unicode R/3: 1 character = 1 byte
  ABAP source (types C, N, D, T, STRING)
  Non-Unicode kernel
  Non-Unicode database

Unicode R/3: 1 character = 2 bytes (UTF-16)
  ABAP source (types C, N, D, T, STRING)
  Unicode kernel
  Unicode database

No explicit Unicode data type in ABAP
Single ABAP source for Unicode and non-Unicode systems

How to Unicode-enable a program


Major part of ABAP coding is ready for Unicode without any changes
Minor part of ABAP coding has to be adapted to comply with Unicode restrictions (e.g. syntactical restrictions)

How to Unicode-enable a program


Program attribute "Unicode checks active"

Unicode Enabled ABAP


Design Goals
  Platform independence
  Identical behavior on Unicode and non-Unicode systems
  Highest level of compatibility to the pre-Unicode world
  Minimize costs for Unicode enabling of ABAP programs

Main Features
  Clear distinction between character and byte processing: 1 character <> 1 byte

Unicode Enabled ABAP


ABAP lists: difference between memory and display length
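The difference between memory length and display length can be illustrated outside ABAP. The sketch below uses the East Asian Width property from Python's `unicodedata` module to estimate display columns; counting wide characters as two columns is a common terminal convention and an assumption here, not a statement about how ABAP lists compute their layout.

```python
import unicodedata


def display_columns(text):
    """Estimate display width: wide and fullwidth characters take two columns."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def memory_bytes_utf16(text):
    """Bytes occupied in UTF-16 (without byte order mark)."""
    return len(text.encode("utf-16-le"))


latin = "ABC"
japanese = "\u65e5\u672c\u8a9e"  # three Japanese characters

for sample in (latin, japanese):
    print(len(sample), "characters,",
          memory_bytes_utf16(sample), "bytes,",
          display_columns(sample), "columns")
```

Both samples have three characters and six bytes in UTF-16, but the Japanese text needs twice the display columns.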

Migrating to Unicode enabled ABAP


Step 1: In the non-Unicode system
  Adapt all ABAP programs to Unicode syntax and runtime restrictions
  Set attribute "Unicode enabled" for all programs

Migrating to Unicode enabled ABAP


Step 2: Set up a Unicode system
  Unicode kernel + Unicode database
  Only ABAP programs with the Unicode attribute are executable

Do runtime tests in the Unicode system
  Check for runtime errors
  Look for semantic errors
  Check ABAP list layout with former double-byte characters

Migrating to Unicode enabled ABAP


Use UCCHECK to analyze your applications:
  Remove errors
  Inspect statically not analyzable places (optional)
    Untyped field symbols
    Offset with variable length
    Generic access to database tables
  Set Unicode program attribute using UCCHECK or SE38 / SE24 / ...
  Do additional checks with SLIN (e.g. matching of actual and formal parameters in function modules)

Migrating to Unicode enabled ABAP


Upgrade to Unicode

With Unicode, there are no limitations on users, and all languages in the ISO 639 standard can be used
Unicode is technically supported as of Basis Release 6.20; see Note 0379940 for more information
A single code page system (standard or Unambiguous Blended Code Page) can be upgraded to Unicode using the normal upgrade method

Unicode Conversion Roadmap


Preparation
During preparation, topics such as
  additional hardware requirements,
  downtime issues,
  Unicode-enabling of customer developments, and
  the special treatment of MDMP systems
have to be taken into consideration

Unicode Conversion Roadmap


Conversion
The Unicode conversion process is based on a system copy, and during this process, the database conversion and system shutdown/restart are as automated as possible.
For small to mid-size databases (< 1 TB), this is based on an SAP unload/reload of the complete database; minimum downtime tools will be used for larger databases.

Unicode Conversion Roadmap


Post-Conversion
Once the Unicode system is up and running, you need to
  verify data consistency on a scenario basis, as well as
  carry out general integration testing
For systems that support multiple languages, special emphasis needs to be placed on cross-language handling during the test phase. SAP provides correction tools that can be used if the conversion did not run properly.

Unicode Conversion Roadmap


Post-Conversion
Additional tool: SAP Data Management, reducing the database size and growth
To keep your database costs in check, the SAP Data Management service frees up valuable database resources by showing you how to reduce the size and growth of your database by typically 25%.

Unicode Conversion at a Glance


Preparation
  Set up the Unicode Conversion Project
  Check prerequisites
  Data analysis for downtime minimization
  Special MDMP treatment
  Enabling of customer developments

Conversion
  Highly automated
  System will be down during database conversion
  Unload/reload process for small databases
  Minimum downtime tool for large databases

Post-Conversion
  Unicode system is up and running
  Verification of data consistency
  Integration testing focused on language handling

Upgrade Paths to Unicode (R/3 Enterprise)


Source systems: R/3 3.1i, R/3 4.0b, R/3 4.5b, R/3 4.6b, R/3 4.6c
Direct upgrade to: R/3 Enterprise non-Unicode
Conversion to target system: R/3 Enterprise Unicode

  First upgrade, then conversion to Unicode
  R/3 Enterprise Ramp-Up started 2002-07
  Unicode availability follows a phase of restricted shipment with pilot customers

Upgrade Paths to Unicode (BW 3.1)


Source systems: BW 2.0B, BW 2.1C, BW 3.0
Upgrade to: BW 3.1 non-Unicode
Conversion to target system: BW 3.1 Unicode

  Interfacing R/3 MDMP on a project basis only
  Unicode BEXGUI restrictions apply
  First upgrade, then conversion to Unicode
  BW 3.1 Ramp-Up starting 2002-12
  Unicode availability follows a phase of restricted shipment with pilot customers

Upgrade Paths to Unicode (CRM 3.1)


Source systems: CRM 2.0B, CRM 2.0C, CRM 3.0
Upgrade to: CRM 3.1 non-Unicode
Conversion to target system: CRM 3.1 Unicode

  Selected scenarios only; cooperation with SAP GBU CRM required
  First upgrade, then conversion to Unicode
  CRM 3.1 Ramp-Up starting 2002-12
  Unicode availability follows a phase of restricted shipment with pilot customers


Prerequisites, special MDMP treatment

OSS Note 548016
  Conversion from Unicode to non-Unicode is not possible
  The Unicode conversion of MDMP and also Ambiguous Code Page systems (code page numbers 6100, 6200 and 6500) is only supported on a project basis with SAP involvement
OSS Note 543715
  The Unicode conversion of a BW 3.1 system requires additional steps regarding the system copy
OSS Note 573044
  If you are using HR functionality within R/3 Enterprise, additional steps are also mandatory

6.30 Unicode & MCOD


With SAP WebAS 6.30, a database abstraction layer for the Java stack was introduced: OpenSQL for Java
Tables of the Java stack are stored in the same database instance as the tables of the ABAP stack, in two different schemas (except Informix)
The concept of MCOD installations is fully supported by the combined stack of ABAP and Java
(Figure: systems QA1 and TC2, each with an ABAP stack (non-Unicode/Unicode) and a Java stack (Unicode); schemas SAPQA1, SAPQA1DB, SAPTC2 and SAPTC2DB)


Unicode Conversion - IMIG

Whitepaper: SAP R/3 incremental migration test
http://saphpcc.bbn.hp.com/Global/Compet/migration/migration.HTM

SAP System-to-System Communication


SAP System-to-System communication


SAP Web Application Server (6.20)
  Only one source code exists for Unicode-based and non-Unicode-based systems, so new developments can be smoothly exchanged
  The interfaces (e.g. RFC) have been extended so that communication with other Unicode-based systems or non-Unicode-based systems is possible
  Furthermore, SAP provides standard tools for the installation of (and conversion to) Unicode-based systems that can also be used for checking and Unicode-enabling of customer developments

SAP System-to-System communication

Solid lines: receiver can receive all characters
Dotted lines: receiver cannot receive characters which are not in its own code page. But as long as you restrict the character set, data can be sent from everywhere to everywhere.
(Figure: Unicode R/3, MDMP R/3 and non-Unicode R/3 systems and the WWW connected via http/RFC, with Latin-1 and SJIS front ends)

SAP System-to-System communication


RFC
Unicode <-> Unicode
  no problem
non-Unicode <-> non-Unicode
  old stuff, receiver converts code page if possible
Unicode <-> non-Unicode
  the Unicode side converts from/to the code page of the non-Unicode side
  MDMP is converted with a language key
  System settings allow the configuration of error handling
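What "the Unicode side converts to the code page of the non-Unicode side" means for characters outside that code page can be sketched with Python codecs. This is not SAP's RFC implementation; the helper `to_code_page`, the sample text, and the mapping of error handling to Python's `strict` and `replace` modes are assumptions for illustration.

```python
def to_code_page(text, code_page, on_error="strict"):
    """Convert Unicode text to the receiver's code page with a chosen error policy."""
    return text.encode(code_page, errors=on_error)


text = "Gr\u00f6\u00dfe \u65e5\u672c"  # German word followed by two Japanese characters

# Policy 1: replace characters the receiver's code page cannot represent.
latin1_replaced = to_code_page(text, "latin-1", on_error="replace")

# Policy 2: refuse the conversion when a character does not fit.
try:
    to_code_page(text, "latin-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# A receiver with a Japanese code page accepts the Japanese part unchanged.
sjis_bytes = to_code_page("\u65e5\u672c", "shift_jis")

print(latin1_replaced)
print(strict_failed)
print(sjis_bytes.hex(" "))
```

Restricting the data to characters that exist in the receiver's code page avoids both the replacement and the error case.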

SAP System-to-System communication


RFC (SM59) Unicode <> non Unicode


Printing & Output Management


What is a SAP device type?

configuration file for the SAP printer driver that ensures proper functionality between the SAP data stream and the printer or output device where the data is sent

Printer drivers & device types


In R/3, a distinction is made between "printer driver" and "device type".
A device type consists of a variety of attributes defined for an output device.
One of these attributes is the printer driver to be used by SAPscript (the R/3 forms processor) for this particular printer.

Printing & Output Management

Device types cover aspects such as control commands for font selection, page size, character set selection, the character set used and so on.
A device type must be specified for every new printer defined in the SAP environment to enable direct printing from the SAP applications.
Device types are created by SAP for the entire HP LaserJet printer family on the basis of PCL5, PCL6 and PostScript.
SAP develops, tests and supports device types for HP products; they can be found here:
http://h40045.www4.hp.com/printing_solutions/Device_Types.html

Printing & Output Management


At present, there are five SAPscript printer drivers. They include:

HP-PCL5 (for example, HP LaserJet 3, 4, 5, 6 series)
PostScript printers (PS level 2)
PRESCRIBE (for example, Kyocera FS-1500)
device types SWIN/SAPWIN/xxSWIN/xxSAPWIN


Printing & Output Management


Unicode Device Types

Background:

LEXMARK is going into HP accounts, claiming that only LEXMARK could support SAP Unicode printing.
To support Unicode character sets on an HP printer, customers need a Unicode-compliant printer and a SAP Unicode device type.
Unicode-compliant printers are defined by firmware support for UTF-8 and/or UTF-16 and by Unicode fonts loaded on the printer.
Today LEXMARK is the preferred vendor for SAP Unicode printing.

Printing & Output Management


Solution for HP

All OZ-based printers (LJ2300 and higher) support Unicode UTF-16 fonts in PCL6 by default.
The LJ2300, CLJ9500 and future products will support UTF-8 fonts in PCL5.
A firmware roll is planned so that all current OZ-based printers (LJ4200/4300, LJ9000, CLJ4600, CLJ5500) also support UTF-8 in PCL5.
Furthermore, the Unicode fonts need to be loaded on the printer (e.g. stored on the internal hard disk).
Today we have a Unicode prototype solution available for printing from an SAP environment.
For more information, contact Alan Cooke (U.S.) or Stephen Westberg (EMEA).

Sizing Information for Unicode-based SAP Systems


Sizing Info - General


The space requirements for encoding a text in Unicode, compared to the encodings currently in use (8 bits per character for European languages, more for Chinese/Japanese/Korean), are shown on the next slide.
This has an influence on disk storage space and network download speed (when no form of compression is used).

Sizing Info - General


UTF-8: no change for US-ASCII, just a few percent more for ISO-8859-1, 50% more for Chinese/Japanese/Korean, 100% more for Greek and Cyrillic
UCS-2 and UTF-16: no change for Chinese/Japanese/Korean; 100% more for US-ASCII, ISO-8859-1, Greek and Cyrillic
UCS-4: 100% more for Chinese/Japanese/Korean; 300% more for US-ASCII, ISO-8859-1, Greek and Cyrillic
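
These percentages can be checked with a short Python sketch that encodes the same text in a legacy code page and in a Unicode encoding and compares the byte counts. The sample strings and the function name are illustrative assumptions; UTF-32 is used here in place of UCS-4, and the "-le" codec variants are chosen so that no byte order mark is counted.

```python
def growth_percent(text: str, legacy_codec: str, unicode_codec: str) -> float:
    """Percentage of additional bytes needed when text moves from a legacy
    code page to a Unicode encoding."""
    legacy_size = len(text.encode(legacy_codec))
    unicode_size = len(text.encode(unicode_codec))
    return (unicode_size - legacy_size) / legacy_size * 100


if __name__ == "__main__":
    # (sample text, legacy code page); both are illustrative choices
    samples = [
        ("Hello", "ascii"),
        ("αβγ", "iso-8859-7"),
        ("日本語", "shift_jis"),
    ]
    for text, legacy in samples:
        for target in ("utf-8", "utf-16-le", "utf-32-le"):
            pct = growth_percent(text, legacy, target)
            print(f"{text!r}: {legacy} -> {target}: {pct:+.0f}%")
```

For ISO-8859-1 text, the UTF-8 overhead depends on how many accented characters occur, which is why the slide gives only "a few percent" for typical Western European text.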


Expected Hardware Requirements


Increase of CPU requirements, depending on the existing solution:
ISO-LATIN1 (ASCII) to Unicode: +30%
Double-Byte/MDMP to Unicode: less than +5%
Increase of memory requirements, depending on the underlying DB: about +50%
Application Server internally based on UTF-16; DB either UTF-8, CESU-8 or UTF-16


Unicode Conversion Demo

JAVA Applet Demo


Expected Hardware Requirements

Database growth depends on:

DB Unicode encoding scheme (e.g. CESU-8, UTF-16)
Languages in use

[Figure: CESU-8 vs. UTF-16 comparison; surviving labels: "A", "1 Byte", "1100", "8000"]

Encoding: UTF-8; CESU-8; UTF-16
Manufacturers: Oracle, SAP DB (8.0); DB/2 (AIX); SQL Server, DB/2 (AS400), SAP DB (7.0)
Additional storage requirements: 35%; 60-70%

Network load: (draft results) <7% for Latin-1, about 15% for Japanese, 25% for other Asian languages
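
To make the difference between the database encodings concrete, the following Python sketch compares byte counts per character. Python has no built-in CESU-8 codec, so `cesu8_encode` is a hypothetical helper written for this illustration: it encodes each UTF-16 code unit separately as a UTF-8 sequence, which is how CESU-8 is defined. The sample characters are assumptions chosen for illustration.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: every UTF-16 code unit (including each half of
    a surrogate pair) becomes its own UTF-8 byte sequence."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets lone surrogate code units through the encoder
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


if __name__ == "__main__":
    # Illustrative characters: Latin letter, Hangul Jamo, CJK ideograph,
    # and one character outside the Basic Multilingual Plane.
    for ch in ("A", "\u1100", "\u8000", "\U0001F600"):
        print(
            f"U+{ord(ch):04X}",
            "UTF-8:", len(ch.encode("utf-8")),
            "CESU-8:", len(cesu8_encode(ch)),
            "UTF-16:", len(ch.encode("utf-16-be")),
        )
```

For characters in the Basic Multilingual Plane, CESU-8 and UTF-8 produce identical bytes; they differ only for supplementary characters, where CESU-8 needs six bytes instead of four.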

Expected Hardware Requirements


R/3 Release 4.0 4.5 4.6c 4.7 (6.20) non-Unicode

CPU Memory

1 1

+20% +20%

+15% DB: +20%; App:+10% +10%

+5% +5%

Disk

+10%

+10%

NON-Unicode

Expected Hardware Requirements


R/3 Release   4.7 (6.20) non-Unicode   4.7 with Unicode
CPU           1                        +30% to 35%
Memory        1                        +50%
Disk          1                        about +35% (UTF-8), +60-70% (UTF-16)

Unicode
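
As a rough worked example, the factors for the move from 4.7 non-Unicode to 4.7 with Unicode can be applied to an existing system's figures. The sketch below is only a back-of-the-envelope aid, not an SAP sizing tool; the dictionary keys, the function name and the baseline values in the usage part are hypothetical.

```python
# Percentage increases (low, high) for 4.7 non-Unicode -> 4.7 with Unicode,
# as quoted on the slide.
UNICODE_INCREASE_PCT = {
    "cpu": (30, 35),
    "memory": (50, 50),
    "disk_utf8": (35, 35),
    "disk_utf16": (60, 70),
}


def estimate_unicode_requirement(baseline: float, resource: str) -> tuple:
    """Return the (low, high) estimate for one resource after conversion."""
    low_pct, high_pct = UNICODE_INCREASE_PCT[resource]
    return (
        baseline * (100 + low_pct) / 100,
        baseline * (100 + high_pct) / 100,
    )


if __name__ == "__main__":
    # Hypothetical baseline: 1000 CPU units, 64 GB memory, 500 GB database
    print("CPU:", estimate_unicode_requirement(1000, "cpu"))
    print("Memory (GB):", estimate_unicode_requirement(64, "memory"))
    print("Disk, UTF-8 DB (GB):", estimate_unicode_requirement(500, "disk_utf8"))
    print("Disk, UTF-16 DB (GB):", estimate_unicode_requirement(500, "disk_utf16"))
```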
