Unicode in Netweaver

Unicode in SAP NetWeaver
Sebastian Buhlinger SAP Consultant, HP-SAP EMEA CC

2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Agenda
1. Introduction to Unicode
2. Unicode & SAP in General
3. Technology in Depth
4. Sizing Information for Unicode-based SAP Systems
3/31/2004
Introduction to Unicode
1. Introduction to Unicode
What is text?
History of character encoding
Problem of character encoding
From ASCII to Unicode
What is Unicode exactly?
The Unicode Standard
Where is Unicode used?
The Unicode Consortium
Unicode Encodings
What is text?
Code pages and encodings describe how text is handled and stored in:
  Computers
  Files
  Data structures
Inside a computer program or data file, text is stored as a sequence of numbers, just like everything else
A character is a letter, digit, period, hyphen, other punctuation mark or math symbol
In addition, there are control characters, which are typically not visible
History of Character Encoding

Historically, computers were slow, had little memory and were very expensive
Up to the 1960s, I/O meant punching holes into paper tape
Most character sets date back to the punch-card age and were designed with these cards in mind
In the early days of computing, every hardware manufacturer used proprietary technology (and encodings)
International data interchange was not an issue, so nothing needed to fit together
Problem of character encoding

Which number is assigned to which character?
When you type an A on the keyboard, the computer uses the character code to look up the shape of A under the same binary number in a font file, and displays or prints it
The character A may have different integer values in different programs or data files (the A might be in an Arabic font file)
In some cases no number is available for certain characters (for instance ä)
All data is encoded in the form of binary numerical codes
Character repertoire
English alphabet with some digits and little more: ~60 characters
Western European standard: ~300 characters for several languages
Korean: ~12,000 syllables
Chinese dictionaries: ~50,000 characters
Hundreds of other characters in common use, such as math and currency symbols
From ASCII to Unicode

Most character sets and encodings in the 70s/80s were modifications or extensions of ASCII
Many of them were 8-bit encodings containing a subset of the 94 printable ASCII characters
Most common encodings nowadays use a single byte per character (SBCS)
They are all limited to 256 characters
Because of that, none of them can even cover the letters of all Western European languages
From ASCII to Unicode

Consequence: many different 8-bit encodings were created to fulfill the needs of different user communities
Solution for data interchange in a global, networked information society and a collaborative business world: a single character set for all languages in use
Unicode: a 32-bit code value can distinguish 4,294,967,296 different characters, symbols and control characters (the Unicode code space itself is limited to 1,114,112 code points, U+0000 to U+10FFFF)
What is Unicode exactly?

Unicode = universally encoded character set to store information from any language
Unicode:
  defines properties for each character
  standardizes script behavior
  provides a standard algorithm for bidirectional text
  defines cross-mappings for other standards
Unicode defines a unique code value for every character, regardless of platform, program or programming language used

The Unicode Standard primarily encodes scripts rather than languages
Scripts comprise several languages that historically share the same set of symbols
In many cases a script may serve to write dozens of languages (e.g. the Latin script)
In other cases one script corresponds to one language (e.g. Hangul)

Additionally, it includes punctuation marks, diacritics, mathematical symbols, technical symbols, musical symbols, arrows, dingbats etc.
In all, the Unicode Standard comprises more than 95,000 characters, ideograph sets and symbols (version 4.0)
The Unicode Standard

The Unicode Standard is a character coding system designed to support the worldwide interchange, processing and display of written text of the diverse languages and technical disciplines of the modern world
In addition, it supports classical and historical texts of many written languages
Where is Unicode used?

The Unicode Standard has been adopted by many software and hardware vendors
Most operating systems support Unicode
Unicode is required for international document and data interchange, the Internet and the WWW, and therefore by modern standards such as:
  Java, C#, Perl, Python
  Markup languages such as XML, HTML, XHTML, MathML, WML etc.
  JavaScript
  LDAP
  CORBA
  etc.
The Unicode Consortium

The Unicode Consortium is a non-profit organization originally founded to develop, extend and promote the use of the Unicode Standard
Members of the Consortium include major computer corporations, software producers, database vendors, research institutions, international agencies, various user groups and interested individuals
The Unicode Consortium

The Consortium cooperates with W3C and ISO and has liaison status "C" with ISO/IEC JTC 1/SC 2/WG 2, which is responsible for refining the specification and expanding the character set of ISO/IEC 10646
Unicode Encodings
UTF = Unicode Transformation Format
UCS = Universal Character Set
CESU = Compatibility Encoding Scheme
Conversion between different encodings is a simple, bit-wise operation (defined in the standard)
No performance-intensive conversion table is necessary!
Unicode Encodings
UTF-8: Unicode Transformation based on 8-bit representation
CESU-8: Compatibility Encoding Scheme of UTF-16 on an 8-bit base
UTF-16: Unicode Transformation based on 16-bit representation
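UTF-8 and CESU-8 only differ for supplementary characters (above U+FFFF). The following is a minimal Python sketch, assuming the CESU-8 layout in which each UTF-16 surrogate is written as its own 3-byte sequence. Python ships no CESU-8 codec, so the helper name `cesu8_encode` is made up for this illustration.

```python
def cesu8_encode(text: str) -> bytes:
    """Illustrative CESU-8 encoder: take UTF-16 code units, write each one UTF-8 style."""
    out = bytearray()
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # 'surrogatepass' allows the UTF-8 codec to emit lone surrogates (D800..DFFF)
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a supplementary character
utf8_bytes = clef.encode("utf-8")   # one 4-byte sequence
cesu8_bytes = cesu8_encode(clef)    # two 3-byte sequences, one per surrogate
print(utf8_bytes.hex(" "))
print(cesu8_bytes.hex(" "))
```

For text without supplementary characters, the helper returns the same bytes as the standard UTF-8 codec.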
Unicode Encodings
UCS-2: Universal Character Set, 2-byte variation (16-bit)
UTF-32: Unicode Transformation based on 32-bit representation
UCS-4: Universal Character Set, 4-byte variation (32-bit)
Unicode Encodings
Not all Unicode characters are 2 bytes long, so there is no doubling of hardware requirements in the first place
The Unicode encoding determines the length of a character
A character in a Unicode encoding can be longer than 1 byte; therefore Unicode characters can be longer than characters defined in a standard code page
UTF-8
UTF-8 is the 8-bit encoding of Unicode
It is a variable-width encoding and also a strict superset of 7-bit ASCII
Strict superset means that every character in 7-bit ASCII is available in UTF-8 with the same corresponding code point value
1 character = 1 byte to 4 bytes in the encoding
Characters from European scripts: either 1 or 2 bytes
Asian scripts: 3 or 4 bytes
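The byte counts above can be checked with a short Python sketch. The sample characters are chosen here for illustration and are not taken from the slides.

```python
# Sample characters (chosen for this illustration) and a short description of each
samples = {
    "A": "Latin letter (ASCII)",
    "\u00f6": "Latin letter with diaeresis",
    "\u03b1": "Greek letter",
    "\uac00": "Hangul syllable",
    "\U0001D11E": "musical symbol (supplementary)",
}

# Number of bytes each character occupies in UTF-8
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {description}: {utf8_lengths[ch]} byte(s)")

# 7-bit ASCII text is byte-for-byte identical in ASCII and UTF-8
ascii_is_subset = "SAP".encode("ascii") == "SAP".encode("utf-8")
```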
UTF-8
UTF-8 is used for UNIX platforms, HTML and most Internet browsers
Main benefits of UTF-8:
  Compact storage requirements for European scripts: in general, European scripts will occupy less storage on disk and in memory
  Ease of migration: since 7-bit ASCII data remains the same in UTF-8, the data conversion effort between ASCII-based character sets and UTF-8 is reduced significantly
UTF-8 / CESU-8 (8-bit encodings)

8-bit encodings are well suited for data transfer, since all 7-bit ASCII and 8-bit ISO characters retain the same code points
Easier communication with legacy and non-Unicode systems
Downside: variable character length
UCS-2

UCS-2 has a fixed width of 16 bits (2 bytes)
UCS-2 is the Unicode encoding for Java and Windows NT 4.0
Main benefits of UCS-2:
  More compact storage requirements for Asian scripts (each character is represented with 2 bytes only)
  String processing is faster because all characters have the same width
  Good compatibility with Java and Microsoft clients
Downside: UCS-2 can support Unicode characters defined up to Unicode 3.0 only (max. 65,536)
UTF-16
UTF-16 is the 16-bit encoding of Unicode
Basically an extension of UCS-2
One Unicode character can be 2 or 4 bytes in the encoding
Characters from European and most Asian scripts are represented in 2 bytes
Supplementary characters are represented in 4 bytes
UTF-16 is the main Unicode encoding as of Windows 2000
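The 4-byte form is a pair of 16-bit surrogates. As a sketch of the arithmetic behind it (the function name `to_surrogate_pair` is an assumption made for this example), a supplementary code point is split like this:

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000        # remaining 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogate_pair(0x1D11E)
print(f"{high:04X} {low:04X}")
```

The result can be compared with the UTF-16 column of Example #1, which lists the same code point.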
UTF-16
Main benefits of UTF-16:
  More compact storage requirements for Asian scripts (2 bytes for commonly used characters)
  Ideal if European and Asian scripts are used together --> UTF-16 will occupy less storage on disk and in memory than UTF-8 (3 bytes for the Asian part)
  Balance of efficient access to characters and economical use of storage
The points above are the reason for the use of UTF-16 in the SAP Web Application Server
UCS-2 / UTF-16 (16-bit encodings)

16-bit encodings offer a compromise between the pros and cons of the 8-bit and the 32-bit encodings
They do not need as much memory as 32-bit encodings, but offer a quasi-fixed character length
UCS-2 has a fixed character length, but it cannot define more than 2^16 (65,536) characters
UTF-32
32-bit encoding
Popular when memory space is no concern
Fixed width (4 bytes)
UCS-4 / UTF-32 (32-bit encodings)

All 32-bit encodings have a fixed length
This advantage is outweighed by the extensive memory and storage requirements
Example #1
Character   UTF-8          UCS-2   UTF-16
A           41             0041    0041
c           63             0063    0063
Æ           C3 86          00C6    00C6
ö           C3 B6          00F6    00F6
٤           D9 A4          0664    0664
页          E9 A1 B5       9875    9875
𝄞           F0 9D 84 9E    N/A     D834 DD1E
Example #2 character U+AC00

UTF-8
HEX   E    A    B    0    8    0
BIN   1110 1010 1011 0000 1000 0000

Lead byte indicator: 1110 (first byte)
Trailing byte indicator: 10 (second and third byte)
Remove the indicators:   1010   110000   000000
Regroup the bits:        1010 1100 0000 0000

UTF-16
BIN   1010 1100 0000 0000
HEX   A    C    0    0
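The regrouping in Example #2 can be written as a few bit operations. This is a sketch for the 3-byte case only; the function name `decode_3byte_utf8` is an assumption made for this illustration.

```python
def decode_3byte_utf8(data: bytes) -> int:
    """Decode one 3-byte UTF-8 sequence into its code point using bit operations."""
    if len(data) != 3:
        raise ValueError("expected exactly three bytes")
    b0, b1, b2 = data
    if b0 >> 4 != 0b1110:
        raise ValueError("lead byte must start with 1110")
    if b1 >> 6 != 0b10 or b2 >> 6 != 0b10:
        raise ValueError("trailing bytes must start with 10")
    # Strip the indicators and regroup the remaining 4 + 6 + 6 bits
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


code_point = decode_3byte_utf8(bytes.fromhex("EAB080"))
print(f"U+{code_point:04X}")
```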
Unicode & SAP in General
2. Unicode & SAP in General
Languages and characters
Characters on Disk/Memory
Code Pages
SAP & Code Pages
Language Combinations before Unicode
Recommendations from SAP (w/o Unicode)
Unicode-compliant SAP products
When/why do customers need Unicode?
Language and characters

Languages are written in fonts
Only a few languages use the same fonts
A font is a group of characters
Characters on Disk/Memory
A character is stored as a byte sequence on disk
A code page defines the mapping between the byte sequence and a character
Code Pages
The code page determines which characters you can see and enter
Code Pages
Different code pages map different characters to the same byte sequence
[Figure: single-byte and double-byte code page examples]
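A small Python sketch of this effect: the same byte is decoded with three different single-byte code pages. The codec names are Python's standard names, and the choice of byte 0xE4 is made for this illustration only.

```python
raw = bytes([0xE4])  # one and the same byte on disk

# Decode the identical byte under three single-byte code pages
decoded = {
    code_page: raw.decode(code_page)
    for code_page in ("latin_1", "iso8859_7", "cp1251")
}

for code_page, character in decoded.items():
    print(f"{code_page}: U+{ord(character):04X} {character}")
```

The byte yields a Latin, a Greek and a Cyrillic letter respectively, which is why the code page has to be known before stored text can be interpreted.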
SAP & Code Pages
Language Combinations before Unicode
Single Standard Code Pages
  Each supports a specific set of languages
  The number and combination of supported languages cannot be altered
[Table: standard code pages and R/3 languages (w/o EBCDIC), including double-byte code pages]

It is also possible to specify a customer-specific language; this language must use one of the code pages that SAP supports; see Note 0112065
Blended Code Pages (as of Rel. 3.1D)
SAP proprietary code pages that contain characters from one or more standard code pages
Increase the combinations of languages that can be used
Functionally, a Blended Code Page system is a single code page system
Users can see and enter all characters contained in the code page, regardless of their log-in language
[Table: SAP code pages and their supported languages]

The availability of SAP blended code pages is platform dependent, because SAP blended locales need to be created for each platform
[Table: Blended Locale Status (x = available, - = not available)]
MDMP (as of Rel. 3.1I)

Multi-Display / Multi-Processing
Allows dynamic code page switching on the application server and therefore permits any combination of standard code pages on one system
The log-on language determines the code page that is active for each user
An MDMP system is recommended if:
1. one or more additional code pages are required to add languages to your existing installation
2. a blended code page cannot support the combination of languages you need for a new installation
For example, an MDMP system with the code pages 1100 and 8000 allows German and Japanese users to log on to the same R/3 system in their respective languages

Example
[Figure: front ends in Japan (8000 - SJIS) and Germany (1100 - ISO-1) connected to the application server and the DB]
Each user can only access one code page at a time: a user who logs in as a Japanese user cannot enter German characters, and German characters in the database will not be displayed correctly

Example
[Figure: screen views of a Japanese user and a German user]

Please Note:
It is possible for a user to log on in German and then manipulate the character set and font settings so that he or she can enter what appear to be Japanese characters; these characters will not be stored correctly in the database, and the data will be corrupt
If a user wants to enter, for instance, Japanese, he or she must log on in Japanese
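The corruption described above comes from bytes being written under one code page and read under another. The following Python sketch illustrates the general effect only; it is not a model of how an MDMP system works internally, and the sample text is chosen for this example.

```python
japanese = "\u65e5\u672c\u8a9e"            # three Japanese characters
stored = japanese.encode("shift_jis")      # bytes written under the Japanese code page
misread = stored.decode("latin_1")         # the same bytes read as ISO 8859-1

print(len(japanese), "characters,", len(stored), "bytes")
print(repr(misread))                       # unrelated Latin-1 characters

# Reading with the original code page still restores the text ...
restored = stored.decode("shift_jis")

# ... but once the misread text passes through a lossy conversion, it is gone
resaved = misread.encode("ascii", errors="replace")
```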

Please Note:
To ensure that no data corruption occurs, the following restrictions must be observed:
  Global data must contain only 7-bit ASCII characters, which are in all code pages
  Users may use only the characters of their log-in language or 7-bit ASCII
  Batch processes must be assigned the correct user ID and language
  EBCDIC code pages are not supported
Recommendations from SAP (w/o Unicode)

In general, using a single standard code page for new installations and upgrades is the optimal decision
If additional languages or language combinations are needed, SAP recommends Unambiguous Blended Code Pages for new installations and MDMP for existing installations
Unambiguous Blended Code Pages only support certain language combinations, and therefore an MDMP setup may be the only possibility for new installations as well
Unicode-compliant SAP products

All Unicode installations are currently planned only with written permission of SAP and are carried out as customer projects together with SAP, except for new installations of R/3 Enterprise Extension Set 2.0

(SAP Note 79991)
SAP Web Application Server (as of 6.20)
mySAP Customer Relationship Management (CRM)
  The Unicode version of mySAP CRM 4.0 is available via Ramp-Up
mySAP Supply Chain Management (SCM)
  The Unicode version of mySAP SCM 4.0 is available via Ramp-Up
mySAP Supplier Relationship Management (SRM)
  The Unicode version of mySAP SRM 4.0 is available via Ramp-Up
  Conversions (with or without MDMP) of existing SRM installations

(SAP Note 79991)
mySAP Business Intelligence (BW)
  The Unicode version of mySAP BW 3.5 is available via Ramp-Up
  The conversion of existing BW installations as customer project
  SAP Note 643813 has a collection of all relevant SAP notes concerning Unicode-based SAP BW installations
mySAP Product Lifecycle Management (PLM)
  The Unicode version of mySAP PLM 4.0 is available via Ramp-Up
SAP R/3 Enterprise (Ext. 1.10 & higher)
SAP Exchange Infrastructure
When/why do customers need Unicode?

Global businesses that require IT systems to support multilingual data without any restrictions, for instance customers with one worldwide central SAP system
Web interfaces open the door to a global customer base, and IT systems must consequently be able to support multiple local languages simultaneously
When/why do customers need Unicode?

With J2EE integration, mySAP components fully support web standards, and with Unicode, they can now take full advantage of XML and Java
Only Unicode makes it possible to seamlessly integrate inhomogeneous SAP and non-SAP system landscapes
NetWeaver
Technology in Depth
3. Technology in Depth
Unicode & Operating Systems
Unicode & Databases
SAP Unicode-based Code Pages
How to Unicode-enable a program
Unicode-enabled ABAP
Migrating to Unicode enabled ABAP
Unicode Conversion, IMIG Lab Test
SAP System-to-System communication
Printing & Output Management
Unicode & Operating Systems HP-UX

HP-UX has been Unicode-enabled since version 10.x
All Unicode locales in the HP-UX operating environment are based on the UTF-8 format
Each locale includes a base language in the UTF-8 code set and the regional data related to this base language
This includes local formatting rules, text messages, help messages and other related files
Each locale also supports several other scripts for input, display, code conversion and printing
Unicode & Operating Systems Windows

Some Unicode support has been included in Microsoft Windows since Windows 95 and Windows NT 4
Windows 2000 and Windows XP/2003 are based on Unicode instead of the ANSI or WGL4 character sets
Before Win2K, your version of Windows may have used a different character set if you live in a country such as Egypt, Greece, Israel, Russia or Thailand that uses a non-Latin alphabet
Unicode & Operating Systems Windows

The first 128 characters were the same as in ANSI, but many of the places in the second set of 128 were taken by characters from the Arabic, Greek, Hebrew, Cyrillic or Thai alphabets
This caused and still causes problems when moving documents between operating systems such as DOS, Windows, Mac OS and UNIX, or when electronically exchanging documents that were created on computers using different character sets
Unicode & Operating Systems Linux

Before UTF-8 emerged, Linux users all over the world had to use various language-specific extensions of ASCII
Most popular were ISO 8859-1 and ISO 8859-2 in Europe, ISO 8859-7 in Greece, KOI-8 / ISO 8859-5 / CP1251 in Russia, EUC and Shift-JIS in Japan, BIG5 in Taiwan, etc.
This made the exchange of files difficult, and application software had to worry about various small differences between these encodings

Because of these difficulties, major Linux distributors and application developers have now started to phase out these older legacy encodings in favor of UTF-8
UTF-8 support has improved dramatically over the last few years, and ever more people now use UTF-8 on a daily basis in:
  text files (source code, HTML files, email messages, etc.)
  file names
  standard input and standard output, pipes

In UTF-8 mode, terminal emulators (such as xterm) transform every keystroke into the corresponding UTF-8 sequence and send it to the stdin of the foreground process
Similarly, any output of a process on stdout is sent to the terminal emulator, where it is processed with a UTF-8 decoder and then displayed using a 16-bit font

Before you start experimenting with UTF-8 under Linux, update your installation to a recent distribution with up-to-date UTF-8 support
This is particularly the case if you use an installation older than SuSE 8.1 or Red Hat 8.0
Before these, UTF-8 support was far too limited and experimental to be recommendable for daily use
Little vs. Big Endian

UCS and Unicode are first of all just code tables that assign integer numbers to characters
There exist several alternatives for how a sequence of such characters, or their respective integer values, can be represented as a sequence of bytes
The two most obvious encodings store Unicode text as sequences of either 2-byte or 4-byte units

The official terms for these encodings are UCS-2 and UCS-4, respectively
Unless otherwise specified, the most significant byte comes first in these (Big Endian convention)
An ASCII or Latin-1 file can be transformed into a UCS-2 file by simply inserting a 0x00 byte in front of every ASCII byte
If we want to have a UCS-4 file, we have to insert three 0x00 bytes instead before every ASCII byte
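The zero-byte insertion described above can be sketched in Python as follows. The sample text is an assumption; for characters in the 16-bit range, Python's `utf-16-be` and `utf-32-be` codecs produce the same bytes as big-endian UCS-2 and UCS-4, so they serve as a cross-check.

```python
ascii_bytes = "SAP".encode("ascii")

# UCS-2, big endian: one 0x00 byte in front of every ASCII byte
ucs2_be = b"".join(b"\x00" + bytes([b]) for b in ascii_bytes)

# UCS-4, big endian: three 0x00 bytes in front of every ASCII byte
ucs4_be = b"".join(b"\x00\x00\x00" + bytes([b]) for b in ascii_bytes)

print(ucs2_be.hex(" "))
print(ucs4_be.hex(" "))
```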

Character   Unicode Scalar Value   UTF-8 / CESU-8   UTF-16 [Little Endian]   UTF-16 [Big Endian]
A           U+0041                 41               41 00                    00 41
Ä           U+00C4                 C3 84            C4 00                    00 C4
α           U+03B1                 CE B1            B1 03                    03 B1
א           U+05D0                 D7 90            D0 05                    05 D0
晓          U+6653                 E6 99 93         53 66                    66 53
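The byte-order table above can be reproduced with Python's built-in codecs. This is a sketch for checking the values; the code points are the five scalar values listed on the slide.

```python
code_points = [0x0041, 0x00C4, 0x03B1, 0x05D0, 0x6653]

rows = []
for cp in code_points:
    ch = chr(cp)
    rows.append((
        f"U+{cp:04X}",
        ch.encode("utf-8").hex(" ").upper(),      # UTF-8 (same as CESU-8 here)
        ch.encode("utf-16-le").hex(" ").upper(),  # least significant byte first
        ch.encode("utf-16-be").hex(" ").upper(),  # most significant byte first
    ))

for row in rows:
    print(" | ".join(row))
```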
Unicode & Databases

Databases supported by SAP (WAS 6.20)
Legend: P = available, ? = currently not available, -- = unsupported in general
Win2K HP-UX Solaris AIX OS/400 OS/390 Linux SQL Server Oracle DB2 SAP DB P P P P -P P P -P P P -P P P --P ----P P P
?
--
Unicode & Databases

Manufacturer   Version   Encodings
SQL Server     2000      UTF-16
Oracle         7.2       UTF-8
Oracle         8         UTF-8
Oracle         9i        UTF-8 / UTF-16
Oracle         10g       UTF-8 / UTF-16
DB2            AIX       CESU-8
DB2            AS400     UTF-16
SAP DB         7.0       UTF-16
SAP DB         8.0       UTF-8
SAP Unicode-based Code Pages

With the Unicode enablement of mySAP.com components (check chapter #1), the old code page management had to be changed
Instead of using SAP character numbers, all code pages are now based on Unicode character IDs
5-digit SAP character numbers are no longer adequate
This change is valid for both Unicode and non-Unicode systems!
The connection between SAP character number and Unicode character ID is found in table TCP01
You can see the connection in the SPAD character section
NOTE: not every character has a corresponding Unicode character ID! f.i.

The migration of all SAP code pages from the old to the new format was done using report RSCP0126
The definition of code pages is still in TCP00
Customers must migrate their own code pages (9xxx) using RSCP0126 themselves!
How to Unicode-enable a program

Separate Unicode and non-Unicode versions of R/3

Non-Unicode R/3: 1 character = 1 byte
  ABAP source (types C, N, D, T, STRING)
  Non-Unicode kernel
  Non-Unicode database
Unicode R/3: 1 character = 2 bytes (UTF-16)
  ABAP source (types C, N, D, T, STRING)
  Unicode kernel
  Unicode database

No explicit Unicode data type in ABAP
Single ABAP source for Unicode and non-Unicode systems
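The difference between the two system types is the number of bytes behind each character, not the number of characters. A Python sketch of that distinction follows (Python is used for illustration only and does not represent ABAP semantics; the sample word is an assumption):

```python
text = "Gr\u00f6\u00dfe"  # five characters, two of them outside ASCII

char_count = len(text)                           # characters, independent of encoding
bytes_single_byte = len(text.encode("latin_1"))  # 1 character = 1 byte
bytes_utf16 = len(text.encode("utf-16-le"))      # 1 character = 2 bytes

print(char_count, bytes_single_byte, bytes_utf16)
```

Code that treats a character offset as a byte offset gives the same result in the single-byte case and a different one in the UTF-16 case, which is the kind of assumption that Unicode enabling has to remove.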

The major part of ABAP coding is ready for Unicode without any changes
A minor part of ABAP coding has to be adapted to comply with Unicode restrictions (e.g. syntactical restrictions)

Program attribute "Unicode checks active"
Unicode Enabled ABAP

Design Goals
  Platform independence
  Highest level of compatibility to the pre-Unicode world
  Identical behavior on Unicode and non-Unicode systems
  Minimize costs for Unicode enabling of ABAP programs
Main Features
  Clear distinction between character and byte processing
  1 Character <> 1 Byte
Unicode Enabled ABAP

ABAP lists: difference between memory length and display length
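A rough Python sketch of why the two lengths differ: wide East Asian characters count as one character in memory but usually occupy two columns on screen. The helper `display_width` is an assumption made for this example and only approximates column width (it ignores combining characters, for instance); it does not describe ABAP's own rules.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate column width: wide and fullwidth characters take two columns."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


label = "SAP\u65e5\u672c"        # three Latin letters followed by two wide characters
memory_length = len(label)       # length in characters
column_width = display_width(label)
print(memory_length, column_width)
```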
Migrating to Unicode enabled ABAP

Step 1
In the non-Unicode system:
  Adapt all ABAP programs to Unicode syntax and runtime restrictions
  Set the attribute "Unicode enabled" for all programs

Step 2
Set up a Unicode system
  Unicode kernel + Unicode database
  Only ABAP programs with the Unicode attribute are executable
Do runtime tests in the Unicode system
  Check for runtime errors
  Look for semantic errors
  Check ABAP list layout with former double-byte characters

Use UCCHECK to analyze your applications:
  Remove errors
  Inspect places that cannot be analyzed statically (optional)
    Untyped field symbols
    Offset with variable length
    Generic access to database tables
  Set the Unicode program attribute using UCCHECK or SE38 / SE24 / ...
  Do additional checks with SLIN (e.g. matching of actual and formal parameters in function modules)
Upgrade to Unicode
With Unicode, there are no limitations on users, and all languages in the ISO 639 standard can be used
Unicode is technically supported as of Basis Release 6.20; see Note 0379940 for more information
A single code page system (standard or Unambiguous Blended Code Page) can be upgraded to Unicode using the normal upgrade method
Unicode Conversion Roadmap

Preparation
During preparation, topics such as
  additional hardware requirements,
  downtime issues,
  Unicode-enabling of customer developments, and
  the special treatment of MDMP systems
have to be taken into consideration

Conversion
The Unicode conversion process is based on a system copy, and during this process, the database conversion and system shutdown/restart are as automated as possible.
For small to mid-size databases (< 1 TB), this is based on an SAP Unload/Reload of the complete database; minimum downtime tools will be used for larger databases.

Post-Conversion
Once the Unicode system is up and running, you need to verify data consistency on a scenario basis, as well as carry out general integration testing.
For systems that support multiple languages, special emphasis needs to be placed on cross-language handling during the test phase.
SAP provides correction tools that can be used if the conversion did not run properly.

Post-Conversion
Additional tool: SAP Data Management - reducing the database size and growth
To keep your database costs in check, the SAP Data Management service frees up valuable database resources by showing you how to reduce the size and growth of your database, typically by 25%.
Unicode Conversion at a Glance

Preparation
  Set up the Unicode Conversion Project
  Check Prerequisites
  Data Analysis for downtime minimization
  Special MDMP treatment
  Enabling of Customer Developments
Conversion
  Highly automated
  System will be down during database conversion
  Unload/reload process for small databases
  Minimum downtime tool for large databases
Post-Conversion
  Unicode system is up and running
  Verification of Data Consistency
  Integration Testing focused on language handling
Upgrade Paths to Unicode (R/3 Enterprise)

Source systems: R/3 3.1i, R/3 4.0b, R/3 4.5b, R/3 4.6b, R/3 4.6c
Direct upgrade --> target system R/3 Enterprise non-Unicode
Conversion --> R/3 Enterprise Unicode
First upgrade, then conversion to Unicode
R/3 Enterprise Ramp-Up started 2002-07
Unicode availability follows a phase of restricted shipment with pilot customers
Upgrade Paths to Unicode (BW 3.1)

Source systems: BW 2.0B, BW 2.1C, BW 3.0
Upgrade --> BW 3.1 non-Unicode
Conversion --> BW 3.1 Unicode
Interfacing R/3 MDMP on a project basis only
Unicode BEXGUI restrictions apply
First upgrade, then conversion to Unicode
BW 3.1 Ramp-Up starting 2002-12
Unicode availability follows a phase of restricted shipment with pilot customers
Upgrade Paths to Unicode (CRM 3.1)

Source systems: CRM 2.0B, CRM 2.0C, CRM 3.0
Upgrade --> CRM 3.1 non-Unicode
Conversion --> CRM 3.1 Unicode
Selected scenarios only; cooperation with SAP GBU CRM required
First upgrade, then conversion to Unicode
CRM 3.1 Ramp-Up starting 2002-12
Unicode availability follows a phase of restricted shipment with pilot customers
Prerequisites, special MDMP treatment
OSS Note 548016
  Conversion from Unicode to non-Unicode is not possible
  The Unicode conversion of MDMP systems and of Ambiguous Code Page systems (code page numbers 6100, 6200 and 6500) is only supported on a project basis with SAP involvement
OSS Note 543715
  The Unicode conversion of a BW 3.1 system requires additional steps regarding the system copy
OSS Note 573044
  If you are using HR functionality within R/3 Enterprise, additional steps are also mandatory
6.30 Unicode & MCOD

With SAP WebAS 6.30, a database abstraction layer for the Java stack was introduced: OpenSQL for Java
Tables of the Java stack are stored in the same database instance as the tables of the ABAP stack, in two different schemas (except Informix)
The concept of MCOD installations is fully supported by the combined stack of ABAP and Java
[Figure: systems QA1 and TC2, each with an ABAP stack (non-Unicode/Unicode) and a Java stack (Unicode), sharing one database with the schemas SAPQA1, SAPQA1DB, SAPTC2 and SAPTC2DB]

Unicode Conversion - IMIG
Whitepaper:
SAP R/3 incremental migration test
http://saphpcc.bbn.hp.com/Global/Compet/migration/migration.HTM
SAP System-to-System Communication
SAP System-to-System communication

SAP Web Application Server (as of 6.20)
Only one source code exists for Unicode-based and non-Unicode-based systems, so new developments can be exchanged smoothly
The interfaces (e.g. RFC) have been extended so that communication with other Unicode-based systems or with non-Unicode-based systems is possible
Furthermore, SAP provides standard tools for the installation of (and conversion to) Unicode-based systems, which can also be used for checking and Unicode-enabling of customer developments
[Figure: http/RFC communication between a Unicode R/3 system, an MDMP R/3 system, a non-Unicode R/3 system and the WWW, with Latin-1 and SJIS data]
Solid lines: the receiver can receive all characters
Dotted lines: the receiver cannot receive characters that are not in its own code page
As long as you restrict the character set, data can be sent from everywhere to everywhere

RFC
Unicode <-> Unicode
  no problem
non-Unicode <-> non-Unicode
  old stuff, receiver converts code page if possible
Unicode <-> non-Unicode
  the Unicode side converts from/to the code page of the non-Unicode side
  MDMP is converted with a language key
  System settings allow the configuration of error handling
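The conversion on the Unicode side, and the need for configurable error handling, can be illustrated with a Python sketch. The function `send_to_non_unicode` and its `on_error` parameter are hypothetical names for this example; they do not correspond to any SAP RFC setting or API.

```python
def send_to_non_unicode(text: str, code_page: str, on_error: str = "strict") -> bytes:
    """Hypothetical helper: convert Unicode text to the receiver's code page."""
    return text.encode(code_page, errors=on_error)


# Characters that exist in the target code page convert without loss
ok = send_to_non_unicode("M\u00fcller", "latin_1")

# Characters outside the target code page cannot be converted
try:
    send_to_non_unicode("\u6771\u4eac", "latin_1")
    conversion_failed = False
except UnicodeEncodeError:
    conversion_failed = True

# A lenient setting substitutes a replacement character instead of failing
replaced = send_to_non_unicode("\u6771\u4eac GmbH", "latin_1", on_error="replace")
print(ok, conversion_failed, replaced)
```

Whether to abort or to substitute is a policy decision, which is why it makes sense for error handling to be a system setting rather than fixed behavior.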

RFC (SM59) Unicode <> non Unicode
Printing & Output Management

What is an SAP device type?
A configuration file for the SAP printer driver that ensures proper functionality between the SAP data stream and the printer or output device to which the data is sent

Printer drivers & device types
In R/3, a distinction is made between "printer driver" and "device type"
A device type consists of a variety of attributes defined for an output device
One of these attributes is the printer driver to be used by SAPscript (the R/3 forms processor) for this particular printer
Device types cover aspects such as control commands for font selection, page size, character set selection, character set used and so on
A device type must be specified for every new printer defined in the SAP environment to enable direct printing from the SAP applications
Device types are created by SAP for the entire HP LaserJet printer family on the basis of PCL5, PCL6 and PostScript
SAP develops, tests and supports device types for HP products, which can be found here:
http://h40045.www4.hp.com/printing_solutions/Device_Types.html

At present, there are five SAPscript printer drivers. They include:
  HP-PCL5 (for example, HP LaserJet 3, 4, 5, 6 series)
  PostScript printers (PS level 2)
  PRESCRIBE (for example, Kyocera FS-1500)
  Device types SWIN/SAPWIN/xxSWIN/xxSAPWIN

Unicode Device Types

Background:
LEXMARK is going into HP accounts, claiming that only LEXMARK can support SAP UNICODE printing
In order to support UNICODE character sets on an HP printer, customers need a UNICODE-compliant printer and an SAP UNICODE device type
UNICODE-compliant printers are defined by firmware support for UTF-8 and/or UTF-16 and by UNICODE fonts loaded on the printer
Today LEXMARK is the preferred vendor for SAP UNICODE printing

Solution for HP
All OZ-based printers (LJ2300 and higher) support UNICODE UTF-16 fonts in PCL6 by default
The LJ2300, CLJ9500 and future products will support UTF-8 fonts in PCL5
A firmware roll is planned so that all current OZ-based printers (LJ4200/4300, LJ9000, CLJ4600, CLJ5500) also support UTF-8 in PCL5
Furthermore, the UNICODE fonts need to be loaded on the printer (e.g. stored on the internal hard disk)
Today we have a UNICODE prototype solution available to print from an SAP environment
For more information, contact Alan Cooke (U.S.) or Stephen Westberg (EMEA)
Sizing Information for Unicode-based SAP Systems
Sizing Info - General

The space requirements for encoding a text, compared to encodings currently in use (8 bits per character for European languages, more for Chinese/Japanese/Korean), are shown on the next slide
This has an influence on disk storage space and network download speed (when no form of compression is used)
Sizing Info - General

UTF-8 No change for US ASCII, just a few percent more for ISO-8859-1, 50% more for Chinese/Japanese/Korean, 100% more for Greek and Cyrillic UCS-2 and UTF-16 No change for Chinese/Japanese/Korean. 100% more for US ASCII and ISO-8859-1, Greek and Cyrillic UCS-4 100% more for Chinese/Japanese/Korean. 300% more for US ASCII and ISO-8859-1, Greek and Cyrillic
Expected Hardware Requirements

Increase of CPU requirements, depending on the existing solution:
ISO-LATIN1 (ASCII) to Unicode: +30%
Double-Byte/MDMP to Unicode: less than +5%
Increase of memory requirements, depending on the underlying DB: about +50%
The application server is internally based on UTF-16; the DB uses either UTF-8, CESU-8 or UTF-16
Unicode Conversion Demo
JAVA Applet Demo
Database growth depending on

DB Unicode encoding schema (e.g. CESU-8, UTF-16)
Languages in use

[Figure: comparison of CESU-8 and UTF-16; surviving labels: "A", "1 Byte", "1100", "8000"]

Encoding: UTF-8, CESU-8, UTF-16
Manufacturers: Oracle, SAP DB (8.0); DB/2 (AIX); SQL Server, DB/2 (AS400), SAP DB (7.0)
Additional storage requirements: 35%; 60-70%
Network load (draft results): <7% for Latin-1, about 15% for Japanese, 25% for other Asian languages
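The three encoding schemas differ in how many bytes a character occupies. UTF-8 and CESU-8 produce identical bytes for every character in the Basic Multilingual Plane and differ only for supplementary characters, which CESU-8 stores as a surrogate pair with three bytes per surrogate. Python has no built-in CESU-8 codec, so the sketch below implements one by hand; the helper name `cesu8_encode` and the sample characters are assumptions made for illustration.

```python
def cesu8_encode(text):
    """Encode text as CESU-8: supplementary characters are first split
    into a UTF-16 surrogate pair, then each surrogate is written as a
    three-byte UTF-8 sequence. (Hand-written helper, not a library API.)"""
    out = bytearray()
    for ch in text:
        code_point = ord(ch)
        if code_point >= 0x10000:
            offset = code_point - 0x10000
            high = 0xD800 | (offset >> 10)    # leading surrogate
            low = 0xDC00 | (offset & 0x3FF)   # trailing surrogate
            for surrogate in (high, low):
                # "surrogatepass" lets the UTF-8 codec emit a lone surrogate.
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


def byte_counts(ch):
    """Return the storage size of one character in each schema."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        "CESU-8": len(cesu8_encode(ch)),
        "UTF-16": len(ch.encode("utf-16-le")),
    }


if __name__ == "__main__":
    # Latin letter, Hangul syllable U+AC00, musical symbol U+1D11E
    for ch in ("A", "\uAC00", "\U0001D11E"):
        print(f"U+{ord(ch):04X}", byte_counts(ch))
```

For Latin text the 8-bit schemas need one byte per character where UTF-16 needs two, which is consistent with the lower storage overhead quoted for them.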

R/3 Release 4.0 4.5 4.6c 4.7 (6.20) non-Unicode
CPU Memory
1 1
+20% +20%
+15% DB: +20%; App:+10% +10%
+5% +5%
Disk
+10%
+10%
NON-Unicode

R/3 Release              CPU           Memory   Disk
4.7 (6.20) non-Unicode   1             1        1
4.7 with Unicode         +30% to 35%   +50%     +~35% (UTF-8), +60-70% (UTF-16)
Unicode
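As a rough illustration, the factors above can be applied to the measured resources of an existing non-Unicode 4.7 system. This is only a sketch under stated assumptions: the function name `estimate_unicode_sizing`, the units and the baseline figures are hypothetical, and the percentages are copied from the table rather than taken from any SAP sizing tool.

```python
# Growth in percent for a 4.7 Unicode system relative to 4.7 non-Unicode,
# as (low, high) ranges taken from the table above.
UNICODE_GROWTH_PERCENT = {
    "cpu": (30, 35),
    "memory": (50, 50),
    "disk": {"utf-8": (35, 35), "utf-16": (60, 70)},
}


def scale(value, percent_range):
    """Apply a (low, high) percentage increase to a baseline value."""
    low, high = percent_range
    return (value * (100 + low) / 100, value * (100 + high) / 100)


def estimate_unicode_sizing(cpu, memory, disk, db_encoding):
    """Return (low, high) estimates for a Unicode system.

    cpu, memory and disk are the non-Unicode baseline in any unit;
    db_encoding is "utf-8" or "utf-16" (hypothetical parameter names).
    """
    return {
        "cpu": scale(cpu, UNICODE_GROWTH_PERCENT["cpu"]),
        "memory": scale(memory, UNICODE_GROWTH_PERCENT["memory"]),
        "disk": scale(disk, UNICODE_GROWTH_PERCENT["disk"][db_encoding]),
    }


if __name__ == "__main__":
    # Hypothetical baseline: 1000 CPU units, 64 GB memory, 500 GB disk.
    print(estimate_unicode_sizing(1000, 64, 500, "utf-16"))
```

Real sizing also depends on the languages in use and the database vendor, so such figures are a first approximation only.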
