Carl R. Strathmeyer
Dialogic Corporation
Appeared in IEEE Communications Magazine May 1996
ABSTRACT
One significant hurdle blocking the effective utilization of computer-telephone
technology is the historical lack of communication between practitioners of the
information processing and telephony disciplines. These two disciplines have grown up
isolated from one another, with very different technical viewpoints and vocabularies.
There are few practitioners who are competent in both disciplines. The inevitable result
is a lack of effective communication, making it difficult to identify useful applications and
to organize effective projects spanning the two disciplines. This article provides an
introduction to basic computer-telephone concepts, with the goal of paving the way for
better inter-disciplinary communication and a more widespread commercial utilization of
computer-telephone technology.
What Is Computer Telephony?
In simplest terms, computer telephony is the technique of coordinating the actions of
telephone and computer systems. This technology has existed in commercial form since
the mid-1980s, but it has been exploited only in a few niche markets -- particularly in
large call centers, where call volumes easily justified the cost of complex custom-built
systems. But in the 1990s, several factors have combined to significantly simplify
computer-telephone systems and increase the marketplace's interest in computer
telephony. International standards for interconnecting telephone and computer systems
have been defined, notably the Computer-Supported Telecommunications Applications (CSTA) call
modeling and protocol standards from ECMA. Mass-market application programming
interface (API) specifications have been heavily promoted by major market players such
as Microsoft and Novell, and are gaining rapid acceptance. Voice processing
technologies have improved steadily, providing sophisticated features and high port
densities at attractive prices. Public networks are offering more and more services
which enable computer-telephone applications, such as Calling Line ID. And most
important, the world economy is doing business over the telephone at an increasing
rate, prompting business organizations to look for ways to make this process more
efficient and economical.
The Convergence of Computers and Telephony
Public and private telephone systems provide real-time information paths between two
or more parties. Traditionally, these information paths have taken the form of voice
connections, originally through hardwired analog circuitry but later through an
increasingly broad range of technologies such as radio transmission, digital signal
encoding, and fiber. Over time, these transmission paths were also exploited for non-voice applications such as facsimile and data transmission.
At first, each non-voice application required a distinct set of dedicated "terminal
equipment", the telephony term for any user device connected to the telephone network.
Facsimile machines conversed only with other facsimile machines, computer devices
sent data files only to other computer devices, and so forth. But in the 1990s, these
disparate sets of equipment have begun to overlap, and the general-purpose computer
has emerged as the point of intersection.
Computers can now send and receive every kind of information that passes through the
telephone network: They can act as facsimile machines; they can interact with human
speakers through voice synthesis and recognition; and of course they can send and
receive data in many formats. It is this intersection, with the general-purpose computer
serving as the interface point, which makes computer telephony so intriguing and
potentially valuable to the marketplace.
Call Control and Media Processing
As they play this crucial interface role, computer systems must interact with the
telephone network in two fundamental ways.
First, they must be able to control how calls are established, reconfigured and
"torn down", the telephony term for concluding a call. We call this the "call
control" function.
Second, they must be able to send and receive information through the call
endpoint interface, generating and receiving the appropriate information formats
such as facsimile, voice, tones, or data. We call this the "media processing"
function.
A computer telephone application usually requires some combination of both functions.
These call control and media processing functions have counterparts in ordinary human
telephone usage:
Picking up the telephone handset, pressing dialing digits, and listening for the
tones signaling the successful completion of the call represent human call control
functions.
Once the call is established, speaking and listening to the far party represent
human media processing functions.
The first computer telephone applications concentrated on media processing, with only
limited call control functions. For example, the first voicemail systems answered
incoming calls, presented a greeting, and then recorded the caller's message. Such a
system consists primarily of media processing functions, with call control functions
limited to detecting a ring, answering the call, and hanging up after the message has
been taken.
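The split between the two functions can be made concrete with a minimal sketch of such a first-generation voicemail flow. It assumes a hypothetical `SimulatedLine` object standing in for one line on a voice board; the class and method names are illustrative, not any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedLine:
    """Hypothetical stand-in for one analog line on a voice board."""
    caller_audio: bytes
    log: list = field(default_factory=list)

    # Call control primitives
    def wait_for_ring(self) -> None:
        self.log.append("call control: ring detected")

    def answer(self) -> None:
        self.log.append("call control: off-hook")

    def hang_up(self) -> None:
        self.log.append("call control: on-hook")

    # Media processing primitives
    def play(self, prompt: str) -> None:
        self.log.append(f"media: play {prompt!r}")

    def record(self, max_bytes: int) -> bytes:
        self.log.append("media: record")
        return self.caller_audio[:max_bytes]


def take_message(line: SimulatedLine, greeting: str, max_bytes: int = 64000) -> bytes:
    """First-generation voicemail: a little call control wrapped around media processing."""
    line.wait_for_ring()              # call control
    line.answer()                     # call control
    line.play(greeting)               # media processing
    message = line.record(max_bytes)  # media processing
    line.hang_up()                    # call control
    return message
```

Only three of the five steps touch call control, which is why such systems could get by with very simple signaling support.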
By comparison, newer voicemail and automated attendant applications have added
functions such as call transferring, outdialling and paging. Applications like these require
more comprehensive call control. As the cost of signal processing technologies has
come down, these applications have also added advanced media processing functions
such as voice synthesis, voice recognition, and fax interfaces.
Call center applications require even more sophisticated call control functions. These
applications implement features such as greeting the caller with an extensive range of
voice response options and then transferring the caller to wait in a queue, ultimately
coordinating the simultaneous arrival of the call and its associated caller data at a service
representative's desk. Call center applications typically utilize the most advanced call
control and media processing functions, including special call control functions to
monitor calls as they pass through holding queues on their way to their ultimate
destinations, and comprehensive media processing functions which allow some callers
to complete their business without ever speaking to a human service representative.
Modular Media Processing Hardware
Media processing hardware is relatively simple so long as each telephone line has a
dedicated set of hardware resources. For example, a typical voice processing board
might support four analog telephone lines, with speech digitization and playback
circuitry hard-wired on each channel.
Media processing hardware gets considerably more complex, however, when
applications need to be able to reconfigure resources on-the-fly. Larger systems also
need to be expandable in modular increments to accommodate application growth.
For example, a medium-scale application may require a pool of two T1 circuit interfaces
(providing a total of 48 voice channels), 48 voice digitizers and playback units, eight
speech recognizers, eight facsimile processing channels, and twenty-four analog
interfaces for headsets. These resources must be reconfigurable on-the-fly, meaning
that an incoming call on a given T1 channel must be assignable to the digitizers,
playback units, recognizers, facsimile processors and analog interfaces in any
combination.
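The pool in this example, and the requirement that any call can be given any resource, can be sketched as a simple allocator. The resource-type names and the `ResourcePool` class are hypothetical; the channel count follows from the 24 voice channels carried by each T1 circuit.

```python
from collections import defaultdict

T1_CHANNELS = 24  # a T1 circuit carries 24 voice channels

# Pool from the example above (hypothetical resource-type names).
POOL = {
    "t1_channel": 2 * T1_CHANNELS,  # 48
    "digitizer": 48,
    "player": 48,
    "recognizer": 8,
    "fax": 8,
    "analog_headset": 24,
}


class ResourcePool:
    """Assigns free resources of any type to a call on demand, then returns them."""

    def __init__(self, capacity: dict[str, int]):
        self.free = dict(capacity)
        self.held = defaultdict(list)  # call id -> resource types currently held

    def attach(self, call_id: str, kind: str) -> bool:
        if self.free.get(kind, 0) == 0:
            return False  # none free: the application must wait or degrade service
        self.free[kind] -= 1
        self.held[call_id].append(kind)
        return True

    def release_call(self, call_id: str) -> None:
        for kind in self.held.pop(call_id, []):
            self.free[kind] += 1
```

Because only 8 recognizers serve 48 channels, the allocator must refuse the ninth simultaneous request until some call releases one.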
Such a configuration cannot fit onto a single circuit board (and would not be easily
expandable even if it could), so several architectures have been proposed by which
such systems can be assembled. The two leading proposals, MVIP and SCbus, specify
time-division buses for talk path interconnection and a separate communication
mechanism for coordinating the subsystems. The MVIP effort is administered by the
GO-MVIP organization; the SCbus was developed by the SCSA working group, recently
subsumed within the Enterprise Computer Telephony Forum (ECTF). Both of these
groups have also proposed programming interfaces for the control of such systems;
these are discussed later.
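The idea behind a time-division talk-path bus can be illustrated with a toy model: each transmitting device is given a timeslot, and any device can be connected to listen to that slot. This is a hypothetical sketch of the general principle only; it does not follow the MVIP or SCbus specifications, and the slot count is an arbitrary parameter.

```python
from typing import Optional


class TdmBus:
    """Toy time-division talk-path bus (not the MVIP or SCbus specification).

    Each transmitting device owns one timeslot; any number of devices may listen
    to that slot, so one caller's audio can reach several resources at once.
    """

    def __init__(self, num_slots: int):
        self.owner: list[Optional[str]] = [None] * num_slots
        self.listeners: dict[int, set[str]] = {}

    def transmit(self, device: str) -> int:
        """Assign the first free timeslot to a transmitting device."""
        for slot, owner in enumerate(self.owner):
            if owner is None:
                self.owner[slot] = device
                return slot
        raise RuntimeError("no free timeslot")

    def listen(self, device: str, slot: int) -> None:
        """Connect a device's input to a timeslot that already has a transmitter."""
        if self.owner[slot] is None:
            raise ValueError(f"slot {slot} has no transmitter")
        self.listeners.setdefault(slot, set()).add(device)

    def sources_heard_by(self, device: str) -> list[str]:
        return [self.owner[s] for s, devs in self.listeners.items() if device in devs]
```

For example, a T1 channel transmitting on slot 0 can feed a digitizer and a speech recognizer at the same time; a playback unit would take its own slot, and the T1 channel would listen to it for the return direction.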
Signaling: The Call Control Connection
The telephone network is a widely-distributed system of intelligent switching nodes. For
these nodes to cooperate successfully for the establishment and tearing down of calls,
they must communicate with each other and with the users' terminal equipment. This
process is called "signaling". An accurate and reliable signaling connection between
telephone and computer systems is essential to successful computer-telephone
applications, since signaling is the means of call control and constitutes the only
communication between the intelligent systems in the two domains.
Signaling can take place "in-band", that is, through the telephony talk path channel, or
"out-of-band", that is, through some communication channel other than the talk path. In
today's telephone network, terminal equipment signaling is generally in-band (except for
ISDN devices), while signaling between telephone switches is often done out-of-band
for security and performance reasons.
The original terminal-equipment signaling was, of course, the human voice as a
subscriber spoke to the operator. The first automatic terminal equipment signaled with
timed make-break pulses across an analog telephone line and special switch-generated
tones to alert the subscriber to call states such as ringing and busy/engaged. In many
telephone systems, tone signaling is now used for in-band terminal equipment signaling
in both directions. The best-known scheme for terminal-equipment-to-network signaling
is Dual Tone Multi-Frequency (DTMF), under which the terminal equipment generates
simultaneous pairs of tones to represent each dialed digit.
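A minimal sketch of DTMF generation and detection, using the standard keypad frequency grid. The detector uses the Goertzel algorithm, a common choice for measuring energy at a few known frequencies; it is a toy with no level thresholds, twist checks, or timing rules, all of which a real detector needs.

```python
import math

# Standard DTMF grid: each key is one low-group (row) tone plus one high-group (column) tone.
ROWS = (697, 770, 852, 941)        # Hz
COLS = (1209, 1336, 1477, 1633)    # Hz
KEYS = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key: str) -> tuple[int, int]:
    """Return the (low, high) frequency pair for one keypad symbol."""
    for r, row in enumerate(KEYS):
        if len(key) == 1 and key in row:
            return ROWS[r], COLS[row.index(key)]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, rate: int = 8000, ms: int = 50) -> list[float]:
    """Sum the two sinusoids for one key, each at half amplitude."""
    lo, hi = dtmf_pair(key)
    n = rate * ms // 1000
    return [0.5 * math.sin(2 * math.pi * lo * t / rate)
            + 0.5 * math.sin(2 * math.pi * hi * t / rate) for t in range(n)]


def goertzel_power(samples: list[float], freq: float, rate: int = 8000) -> float:
    """Signal power at one frequency, computed with the Goertzel recursion."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples: list[float], rate: int = 8000) -> str:
    """Pick the strongest row tone and the strongest column tone."""
    r = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i], rate))
    c = max(range(4), key=lambda i: goertzel_power(samples, COLS[i], rate))
    return KEYS[r][c]
```

With 50 ms of audio at 8 kHz, adjacent DTMF frequencies are several analysis bins apart, so the strongest-tone rule separates all sixteen keys on clean synthetic input.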
Unfortunately, the signaling from the telephone network back to the terminal equipment
has not been similarly standardized, a situation all too familiar to subscribers trying to
make international calls. The signaling tones returned from the far end of an
international call often do not resemble local signaling tones, and the subscriber may
not be able to tell the difference between another country's busy/engaged signal and a
ringing signal.
Needless to say, it is a significant challenge to design computer-telephone terminal
equipment which can accurately interpret the widely-varying tones and other in-band
signals generated by various elements of the worldwide telephone network. Indeed,
achieving accurate and reliable signaling between computer-based telephone interfaces
and traditional telephone equipment is one of the greatest difficulties in building reliable
computer-telephone applications.
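One part of the difficulty is that call progress tones are recognized by their on/off cadence, and cadences differ by country. The sketch below matches measured durations against per-country profiles. The profile values are commonly published North American and UK figures, included only for illustration (verify against the relevant national specification); the tolerance is an arbitrary assumption, and real detectors also check tone frequencies.

```python
# Illustrative cadence profiles in seconds (tone on, tone off, ...), one repetition each.
PROFILES = {
    "US": {"busy": (0.5, 0.5), "ringback": (2.0, 4.0)},
    "UK": {"busy": (0.375, 0.375), "ringback": (0.4, 0.2, 0.4, 2.0)},
}


def classify_cadence(measured: list[float], country: str, tolerance: float = 0.1) -> str:
    """Match measured on/off durations against one country's profiles.

    `measured` alternates on/off durations starting with tone-on and must cover
    at least one full repetition of a profile for that profile to match.
    """
    for name, pattern in PROFILES[country].items():
        if len(measured) < len(pattern):
            continue
        if all(abs(m - pattern[i % len(pattern)]) <= tolerance
               for i, m in enumerate(measured)):
            return name
    return "unknown"
```

A UK busy tone measured against the US table falls outside every profile and comes back as "unknown", which is the international-call confusion described above, seen from the machine's side.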
This difficulty can be somewhat alleviated by shifting to out-of-band signaling schemes,
which generally rely on unambiguous digital messaging. For example, the digital
message-oriented signaling of an ISDN basic rate terminal device is much more reliable
than analog in-band signaling. (But note that even ISDN basic rate signaling is not yet
completely standardized around the world.) A similar digital, message-oriented (but
non-standard) signaling capability is provided by the signaling schemes used by the
digital telephone sets offered by many PBX vendors. And computer-telephone
integration (CTI) links, now offered on most modern PBXs, provide a signaling mechanism
through which a computer system can receive consolidated signaling for groups of
telephone extensions.
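To see why message-oriented signaling is less ambiguous, consider the header of an ISDN D-channel call control message as defined in ITU-T Q.931: a protocol discriminator, a call reference, and a message type code. The sketch decodes only that header and knows only a subset of message types; information elements that follow are ignored.

```python
# Message-type codes from ITU-T Q.931 (subset).
Q931_MESSAGE_TYPES = {
    0x01: "ALERTING", 0x02: "CALL PROCEEDING", 0x05: "SETUP",
    0x07: "CONNECT", 0x0F: "CONNECT ACKNOWLEDGE", 0x45: "DISCONNECT",
    0x4D: "RELEASE", 0x5A: "RELEASE COMPLETE",
}


def decode_q931_header(msg: bytes) -> tuple[int, bool, str]:
    """Return (call reference value, call reference flag, message name)."""
    if len(msg) < 2 or msg[0] != 0x08:
        raise ValueError("not a Q.931 message (protocol discriminator must be 0x08)")
    cr_len = msg[1] & 0x0F          # call reference length in octets
    if len(msg) < 3 + cr_len:
        raise ValueError("truncated message header")
    cr = msg[2:2 + cr_len]
    flag, value = False, 0
    if cr_len:
        flag = bool(cr[0] & 0x80)   # top bit marks which side allocated the reference
        value = int.from_bytes(bytes([cr[0] & 0x7F]) + cr[1:], "big")
    mtype = msg[2 + cr_len]
    return value, flag, Q931_MESSAGE_TYPES.get(mtype, f"UNKNOWN(0x{mtype:02X})")
```

Where an in-band detector must infer "the far end is ringing" from tone cadence, the D-channel simply delivers an ALERTING message tagged with the call it belongs to.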
Multiple signaling methods are often available on a single telephone system. One PBX
might simultaneously support a CTI link, ISDN trunk circuits, and proprietary digital set
signaling. Any of these will provide more accurate signaling information for computer-telephone applications than is available through in-band analog terminal device signaling
on those same switches.
First-Party and Third-Party Call Control
The relationship between a computer application and the call control it exerts over a
telephone line is classified as first-party or third-party call control.
First-party call control is call control exerted over a telephone line on which the
computer application is also a "talking" party -- that is, a call on which the application is
also capable of exercising media processing functions.
For example, if a computer application receives an inbound call on a voice board having
a normal telephone line interface, senses the ring signal, answers the call, and initiates
the system's voicemail application to greet the caller, it is using first-party call control.
Third-party call control is call control exerted over telephone lines on which the
computer application is not necessarily also a "talking" party.
For example, if a server-based application is monitoring several users' telephone lines
(without the benefit of an actual physical connection to each of those lines), is alerted to
an arriving call on one of the lines, and causes that call to be diverted to some other
user's telephone, it is exerting third-party call control. Third-party call control usually also
implies out-of-band signaling, since there is by definition no direct connection between
the computer system running the application and the telephone line being controlled.
Generally, first-party call control functions are those which could be accomplished by a
human attendant via a standard telephone set attached to the telephone system in the
same manner as the application equipment. Third-party call control functions are those
which would require a human attendant to use a specialized telephone set with special
privileges, such as an operator's console.
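The two modes can be contrasted in a small sketch. Both classes are hypothetical: the first-party controller can act only on calls at its own line but also has a talk path for media, while the third-party controller can redirect calls at any monitored extension but has no media capability at all.

```python
from dataclasses import dataclass


@dataclass
class Call:
    call_id: int
    ringing_at: str  # extension where the call is currently alerting


class FirstPartyLine:
    """Control of the application's own line: it can also carry media on the call."""

    def __init__(self, extension: str):
        self.extension = extension
        self.connected = None  # the Call currently on this line, if any

    def answer(self, call: Call) -> None:
        if call.ringing_at != self.extension:
            raise PermissionError("first-party control only reaches calls on its own line")
        self.connected = call

    def play(self, prompt: str) -> str:
        if self.connected is None:
            raise RuntimeError("no talk path")
        return f"playing {prompt!r} on call {self.connected.call_id}"


class ThirdPartyController:
    """Control of other users' lines through a signaling link: no talk path, no media."""

    def __init__(self, monitored: set[str]):
        self.monitored = monitored

    def divert(self, call: Call, destination: str) -> None:
        if call.ringing_at not in self.monitored:
            raise PermissionError(f"{call.ringing_at} is not monitored")
        call.ringing_at = destination
```

The diversion scenario from the text maps onto `divert`: a server watching extensions 2001 and 2002 can move an arriving call from one to the other without ever connecting to it.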
Sharing Computer-Telephone Resources
Computer-telephone applications vary considerably in complexity depending upon
whether they allow the sharing of telephone-related resources. For example, an
application that has sole control of a voice card and telephone line (such as a voice
response application connected to a dedicated line) is much simpler in design and
construction than an application which must share control of resources with several
other applications and/or a human user. Control mechanisms for these shared
applications are often one of the most difficult aspects of computer-telephone
application design.
For example, a telephone line terminating at a facsimile card installed in a user's
personal computer would be a non-shared resource. (Figure 1) The only applications
which can use this telephone line and its associated facsimile capability are those
residing on that one particular computer system. On the other hand, a telephone line
terminating on a server with a pool of facsimile cards could be used by any system
connected to the same local area network and authorized to use the facsimile server.
(Figure 2)
Each of these configurations has advantages and disadvantages. The shared
configuration requires the overhead of more sophisticated access control and
management capabilities, but the pooling of resources inherent in this scheme offers
more efficiency in resource allocation and thus better handling of peaks and valleys in
usage patterns as compared with resources dedicated to individual systems. From an
economic perspective, dedicated resources are more appropriate for individuals or very
small work groups; server-based resources are better for medium to large work groups
and for enterprise-wide systems.
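The efficiency gain from pooling can be quantified with the Erlang B formula, the standard teletraffic model for the probability that a call finds every line busy. It is not part of the original discussion and assumes random (Poisson) call arrivals with blocked calls turned away; the traffic figures below are illustrative.

```python
def erlang_b(servers: int, traffic_erlangs: float) -> float:
    """Blocking probability for `servers` lines offered `traffic_erlangs` of load.

    Uses the standard recursion B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)).
    """
    b = 1.0
    for n in range(1, servers + 1):
        b = traffic_erlangs * b / (n + traffic_erlangs * b)
    return b
```

Suppose four users each offer 0.5 erlang of facsimile traffic. With a dedicated line apiece, each user sees `erlang_b(1, 0.5)`, about 33% blocking. Pooling the same four lines on a server to carry the combined 2.0 erlangs gives `erlang_b(4, 2.0)`, about 9.5%, and even three pooled lines (about 21%) outperform four dedicated ones.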
Resource-sharing modes are often confused with first-party and third-party call control
modes. Shared resources, accessed through a server, are usually configured for third-party call control, while dedicated resources are usually restricted to first-party call
control functions. But this is not always the case. A dedicated ISDN line, terminating at
a single computer system, can accomplish third-party call control functions through the
capabilities of the ISDN D-channel signaling protocol without ever establishing an actual
talk-path through an ISDN B-channel. Conversely, a call control server connected to a
PBX via a CTI link may offer only first-party call control functions to client applications,
even though the application call control requests pass through a shared server.
Choices For Out-of-Band Signaling
The most challenging aspect of computer-telephone applications is signaling, that is,
achieving accurate and reliable call control. The most important recent commercial
advances in computer telephony have been in this area, with improvements both in the
underlying signaling connections and in the programming interfaces (APIs) which
enable application software to exercise that signaling capability.
As mentioned earlier, the most reliable way to implement signaling between a telephone
system and a computer telephone application is to use out-of-band signaling, which
creates a direct message-based digital information link between the intelligent
telephone switch and the computer-based application. This approach is much more
accurate than in-band signaling, under which the application must attempt to generate
and recognize widely-varying and ambiguous analog signals in the call's talk path.
Out-of-band signaling is available in several forms:
The D-channel associated with basic and primary rate ISDN lines;
The proprietary digital signaling between PBXs and digital telephone sets;
The switch-to-switch signaling protocol called Signaling System 7 (SS7) used in
public and large private telephone networks; and
The CTI links available for many modern PBXs and some public exchange
switches.
The practitioner will frequently need to choose between these mechanisms when
designing a computer-telephone system.
Many interesting computer telephone applications can be built using only the out-of-band signaling capabilities of the ISDN basic and primary rate specifications. (Figure 3)
For example, an application system connected to the telephone network through an
ISDN facility can provide a network-based automatic call distributor (ACD) distributing
calls to remote public network subscribers, or a call routing application for private PBX
networks. These applications, however, are often limited by the telephone domain
where the ISDN signaling is valid and consistent. For example, the ACD application
may not operate correctly when calls cross between public telephone network
boundaries, and the call routing application depends on inter-PBX feature transparency
and may not work in a heterogeneous network of different manufacturers' PBXs. These
limitations will gradually disappear as ISDN telephone service becomes consistent
worldwide.
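The distribution logic at the heart of such an ACD can be sketched independently of the signaling that carries it out. The example below uses a longest-idle-agent policy, one common ACD rule chosen here as an assumption; the agent identifiers are hypothetical stand-ins for remote subscribers' numbers.

```python
import heapq
import itertools
from collections import deque
from typing import Optional, Tuple


class LongestIdleAcd:
    """Toy ACD that hands each new call to the agent who has been idle longest."""

    def __init__(self) -> None:
        self._idle: list = []            # heap of (idle_since, tie_breaker, agent)
        self._tie = itertools.count()
        self._waiting: deque = deque()   # calls queued until an agent frees up

    def route(self, call_id: str) -> Optional[Tuple[str, str]]:
        """Return (call_id, agent), or queue the call and return None."""
        if not self._idle:
            self._waiting.append(call_id)
            return None
        _, _, agent = heapq.heappop(self._idle)
        return call_id, agent

    def agent_free(self, agent: str, now: float) -> Optional[Tuple[str, str]]:
        """Mark an agent idle; deliver the oldest queued call if there is one."""
        heapq.heappush(self._idle, (now, next(self._tie), agent))
        if self._waiting:
            return self.route(self._waiting.popleft())
        return None
```

In a network-based deployment each returned (call, agent) pair would become a redirection request sent over the ISDN facility; that signaling step is not shown.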
In contrast to ISDN D-channel signaling, the SS7 and CTI link techniques can provide a
more complete view of calls passing through the corresponding telephone domains. The
domain for SS7 signaling can be as large as an entire public telephone network; the
domain for a CTI link is a single telephone switch or a small number of tightly-integrated
switches.
SS7 is a complex protocol, and is closely tied to the internal operation of a telephone
network. Because of this, terminal equipment is not usually granted the privilege of an
SS7 connection. A few long-distance telephone carriers do offer such a connection via
appropriate security firewalls.
A typical such service announces each call to the customer's computer application via
the SS7 protocol and then allows the application to choose among a set of predetermined call routing options by replying with another SS7 message. (Figure 4) An
arrangement based on SS7 requires sophisticated customer premises equipment, and
is usually only appropriate for call centers handling large call volumes.
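The application's side of such a service reduces to a decision function: given the data announced for a call, pick one of the options agreed with the carrier in advance. In the sketch below the option labels, destinations, announced fields, and queue-length rule are all hypothetical; nothing here reflects an actual SS7 message layout.

```python
from dataclasses import dataclass

# Options agreed with the carrier in advance (hypothetical labels and destinations).
ROUTING_OPTIONS = {
    "center_east": "+1-555-0100",
    "center_west": "+1-555-0200",
    "overflow_announcement": "announcement-7",
}
CENTERS = ("center_east", "center_west")
DEFAULT_OPTION = "overflow_announcement"


@dataclass
class CallAnnouncement:
    """Per-call data a carrier might pass along (assumed fields, not an SS7 layout)."""
    calling_number: str
    dialed_number: str


def choose_route(call: CallAnnouncement, waiting: dict[str, int],
                 max_queue: int = 20, preferred: frozenset = frozenset()) -> str:
    """Reply with one predetermined option: the shortest queue, else overflow.

    Callers listed in `preferred` ignore the queue limit and always reach a center.
    """
    limit = float("inf") if call.calling_number in preferred else max_queue
    open_centers = [c for c in CENTERS if waiting.get(c, 0) < limit]
    if not open_centers:
        return DEFAULT_OPTION
    return min(open_centers, key=lambda c: waiting.get(c, 0))
```

Restricting the reply to a fixed menu of options is what lets the carrier expose routing decisions to customer equipment without granting it general control over the network.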
CTI links serve a similar purpose, but on a smaller scale more suitable for the relatively
simpler environment of a customer premises PBX or a single public telephone
exchange switch. (Figure 5) CTI links also offer a broader range of call control functions
than commercial customer-premises SS7 services, including call initiation and hangup
as well as call routing. CTI links can operate using either a proprietary protocol (such as
Northern Telecom's Meridian Link Protocol and AT&T's ASAI protocol) or a standard
protocol (such as the ECMA CSTA protocol mentioned earlier).
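A typical first task for software on the computer side of a CTI link is tracking call states from the stream of events the switch reports. The sketch keys on a subset of CSTA event names; the (event, call, device) tuple form and the state names are simplifying assumptions, not the protocol's actual encoding.

```python
# Connection states driven by a subset of CSTA event names (simplified; the
# real protocol carries far more information with each event).
NEXT_STATE = {
    "Delivered": "alerting",
    "Established": "connected",
    "Held": "held",
    "Retrieved": "connected",
    "ConnectionCleared": "null",
}


class CallTracker:
    """Tracks the state of each (call, device) connection reported over a CTI link."""

    def __init__(self) -> None:
        self.state: dict = {}  # (call_id, device) -> connection state

    def on_event(self, event: str, call_id: int, device: str) -> str:
        """Apply one event and return the connection's resulting state."""
        if event not in NEXT_STATE:
            # Unsupported event: leave the state unchanged.
            return self.state.get((call_id, device), "null")
        new = NEXT_STATE[event]
        if new == "null":
            self.state.pop((call_id, device), None)
        else:
            self.state[(call_id, device)] = new
        return new
```

Ignoring unrecognized events, rather than failing, is one defensive choice given that switches differ in which events they actually send.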
The CSTA protocol has now been implemented by a growing number of switch vendors
including major manufacturers such as Siemens ROLM, Ericsson, and Alcatel. Note that
commercial CTI link implementations vary in the set of features supported, and although
they are standards-based, even CSTA implementations are not necessarily
equivalent or interoperable.
Because they provide access to shared resources, both the SS7-based connections
and CTI links typically terminate in a server rather than a specific application computer.
This allows multiple applications to influence calls flowing through a common telephone
domain, and provides greater flexibility regarding the computer systems on which these
applications can be installed.
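Terminating the link in a server means one stream of switch events must be shared among several applications. A minimal publish/subscribe sketch, with a hypothetical `CtiServer` class and callback signature, shows the idea.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, int, str], None]  # (event, call_id, device)


class CtiServer:
    """Toy server terminating one CTI link and sharing its events among applications."""

    def __init__(self) -> None:
        self._subs: dict = defaultdict(list)  # device -> list of Handler

    def subscribe(self, device: str, handler: Handler) -> None:
        self._subs[device].append(handler)

    def dispatch(self, event: str, call_id: int, device: str) -> int:
        """Deliver one link event to every application watching that device."""
        handlers = self._subs.get(device, [])
        for handler in handlers:
            handler(event, call_id, device)
        return len(handlers)
```

For example, a screen-pop application and a call-logging application can both subscribe to the same extension, each running on whatever computer suits it, while the switch sees only one link.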
Application Programming Interfaces
An application programming interface (API) is the mechanism through which application
software manipulates telephone resources. APIs are necessary for both the call control
and media processing functions.
Several existing non-telephony APIs have found a useful role in computer telephony,
particularly for controlling media processing functions.
For example, once a telephone call is established, the Microsoft Windows APIs used for
the manipulation of desktop multimedia objects (for example, the playing of sound files
through a local speaker) can be used to send and receive similar multimedia content
over the telephone connection. Because of their heritage, however, the resource
models used by these existing APIs turn out to be more suitable for local (non-shared)
resources than for remote or shared resources. New APIs and resource models are
needed to implement shared media processing resources on shared servers.
Several cross-vendor efforts have sprung up to address this need, including the Multi-Vendor Integration Protocol (MVIP) and the Enterprise Computer Telephony Forum
(ECTF), each of which has activities relating to software architectures and APIs for
shared media processing resources.
Proprietary APIs for first-party call control were first developed by modem, voice board,
and fax board manufacturers to support their own products. The only API in this group
to achieve de facto standard status was the Hayes modem command set, which
included basic functions for dialing and hanging up telephone calls.
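The Hayes commands for these basic call control functions are short ASCII strings terminated by a carriage return. The sketch below only builds the command bytes; writing them to the modem's serial port is left out, and the helper names are hypothetical.

```python
ANSWER = b"ATA\r"     # go off-hook and answer an incoming call
HANG_UP = b"ATH0\r"   # go on-hook (hang up)


def dial_command(number: str, tone: bool = True) -> bytes:
    """Build a Hayes dial command: ATDT for tone (DTMF) dialing, ATDP for pulse.

    Spaces, hyphens, and parentheses are stripped; ',' is kept as the Hayes pause.
    """
    digits = "".join(ch for ch in number if ch not in " -()")
    if not digits or set(digits) - set("0123456789*#,"):
        raise ValueError(f"cannot dial {number!r}")
    return f"ATD{'T' if tone else 'P'}{digits}\r".encode("ascii")
```

`dial_command("9,555 0123")` yields `b"ATDT9,5550123\r"`: dial 9 for an outside line, pause, then dial the number with DTMF tones.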
APIs for third-party call control did not have equivalents in traditional application
environments and had to be developed specifically to support computer telephony. The
first third-party APIs were developed by computer manufacturers to support applications
running on their own systems. For example, IBM introduced the CallPath API and
Digital Equipment introduced the Computer-Integrated Telephony (CIT) API in the late
1980s for use on their respective systems.
The industry took a major step forward in the 1990s with the introduction of two call
control APIs which were not linked to any individual computer manufacturer:
The Telephony Services API (TSAPI) developed by AT&T and Novell, and
The Telephony API (TAPI) developed by Microsoft.
These APIs, both strongly oriented towards the desktop personal computer and its
flourishing software industry, have made mass-market computer telephone applications
economically feasible for the first time.
APIs vs. Commercial Products
A programming interface is simply a specification; it is not a commercial product in its
own right. As straightforward as this may sound, the two concepts are often confused in
the marketplace.
An API is the meeting point for two commercial products:
and computing systems is just a transition phase. Very soon, the distinction between
telephone switches and LAN servers will disappear, as hybrid telephony servers are
brought to market containing both switching and application-interface functions.
Computer telephony is at an important turning point: The necessary elements of the
technology have been developed; now we need to educate large numbers of insightful
practitioners who can put it to productive use.
Extracted from http://www.dialogic.com/company/whitepap/carlieee.htm