
An Introduction To Computer Telephony

Carl R. Strathmeyer
Dialogic Corporation
Appeared in IEEE Communications Magazine May 1996

ABSTRACT
One significant hurdle blocking the effective utilization of computer-telephone
technology is the historical lack of communication between practitioners of the
information processing and telephony disciplines. These two disciplines have grown up
isolated from one another, with very different technical viewpoints and vocabularies.
There are few practitioners who are competent in both disciplines. The inevitable result
is a lack of effective communication, making it difficult to identify useful applications and
to organize effective projects spanning the two disciplines. This article provides an
introduction to basic computer-telephone concepts, with the goal of paving the way for
better inter-disciplinary communication and a more widespread commercial utilization of
computer-telephone technology.
What Is Computer Telephony?
In simplest terms, computer telephony is the technique of coordinating the actions of
telephone and computer systems. This technology has existed in commercial form since
the mid-1980s, but it has been exploited only in a few niche markets -- particularly in
large call centers, where call volumes easily justified the cost of complex custom-built
systems. But in the 1990s, several factors have combined to significantly simplify
computer-telephone systems and increase the marketplace's interest in computer
telephony. International standards for interconnecting telephone and computer systems
have been defined, notably the Computer Supported Telecommunications Applications (CSTA) call
modeling and protocol standards from ECMA. Mass-market application programming
interface (API) specifications have been heavily promoted by major market players such
as Microsoft and Novell, and are gaining rapid acceptance. Voice processing
technologies have advanced steadily, providing advanced features and high port
densities at attractive prices. Public networks are offering more and more services
which enable computer-telephone applications, such as Calling Line ID. And most
important, the world economy is doing business over the telephone at an increasing
rate, prompting business organizations to look for ways to make this process more
efficient and economical.
The Convergence of Computers and Telephony
Public and private telephone systems provide real-time information paths between two
or more parties. Traditionally, these information paths have taken the form of voice
connections, originally through hardwired analog circuitry but later through an
increasingly broad range of technologies such as radio transmission, digital signal
encoding, and fiber. Over time, these transmission paths were also exploited for non-voice
applications such as facsimile and data transmission.
At first, each non-voice application required a distinct set of dedicated "terminal
equipment", the telephony term for any user device connected to the telephone network.
Facsimile machines conversed only with other facsimile machines, computer devices
sent data files only to other computer devices, and so forth. But in the 1990s, these
disparate sets of equipment have begun to overlap, and the general-purpose computer
has emerged as the point of intersection.
Computers can now send and receive every kind of information that passes through the
telephone network: They can act as facsimile machines; they can interact with human
speakers through voice synthesis and recognition; and of course they can send and
receive data in many formats. It is this intersection, with the general-purpose computer
serving as the interface point, which makes computer telephony so intriguing and
potentially valuable to the marketplace.
Call Control and Media Processing
As they play this crucial interface role, computer systems must interact with the
telephone network in two fundamental ways.
First, they must be able to control how calls are established, reconfigured and
"torn down", the telephony term for concluding a call. We call this the "call
control" function.
Second, they must be able to send and receive information through the call
endpoint interface, generating and receiving the appropriate information formats
such as facsimile, voice, tones, or data. We call this the "media processing"
function.
A computer telephone application usually requires some combination of both functions.
These call control and media processing functions have counterparts in ordinary human
telephone usage:
Picking up the telephone handset, pressing dialing digits, and listening for the
tones signaling the successful completion of the call represent human call control
functions.
Once the call is established, speaking and listening to the far party represent
human media processing functions.
The first computer telephone applications concentrated on media processing, with only
limited call control functions. For example, the first voicemail systems answered
incoming calls, presented a greeting, and then recorded the caller's message. Such a
system consists primarily of media processing functions, with call control functions
limited to detecting a ring, answering the call, and hanging up after the message has
been taken.
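The division of labor in such a first-generation voicemail system can be sketched as a small state machine. This is only an illustration in Python: the `VoicemailLine` class and the state and event names are hypothetical and do not correspond to any vendor's voice board interface. Each action is tagged with the function it belongs to, which makes the imbalance between the two visible.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # waiting for a ring
    GREETING = auto()   # greeting is being played to the caller
    RECORDING = auto()  # caller's message is being recorded


class VoicemailLine:
    """Hypothetical model of one voicemail port (not a real driver API)."""

    def __init__(self):
        self.state = State.IDLE
        self.log = []  # (function, action) pairs in the order performed

    def on_event(self, event):
        if self.state is State.IDLE and event == "ring":
            self.log.append(("call control", "answer"))
            self.log.append(("media processing", "play greeting"))
            self.state = State.GREETING
        elif self.state is State.GREETING and event == "greeting_done":
            self.log.append(("media processing", "record message"))
            self.state = State.RECORDING
        elif self.state is State.RECORDING and event in ("caller_hangup", "timeout"):
            self.log.append(("call control", "hang up"))
            self.state = State.IDLE
        # Any other event is ignored in this simplified model.


line = VoicemailLine()
for event in ["ring", "greeting_done", "caller_hangup"]:
    line.on_event(event)
call_control_actions = [action for function, action in line.log
                        if function == "call control"]
```

In this model the only call control actions are answering and hanging up; everything between them is media processing.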
By comparison, newer voicemail and automated attendant applications have added
functions such as call transferring, outdialling and paging. Applications like these require
more comprehensive call control. As the cost of signal processing technologies has
come down, these applications have also added advanced media processing functions
such as voice synthesis, voice recognition, and fax interfaces.
Call center applications require even more sophisticated call control functions. These
applications implement features such as greeting the caller with an extensive range of
voice response options and then transferring the caller to wait in a queue, ultimately
coordinating the simultaneous arrival of call and associated caller data at a service
representative's desk. Call center applications typically utilize the most advanced call
control and media processing functions, including special call control functions to
monitor calls as they pass through holding queues on their way to their ultimate
destinations, and comprehensive media processing functions which allow some callers
to complete their business without ever speaking to a human service representative.
Modular Media Processing Hardware
Media processing hardware is relatively simple so long as each telephone line has a
dedicated set of hardware resources. For example, a typical voice processing board
might support four analog telephone lines, with speech digitization and playback
circuitry hard-wired on each channel.
Media processing hardware gets considerably more complex, however, when
applications need to be able to reconfigure resources on-the-fly. Larger systems also
need to be expandable in modular increments to accommodate application growth.
For example, a medium-scale application may require a pool of two T1 circuit interfaces
(providing a total of 48 voice channels), 48 voice digitizers and playback units, eight
speech recognizers, eight facsimile processing channels, and twenty-four analog
interfaces for headsets. These resources must be reconfigurable on-the-fly, meaning
that an incoming call on a given T1 channel must be assignable to the digitizers,
playback units, recognizers, facsimile processors and analog interfaces in any
combination.
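The reconfiguration requirement in the example above can be pictured as a shared pool from which each call draws whatever mix of resources it needs. The following Python sketch is only a bookkeeping model under stated assumptions: the `ResourcePool` class and the resource names are hypothetical, the digitizer and playback unit are counted together as one "voice" resource, and nothing here models the time-division bus that would actually carry the talk paths.

```python
from collections import Counter


class ResourcePool:
    """Hypothetical bookkeeping for media resources shared by all calls."""

    def __init__(self, capacities):
        self.free = dict(capacities)  # resource kind -> units still unassigned
        self.held = {}                # call id -> Counter of units it holds

    def allocate(self, call_id, kinds):
        """Assign one unit per requested kind, or nothing if any is exhausted."""
        wanted = Counter(kinds)
        if any(self.free.get(kind, 0) < n for kind, n in wanted.items()):
            return False
        for kind, n in wanted.items():
            self.free[kind] -= n
        self.held.setdefault(call_id, Counter()).update(wanted)
        return True

    def release(self, call_id):
        """Return everything held by a call to the pool."""
        for kind, n in self.held.pop(call_id, Counter()).items():
            self.free[kind] += n


# Capacities taken from the medium-scale example in the text.
pool = ResourcePool({"t1_channel": 48, "voice": 48, "recognizer": 8,
                     "fax": 8, "analog_interface": 24})

# Nine calls each ask for a channel, a voice unit and a speech recognizer.
results = [pool.allocate(call, ["t1_channel", "voice", "recognizer"])
           for call in range(9)]
pool.release(0)
retry = pool.allocate(8, ["t1_channel", "voice", "recognizer"])
```

The ninth request is refused because only eight recognizers exist, and it succeeds once an earlier call releases its resources. This is the peak-handling behavior that a hard-wired, per-line design cannot offer.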
Such a configuration cannot fit onto a single circuit board (and would not be easily
expandable even if it could), so several architectures have been proposed by which
such systems can be assembled. The two leading proposals, MVIP and SCbus, specify
time-division buses for talk path interconnection and a separate communication
mechanism for coordinating the subsystems. The MVIP effort is administered by the
GO-MVIP organization; the SCbus was developed by the SCSA working group, recently
subsumed within the Enterprise Computer Telephony Forum (ECTF). Both of these
groups have also proposed programming interfaces for the control of such systems;
these are discussed later.
Signaling: The Call Control Connection
The telephone network is a widely-distributed system of intelligent switching nodes. For
these nodes to cooperate successfully for the establishment and tearing down of calls,
they must communicate with each other and with the users' terminal equipment. This
process is called "signaling". An accurate and reliable signaling connection between
telephone and computer systems is essential to successful computer-telephone
applications, since signaling is the means of call control and constitutes the only
communication between the intelligent systems in the two domains.
Signaling can take place "inband", that is, through the telephony talk path channel, or
"out-of-band", that is, through some communication channel other than the talk path. In
today's telephone network, terminal equipment signaling is generally in-band (except for
ISDN devices), while signaling between telephone switches is often done out-of-band
for security and performance reasons.
The original terminal-equipment signaling was, of course, the human voice as a
subscriber spoke to the operator. The first automatic terminal equipment signaled with
timed make-break pulses across an analog telephone line and special switch-generated
tones to alert the subscriber to call states such as ringing and busy/engaged. In many
telephone systems, tone signaling is now used for inband terminal equipment signaling
in both directions. The best-known scheme for terminal-equipment-to-network signaling
is Dual Tone Multi-Frequency (DTMF), under which the terminal equipment generates
simultaneous pairs of tones to represent each dialed digit.
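As an illustration of the tone-pair idea, the Python sketch below maps each keypad symbol to its two frequencies and synthesizes the combined tone. The frequency table is the standard DTMF keypad assignment; the 8000 Hz sample rate, the 0.1 second duration, the amplitude scaling, and the function names are assumptions chosen for the example.

```python
import math

# Standard DTMF keypad frequencies in hertz: one low-group tone per row,
# one high-group tone per column.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key):
    """Return the (low, high) frequency pair for a single keypad symbol."""
    if len(key) == 1:
        for row_index, row in enumerate(KEYPAD):
            col_index = row.find(key)
            if col_index != -1:
                return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.1, rate_hz=8000):
    """Synthesize both tones at once as floats in the range [-1.0, 1.0]."""
    low, high = dtmf_pair(key)
    count = round(duration_s * rate_hz)
    return [
        0.5 * (math.sin(2 * math.pi * low * i / rate_hz)
               + math.sin(2 * math.pi * high * i / rate_hz))
        for i in range(count)
    ]


# Frequency pairs for each digit of an example number.
pairs = [dtmf_pair(digit) for digit in "5551234"]
```

Because every symbol is a unique row and column combination, a receiver only has to decide which one of four low tones and which one of four high tones is present.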
Unfortunately, the signaling from the telephone network back to the terminal equipment
has not been similarly standardized, a situation all too familiar to subscribers trying to
make international calls. The signaling tones returned from the far end of an
international call often do not resemble local signaling tones, and the subscriber may
not be able to tell the difference between another country's busy/engaged signal and a
ringing signal.
Needless to say, it is a significant challenge to design computer-telephone terminal
equipment which can accurately interpret the widely-varying tones and other in-band
signals generated by various elements of the worldwide telephone network. Indeed,
achieving accurate and reliable signaling between computer-based telephone interfaces
and traditional telephone equipment is one of the greatest difficulties in building reliable
computer-telephone applications.
This difficulty can be somewhat alleviated by shifting to out-of-band signaling schemes,
which generally rely on unambiguous digital messaging. For example, the digital
message-oriented signaling of an ISDN basic rate terminal device is much more reliable
than analog in-band signaling. (But note that even ISDN basic rate signaling is not yet
completely standardized around the world.) A similar digital, message-oriented (but
non-standard) signaling capability is provided by the signaling schemes used by the
digital telephone sets offered by many PBX vendors. And computer-telephone
integration (CTI) links, now offered on most modern PBXs, offer a signaling mechanism
through which a computer system can receive consolidated signaling for groups of
telephone extensions.
Multiple signaling methods are often available on a single telephone system. One PBX
might simultaneously support a CTI link, ISDN trunk circuits, and proprietary digital set
signaling. Any of these will provide more accurate signaling information for
computer-telephone applications than is available through inband analog terminal device signaling
on those same switches.
First-Party and Third-Party Call Control
The relationship between a computer application and the call control it exerts over a
telephone line is classified as first-party or third-party call control.
First-party call control is call control exerted over a telephone line on which the
computer application is also a "talking" party -- that is, a call on which the application is
also capable of exercising media processing functions.
For example, if a computer application receives an inbound call on a voice board having
a normal telephone line interface, senses the ring signal, answers the call, and initiates
the system's voicemail application to greet the caller, it is using first-party call control.
Third-party call control is call control exerted over telephone lines on which the
computer application is not necessarily also a "talking" party.
For example, if a server-based application is monitoring several users' telephone lines
(without the benefit of an actual physical connection to each of those lines), is alerted to
an arriving call on one of the lines, and causes that call to be diverted to some other
user's telephone, it is exerting third-party call control. Third-party call control usually also
implies out-of-band signaling, since there is by definition no direct connection between
the computer system running the application and the telephone line being controlled.
Generally, first-party call control functions are those which could be accomplished by a
human attendant via a standard telephone set attached to the telephone system in the
same manner as the application equipment. Third-party call control functions are those
which would require a human attendant to use a specialized telephone set with special
privileges, such as an operator's console.
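The third-party case described above, in which an application diverts a call arriving on a line it is not connected to, can be sketched in Python as follows. The `Switch` class is a toy stand-in for a telephone system reachable over an out-of-band link, and its method names, the extension numbers and the `COVERAGE` table are all hypothetical.

```python
class Switch:
    """Toy stand-in for a switch that reports events and accepts requests."""

    def __init__(self):
        self.alerting = {}   # extension -> caller currently ringing there
        self.listeners = []  # applications monitoring the extensions

    def monitor(self, listener):
        self.listeners.append(listener)

    def offer_call(self, caller, extension):
        """A call arrives; every monitoring application is told about it."""
        self.alerting[extension] = caller
        for listener in self.listeners:
            listener(self, caller, extension)

    def divert(self, from_extension, to_extension):
        """Move a ringing call to another extension."""
        self.alerting[to_extension] = self.alerting.pop(from_extension)


# Hypothetical coverage rule: calls for extension 201 go to 205 instead.
COVERAGE = {"201": "205"}


def coverage_application(switch, caller, extension):
    """Server-based application; it never touches the talk path."""
    target = COVERAGE.get(extension)
    if target is not None:
        switch.divert(extension, target)


switch = Switch()
switch.monitor(coverage_application)
switch.offer_call("5551234", "201")  # diverted to 205
switch.offer_call("5556789", "202")  # left alone
```

The application only exchanges messages with the switch. It has no media processing capability on either line, which is what distinguishes this from the first-party voicemail example.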
Sharing Computer-Telephone Resources
Computer-telephone applications vary considerably in complexity depending upon
whether they allow the sharing of telephone-related resources. For example, an
application that has sole control of a voice card and telephone line (such as a voice
response application connected to a dedicated line) is much simpler in design and
construction than an application which must share control of resources with several
other applications and/or a human user. Designing control mechanisms for such sharing is
often one of the most difficult aspects of computer-telephone
application design.
For example, a telephone line terminating at a facsimile card installed in a user's
personal computer would be a non-shared resource. (Figure 1) The only applications
which can use this telephone line and its associated facsimile capability are those
residing on that one particular computer system. On the other hand, a telephone line
terminating on a server with a pool of facsimile cards could be used by any system
connected to the same local area network and authorized to use the facsimile server.
(Figure 2)
Each of these configurations has advantages and disadvantages. The shared
configuration requires the overhead of more sophisticated access control and
management capabilities, but the pooling of resources inherent in this scheme offers
more efficiency in resource allocation and thus better handling of peaks and valleys in
usage patterns as compared with resources dedicated to individual systems. From an
economic perspective, dedicated resources are more appropriate for individuals or very
small work groups; server-based resources are better for medium to large work groups
and for enterprise-wide systems.
Resource-sharing modes are often confused with first-party and third-party call control
modes. Shared resources, accessed through a server, are usually configured for
third-party call control, while dedicated resources are usually restricted to first-party call
control functions. But this is not always the case. A dedicated ISDN line, terminating at
a single computer system, can accomplish third-party call control functions through the
capabilities of the ISDN D-channel signaling protocol without ever establishing an actual
talk-path through an ISDN B-channel. Conversely, a call control server connected to a
PBX via a CTI link may offer only first-party call control functions to client applications,
even though the application call control requests pass through a shared server.
Choices For Out-of-Band Signaling
The most challenging aspect of computer-telephone applications is signaling, that is,
achieving accurate and reliable call control. The most important recent commercial
advances in computer telephony have been in this area, with improvements both in the
underlying signaling connections and in the programming interfaces (APIs) which
enable application software to exercise that signaling capability.
As mentioned earlier, the most reliable way to implement signaling between a telephone
system and a computer telephone application is to use out-of-band signaling, which
creates a direct message-based digital information link between the intelligent
telephone switch and the computer-based application. This approach is much more
accurate than in-band signaling, under which the application must attempt to generate
and recognize widely-varying and ambiguous analog signals in the call's talk path.
Out-of-band signaling is available in several forms:
The D-channel associated with basic and primary rate ISDN lines;
The proprietary digital signaling between PBXs and digital telephone sets;
The switch-to-switch signaling protocol called Signaling System 7 (SS7) used in
public and large private telephone networks; and
The CTI links available for many modern PBXs and some public exchange
switches.
The practitioner will frequently need to choose between these mechanisms when
designing a computer-telephone system.
Many interesting computer telephone applications can be built using only the
out-of-band signaling capabilities of the ISDN basic and primary rate specifications. (Figure 3)
For example, an application system connected to the telephone network through an
ISDN facility can provide a network-based automatic call distributor (ACD) distributing
calls to remote public network subscribers, or a call routing application for private PBX
networks. These applications, however, are often limited by the telephone domain
where the ISDN signaling is valid and consistent. For example, the ACD application
may not operate correctly when calls cross between public telephone network
boundaries, and the call routing application depends on inter-PBX feature transparency
and may not work in a heterogeneous network of different manufacturers' PBXs. These
limitations will gradually disappear as ISDN telephone service becomes consistent
worldwide.
In contrast to ISDN D-channel signaling, the SS7 and CTI link techniques can provide a
more complete view of calls passing through the corresponding telephone domains. The
domain for SS7 signaling can be as large as an entire public telephone network; the
domain for a CTI link is a single telephone switch or a small number of tightly-integrated
switches.
SS7 is a complex protocol, and is closely tied to the internal operation of a telephone
network. Because of this, terminal equipment is not usually granted the privilege of an
SS7 connection. A few long-distance telephone carriers do offer such a connection via
appropriate security firewalls.
A typical such service announces each call to the customer's computer application via
the SS7 protocol and then allows the application to choose among a set of predetermined call routing options by replying with another SS7 message. (Figure 4) An
arrangement based on SS7 requires sophisticated customer premises equipment, and
is usually only appropriate for call centers handling large call volumes.
CTI links serve a similar purpose, but on a smaller scale more suitable for the simpler
environment of a customer premises PBX or a single public telephone
exchange switch. (Figure 5) CTI links also offer a broader range of call control functions
than commercial customer-premises SS7 services, including call initiation and hangup
as well as call routing. CTI links can operate using either a proprietary protocol (such as
Northern Telecom's Meridian Link Protocol and AT&T's ASAI protocol) or a standard
protocol (such as the ECMA CSTA protocol mentioned earlier).
The CSTA protocol has now been implemented by a growing number of switch vendors
including major manufacturers such as Siemens ROLM, Ericsson, and Alcatel. Note that
commercial CTI link implementations vary in the set of features supported, and although
they are standards-based, even CSTA implementations are not necessarily
equivalent or interoperable.
Because they provide access to shared resources, both the SS7-based connections
and CTI links typically terminate in a server rather than a specific application computer.
This allows multiple applications to influence calls flowing through a common telephone
domain, and provides greater flexibility regarding the computer systems on which these
applications can be installed.
Application Programming Interfaces
An application programming interface (API) is the mechanism through which application
software manipulates telephone resources. APIs are necessary for both the call control
and media processing functions.
Several existing non-telephony APIs have found a useful role in computer telephony,
particularly for controlling media processing functions.
For example, once a telephone call is established, the Microsoft Windows APIs used for
the manipulation of desktop multimedia objects (for example, the playing of sound files
through a local speaker) can be used to send and receive similar multimedia content
over the telephone connection. Because of their heritage, however, the resource
models used by these existing APIs turn out to be more suitable for local (non-shared)
resources than for remote or shared resources. New APIs and resource models are
needed to implement shared media processing resources on shared servers.
Several cross-vendor efforts have sprung up to address this need, including the
Multi-Vendor Integration Protocol (MVIP) effort and the Enterprise Computer Telephony Forum
(ECTF), each of which has activities relating to software architectures and APIs for
shared media processing resources.
Proprietary APIs for first-party call control were first developed by modem, voice board,
and fax board manufacturers to support their own products. The only API in this group
to achieve de facto standards status was the Hayes modem command set, which
included basic functions for dialing and hanging up telephone calls.
APIs for third-party call control did not have equivalents in traditional application
environments and had to be developed specifically to support computer telephony. The
first third-party APIs were developed by computer manufacturers to support applications
running on their own systems. For example, IBM introduced the CallPath API and
Digital Equipment introduced the Computer-Integrated Telephony (CIT) API in the late
1980s for use on their respective systems.
The industry took a major step forward in the 1990s with the introduction of two call
control APIs which were not linked to any individual computer manufacturer:
The Telephony Services API (TSAPI) developed by AT&T and Novell, and
The Telephony API (TAPI) developed by Microsoft.
These APIs, both strongly oriented towards the desktop personal computer and its
flourishing software industry, have made mass-market computer telephone applications
economically feasible for the first time.
APIs vs. Commercial Products
A programming interface is simply a specification; it is not a commercial product in its
own right. As straightforward as this may sound, the two concepts are often confused in
the marketplace.
An API is the meeting point for two commercial products:
An application which generates requests according to the API, and
A service provider which receives those requests and executes them in a certain
telephone environment.
Like the application, the service provider is software, typically taking the form of a
device driver which implements an interface to a particular type of telephone equipment.
The rapid commercial advance of computer telephony in the 1990s can be attributed to
the development and marketing of commercial service-provider software products which
implement the TAPI and/or TSAPI APIs.
Novell offers a commercial CTI server product, Netware Telephony Services, which
operates within the Novell Netware environment. It provides a TSAPI interface between
applications on remote client machines and telephone system driver modules provided
by third parties, thus creating an interface between those applications and telephone
switching systems.
Microsoft has taken a similar path with TAPI, building a capability into the Microsoft
Windows family of operating systems which provides a TAPI interface between
Windows-based client applications and third-party service provider driver modules.
In both cases, the driver modules for specific telephone systems must be built by third
parties, much as printer drivers must be supplied by printer manufacturers before their
printers can be used under the Microsoft Windows operating system. Each of these
products also supports a single API.
It is possible to build a CTI server which supports multiple APIs simultaneously,
mapping requests from all APIs into a single common function set. This is the approach
taken by the CTI server from Dialogic Corporation, CT Connect, which supports both
TAPI and TSAPI interfaces. The Dialogic software also differs from the Novell and
Microsoft products in that it includes built-in drivers for the ECMA CSTA link protocol
and several other proprietary CTI link protocols.
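The idea of mapping requests from more than one API into a single common function set can be sketched in Python as a lookup table placed in front of one link driver. This is an illustration only and says nothing about how any commercial CTI server is actually built: `lineMakeCall`, `lineAnswer`, `cstaMakeCall` and `cstaAnswerCall` appear purely as request labels, the keyword arguments are simplified placeholders and not the real parameter lists, and the `CtiServer` and `LinkDriver` classes and the common function names are hypothetical.

```python
# (API, request name) -> name of the common function it maps to.
COMMON_FUNCTION = {
    ("TAPI", "lineMakeCall"): "make_call",
    ("TAPI", "lineAnswer"): "answer_call",
    ("TSAPI", "cstaMakeCall"): "make_call",
    ("TSAPI", "cstaAnswerCall"): "answer_call",
}


class LinkDriver:
    """Hypothetical driver for one CTI link protocol; it records requests."""

    def __init__(self, protocol):
        self.protocol = protocol
        self.sent = []

    def execute(self, function, arguments):
        self.sent.append((function, arguments))


class CtiServer:
    """Accepts requests from several APIs and feeds one driver."""

    def __init__(self, driver):
        self.driver = driver

    def handle(self, api, request, **arguments):
        function = COMMON_FUNCTION.get((api, request))
        if function is None:
            raise ValueError(f"unsupported request: {api} {request}")
        self.driver.execute(function, arguments)
        return function


server = CtiServer(LinkDriver("CSTA"))
first = server.handle("TAPI", "lineMakeCall", origin="201", destination="5551234")
second = server.handle("TSAPI", "cstaMakeCall", origin="202", destination="5556789")
```

Both requests reach the driver as the same common function, so a single set of link drivers can serve applications written to either API.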
Non-Traditional Telephony
Computer telephone applications are not restricted to the traditional forms of telephone
systems based on switches, transmission circuits, and telephone instruments.
For example, the new isoEthernet technology provides telephony talk paths operating
across an enhanced Ethernet local area network physical plant. Such an environment is
capable of delivering standard telephone service, that is, a real-time voice path between
two or more endpoints. It just accomplishes this in a new way, without the necessity of
installing traditional telephone switches and wiring. And because of its inherently wider
bandwidth, telecommunications facilities such as isoEthernet can handle new kinds of
calls such as interactive video and images. All of these new capabilities stretch the
limits of today's definition of telephony and expand the potential meaning of call control
and media processing.
To accommodate these expanding definitions, new models and implementation
methods for computer telephony will have to be found. For example, with isoEthernet,
the switching points are highly distributed in a potentially complex topology. This
distributed connection model is significantly different from traditional telephony, and will
require new models and methods through which applications can exercise call control in
that environment. The goal should be to provide these new capabilities in a
forward-compatible manner from current computer-telephony architectural models.
Computer Telephony and Client-Server Computing

Appropriate application software architectures are essential to the effective use of
computer telephone technology. A difficult hurdle in the early adoption of
computer-telephone systems was the unfortunate requirement to modify business application
software in order to make use of the new telephone features. Because most application
software ran in a centralized mainframe or minicomputer, the central application
software had to be changed to implement a computer-telephone feature. Most
companies elected not to attempt such changes, and declined to implement
computer-telephone application features even though the business benefit was attractive.
For example, an insurance company might have wanted a certain screen of database
information to "pop up" for its service representatives as they answered customer
telephone calls. Computer telephone technology has long been capable of generating
the necessary telephone-based trigger event to accomplish this. But the insurance
company would probably have balked at the necessity of modifying its central customer
database application in order to achieve this feature. The risk and effort involved in
changing centralized mission-critical application software was simply too great.
However, as corporations shift away from a total dependency on mainframe-based
applications and towards client-server architectures, the integration of computer
telephone features becomes easier and less risky.
Client-server applications depend on intelligence at the desktop, and rely on pulling
information to the desktop rather than pushing data outwards from the mainframe to a
dumb terminal. With the client-server approach, computer-telephone application
features can be implemented at the desktop or in a department-level server rather than
in the mainframe system, an easier and less risky approach which makes computer
telephony accessible to a wider range of organizations.
For example, with the client-server approach, when a call arrives at a customer service
representative's desk a corresponding telephone event message can be sent to an
application running at that user's desktop. This event message, delivered through a
computer telephony API, can trigger a desktop application to retrieve the desired
information via whatever retrieval mechanism is appropriate -- including fetching the
data from a mainframe.
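The desktop side of this arrangement can be sketched in a few lines of Python. The event format, the `DesktopApp` class, the `fetch_from_mainframe` function and the sample record are all hypothetical; a real system would receive the event through a computer telephony API and would query a real database.

```python
# Stand-in for the central customer database, keyed by calling number.
MAINFRAME_RECORDS = {
    "5551234": {"name": "A. Customer", "policy": "P-1001"},
}


def fetch_from_mainframe(caller_id):
    """Hypothetical retrieval mechanism; returns None for unknown callers."""
    return MAINFRAME_RECORDS.get(caller_id)


class DesktopApp:
    """Desktop application that reacts to telephone event messages."""

    def __init__(self, fetch_record):
        self.fetch_record = fetch_record  # whatever retrieval method suits
        self.screen = None                # record currently displayed

    def on_call_event(self, event):
        if event["type"] == "call_arrived":
            self.screen = self.fetch_record(event["caller_id"])


app = DesktopApp(fetch_from_mainframe)
app.on_call_event({"type": "call_arrived", "caller_id": "5551234"})
```

The central application is untouched: the desktop pulls the record in response to the event, and the retrieval function can be replaced without changing the event handling.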
This retrieval logic can be built into the existing client-server desktop application, or
implemented as a new desktop application which interoperates with the existing one. In
the latter case, desktop application integration tools such as Microsoft's Dynamic Data
Exchange (DDE) and Object Linking and Embedding (OLE) can be used as an open,
standard inter-application communication mechanism, further simplifying the integration
effort with existing applications and eliminating the necessity to change them.
Computer Telephony: A Wealth of Options
Computer telephony today is characterized by a wealth of choices and options.
Computer-telephone applications can be small or large in scope, simple or complex in
operation. Any single feature can be implemented in a staggering number of ways, with
implementation choices on both the telephone and computing sides of the equation.
There is no right or wrong way to build a computer-telephone system.
With all of these choices, it is essential that the systems practitioner become
knowledgeable about both computing and telephony, and begin to learn ways in which
these two systems environments can be linked together.
Competence in both disciplines will become essential as the two technologies become
even more closely integrated. The current focus on linkage between discrete telephone
and computing systems is just a transition phase. Very soon, the distinction between
telephone switches and LAN servers will disappear, as hybrid telephony servers are
brought to market containing both switching and application-interface functions.
Computer telephony is at an important turning point: The necessary elements of the
technology have been developed; now we need to educate large numbers of insightful
practitioners who can put it to productive use.

Extracted from http://www.dialogic.com/company/whitepap/carlieee.htm
