
Fundamentals of Multimedia


About the Book

This handbook is designed for B.Sc. (IST) Honours students. It discusses the
fundamentals of multimedia. All topics are explained in a lucid manner and can be easily
understood by the students. This handbook can be very useful not only from the examination
point of view but also helpful for interview aspirants. A large number of carefully selected
worked examples and step-by-step procedures for solving problems are given to benefit the students.

About the Author

Er. Bimal Kumar Ray completed his Post Graduate Diploma in Computer
Programming and Applications (PGDCPA) from the All India Society for Electronics and
Computer Technology, Govt. of India; Master of Science in Information Technology (M.Sc. IT)
from Visva Bharati Central University, Santiniketan, West Bengal; and Master of Technology in
Computer Science (MTech CS) from Utkal University, Vani Vihar, BBSR, Odisha. He has 13
years of teaching experience in technical fields covering programming languages (C, Data
Structures, C++, Visual Basic, Microsoft Visual Studio), Java technologies (Core Java, JSP,
Servlet, JDBC, J2EE, EJB, etc.), web development technologies (PHP, HTML5, XML, CSS,
AJAX, JSON, jQuery, Bootstrap, Drupal, AngularJS, JavaScript, VBScript), mobile
development technologies (Android programming), databases (MS Access, SQL Server 2000,
Oracle 11g, MySQL, SQLite), MS Office 2010, and Tally ERP.

Thanking You,

Author
Er. Bimal Kumar Ray


Multimedia Objectives
Multimedia data representation
Multimedia data processing
Multimedia data compression
Multimedia data transmission
Multimedia mobile games
Multimedia data security
Multimedia human computer interaction

Introduction to Multimedia
What is Media? Media is something that can be used for the presentation of information. Two basic
ways to present information are:
1. Unimedia presentation: a single medium is used to present information.
2. Multimedia presentation: more than one medium is used to present information.
Multi: means many, much, multiple.
Medium: a means for the distribution and presentation of information; an intervening substance
through which something is transmitted or carried; a medium of mass communication such as a
newspaper, magazine, or television; a substance regarded as the means of transmission of a force
or effect; a channel or system of communication, information, or entertainment.

Multimedia Definitions: Multimedia is media and content that uses a combination of different
content forms. The term is used in contrast to media which only use traditional forms of printed or
hand-produced material. Multimedia includes a combination of text, audio, still images, animation,
video, and interactivity content forms. When a viewer of a multimedia presentation is allowed to
control what elements are delivered and when, it is interactive multimedia. Multimedia is linear,
when it is not interactive and the users just sit and watch as if it is a movie. Multimedia is
nonlinear, when the users are given the navigational control and can browse the contents at will.
Multimedia is usually recorded and played, displayed or accessed by information content
processing devices, such as computerized and electronic devices, but can also be part of a live
performance. Multimedia also describes electronic media devices used to store and experience
multimedia content. Multimedia is an inter-disciplinary subject because it involves a variety of
different theories and skills: these include computer technology, hardware and software; arts and
design, literature, presentation skills; and application domain knowledge. Multimedia authoring tools
are those programs that provide the capability for creating complete multimedia presentations by
linking together objects such as a paragraph of text, a song, an illustration, or an audio clip, with
appropriate interactive user control.

Need For Multimedia


Multimedia in information technology refers to the use of more than one of these media for
information presentation to users.
Common media for storage, access, and transmission of information are:
1. Text (alphanumeric characters)
2. Graphics (line drawings and images)
3. Animation (moving images)
4. Audio (sound)
5. Video (videographed real-life events)
Text Media
Alphanumeric characters are used to present information in text form. Computers are
widely used for text processing.
Keyboards, OCRs, computer screens, and printers are some commonly used hardware
devices for processing text media.
Text editing, text searching, hypertext, and text importing/exporting are some highly
desirable features of a multimedia computer system for better presentation and use of
text information.
Graphics Media
Computer graphics deals with generation, representation, manipulation, and display of
pictures (line drawings and images) with a computer.
Locating devices (such as a mouse, a joystick, or a stylus), digitizers, scanners, digital
cameras, computer screens with graphics display capability, laser printers, and plotters are
some common hardware devices for processing graphics media.
Some desirable features of a multimedia computer system are painting or drawing software,
screen capture software, clip art, graphics importing, and software support for high
resolution.

Animation Media
Computer animation deals with generation, sequencing, and display (at a specified rate) of
a set of images (called frames) to create an effect of visual change or motion, similar to a
movie film (video).
Animation is commonly used in those instances where videography is not possible or
animation can better illustrate the concept than video.
Animation deals with displaying a sequence of images at a reasonable speed to create an
impression of movement. For jerk-free, full-motion animation, 25 to 30 frames per second
are required.
Scanners, digital cameras, a video capture board interfaced to a video camera or VCR,
computer monitors with image display capability, and a graphics accelerator board are some
common hardware devices for processing animation media.
Some desirable features of a multimedia computer system with animation facility are
animation creation software, screen capture software, animation clips, animation file
importing, and software support for high resolution, recording and playback capabilities,
and transition effects.

Audio Media
Computer audio deals with synthesizing, recording, and playback of audio or sound with a
computer.
Sound board, microphone, speaker, MIDI devices, sound synthesizer, sound editor and
audio mixer are some commonly used hardware devices for processing audio media.
Some desirable features of a multimedia computer system are audio clips, audio file
importing, and software support for high quality sound, recording and playback
capabilities, text-to-speech conversion software, speech-to-text conversion software, and
voice recognition software.

Video Media
Computer video deals with recording and display of a sequence of images at a reasonable
speed to create an impression of movement. Each individual image of such a sequence is
called a frame.
Video camera, video monitor, video board, and video editor are some of the commonly
used hardware devices for processing video media.
Some desirable features of a multimedia computer system with video facility are video
clips and recording and playback capabilities.

Hypermedia VS Multimedia
A hypertext system: meant to be read nonlinearly, by following links that point to other parts of
the document, or to other documents.
Hypermedia: not constrained to be text-based, can include other media, e.g., graphics, images,
and especially the continuous media sound and video.
- The World Wide Web (WWW) is the best example of a hypermedia application.
Multimedia: means that computer information can be represented through audio, graphics,
images, video, and animation in addition to traditional media.

Linear VS Non-Linear Multimedia


Linear:
A Multimedia Project is identified as Linear when:
It is not interactive
Users have no control over the content that is being shown to them.
Example: A movie, a non-interactive lecture / demo show

Non-Linear:
A Multimedia Project is identified as Non-Linear when:
It is interactive
Users have control over the content that is being shown to them.
Users are given navigational control

Example: Games, Courseware and Interactive CD etc


Global Structure of Multimedia


A multimedia system is characterized by computer-controlled, integrated
production, manipulation, storage, and communication of independent information, which is
encoded at least through a continuous (time-dependent) and a discrete (time-independent) medium.

Figure: Global Structure (from top: Application Domain, System Domain, Device Domain)


Application Domain
The application domain provides functions to the user to develop and present multimedia
projects. This includes software tools and a multimedia project development methodology.
Many functions of document handling and other applications are accessible and presented
to the user through a user interface.
A document consists of a set of structured information, represented in different media and
generated or recorded at the time of presentation.

System Domain
The system domain includes all support for using the functions of the device domain, that is,
operating systems, communication systems (networking) and database systems.
The interface between the device domain and the system domain is specified by the computer
technology.
The database system allows structured access to data and management of large databases.
The operating system is the interface between the computer hardware/system software and all
other software components (network, memory, processor, input/output devices, etc.).
The communication system handles data transmission according to the timing and reliability
requirements of the networked multimedia application.

Device Domain
The device domain provides basic concepts and skills for processing various multimedia
elements and for handling physical devices.


Processing of digital audio, video, graphics, images, and animation data is based on digital
signal processing.
Related fields include graphics, visualization, computer vision, data compression, graph
theory, networking, database systems, and multimedia and hypermedia.

Cross Domain
Compositions must allow any type of logical structure besides those for
synchronization (presentation) purposes.
Multimedia involves multiple modalities of text, audio, images, drawings, animation, and
video.
Examples of how these modalities are put to use: Video teleconferencing, Distributed
lectures for higher education, Tele-medicine and Co-operative work environments etc.

Figure: Multimedia modalities (audio, text, graphics, animation, video)

Media and Data Streams


A data stream is any sequence of individual packets transmitted in a time-dependent fashion.
Packets can carry information of either continuous or discrete media. Media can be classified with
respect to different criteria: perception, representation, presentation, storage, transmission,
and information exchange medium.

Perception Medium
Perception media refers to the nature of information perceived by humans, which is not strictly
identical to the sense that is stimulated. For example, a still image and a movie convey information
of a different nature, though stimulating the same sense. The question to ask here is: How do
humans perceive information?
In this context, we distinguish primarily between what we see and what we hear. Auditory media
include music, sound, and voice. Visual media include text, graphics, and still and moving pictures.
This differentiation can be further refined. For example, a visual medium can consist of moving
pictures, animation, and text. In turn, moving pictures normally consist of a series of scenes that, in
turn, are composed of single pictures.

Representation Medium
The term representation media refers to how information is represented internally to the computer.
The encoding used is of essential importance. The question to ask here is: How is information
encoded in the computer? There are several options:
Each character of a piece of text is encoded in ASCII.
A picture is encoded by the CEPT or CAPTAIN standard, or the GKS graphics standard
can serve as a basis.
An audio data stream is available in simple PCM encoding with a linear quantization of 16
bits per sampling value.
A single image is encoded as Group-3 facsimile or in JPEG format.
A combined audio-video sequence is stored in the computer in various TV standards (e.g.,
PAL, SECAM, or NTSC), in the CCIR-601 standard, or in MPEG format.

Presentation Medium
The term presentation media refers to the physical means used by systems to reproduce
information for humans. For example, a TV set uses a cathode-ray tube and loudspeaker. The
question to ask here is: Which medium is used to output information from the computer or
input into the computer?
We distinguish primarily between output and input. Media such as paper, computer monitors, and
loudspeakers are output media, while keyboards, cameras, and microphones are input media.

Storage Medium
The term storage media is often used in computing to refer to various physical means for storing
computer data, such as magnetic tapes, magnetic disks, or digital optical disks. However, data
storage is not limited to the components available in a computer, which means that paper is also
considered a storage medium. The question to ask here is: Where is information stored?

Transmission Medium
The term transmission media refers to the physical means (cables of various types, radio towers,
satellites, or ether, the medium that carries radio waves) that allow the transmission of
telecommunication signals. The question to ask here is: Which medium is used to transmit data?
The information is transmitted over networks, which use wire and cable transmission such as
coaxial cable and optical fiber, as well as free-air-space transmission.

Information Exchange Medium


Information exchange media include all data media used to transport information, that is, all storage
and transmission media. The question to ask here is: Which data medium is used to exchange
information between different locations?
For example, information can be exchanged by storing it on a removable medium and transporting
the medium from one location to another. These storage media include microfilms, paper, and
floppy disks. Information can also be exchanged directly, if transmission media such as coaxial
cables, optical fibers, or radio waves are used.

Multimedia Hardware Products


Input Devices: (Keyboards, Mice, Digital Cameras, MIDI Keyboards, Touch Screens, Trackballs,
Scanners, Voice Recognition Systems, Magnetic Card Encoders and Readers, Tablets etc.)

Output Devices: (Monitors, Speakers, Printer, Projector, Video Devices)


Storage Devices: (CD-ROM Drives, Pen drives, Magneto-optical drives); Communication
Devices: (Modems)

Multimedia Software Products


Painting and Drawing Software
Painting and drawing software are the most important items in your toolkit because the impact of
the graphics in your project will likely have the greatest influence on the end user. Painting
software is dedicated to producing excellent bitmapped images. Drawing software is dedicated to
producing line art that is easily printed to paper. Drawing packages include powerful and
expensive computer-aided design (CAD) software. Ex: Desk Draw, Desk Paint, Designer etc

3-D Drawing Software


CAD (computer-aided design) is software used by architects, engineers, drafters, artists and others
to create precision drawings or technical illustrations. It can be used to create two-dimensional
(2-D) drawings or three-dimensional (3-D) models. The CAD images can spin about in space, with lighting
conditions exactly simulated and shadows properly drawn. With CAD software you can stand in
front of your work and view it from any angle, making judgments about its design. Ex: AutoCAD,
3D Max etc

Image Editing Software


Image editing applications are specialized and powerful tools for enhancing and retouching
existing bitmapped images. These programs are also indispensable for rendering images used in
multimedia presentations. Modern versions of these programs also provide many of the features
and tools of painting and drawing programs, and can be used to create images from scratch as well
as images digitized from scanners, digital cameras or artwork files created by painting or drawing
packages.
Ex: Adobe Photoshop, CorelDRAW etc

Optical Character Recognition (OCR) software


Often you will have printed matter and other text to incorporate into your project, but no electronic
text file. With Optical Character Recognition software, a flat-bed scanner and your computer you
can save many hours of typing printed words and get the job done faster and more accurately. Ex:
Perceive

Sound Editing Software


Sound editing tools for both digitized and MIDI sound let you see music as well as hear it. By
drawing a representation of the sound as a waveform, you can cut, copy, paste, and edit segments
of the sound with great precision and make your own sound effects. Using editing tools to make
your own MIDI files requires knowing about keys, notation, and instruments, and you will need a
MIDI synthesizer or device connected to the computer. Ex: Sound Edit Pro, Sony Sound Forge etc

Animations and Digital Movies Software


Animations and digital movies are sequences of bitmapped graphic scenes (frames), rapidly played
back. But animations can also be made within an authoring system by rapidly changing the
location of objects to generate an appearance of motion.


Movie-making tools let you edit and assemble video clips captured from cameras, animations,
scanned images, and other digitized movie segments. The completed clip, often with added transitions
and visual effects, can be played back.
Ex: Animator Pro, Macromedia Director, Flash MX, Adobe Premiere Pro and Super Video
Windows etc

Multimedia in Education
Interactive Multimedia in Education and Training emerges out of the need to share information and
knowledge on the research and practices of using multimedia in various educational settings. Roles
and application of multimedia in different education and training contexts are highlighted, as are
case studies of multimedia development and use, including areas such as language learning,
cartography, engineering education, health sciences, and others. Multimedia can stimulate more
than one sense at a time, and in doing so, may be more attention-getting and attention-holding.
There is substantial research supporting the effectiveness of Information Technology-Assisted
Project-Based Learning (IT-assisted PBL). When IT-assisted PBL is used in a constructivist,
cooperative learning environment, students learn more and retain their knowledge better.
Moreover, students learn the content area being studied, how to design and carry out a project, and
uses of IT.
Provide students with opportunities to represent and express their prior knowledge.
Allow students to function as designers, using tools for analyzing the world, accessing and
interpreting information, organizing their personal knowledge, and representing what they
know to others.
Multimedia applications engage students and provide valuable learning opportunities.
Empower students to create and design rather than "absorbing representations created by
others."
Encourage deep reflective thinking.
Create personally meaningful learning opportunities.

Studies to Support Multimedia use in Education


Studies document distinctive differences in the ways students retain information gathered and
applied using multimedia versus traditional modes of instruction. They show that composition
representing ideas simultaneously through text, audio, video, and sound increases the likelihood
that students will acquire an understanding of complex information. It is a reasonable conjecture
that using an even wider range of media will extend this effect. The same study also noted that
students with a wide range of abilities "readily mastered these tools and were highly motivated by
the opportunity to augment their writing with other media." That is, this increased variety of
expression enhanced attitudes as well.

A primarily qualitative, observational investigation was conducted over a two-year period while
the students worked cooperatively to create interactive displays for a touch-sensitive multimedia
kiosk for the zoo.

Several categories emerged from the qualitative analysis of the data, which included extensive
videotapes, interviews, observations, and student-created materials. The students' strong
appreciation that they were preparing multimedia materials for a real audience emerged as the core
category in the analysis. Related findings were:
Students demonstrated great concern for accuracy in their displays,
Students quickly assumed the major responsibility for content and editing decisions despite
the fact that the original task of designing the displays had been structured for them by the
teacher.
Students accessed wide ranges of science materials and sources to find the content they
desired.
Their commitment to and enthusiasm for the project remained very high.

Educational Benefits of Multimedia Tools

Multimedia tools give students an opportunity to produce documents of their own, which provides
several educational advantages.
Students who experience the technical steps needed to produce effective multimedia
documents become better consumers of multimedia documents produced by others.
Students indicate they learn the material included in their presentation at a much greater
depth than in traditional writing projects.
Students work with the same information from four perspectives:
1) Researchers: they must locate and select the information needed to understand the chosen
topic.
2) Authors: they must consider their intended audience and decide what amount of
information is needed to give their readers an understanding of the topic.
3) Designers: they must select the appropriate media to share the concepts selected.
4) Writers: they must find a way to fit the information to the container, including the
manner of linking the information for others to retrieve.
All of these contribute to student learning and help to explain the improved student learning that is
often associated with IT-assisted PBL.

There is another aspect to developing multimedia documents that empowers students. Students
quickly recognize that their electronic documents can be easily shared. Because of this, students
place a greater value on producing a product that is of a high standard. An audience of one (the
teacher) is less demanding than an audience of many, particularly one's peers. Students quickly
recognize that publishing a multimedia document that communicates effectively requires attention
to both the content and the design of the document.

Project Management Skills


Creating a timeline for the completion of the project.
Allocating resources and time to different parts of the project.
Assigning roles to team members.

Research Skills
Determining the nature of the problem and how research should be organized.
Posing thoughtful questions about structure, models, cases, values, and roles.
Searching for information using text, electronic, and pictorial information sources.
Developing new information with interviews, questionnaires and other survey methods.
Analyzing and interpreting all the information collected to identify and interpret patterns.

Organization and Representation Skills


Deciding how to segment and sequence information to make it understandable.
Deciding how information will be represented (text, pictures, movies, audio, etc.).
Deciding how the information will be organized (hierarchy, sequence) and how it will be
linked.

Presentation Skills
Mapping the design onto the presentation and implementing the ideas in multimedia.
Attracting and maintaining the interests of the intended audiences.


Reflection Skills
Evaluating the program and the process used to create it.
Revising the design of the program using feedback.

Introduction of Sound
Sound is actually vibration. You can create your own sound vibrations by using your vocal
cords. Feel your throat and hum.
Sound is a mechanical wave that is an oscillation of pressure transmitted through a solid,
liquid, or gas, composed of frequencies within the range of hearing.
To create sound, your computer feeds electricity at a certain wave length through a
stretched membrane (the speaker), causing it to vibrate.
Sound is a wave, similar to the ripples on a pond or the ocean waves you might see
crashing on a beach, except that it is a pressure wave in the air rather than a wave on the
ocean surface.
Sound is a form of energy, just like electricity and light. Sound is made when air molecules
vibrate and move in a pattern called a sound wave.
A sound wave is the pattern of disturbance caused by the movement of energy traveling
through a medium such as air, water, or any other liquid or solid matter.
Light and sound are both waves. However, the former can travel through a vacuum while
the latter cannot.
Sound methodology and audio techniques engage in processing these sound waves.
Important aspects of sound are coding, storage on recorders or digital audio tape, and music
and speech processing.
Multimedia applications use audio in the form of music and speech: music and its MIDI
standard, as well as speech synthesis, speech recognition, and speech transmission.
Audio and video signals are compressed, and similar methods are used for
compressing the data of different media.

Basic Sound Concept


Sound is a physical phenomenon caused by the vibration of material, such as a violin string or a wood
log. This type of vibration triggers pressure wave fluctuations in the air around the material. The
pressure waves propagate in the air. The pattern of this oscillation is called a waveform. We hear a
sound when such a wave reaches our ears.

Figure: Pressure wave oscillation in the air

This waveform occurs repeatedly at regular intervals or periods. Sound waves have a natural
origin, so they are never absolutely uniform or periodic. A sound that has a recognizable
periodicity is referred to as music rather than noise, which does not have this behavior. Examples
of periodic sounds are sounds generated by musical instruments, vocal sounds, wind sounds, or a
bird song. Non-periodic sounds are, for example, drums, coughing, sneezing, or the brawl or
murmur of water.

Frequency
The frequency of a sound is the reciprocal value of its period; that is, the frequency represents the
number of periods per second and is measured in hertz (Hz) or cycles per second (cps). A common
abbreviation is kilohertz (kHz), which describes 1,000 oscillations per second, corresponding to
1,000 Hz; a worked example follows the list below. Sound processes that occur in liquids, gases,
and solids are classified by frequency range:
Infrasonic: 0 to 20 Hz
Audiosonic: 20 Hz to 20 kHz
Ultrasonic: 20 kHz to 1 GHz
Hypersonic: 1 GHz to 10 THz
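
As a worked illustration of the reciprocal relationship between frequency and period (the 1 ms period here is an assumed example, not a value from the text):

\[
f = \frac{1}{T}, \qquad T = 1\,\text{ms} \;\Rightarrow\; f = \frac{1}{0.001\,\text{s}} = 1{,}000\,\text{Hz} = 1\,\text{kHz}
\]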

Sound in the audiosonic frequency range is primarily important for multimedia systems. In this
text, we use audio as a representative medium for all acoustic signals in this frequency range. The
waves in the audiosonic frequency range are also called acoustic signals. Speech is the signal
humans generate by use of their speech organs. These signals can be reproduced by machines. For
example, music signals have frequencies in the 20Hz to 20 kHz range. We could add noise to
speech and music as another type of audio signal. Noise is defined as a sound event without
functional purpose, but this is not a dogmatic definition. For instance, we could add unintelligible
language to our definition of noise.

Amplitude
A sound has a property called amplitude, which humans perceive subjectively as loudness or
volume. The amplitude of a sound is a measure of the deviation of the pressure wave from its
mean value (idle state).

Computer Representation of Sound


Before the continuous curve of a sound wave can be represented on a computer, the computer has
to measure the wave amplitude in regular time intervals. It then takes the result and generates a
sequence of sampling values, or samples for short.

Figure: Sampling a wave: (a) waveform, (b) sampled waveform, (c) three-bit quantization
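
The figure's stages can be imitated in a few lines of code. The following is a minimal sketch (in Python, assuming NumPy is available; the 440 Hz tone and 8 kHz rate are illustrative choices, not values from the text) of sampling a waveform and quantizing it to three bits:

```python
import numpy as np

SAMPLE_RATE = 8000   # samples per second (Hz); illustrative choice
FREQ = 440.0         # tone frequency (Hz); illustrative choice
BITS = 3             # bits per sample -> 2**3 = 8 quantization levels

# (a)/(b) sample the waveform at regular time intervals
t = np.arange(0, 0.01, 1.0 / SAMPLE_RATE)   # 10 ms of sample instants
samples = np.sin(2 * np.pi * FREQ * t)      # amplitude at each instant

# (c) quantize: map each amplitude in [-1, 1] to one of 8 discrete levels
levels = 2 ** BITS
indices = np.round((samples + 1) / 2 * (levels - 1)).astype(int)  # 0..7
quantized = indices / (levels - 1) * 2 - 1   # back to the [-1, 1] range

print(indices[:10])   # the first ten 3-bit sample values
```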


The mechanism that converts an audio signal into a sequence of digital samples is called an
analog-to-digital converter (ADC), and a digital-to-analog converter (DAC) is used to achieve the
opposite conversion.

To convert sounds between our analog world and the digital world of the computer, we use a device
called an analog-to-digital converter (ADC). A digital-to-analog converter (DAC) is used to convert
these numbers back to sound (or to make the numbers usable by an analog device, like a loudspeaker).
An ADC takes smooth functions (of the kind found in the physical world) and returns a list of
discrete values. A DAC takes a list of discrete values (like the kind found in the computer world)
and returns a smooth, continuous function, or more accurately the ability to create such a function
from the computer memory or storage medium.

Figure: A pictorial description of the recording and playback of sounds through an ADC/DAC.

Analog and Digital Waveform Representations


Now let's take a look at two time-domain representations of a sound wave, one analog and one
digital, in the figure below.

An analog waveform and its digital cousin: the analog waveform has smooth and continuous
changes, while the digital version of the same waveform has a stair-step look. The black squares are
the actual samples taken by the computer. The grey lines suggest the "stair casing" that is an
inevitable result of converting an analog signal to digital form. Note that the grey lines are only for
show; all that the computer knows about are the discrete points marked by the black squares.
There is nothing in between those points.

Sampling Rate
The rate at which a continuous waveform is sampled is called the sampling rate. Like frequency,
the sampling rate is measured in Hz. For example, CDs are sampled at a rate of 44,100 Hz, which
may appear to be above the frequency range perceived by humans. However, by the Nyquist
sampling theorem, the bandwidth that a digitally sampled audio signal can represent is only about
half the sampling rate, while the bandwidth of human hearing is 20,000 Hz - 20 Hz = 19,980 Hz.
This means that a sampling rate of 44,100 Hz covers only frequencies in the range from 0 Hz to
22,050 Hz, a limit very close to the human hearing capability.
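
The arithmetic above is an instance of the Nyquist criterion, restated here with the figures from the text:

\[
f_s \ge 2 f_{\max} \quad\Longrightarrow\quad f_{\max} = \frac{f_s}{2} = \frac{44{,}100\,\text{Hz}}{2} = 22{,}050\,\text{Hz}
\]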

Quantization
The digitization process requires two steps. First, the analog signal must be sampled. This means
that only a discrete set of values is retained at (generally regular) time or space intervals. The
second step involves quantization. The quantization process consists of converting a sampled
signal into a signal that can take only a limited number of values. An 8-bit quantization provides
256 possible values, while a 16-bit quantization in CD quality results in 65,536 possible
values.

Audio File Format and Codec


Choosing digital audio formats for playing sound and music files
Hundreds of file formats exist for recording and playing digital sound and music files. While many
of these file formats are software dependent (for example, a Creative Labs Music File is a .cmf),
there are several well-known and widely supported file formats. While different operating systems
have different popular music file formats, we'll mainly focus on those that are most commonly used
on Windows-based PCs. Many different digital audio formats and different software are used to
create, store and manipulate these files, and there is also a wide range of devices and products
available that support multiple formats. Should you not have the correct device for playing a
particular file, you can also look for software conversion tools that will convert one file type to
another. Because some audio formats are open standards and some are proprietary, chances are
we'll be seeing a wide variety of digital audio formats for some time to come. An audio file format
and an audio codec (compressor/decompressor) are two very different things. Audio codecs are the
libraries that are executed in multimedia players. The audio codec is actually a computer program
that compresses or decompresses digital audio data according to the audio file format
specifications. For example, the WAV audio file format is usually coded in the PCM format, as are
the popular Macintosh AIFF audio files.

Audio Format Categories:


1. Uncompressed Audio Formats: (referred to as PCM formats) are, just as the name
suggests, formats that use no compression. This means all the data is available, at the cost of
large file sizes. A WAV audio file is an example of an uncompressed audio file.


2. Lossless Compression Audio Formats: apply compression to an uncompressed audio
file, but without losing information or degrading the quality of the digital audio file. The
WMA Lossless audio file format uses lossless compression.
3. Lossy Compression Audio Formats: will result in some loss of data, as the compression
algorithm eliminates redundant or unnecessary information; basically, it tosses what it sees
as irrelevant information. Lossy compression has become popular online because of its
small file sizes, which are easier to transmit over the Internet. MP3 and Real Audio files use
lossy compression.

Common Windows-Compatible Audio/Sound Formats


1. MP3 (.mp3)
MP3 is the name of the file extension and also the name of the type of file for MPEG, audio
layer 3. Layer 3 is one of three coding schemes (layer 1, layer 2, and layer 3) for the
compression of audio signals. Layer 3 uses perceptual audio coding and psychoacoustic
compression to remove all superfluous information (more specifically, the redundant and
irrelevant parts of a sound signal; the stuff the human ear doesn't hear anyway). It also
adds an MDCT (Modified Discrete Cosine Transform) that implements a filter bank,
increasing the frequency resolution to 18 times that of layer 2. The result in real
terms is that layer 3 shrinks the original sound data from a CD (with a bit rate of 1,411.2
kilobits per second of stereo music) by a factor of 12 (down to 112-128 kbps) without
sacrificing sound quality.
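
As a quick check of the factor-of-12 claim against the bit rates quoted above:

\[
\frac{1{,}411.2\ \text{kbps}}{12} = 117.6\ \text{kbps}
\]

which falls within the quoted range of 112 to 128 kbps.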

2. WMA - Windows Media Audio (.wma)


Windows Media Audio is a Microsoft file format for encoding digital audio files, similar to
MP3, though it can compress files at a higher rate than MP3. WMA files, which use the
".wma" file extension, can be compressed to match many different connection
speeds, or bandwidths.

3. WAV (.wav)
WAV is the format used for storing sound in files, developed jointly by Microsoft and IBM.
Support for WAV files was built into Windows 95, making it the de facto standard for
sound on PCs. WAV sound files end with a .wav extension and can be played by nearly all
Windows applications that support sound.
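
Because WAV is an uncompressed PCM container, its parameters are easy to inspect programmatically. Here is a minimal sketch using Python's standard-library wave module; the filename example.wav is hypothetical:

```python
import wave

# Open a (hypothetical) WAV file and report its PCM parameters.
with wave.open("example.wav", "rb") as wav:
    print("channels:     ", wav.getnchannels())   # e.g. 2 for stereo
    print("sample width: ", wav.getsampwidth())   # bytes per sample; 2 = 16-bit
    print("sample rate:  ", wav.getframerate())   # e.g. 44100 Hz
    print("frames:       ", wav.getnframes())
    duration = wav.getnframes() / wav.getframerate()
    print("duration (s): ", round(duration, 2))
```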

4. Real Audio (.ra .ram .rm)


Real Audio is a proprietary format, and is used for streaming audio that enables you to play
digital audio files in real-time. To use this type of file you must have RealPlayer (for
Windows or Mac), which you can download for free. Real Audio was developed by
RealNetworks.

5. MIDI - Musical Instrument Digital Interface (.mid)


MIDI is a standard adopted by the electronic music industry for controlling devices, such as
synthesizers and sound cards, that emit music. At minimum, a MIDI representation of a
sound includes values for the note's pitch, length, and volume. It can also include additional
characteristics, such as attack and delay time.
6. RIFF: Resource Interchange File Format, a Microsoft-developed format capable of
handling digital audio and MIDI.

7. SDMI (Secure Digital Music Initiative)
Standard released July 13, 1999. Designed to protect against most forms of unauthorised
copying.

8. SND (sound): Used by Apple Macintosh (with interpreters for the PC available) limited to
8 bits.

9. AIFF: Originally developed by Apple, the "Audio Interchange File Format" is mostly used
by Silicon Graphics and Macintosh machines. AIFF files are easily converted to other file
formats, but can be quite large. One minute of 16-bit stereo audio sampled at 44.1 kHz
usually takes up about 10 megabytes. AIFF is often used in high-end applications where
storage space is not a consideration.
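
The 10-megabyte figure follows directly from the PCM parameters quoted:

\[
44{,}100\,\tfrac{\text{samples}}{\text{s}} \times 2\,\tfrac{\text{bytes}}{\text{sample}} \times 2\ \text{channels} \times 60\,\text{s} = 10{,}584{,}000\ \text{bytes} \approx 10\ \text{MB}
\]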

10. RealAudio: Developed by Progressive Networks, RealAudio was the first format to allow
for real time streaming of music and sound over the web. Listeners are required to
download the Real player to enjoy sound in RealAudio Format. The Real player can also
stream video and is currently in use by millions of Internet users worldwide.

11. Ogg (.ogg)


Ogg is an audio compression format, comparable to other formats used to store and play
digital music, but differs in that it is free, open and unpatented. It uses Vorbis, a specific
audio compression scheme that's designed to be contained in Ogg.

12. MOV (movie): Used by Apple Macintosh (with interpreters for the PC available), basically
a video format where the pictures are omitted.
13. Dolby Digital Surround Sound: Also known as AC3 (Audio Coding), or Dolby 5.1
(where .1 indicates subwoofer bass channel). Dolby Digital has been chosen as the standard
sound technology for DVD (digital video disk) and HDTV (High definition TV).

14. Dolby Digital Surround Sound Digital Track on Film: It is a digitally encoded system of 6
separate and independent surround sound channels, for 6 speakers (front left/right, rear
left/right, front center, and subwoofer). Since it is digital, there is no noise, and a
sound engineer can place sounds exactly in each channel.

Music
Any sound can be represented as a digitized sound signal, that is, a sequence of samples, each
encoded with binary digits. This sequence may be uncompressed, as on audio compact discs, or
compressed. Any sound may be represented in that way, including music. A
characteristic of this representation mode is that it does not preserve a semantic
description of the sound. Unless complex recognition techniques are used, the computer does not
know whether a bit sequence represents speech or music, for example, and, if music, what notes
are used and by which instrument.

Music can be described in a symbolic way. On paper, we have the full scores. Computers and
electronic musical instruments use a similar technique, and most of them employ the Musical
Instrument Digital Interface (MIDI), a standard developed in the early 1980s. The MIDI standard
defines how to code all the elements of musical scores, such as sequences of notes, timing
conditions, and the instrument to play each note.


Introduction to Musical Instrument Digital Interface (MIDI)


The MIDI interface between electronic musical instruments and computers is a small piece of
equipment that plugs directly into the computer's serial port and allows the transmission of music
signals. MIDI represents a set of specifications used in instrument development so that instruments
from different manufacturers can easily exchange musical information. The MIDI protocol is an
entire music description language in binary form. Each word describing an action of a musical
performance is assigned a specific binary code. MIDI data is communicated digitally through a
production system as a string of MIDI messages. MIDI is a standard control language and
hardware specification that allows suitably equipped electronic musical instruments and devices to
communicate real-time and non-real-time performance and control data. A MIDI interface is
composed of two different components:
1. Hardware to Connect the Equipment: MIDI hardware specifies the physical connection
of musical instruments. It adds a MIDI port to an instrument, specifies the MIDI cable that
connects two instruments, and processes electrical signals received over the cable.
2. Data Format: encodes the information to be processed by the hardware. The MIDI data
format does not include the encoding of individual sampling values, as audio data
formats do. Instead, MIDI uses a specific data format for each instrument, describing things
like the start and end of scores, the basic frequency, and loudness, in addition to the
instrument itself.

MIDI Devices
An instrument that complies with both components defined by the MIDI standard is a MIDI device
(e.g., a synthesizer), able to communicate with other MIDI devices over channels. The MIDI
standard specifies 16 channels. A synthesizer is an electronic musical instrument, typically operated
by a keyboard, producing sounds by generating and combining signals of different frequencies. A
MIDI device is mapped onto a channel. Musical data transmitted over a channel are reproduced in
the synthesizer at the receiving end. The MIDI standard identifies 128 instruments by means of
numbers, including noise effects (e.g., a phone ringing or an airplane take-off). For example, 0
specifies a piano, 12 a marimba, 40 a violin, and 73 a flute.

Some instruments enable a user to play one single score (e.g., a flute) exclusively, while other
instruments allow concurrent playing of scores (e.g., an organ). The maximum number of scores
that can be played concurrently is an important property of synthesizers. This number can vary
between 3 and 16 scores per channel.
A computer uses the MIDI interface to control instruments for playout. The computer can use the
same interface to receive, store, and process encoded musical data. In the MIDI environment, these
data are generated on a keyboard and played out by a synthesizer, the heart of each MIDI system. A
typical synthesizer looks like a regular piano keyboard, but it has additional operating
elements. A sequencer is used to buffer or modify these data. In a multimedia application, the
sequencer resides in the computer.
1. Microprocessor
The microprocessor communicates with the keyboard to know what notes the musician
is playing and with the control panel to know what commands the musician wants to
send to the microprocessor.
Pressing keys on the keyboard signals the microprocessor what notes to play and how
long to play them.


2. Sound Generator
The sound generator produces the audio signal.
The sound generator changes the qualities of the sound, for example pitch, loudness,
note, and tone.
3. Sequencer:
replays a sequence of MIDI messages
4. MIDI Interface:
connects a group of MIDI devices together
5. Sound Sampler:
Records sound, then replays it on request; can perform transposition (a shift of one
base sample) to produce different pitches; can take the average of several samples
to produce a uniquely interpolated output sound.
6. Control Panel:
- Controls all the MIDI devices' functions.
7. Memory: stores all information for the sound format.

MIDI Messages
MIDI uses a specific data format for each instrument. The MIDI data format is digital, and data are
grouped into messages. A message is transmitted to every device connected to the computer. When a
musician plays a key, the MIDI interface generates a MIDI message that defines the start of the note
and its intensity; when the musician releases the key, a corresponding message is transmitted.
Messages are assigned to channels. A channel is a separate path through which signals can flow,
and devices are set to respond to particular channels. Every message (except system messages)
carries a channel number, which is stored in bits 0..3 of the status byte.

1. MIDI Channel Messages


Whenever a MIDI device is instructed to respond to a specific channel number, it will
ignore any message not directed to that channel.
On the other hand, if a message is transmitted on that channel, the device will
respond to the message (within the device's capability limits).
The 7 bits (not including the MSB) of the first data byte code the note number that should be
turned on; here, it is 64.
The 7 bits of the second data byte indicate the attack velocity (volume level of the note);
here, it is 90 (see the byte-level sketch after this list).
Channel messages have 4 modes:
Mode 1: Omni On + Poly, usually for testing devices
Mode 2: Omni On + Mono, has little purpose
Mode 3: Omni Off + Poly, for general purpose
Mode 4: Omni Off + Mono, for general purpose
where:
i. Omni On/Off:
respond to all messages regardless of their channel
ii. Poly/Mono:
respond to multiple/single notes per channel
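
A minimal sketch (plain Python, no MIDI library) of how the bytes described above fit together in a Note On message. The note number 64 and velocity 90 are the values used in the text; the channel number is an assumed example:

```python
# Build a MIDI "Note On" channel voice message.
# Status byte: upper nibble 0x9 = Note On; lower nibble (bits 0..3) = channel.
def note_on(channel: int, note: int, velocity: int) -> bytes:
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    status = 0x90 | channel           # e.g. channel 0 -> status byte 0x90
    return bytes([status, note, velocity])

msg = note_on(channel=0, note=64, velocity=90)
print(msg.hex(" "))                   # -> "90 40 5a"
```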
2. Channel Voice Messages
Channel voice messages are used to transmit real-time performance data throughout a
connected MIDI system.
A Note On message indicates the beginning of a MIDI note.
The message consists of three bytes of information: MIDI channel number, MIDI note
number, and attack velocity value.
The final byte indicates the velocity at which the key was pressed.
The Control Change message transmits information that relates to real-time control over the
performance parameters of a MIDI instrument
Control change messages correspond to changes in controllers such as foot pedals, relative
balance of a stereo sound field, etc.
Pitch Bend Change messages are transmitted by an instrument whenever its pitch bend
wheel is moved either in the positive (raise pitch) or negative (lower pitch) position from its
central (no pitch bend) point
The Program Change message changes the program or preset number that is active in a
device or instrument
1. Up to 128 presets can be selected by using this message
2. This can be used, for example, to switch between the different sounds of a
synthesizer or to change the rhythm patterns of a drum machine.

3. System Message
Real-time System Messages
Start
1st byte: Status byte 11111010
Direct slave devices to start playback from time 0
Stop
1st byte: Status byte 11111100
direct slave devices to stop playback
song position value doesn't change
playback can be restored where it stopped with the Continue message
Continue
1st byte: Status byte 11111011
direct slave devices to start playback from the present song position value
System Reset
1st byte: Status byte 11111111
devices will return the control values to their default settings,
e.g. reset MIDI mode / program number assigned to patch

System Exclusive messages


The MIDI specification can't address every unique need of each MIDI device,
so it leaves room for device-specific data.
SysEx messages are unique to a specific manufacturer.
1st byte: Status byte 11110000
2nd byte: manufacturer ID,
e.g. 1 = Sequential, 67 = Yamaha
3rd byte (onwards): data byte(s)
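
A short sketch of assembling a SysEx message from the bytes listed above. The data payload here is hypothetical, and the terminating EOX byte (0xF7), which the MIDI standard uses to close a SysEx message, is an addition not shown in the list:

```python
# Assemble a System Exclusive (SysEx) message.
SYSEX_START = 0xF0   # status byte 11110000
SYSEX_END   = 0xF7   # EOX (End of Exclusive), per the MIDI standard

def sysex(manufacturer_id: int, data: bytes) -> bytes:
    assert all(b < 0x80 for b in data)   # data bytes keep the MSB clear
    return bytes([SYSEX_START, manufacturer_id]) + data + bytes([SYSEX_END])

msg = sysex(0x43, bytes([0x10, 0x4C, 0x00]))   # 0x43 = 67 = Yamaha; payload hypothetical
print(msg.hex(" "))
```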


MIDI Software
Music Recording and Performance Applications
Recording of MIDI messages as they enter the computer from other MIDI devices;
storing, editing, and playing back the messages in performance.
Recording software:
Ex: Sony Sound Forge, Sonar, Cool Edit Pro etc
Much more efficient than tape recording: you can redo the recording process, can
easily edit, and can also apply effects (reverb, echo, chorus etc)
Musical Notation and Printing Applications
Writing music in traditional musical notation; the user can then play back the music
using a performance program or print the music on paper for live performance or
publication.
Music Education Applications
Synthesizer Patch Editors and Librarians
- Storage of different synthesizer patches in the computer memory and
editing of patches in the computer.

MIDI Application
1. Studio Production
recording, playback, cut-and-splice editing
creative control/effects can be added
2. Making Scores
with score-editing software, MIDI is excellent for making scores
some MIDI software provides auto-accompaniment and intelligent chord
arrangement.
3. Learning
You can write for a MIDI orchestra, which is always eager to practice with you
4. Commercial products
Mobile phone ring tones, music box music, etc.
5. Musical Analysis
MIDI has detailed parameters for every input note
It is useful for doing research
For example, a pianist can input his performance with a MIDI keyboard, and then
we can analyze his performance style from the parameters

Speech
Speech can be processed by humans or machines; it is the dominant form of
communication between human beings. The field of study of the handling of digitized speech is
called digital speech processing. Speech is based on spoken languages, which means that it has
semantic content.
Human beings use their speech organs without the need to knowingly control the generation of
sounds. Speech understanding means efficient adaptation to speakers and their speaking habits.
Despite the large number of different dialects and emotional pronunciations, we can understand
each other. The brain is capable of achieving a very good separation between speech and
interference, using the signals received by both ears. It is much more difficult for humans to filter
signals received in one ear only. The brain corrects speech recognition errors because it
understands the content, the grammar rules, and the phonetic (writing-system) and
lexical (vocabulary) word forms. Speech signals have two important characteristics that can be
used by speech processing applications:
used by speech processing applications:
1. Voiced speech signals (in contrast to unvoiced sounds) have an almost periodic structure
over a certain time interval, so that these signals remain quasi-stationary for about 30
milliseconds.
2. The spectrum of some sounds has characteristic maxima that normally involve up to five
frequencies. These frequency maxima, generated when speaking, are called formants. By
definition, a formant is a characteristic component of the quality of an utterance.

Speech Generation/Recognition
Speech recognition is a very interesting field for multimedia systems. In combination with speech
synthesis, it enables us to implement media transformations. The primary quality characteristic of
each speech recognition session is determined by the probability of recognizing a word correctly. A
word is always recognized only with a certain probability. Factors like environmental noise, room
acoustics, and the physical and psychical state of the speaker play an important role. A poor
recognition rate is p = 0.95, which corresponds to five percent wrongly recognized words. With a
sentence of only three words, the probability that the system will recognize all three words correctly
drops to p = 0.95 x 0.95 x 0.95 ≈ 0.86. This small example shows that a speech recognition
system should have a very high single-word recognition rate.
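
In general, with a per-word recognition probability p, the chance of recognizing an n-word sentence entirely correctly is:

\[
P_{\text{sentence}} = p^{\,n}, \qquad 0.95^{3} \approx 0.857 \approx 0.86
\]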
1. Pitch shifting of separate words
You can supply a pitch offset for a word. Just place the offset amount (in semitones) after the word
as a number, enclosed in parentheses: "Semitone up(1), 2 semitones down(-2)". If in this sample
the base pitch is, for example, F#2, the word "up" will have pitch G2 (one semitone higher), the
word "down" will have pitch E2 (two semitones lower).

2. Separating words
To get more naturally sounding sentences, you may try replacing the spaces " " with underscores "_".
So the sentence "This is example sentence" turns into "This_is_example_sentence". Note that
underscored sentences are recognized as a single word by Fruity Slicer channels and are not sliced
properly.

3. Speech Output
Speech output deals with the machine generation of speech. Considerable work has been achieved
in this field. A major challenge in speech output is how to generate these signals in real time for a
speech output system to be able, for instance, to convert text to speech automatically. Some
applications (e.g., time announcements) handle this task with a limited vocabulary, but most use an
extensive if not unlimited vocabulary. The speech a machine outputs has to be understandable and
should sound natural. In fact, understandability is compulsory and naturalness a nice thing to have
to increase user acceptance. It is important to understand the most important technical terms used
in relation to speech output, including:


Speech basic frequency means the lowest periodic signal share in the speech signal. It
occurs in voiced sounds.
A phoneme is a member of the set of the smallest units of speech that serve to distinguish
one utterance from another in a language or dialect. It is the smallest distinguishing linguistic
unit but does not itself carry meaning.
Allophones specify variants of a phoneme as a function of its phonetic environment.
A morpheme is a meaningful linguistic unit whether in free form or bound form that
contains no smaller meaningful parts. For example, house is a morpheme, while housing is
not.
A voiced sound is generated by oscillations of the vocal cords. The characters M, W, and L
are examples. Voiced sounds depend strongly on the speaker.
Unvoiced sounds are generated with the vocal cords open, for example, F and S. These
sounds are relatively independent of the speaker.

Speech Analysis
The primary quality characteristic of each speech recognition session is determined by the
probability of recognizing a word correctly. A word is always recognized only with a certain
probability. Speech analysis can also serve to determine who is speaking, that is, to recognize a
speaker for identification and verification, much as a computer identifies and verifies a
fingerprint or a voice.

Speech Synthesis
Computers can translate an encoded description of a message into speech. This scheme is called
speech synthesis. A particular type of synthesis is text-to-speech conversion. Fair-quality text-to-
speech software has been commercially available for various computers and workstations, although
the speech produced in some lacks naturalness. Speech recognition is normally achieved by
drawing various comparisons. With the current technology, a speaker-dependent recognition of
approximately 25,000 words is possible. The problems in speech recognition affecting the
recognition quality include dialects, emotional pronunciations, and environmental noise. It will
probably take some time before the considerable performance discrepancy between the human
brain and a powerful computer will be bridged in order to improve speech recognition and speech
generation.

Speech Transmission
Speech transmission is a field relating to highly efficient encoding of speech signals to enable low-
rate data transmission, while minimizing noticeable quality losses. The following sections provide
a short introduction to some important principles that interest us at the moment in connection with
speech input and output. Speech processing and speech transmission technology are expanding
fields of active research. New challenges arise from the 'anywhere, anytime' paradigm of mobile
communications, the ubiquitous use of voice communication systems in noisy environments and
the convergence of communication networks toward Internet based transmission protocols, such as
Voice over IP. As a consequence, new speech coding, new enhancement and error concealment,


and new quality assessment methods are emerging. Advances in digital speech transmission
provide an up-to-date overview of the field, including topics such as speech coding in
heterogeneous communication networks, wideband coding, and the quality assessment of
wideband speech.

Introduction to Graphics and Images


Graphics and images are both non-textual information that can be displayed and printed. They may
appear on screens as well as on printers but cannot be displayed with devices only capable of
handling characters.
Graphics are normally created in a graphics application and internally represented as an
assemblage of objects such as lines, curves, or circles. Attributes such as style, width, and color
define the appearance of graphics. We say that the representation is aware of the semantic contents.
The objects graphics are composed of can be individually deleted, added, moved, or modified later.
In contrast, images can be from the real world or virtual and are not editable in the sense given
above. They ignore the semantic contents. They are described as spatial arrays of values. The
smallest addressable image element is called a pixel. The array, and thus the set of pixels, is called
a bitmap. Object-based editing is not possible, but image editing tools exist for enhancing and
retouching bitmap images. The drawback of bitmaps is that they need much more storage capacity
than graphics. Their advantage is that no processing is necessary before displaying them, unlike
graphics where the abstract definition must be processed first to produce a bitmap. Of course,
images captured from an analog signal, via scanners or video cameras, are represented as bitmaps,
unless semantic recognition takes place such as in optical character recognition.

Digital Image Representation


A virtual image is a point or system of points on one side of a mirror or lens which, if it existed,
would emit the system of rays that actually exists on the other side of the mirror or lens. One way
to describe an image using numbers is to declare its contents using the position and size of
geometric forms and shapes such as lines, curves, rectangles, and circles; such images are called
vector images. Bitmap or raster images, such as digital photographs, are the most common way
to represent natural images and other graphics that are rich in detail. Bitmap images are how
graphics are stored in the video memory of a computer. The term bitmap refers to how a given
pattern of bits in a pixel maps to a specific color.

The preceding description of an image can be seen as a cooking recipe for how to draw the
image: it contains geometric primitives like lines, curves, and circles, describing color as well as
the relative size, position, and shape of elements. When the image is prepared for display, it has to
be translated into a bitmap image; this process is called rasterization. A vector image is resolution
independent, which means that you can enlarge or shrink the image without affecting the output
quality. Vector images are the preferred way to represent fonts, logos, and many illustrations.

Even if two images have the same number of pixels, their quality may differ due to differences in
how the images were captured. More expensive digital cameras have larger digital sensors than
less expensive ones (larger sensors cost more), so if two cameras produce images with the same
number of pixels, the pixels from the larger sensor each represent a larger area and more
information is packed into each pixel. Sometimes we use the term image resolution to refer to the
size of the image in pixels. The size of the image (its resolution) depends on the number of pixels
per inch (along with the size in pixels) associated with the image. In a printer, where a number of
dots may be needed to represent a pixel, we use the term dots per inch. To be still more precise,
the image resolution depends on the device used to form the image.
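The relationship between pixel dimensions, pixels per inch, and physical size can be made concrete with a short calculation. This is a minimal sketch in Python; the pixel dimensions and the 300 ppi printer resolution are illustrative assumptions, not values from the text.

    # Print size from pixel dimensions and printer resolution (assumed values).
    width_px, height_px = 3000, 2000   # image size in pixels
    ppi = 300                          # printer resolution in pixels per inch
    print_width_in = width_px / ppi    # 10.0 inches
    print_height_in = height_px / ppi  # about 6.67 inches
    print(f"Print size: {print_width_in} x {print_height_in:.2f} inches")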

Image Format
JPG/JPEG (Joint Photographic Experts Group)
JPG is optimized for photographs and similar continuous tone images that contain many, many
colors. It can achieve astounding compression ratios even while maintaining very high image
quality. GIF compression is unkind to such images. JPG works by analyzing images and discarding
kinds of information that the eye is least likely to notice. It stores information as 24 bit color.
Important: the degree of compression of JPG is adjustable. At moderate compression levels of
photographic images, it is very difficult for the eye to discern any difference from the original,
even at extreme magnification. Compression factors of more than 20 are often quite acceptable.
Better graphics programs, such as Paint Shop Pro and Photoshop, allow you to view the image
quality and file size as a function of compression level, so that you can conveniently choose the
balance between quality and file size.
This is the format of choice for nearly all photographs on the web. You can achieve
excellent quality even at rather high compression settings. I also use JPG as the ultimate
format for all my digital photographs. If I edit a photo, I will use my software's proprietary
format until finished, and then save the result as a JPG.
Digital cameras save in a JPG format by default. Switching to TIFF or RAW improves
quality in principle, but the difference is difficult to see. Shooting in TIFF has two
disadvantages compared to JPG: fewer photos per memory card, and a longer wait between
photographs as the image transfers to the card. I rarely shoot in TIFF mode.
Never use JPG for line art. On images such as these with areas of uniform color with sharp
edges, JPG does a poor job. These are tasks for which GIF and PNG are well suited. See
JPG vs. GIF for web images.
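Since the degree of JPG compression is adjustable, as noted above, a short sketch may help. This example uses the Pillow library (an assumption, not something the text prescribes); the file names are placeholders.

    # Saving one image at two JPEG quality levels with Pillow.
    from PIL import Image

    im = Image.open("photo.png").convert("RGB")   # JPEG stores 24-bit color
    im.save("photo_q85.jpg", "JPEG", quality=85)  # high quality, moderate size
    im.save("photo_q40.jpg", "JPEG", quality=40)  # stronger compression, smaller file

Comparing the two output files shows the quality/size trade-off described above.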

PNG (Portable Network Graphics)


PNG is also a lossless storage format. However, in contrast with common TIFF usage, it looks for
patterns in the image that it can use to compress file size. The compression is exactly reversible, so
the image is recovered exactly.
PNG is of principal value in two applications:
I): If you have an image with large areas of exactly uniform color, but contains more than 256
colors, PNG is your choice. Its strategy is similar to that of GIF, but it supports 16 million
colors, not just 256. If you want to display a photograph exactly without loss on the web, PNG
is your choice. Later generation web browsers support PNG, and PNG is the only lossless
format that web browsers support.

II): PNG is superior to GIF. It produces smaller files and allows more colors. PNG also
supports partial transparency. Partial transparency can be used for many useful purposes, such
as fades and anti-aliasing of text. Unfortunately, Microsoft's Internet Explorer does not properly
support PNG transparency, so for now web authors must avoid using transparency in PNG
images.

GIF (Graphics Interchange Format)


GIF creates a table of up to 256 colors from a pool of 16 million. If the image has fewer than 256
colors, GIF can render the image exactly. When the image contains many colors, software that
creates the GIF uses any of several algorithms to approximate the colors in the image with the
limited palette of 256 colors available. Better algorithms search the image to find an optimum set
of 256 colors. Sometimes GIF uses the nearest color to represent each pixel, and sometimes it uses
"error diffusion" to adjust the color of nearby pixels to correct for the error in each pixel.
GIF achieves compression in two ways. First, it reduces the number of colors of color-rich
images, thereby reducing the number of bits needed per pixel, as just described. Second, it
replaces commonly occurring patterns (especially large areas of uniform color) with a short
abbreviation: instead of storing "white, white, white, white, white," it stores "5 white."
Thus, GIF is "lossless" only for images with 256 colors or less. For a rich, true color image,
GIF may "lose" 99.998% of the colors.
If your image has fewer than 256 colors and contains large areas of uniform color, GIF is
your choice: the files will be small yet pixel-perfect.
Do NOT use GIF for photographic images, since it can contain only 256 colors per image.

TIFF (Tagged Image File Format)


This is usually the best quality output from a digital camera. Digital cameras often offer around
three JPG quality settings plus TIFF. Since JPG always means at least some loss of quality, TIFF
means better quality. However, the file size is huge compared to even the best JPG setting, and the
advantages may not be noticeable.
A more important use of TIFF is as the working storage format as you edit and manipulate
digital images. You do not want to go through several load, edit, save cycles with JPG
storage, as the degradation accumulates with each new save. One or two JPG saves at high
quality may not be noticeable, but the tenth certainly will be. TIFF is lossless, so there is no
degradation associated with saving a TIFF file.
Do NOT use TIFF for web images. They produce big files and, more importantly, most
web browsers will not display TIFFs.

BMP (Bitmap Picture)


BMP is an uncompressed proprietary format invented by Microsoft. There is really no reason to
ever use this format.

PSD/ PSP (Photoshop Document/ Paint Shop Pro)


PSD and PSP are proprietary formats used by graphics programs. Photoshop's files have the PSD extension,
while Paint Shop Pro files use PSP. These are the preferred working formats as you edit images in
the software, because only the proprietary formats retain all the editing power of the programs.
These packages use layers, for example, to build complex images, and layer information may be
lost in the nonproprietary formats such as TIFF and JPG. However, be sure to save your end result
as a standard TIFF or JPG, or you may not be able to view it in a few years when your software has
changed.

Currently, GIF and JPG are the formats used for nearly all web images. PNG is supported by most
of the latest generation browsers. TIFF is not widely supported by web browsers, and should be


avoided for web use. PNG does everything GIF does, and better, so expect to see PNG replace GIF
in the future. PNG will not replace JPG, since JPG is capable of much greater compression of
photographic images, even when set for quite minimal loss of quality.

RAW (Raw Image Format)


RAW is an image output option available on some digital cameras. Though lossless, it is a factor of
three or four smaller than TIFF files of the same image. The disadvantage is that there is a different
RAW format for each manufacturer, and so you may have to use the manufacturer's software to
view the images. (Some graphics applications can read some manufacturer's RAW formats.)

Use RAW only for in-camera storage, and copy or convert to TIFF, PNG, or JPG as soon as you
transfer to your PC. You do not want your image archives to be in a proprietary format. Although
several graphics programs can now read the RAW format for many digital cameras, it is unwise to
rely on any proprietary format for long term storage. Will you be able to read a RAW file in five
years? In twenty? JPG is the format most likely to be readable in 50 years. Thus, it is appropriate to
use RAW to store images in the camera and perhaps for temporary lossless storage on your PC, but
be sure to create a TIFF, or better still a PNG or JPG, for archival storage.

Graphics Animation Format


A few dominant formats are aimed at storing graphics animations (i.e., series of drawings or
graphic illustrations) as opposed to video (i.e., series of images). Examples of such graphics files
are .dir, .fla, .flc, .fli, .gif, .ppt, .dgi, and .wmf. The difference is that animations are considerably
less demanding of resources than video files.
FLC is an animation or moving picture file format; it was originally created by Autodesk's
Animator Pro. Another format, FLI, is similar to FLC.
GL produces somewhat better quality moving pictures. GL animations can also usually
handle larger file sizes.
Many older formats exist, such as DL or Amiga IFF files, Apple QuickTime files, and
animated GIF89 files (which use no compression between frames).

PS & PDF
Postscript is an important language for typesetting, and many high-end printers have a
Postscript interpreter built into them.
Postscript is a vector-based picture language, rather than pixel-based: page element
definitions are essentially in terms of vectors.
Postscript includes text as well as vector/structured graphics.
Bit-mapped images can be included in output files.
Postscript page description language itself does not provide compression; in fact,
Postscript files are just stored as ASCII.
PDF files that do not include images have about the same compression ratio, 2:1 or 3:1, as
do files compressed with other LZW-based compression tools (Lempel-Ziv-Welch).

Microsoft Windows: WMF: the native vector file format for the Microsoft Windows operating
environment:


Consist of a collection of GDI (Graphics Device Interface) function calls, which are native
to the Windows environment.
When a WMF file is played (typically using the Windows PlayMetaFile() function), the
described graphics are rendered.
WMF files are ostensibly device-independent and are unlimited in size.

Image Processing Criteria


Human vision perceives and understands images.
Computer vision covers image understanding, interpretation, and image processing.
Low-level image processing transforms one image into another.
High-level image understanding makes decisions according to the information in an image.
A signal is a function (a variable with physical meaning) that may be one-dimensional
(dependent on time), two-dimensional (images dependent on two co-ordinates in a plane),
three-dimensional (describing an object in space), or higher-dimensional.

Image Synthesis
Image synthesis refers to the processing of a 2D/3D picture by a computer.
It combines text with document recognition, image enhancement, image synthesis, and
image reconstruction.
Creation of the original picture (geometric representation, rotation, surface model).
Shape change (corrections in three dimensions, size, etc.).
Image display (display in three dimensions, light source, shading, filtering, and other
adjustments).
The processing techniques involved may be image enhancement, image restoration, and
image compression.
The most common data types for graphics and image file formats are 24-bit color and 8-bit
color.
Some formats are restricted to particular hardware / operating system platforms, while
others are cross-platform formats.
Even if some formats are not cross-platform, there are conversion applications that will
recognize and translate formats from one system to another.
Most image formats incorporate some variation of a compression technique due to the
large storage size of image files. Compression techniques can be classified into either
lossless or lossy.
Dithering is used to calculate patterns of dots such that values from 0 to 255 correspond to
patterns that are more and more filled at darker pixel values, for printing on a 1-bit printer
(a short code sketch follows this list).
It mixes colors, merging pixels of different colors to create an area of intermediate color.
The main strategy is to replace a pixel value by a larger pattern, say 2 x 2 or 4 x 4, such
that the number of printed dots approximates the varying-sized disks of ink used in analog
halftone printing (e.g., for newspaper photos).
Half-tone printing is an analog process that uses smaller or larger filled circles of black ink
to represent shading, as in newspaper printing.
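The pattern-substitution idea behind dithering can be sketched in a few lines of Python. This is a minimal illustration using a 2 x 2 Bayer matrix, one common choice of ordered-dithering pattern; it is not a specific algorithm named in the text.

    # Ordered dithering: compare each gray value (0-255) against a
    # position-dependent threshold derived from a 2 x 2 Bayer matrix,
    # producing 1-bit output whose dot density approximates the shade.
    BAYER_2x2 = [[0, 2],
                 [3, 1]]

    def dither(gray):                  # gray: 2-D list of values 0-255
        out = []
        for y, row in enumerate(gray):
            out_row = []
            for x, value in enumerate(row):
                threshold = (BAYER_2x2[y % 2][x % 2] + 0.5) * 64  # 32..224
                out_row.append(1 if value > threshold else 0)     # 1 = ink dot
            out.append(out_row)
        return out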


Image Analysis
Image analysis comprises techniques for extracting descriptions from images that are necessary
for higher-level scene analysis methods.
Image enhancement:
1. Improves contrast to make a graphic display more useful for display and analysis.
2. It includes gray-level and contrast manipulation, noise reduction, edge sharpening,
filtering, and coloring.
Image recognition:
1. Image formatting means capturing an image from a camera and bringing it into a digital
form.
2. Digital representation of an image in the form of pixels.
3. Preparing and transforming the image.
4. Labeling, grouping, matching, extracting, and conditioning of the image.
Image restoration: processing an image to minimize the effect of degradations (filter effects).
Image compression: minimizing the number of bits required to represent an image.

Image Transmission
Transmission of digital images takes place through computer networks, the Internet, etc.
Image size depends on the image representation format used for transmission:
Raw image data transmission
Compressed image data transmission
Symbolic image data transmission, where the image size equals the size of the structure
that carries the transmitted symbolic information of the image.
Example: the transmission of a raw image with a resolution of 640 x 480 pixels and a pixel
quantization of 8 bits per pixel requires the transmission of 307,200 bytes through the
network, as the short calculation below confirms.
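The example follows directly from the image dimensions and the pixel depth; the minimal sketch below reproduces it, with a 64 kbit/s line rate added purely as an assumption to show how transmission time follows from image size.

    # Raw image transmission: size and time (link rate is an assumed value).
    width, height, bits_per_pixel = 640, 480, 8
    size_bytes = width * height * bits_per_pixel // 8   # 307,200 bytes
    link_bps = 64_000                                   # assumed 64 kbit/s link
    seconds = size_bytes * 8 / link_bps                 # 38.4 seconds
    print(size_bytes, "bytes;", seconds, "seconds at 64 kbit/s")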

Basic Concepts of Video


Video is the most recent addition to the elements of multimedia. Carefully planned video can
enhance a presentation. Before adding video to a project, it is essential to understand the medium,
how to integrate it, its limitations, and its costs. Video places the greatest demands on the computer
and memory (using about 108 GB per hour for full-motion video) and often requires additional
hardware (a video compression board, an audio board, or a RAID - Redundant Array of
Independent Disks - for high-speed data transfer).

Digital Video
Digital video architecture consists of a format for encoding and playing back video files by
a computer.
Architecture includes a player that can recognize and play files created for that format.
Digital video has replaced analog as the method of choice for making and delivering video
for multimedia.
Digital video devices produce excellent finished products at a fraction of the cost of analog.
Digital video eliminates the image-degrading analog-to-digital conversion.
Many digital video sources exist, but getting the rights can be difficult, time-consuming,
and expensive.

Analog Video


Analog television sets remain the most widely installed platforms for delivering and
viewing video.
Television sets use composite input; hence colors are less pure and less accurate than on
computers using RGB component input.
NTSC television uses a limited color palette and restricted luminance (brightness) levels
and black levels.
Some colors generated by a computer that display fine on an RGB monitor may be illegal
for display on an NTSC TV.
While producing a multimedia project, consider whether it will be played on an RGB
monitor or a conventional television set.

Video Clip
Ways to obtain video
1. shoot new film clips with a digital camcorder
2. convert your own video clips to digital format
3. acquire video from an archive - often very expensive, difficult to obtain permissions
or licensing rights
Be sure to obtain permission from anyone you film or for any audio you use!

How Video Works


The video signal is magnetically written to tape by a spinning recording head that follows
a helical path.
Audio is recorded on a separate track.
The control track regulates the speed and keeps the tracks aligned as the tape plays/records.
Light passes from an object through the video camera lens and is converted into an
electrical signal by a CCD (charge-coupled device).
High-quality cameras have 3 CCDs.
The signal contains 3 channels of color information (red, green, blue) and a
synchronization pulse.
If each channel of a color signal is kept separate, it is called RGB (the preferred method).
A single composite of the colors and sync signal is less precise.
A typical video tape has separate tracks for audio, video, and control.


Video Signal Representation


A video objective is to give the viewer a sense of presence in the scene and of participation in the
events portrayed. To meet this objective, the televised image should convey the spatial and
temporal content of the scene. Composite video is the format of an analog television (picture only)
signal before it is combined with a sound signal and modulated onto a Radio Frequency (RF)
carrier. An electron beam carries the corresponding pattern information, such as intensity, in a
viewed scene. Video is defined for our purposes here as "moving pictures." Still imaging, as found
in digital still cameras or scanners, is not covered. The requirements for still imaging have a lot in
common with those for video, but the differences are significant enough to be dealt with as a
separate discipline.
Video signal representation includes three aspects:
The Visual Representation.
Video Transmission.
Video Digitalization.

Important measures include:

1. Vertical detail and viewing distance:


The geometry of the TV image depends on the ratio of the picture width W to height H, generally
referred to as aspect ratio. The conventional aspect ratio is 4/3=1.33. (Aspect ratio: TV 4/3; HDTV
16/9).

2. Horizontal detail and picture width


The picture width for conventional TV is 4/3 x picture height.

3. Total Detail Content of the Image


The product of the number of elements vertically and horizontally equals the number of picture
elements in the image.
Vertical resolution = number of pixels in the picture height
Number of pixels in width of picture = vertical resolution x aspect ratio
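A quick worked example of these formulas, assuming a conventional 4:3 picture with 480 lines of vertical resolution (the numbers are illustrative):

    # Total detail content of the image from vertical resolution and aspect ratio.
    aspect_ratio = 4 / 3
    vertical_resolution = 480                                    # pixels in picture height
    horizontal_pixels = int(vertical_resolution * aspect_ratio)  # 640
    total_elements = vertical_resolution * horizontal_pixels     # 307,200
    print(horizontal_pixels, "x", vertical_resolution, "=", total_elements, "picture elements")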

4. Perception of depth
Perception of depth depends primarily on the angular separation of the images received by
the two eyes of the viewer.


On the flat screen of a TV, a considerable degree of depth is inferred from the perspective
appearance of the subject matter.
The focal length of lenses and changes in the depth of focus in a camera influence depth
perception.

5. Luminance and Chrominance


Color vision is achieved through three signals, proportional to the relative intensities of Red, Green
and Blue light (RGB) in each portion of the screen. However, during the transmission of the
signals from the camera to the receiver (display), a different color encoding is used, consisting of
luminance and two chrominance signals.

6. Temporal aspects of illumination:


In contrast to continuous pressure waves of an acoustic signal, a discrete sequence of individual
pictures can be perceived as a continuous sequence. To represent visual reality, two conditions
must be met:
Rate of repetition of images must be high enough to guarantee smooth motion from frame
to frame.
The rate must be high enough so that the persistence of vision extends over the intervals
between discrete flashes.

7. Continuity of Motion:
Motion is the presentation of a rapid succession of slightly different still pictures (frames).
We perceive continuity of motion at any frame rate faster than 15 frames per second.
Smooth video motion is achieved at 30 frames per second.
Movies use 24 frames per second and have jerkiness especially when large objects are
moving fast and close to the viewer.

8. Flicker:

To avoid perceptible flicker, at least 50 refresh cycles per second are required.
Movies achieve this by showing each of their 24 frames twice (2 x 24 = 48 flashes per second).
TV uses half-pictures produced by line interleaving; the scanning rate is at least 25 Hz, so one
frame is finished in 1/25 s.
A full TV picture is transmitted at 30 Hz or 25 Hz, with the two half-pictures transmitted one
after the other using the line-interleaved method.

The NTSC (National Television Systems Committee) standard for motion video signals specifies a
frame rate of 30 per second, compared to 25 per second for the European PAL system.

Video Transmission
Video signals are transmitted to receivers through a single TV channel. The oldest standard for the
transmission and reception of video signals is NTSC.
To encode color, a video signal is a composite of three signals.
For transmission purposes, a video signal consists of one luminance and two chrominance
signals.

Color Representation in Video


RGB Signal:


Consists of separate signals for the red, green, and blue colors. Other colors can be coded as a
combination of these primary colors.

YUV Signal:
Since human perception is more sensitive to brightness than to chrominance information, a
coding technique can distinguish between luminance and chrominance.
Instead of three separate colors, the brightness information (luminance Y) is separated from the
color information (two chrominance channels U and V).
The luminance component must always be transmitted because of the compatibility requirements
for black and white video.
The component division for YUV signal is:
Y=0.30 R + 0.59 G + 0.11 B
U= (B-Y) x 0.493
V = (R-Y) x 0.877
The YUV encoding can be specified as a 4:2:2 signal.
Any error in the resolution of the luminance Y is more important than in the chrominance
values U and V. Therefore, the luminance values are coded using a higher bandwidth than the
chrominance values. YUV is used in the PAL and SECAM systems.
YIQ signal:
This coding is similar to the YUV signal and is the basis for the NTSC format.
Y =0.30R + 0.59G + 0.11B
I = 0.60R - 0.28G - 0.32B
Q = 0.21R - 0.52G + 0.31B
The human eye is most sensitive to Y, then to I, then to Q. Therefore, NTSC allocates 4 MHz of
bandwidth to Y, 1.5 MHz to I, and 0.6 MHz to Q.
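The YUV and YIQ equations above translate directly into code. The sketch below applies them to a normalized RGB triple (components in 0.0-1.0); the coefficients are taken from the text, while the test value is an arbitrary example.

    # RGB to YUV and RGB to YIQ, using the coefficients given above.
    def rgb_to_yuv(r, g, b):
        y = 0.30 * r + 0.59 * g + 0.11 * b
        u = (b - y) * 0.493
        v = (r - y) * 0.877
        return y, u, v

    def rgb_to_yiq(r, g, b):
        y = 0.30 * r + 0.59 * g + 0.11 * b
        i = 0.60 * r - 0.28 * g - 0.32 * b
        q = 0.21 * r - 0.52 * g + 0.31 * b
        return y, i, q

    print(rgb_to_yuv(1.0, 0.0, 0.0))   # pure red
    print(rgb_to_yiq(1.0, 0.0, 0.0))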

Video Digitalization
Digitization consists of sampling the gray (color) level in the picture at an M x N array of points.
The quantized samples are then converted to bit streams.
The next step in the creation of digital motion video is to digitize pictures in time, obtaining a
sequence of digital images per second that approximates analog motion video.

Computer Video Formats


The computer video format depends on the input and output devices for the motion video medium.
Computer Video Controller Standards:

1. The Color Graphics Adapter (CGA) has a resolution of 320 x 200 pixels with simultaneous
presentation of 4 colors. The storage capacity per image is therefore: 320 x 200 pixels x 2 bits/pixel
= 128,000 bits = 16,000 bytes.
2. The Enhanced Graphic Adapter (EGA) supports a resolution of 640x350 pixels with 16-color
presentation resulting in a storage capacity of 112,000 bytes per image.
3. The Video Graphic Array (VGA) works mostly with a resolution of 640x480 pixels and can
display 256 colors simultaneously. The monitor is controlled through an RGB output. The storage
capacity per image is then 307,200 bytes.

4. The Super Video Graphic Array (SVGA) offers resolutions up to 1024x768 pixels and color
formats up to 24 bits per pixel. The storage capacity per image is then 2,359,296 bytes. Low-cost
SVGA video adapters with video accelerator chips overcome the speed penalty of using a higher
resolution.
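Each of the storage figures above comes from the same formula, width x height x bits per pixel / 8, as this small sketch confirms:

    # Storage capacity per image for the video controller standards above.
    def bytes_per_image(width, height, bits_per_pixel):
        return width * height * bits_per_pixel // 8

    print("CGA :", bytes_per_image(320, 200, 2))    # 16,000 bytes
    print("EGA :", bytes_per_image(640, 350, 4))    # 112,000 bytes
    print("VGA :", bytes_per_image(640, 480, 8))    # 307,200 bytes
    print("SVGA:", bytes_per_image(1024, 768, 24))  # 2,359,296 bytes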

Video File Formats


A video file format is a file format for storing digital video data on a computer system. Video is
almost always stored in compressed form to reduce the file size. A video file normally consists of a
container format containing video data in a video coding format and audio data in an audio coding
format.

Container VS Codecs
If you've ever worked with digital video before, you're probably aware that there are many
different types of video files. A digital video file usually consists of two parts, called the container
and the codec. The container refers to the actual file type or extension, for example .AVI or .MOV.
Within each of these containers there is a codec, which is like a set of instructions that specifies
the coding and settings that determine how the video plays in your player. A few popular codecs
you may have seen before are DV NTSC, DivX, and Sony YUV. There are far fewer video file
containers than codecs, and each container can hold any of hundreds of codecs. Most popular
computer video software comes preloaded with several of the most widely used codec and
container decoders so that it can properly play back popular file formats.

AVCHD (Advanced Video Codec High Definition)


AVCHD (.mts) is a high-end, high-definition (HD) format originally developed by Sony and
Panasonic for high-definition home theaters. It's not really suitable for sharing due to the
excessive file sizes, but the format is becoming more and more popular because HD camcorders
use it. Video in this format is best suited as the master copy of your video project and serves as a
great piece to edit with. AVCHD is still in its early life as a video format, and since it's still fairly
new, compatibility with certain video editing programs may be an issue. Some video editing
applications have begun to support this format, but many cannot fully handle it quite yet.
Additionally, playback of AVCHD files requires a speedy CPU and a sufficient amount of RAM.
That alone makes this format more difficult to work with but, on the other hand, it maintains high
quality. As time goes by, it will no doubt become easier to use and more integrated with editing
applications.

AVI (Audio Video Interleave)


The AVI format is a long-time standard developed by Microsoft and has been around as long as
digital video has. AVI files (particularly when uncompressed) tend to be huge, far too big for the
internet or for uploading to someone. AVI is more for the beginning of a video project, used as
something to edit from, not the end; in that sense, it is not really a sharing format. AVI files will
slide into just about any video editing program, and the quality is still high enough to serve as a
master clip. AVI is Windows-based and virtually universal. The problem is that not all AVIs are
created equally, and you can still run into compatibility issues due to different codecs in the
videos. The important thing to know is that whatever streams inside the container (AVI) is not
necessarily the same from one AVI video to the next, because the codecs used for compression
can vary from file to file. This is because AVI is what's known as a container format, which
basically means it contains multiple streams of different types of data, including a control track
and separate video and audio streams.

FLV (Flash Video Format)


Flash video (FLV) is the single most common sharing format on the web today. You'll see the
.FLV file extension on videos encoded by Adobe Flash software to play within the Adobe Flash
Player. Virtually everyone (99%) has the Adobe player installed in their browser, so this has
quickly become the most common online video viewing platform. Almost all the video sharing
sites stream video in Flash. You can upload formats other than Flash, and those sites will convert
it into Flash for streaming to the end user. Many television news operations are also now using
Flash video on their websites as a way to keep viewers up to date at all times. Most of those sites
accept uploads in
a handful of formats like QuickTime, MPEG-4, or WMV, and then convert them to Flash or MP4
before actually putting them out on the net for viewing. In addition to the nearly universal Flash
video player, FLV is popular because it gives one of the smallest file sizes after compression yet
retains fairly good quality. This means that the videos load quickly on the internet and won't
spend a lot of time using up your bandwidth. If you self-host your own videos, you should convert
them to Flash for the greatest compatibility with the highest percentage of internet viewers.
Although FLVs are the most common format found on the web today, the standard is moving
toward the use of MP4 H.264 files within Flash players, as they are compatible with both online
and mobile playback.

MPEG (.mpg) (Motion Picture Experts Group)


MPEG was developed by the Motion Picture Experts Group. This international group was
established in 1988 to develop standards for digital audio and video formats. However, they're
just one of many groups looking to standardize and develop new technologies for digital video.

MPEG-4 (.MP4)
MPEG-4 is another great sharing format for the internet. It has a small file size, but looks fairly
clean in comparison with other video codecs at the same file size. It's the video format employed
by a growing number of camcorders and cameras, and it is highly recommended these days. In
fact, YouTube actually recommends that users upload in the MP4 format. YouTube accepts
multiple formats, and then converts them all to .flv or .mp4 in its back-end for distribution.

WMV (Windows Media Video)


A .WMV file is a Windows Media Video file. Windows Media Video is used for both streaming
and downloading content via the internet. Microsoft's Windows Media Player, an application
bundled with Windows operating systems, is built for WMV files. WMV files have a pretty small
file size, actually one of the smallest. As a result of the low file sizes, the videos are compressed
so much that they quickly start to lose quality; the resolution is pretty poor in comparison to
modern codecs. But a tiny file size can be a real advantage in some situations: if you get an email
with an actual video attached instead of just a link to a video, it is probably a WMV file, as they
are among the only ones small enough to attach to an email.

MOV (Quick Time Movie)


.MOV is the file extension used to identify an Apple QuickTime movie. It is an extremely
common sharing format, especially among Mac users, and is considered one of the best-looking
file formats. While MOV files do look great, the file sizes are extremely big. Because QuickTime
hasn't been a Mac-only program for quite some time, QuickTime versions and players exist on
almost all PCs. The vast majority of the videos we personally upload to the web are in QuickTime
format, followed by MPEG-4.

DivX (.avi, .divx)


DivX is a very popular codec for compressing MPEG-4 videos. The size difference between DivX
and an MPEG-2 encoded DVD is a factor of about ten, which makes it a very popular way of
encoding large videos that need to be transferred over the internet. The codec can work as a plug-in
for existing players such as Windows Media Player.

Streaming Video
"Streaming" is a method by which video is downloaded and played at the same time, in the way a
TV broadcast works - they send a signal, you watch it; you don't need storage media, just a
receiver. This is a big benefit, but it also means your viewers can't save the file. They have to go
back to the source every time they want to watch it. Formats that can support streaming are Flash,


MPEG-4, QuickTime, Real and Windows Media. Creating a streaming video file isn't difficult -
most applications have a "save for web" option in their export menu which will allow you to create
a streaming video file. What makes streaming video complicated is that, in order to actually stream
a file, you must store it on a streaming video server which runs special software that manages the
connection.

TELEVISION
Television is the most widely used telecommunication medium for transmitting and receiving
moving images, either monochromatic (black-and-white) or in color, together with sound.
Whether a television program is live or recorded, the receiver handles three different kinds of
information: 1) the picture; 2) the sound; and 3) the synchronization. If we record a videotape of a
program, we have to fix all three kinds of information on the tape.

The term may refer to a television set, television programming, or television transmission. A
standard television set comprises multiple internal electronic circuits, including those for
receiving and decoding broadcast signals. A visual display device which lacks a tuner is properly
called a monitor rather
than a television. A television system may use different technical standards such as Digital
Television (DTV) and High-Definition Television (HDTV). Television systems are also used for
surveillance, industrial process control, and guiding of weapons, in places where direct observation
is difficult or dangerous.

Closed-Circuit Television (CCTV): While closed-circuit systems are in use, the most common
usage of the medium is broadcast television, which was modeled on the existing radio
broadcasting systems developed in the 1920s and uses high-powered radio-frequency transmitters
to broadcast the television signal to individual TV receivers.

Light Emitting Diode / Organic Light Emitting Diode (LED/OLED) TVs are the new brand of
televisions that have arrived in the market. LED TVs are somewhat similar to Liquid-Crystal
Display (LCD) TVs; however, the innovative backlighting feature has made the LED TV an
advanced version of LCD television models.

Conventional System
Black-and-white TV and current color TV are based on the same signal representation. To
understand the reasoning behind the data rates of motion video and computer-based animation
discussed later, we focus on the description of the respective signals rather than on specific camera
or monitor technologies. We analyze the video signal coming from a camera and the resulting
pictures. Different video format standards were established in different parts of the world.

National Television Standards Committee (NTSC):


These standards define a method for encoding information into the electronic signal that
creates a television picture.
It has a screen resolution of 525 horizontal scan lines and a scan rate of 30 frames per second.
1 frame = 525 horizontal lines every 1/30 second
2 passes of odd/even lines, 60 per second (60 Hz)
Interlacing is used to reduce flicker.
It is used in the US, Japan, and many other countries.

Phase Alternate Line (PAL) and Sequential Color and Memory (SECAM):
PAL has a screen resolution of 625 horizontal lines and a scan rate of 25 frames per second.


It is used in the UK, Australia, South Africa, and Western Europe.


SECAM has a screen resolution of 625 horizontal lines and is a 50 Hz system.
SECAM differs from NTSC and PAL color systems in its basic technology and broadcast
method.
It is used in France, Russia, and Eastern Europe.

Advanced Television Systems Committee (ATSC) Digital Television (DTV):


This digital standard provides TV stations with sufficient bandwidth to present four or five
Standard Television (STV) signals or one High Definition TV (HDTV) signal.
This standard allows for transmission of data to computers and for new Advanced TV
(ATV) interactive services.

High Definition Television (HDTV), now available:

Allows viewing of Cinemascope and Panavision movies with an aspect ratio of 16:9 (wider
than high)
Twice the resolution, in interlaced format
Digitized, then compressed for transmission
Television video is based on analog technology and international broadcast standards
Computer video is based on digital technology and other image display standards
DVD and HDTV merge the two

Computer Based Animation


Animation can explain whatever the mind can conceive. Animation film visualizes the invisible;
the creative imagination gives life to the abstract and the amorphous. Animation is the process of
displaying still images in a rapid sequence to create the illusion of movement. Animation displays
a sequence of images in 2-D or 3-D to provide the illusion of motion: a simulation of movement
created by displaying a series of pictures, or frames. Cartoons on television are one example of
animation. Animation on computers is one of the chief ingredients of multimedia presentations.
There are many software applications that enable you to create animations that you can display
on a computer monitor.

Computer animation (or CGI animation) is the art of creating moving images with the use of
computers. It is a subfield of computer graphics and animation. Increasingly it is created by means
of 3D computer graphics, though 2D computer graphics are still widely used for stylistic, low
bandwidth, and faster real-time rendering needs. Sometimes the target of the animation is the
computer itself, but sometimes the target is another medium, such as film. It is also referred to as
CGI (Computer-Generated Imagery or Computer-Generated Imaging), especially when used in
films. To create the illusion of movement, an image is displayed on the computer screen and
repeatedly replaced by a new image that is similar to the previous image, but advanced slightly in
the time domain (usually at a rate of 24 or 30 frames/second). This technique is identical to how
the illusion of movement is achieved with television and motion pictures.

Difference between animation and video: Whereas video takes continuous motion and
breaks it up into discrete frames, animation starts with independent pictures and puts them together
to form the illusion of continuous motion.

Traditional Animation

Traditional animation (also called cel animation or hand-drawn animation) was the process used
for most animated films of the 20th century. The individual frames of a traditionally animated film
are photographs of drawings, which are first drawn on paper. To create the illusion of movement,
each drawing differs slightly from the one before it. The animators' drawings are traced or


photocopied onto transparent acetate sheets called cels, which are filled in with paints in assigned
colors or tones on the side opposite the line drawings. The completed character cels are
photographed one-by-one onto motion picture film against a painted background by a rostrum
camera.

The traditional cel animation process became obsolete by the beginning of the 21st century. Today,
animators' drawings and the backgrounds are either scanned into or drawn directly into a computer
system. Various software programs are used to color the drawings and simulate camera movement
and effects. The final animated piece is output to one of several delivery media, including
traditional 35 mm film and newer media such as digital video. The "look" of traditional cel
animation is still preserved, and the character animators' work has remained essentially the same
over the past 70 years. Some animation producers have used the term "tradigital" to describe cel
animation which makes extensive use of computer technology.

2D Animation
2D animation figures are created and/or edited on the computer using 2D bitmap graphics or
created and edited using 2D vector graphics. This includes automated computerized versions of
traditional animation techniques such as tweening, morphing, onion skinning, and interpolated
rotoscoping.
Analog computer animation
Flash animation
PowerPoint animation

3D Animation
In 3D animation, figures are digitally modeled and manipulated by an animator. In order to
manipulate a mesh, it is given a digital skeletal structure that can be used to control the mesh; this
process is called rigging. Various other techniques can be applied, such as mathematical functions
(e.g., gravity, particle simulations), simulated fur or hair, effects such as fire and water, and the
use of motion capture, to name but a few; these techniques fall under the category of 3D
dynamics. Many 3D animations are very believable and are commonly used as visual effects for
recent movies.

Methods of Controlling Animations


Controlling an animation is independent of the language used to describe it. Animation control
mechanisms can employ different techniques. They are:
Full Explicit Control
Procedural Control
Constraint -Based System
Tracking Live Action
Kinematics and Dynamics
Full Explicit Control
Here the animator provides a description of everything that occurs in the animation, either
by specifying simple changes such as scaling, translation, and rotation, or by providing key
frame information and interpolation methods to use between key frames.
An example of such a system is given by the BBOP system.
Procedural Control
This is based on communication between various objects to determine their properties.
Procedural control is a significant part of several other control mechanisms.
For example, in physically-based systems the position of one object may influence the
motion of another object.
In actor-based systems, the individual actors may pass their positions to other actors to
affect the other actors' behavior.


Constraint (Restriction)-Based Systems


Many objects in nature move under the influence of other objects, and hence their motion
may not always be linear.
Such motions may be modeled by constraints.
Specifying an animated sequence using constraints is much easier than using explicit
control.
Systems using this type of control include Sutherland's Sketchpad and Borning's ThingLab.
Tracking Live Action
Trajectories of objects in the course of an animation can also be generated by tracking live
action. Traditional animation uses rotoscoping:
a film is made in which people or animals act out the parts of the characters in the
animation; the animators then draw over the film, enhancing the background and replacing
the human actors with their animated equivalents.
Another live-action technique is to attach some sort of indicator, say a sensor, to key points
on a person's body.
By tracking these indicators one gets the locations of corresponding key points in an
animated model.
An example of this sort of interaction mechanism is the data glove, which measures the
position and orientation of the wearer's hand as well as the flexion and hyperextension of
each finger joint.
Kinematics and Dynamics
Kinematics refers to the position and velocity of points.
A kinematic description of a scene may, for example, be of the form: "The ball is at the
origin at time t = 0. It moves with a constant acceleration in the direction (1, 2, 7)
thereafter."
Dynamics takes into account the physical laws that govern kinematics, such as Newton's
laws of motion or the Euler-Lagrange equations for fluids. A dynamic description of a
scene may be as follows:
At time t = 0 seconds the ball is at position (0 meters, 100 meters, 0 meters).
The ball has a mass of 1729 grams.
The force of gravity acts on the ball.
Naturally, the result of a dynamic simulation of such a model is that the ball falls.

Display Animation

To display animations on raster systems, the animated objects must be scan-converted and
stored as a pixmap in the frame buffer.
Scan conversion must be done at least 10 times per second to ensure smooth visual effects.
The actual scan conversion must take only a small portion of each 1/10-second frame time
in order to avoid a distracting ghost effect.
Double buffering is used to avoid the ghost effect.
If rotating and scan-converting the object takes longer than 100 milliseconds, the animation
is quite slow, and the transition from one image to the next becomes visible.

Example
Load Color Look-Up Table (CLUT) to display all values as the background color
Scan-convert object into image0
Load CLUT to display only image0
Repeat
    Scan-convert object into image1
    Load CLUT to display only image1
    Rotate object data structure description
    Scan-convert object into image0
    Load CLUT to display only image0
    Rotate object data structure description
Until (termination condition)
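The same double-buffering pattern can be written in a modern setting. The sketch below uses Python with the pygame library (an assumption for illustration; any raster API with two buffers works the same way): each frame is drawn into the hidden back buffer, and flip() swaps the buffers so the viewer never sees a half-drawn image.

    # Double-buffered animation loop (pygame assumed as the raster API).
    import pygame

    pygame.init()
    screen = pygame.display.set_mode((320, 240), pygame.DOUBLEBUF)
    clock = pygame.time.Clock()
    x, running = 0, True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
        screen.fill((0, 0, 0))                           # draw into the back buffer
        pygame.draw.circle(screen, (255, 255, 255), (x % 320, 120), 10)
        pygame.display.flip()                            # swap front/back buffers
        x += 4
        clock.tick(30)                                   # about 30 frames per second
    pygame.quit()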

Transmission of Animation
The transmission of animation over computer networks may be performed using one of two
approaches (an illustrative comparison follows this list):
Symbolic representation: animated objects are represented symbolically using graphical
objects rather than scan-converted pixmap images. The transmission time is short because
the symbolic representation of an animated object is smaller in byte size than its pixmap
representation; the scan conversion is then performed at the receiver side.
Pixmap representation: the pixmap representation of the animated objects is transmitted
and displayed on the receiver side. Scan conversion is performed at the sender side, where
the animation objects and operation commands are generated. The transmission rate of the
animation is equal to the size of the pixmap representation of an animated object
multiplied by the number of graphical images per second.

Multimedia Data Compression Issues


Compression
Compression is the reduction in size of data in order to save space or transmission time. For data
transmission, compression can be performed on just the data content or on the entire transmission
unit depending on a number of factors. Content compression can be as simple as removing all extra
space characters, inserting a single repeat character to indicate a string of repeated characters, and
substituting smaller bit strings for frequently occurring characters. This kind of compression can
reduce a text file to 50% of its original size. Data compression is particularly useful in
communications because it enables devices to transmit or store the same amount of data in fewer
bits.

Compression is the process of taking one or more files and making them smaller by using a
compression algorithm. Commonly, file compression will combine the compressed files into one
archive containing each of the files. Compressed files make downloading and sharing easier by
shrinking the overall size and allowing a user to download one file instead of dozens or hundreds
of smaller files.

Compression reduces the 'electronic space' (data bits) used in representing a piece of information
by eliminating the repetition of identical sets of data bits (redundancy) in an audio/video, graphic,
or text data file. White space in text and graphics, large blocks of the same color in pictures, and
other continuously recurring data are reduced or eliminated by encoding with a program that uses
a particular type of compression algorithm. The same program is used to decode the data so that
it can be heard, read, or seen as the original data.

In dictionary-based compression (described under Lempel Ziv encoding below), two concurrent
events take place: building an indexed dictionary and compressing a string of symbols. The
algorithm extracts the smallest substring that cannot be found in the dictionary from the remaining
uncompressed string. It then stores a copy of this substring in the dictionary as a new entry and
assigns it an index value. Compression occurs when the substring, except for the last character, is
replaced with the index found in the dictionary. The process then inserts the index and the last
character of the substring into the compressed string.


Decompression
Decompression is the inverse of the compression process. In the dictionary-based scheme above,
the process extracts the substrings from the compressed string and replaces each index with the
corresponding entry in the dictionary, which is empty at first and built up gradually. The idea is
that when an index is received, there is already an entry in the dictionary corresponding to that
index. Uncompressing (or decompressing) is the act of expanding a compressed file back into its
original form.

In order to use a compressed file, you must first decompress it. The software used to decompress
depends on how the file was compressed in the first place. To decompress a .zip file you need
software, such as WinZip. To decompress a .sit file, you need the Stuffit Expander program.
WinZip does not decompress .sit files, but one version of StuffIt Expander can decompress both
.zip and .sit files. Files ending in .sea or .exe are called self-extracting files. These are compressed
files that do not require any special software to decompress. Just click on the file and it will
automatically decompress and open.

Data compression implies sending or storing a smaller number of bits. Although many methods are
used for this purpose, in general these methods can be divided into two broad categories: lossless
and lossy methods.

Lossless Compression
Lossless compression is a way to compress files without losing any data. This method packs the
data closer together by replacing it with a type of shorthand, and can reduce file sizes by around
half. The .zip format uses lossless compression: the file decompresses to an exact duplicate of the
original, with the same quality. However, it cannot compress files to a really small size, making it
less useful for very large files. In lossless data compression, the integrity of the data is preserved.
The original data and the data after compression and decompression are exactly the same
because, in these methods, the compression and decompression algorithms are exact inverses of
each other: no part of the data is lost in the process. Redundant data is removed in compression
and added back during decompression. Lossless compression methods are normally used when
we cannot afford to lose any data.

Run-Length Encoding
Run-length encoding is probably the simplest method of compression. It can be used to
compress data made of any combination of symbols. It does not need to know the
frequency of occurrence of symbols and can be very efficient if data is represented as 0s
and 1s.
The general idea behind this method is to replace consecutive repeating occurrences of a
symbol by one occurrence of the symbol followed by the number of occurrences.
The method can be even more efficient if the data uses only two symbols (for example 0
and 1) in its bit pattern and one symbol is more frequent than the other.
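A minimal run-length encoder and decoder along the lines just described might look as follows in Python (a sketch, not a production codec):

    # Run-length encoding: replace runs of a symbol with (symbol, count) pairs.
    def rle_encode(data):
        out, i = [], 0
        while i < len(data):
            run = 1
            while i + run < len(data) and data[i + run] == data[i]:
                run += 1
            out.append((data[i], run))
            i += run
        return out

    def rle_decode(pairs):
        return "".join(symbol * count for symbol, count in pairs)

    encoded = rle_encode("AAAABBBCCDAA")
    print(encoded)               # [('A', 4), ('B', 3), ('C', 2), ('D', 1), ('A', 2)]
    print(rle_decode(encoded))   # AAAABBBCCDAA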


Huffman Coding
Huffman coding assigns shorter codes to symbols that occur more frequently and longer
codes to those that occur less frequently. For example, imagine we have a text file that uses
only five characters (A, B, C, D, E). Before we can assign bit patterns to each character, we
assign each character a weight based on its frequency of use. In this example, assume that
the frequency of the characters is as shown in Table 15.1.


A character's code is found by starting at the root and following the branches that lead to that
character. The code itself is the bit value of each branch on the path, taken in sequence.
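Building the tree can be sketched with Python's heapq module. Since Table 15.1 is not reproduced here, the five frequencies below are assumed values chosen only to make the example concrete: the two least frequent subtrees are repeatedly merged until one tree remains.

    # Huffman code construction (character frequencies are assumed values).
    import heapq

    def huffman_codes(freqs):
        # heap entries: (weight, tie-breaker, tree); a tree is a character
        # or a (left, right) pair of subtrees
        heap = [(w, i, ch) for i, (ch, w) in enumerate(freqs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            w1, _, t1 = heapq.heappop(heap)      # two least frequent subtrees
            w2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
            count += 1
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, str):
                codes[tree] = prefix or "0"
            else:
                walk(tree[0], prefix + "0")      # left branch carries bit 0
                walk(tree[1], prefix + "1")      # right branch carries bit 1
        walk(heap[0][2], "")
        return codes

    CODES = huffman_codes({"A": 17, "B": 12, "C": 12, "D": 27, "E": 32})
    print(CODES)   # frequent characters receive the shorter codes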

Encoding
Let us see how to encode text using the codes for our five characters: each character of the
original text is replaced by its code word, and the resulting bit strings are concatenated to form
the encoded text.


Decoding
The recipient has a very easy job in decoding the data it receives: because no code word is the
prefix of another, the incoming bit stream can be split unambiguously into code words and
mapped back to characters.
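Continuing the sketch above, encoding and decoding are both short. Decoding is unambiguous precisely because Huffman codes are prefix-free:

    # Encoding and decoding with the CODES table built in the previous sketch.
    def encode(text, codes):
        return "".join(codes[ch] for ch in text)

    def decode(bits, codes):
        reverse = {code: ch for ch, code in codes.items()}
        out, current = [], ""
        for bit in bits:
            current += bit
            if current in reverse:   # a complete code word has been read
                out.append(reverse[current])
                current = ""
        return "".join(out)

    bits = encode("BAEED", CODES)
    print(bits)
    print(decode(bits, CODES))       # BAEED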

Lempel Ziv Encoding


Lempel Ziv (LZ) encoding is an example of a category of algorithms called dictionary-based
encoding. The idea is to create a dictionary (a table) of strings used during the communication
session. If both the sender and the receiver have a copy of the dictionary, then previously-
encountered strings can be substituted by their index in the dictionary to reduce the amount of
information transmitted.
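The dictionary-building procedure described earlier (find the longest known prefix, emit its index plus the next character, add the extended string to the dictionary) can be sketched as an LZ78-style encoder. This is a minimal illustration, not a full LZ77/LZW implementation:

    # LZ78-style encoding: tokens are (dictionary index, next character),
    # where index 0 means "no previously seen prefix".
    def lz78_encode(text):
        dictionary = {}                 # substring -> index (1-based)
        tokens, current = [], ""
        for ch in text:
            if current + ch in dictionary:
                current += ch           # keep extending the matched prefix
            else:
                tokens.append((dictionary.get(current, 0), ch))
                dictionary[current + ch] = len(dictionary) + 1
                current = ""
        if current:                     # flush a trailing match
            tokens.append((dictionary[current], ""))
        return tokens

    print(lz78_encode("BAABABBBAABBBBAA"))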

Lossy Compression Methods


To make files up to 80 percent smaller, lossy compression is used. Lossy compression software
removes some redundant data from a file. Because data is removed, the quality of the
decompressed file is less than that of the original. This method compresses graphic, audio, and
video files,
and the slight damage to quality may not be very noticeable. JPEG uses lossy compression, which
is why files converted to JPEG lose some quality. MP3 also uses lossy compression to fit a great
deal of music into a small space, although the sound quality is lower than with WAV, which uses
lossless compression. Our eyes and ears cannot distinguish subtle changes; in such cases, we can
use a lossy data compression method. These methods are cheaper: they take less time and space
when it comes to sending millions of bits per second for images and video. Several methods have
been developed using lossy compression techniques. JPEG (Joint Photographic Experts Group)
encoding is used to compress pictures and graphics, MPEG (Moving Picture Experts Group)
encoding is used to compress video, and MP3 (MPEG audio layer 3) is used for audio compression.

Joint Photographic Experts Group Preparation


The JPEG standard is applicable to both greyscale and colour images. This is done by breaking a
picture down into a number of components; when the components are put together, the final image
can be viewed. For example, a black-and-white image has only one component, which describes
the grey scale of the image, whereas a colour image generally has three components (one for
each primary colour: red, green and blue, or an equivalent colour space).

Each component is made up of x columns and y rows of samples. These samples are simply the
actual "pixel" data of the image. In JPEG Baseline the number of samples equals the
resolution of the image. A single pass through a component and its samples is known as a scan; in
JPEG Baseline this does not mean a great deal, as only one scan of the image is used.

In JPEG encoding the data is broken into a number of blocks called Minimum Coded Units
(MCUs). These MCUs are simply made by taking a number of 8x8 pixel sections of the source
image. MCUs are used to break down the image into workable blocks of data as well as to allow
manipulation of local image correlation at a given part of the image by the encoding algorithm.

When processing each MCU, the algorithm always moves through the component from left to
right and from top to bottom.

There are two methods of processing MCUs and components. In the first method, each MCU
contains data from a single component only and hence comprises only one unit of data. In the
second method, the data from each component is interleaved within a single MCU, so that each
MCU contains all the data for a particular physical section of the image, rather than a number of
MCUs each containing one component of that section.

Further to this, there are horizontal and vertical sampling factors. These sampling factors dictate
how many 8x8 pixel sections are to be placed within an MCU when the component data is
interleaved. In the simplest case, horizontal and vertical sampling factors of 1 are used.

A JPEG file consists of separate components, which are processed independently of each other.
The contents and resolution of the components may differ depending on the application and quality
requirements. The most common format is YUV, using the two colour components U and V and the
brightness component Y. Depending on the desired image quality, the resolution of the colour
components can be reduced, for example by a factor of two or four; that is, alongside the full
brightness information, the colour of two or four pixels is combined into a single colour value.
This approach exploits the selective sensitivity of the human eye, which perceives differences in
brightness much more readily than differences in colour. Since brightness is substantially more
important for the overall perception, it is reasonable to devote a proportionally larger part of the
data volume to encoding the brightness. These procedures are called 4:2:2 or 4:1:1 subsampling.
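
A minimal sketch of this combination step, averaging 2 x 2 blocks of the colour components so
that four pixels share one colour value while Y keeps full resolution:

import numpy as np

def subsample_chroma(u, v, factor=2):
    # Reduce each colour component by 'factor' in both dimensions;
    # factor=2 combines four pixels into one colour value.
    h, w = u.shape
    h, w = h - h % factor, w - w % factor          # crop to a block multiple
    def pool(c):
        blocks = c[:h, :w].reshape(h // factor, factor, w // factor, factor)
        return blocks.mean(axis=(1, 3))            # one value per block
    return pool(u), pool(v)

u, v = np.random.rand(480, 640), np.random.rand(480, 640)
u_small, v_small = subsample_chroma(u, v)
print(u_small.shape)   # (240, 320): a quarter of the colour samples remain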


Human Visual System


A number of inherent properties of images and the human visual system can be manipulated to
allow for greater compression through the JPEG algorithm.

One of these is the tendency for the human eye to notice variations of brightness intensity much
more than variations of colour in an image. The JPEG algorithm can take advantage of this by
applying different rules for brightness and colour variations.

Another property is that real-world images generally do not have sharp boundaries for brightness
intensity changes and colour changes. This means that the spatial frequencies found within an image
are generally of low order, i.e. gradual changes rather than quick changes over a localised area of
the image.

JPEG: Transformation
Discrete Cosine Transform (DCT)
The requirements for an ideal format for image compression can be outlined as follows:
The image information should be differentiated according to its significance for the
qualitative impression.
The internal structure should offer the opportunity to decide to what extent the omission of
information deteriorates the image quality.
Only relevant information should require data capacity. If a section contains no
information, such as a monochromatic area, this should be directly reflected in the encoded
data.
The pixel-wise representations used by conventional data formats do not fulfil these
requirements. For that reason the original data is converted with the help of the so-called
Discrete Cosine Transform (DCT). After conversion the data show the following
characteristics:
Simple structures are reflected in low values; complex, more detailed structures in high
values.
The values resulting from the DCT reflect the geometrical structure of the contents. Details,
such as horizontal or vertical patterns, that are not contained in the image are represented
by the value zero.
A monochromatic area is described by a single value; all other values become zero.
Normally 8 x 8 pixels of each component are transformed at a time with the help of the
following formula:
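
For reference, the standard two-dimensional forward DCT used by JPEG for an 8 x 8 block is

$$F(u,v) = \frac{1}{4}\,C(u)\,C(v)\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16},\qquad C(k)=\begin{cases}1/\sqrt{2}, & k=0\\ 1, & k>0\end{cases}$$

where f(x, y) are the 8 x 8 sample values and F(u, v) the resulting coefficients; F(0, 0) is the DC
coefficient, the single value that describes a monochromatic area.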

JPEG: Quantization (Weighting of the Contents)


The result of the DCT is a set of values that also describes the image contents. In contrast to the
original format, these values do not all affect the image quality in the same manner; the reason is
the subjectively different perception of details.
With the help of quantization, the more relevant values are described with higher accuracy,
while values that influence the subjective perception to a smaller extent are devalued and
represented with a lower range of values.
Quantization is not reversible and substantially affects the image quality.
Quantization is performed with special tables, which are stored within the
corresponding JPEG file; they are essential for decoding.
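
A minimal sketch of the quantization step and its (lossy) reversal; the table below is illustrative
only, not one of the example tables of the JPEG standard:

import numpy as np

def quantize(dct_block, q_table):
    # Divide each coefficient by its table entry and round. The rounding
    # discards information, which is why quantization is not reversible.
    return np.rint(dct_block / q_table).astype(int)

def dequantize(levels, q_table):
    # Decoder side: multiply back; the rounding error stays lost.
    return levels * q_table

# Illustrative table: coarser steps for higher-frequency coefficients.
q_table = np.fromfunction(lambda u, v: 8 + 4 * (u + v), (8, 8))
block = np.random.randn(8, 8) * 100
restored = dequantize(quantize(block, q_table), q_table)
print(np.abs(block - restored).max())   # at most half the local step size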

JPEG: Entropy Coding (Huffman Coding)


The data resulting from the preceding steps still carry a certain amount of redundancy, which
contributes nothing to the quality of the image representation. This redundancy is reduced, for
example, with the help of Huffman coding.
The basis for this coding is a Huffman code tree, which assigns short code
words to frequently used symbols and long code words to rarely used symbols.
Besides Huffman coding, another procedure, arithmetic coding, is specified, but it has had
little practical significance because of the patent situation.
Additionally, a special form of run-length encoding is applied.
JPEG: Compatibility
The complete JPEG standard comprises a variety of different options and operating modes, and
many applications and import filters do not support all of them. Compatibility problems often arise
when professional applications generate files with rarely used features.
As a quasi-standard, JPEG was already in use before the final definition of the standard was
available; however, most of the resulting problems have been eliminated in the course of its
development.
Due to its wide adoption, and because use is largely restricted to conventional fields of
application, compatibility problems are of little importance in practice.

Joint Photographic Experts Group Modes


1. Sequential Mode
2. Lossless Mode
3. Progressive Mode
4. Hierarchical Mode
In "Motion JPEG", Sequential JPEG is applied to each image in a video.
1. Sequential Mode


Each image component is encoded in a single left-to-right, top-to-bottom scan.


Baseline Sequential mode, the one described above, is a simple case of the Sequential mode:
1. It supports only 8-bit images (not 12-bit images).
2. It uses only Huffman coding (not arithmetic coding).
2. Lossless Mode
Lossless JPEG has some popularity in medical imaging, and is used in DNG and some digital
cameras to compress raw images, but otherwise was never widely adopted. JPEG-LS is a simple
and efficient baseline algorithm which consists of two independent and distinct stages called
modeling and encoding. JPEG-LS was developed with the aim of providing a low-complexity
lossless and near-lossless image compression standard that could offer better compression
efficiency than lossless JPEG.
Lossless JPEG is actually a mode of operation of JPEG. This mode exists because the
Discrete Cosine Transform (DCT) based form cannot guarantee that encoder input will exactly
match decoder output, since the inverse DCT is not rigorously defined. Unlike the lossy mode,
which is based on the DCT, the lossless coding process employs a simple predictive coding model
called Differential Pulse Code Modulation (DPCM). This is a model in which predictions of the
sample values are estimated from the neighbouring samples that have already been coded in the
image. Most predictors take the average of the samples immediately above and to the left of the
target sample. DPCM encodes the differences between the actual and the predicted samples instead
of encoding each sample independently; the differences from one sample to the next are usually
close to zero.
1. It is a special case of JPEG in which there is indeed no loss.
2. It does not use the DCT-based method; instead, it uses a predictive (differential coding)
method:
A predictor combines the values of up to three neighbouring pixels (not blocks as in the Sequential
mode) as the predicted value for the current pixel X. Conventionally, A is the sample to the left of
X, B the sample above it, and C the sample diagonally above and to the left. The encoder then
compares this prediction with the actual pixel value at position X and encodes the difference
(the prediction residual) losslessly.

It can use any one of the following seven predictors:


Predictor   Prediction
1           A
2           B
3           C
4           A + B - C
5           A + (B - C) / 2
6           B + (A - C) / 2
7           (A + B) / 2

Since it uses only previously encoded neighbours, the very first pixel I(0, 0) has to be coded by
itself. Other pixels in the first row always use predictor 1; those in the first column always use
predictor 2. (The effect of the predictor choice, evaluated on a test set of 20 images, was shown in
an accompanying table.)
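
A minimal sketch of the seven predictors and the resulting DPCM residuals (integer arithmetic is
used here purely for illustration):

def predict(a, b, c, mode):
    # a = left neighbour, b = neighbour above, c = neighbour above-left.
    return {1: a,
            2: b,
            3: c,
            4: a + b - c,
            5: a + (b - c) // 2,
            6: b + (a - c) // 2,
            7: (a + b) // 2}[mode]

def dpcm_residuals(img, mode=4):
    # Encode each sample as the difference from its prediction.
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if y == 0 and x == 0:
                pred = 0                     # first pixel: no neighbours yet
            elif y == 0:
                pred = img[y][x - 1]         # first row: predictor 1 (A)
            elif x == 0:
                pred = img[y - 1][x]         # first column: predictor 2 (B)
            else:
                pred = predict(img[y][x - 1], img[y - 1][x],
                               img[y - 1][x - 1], mode)
            out[y][x] = img[y][x] - pred     # residuals cluster around zero
    return out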

3. Progressive Mode
1. Goal: display low quality image and successively improve.
2. Two ways to successively improve image:
Spectral selection: Send DC component and first few AC coefficients first, then
gradually some more ACs.
Successive approximation: send DCT coefficients MSB (Most Significant Bit)
to LSB (Least Significant Bit). (Effectively, it is sending quantized DCT
coefficients first, and then the difference between the quantized and the non-
quantized coefficients with finer quantization stepsize.)
4. Hierarchical Mode
This format allows a number of different resolution images to be held within one file, so that only
one image file needs to be made for all the resolutions at which it may be viewed. Again, more
than one scan is used to define the image; the decoder simply uses the scans in order until it has
reached the resolution it requires.


A Three-level Hierarchical JPEG Encoder


(a) Down-sample by factors of 2 in each dimension, e.g., reduce 640 x 480 to 320 x 240

(b) Code smaller image using another JPEG mode (Progressive, Sequential, or Lossless).

(c) Decode and up-sample encoded image

(d) Encode difference between the up-sampled and the original using Progressive,
Sequential, or Lossless.
1. These steps can be repeated multiple times.
2. This mode is good for viewing a high-resolution image on a low-resolution display.
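
A minimal sketch of steps (a) to (d), using plain down-sampling and pixel replication in place of a
real JPEG coder:

import numpy as np

def hierarchical_layers(image, levels=3):
    # Produce a coarse base image plus one residual image per finer level;
    # a real encoder would JPEG-compress each layer.
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(pyramid[-1][::2, ::2])   # (a) down-sample by 2 per axis
    layers = [pyramid[-1]]                      # (b) code the smallest image
    base = pyramid[-1]
    for finer in reversed(pyramid[:-1]):
        up = np.kron(base, np.ones((2, 2)))     # (c) up-sample by replication
        up = up[:finer.shape[0], :finer.shape[1]]
        layers.append(finer - up)               # (d) encode only the residual
        base = finer
    return layers

layers = hierarchical_layers(np.random.rand(480, 640))
print([l.shape for l in layers])   # [(120, 160), (240, 320), (480, 640)]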

Motion Picture Expert Group


Moving Picture Experts Group (MPEG) is the general name for the working group in which the
MPEG specifications were developed. The MPEG format consists of versions 1, 2 and 4,
concerning the coding of audio and video information.
MPEG-1
Coding of moving pictures and associated audio for digital storage media at up to about 1.5
Mbit/s.
The development of MPEG-1 started in 1988, and it was applied in the early 90s in a first
generation of products. The major focus of MPEG-1 is coding at the data transfer rates
given by the common parameters for CD-ROM (video) and ISDN (audio).
In the meantime, MPEG-1 is of minor importance, with the exception of the MP3 audio
format. The basic assumptions about maximum data transfer rate, volume of the media and buffer
capacity are outdated; meanwhile even the more demanding MPEG-2 data streams can be decoded
by pure software implementations on standard PCs.
MPEG-1 Applications:
MP3 Player (MPEG-1 Audio Layer III)
The native Layer III audio coding of MPEG-1 represents its most successful application.
Extensions and further developments, such as AAC, have meanwhile become established as well.
Audio On Demand (MPEG-1 Audio Layer III)
The first audio-on-demand services operated with two ISDN lines in parallel, with the aim of
supplying customers in real time. These basic parameters were responsible, among others, for the
dimensioning of the third audio layer.
DCC (Digital Compact Cassette; MPEG-1 Audio Layer II)
The DCC was intended as a digital successor to the analogue compact cassette. The linear
tape system worked with Audio Layer II and was able to achieve a quality similar to CD. Contrary
to its competitor, the MiniDisc (Sony), the DCC could not establish itself on the market; the
MiniDisc uses the ATRAC format for audio compression.
CD-I FMV (CD-Interactive Full Motion Video)
The CD-I, an interactive CD player with multimedia functions intended for home applications, was
one of the first MPEG-1 video implementations. The CD-I system was based on the standard speed
for audio playback (2,324 bytes/frame x 75 frames/s = 174,300 bytes/s) and set the basic
requirements for the standard transfer rate of MPEG-1. The CD-I system also failed to establish
itself on the market.
CD-ROM-XA Video
Special solutions for decoding MPEG-1 video streams, equipped with specific decoder hardware,
were available. Due to their price and the limited quality of MPEG-1, these products never achieved
wider relevance. Concerning the data transfer rate, the same applies as for the CD-I.
MPEG-2 Applications:
Generic coding of moving pictures and associated audio information
MPEG-2 provides a better utilization of resources and introduces multi-channel features for
audio coding. A larger data transfer rate is specified, and as a result a substantially better
image quality is possible compared with MPEG-1. Additionally, extended functions are
specified for the scaling of resolution, frame rate and quantization.
The most popular application is DVD-Video, which uses MPEG-2 video streams. For audio
coding, both AC-3 (by Dolby) and MPEG-2 audio are allowed, at least by the specification.
MPEG-4 Applications: Coding of audio-visual objects
MPEG-4 primarily focuses on the complex procedures required for particularly high
compression rates (e.g. for video-conference systems or internet transmission). Especially
real-time conditions in combination with limited system resources are the most important
problems for this type of application. To date, the development of MPEG-4 has not been
completely finalized.

The Principle of MPEG


MPEG-1 and MPEG-2 exploit a number of circumstances which result from human vision or
from the specific characteristics of video data:
1. Colour sensitivity of human vision
Because the eye is less sensitive to colour than to brightness, the colour representation
(chrominance) is reduced by a factor of 4 relative to the brightness representation (luminance):
the colour information of four pixels is combined into a single value. In total the data
volume is halved. A 4:1:1 YCC colour system is applied.
2. Perception of details
Information must be discarded in any case to achieve the target data rates. For the visible quality it
is important that only details with a subjectively low influence are impaired. To achieve
this, the original data is converted by the discrete cosine transform (DCT), followed by
weighting of the data (quantization). With the help of these two steps, picture information can be
divided approximately into important (relevant) and unimportant (irrelevant) parts. The
procedures are nearly identical to those used for JPEG coding.
3. Differences between pictures
In many scenes only a part of the picture moves or changes while the background remains
unchanged. The contents of preceding or following pictures can then be referenced, and only the
differences have to be encoded.
Picture Types
The MPEG standard specifies four picture types which differ regarding compression efficiency and
access behaviour:
I-Pictures (Intra Coded Pictures)
I-pictures are completely coded and do not refer to preceding or succeeding pictures.
P-Pictures (Predictive Coded Pictures)
The picture's contents are derived from a preceding picture; changes are coded as differences in
combination with motion compensation. P-pictures can only be reconstructed if the reference
pictures are still accessible.
B-Pictures (Bidirectionally Coded Pictures)
B-pictures are derived from both a preceding and a succeeding picture. In principle they
represent an interpolation between two pictures.
D-Pictures (DC Coded Pictures)


D-pictures contain the entire picture contents, like I-pictures, but are massively compressed. They
are intended for fast-forward functions.
4. Movement of details
Modifications from picture to picture frequently occur when central motifs change their position or
the background moves. By determining the movement of certain picture elements, a clear
saving can be achieved: it is only necessary to indicate the changes in position by motion
vectors.

Video Compression /MPEG Encoding


The Moving Picture Experts Group (MPEG) method is used to compress video. In principle, a
motion picture is a rapid sequence of a set of frames in which each frame is a picture. In other
words, a frame is a spatial combination of pixels, and a video is a temporal combination of frames
that are sent one after another. Compressing video, then, means spatially compressing each frame
and temporally compressing a set of frames.

Spatial Compression
The spatial compression of each frame is done with JPEG, or a modification of it. Each frame is a
picture that can be independently compressed.

Temporal Compression
In temporal compression, redundant frames are removed. When we watch television, for example,
we receive 30 frames per second. However, most of the consecutive frames are almost the same.
For example, in a static scene in which someone is talking, most frames are the same except for the
segment around the speaker's lips, which changes from one frame to the next.
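
A minimal sketch of the block-matching search an encoder may use to exploit this redundancy,
finding the motion vector for one 8 x 8 block by brute force (real encoders use much faster search
strategies):

import numpy as np

def best_motion_vector(prev, curr, y, x, block=8, search=4):
    # Find where the block of the current frame at (y, x) best matches the
    # previous frame within a small search window around the same position.
    target = curr[y:y + block, x:x + block]
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > prev.shape[0] \
                    or xx + block > prev.shape[1]:
                continue                        # window leaves the frame
            err = np.abs(prev[yy:yy + block, xx:xx + block] - target).sum()
            if err < best_err:                  # sum of absolute differences
                best, best_err = (dy, dx), err
    return best    # only this vector plus a small residual needs coding

prev = np.random.rand(64, 64)
curr = np.roll(prev, 2, axis=1)                 # content shifted right by 2
print(best_motion_vector(prev, curr, 16, 16))   # (0, -2)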

RUN-TIME ENCODING
The run-time requirements for encoding depend substantially on the intended usage. To illustrate
this, it is helpful to compare the encoder requirements for broadcasting and for DVD-Video
production.
1. Example: Broadcasting
Within a production chain for a live transmission, the effort for encoding may not exceed a
specific delay time. To avoid synchronization problems, this applies to both audio and
video: both signals must be coupled together before broadcasting. Procedures matching these
requirements are called real-time or synchronous processes.
2. Example: DVD-Video


The encoding time is not important for DVD production; it may take much longer than the
decoding. Only the opposing aspects of quality and financial expenditure have to be weighed. As
the time conditions for encoding and decoding are totally independent, procedures of this category
are called asynchronous processes. The DVD production example thus covers both the time
conditions and the complexity of the algorithm (system performance).

MPEG Decoding
In principle the MPEG standard defines only a data format and the directly associated
functionality of the decoder. The implementation of the encoder is free and offers a wide
range of options for achieving the best possible quality, depending on the available computing
power and time.
The actual MPEG coding represents a process with strongly asynchronous characteristics;
real-time conditions can be achieved only by extensive dimensioning of the encoder
components or, alternatively, by accepting quality losses.
The selection of the procedures has a determining influence on the necessary decoding
power: for prediction and interpolation, the reference pictures have to be accessible within
the memory.

Audio Compression /Audio Encoding


Audio compression can be used for speech or music. For speech we need to compress a 64 kbps
digitized signal, while for music we need to compress a 1.411 Mbps signal. Two categories of
techniques are used for audio compression: predictive encoding and perceptual encoding.
Predictive Encoding
In predictive encoding, the differences between samples are encoded instead of encoding
all the sampled values. This type of compression is normally used for speech. Several
standards have been defined, such as GSM (13 kbps), G.729 (8 kbps), and G.723.1 (6.3 or
5.3 kbps).
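
A minimal sketch of the idea (the real GSM and G.72x coders are far more elaborate, with
adaptive prediction and quantization):

def predictive_encode(samples):
    # Transmit the difference between each sample and the previous one.
    prev, diffs = 0, []
    for s in samples:
        diffs.append(s - prev)    # small values for slowly varying speech
        prev = s
    return diffs

def predictive_decode(diffs):
    prev, out = 0, []
    for d in diffs:
        prev += d                 # running sum reconstructs the samples
        out.append(prev)
    return out

samples = [100, 102, 103, 103, 101]
print(predictive_encode(samples))   # [100, 2, 1, 0, -2]
assert predictive_decode(predictive_encode(samples)) == samples

The small differences can then be represented with fewer bits than the raw samples.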
Perceptual Encoding: MP3
The most common compression technique used to create CD-quality audio is based on the
perceptual encoding technique. This type of audio needs at least 1.411 Mbps, which cannot
be sent over the Internet without compression. MP3 (MPEG audio layer 3) uses this
technique.

Storage Space
Uncompressed graphics, audio and video data require considerable storage capacity.
Transferring uncompressed video data over digital networks requires very high
bandwidth for a single point-to-point communication.
Audio/video, whether compressed or uncompressed, requires higher storage capacity than
other media.
Current magnetic data storage carriers take the form of floppy disks or hard disks and are
used as secondary storage media.
Optical storage media use the intensity of reflected laser light as an information source and
offer a higher storage density at a lower cost.


Digital Audio Tape (DAT): external devices such as DAT video recorders can be used
in multimedia systems.
Audio Compact Disc (ACD): the successor to the long-play disk (LP), a commercial
product of the entertainment industry. Related optical media include Write Once Read
Many (WORM), Compact Disc - Read Only Memory (CD-ROM), Video Long Play (VLP),
Compact Disc Digital Audio (CD-DA), Compact Disc Interactive (CD-I), Digital Video
Interactive (DVI) and Compact Disc - Write Once (CD-WO).

Coding Requirements: Dialogue Mode


The end-to-end delay should not exceed 150 ms; compression and decompression alone
should not exceed 50 ms in order to ensure a natural dialogue.
In addition, the overall delay includes:
o delay in the network,
o communications protocol processing in the end system,
o data transfer to and from the respective input and output devices.
Coding Requirements: Retrieval Mode
Fast forward and fast rewind with synchronized display (or playback) of the data should be
possible.
Random access to single images or audio passages in a data stream should be possible in
less than 0.5 s.
o Maintains interaction aspects in retrieval systems
Decompression of images, video or audio passages should be possible without interpreting
all past data.
o Allows random access and editing
Coding Requirements: Both Modes
Support display of the same data on different systems:
formats have to be independent of frame size and video frame rate.
Audio and video compression should support different data rates at different qualities.
Audio and video must be exactly synchronized.
Support economical solutions:
in software, or with only a few VLSI chips.
Enable cooperation of different systems:
data generated on one multimedia system can be reproduced on another system.

DATA STREAM
A multimedia data streaming system transmits multimedia data to a receiver by means of a
streaming server through a dynamic streaming process. It includes a converting module and a
scheduler module. The scheduler module dynamically obtains the bandwidth condition between the
streaming server and the receiver; it can then request the converting module to convert the original
data into streaming chunks with the optimal bit rates in accordance with the detected bandwidth,
and the streaming chunks are then transmitted to the receiver by the streaming server. A minimal
sketch of this rate choice follows the list of examples below.
A data stream is a sequence of individual packets transmitted in a time-dependent mode.
The packets can carry information of either continuous or discrete media.
Streaming is the popular approach to continuous media over the internet.
Playback at the user's computer starts while the media is still being transferred (no waiting
for a complete download).
You can find streaming in:
Internet Radio Stations


Distance Learning Education


Cricket Live
Movie Trailers
Video Meetings (conferencing)
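
The sketch promised above: a minimal model of the scheduler's bit-rate decision, assuming a
hypothetical ladder of available chunk rates (the rates and names are illustrative):

def pick_bitrate(measured_kbps, available_kbps=(300, 800, 1500, 3000)):
    # Choose the highest chunk bit rate that fits the measured bandwidth;
    # fall back to the lowest rate when even that does not fit.
    fitting = [r for r in available_kbps if r <= measured_kbps]
    return max(fitting) if fitting else min(available_kbps)

# A drop in measured bandwidth makes the server switch to smaller chunks:
for bw in (2800, 1200, 400, 100):
    print(bw, "->", pick_bitrate(bw))   # 1500, 800, 300, 300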

Transmission of information carrying different media leads to data streams


Transmission Modes
Asynchronous Transmission Mode
Packets reach the receiver as fast as possible.
Suited for discrete media.
Additional time constraints must be imposed for continuous media.
Synchronous Transmission Mode
Defines a maximum end-to-end delay.
Packets may happen to arrive ahead of time and must then be buffered.
For retrieving uncompressed video at a data rate of 140 Mbit/s and a maximal end-to-
end delay of 1 second, the receiver needs temporary storage of 17.5 Mbytes
(140 Mbit/s x 1 s / 8 bits per byte).
Media Preparation
Multimedia I/O hardware + software
Audio support
Multiple-channel digital sound tracks, interaction
Video support
Video board + digitizers (up to 60 fps HDTV)
Graphical displays
Normal, head-mounted and surround displays, holography
Scanner devices
Image scanners, photo scanners (>2000 pixels/inch)
Recognition devices
Tracking devices
Media Composition: Text, Graphics, Image
Text editors
Fonts, text styles, text effects, graphical capabilities
E.g. MS Word, PPP, CorelDRAW, Photoshop
Graphics editors (2D, 3D)
Manipulate mathematical representations
Image editors (e.g. xv) manipulate pixels
Pixel replication
Sampling and filtering (high frequencies)
Increase of resolution, change of intensity and colour


Image + graphics combined

Media Composition: Sound


Sound editors
Locating and storing sounds
Record, read from file, retrieve from pasteboard, create
Recording and playback
8000 samples/sec, 12 bit precision, compression to 8 bits
Editing
Copy/paste, cut, delete, insert, replace
Fragmented sound data must be compacted, otherwise playback may suffer from inefficiency
E.g. MIDI-editors
Allow e.g. to synthesize orchestral music
Music synthesis, special effects (e.g. hall echos)
Hardware support by DSPs (digital signal processors)

Media Composition: Video


Video editors
Individual frames can be edited as in image editors
Temporal aspects as in animation editing
E.g. combine several cuts into a sequence etc.
E.g. VidEdit, D/Vision Pro
Video book
The one-dimensional time line is made visible in the two-dimensional space
A whole film can be presented on a few pages
Media Integration
Multimedia editors
Integrate all kinds of data
Unified view, coordinated control, multiple buffers, light-weight windows (panes)
Extremely large documents, often partitioned
E.g. Diamond Multimedia Editor (Diamond)
Hypermedia / Hypertext Editors (HTML, SGML)
Multimedia combined with non-linear links


E.g. Hypercard (Macintosh), DynaText (Sun),


NoteCard (Xerox PARC), Hyper-G (TU Graz)
Authoring tools
E.g. PowerPoint, MovieWorks (QuickTime)

Multimedia Application
Multimedia (as an adjective) also describes electronic media devices used to store and experience
multimedia content, and it is distinguished from mixed media in fine art by the inclusion of audio.
Categorization of Multimedia
Multimedia may be broadly divided into linear and non-linear categories. Linear content
progresses without any navigational control for the viewer, as in a cinema presentation. Non-
linear content offers user interactivity to control progress, as used in a computer game or in
self-paced computer-based training; hypermedia is an example of non-linear content. Multimedia
presentations can be live or recorded: a recorded presentation may allow interactivity via a
navigation system, while a live multimedia presentation may allow interactivity via interaction
with the presenter or performer.
Characteristics of Multimedia
Multimedia presentations may be viewed in person on stage, projected, transmitted, or
played locally with a media player. A broadcast may be a live or recorded multimedia
presentation. Broadcasts and recordings can use either analog or digital electronic media
technology. Digital online multimedia may be downloaded or streamed; streaming
multimedia may be live or on-demand.
Multimedia games and simulations may be used in a physical environment with special
effects, with multiple users in an online network, or locally with an offline computer, game
system, or simulator. The various formats of technological or digital multimedia may be
intended to enhance the users' experience, for example to make it easier and faster to
convey information, or, in entertainment and art, to transcend everyday experience.
Corporate presentations, for example using PowerPoint, may combine all forms of media
content. Virtual reality uses multimedia content. Applications and delivery platforms of
multimedia are virtually limitless. Multimedia finds its application in various areas including, but
not limited to, advertisements, art, education, entertainment, engineering, medicine, mathematics,
business, scientific research and spatial temporal applications.
Commercial Business: Much of the electronic old and new media used by commercial artists is
multimedia. Exciting presentations are used to grab and keep attention in advertising. Business-to-
business and interoffice communications are often developed by creative services firms as
advanced multimedia presentations, going beyond simple slide shows, to sell ideas or liven up
training. Commercial multimedia developers may also be hired to design for governmental and
nonprofit services applications.
Entertainment And Fine Arts: In addition, multimedia is heavily used in the entertainment
industry, especially to develop special effects in movies and animations. Multimedia applications
that allow users to actively participate instead of just sitting by as passive recipients of information
are called interactive multimedia. In the arts there are multimedia artists, who blend
techniques using different media and in some way incorporate interaction with the viewer.
Another approach entails the creation of multimedia that can be displayed in a traditional fine arts
arena, such as an art gallery. Digital recording material may be just as durable and is infinitely
reproducible with perfect copies every time.
Education: In Education, multimedia is used to produce Computer-Based Training courses. A
CBT lets the user go through a series of presentations, text about a particular topic, and associated
illustrations in various information formats. Edutainment is an informal term used to describe
combining education with entertainment, especially multimedia entertainment. The idea of media
convergence is also becoming a major factor in education, particularly higher education. Defined
as separate technologies such as voice, data and video that now share resources and interact with
each other, synergistically creating new efficiencies, media convergence is rapidly changing the
curriculum in universities all over the world.
Journalism: Newspaper companies all over the world are trying to embrace the new phenomenon
by implementing its practices in their work. News reporting is not limited to traditional media
outlets.
Freelance journalists can make use of different new media to produce multimedia pieces for their
news stories. It engages global audiences and tells stories with technology, which develops new
communication techniques for both media producers and consumers. Common Language Project is
an example of this type of multimedia journalism production. While some have been slow to come
around, other major newspapers like The New York Times, USA Today and The Washington Post
are setting the precedent for the positioning of the newspaper industry in a globalized world.


Engineering: Software engineers may use multimedia in computer simulations for anything from
entertainment to training, such as military or industrial training. Multimedia for software interfaces
is often created as a collaboration between creative professionals and software engineers.
Industry: In the Industrial sector, multimedia is used as a way to help present information to
shareholders, superiors and coworkers. Multimedia is also helpful for providing employee training,
advertising and selling products all over the world via virtually unlimited web-based technology.
Mathematical and Scientific Research: In mathematical and scientific research, multimedia is
mainly used for modeling and simulation. For example, a scientist can look at a molecular model
of a particular substance and manipulate it to arrive at a new substance. Representative research
can be found in journals such as the Journal of Multimedia.
Medicine: In Medicine, doctors can get trained by looking at a virtual surgery or they can simulate
how the human body is affected by diseases spread by viruses and bacteria and then develop
techniques to prevent it.
Document Imaging: Document imaging is a technique that takes hard copy of an image/document
and converts it into a digital format (for example, scanners).
Structuring Information in a Form: Multimedia represents the convergence of text, pictures,
video and sound into a single form. The power of multimedia and the Internet lies in the way in
which information is linked. Multimedia and the Internet require a completely new approach to
writing. The style of writing that is appropriate for the 'on-line world' is highly optimized and
designed to be quickly scanned by readers. A good site must be made with a specific
purpose in mind, and a site with good interactivity and new technology can also be useful for
attracting visitors. The site must be attractive and innovative in its design, functional in terms of its
purpose, easy to navigate, frequently updated and fast to download. When users view a page, they
can only view one page at a time; as a result, multimedia users must create a mental model of the
information structure.

Advantages
Multimedia applications allow the computer user to communicate with the computer
system in a variety of ways (speaking, writing, moving objects etc.).
Multimedia applications give a real-world impression while using a computer.
One can communicate with people in remote locations just as if all were sitting in a single
drawing room.
You do not need to convert data into a computer-specific form; data is accepted in the
form of voice, moving pictures, images, etc.
Students and trainees find it easy to understand what is being taught to them.


Disabled persons can also use computer systems.
Computer systems can be connected to other machines and electronic devices.
People are naturally attracted toward computer-based learning.
Usage of computer products increases in business environments.

Disadvantages
The overall cost of the computer system increases.
The computer user must be well trained to take full advantage of multimedia applications.
Sometimes the overall performance of the computer slows down.
Proper maintenance is required, without which problems occur frequently.
Sometimes, if any attached device goes out of order, the whole system stops working.
Remote connections must be reliable and fast.

*******************************************

Er. Bimal Kumar Ray


School of IT and Computer Science
Ravenshaw University
Cuttack, Odisha, India
Mobile: 9853274179

*******************************************
