
The Third International Japan-Egypt Conference on Electronics, Communications and Computers (JEC-ECC 2015)

Contents
Welcome Message ..... 02
Program at a Glance ..... 04
Keynote Speeches ..... 05
Tutorials ..... 11
Technical Program
Session 1A ..... 16
Session 1B ..... 26
Session 2A ..... 36
Session 2B ..... 50
Poster Session ..... 64
Session 3A ..... 84
Session 3B ..... 98
Session 4A ..... 110
Session 4B ..... 120
TPC Co-Chairs ..... 128
Conference Organization ..... 131

.indd

2015/03/11

10:59:19

It is a great pleasure to welcome all of you to the 3rd International Japan-Egypt Conference on Electronics, Communications and Computers (JEC-ECC 2015), and to Fukuoka, the largest city in western Japan. The past two editions were organized in Egypt: the first in Alexandria in 2012 and the second in Cairo in 2013. This is the first time the conference is being held in Japan.
This conference was planned to serve as a platform for collaboration between Japanese, Egyptian, and international researchers from academia and industry in the fields of electronics, communications, and computer engineering. This year, the scope of the conference gives special focus to Communications Systems and Signal Processing; Digital, Analog and Microwave Systems Design and Implementation; and Computer Networks, Hardware and Software Systems. As a result, we have been able to include 5 keynote lectures and 52 papers from Japan, Egypt, and other countries, making this the largest edition compared with the past two conferences. The keynote speakers are leading researchers not only from Japan and Egypt, but also from Australia and the USA. I believe the conference will give researchers and engineers a unique opportunity to share their findings, and that students will benefit from the experts sharing their knowledge in the various areas of electronics and computer engineering.
This conference is a collaborative effort, a joint venture mainly of two institutions: Egypt-Japan University of Science and Technology (E-JUST) and Kyushu University. The third partner is the E-JUST Center, a bridge between E-JUST and Kyushu University. The E-JUST Center is committed to fostering the relationship with E-JUST by collaborating with the Department of Electronics and Communications Engineering (ECE) in its education and research activities as its Japanese counterpart. Our objectives are not limited to the exchange of students, faculty, and staff; we are equally committed to the development and successful implementation of the double-degree program between the ECE department of E-JUST and Kyushu University.
I am grateful to the organizing committee and the technical committee, who worked very hard and spent many sleepless nights reviewing papers, providing feedback to authors, and managing this conference in a very professional manner. I also thank our invited speakers, researchers, and students for presenting their outstanding work and sharing their findings and opinions through this conference. Our sponsors, the Faculty of Information Science and Electrical Engineering of Kyushu University, the IEEE Fukuoka Section, and E-JUST, are especially acknowledged for their continuous help and support; without their support, the conference would not be at this level.
Chair, 3rd JEC-ECC 2015


It is a real honor and privilege to serve as the General Chair of the third International Japan-Egypt Conference on Electronics, Communications, and Computers (JEC-ECC 2015), Fukuoka, Japan, Mar. 16-18, 2015. This is the first time the conference is being held in Japan; the two previous conferences (JEC-ECC 2012 and JEC-ECC 2013) were held in Egypt.
This series of events is a fruit of continuous research collaboration between Japan and Egypt. The conference strengthens the research relationship between Egypt, Japan, and other countries, and it ensures the continued transfer of knowledge between the participating countries. In addition to the technical conference itself, side gatherings between the participating researchers are held to evaluate joint research collaboration in the period prior to the conference and to develop cooperation policies for the period after. Furthermore, the conference provides additional reinforcement for a richer experience for all parties.
The organization of the conference is the result of many dedicated people, and I would like to express my sincere appreciation to all volunteers who have contributed to the success of this event. This includes planning, inviting speakers, reviewing papers, organizing sessions, and many other activities.
I hope that the conference provides an exciting, useful, and pleasant experience to all attendees.
Chair, 3rd JEC-ECC 2015
Hossam Shalaby


Keynote Speech 1

Energy Efficient RF Frontends: Using Injection Locking & Other Tricks
Ramesh Harjani, University of Minnesota
Abstract
Power dissipation is quickly becoming one of the most critical system considerations as we continue our desire to remain untethered while wanting to make our portable devices smaller and lighter. Local oscillator signal generation and signal distribution consume a significant percentage of transceiver power. We examine the use of subharmonic injection locking as a mechanism to improve system performance while reducing power dissipation. In this presentation I will provide a brief introduction to subharmonic injection locking, including its phase noise properties. This will be followed by an examination of subharmonic injection locking in three separate transceiver systems: an ultra-low-power 802.15.6 transmitter, a 4 GHz instantaneous wideband receiver, and a 24 GHz phased array.
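For readers unfamiliar with the mechanism this talk builds on, the locking range of an injection-locked oscillator is commonly estimated with Adler's classic first-order relation. The sketch below is background material, not part of the talk, and the example numbers are purely illustrative:

```python
def adler_lock_range_hz(f0_hz, q_factor, inj_ratio):
    """One-sided injection-locking range per Adler's relation:
    delta_f ~= (f0 / (2*Q)) * (I_inj / I_osc),
    valid for weak injection into an LC oscillator."""
    return f0_hz / (2.0 * q_factor) * inj_ratio

# Illustrative numbers: a 4 GHz LC oscillator with tank Q = 10 and a
# 10% injection-to-oscillation current ratio.
print(adler_lock_range_hz(4e9, 10, 0.10) / 1e6, "MHz")  # 20 MHz one-sided
```

The relation makes the engineering trade-off visible: a higher injection ratio widens the range over which the oscillator tracks the (sub)harmonic reference, at the cost of injection power.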

Biography
Ramesh Harjani is the E.F. Johnson Professor of Electronic Communications in the Department of
Electrical & Computer Engineering at the University of Minnesota. He is a Fellow of the IEEE. He received his
Ph.D. in Electrical Engineering from Carnegie Mellon University in 1989. He was at Mentor Graphics, San Jose
before joining the University of Minnesota. He has been a visiting professor at Lucent Bell Labs, Allentown, PA
and the Army Research Labs, Adelphi, MD. He co-founded Bermai, Inc, a startup company developing CMOS
chips for wireless multi-media applications in 2001. His research interests include analog/RF circuits for wired and
wireless communication systems.
Dr. Harjani received the National Science Foundation Research Initiation Award in 1991 and Best Paper Awards at the 1987 IEEE/ACM Design Automation Conference, the 1989 International Conference on Computer-Aided Design, and the 1998 GOMAC. His research group was the winner of the SRC Copper Design Challenge in 2000 and the winner of the SRC SiGe Challenge in 2003. He is an author/editor of seven books. He was an Associate Editor for IEEE Transactions on Circuits and Systems Part II, 1995-1997, a Guest Editor for the International Journal of High-Speed Electronics and Systems and for Analog Integrated Circuits and Signal Processing in 2004, and a Guest Editor for the IEEE Journal of Solid-State Circuits, 2009-2011. He was a Senior Editor for the IEEE Journal on Emerging & Selected Topics in Circuits & Systems (JETCAS), 2011-2013. He was the Technical Program Chair for the IEEE Custom Integrated Circuits Conference 2012-2013, the Chair of the IEEE Circuits and Systems Society technical committee on Analog Signal Processing from 1999 to 2000, and a Distinguished Lecturer of the IEEE Circuits and Systems Society for 2001-2002.


Keynote Speech 2

Evolution of Opto-electronics Technologies for Ultrawide-band Optical Transmissions and Wireless Communications
Kazutoshi Kato, Kyushu University
Abstract
Opto-electronics technologies have been successfully developed for expanding the capacity of optical transmissions. Laser diodes, optical modulators, and photodetectors have usually been the key components, and combinations of these components with electronics continuously lead to the invention of new technologies. Recently, opto-electronics technologies have also come to play an important role in ultrawide-band wireless communications. In this presentation, I first review the background of the optical fiber network system and the requirements it places on opto-electronics devices. Next, I take up the technologies of laser diodes and photodetectors, which have been the essential components of the optical fiber network, and explain their history and recent trends. Then, I show a future photonics approach to realizing ultrawide-band wireless systems such as terahertz-wave communication. Finally, I present the activities of our laboratory on high-speed wavelength-tunable lasers and terahertz carrier generation.

Biography
Kazutoshi Kato received the B.S. and M.S. degrees in physics and the Ph.D. degree from Waseda University, Tokyo, Japan, in 1985, 1987, and 1993, respectively.
In 1987 he joined NTT Opto-Electronics Laboratories, Kanagawa, Japan, where he was engaged in research on opto-electronics devices for wide-band optical transmissions, microwave applications, and optical access networks. From 1994 to 1995, he was on leave from NTT at the France Telecom CNET Bagneux Laboratory, France, as a Visiting Researcher working on novel photodetectors. From 2000 to 2003, he was with NTT Electronics Corporation, where he was involved in developing photonic network systems. From 2009 to 2011, he was an executive manager at NTT Photonics Laboratories, Atsugi, Kanagawa, Japan. He is currently a Professor of Information Science and Electrical Engineering, Kyushu University. His current research interests include advanced opto-electronics devices and subsystems for high-speed optical transmissions and high-frequency wireless communications.
Dr. Kato is a senior member of the IEEE Photonics Society, a senior member of the Institute of Electronics, Information and Communication Engineers (IEICE), Japan, and a member of the Japan Society of Applied Physics.


Keynote Speech 3

Reliability in Cloud Computing: Issues and Challenges
Bahman Javadi, University of Western Sydney
Abstract
Cloud computing is a new computing paradigm that delivers IT
resources to business and users as subscription-based virtual and dynamically scalable services in a pay-as-you-go
model. With the increasing presence, scale, and complexity of these systems, resource failures are inevitable. Such
failures can result in frequent performance degradation, premature termination of execution, data corruption and
loss, violation of Service Level Agreements (SLAs), and cause a devastating loss of customers and revenue. In this
talk, reliability in Cloud computing systems will be reviewed and discussed. Various technical challenges and issues in Cloud reliability, including failure models, failure correlation, and workload dependency, will be covered. Moreover, useful libraries and tools such as the Failure Trace Archive will be presented, together with a case study based on a hybrid Cloud architecture, to finalise the talk.

Biography
Bahman Javadi is a Senior Lecturer in Networking and Cloud Computing at the University of Western Sydney, Australia. He was recently appointed Director of the Academic Program for the Postgraduate ICT Course in the School of Computing, Engineering and Mathematics. Prior to this appointment, he was a Research Fellow at the University of Melbourne, Australia. From 2008 to 2010, he was a Postdoctoral Fellow at INRIA Rhone-Alpes, France. During his PhD studies, he was a Research Scholar at the School of Engineering and Information Technology, Deakin University, Australia. He is a co-founder of the Failure Trace Archive, which serves as a public repository of failure traces and algorithms for distributed systems. He has received numerous Best Paper Awards at IEEE/ACM conferences for his research papers. He has served on the program committees of many international conferences and workshops and has guest-edited many journal special issues. His research interests include Cloud and Grid computing, performance evaluation of large-scale distributed computing systems, and reliability and fault tolerance.


Keynote Speech 4

Next-generation Self-organizing Networks
Haris Gačanin, Alcatel-Lucent Bell
Abstract
The next-generation (so-called 5G) communication systems will most likely not be an incremental advance on contemporary communication systems. They are expected to be extremely dense and heterogeneous, which introduces many new challenges for network optimization and management. It is under discussion whether 5G networks will further enhance peak data rates or whether the focus will be on area-wise spectral and energy efficiency. In general, it is expected that 5G innovations will enable new services and enrich our societies beyond what we experience today. However, the largest technology challenge will be to enable customer-centric technologies that take into consideration the customer's quality of experience.
The next-generation networks should aim to enrich the customer experience by providing broadband multimedia content (a thousand-fold increase in network capacity) and connectivity for masses (billions) of devices. Because of this, it is expected that 5G network requirements will demand more advanced self-organization and self-optimization (Self-X) capabilities, mainly because the current concepts may not be flexible enough to support such complex deployments and ultra-high performance requirements. This is even more challenging when we consider that services may (and most probably will) have different performance requirements (e.g., latency, bandwidth). Hence, in 5G networks, customer (and service) management may become an integral part of the network optimization process.
Today's requirements from the mobile customer's perspective are known. Customers expect to be connected all the time through different devices, and to have access to broadband services both indoors (home, office, shopping mall) and outdoors. Today, mobile data traffic is growing tenfold, mainly driven by indoor users, and it is clear that contemporary communication systems may not support this trend. Studies have shown that more than 50 percent of voice and 70 percent of all data traffic originates from indoor users. This sets a challenging requirement on 5G technologies to provide both the target data rates per area and a seamless customer experience with respect to network, device, and service. Future networks' Self-X capabilities must be able to provide a high quality of customer experience across the network by maintaining seamless connectivity and connection quality irrespective of location and/or interference from other sources. This is not the case today. In this presentation we give an overview of the technical and business requirements for a customer-centric Self-X network. We point out the issues that may arise with respect to optimization and management challenges, describe the technical challenges, and give some ideas of possible directions. Finally, unlike today, new technologies must be able to utilize service information and thus optimize both the network and the service quality per customer.


Biography
Haris Gačanin received his Dipl.-Ing. degree in electrical engineering from the Faculty of Electrical Engineering, University of Sarajevo, in 2000. He received his M.E.E. and Ph.D.E.E. degrees from the Graduate School of Electrical Engineering, Tohoku University, Japan, in 2005 and 2008, respectively. From April 2008 until May 2010 he worked first as a Japan Society for the Promotion of Science (JSPS) postdoctoral research fellow and then as an Assistant Professor at the Graduate School of Engineering, Tohoku University. He is currently working as a Research Director at Alcatel-Lucent Bell, Antwerp, Belgium. His professional interest is to develop, lead, and motivate the activities of real and virtual multinational research and development teams, with a strong emphasis on product/solution development through applied research projects on advanced signal processing and algorithms, focusing on mobile/wireless and wireline physical (L1) and media access (L2) layer technologies and network architectures. He has more than 120 scientific publications (journals, conferences, and patent applications). He is a senior member of the Institute of Electrical and Electronics Engineers (IEEE) and a senior member of the Institute of Electronics, Information and Communication Engineers (IEICE), where he is chair of the Europe Section. He is an Associate Editor of IEICE Transactions on Communications and has acted as a chair, reviewer, and technical program committee member for various technical journals and conferences. He is a recipient of the 2013 Alcatel-Lucent Award of Excellence, the 2010 KDDI Foundation Research Grant Award, the 2008 JSPS Postdoctoral Fellowship for Foreign Researchers, the 2005 Active Research Award in Radio Communications, the 2005 Vehicular Technology Conference (VTC 2005-Fall) Student Paper Award from the IEEE VTS Japan Chapter, and the 2004 IEICE Society Young Researcher Award. He was awarded a Japanese Government (MEXT) Research Scholarship in 2002.


Keynote Speech 5

Functional Antennas Composed of Unbalanced Fed Ultra Low Profile Inverted L Antenna
Mitsuo Taguchi, Nagasaki University
Abstract
The authors have proposed the unbalanced-fed ultra low profile inverted L (ULPIL) antenna on a rectangular conducting plane. This antenna is excited at the middle of the horizontal element. When the size of the conducting plane is 0.245λ (λ: wavelength) by 0.49λ, the antenna height is λ/30, and the length of the horizontal element is around λ/4, the input impedance of this antenna is matched to 50 Ω and its directivity becomes more than 4 dBi. In this antenna, the inverted L element and the conducting plane are strongly coupled, and the electromagnetic field concentrates near the inverted L element and the ground plane. By adjusting the antenna structure and adding parasitic elements, a dual-band antenna, a wideband antenna for TV reception, and a high-gain planar antenna have been proposed. Single-band MIMO and dual-band MIMO antennas composed of two ULPIL antennas have been proposed. A circularly polarized antenna composed of a ULPIL antenna and an L-shaped slot, and antennas for wireless power transmission (WPT) systems, have also been proposed. When the distance between the transmitting and receiving antennas in the WPT system is 10 mm, a power transfer efficiency of 99.2% is obtained at the design frequency of 1 GHz. In this talk, these antennas will be introduced and their design concepts will be presented.
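Since the abstract quotes all dimensions as fractions of the wavelength, a small helper can translate them to millimetres once a design frequency is fixed. The 2.4 GHz used below is purely an illustrative assumption, not a frequency taken from the talk:

```python
def ulpil_dimensions_mm(freq_hz):
    """Convert the wavelength-relative ULPIL antenna dimensions quoted in the
    abstract into millimetres at a given design frequency."""
    c = 299_792_458.0          # speed of light, m/s
    lam = c / freq_hz * 1e3    # free-space wavelength in mm
    return {
        "ground_plane": (0.245 * lam, 0.49 * lam),  # 0.245λ x 0.49λ
        "antenna_height": lam / 30,                 # λ/30
        "horizontal_element": lam / 4,              # ~λ/4
    }

# At an assumed 2.4 GHz, λ ≈ 125 mm, so the ground plane is roughly
# 31 mm x 61 mm and the antenna height only about 4 mm.
print(ulpil_dimensions_mm(2.4e9))
```

The exercise makes the "ultra low profile" claim concrete: the λ/30 height is an order of magnitude below the classic λ/4 monopole height.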

Biography
He received his B.E. and M.E. degrees from Saga University, Japan, in 1975 and 1977, respectively, and a Dr. Eng. degree from Kyushu University, Japan, in 1986. From 1977 he was a Research Associate at Saga University, and from 1987 an Associate Professor at Nagasaki University. In 1996 he was a visiting researcher at the Department of Electrical Engineering, University of California, Los Angeles. Since 2007, he has been a Professor at Nagasaki University. His research interests are low-profile antennas for mobile communication and education using electromagnetic simulators. He was Chair of the IEICE Technical Group on Microwave Simulators from 2006 to 2007, IEEE AP-S Fukuoka Chapter Chair from 2007 to 2008, and IEICE Kyushu Section Chair in 2013. He wrote the following books: "Portable TV Antenna," in Antenna Engineering Handbook, Fourth Edition, Chapter 30, edited by J. Volakis, McGraw-Hill, 2007; Modern Antenna Engineering, Sogo-Denshi Publishing, 2004 (in Japanese); and others.


Tutorial 1

Advances in Mobile Communication Networks


Haris Gačanin, Alcatel-Lucent Bell

Abstract
Today, it is estimated that there are over 5 billion broadband devices connected to different access (indoor and outdoor) networks, with more than one billion mobile broadband users. As more devices, applications, content, and services are connected to communications networks, the resulting complexity is driving up costs for service providers and putting the customer experience at risk. This is especially the case considering that different wireless and wireline access technologies coexist around the customers. To succeed in this rapidly changing market, operators need the ability to deliver high-value services that differentiate and enhance the customer experience over their access technologies, such as fiber or digital subscriber line (DSL), powerline communication (PLC), Wi-Fi, and mobile (macro, small cells), all targeting speeds of up to and exceeding 1 Gbps. This tutorial provides an overview of current communication technologies available for broadband access networks. The focus is on the mobile technology (and network) evolution with respect to both user and operator challenges and management (optimization) requirements. Finally, the talk presents a few advanced examples of optimization solutions suitable for all-IP networks.


Tutorial 2

OpenPAT: Analysing Programs the Easy Way


Simon Spacey, Waikato University

Abstract
OpenPAT.org is the home of the Open Program Analysis Toolkit project, which originated at Cambridge and Imperial College in the UK. OpenPAT differs from program analysis toolkits such as SUIF (Stanford), GILK (Imperial), Valgrind (Cambridge), and Pin (Intel) in that it instruments code statically and gathers dynamic timing, control, and data flow information as the program runs. In this presentation we will review the OpenPAT approach, examine its benefits in comparison with the alternatives, and then create a new tool for OpenPAT, in just a few lines of code, that can be used to analyse the internal workings of programs written in any compilable language.

Biography
Dr Simon Spacey graduated top of his class in Computer Science at Cambridge University and completed his Ph.D. in Computer Science at Imperial College London early, winning the Systems Prize. Simon is currently a Senior Lecturer at Waikato University in New Zealand, where he lectures on Computer Systems, Computer Architectures, Software Engineering, and Computational Optimization. Simon's main research area is performance optimisation, which involves using tools like OpenPAT to analyse software to identify new system architectures that deliver computational and power advantages.


Mutual Coupling Reduction between Microstrip Antenna Array Elements Using CSRRs-Based DGS Bandstop Filter

Hany A. Atallah1, Adel B. Abdel-Rahman1, Kuniaki Yoshitomi2, and Ramesh K. Pokharel2
1 Egypt-Japan University of Science and Technology, Egypt, hany.a.atallah@ieee.org, adel.bedair@ejust.edu.eg
2 Kyushu University, Nishi-ku, Fukuoka, Japan, yoshitomi@ejust.kyushu-u.ac.jp, pokharel@ejust.kyushu-u.ac.jp

Abstract- In this paper, complementary split ring resonator (CSRR) defected ground structures (DGS) are introduced to suppress surface waves (SW) between the elements of a microstrip antenna array operating in the same frequency band and to reduce the mutual coupling. The CSRRs-DGS are easily etched on the ground plane between the array elements, where they act as a bandstop filter (BSF) operating in the same frequency band. Significant reduction of the EM mutual coupling is achieved between array elements with a reduced edge-to-edge spacing of 7.5 mm (0.22 λo). More than 34 dB of isolation between the array elements is obtained using an array of three-element CSRRs-DGS. The approach also appears promising for cognitive radio systems, to reduce the mutual coupling between the sensing antenna and the communication antenna in cognitive radio MIMO applications.
Keywords- CSRRs (complementary split ring resonators); mutual coupling; SW (surface waves); DGS; BSF (bandstop filter).

I. INTRODUCTION

Recently, pioneering research on the complementary split ring resonator (CSRR) has been reported in many works [1]. The CSRR can be derived from the SRR structure in a straightforward way using the concepts of duality and complementarity. The CSRR structure provides negative effective permittivity. Because of their small size, CSRRs are called sub-lambda structures, and a super-compact stopband structure can therefore be implemented using them. The CSRRs are etched on the ground plane or the conductor line of a planar transmission structure, such as microstrip or CPW, and provide a negative effective permittivity to the dielectric medium [2]. The electromagnetic (EM) behavior of CSRRs is similar to that of electromagnetic bandgap (EBG) structures [3]. However, it is difficult to design the dimensions and find the equivalent circuits of EBG structures. Although EBG structures, DGS, and CSRRs can provide similar stopband characteristics, it may be worth pointing out that the attenuation produced by CSRRs is better than that of EBG structures and conventional DGS. In recent years, isolation enhancement in array antenna applications has posed a strong challenge to the antenna community [4]. The mutual coupling, or isolation, between closely placed antenna elements is important in a number of applications, including systems that depend on array antennas and, more recently, multiple-input multiple-output (MIMO) wireless communication systems [5]. Surface waves cause many disadvantages for microstrip antennas, such as the mutual coupling effect between the elements of an antenna array, which exists whenever the substrate has a dielectric permittivity greater than one (εr > 1). In an antenna array, the mutual coupling effect deteriorates the radiation properties of the array. To achieve low mutual coupling between closely spaced antenna elements and suppress surface waves, several studies have been conducted, including DGS [3]-[7]. This idea has been extended to specific applications such as reducing scan blindness in microstrip arrays. Many shapes and configurations of DGS have been studied, such as rectangular slots, circles, spirals, dumbbells [6], and V- and U-slots [7]. Each DGS shape can be represented by an equivalent circuit model consisting of an inductance and a capacitance, which leads to a certain frequency band gap determined by the shape, dimensions, and position of the defect. DGS gives an extra degree of freedom in microwave circuit design and can be used for a wide range of applications. Meanwhile, for antenna applications, DGS is mainly applied to the feeding technique to minimize mutual coupling between arrays. In this paper, we design and realize a CSRRs-based DGS in antenna arrays, using it as a bandstop filter (BSF) [8] between the antenna array elements to reduce the mutual coupling effect. A microstrip antenna array operating in the X-band at 9 GHz is used in this study. Moreover, other radiation properties of the antenna array are also observed and discussed. Simulation results based on a 3D full-wave EM simulator are presented and show that a significant mutual coupling reduction of 35 dB is achieved.

II. PROPOSED MICROSTRIP PATCH ANTENNA ARRAY WITH CSRRS-BASED DGS

The proposed geometry of the antenna array with CSRRs-based DGS is shown in Fig. 1; it was designed through simulations using a commercial full-wave analysis software package. The rectangular patch has dimensions W = 11 mm and L = 9 mm, whereas the feeding microstrip has length Ltl = 17.5 mm and width Wtl = 3.4 mm, which ensures a 50 Ω characteristic impedance. The inset length in essence provides the necessary impedance matching. The substrate used for this array was Rogers RO3003 with a thickness of t = 1.524 mm and a dielectric constant of εr = 3. The spacing between the elements is chosen to be 7.5 mm (0.22 λo). The CSRR structures are designed so that their transmission zeros fall in the same band as the antenna array. The dimensions of the CSRR structures chosen for this frequency of operation are rin = 1 mm, rin1 = 1.2 mm, c = 0.4 mm, g = 0.4 mm, and d = 0.4 mm, respectively. The BSF significantly affects the array mutual coupling and the isolation between the two elements; the proposed geometry shows a small deviation in the resonant frequency of about 2.7% (250 MHz) due to the presence of the CSRRs-based DGS in the ground plane. In the numerical experiments, the proposed configuration produced a mutual coupling of about -61 dB, far better than the conventional array with the same dimensions.
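As a rough cross-check of the quoted patch dimensions, the standard cavity-model approximation (a textbook formula, not the authors' full-wave design flow) predicts a fundamental resonance close to the 9 GHz design band:

```python
import math

def patch_resonant_freq(L_mm, eps_r, h_mm, W_mm):
    """Approximate fundamental resonant frequency of a rectangular
    microstrip patch (cavity model with fringing-length correction)."""
    c = 299_792_458.0  # speed of light, m/s
    L, h, W = L_mm * 1e-3, h_mm * 1e-3, W_mm * 1e-3
    # Effective permittivity for a wide patch (W/h > 1)
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 / math.sqrt(1 + 12 * h / W)
    # Fringing-field length extension (Hammerstad's formula)
    dL = 0.412 * h * ((eps_eff + 0.3) * (W / h + 0.264)) / \
         ((eps_eff - 0.258) * (W / h + 0.8))
    return c / (2 * (L + 2 * dL) * math.sqrt(eps_eff))

# Dimensions from the paper: W = 11 mm, L = 9 mm, Rogers RO3003
# (eps_r = 3, t = 1.524 mm). The estimate lands near the 9 GHz design band.
f0 = patch_resonant_freq(L_mm=9.0, eps_r=3.0, h_mm=1.524, W_mm=11.0)
print(f"{f0 / 1e9:.2f} GHz")
```

The closed-form estimate only seeds a design; the inset feed and the CSRRs-DGS loading reported in the paper shift the final resonance, which is why the authors tune the geometry in a full-wave simulator.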


[Figure 1. Antenna array configuration and simulated |S|-parameters: (a) antenna array with 3-CSRRs-DGS (top view): patch antennas (W x L), microstrip feed lines (Wtl, Ltl), 7.5 mm element spacing, and the CSRRs-based DGS; (b) antenna array with 3-CSRRs-DGS (bottom view): ground plane with CSRR dimensions rin, rin1, c, d, and sd; (c) simulated |S|-parameters (S11 and S12, 7.5-11 GHz) for the array with and without 3-CSRRs.]

III. RESULTS AND DISCUSSION

Table I compares the mutual coupling of the proposed arrays with two and three CSRRs-based DGS against the conventional array and the literature. It is evident that the proposed arrays offer significantly better isolation than the conventional one: a reduction of about 35 dB is achieved.
Figure 2 shows the radiation results of the proposed antenna array without and with 3-CSRRs after intensive optimization. The radiation patterns in the E- and H-planes are stable. In addition, the radiation results show a slight decrease due to the presence of the CSRRs-based DGS, which is acceptable compared to the significant isolation and mutual coupling reduction obtained.

[Figure 2. Optimized array radiation pattern results with and without CSRRs: (b) antenna directivity without CSRRs at 9 GHz (E- and H-planes); (c) antenna directivity with CSRRs at 9.25 GHz (E- and H-planes).]

TABLE I. PERFORMANCE COMPARISON OF THE PROPOSED ANTENNA ARRAY WITH THE CONVENTIONAL ONE AND THE LITERATURE

Antenna Structure | Mutual Coupling (dB) | Improvement (dB) | Directivity (dBi) | Gain (dB)
Conventional antenna array without CSRR | -26 | - | 10.6 | 10.3
Proposed array with 2-CSRRs-based DGS | -37 | 11 | 9.4 | 9.3
Proposed array with 3-CSRRs-based DGS | -61 | 35 | 9.6 | 9.5
Ref. [3] (using EBG) | - | 9.3 | - | -
Ref. [4] (using interdigital capacitor loaded slots) | - | 17 | - | -
Ref. [5] (using metamaterials) | - | 20 | - | -
Ref. [6] (using dumbbell DGS) | - | 6.19 | - | -
Ref. [7] (using U-shaped section) | - | 10 | - | -

REFERENCES
[1] R. S. Kshetrimayum and S. S. Karthikeyan, "A parametric study on the stop band characteristics of CSRRs," International Journal of Recent Trends in Engineering, vol. 1, no. 3, May 2009.
[2] F. Martin, J. Bonache, F. Falcone, M. Sorolla, and R. Marques, "Split ring resonator-based left-handed coplanar waveguide," Applied Physics Letters, vol. 83, no. 22, 2003.
[3] S. D. Assimonis, T. V. Yioultsis, and C. S. Antonopoulos, "Computational investigation and design of planar EBG structures for coupling reduction in antenna applications," IEEE Transactions on Magnetics, vol. 48, no. 2, Feb. 2012.
[4] A. B. Abdel-Rahman, "Coupling reduction of antenna array elements using small interdigital capacitor loaded slots," Progress In Electromagnetics Research C, vol. 27, pp. 15-26, 2012.
[5] M. Bait-Suwailam, M. S. Boybay, and O. Ramahi, "Electromagnetic coupling reduction in high-profile monopole antennas using single-negative magnetic metamaterials for MIMO applications," IEEE Transactions on Antennas and Propagation, vol. 58, no. 9, pp. 2894-2902, Sept. 2010.
[6] F. Y. Zulkifli, E. T. Rahardjo, and D. Hartanto, "Mutual coupling reduction using dumbbell defected ground structure for multiband microstrip antenna array," Progress In Electromagnetics Research Letters, vol. 13, pp. 29-40, 2010.
[7] S. Farsi, H. Aliakbarian, D. Schreurs, B. Nauwelaers, and G. A. E. Vandenbosch, "Mutual coupling reduction between planar antennas by using a simple microstrip U-section," IEEE Antennas and Wireless Propagation Letters, vol. 11, 2012.
[8] J.-S. Hong and M. J. Lancaster, Microstrip Filters for RF/Microwave Applications, 1st ed., Chap. 6, John Wiley & Sons, New York, 2001.


A Double-Sided Printed Compact UWB Antenna

Ran Song, Haruichi Kanaya, and Hongting Jia
Graduate School of ISEE / E-JUST Center, Kyushu University
the feed lines should also be symmetric. Moreover, the currents in the two feed lines flow in opposite directions, so the two symmetric microstrip lines can be merged into one transmission line by omitting their ground planes; this feed line is therefore called a quasi microstrip line. In order to match the radiation elements to a 50 ohm connector, this line is designed with a tapered shape. However, the width of the 50 ohm feed line is close to the diameter of the outer conductor of the connector. If they are connected directly, the connector will be shorted; in other words, the input impedance seen from the input connector will be close to zero. To eliminate this effect, a smaller taper is introduced instead of a direct connection. The interval between the two radiation elements and the length of the feed line also affect the antenna performance; based on simulation results we choose L = 14.75 mm, W1 = 3.4 mm, W2 = 1.5 mm, and W3 = 4 mm.

Keywords— Ultra-wideband (UWB); double-sided printed antenna; quasi microstrip feed line

I. INTRODUCTION
Ultra-wideband techniques are widely used in wireless communication systems, measuring apparatus, nondestructive inspection, remote sensing, and so on, owing to their low cost, easy fabrication, low spectral power density, high resolution, and high data transmission rates. In these systems, the antenna plays a very important role in improving performance. UWB antennas are required to have stable omnidirectional radiation patterns, gain flatness, and linear phase variation. In particular, the dimensions and frequency bandwidth of a UWB antenna directly impact the resolution when it is applied to a nondestructive inspection system based on electromagnetic waves, because the UWB antenna operates very close to the near field. Designing an efficient and compact UWB antenna with a small electrical size and an extremely wide band is still a major challenge and has attracted the interest of many researchers in both industry and academia.

Figure 1. Geometry of a double-sided printed UWB antenna.

Abstract— In this paper, a double-sided printed compact ultra-wideband antenna is studied. The frequency band considered is from 2.35 GHz to 22.5 GHz, and the fractional bandwidth of the antenna is 162%. The antenna covers the entire band of UWB applications approved by the Federal Communications Commission. It is fabricated on an inexpensive FR4 substrate measuring 35.4 mm x 22 mm x 1.6 mm. The measurement results are in close agreement with the simulation results.

III. NUMERICAL AND EXPERIMENTAL RESULTS

The UWB antenna is first simulated using the 3-D full-wave High Frequency Structure Simulator (HFSS) and then fabricated on an inexpensive FR4 substrate. Figure 2 compares the measured and simulated results. They are almost in agreement except at some lower frequencies; these errors are due to rough fabrication, while the errors in the high-frequency region come from insufficient calibration. Over the considered range of 2.35-22.5 GHz, the simulated and measured return losses are less than -10 dB and -8 dB, respectively, and the fractional bandwidth of the proposed antenna reaches 162%. The calculated return losses are less than -14 dB across the conventional UWB band of 3.1 GHz to 10.6 GHz.

Many planar antennas have been proposed for UWB applications, such as monopole antennas [1-3], printed slot antennas [4], and double-sided printed antennas [5], owing to their very low cost and easy fabrication. In this paper, a double-sided printed compact UWB antenna is proposed. In this design, two symmetric octagonal patches printed on opposite sides of the substrate are used as radiation elements. A tapered quasi microstrip line connects these radiation elements to the input connector in order to match the antenna over a wide band. This feed line is formed by two symmetric microstrip lines, so that an extremely wide-band performance is realized. Compared with the monopole type, the symmetric elements spare the large ground plane, resulting in a smaller dimension.
Figure 3 shows the radiation patterns of the designed antenna at 3 GHz, 10 GHz, and 20 GHz. At low frequencies, the antenna radiates in the dominant dipole mode; the H-plane patterns are omnidirectional from the lowest frequency up to 11 GHz, which covers the whole conventional UWB band. The effect of higher-order modes gradually becomes stronger as the frequency increases, so that the radiation patterns deviate from the characteristic of a dipole antenna.

II. ANTENNA STRUCTURE AND DIMENSION

Figure 1 shows the geometry of the double-sided printed UWB antenna. The designed antenna is fabricated on an inexpensive FR4 substrate whose thickness and relative permittivity are 1.6 mm and 4.6, respectively. The radiation elements printed on the two sides of the FR4 substrate have an octagonal shape, since the octagonal shape has a very wide-band radiation nature, with a = 6.63 mm and r = 8 mm. Since the radiation elements are symmetric,



Figure 2. Simulation and measurement of S11 return loss of the antenna

Figure 4 shows the peak gain of the designed antenna obtained with the HFSS simulator. The peak gain is flat from 2.35 GHz to 13 GHz. These results again confirm that the designed antenna has a good omnidirectional characteristic at low frequencies. Figure 5 compares the performance of different UWB antennas; the proposed antenna has a smaller electrical size.
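One way to quantify "electrical size" is the largest dimension relative to the free-space wavelength at the lowest operating frequency; the sketch below applies this common metric (it is not a calculation given in the paper) to the proposed antenna.

```python
C = 3e8  # speed of light, m/s

def electrical_size(largest_dim_mm, f_low_hz):
    # largest dimension expressed in free-space wavelengths at the low band edge
    wavelength_mm = C / f_low_hz * 1000
    return largest_dim_mm / wavelength_mm

size = electrical_size(35.4, 2.35e9)
print(size)  # roughly 0.28 wavelength at 2.35 GHz
```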
Literature    Pass Band        Dimensions                  Year
[1]           2.5~18 GHz       28 mm x 24 mm x 1.6 mm      2013
[2]           3.5~12 GHz       70 mm x 60 mm x 1.27 mm     2012
[4]           3~11.2 GHz       36 mm x 36 mm x 1.27 mm     2011
[5]           3.1~10.6 GHz     22 mm x 24 mm x 1.6 mm      2004
This paper    2.35~22.5 GHz    35.4 mm x 22 mm x 1.6 mm    2014

Figure 5. Comparison between the presented and recently reported UWB antennas

IV. SUMMARY

In this paper, we have proposed and investigated a double-sided printed compact ultra-wideband antenna. The simulated and measured results show that the antenna has a good omnidirectional characteristic at low frequencies and covers the entire conventional UWB band. The fractional bandwidth of the proposed antenna reaches 162%, and its dimensions are only 35.4 mm x 22 mm x 1.6 mm.


Figure 4. Simulated peak antenna gain

Figure 3. Radiation patterns at 3 GHz, 10 GHz, and 20 GHz for the E-plane and H-plane

REFERENCES

[1] Cengizhan and M. Dikmen, "An octagonal shaped ultra wide band antenna with reduced RCS," 2013 Second International Japan-Egypt Conference on Electronics, Communications and Computers (JEC-ECC), 2013.
[2] G. Shrikanth Reddy, S. K. Mishra, and S. Kharche, "High gain and low cross-polar compact printed elliptical monopole UWB antenna loaded with partial ground and parasitic patches," Progress In Electromagnetics Research, vol. 139, pp. 265-275, 2013.
[3] B. Gong, J. Li, Q. Zheng, Y. Yin, and X. Ren, "A compact inductively loaded monopole antenna for future UWB applications," Progress In Electromagnetics Research B, vol. 43, pp. 151-167, 2012.
[4] R. Azim and M. T. Islam, "Compact tapered-shape slot antenna for UWB applications," IEEE Antennas and Wireless Propagation Letters, vol. 10, 2011.
[5] K. Kiminami and A. Hirata, "Double-sided printed bow-tie antenna for UWB communications," IEEE Antennas and Wireless Propagation Letters, vol. 3, 2004.


A Low Power, High Efficiency CMOS Power Amplifier for IEEE 802.15.6 Applications

Ahmed Gadallah1, A. Allam1, H. Jia2, A. B. Abdel-Rahman1 and Ramesh K. Pokharel2
1 Egypt-Japan University of Science and Technology, Alexandria 21934, Egypt
2 E-JUST Center, Kyushu University, Nishi-ku 819-0395, Fukuoka, Japan
E-mail: ahmed.gadallah@ejust.edu.eg
This procedure helps in determining the appropriate input and output match necessary to optimize gain and PAE.

Abstract—A low power, 2.4 GHz, class-AB power amplifier (PA) for IEEE 802.15.6 applications was designed using load-pull techniques in TSMC 0.18 um technology. Post-layout simulations of this PA show a power gain of 15 dB, an input return loss S11 of -13.5 dB and an output return loss S22 of -9.5 dB at 2.4 GHz. The design achieves a power-added efficiency (PAE) of 47.8% while delivering 7.5 dBm of output power at an input P1dB compression point of -7 dBm. The PA consumes 5 mW from a 1.8 V supply voltage. To the authors' knowledge, this is the highest PAE reported for this class of PAs while consuming such low DC power.

This paper is organized as follows: power amplifier design using load-pull/source-pull techniques is discussed in Section II, and simulation results and comparisons are presented in Section III.
Keywords—Power Amplifier (PA); Load-pull; Wireless Body Area Network (WBAN); Low Power

I. INTRODUCTION

In 2012, the IEEE 802.15.6 standard for wireless body area networks (WBAN) was released by the IEEE LAN/MAN standards committee [1]. IEEE 802.15.6 addresses short-range wireless communications around or inside the human body (but not limited to humans). Devices using this standard operate at very low transmit power to ensure safety, minimize the specific absorption rate (SAR) in the body, and increase battery life. The narrowband (NB) physical layer (PHY) specification of the standard states that a compliant device (hub/node) shall be able to support transmission/reception in at least one of the following frequency bands: 402-405 MHz, 420-450 MHz, 863-870 MHz, 902-928 MHz, 950-958 MHz, 2360-2400 MHz, and 2400-2483.5 MHz [1][2]. In real applications, only the bands around 400 MHz and 2.4 GHz are used: signals in the 400 MHz band have a minimum through-body transmission loss, while at 2.4 GHz a large bandwidth is available and small antennas are sufficient for WBAN applications [2].

II. CIRCUIT DESCRIPTION

Figure 1. Schematic of the proposed PA (two stages built around transistors M1-M3, inductors L1-L3, capacitors C1-C4, Cin and Cout, and bias networks Rbias1/VBias1 and Rbias2/VBias2, supplied from VDD between RFIN and RFOUT)

The designed two-stage PA is shown in Figure 1. The first stage is a cascode driver (gain) stage. The second stage is the power stage, which consists of a common-source transistor loaded by an inductor and is biased in class AB to improve efficiency. The input and output impedances are optimized using source-pull/load-pull to obtain optimum output power and high PAE. The design procedure starts by determining the transistor width W and the optimum source/load impedances for the power stage. Then, for the first stage, the input is matched to 50 ohms using the network formed by capacitors C1 and C2 and inductor L1. The values of the inter-stage matching elements, capacitor C3 and inductor L2, are determined using source-pull.
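The idea behind load-pull can be illustrated with a toy sweep: present a set of candidate loads to a fixed source, record the power delivered to each, and pick the load that maximizes it. The sketch below uses a purely resistive Thevenin source with illustrative values; the actual design sweeps complex source/load impedances in a circuit simulator.

```python
# toy load-pull: sweep candidate load resistances for a Thevenin source
# (VS and RS are illustrative, not the transistor model used in the paper)
VS = 1.0    # source amplitude, V
RS = 50.0   # source output resistance, ohms

def delivered_power(r_load):
    # power dissipated in the load for a simple resistive divider
    i = VS / (RS + r_load)
    return i * i * r_load

candidates = [10, 25, 50, 75, 100, 200]
best_load = max(candidates, key=delivered_power)
print(best_load)  # 50 -> maximum power transfer when R_load = RS
```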

Low power, high efficiency and full integration are the key specifications in designing WBAN CMOS PAs. High-efficiency design is of particular importance for medical implants, which is the main motivation for seeking the best techniques to improve the efficiency of the power-hungry PA. Many IEEE 802.15.6 PAs relying on traditional design techniques have been presented recently [2], [3]. In this work we propose a PA design that uses load-pull in order to optimize the PAE and output power while assuring low-power operation. Load-pull has slowly been adopted by VLSI PA designers [4]. Compared to traditional PA design methods, the load-pull design


Figure 2, generated using Agilent Advanced Design System (ADS), shows the optimum load-pull/source-pull impedance points of the second stage together with S22 of the driver stage. As shown in Figure 2, S22 of the first stage is designed to fall on the locus of the optimum source impedances of the power stage. Figure 3 shows the PAE contours of the power stage; as can be seen, one capacitive element is needed to obtain the desired PAE.


Figure 5. Output power (dBm) and PAE (%) versus input power Pin (dBm), pre- and post-layout simulation results

The input 1 dB compression point of the designed PA is -7 dBm, at which the corresponding output power is 7.5 dBm. The output power and PAE versus input power are shown in Figure 5; the PAE is 47.8% at the input 1 dB compression point. The proposed PA consumes 5 mW from a 1.8 V power supply.
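PAE ties these figures together as PAE = (Pout - Pin)/PDC, with the dBm values first converted to milliwatts. In the sketch below the DC power at the compression point is a hypothetical placeholder: the 5 mW quoted above is the consumption figure, and the paper does not explicitly list PDC at P1dB.

```python
def dbm_to_mw(p_dbm):
    # convert power in dBm to milliwatts
    return 10 ** (p_dbm / 10)

def pae_percent(pout_dbm, pin_dbm, pdc_mw):
    # power-added efficiency: (Pout - Pin) / Pdc, in percent
    return 100 * (dbm_to_mw(pout_dbm) - dbm_to_mw(pin_dbm)) / pdc_mw

# reported operating point: Pout = 7.5 dBm at Pin = -7 dBm;
# PDC_MW is a hypothetical value used for illustration only
PDC_MW = 11.3
print(pae_percent(7.5, -7, PDC_MW))
```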

Figure 2. Optimized load, source impedances and driver stage S22

Table I shows a summary of the proposed PA performance in comparison with published CMOS PAs operating in the same band.

Figure 3. Load-pull constant-PAE contours in 2% steps at 2.4 GHz (labeled contours: PAE = 47.8% and PAE = 41.8%)

III. SIMULATION RESULTS AND COMPARISON

TABLE I. PROPOSED PA PERFORMANCE SUMMARY IN COMPARISON WITH RECENTLY PUBLISHED PAS

Ref.        Tech. [um]   Supply [V]   Freq. [GHz]   Gain [dB]   PAE [%]   Power [mW]   Input P1dB [dBm]
This work*  0.18         1.8          2.4           15          47.8      5            -7
[5]         0.18         1.4          2.45          19.5        15.4      28.5         -13
[6]         0.18         1.8          2.4-2.48      11          19.8      18           -5
[7]         0.13         1.2          2.4-2.483     18.09       22        25.34        -9.08
* Post-layout simulation results

The proposed PA is simulated using the Cadence Spectre RF simulator in TSMC 0.18 um CMOS technology. The PA achieves a power gain (S21) of 15 dB, and input and output return losses (S11 and S22) of -13.5 dB and -9.5 dB, respectively, as shown in Figure 4.

Figure 4. S-parameters (S11, S21, S22) simulation results, pre- and post-layout

REFERENCES

[1] IEEE Standard for Local and Metropolitan Area Networks - Part 15.6: Wireless Body Area Networks, pp. 1-271, 2012.
[2] L. Zhang, H. Jiang, J. Wei, J. Dong, F. Li, W. Li, J. Gao, J. Cui, B. Chi, C. Zhang, and Z. Wang, "A reconfigurable sliding-IF transceiver for 400 MHz/2.4 GHz IEEE 802.15.6/ZigBee WBAN hubs with only 21% tuning range VCO," IEEE J. Solid-State Circuits, vol. 48, no. 11, pp. 2705-2716, Nov. 2013.
[3] H.-C. Chen, M.-Y. Yen, Q.-X. Wu, K.-J. Chang, and L.-M. Wang, "Batteryless transceiver prototype for medical implant in 0.18-um CMOS technology," IEEE Trans. Microw. Theory Tech., vol. 62, no. 1, pp. 137-147, Jan. 2014.
[4] F. M. Ghannouchi and M. S. Hashmi, Load-Pull Techniques with Applications to Power Amplifier Design, 2012.
[5] S. M. Abdelsayed, M. J. Deen, and N. K. Nikolova, "A fully integrated low-power CMOS power amplifier for biomedical applications," in The European Conference on Wireless Technology, 2005, pp. 277-280.
[6] K. Haridas and T. H. Teo, "A 2.4-GHz CMOS power amplifier design for low power wireless sensors network," in 2009 IEEE International Symposium on Radio-Frequency Integration Technology (RFIT), 2009, pp. 299-302.
[7] A. Shao, Z. Li, and C. Wan, "0.13 um CMOS power amplifier for wireless sensor network applications," in The 19th Annual Wireless and Optical Communications Conference (WOCC 2010), pp. 1-4.


Design of a Class-C Dynamic Biasing Low Phase Noise Low Power 1.9 GHz FBAR CMOS Oscillator

S. A. Enche Ab Rahim, Guoqiang Zhang and Ramesh K. Pokharel
Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan

Abstract— In this paper, a CMOS cross-coupled film bulk acoustic resonator (FBAR) oscillator is designed in a class-C topology in order to reduce the phase noise and power consumption. A dynamic biasing technique is used to ensure oscillation start-up. The designed oscillator exhibits a phase noise of -160 dBc/Hz at 1 MHz offset from the carrier, with a total power consumption of 1.13 mW and a figure-of-merit (FoM) of 225 dBc/Hz.


Keywords— CMOS cross-coupled oscillator; FBAR oscillator; class-C oscillator

I. INTRODUCTION

The design of CMOS voltage-controlled oscillators (VCOs) remains challenging among RF blocks, especially for ultra-low-noise applications, as oscillators are prone to noise. It is known that the phase noise performance of an oscillator depends on the power consumption and on the oscillation amplitude: the higher these two key factors are, the better the phase noise is. However, the reduction of supply voltage due to technology downscaling degrades the phase noise performance, because the maximum oscillation swing is limited by the supply. The demand for low power consumption further increases the difficulty of maintaining the phase noise performance of the oscillator. Therefore, many studies have sought methods to break the trade-off between phase noise and power consumption.

Figure 1. Schematic of proposed FBAR oscillator.

chosen in order to benefit from a larger negative resistance, thereby reducing the power consumption of the circuit. The same effect has already been observed in LC oscillators [3]. M1, M2, M3 and M4 are the NMOS and PMOS cross-coupled pair transistors. In the traditional cross-coupled topology (known as class B), the gates of the NMOS and PMOS cross-coupled pairs are biased at VDD. In class C, however, the gates are biased at Vbias1 and Vbias2 by means of an RC network. It is necessary to bias the gates at a voltage much lower than VDD in order to keep the cross-coupled pairs from entering the triode region in steady state. The circuit then gains the benefits of class-C operation, namely a more efficient generation of the oscillation currents [1, 2, 7]. A higher oscillation amplitude is therefore obtained for a given current consumption, which improves the phase noise. On the other hand, an FBAR oscillator suffers from a parasitic oscillation at low frequency, which is not the case for an LC oscillator, because high loop gain occurs not only at the oscillation frequency but also at DC [4]. The negative resistance seen at DC therefore has to be degenerated, which is done by placing coupling capacitors (C1 and C2) at the sources of the cross-coupled pairs [5]. Transistors M5 and M6 are the current-source transistors, which operate separately at DC but are coupled at the oscillation frequency. These transistors also provide a common-mode feedback to the circuit.

The class-C topology has attracted many researchers in recent years and is seen as a suitable candidate for low-power, low-phase-noise oscillator design. Several works on class C have been published; until now, however, implementations have been reported only for LC-tank oscillators. Studies show that for an LC-tank cross-coupled oscillator working in class C, a power saving of up to 36% is obtained for the same phase noise level [1].

In this design, a class-C film bulk acoustic resonator (FBAR) based CMOS oscillator is designed for the first time. However, a low Vbias1 may jeopardize the oscillation start-up, as a higher Vbias1 is required at start-up than in steady state [2]. To break this trade-off, a dynamic biasing circuit is introduced, ensuring a robust start-up and then forcing class-C operation. This results in an improvement of the figure of merit (FoM) by 10 dB.
II. THE DESIGN OF THE CLASS-C FBAR OSCILLATOR

Fig. 1 shows the circuit schematic of the FBAR oscillator. It is a complementary cross-coupled topology. This topology was

C3, C4, C5 and C6 are the tail capacitors that further enforce class-C operation of the circuit. The parasitic capacitances of the tail transistors might already push the cross-coupled pairs to work partially in class C, but a large


added capacitance guarantees the generation of impulse-like current waveforms and thus increases the current efficiency of the transistors. Furthermore, these large capacitors filter out the noise from the tail transistors at higher frequencies, so the phase noise is improved further. To dynamically bias the pair transistors, an operational amplifier [6] is used to provide negative feedback: it adjusts Vbias1 by sensing the variation of the common-mode voltage at the sources of the NMOS cross-coupled pair and keeping it equal to a reference voltage.
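The benefit of narrow, impulse-like current pulses can be seen from standard conduction-angle analysis of a truncated-cosine drain current (a textbook idealization, not a waveform extracted from this design): as the conduction half-angle shrinks, the ratio of fundamental to DC current, and hence the current efficiency, rises from 1 toward 2.

```python
import math

def current_efficiency(theta):
    # ratio of fundamental to DC component for a truncated-cosine
    # current pulse with conduction half-angle theta (radians)
    i_dc = (math.sin(theta) - theta * math.cos(theta)) / (math.pi * (1 - math.cos(theta)))
    i_fund = (theta - math.sin(theta) * math.cos(theta)) / (math.pi * (1 - math.cos(theta)))
    return i_fund / i_dc

print(current_efficiency(math.pi))      # ~1.0 (class A: full conduction)
print(current_efficiency(math.pi / 3))  # narrower pulse -> higher efficiency
```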
III. RESULTS AND DISCUSSIONS

The class-C cross-coupled FBAR oscillator was implemented in 0.18 um CMOS (CMOS18) technology. Simulations were run in Spectre RF using transient, PSS and PNOISE analyses. The modified Butterworth-Van Dyke model [7] is used to represent the FBAR. Fig. 2 shows the drain-source current of the NMOS cross-coupled pair with and without the tail capacitors: with the tail capacitors present, a taller and narrower current waveform is obtained. The phase noise of the class-C oscillator is given in Fig. 3; at 1 MHz offset from the carrier it is -160 dBc/Hz. A performance summary and result comparison are given in

TABLE I. COMPARISON WITH OTHER PUBLISHED OSCILLATORS

                   This work       [4]             [7]        [8]
Process            CMOS18          CMOS18          CMOS18     CMOS13
Topology           Cross-coupled   Cross-coupled   Pierce     Colpitts
Vdd (V)            1.5             1.1             1.8        0.6
f0 (MHz)           1964            1962            1500       2000
Power (mW)         1.13            1.7             3.78       0.126
L(1 MHz) [dBc/Hz]  -160.0          -151.5          -142.0     -149.0
FOM                225             215             --         224

Table 1. The FBAR oscillator in this work has better performance in terms of phase noise, which results in a FoM of 225.
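The FoM of 225 can be reproduced with the standard oscillator figure of merit FoM = |L(df)| + 20*log10(f0/df) - 10*log10(PDC/1 mW), assuming that is the definition used here:

```python
import math

def oscillator_fom(phase_noise_dbc_hz, f0_hz, offset_hz, power_mw):
    # standard oscillator figure of merit (dBc/Hz)
    return (abs(phase_noise_dbc_hz)
            + 20 * math.log10(f0_hz / offset_hz)
            - 10 * math.log10(power_mw))

fom = oscillator_fom(-160.0, 1.964e9, 1e6, 1.13)
print(round(fom))  # 225
```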
IV. CONCLUSION

This paper presents a CMOS cross-coupled FBAR oscillator adopting the class-C topology in order to enhance the phase noise performance of the oscillator. It also employs a dynamic biasing circuit to ensure a robust oscillation start-up while at the same time improving the phase noise performance. Simulation results show that the oscillator oscillates at 1.964 GHz with a phase noise of -160 dBc/Hz at 1 MHz offset and a power consumption of 1.13 mW, resulting in a figure of merit of 225, which is a 10 dB improvement over the class-B FBAR-based CMOS oscillator [4].


ACKNOWLEDGMENT

This work was supported by a Grant-in-Aid for Scientific Research (B) (KAKENHI-B). This work was also partly supported by the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Cadence Corporation and Agilent Corporation.

REFERENCES

[1] A. Mazzanti and P. Andreani, "Class-C harmonic CMOS VCOs, with a general result on phase noise," IEEE J. Solid-State Circuits, vol. 43, no. 12, pp. 2716-2728, December 2008.
[2] L. Fanori and P. Andreani, "Dynamic bias schemes for class-C VCOs," Proceedings of NORCHIP, November 2011.
[3] E. Vittoz, Low-Power Crystal and MEMS Oscillators: The Experience of Watch Developments, Springer, 2010.
[4] G. Zhang et al., "A 1.9 GHz low phase noise complementary cross-coupled FBAR VCO in 0.18 um CMOS technology," 44th European Microwave Conference (EuMC), October 2014.
[5] R. Thirunarayanan, A. Heragu, D. Ruffieux, and C. Enz, "Complementary BAW oscillator for ultra-low power consumption and low phase noise," 9th IEEE New Circuits and Systems Conference (NEWCAS), June 2011.
[6] L. Fanori and P. Andreani, "Highly efficient class-C CMOS VCOs, including a comparison with class-B VCOs," IEEE J. Solid-State Circuits, vol. 48, no. 7, pp. 1730-1740, July 2013.
[7] J. Hu, R. Parker, R. Ruby, and B. Otis, "A wide-tuning digitally controlled FBAR-based oscillator for frequency synthesis," IEEE International Frequency Control Symposium, June 2010.
[8] J. Shi and B. P. Otis, "A sub-100 uW 2 GHz differential Colpitts CMOS/FBAR VCO," IEEE Custom Integrated Circuits Conference, September 2011.

Figure 2. Drain-source current of the NMOS pair with and without tail capacitors

Figure 3. Phase noise performance of the proposed class-C FBAR oscillator


Enhancement of eCommerce System for Unreached Community by Creating a Village Specific Online Catalog by Using GramWeb

Kazi Mozaher Hossein1, Ashir Ahmed1, Abdullah Al Emran2 and Akira Fukuda1
1 Department of Advanced Information Technology, Kyushu University, Fukuoka, Japan, kmhjewel@f.ait.kyushu-u.ac.jp
2 Global Communication Center, Grameen Communications, Dhaka, Bangladesh, emran@mail.grameen.com
population. Despite their low income and limited purchase capacity, they make frequent purchases within their limited spending power. The purchasing habits of the BOP show that they actually pay more for certain items than wealthier customers do. This "BoP penalty" is the consequence of local monopolies, inadequate access, poor distribution and strong traditional intermediaries.

In our previous work, we proposed a model that utilizes MFI resources to act as an intermediary between the supplier and the consumer. In Bangladesh, around 20 million people representing the BOP have access to micro-finance institutions (MFI), as shown in Fig. 1. A VIE (Village Information Entrepreneur) takes care of placing orders on behalf of the clients in the community, makes payments and provides group delivery. In order to make the process more efficient for a VIE, we propose a village specific online catalog that is created automatically by considering the requirements and demands of the villagers. The village demand is collected from the villagers' other online/offline activities. We introduce the model and discuss its advantages. We plan to test this model in an unreached community in Bangladesh.

Abstract— e-Commerce has become popular among the affluent people of the world. However, a large portion (>70%) of the world's population cannot enjoy the advantages of e-commerce services because they do not have (1) access to an online catalog, (2) a payment system to pay for online purchases, or (3) a home delivery infrastructure in their community. In our previous work, we proposed a model where MFI (Micro Finance Institute) resources can play a role: an MFI manager takes care of placing group orders, makes advance payments from an MFI account and provides group delivery at the MFI service point. However, that model was limited to MFI members only. We have redesigned the model so that a VIE (Village Information Entrepreneur) can handle these tasks for wider segments of customers. The online catalog can be more efficient if we study the villagers and build the catalog considering their requirements. The village demand is collected from the villagers' other online/offline activities. We introduce the model and discuss its advantages. We plan to test this model in an unreached community in Bangladesh.
Keywords— Low-income people, e-Commerce, Village Information Platform

I. INTRODUCTION


II. E-COMMERCE FOR THE UNREACHED COMMUNITY BY USING GRAMWEB AND MICRO-FINANCE

E-commerce enables people to purchase products from a remote place at any time of the day and get the desired products delivered to their doors. It saves time, money and labor. A product seller can upload product information on the web and breach the boundaries of the local market to reach customers on a global scale. A customer, on the other hand, can search for the desired product in a much more extensive selection space and find a suitable product. In this way, e-commerce brings benefits to both buyers and sellers, as indicated by the trend in e-sales.

In order to purchase a product through a web-based e-commerce service, a customer needs access to the Internet and an online payment mechanism, typically a credit card. Presently only 26.6% of the world population has access to the Internet and an estimated 16.42% hold a credit card (this figure is derived from the fact that 3.3 billion cards were issued globally and, on average, each cardholder holds 3 credit cards). What about the remaining majority of the population? Do they not have any interest in participating in and enjoying the benefits of e-commerce? These unreached are the 4 billion people at the BOP (Base of the Pyramid), comprising 69% of the world

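The 16.42% cardholder estimate can be reproduced from the stated assumptions (3.3 billion cards, 3 cards per holder); the world-population figure of roughly 6.7 billion is our assumption, implied by the text rather than stated in it:

```python
CARDS_ISSUED = 3.3e9        # cards issued globally (stated in the text)
CARDS_PER_HOLDER = 3        # average cards per holder (stated in the text)
WORLD_POPULATION = 6.7e9    # assumed world population for the period

holders = CARDS_ISSUED / CARDS_PER_HOLDER
share = 100 * holders / WORLD_POPULATION
print(round(share, 2))  # 16.42
```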
A. Concept of using MFI resources and GramWeb



the village. The consumer behavior analysis will help to find out the demand for seasonal products that sell well at different religious and cultural events. The catalog will include branded products to ensure the consumers' trust. We will also consider the necessary products for which consumers currently have to travel long distances to make a purchase.

1. Access to online catalog: The suppliers share their product catalogs on a specified website. GramWeb is a website containing village specific information, and the local villagers own and maintain their village sites. Therefore, it is effective for the suppliers to distribute village-specific catalogs considering the villagers' needs. As not all villagers have access to the Internet, these unreached people cannot view an online product catalog themselves; the MFI officer can fill this gap.

2. Payment: Mobile money transfer has become popular almost everywhere in the world, so it is now possible to make the payment by mobile money transfer. The software needs to be updated to relate the money sender to the purchased product.

3. Product delivery: Nowadays, courier services are available in small cities. However, the last mile (between the small town and the village) is still a challenge. The SSW (Social Services on Wheels) project of Grameen and Kyushu University provides a home delivery service for rural communities, and our system assumes the use of this service.
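A minimal sketch of the catalog-building step, assuming hypothetical activity-log records (the field names, products and scoring are illustrative; the actual GramWeb data model is not specified here): demand signals from several sources are aggregated per product, and the catalog is ordered by the resulting popularity score.

```python
from collections import Counter

def build_catalog(activity_logs, top_n=5):
    """Rank products by how often they appear across demand sources.

    activity_logs: iterable of (source, product) pairs, e.g. purchase
    records, medicine-shop logs, or survey answers (hypothetical schema).
    """
    popularity = Counter(product for _source, product in activity_logs)
    return [product for product, _count in popularity.most_common(top_n)]

logs = [
    ("mfi_purchase", "cooking oil"), ("survey", "paracetamol"),
    ("medicine_shop", "paracetamol"), ("mfi_purchase", "seeds"),
    ("survey", "cooking oil"), ("mfi_purchase", "cooking oil"),
]
print(build_catalog(logs))  # ['cooking oil', 'paracetamol', 'seeds']
```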

Figure 2. Village specific e-commerce product catalog

IV. CONCLUSION

In this way, e-commerce services can reach the unreached, the largest socio-economic group in the world.

In this work, we introduced a village specific online catalog for enhancing the eCommerce system for the low-income unreached community. We propose a mechanism to gather the villagers' demand from their online/offline activities, analyze it and display it as a catalog on the website. Product suppliers can also push their products into GramWeb. The logic here is to estimate consumers' demand from their past activities. We listed the villagers' online/offline activities and would like to design an algorithm to build an optimum catalog for the villagers.

B. GramWeb

GramWeb is a village information platform that collects village specific information, e.g. demographic information (population, location, socio-economic status, etc.) as well as the daily activities of that community. A Village Information Entrepreneur (VIE) owns the website for his/her village. He/she is connected with the villagers by providing them with other social services, including healthcare, education, purchase and learning activities.
III. DESIGN OF A VILLAGE SPECIFIC CATALOG

Recent Internet-based marketers demonstrate user-specific catalogs by analyzing users' online activities, user profiles, etc. We are interested in investigating the same scenario for a community or a group of people with the same income level. Our focus is to investigate the low-income villagers. Demands vary from one village to another. Now the question is how we can gather villagers' demand for our village-specific catalog. We can predict the demand by considering their past activities. The village-specific catalog with some selective products will be prepared after judging the consumers' taste and preference in a village. Fig. 2 shows the information sources and a hypothetical model of the product catalog. Products will be displayed according to the popularity ranking.
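A minimal sketch of such a popularity-ranked catalog builder follows. The log format (a list of (source, product) pairs drawn from purchases, surveys and medicine-shop logs) and the function name are our assumptions for illustration, not the system's actual data model.

```python
from collections import Counter

def build_catalog(activity_log, top_n=10):
    """Rank products for one village by how often they appear in the
    villagers' recorded online/offline activities.

    activity_log: list of (source, product) pairs - a hypothetical format.
    Returns the top_n products, most demanded first.
    """
    popularity = Counter(product for _source, product in activity_log)
    return [product for product, _count in popularity.most_common(top_n)]

log = [("medicine_shop", "paracetamol"), ("survey", "hybrid seed"),
       ("medicine_shop", "paracetamol"), ("purchase", "soap")]
print(build_catalog(log, top_n=3))  # ['paracetamol', 'hybrid seed', 'soap']
```

Products tied on frequency keep their first-seen order, so stale items do not leapfrog recently demanded ones.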

We listed the items that villagers usually purchase from outside. The list is then classified into different categories. The village profile can be imported from GramWeb's VIE (Village Information Entrepreneur). Medicine- and drug-related purchase logs could be collected from local medicine shops. An in-depth survey will be performed to find out the list of products which are not available in the local market but have a good demand in
REFERENCES
[1] A. Ahmed, L. Kabir, and H. Yasuura, "An information platform for low-literate villagers," Proc. IEEE 24th International Conference on Advanced Information Networking and Applications (AINA 2010), April 20-23, 2010, Perth, Australia.
[2] A. Ahmed, M. A. Rahman, and T. Ohsugi, "E-commerce for the unreached community," Proc. IADIS International Conference ICT, Society and Human Beings (ICT 2011), July 24-26, 2011, Rome, Italy.
[3] T. London, R. Anupindi, and S. Sheth, "Lessons learned from ventures serving base of the pyramid producers," Journal of Business Research 63, pp. 582-594, 2010.
[4] A. Hussain, "E-commerce and beyond: opportunities for developing country SMEs," http://tradeforum.org/article, December 2013.
[5] R. Varadarajan, "Fortune at the bottom of the innovation pyramid: The strategic logic of incremental innovations," Business Horizons 52, pp. 21-29, 2009.
[6] M. Rivera-Santos and C. Rufin, "Global village vs. small town: understanding networks at the base of the pyramid," International Business Review 19, pp. 126-139, 2010.
[7] S. Gold, R. Hahn, and S. Seuring, "Sustainable supply chain management in Base of the Pyramid food projects - A path to triple bottom line approaches for multinationals?," International Business Review 22, pp. 784-799, 2013.


Using Latent Topics to Estimate Student Performance


Shaymaa E. Sorour 1,2, Kazumasa Goda 3, Tsunenori Mine 4
1 Kafr Elsheik University, Faculty of Specific Education, Egypt
2 Graduate School of Information Science and Electrical Engineering, Kyushu University
3 Kyushu Institute of Information Science
4 Faculty of Information Science and Electrical Engineering, Kyushu University
the same subject in the two classes, there are differences between the comments in the two classes; the difficulty of the subject in a lesson also affects students' willingness to express their behavior and sometimes does not give the students leeway to write comments. Therefore this is a challenging problem.
The contributions of our work are the following:
- A PLSA model is employed to analyze patterns and relationships between the extracted words and latent concepts contained in student comments.
- The SVM model is applied to predict student grades from three time-series items: P, C, and N.
- The PLSA model is improved to reflect student attitudes and situations more deeply in each lesson and to achieve higher reliability for predicting student grades.
- A new method is proposed to grasp the characteristics of a range of lessons based on the PLSA model.
- Experiments are conducted to validate the proposed models by calculating the F-measure and accuracy for each lesson and considering the predicted results with a range of lessons.

Abstract
Examining student learning behavior is one of the crucial educational issues. In this paper, we propose a new method to predict student performance by using comment data mining that highly reflects student learning attitudes and activities. Analyzing comment data after each lesson helps to grasp student learning attitudes and situations. This paper proposes a new model based on statistical latent topics for the task of student grade prediction; our model converts student comments using Probabilistic Latent Semantic Analysis (PLSA), and SVM generates prediction models of final student grades. Choosing the number of topics and the number of words in each topic for the PLSA model successfully improves the prediction results. In addition, considering the student grade predicted in a range of lessons can deal with prediction errors occurring in each lesson, and achieves further improvement of the student grade prediction.
Keywords- Comment Data Mining, Student Grade Prediction, PLSA.

I. INTRODUCTION

For higher education institutions whose goal is to contribute to the quality of higher education, the success of the creation of human capital is the subject of continuous analysis. Therefore, the prediction of students' performance is crucial for higher education institutions, because the quality of the teaching process lies in the ability to meet students' needs. The present study proposes methods to predict student grades based on their comment data. Students describe their learning attitudes, tendencies and behaviors by writing their comments freely after each lesson. This paper uses comment data of three time-series items: P, C, and N from Goda et al. [2] to predict students' final grades, where item P (Previous) refers to the learning activity before the class time, item C (Current) shows the understanding and achievements of class subjects during the class time, and item N (Next) expresses the learning activity plan until the next class. The goal of our study is to predict each student's grade from lessons 1 to 15, and to recognize the learning status and attitudes so as to give feedback to each one. We propose a new method that considers the predicted results with a range of lessons based on the PLSA model. Also, we improve the PLSA model to achieve higher accuracy for predicting student grades. Our methods outperform the methods of Sorour et al. [3]. The experiments are conducted using data from two classes (Class A and Class B), where the data for one class is used as training data and that for the other as test data. Although students learn


II. BACKGROUND

Comment data were collected from 123 students in two classes (Class A = 60 students and Class B = 63 students). They took Goda's courses, which consisted of 15 lessons. The main subject from lessons 1 to 6 is computer literacy, which teaches how to use some IT tools. Computer literacy education is compulsory throughout senior high schools in Japan, with only a few differences in the details of course contents. Lessons 7 to 15 are introductory C programming, where students begin to learn the basics of programming. In the 7th lesson, most students are novices at programming; it is a new subject and not required until they enter the university. In this research, we considered predicting each student's results from his/her comments. We chose four grades (S, A, B, and C) instead of the mark itself as a student's result.
III. METHODOLOGY

The procedure of the proposed method is based on five phases as follows:
1- Comment Data Collection: This phase focuses on collecting student comments after each lesson.
2- Data Preparation: Our methods analyze P, C and N comment data, and extract words and parts of speech (verb, noun, adjective, and adverb) with the MeCab program, which is a


TABLE II. OVERALL RATE OF CORRECTLY PREDICTED RESULTS (TP)

Model   P-Comments   C-Comments   N-Comments
PLSA    0.756        0.783        0.773
PLSA*   0.782        0.843        0.792

IV. EXPERIMENT RESULTS
In our experiment, we evaluated the prediction performance of the proposed models by 2-fold cross validation. Table I shows the average overall accuracy results for predicting final student grades from lessons 1 to 6 and lessons 7 to 15. Fig. 1 displays the average overall prediction F-measure results from lessons 1 to 15 based on P, C and N comments, which were analyzed by the PLSA and the PLSA* models. We can see from Table I and Fig. 1 that PLSA* for the C-Comment had the highest prediction results among the three items of comments. In addition, the overall accuracy and F-measure results from lessons 1 to 6 were higher than those from lessons 7 to 15.
TABLE I. OVERALL ACCURACY RESULTS

                Lesson 1-6              Lesson 7-15
Model      P       C       N       P       C       N
PLSA       0.583   0.643   0.513   0.501   0.563   0.92
PLSA*      0.632   0.683   0.577   0.586   0.631   0.554
3- Topics Model: In our research, we employ the PLSA model [4] to discover topics from comment data. PLSA provides a probabilistic formulation to model documents in a text collection. It assumes that the words are generated from a mixture of latent aspects (topics) which can be decomposed from a document. The main difficulty in using the general PLSA model is the trade-off between predictive performance on the training data and on unseen new data, i.e. the problem of overfitting to the training data. To solve this problem, we reduced the number of topics in the training data and examined the number of words in each topic from lessons 1 to 15. Here, the aspects of words (topics) for the training data become observable. We call this model PLSA*. After some trials, we chose the best number of topics and examined the number of words with high probability per lesson.
4- Training Phase: This phase builds prediction models of student grades, based on the results analyzed by the PLSA and PLSA* models, by using the SVM model. In our research, we used the MATLAB LibSVM tool with the RBF kernel to predict student grades.
5- Test Phase: In this phase, we predict final student grades from comment data.
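To make the topics phase concrete, the following is a minimal PLSA sketch in NumPy, fitted with the standard EM updates from Hofmann's formulation. It is an illustration, not the authors' implementation: the function name, toy dimensions and iteration count are our choices, and the SVM stage (the paper uses MATLAB LibSVM with an RBF kernel on the resulting features) is not reproduced.

```python
import numpy as np

def plsa(n_dw, n_topics, n_iter=50, seed=0):
    """Fit PLSA to a document-word count matrix n_dw via EM.

    Returns P(z|d) of shape (docs, topics) and P(w|z) of shape
    (topics, words). Hypothetical sketch of the standard algorithm.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = n_dw.shape
    p_z_d = rng.random((n_docs, n_topics))   # P(z|d)
    p_w_z = rng.random((n_topics, n_words))  # P(w|z)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), shape (docs, words, topics)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        joint /= joint.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate from expected counts n(d,w) * P(z|d,w)
        weighted = n_dw[:, :, None] * joint
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

counts = np.array([[3.0, 0.0, 1.0], [0.0, 2.0, 2.0]])  # toy word-by-comment matrix
theta, phi = plsa(counts, n_topics=2, n_iter=20)
```

The per-comment topic distribution theta is the kind of feature vector that a classifier such as an SVM could then consume in the training phase.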

Next, we focus on the change of prediction of each student's grade from lessons 1 to 15 to understand each student more deeply and to grasp the characteristics of a range of lessons. We separated our prediction results into 5 ranges as shown in Fig. 2. The majority vote was used to determine the final student grade in a range of lessons. We calculated the correct prediction (TP) value from lessons 1 to 15 from three viewpoints: the P-, C- and N-comments. As shown in Fig. 2 and Table II, the PLSA* model had the highest prediction results of student grade for the C-Comment. The highest prediction results were achieved in lessons L(1-6) and L(1-12). In addition, the lowest prediction results were in L(1-9).
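The majority-vote step can be sketched as follows. The tie-breaking rule (the most recent lesson's prediction wins) is our assumption; the paper does not specify one.

```python
from collections import Counter

def final_grade(lesson_predictions):
    """Majority vote over per-lesson grade predictions for one student.

    Ties are broken in favor of the grade predicted most recently in
    the range - a hypothetical rule, not taken from the paper.
    """
    counts = Counter(lesson_predictions)
    best = max(counts.values())
    for grade in reversed(lesson_predictions):
        if counts[grade] == best:
            return grade

print(final_grade(["B", "A", "B", "C", "B"]))  # B
```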

Japanese morphological analyzer, and create a word-by-comment matrix with the extracted words.


Figure 2. Overall TP results from P, C and N comments.

V. CONCLUSION

The present study discussed student grade prediction methods based on students' free-style comments. In this paper, we employed the PLSA model to extract topics based on the bag-of-words representation of student comments. We chose the number of topics and the number of words in each topic from lessons 1 to 15. This approach improved the overall prediction results compared with the PLSA model. From these results, we can conclude that the difficulty of the subject influences the quality of the written comments: students wrote their comments in more detail while learning computer literacy in lessons 1 to 6 than while learning C programming in lessons 7 to 15. Also, they described their current activities (C-comments) more precisely than previous and next activities (P- and N-comments).
REFERENCES
Figure 1. Overall F-measure results for PLSA and PLSA*.

[2] K. Goda and T. Mine, "Correlation of grade prediction performance and validity of self-evaluation comments," Proc. of the 14th Annual ACM SIGITE Conference on Information Technology Education, Florida, USA, pp. 35-42, 2013.
[3] S. Sorour, T. Mine and K. Goda, "Comment data mining for student grade prediction considering differences in data for two classes," International Journal of Computer & Information Science 15(2), pp. 12-25, 2014.
[4] T. Hofmann, "Unsupervised learning by probabilistic latent semantic analysis," Machine Learning, 42, pp. 177-196, 2001.


Production and Marketing of Quality Vegetables by Small-Scale Farmers using ICT in Bangladesh

Dipok Kumar Choudhury1, Mansur Ahmed2, Akinori Ozaki3, Md. Abiar Rahman4, Shoichi Ito5 and Ashir Ahmed6

1 (MS Student): Agriculture and Resource Economics, Kyushu University, Fukuoka, Japan, dipokch@gmail.com
2 (Technical Manager, Agriculture): Kyushu University-JICA Grass Root Project, Dhaka, Bangladesh
3 (Coordinator): Kyushu University-JICA Grass Root Project, Dhaka, Bangladesh
4 (Associate Professor): Dept. of Agroforestry and Environment, Bangabandhu Sheikh Mujibur Rahman Agricultural University (BSMRAU), Gazipur, Bangladesh
5 (Professor): Agriculture and Resource Economics, Kyushu University, Fukuoka, Japan
6 (Associate Professor): Dept. of Advanced Information Technology, Kyushu University, Fukuoka, Japan

farmers by providing the knowledge of chemical-free farming through ICT (Information and Communication Technology). The project was supported by JICA's grass-root funding, implemented by Kyushu University and partnered by local organizations in Bangladesh. The project was implemented in two phases.

Abstract- Bangladesh is an agriculture-based country. Chemical-dependent traditional agricultural farming is a common practice in Bangladesh. A project funded by JICA was implemented in rural Bangladesh for income generation of poor farmers by cultivating semi-organic vegetables using ICT. In Phase-I, the production of semi-organic vegetables was given emphasis; in Phase-II, the marketing of the products. It was observed that the production of semi-organic vegetables was satisfactory and the marketing is improving with some challenges. The use of ICT helped farmers to solve their problems instantly in producing semi-organic vegetables, which helped them to get higher benefit from the products.

II. METHODOLOGY

A. Project Area and Semi-organic Vegetable Cultivation


The project Phase-I was implemented during June 2010 to June 2013 in Kapasia, Gazipur and Ekhlaspur, Chandpur, and Phase-II will be implemented during January 2014 to December 2016. A total of 33 farmers in two locations were selected for Phase-I. In Phase-II, we have extended the project to five locations and engaged 75 model farmers to grow semi-organic vegetables in and around their homesteads. Seasonal vegetables were selected based on demand and return. A season-wise list of the vegetables is presented in Table 1. All the inputs were provided by the project. The farmers were given training on compost making, organic pesticide preparation, companion cropping, and production technology of vegetables under the close supervision of agriculture experts at BSMRAU and other related organizations. Some activities of the project are shown in Fig. 1.

Keywords- Semi-organic vegetables; ICT; Tele-Centre; Q-Vegie; Income Generation

I. INTRODUCTION
To feed the ever-increasing population, the productivity of agricultural land needs to be intensified in Bangladesh. The productive capacity of agricultural land is low due to poor soil fertility, which has been degraded by mismanagement of agricultural resources. Intensive cultivation is associated with the use of high inputs of agrochemicals. The intensive use of agrochemicals could not only cause the degradation of soil fertility, but also pollute the environment and harm human health. A good soil should have at least 2.5% organic matter. However, the soil of about 45% of the net cultivable area in Bangladesh has less than 1% [1]. It is believed that the lower productivity of the soil is associated with the depletion of organic matter due to increasing cropping intensity, higher rates of decomposition of organic matter under the prevailing hot and humid climate, use of lesser quantities of organic manure, and little or no use of green manure [2]. As a result, farmers are not getting the desired yields. On the other hand, foods are not safe, as excessive agrochemicals are being used for crop production. Due to the poor marketing system, farmers are not getting a fair price for quality products. Most of the farmers are poor and illiterate. They do not know much about modern technologies for production, marketing and management of products. Kyushu University proposed a grass-root project named Income Generation Project for Farmers using ICT (IGPF), producing a semi-organic vegetable called Q-Vegie, which is also expected to become a new brand in Bangladesh. The goal was to generate income for BoP (Bottom of the Pyramid)

Table 1. Season-wise selected vegetables

Name of the season                    Selected vegetables
Kharif (Summer) (March-July)          Bitter Gourd, Bottle Gourd, Ash Gourd, Cucumber, Okra, Ridge Gourd, Long Bean
Rabi (Winter) (November-February)     Cabbage, Cauliflower, Tomato, Sweet Gourd, Bottle Gourd, Coriander

Figure 1. Activities of the project

B. Marketing
In Phase-I, emphasis was given to the production of semi-organic vegetables, because it was a new concept among the

farmers. In Phase-II, marketing was given priority to ensure high income for the farmers. For marketing, Dhaka city was targeted. Individual consumers, corporate consumers, restaurants and hospitals were the main clients for the products.

generate more income for Q-Vegie farmers, 3): To improve consumer health status.
B. Major Achievements
Phase-I aimed to increase the income of farmers through cultivation of a semi-organic vegetable named Q-Vegie with the help of ICT. According to the findings of Phase-I, the contribution of Q-Vegie to the farmers' total income was 12.08% and 8.42% in Kapasia and Ekhlaspur, respectively. Due to the increased income, farmers were able to improve their livelihood and invest in their children's education, physical wealth, housing facilities and so on.
Phase-II aims to cultivate the semi-organic farming skills of farmers and catalyze smooth marketing support of Q-Vegie by using ICT at the tele-center. The role of the tele-center is to be a key station for need-based information collection and distribution in rural areas, and to create more efficiency of the tele-center as well as its operator. All ICT tools are maintained by the iFarM (Integrated Farming and Marketing) platform, where the Q-Vegie project is monitored by agriculture, marketing and ICT experts from a remote location as well as the urban area. This platform is becoming the foundation of new ICT initiatives to support BoP farmers, and it is also expected to be a unique model for other areas among the BoP farmers.
Although production was satisfactory for the farmers, there are still some challenges in marketing. Now we are gathering the requirements and local solutions to develop a good channel for collecting vegetables, packaging, finding customers and delivering Q-Vegie products. It is expected that after establishing a good marketing channel, income generation for the farmers will increase further.

C. Use of ICT
ICT tools were used in production and marketing activities, which ensured quick and easy solution of problems and high income. There are many components of ICT, such as e-agriculture, Agri-Eye, semi-organic learning and e-commerce. (1) The e-agriculture application is a foremost part of the ongoing project Phase-II. It provides technical farming updates from previously stored content, lets farmers upload farming activity information, and enables communication among the farmers, experts and market agents. (2) BIGBUS (BoP Information Generation, Broadcast and Upload System) is a component of e-agriculture and can be used even by low-literacy farmers. Farmers can access this system from their own phones and upload any harvest information by following voice navigation. (3) C2D (Click to Dial) is also part of e-agriculture; only registered users, such as farmers, project staff, agriculture experts, market agents and consumers, are connected to each other based on need. (4) Agri-Eye helps farmers with weather information so that they can make farming decisions based on that information. (5) Semi-organic Learning is learning-support content on particular semi-organic vegetables, updated by the agriculture expert. The content combines text, pictures and animation. Finally, (6) E-commerce is a marketplace for Q-Vegie where consumers can access and select their products for purchase and trade with farmers by themselves. This application would reduce middleman activities and increase the farmers' end profit margin.
III. CONCLUSION
Farmers' indigenous knowledge has never been archived in developing countries. ICT can collect this knowledge, archive it and disseminate it to new farmers. Our Q-Vegie production involves more manual labor cost, and there is a risk in protecting the vegetables from insects in an organic way. Our developed ICT tools helped the farmers to access indigenous knowledge, take faster action and increase productivity. Now we are developing ICT tools for identifying suitable markets and delivering products in a safer way.
ACKNOWLEDGMENT
The project was funded by JICA and implemented by Kyushu University. The authors are thankful to the BSMRAU, WIN, GCC and BARI authorities for their excellent cooperation in implementing the project.
REFERENCES

RESULTS

A. Project Activities
In Phase-I, it was found that the farm income of an individual rural farmer is too low to meet their minimum livelihood requirements because of the lack of opportunity to obtain advanced information on farming techniques and the undeveloped network of agricultural production and marketing. As a result, most of the farmers cannot break away from their poverty and they still remain at the BoP [3]. Food security and food safety as well as income generation can be improved through Q-Vegie cultivation, but the marketing system needs to be facilitated more systematically in the next Phase-II of the project.
Unfortunately, the report of Phase-I also showed that the demand for Q-Vegie was very weak and the marketing channel in Dhaka city was not organized by the project, which motivated the Phase-II initiative including three more new locations named Mirjapur, Monohordi and Boshundia. In this connection, the current initiative has three new main goals compared to the previous phase: 1): To create entrepreneurs based on a new business model of Q-Vegie in rural areas. 2): To

[1] Z. Karim, M. M. U. Mia, and S. Razia, "Fertilizer in the National Economy and Sustainable Environmental Development," Asia Pacific Journal of Environment and Development 1(2), pp. 48-67, 1994.
[2] BARC, Fertilizer Recommendation Guide-2005, Bangladesh Agricultural Research Council, Dhaka, Bangladesh, 2005.
[3] A. Ozaki, Md. A. Rahman, K. Ogata, A. Ahmed, I. Miyajima, T. Okayasu, T. Osugi, D. K. Choudhury and N. Al Amin, "Impact of ICT based farming knowledge dissemination on farmers' income - Experience of income generation project for farmers using ICT in Bangladesh," Bulletin of Institute of Tropical Agriculture, Kyushu University, pp. 95-113, 2013.


Telehealth for low resource unreached communities

1 Andrew Rebeiro-Hargrave, 2 Ashir Ahmed, 3 Naoki Nakashima and 4 Partha Pratim Ghosh
1 Institute of Decision Science for Sustainable Society, Kyushu University, Japan
2 Department of Advanced Information Technology, Kyushu University, Japan
3 Kyushu University Hospital, Japan
4 Grameen Communications, Dhaka, Bangladesh
communities [5]. It consists of a back-end of data servers and a medical call center, and inexpensive front-end instances of a portable briefcase containing medical sensors and measuring equipment (costing USD 500). The front-end communicates with the back-end using mobile network coverage and the Internet (Fig 1).

Abstract- We present a compact telehealth system that identifies subjects at risk from noncommunicable diseases (NCD) and treats them in their own environment. The clinical measuring tools of the telehealth system fit into a briefcase and are carried by a healthcare worker to unreached communities. The clinical measuring tools send data to a tablet triage server that identifies the level of morbidity. The subject's results are displayed as a color-coded risk category: green (safe), yellow (caution), orange (affected) and red (emergent). Orange and red subjects become patients and their results trigger an immediate telehealth consultancy with a remote doctor. The remote doctor uses the synchronized master triage server to make a diagnosis and sends the patient an e-prescription. We call this system the Portable Health Clinic. The color-coded triage overcomes resistance to technology by healthcare workers and patients because colors are easier to understand than measurement values.

I. INTRODUCTION
Telehealth is the delivery of health-related services and information via telecommunications technologies. Health-related services are delivered by healthcare workers and supported by remote doctors in medical institutions [1]. Telehealth is normally used to keep and treat patients at home and out of hospitals [2]. It is used to remotely monitor chronically ill patients [3]. Telehealth is not used for medical screening programs that test for chronic Non-Communicable Diseases (NCD) such as hypertension, diabetes, dyslipidemia, obesity, kidney disease and liver dysfunction in individuals who do not show symptoms. Medical screening programs are good for identifying morbidity at an early and treatable stage, and for exposing individuals who normally ignore disease symptoms [4]. Once identified as at risk, a patient is referred to a clinic for treatment. A portable telehealth service at a screening center can shorten the time between the patient being referred and receiving a medical intervention. In this study we introduce the Portable Health Clinic as an efficient medical screening system with a synchronous telehealth component.

Fig.1 Portable Health Clinic architecture


The human-computer interface of the Portable Health Clinic is based upon a WHO-standard color-coded triage decision-making algorithm. Color-coded risk categorization is a very convenient method to convey complex medical information to the healthcare worker, to illiterate clients in rural villages, and to the remote telemedicine doctor. At first sight, the array of sensor readings, such as blood pressure, blood and urine glucose and hemoglobin level, is unfamiliar to a semi-skilled healthcare worker and her patients. The combination of all the clinical data in a text list is often meaningless to both parties. In contrast, when the sensor results are combined with color circles of green (safe), yellow (caution), orange (affected) and red (emergent) and shown to the healthcare worker, the healthcare worker and the patient quickly recognize the need for a telemedicine consultation with the remote doctor (for people with orange and red readings). The busy remote doctor's time is optimized by visually scanning for red and orange circles instead of reading the numbers. Using this approach, field data indicates the average time to screen a villager with all sensors is 6 minutes. The average time of the telehealth consultation with a remote doctor

II. METHODOLOGY
The Portable Health Clinic (PHC) system is an e-health system with a telehealth component. The PHC was designed by Kyushu University and Grameen Communications' Global Communication Center (GCC) to provide affordable e-health service to low-income subjects living in unreached

is 18.25 minutes (turn-around time includes client registration,
sensor measuring, consultancy and e-prescription print-off).
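The color-coded stratification can be sketched as a simple threshold function. The cut-off values and the choice of systolic blood pressure and blood glucose as the only inputs are invented for illustration; the PHC follows a WHO-standard triage algorithm whose exact thresholds are not given here.

```python
def triage(systolic_bp, glucose_mgdl):
    """Map raw sensor readings to a PHC-style color-coded risk category.

    Hypothetical placeholder thresholds, not the WHO-standard cut-offs
    the Portable Health Clinic actually uses.
    """
    if systolic_bp >= 180 or glucose_mgdl >= 300:
        return "red"     # emergent: immediate remote-doctor consultation
    if systolic_bp >= 140 or glucose_mgdl >= 200:
        return "orange"  # affected: telehealth consultation triggered
    if systolic_bp >= 120 or glucose_mgdl >= 140:
        return "yellow"  # caution: health guideline booklet
    return "green"       # safe

print(triage(118, 95))  # green
print(triage(150, 95))  # orange
```

Collapsing many unfamiliar numbers into one of four colors is what lets a semi-skilled healthcare worker and the remote doctor act on the result at a glance.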

had moved to the yellow category and 70 (7%) patients had moved to the green zone. This left 452 (45%) patients remaining in the orange and red categories (Fig 2).
The results for 2013 also showed a decrease in NCD morbidity. In the first health checkup, 7,794 subjects were screened. PHC triage categorized 4,365 subjects as low risk (green 90 and yellow 3,975). PHC triage identified 3,118 patients as high risk and established patient-doctor telehealth consultancies (orange 2,728 and red 390). The high-risk patients were prescribed medicine and were informed to attend the next PHC healthcare camp. In the secondary checkup round, 709 patients (data analysis is still ongoing) were re-examined. The follow-up triage measurements showed improvement in the patients previously categorized as orange and red; 320 (46%) patients had moved to the yellow category and 36 (5%) patients had moved to the green zone. This left 353 (49%) patients remaining in the orange and red categories (Fig 2).

EXPERIMENTAL SETUP AND RESULTS


A. Experimental Design for NCD mass screening
The Portable Health Clinic service was tested against a sample population from unreached communities in Bangladesh. The experiment design involved a PHC team (including a healthcare worker) setting up temporary healthcare camps in urban, sub-urban and rural areas of Chandpur and Shariatpur districts between September 2012 and November 2013.
B. Results of the mass NCD screening
The Portable Health Clinic screened 8,527 subjects in 2012 and 6,347 subjects in 2013 for the incidence of NCDs (hypertension and diabetes) in their own environment. There were 1,447 repeat subjects who were measured multiple times. The Patient Health Records were stored in the Gramhealth database. The median age of the sample population was 32 years; the average age was 37 years. The minimum age was 16 and the maximum was 106. The relatively youthful average resulted in a high prevalence of green and yellow results in all clinical checkup items. Out of the total sample population, 10,359 subjects were identified as not at risk and 4,714 patients were identified as suffering from NCDs. The distribution of morbidity was strongly related to age. The prevalence of orange (NCD affected) and red (NCD emergent) patients increased with age cohort. At the age of 50 years, nearly half of all those tested required a telehealth consultancy.
The increase of morbidity (requiring a telehealth consultancy) with age cohort will have a profound impact on the future productivity of the unreached communities. It is expected that the aging population will survive longer; however, with over 50% affected by NCDs, their quality of life will be impaired and there will be an increased demand on the low-resourced public healthcare system.

Fig 2. Portable Health Clinic follow-up visitors who were prescribed medicine and were tested 2 months later.
III. CONCLUSIONS

C. Results of the telehealth consultations


The research objective of the PHC experiment was to identity
low-income people with NCDs and treat their medical
condition. The 2012 sample of NCD screening and telehealth
intervention community revisits (the same community is visited
twice) showed a decrease in morbidity. In the first health
checkup round, 8,527 subjects were screened. PHC triage
categorized 6,898 subjects as low risk (green 1,364 and yellow
5,534). Caution (yellow) subjects were given health guideline
booklets. PHC triage server identified 1,629 patients as high
risk and established patient-doctor telehealth consultancies
(orange 1,450 and red 179). The high risk patients were
prescribed medicine and were informed to attend the next PHC
community healthcare check-up, scheduled 2 months later. The
PHC team moved from one community to another and revisited
a community after eight weeks.
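The screen-triage-escalate loop described above can be sketched as follows. The numeric cut-offs below are illustrative assumptions for a single check-up item, not the project's actual clinical thresholds, and the function names are ours.

```python
# Sketch of the PHC four-colour triage described above.
# The blood-pressure cut-offs are illustrative assumptions,
# NOT the actual clinical thresholds used by the project.

def triage_colour(systolic_bp):
    """Map a systolic blood-pressure reading (mmHg) to a risk colour."""
    if systolic_bp < 120:
        return "green"      # healthy: no action
    elif systolic_bp < 140:
        return "yellow"     # caution: health guideline booklet
    elif systolic_bp < 160:
        return "orange"     # NCD affected: telehealth consultancy
    else:
        return "red"        # NCD emergent: telehealth consultancy + medicine

def needs_telehealth(colour):
    """Only high-risk (orange/red) patients are escalated to a doctor."""
    return colour in ("orange", "red")

readings = [110, 135, 150, 172]
colours = [triage_colour(r) for r in readings]
print(colours)                                     # colour per subject
print(sum(needs_telehealth(c) for c in colours))   # count escalated to doctor
```

In the 2012 round this stratification sent only the 1,629 orange/red patients (of 8,527 screened) into patient-doctor telehealth consultancies.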
In the second check-up round (communities were revisited)
1,003 patients (61% of the original high risk patients) were reexamined. The purpose was to measure the suitability of the
prescribed drugs given to the patients. The follow-up triage measurements showed a marked improvement: 481 (48%) of the patients previously categorized as orange and red improved (Fig 2).

III. CONCLUSIONS

The Portable Health Clinic addresses three telehealth areas: it identifies at-risk patients and treats them in their own environment; it enables healthcare workers and patients to overcome their resistance to technology through an easy-to-understand, color-coded risk-triage human-computer interface; and it reduces morbidity and keeps patients out of hospital.


a) A Study of Interference-Aware Channel Segregation for HetNet Using Time-Division Channels

Ren Sugai, Katsuhiro Temma, Abolfazl Mehbodniya, Fumiyuki Adachi


Dept. of Communications Engineering, Graduate School of Engineering, Tohoku University


Abstract—One of the problems in heterogeneous networks (HetNets), i.e., a combination of several small cell base stations (SBSs) and an overlaid macro cell base station (MBS), is the co-channel interference (CCI) between BSs when the MBS and SBSs share the same radio resource. Using interference-aware channel segregation based dynamic channel assignment (IACS-DCA), each BS periodically measures the average CCI power on all available channels. The channel with the lowest average CCI power is not being used by neighboring BSs and hence can be selected for use. In this way, IACS-DCA forms a channel reuse pattern with low CCI in a distributed manner. In this paper, we apply IACS-DCA to HetNet using time-division channels. We show by computer simulation that IACS-DCA is able to form a channel reuse pattern with low CCI.

Keywords—channel segregation; dynamic channel assignment; co-channel interference; heterogeneous network

I. INTRODUCTION

Due to scarce spectrum resources, the number of available channels is limited in wireless networks and hence the same channel must be reused by different base stations (BSs). Since co-channel interference (CCI) limits the transmission quality, the channels must be reused so as to minimize the CCI at every BS. In addition, the CCI environment changes over time when a new BS appears or user equipment (UE) moves. Therefore, the channels should be properly re-allocated according to changes in the CCI environment. To remedy this problem, dynamic channel assignment (DCA) has been studied [1]-[3]. There are two types of DCA: centralized and distributed. Centralized DCA may not be practical due to its prohibitively high computational complexity [4]-[5]. We have been studying interference-aware channel segregation based DCA (IACS-DCA) [6]-[8], which is categorized as distributed DCA. We have shown that, in a network using frequency-division channels, IACS-DCA can form a channel reuse pattern with low CCI in a distributed manner [6]-[8]. In IACS-DCA, each BS periodically measures the average CCI power on all available channels and selects the best channel, i.e., the one having the lowest average CCI power.

Heterogeneous networks (HetNets), which consist of several small cell BSs (SBSs) and an overlaid macro cell BS (MBS), are capable of dealing with the exponential increase in wireless data traffic [9]. One major problem in HetNets is the CCI between BSs when the MBS and SBSs share the same radio resource. In this paper, we apply IACS-DCA to HetNet using time-division channels. We show by computer simulation that a channel reuse pattern with low CCI is formed by IACS-DCA in HetNet using time-division channels.

III. IACS-DCA

The IACS-DCA flowchart is shown in Figure 1. Each BS is equipped with a channel priority table. It periodically (i) measures the instantaneous CCI power by monitoring the beacon signal on all available channels; the beacon signal is designed to be periodically transmitted from each BS. Then, each BS (ii) computes the average CCI power on all available channels by using past CCI measurement results and (iii) updates the channel priority table to (iv) select the best channel having the lowest average CCI power. After the channel selection, it (v) broadcasts the beacon signal on the selected channel. Each BS periodically repeats the procedure of (i)~(v).

The average CCI power measured at the m-th BS, BS(m), on the c-th channel at time t is denoted by I_BS(m)(t; c). Using the average CCI powers on all available channels, the channel priority table is updated for all available channels (c = 0 ~ C-1). The channel having the lowest average CCI power is selected according to

  c(m) = arg min_{c = 0, ..., C-1} I_BS(m)(t; c),   (1)

which is used until the next channel priority table updating time t+1.

The channel with the lowest average CCI power is considered not to be used by neighboring BSs and hence the impact of causing interference to other BSs by using this channel is expected to be small. Therefore, a channel reuse pattern with low CCI can be formed by IACS-DCA.

Figure 1. Flowchart of IACS-DCA: (i) instantaneous CCI power measurement, (ii) average CCI power computation, (iii) channel priority table update, (iv) channel selection, and (v) beacon signal broadcast, together with the channel priority table at BS(m) listing the average CCI power I_BS(m)(t; c) for channels #0 ~ #C-1.

IV. COMPUTER SIMULATION

A. Simulation model

An example of the HetNet model is illustrated in Figure 2. An MBS is located at the center of a hexagonal macro cell. NSBS SBSs are distributed uniformly within one macro cell and a static UE is assumed to be uniformly located within each cell. The simulation parameters are summarized in Table I. We show that the channel reuse pattern formed by IACS-DCA can reduce the CCI which originates from the macro cell and is received by the small cells. A perfectly synchronous time division multiple access (TDMA) system is assumed. As shown in Figure 3, we assume C time-division channels (i.e., C timeslots within one timeframe). For the measurement of the average CCI power, first-order filtering with a forgetting factor is used. If too small a forgetting factor is used, the measured average CCI tends to follow the instantaneous CCI and the channel segregation cannot be done. Hence, a forgetting factor of 0.99 is used [7].

In each simulation run, the signal-to-interference power ratio (SIR) measurement is carried out at t=2000 (i.e., after the channel reuse pattern gets stable). The cumulative distribution function (CDF) of the downlink SIR is obtained by conducting the simulation run 300 times. We consider only path loss in the propagation channel. The initial channel is set to channel #0 (c=0) for all BSs.
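The measure-filter-select loop of steps (i)-(v) and Eq. (1) at a single BS can be sketched as below. The interference values are synthetic and the helper names are ours, not from the paper; only the first-order filtering with forgetting factor 0.99 and the arg-min selection follow the text.

```python
import random

# Sketch of IACS-DCA channel selection at one BS: first-order filtering
# of instantaneous CCI measurements with forgetting factor 0.99, then
# the arg-min selection of Eq. (1). Synthetic CCI values for illustration.

C = 8          # number of time-division channels
BETA = 0.99    # forgetting factor, as used in the paper

avg_cci = [0.0] * C  # channel priority table: average CCI power per channel

def update_table(instant_cci):
    """(ii)-(iii): first-order filtering of the instantaneous CCI powers."""
    for c in range(C):
        avg_cci[c] = BETA * avg_cci[c] + (1.0 - BETA) * instant_cci[c]

def select_channel():
    """(iv): Eq. (1) -- pick the channel with the lowest average CCI."""
    return min(range(C), key=lambda c: avg_cci[c])

random.seed(1)
for t in range(2000):
    # (i) synthetic measurement: channel 3 persistently shows low CCI,
    # as if no neighboring BS is using it
    instant = [random.uniform(0.5, 1.0) if c != 3 else random.uniform(0.0, 0.1)
               for c in range(C)]
    update_table(instant)

print(select_channel())  # channel 3 has the lowest average CCI
```

Because each BS runs this loop independently on its own measurements, the low-CCI reuse pattern emerges without any centralized coordination.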
Figure 2. An example of the HetNet model: an MBS at the center of the macro cell surrounded by uniformly distributed SBSs.

Figure 3. Timeframe structure: one timeframe consists of C timeslots (channels #0, #1, ..., #C-1).

TABLE I. COMPUTER SIMULATION CONDITION
Network:
  No. of MBSs: NMBS = 1
  No. of SBSs: NSBS = 29
  No. of channels: C = 8
  Carrier frequency: 2 GHz
  Frequency bandwidth: 10 MHz
  Noise power spectral density [10]: -174 dBm/Hz
Macro cell:
  Radius: 250 m
  Min. MBS-SBS distance: 75 m
  Transmit power of MBS: 46 dBm
Small cell:
  Radius: 40 m
  Min. SBS-SBS distance: 40 m
  Transmit power of SBS: 30 dBm
Path loss [11]:
  MBS-SBS, MBS-UE: 15.3 + 37.6 log10(dBS(m),BS(n)) dB
  SBS-SBS, SBS-UE: 27.6 + 37.6 log10(dBS(m),BS(n)) dB
  (dBS(m),BS(n): distance between BS(m) and BS(n) in m)
IACS-DCA:
  Filter forgetting factor: 0.99

B. Simulation result

Figure 4 plots the CDF of the downlink SIR for NMBS=1, NSBS=29 and C=8. For comparison, we also plot the downlink SIR when the MBS is turned off after channel segregation is finished (so that there is no CCI from the MBS to the small cells when the SIR measurement is carried out). We observe that the downlink SIR when the MBS is on is almost equal to the downlink SIR when the MBS is off. This proves the effectiveness of our proposed IACS-DCA in reducing the CCI between the macro cell and the small cells.

Figure 4. Downlink SIR distribution: CDF of the downlink SIR in dB, for MBS on and MBS off.

V. CONCLUSION

In this paper, we studied IACS-DCA in HetNet using time-division channels. We showed by computer simulation that the CCI imposed by the macro cell on the small cells can be avoided by IACS-DCA.

ACKNOWLEDGMENT
The research results presented in this material have been achieved by "Towards Energy-Efficient Hyper-Dense Wireless Networks with Trillions of Devices," a Commissioned Research of the National Institute of Information and Communications Technology (NICT), Japan.

REFERENCES
[1] I. Katzela and M. Naghshineh, "Channel assignment schemes for cellular mobile telecommunication systems: a comprehensive survey," IEEE Personal Commun., vol. 3, no. 3, pp. 10-31, June 1996.
[2] H. Skalli, S. Ghosh, S. K. Das, L. Lenzini, and M. Conti, "Channel assignment strategies for multiradio wireless mesh networks: issues and solutions," IEEE Commun. Magazine, vol. 45, no. 11, pp. 86-95, Nov. 2007.
[3] A. Goldsmith, Wireless Communications, Cambridge University Press, Aug. 2005.
[4] R. W. Nettleton, "A high capacity assignment method for cellular mobile telephone systems," Proc. IEEE 39th Vehicular Technology Conference (VTC1989-Spring), May 1989.
[5] G. F. Marias, D. Skyrianoglou, and L. Merakos, "A centralized approach to dynamic channel assignment in wireless ATM LANs," Proc. 18th Annual Joint Conference of the IEEE Computer and Communications Societies (Infocom'99), Mar. 1999.
[6] R. Matsukawa, T. Obara, and F. Adachi, "A dynamic channel assignment scheme for distributed antenna networks," Proc. IEEE 75th Vehicular Technology Conference (VTC2012-Spring), May 2012.
[7] Y. Matsumura, S. Kumagai, T. Obara, T. Yamamoto, and F. Adachi, "Channel segregation based dynamic channel assignment for WLAN," Proc. IEEE 13th International Conference on Communication Systems (ICCS2012), Nov. 2012.
[8] R. Sugai, M. T. H. Sirait, Y. Matsumura, K. Temma, and F. Adachi, "Impact of shadowing correlation on interference-aware channel segregation based DCA," Proc. 2014 IEEE 11th Vehicular Technology Society Asia Pacific Wireless Communications Symposium (APWCS2014), Aug. 2014.
[9] A. Damnjanovic, J. Montojo, Y. Wei, T. Ji, T. Luo, M. Vajapeyam, T. Yoo, O. Song, and D. Malladi, "A survey on 3GPP heterogeneous networks," IEEE Wireless Commun., vol. 18, no. 3, pp. 10-21, June 2011.
[10] H. T. Friis, "Noise figures of radio receivers," Proc. of the IRE, pp. 419-422, July 1944.
[11] S. Samarakoon, M. Bennis, W. Saad, and M. Latva-aho, "Opportunistic sleep mode strategies in wireless small cell networks," Proc. IEEE International Conference on Communications 2014 (ICC2014), pp. 2707-2712, June 2014.


A Novel Handover and Base Station Sleep Mode


Algorithm in HetNet
Rintaro Yoneya , Abolfazl Mehbodniya and Fumiyuki Adachi

Dept. of Communication Engineering, Graduate School of Engineering, Tohoku University, Sendai, Japan
6-6-05, Aza-Aoba, Aramaki, Aoba-ku, Sendai, Miyagi, 980-8579, Japan
Email: (yoneya, mehbod)@mobile.ecei.tohoku.ac.jp

adachi@ecei.tohoku.ac.jp

Abstract—The demand for wireless resources is increasing at a high pace. The heterogeneous network (HetNet) is a key solution to address this increased demand. This paper presents a handover (HO) algorithm combined with a BS sleep mode algorithm in HetNet. The proposed algorithm yields an improvement in the number of HOs compared to conventional algorithms under dense SBS deployment and higher UE densities.
Index Terms—HetNet, handover, base station sleep mode algorithm, game theory, energy efficiency, mobility

Fig. 1. HetNet topology: an MBS, several SBSs (SBS1, SBS2, SBS3), and a UE.

I. INTRODUCTION

The demand for wireless resources is increasing at a high pace. Video streaming and social media [1] contribute to this increase to a great extent. Consequently, the traffic load and energy consumption in wireless cellular systems are increasing accordingly, which urges the design of more energy- and spectral-efficient systems.

Heterogeneous networks (HetNets), consisting of macro cell base stations (MBSs) and small cell base stations (SBSs), are proven to be highly effective in improving spectral efficiency [2]. The energy consumption of a HetNet is reduced when it is combined with sleep mode algorithms which adapt to network traffic conditions. Autonomous distributed sleep mode algorithms do not need any information exchange through backhaul communication: each BS decides independently to turn ON or OFF depending on its traffic load and consumption power.

To the best of our knowledge, there has been no study on mobility in HetNets which employ sleep mode algorithms. If users move, a UE's received power from its connected BS changes because of changes in the BS's transmission power, path loss and so on. If the received power is low, the UE cannot communicate with the BS effectively, and as a result a handover (HO) process is initiated. In this paper, we propose a HO algorithm for UEs, combined with a sleep mode algorithm for BSs, in a HetNet scenario. The HO algorithm comprises two phases: the HO necessity estimation phase (HONEP) and the HO execution phase (HOEP).

II. SYSTEM MODEL

In this paper, we focus on the downlink transmission in a HetNet, assuming one MBS and several SBSs, S = {1, ..., S}, distributed uniformly within a macro cell, at the center of which the MBS is located. Fig. 1 shows an example realization of such a HetNet scenario.

Each BS chooses its strategy (transmission power level) using Table I; the MBS's strategy is either strategy 1 or 4 in Table I. The transmission power of the s-th BS is given by

  Ps^TX(t) = εs(t) Ps^MAX,   (1)

where εs(t) is the transmission power level and Ps^MAX is the maximum transmission power of the s-th BS. A BS's total consumption power is determined by its transmission power [1]. A BS's load is the summation of the traffic loads of all UEs in the cell, where a UE's traffic load is defined as the ratio of its required data rate to its actual link capacity. The utility of the s-th BS is formed from its total consumption power Ps^All(t) and traffic load ρs(t) according to

  us(t) = -(α Ps^All(t)/Ps^MAX + β ρs(t)),   (2)

where α and β (α > 0, β > 0) are weighting factors for consumption power and traffic load; these constants define the influence of consumption power and load.

TABLE I
TRANSMISSION POWER LEVELS
  Strategy ID i:                 1    2    3    4
  Transmission power level εs(t): 0   1/3  2/3   1

III. ALGORITHMS

We use the sleep mode algorithm in [1], shown in Algorithm 1. The proposed UE association algorithm is shown in Algorithm 2.

A. Traffic load estimation
Each BS estimates its traffic load ρ̂s(t) according to

  ρ̂s(t) = ρ̂s(t-1) + n(t)(ρs(t-1) - ρ̂s(t-1)),   (3)

where n(t) is the learning rate. ρ̂s(t) is then transmitted via periodic beacons to all UEs.

The research results presented in this material have been achieved by "Towards Energy-Efficient Hyper-Dense Wireless Networks with Trillions of Devices," a Commissioned Research of the National Institute of Information and Communications Technology (NICT), Japan.
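The recursive estimate in (3) is a standard exponential-smoothing update: the estimate moves a fraction n(t) of the way toward the last observed load. A minimal sketch, with an illustrative constant learning rate and sample loads that are our assumptions:

```python
# Sketch of the traffic-load estimator in Eq. (3):
#   rho_hat(t) = rho_hat(t-1) + n(t) * (rho(t-1) - rho_hat(t-1)).
# The constant learning rate and the sample loads are illustrative.

def update_load_estimate(rho_hat_prev, rho_prev, learning_rate):
    """One step of Eq. (3): move the estimate toward the observed load."""
    return rho_hat_prev + learning_rate * (rho_prev - rho_hat_prev)

rho_hat = 0.0
for observed_load in [0.8, 0.8, 0.8, 0.8]:
    rho_hat = update_load_estimate(rho_hat, observed_load, learning_rate=0.5)

# After four updates the estimate has converged most of the way to 0.8:
print(round(rho_hat, 4))  # 0.75
```

A smaller n(t) makes the beaconed estimate smoother but slower to track load changes, mirroring the forgetting-factor trade-off in filtered measurements.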


Algorithm 1: Sleep mode algorithm at BS [1].
 1: Initialization: S = {1, ..., S};
 2: while true do
 3:   t - 1 <- t
 4:   BS's strategy selection
 5:   Calculation of traffic load estimation ρ̂s(t) and transmission to all UEs
 6:   Calculation of traffic load ρs(t), power consumption Ps^All(t) and utility us(t)
 7:   if ρs(t) > 1 then
 8:     Select UEs to connect
 9:   end if
10:   Update of utility estimation ûs,i(t)
11: end while

Algorithm 2: Association algorithm at UE.
 1: Input: ρ̂s(t) and Ps^RX(t)
 2: Output: s(z, t)
 3: if UE is not currently connected to any BS then
 4:   UE chooses a new BS, s(z, t), based on ρ̂s(t) and Ps^RX(t)
 5: else
 6:   Decide whether to HO or not
 7:   if HO is necessary then
 8:     UE chooses a new BS, s(z, t), based on ρ̂s(t) and Ps^RX(t)
 9:   else
10:     UE does not change its BS
11:   end if
12: end if

Algorithm 3: HONEP at UE.
 1: if UE is connected to MBS then
 2:   Always search for a new BS using (4)
 3: else
 4:   (UE is connected to SBS)
 5:   if d(t) > rSBS (rSBS: small cell radius) then
 6:     Search for a new BS using (4)
 7:   else
 8:     if vb(t) < 0, d(t) >= dTH (dTH: distance threshold) and Ps^RX(t) <= PTH (PTH: power threshold) then
 9:       Search for a new BS using (4)
10:     else
11:       Do not change the currently connected BS
12:     end if
13:   end if
14: end if
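Algorithm 3's decision logic can be sketched as below. The threshold values and the comparison directions in the composite condition are illustrative assumptions (the original symbols were partly lost in extraction), and the function and variable names are ours.

```python
# Sketch of HONEP (Algorithm 3): decide whether a UE should search for a
# new BS with criterion (4). Threshold values and comparison directions
# are illustrative assumptions.

R_SBS = 40.0    # small-cell radius (m)
D_TH = 20.0     # distance threshold (m)
P_TH = -60.0    # received-power threshold (dBm)

def should_search(connected_to_mbs, d, v_b, p_rx):
    """Return True if the UE should evaluate candidate BSs.

    connected_to_mbs -- UE is currently served by the MBS
    d    -- distance to the serving BS (m)
    v_b  -- velocity component toward the serving BS (m/s)
    p_rx -- received power from the serving BS (dBm)
    """
    if connected_to_mbs:
        return True          # MBS-attached UEs always search for a better BS
    if d > R_SBS:
        return True          # UE has left the small cell
    # moving away from the BS, already far from it, and weak received power:
    return v_b < 0 and d >= D_TH and p_rx <= P_TH

print(should_search(True, 5.0, 1.0, -40.0))    # True: served by MBS
print(should_search(False, 50.0, 1.0, -40.0))  # True: outside the cell radius
print(should_search(False, 25.0, -1.0, -70.0)) # True: receding, far, weak
print(should_search(False, 10.0, 1.0, -40.0))  # False: keep current BS
```

Gating the search on all three conditions at once is what suppresses the ping-pong handovers that the benchmark exhibits at high UE density.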

B. HO Algorithm
Each UE receives the traffic load estimation ρ̂s(t), data about the BS position and the cell radius r from all BSs through beacon signals. The UE obtains its velocity and position information using its integrated GPS. The UE decides about HO based on this information and the received power Ps^RX(t). Algorithm 3 shows HONEP at the UE. In this algorithm, v(t) (>= 0) is the velocity of the UE and vb(t) (-∞ < vb(t) < +∞) is its velocity component in the direction of the connected BS. d(t) is the UE's distance to its connected BS. For new UEs or UEs needing HO, each UE at point z selects the BS to connect to (HOEP), s(z, t), based on the following criterion:

  s(z, t) = arg max_{s∈S} { (ρ̂s(t) + ωs)^(-μ) Ps^RX(t) (ds(t))^(-ν) },   (4)

where ωs is an offset and μ (μ > 0) and ν (ν > 0) are exponents which define the influence of the traffic load estimation ρ̂s(t) and of the distance between the UE and the s-th BS, ds(t), respectively.

IV. COMPUTER SIMULATION
Simulation parameters are summarized in Table II. The total simulation time is 10000 s and the time interval for each iteration is 1 s. The cell radii of the MBS and SBS are 250 m and 40 m. The maximum transmission powers of the MBS and SBS are 46 dBm and 30 dBm. We compare our proposed joint HO and sleep mode algorithm with the benchmark in [1].

Fig. 2. Total number of HOs per 1 s vs. number of UEs for 7 SBSs and average velocity 4 km/h (proposed HO algorithm vs. baseline).

In Fig. 2, we observe that for a lower number of UEs, the benchmark makes fewer HOs than the proposed algorithm. This is because the benchmark algorithm does not initiate enough HOs to prevent degradation of communication quality; unlike our proposed algorithm, it does not make UEs frequently select SBSs. However, as the number of UEs increases, the number of HOs in the benchmark algorithm increases significantly, because the benchmark algorithm causes a ping-pong effect and makes UEs perform unnecessary HOs. This proves the effectiveness of the proposed algorithm for higher UE densities.

TABLE II
SIMULATION PARAMETERS
Path loss (d: distance between BS and user in m; unit: dB)
  MBS-UE: 15.3 + 37.6 log10(d) [1]
  SBS-UE: 27.9 + 36.7 log10(d) [1]
Algorithm parameters
  Power threshold PTH: -60 dBm [3]
  Distance threshold dTH: 20 m
  Weighting coefficients for power consumption and traffic load, α, β: 10, 5
  Weighting exponents of traffic load and distance, μ, ν: 1, 0.5

V. CONCLUSION
In this paper, a joint distributed handover (HO) and base station sleep mode algorithm was proposed within the context of HetNet. The proposed algorithm yields a significant improvement in the number of HOs compared to conventional algorithms at high UE densities.

REFERENCES
[1] S. Samarakoon, M. Bennis, W. Saad, and M. Latva-aho, "Opportunistic sleep mode strategies in wireless small cell networks," in IEEE International Conference on Communications 2014 - Mobile and Wireless Networking Symposium (ICC'14-MWS), June 2014, pp. 2707-2712.
[2] S. Zhou, A. J. Goldsmith, and Z. Niu, "On optimal relay placement and sleep control to improve energy efficiency in cellular networks," in IEEE International Conference on Communications, June 2011, pp. 1-6.
[3] G. F. Pedersen, "Mobile phone antenna performance," Aalborg University, November 2013, p. 14.


An Algorithm for Optimum Channel Assignment in Multi-Cell WLANs

Mohamed Elwekeil1 , Masoud Alghoniemy2 , Osamu Muta3 , Adel B. Abd El-Rahman1 , and Hiroshi Furukawa4
1
Department of Electronics and Communications Engineering,
Egypt-Japan University of Science and Technology, Alexandria, 21934 Egypt
2
Department of Electrical Engineering, University of Alexandria, 21544 Egypt
3
Center for Japan-Egypt Cooperation in Science and Technology, Kyushu University, Fukuoka-shi, Fukuoka, Japan
4
Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka-shi, Fukuoka, Japan
e-mail: mohamed.elwekeil@ejust.edu.eg, alghoniemy@alexu.edu.eg,
muta@ait.kyushu-u.ac.jp, adel.bedair@ejust.edu.eg, furuhiro@ait.kyushu-u.ac.jp
Abstract—An optimization model for solving the channel assignment problem in IEEE 802.11 WLANs is proposed. The proposed model is based on minimizing the total interference at all access points in the network while allowing only non-overlapping channels, and is formulated as a mixed integer linear program. The main advantage of the proposed algorithm is that it guarantees a global solution. Simulation results show that the performance of the proposed algorithm is better than those of both the pick-first greedy algorithm and the single channel assignment method.

Keywords—WLAN; IEEE 802.11; multicell; channel assignment; integer programming.

I. INTRODUCTION

Most of the existing WLANs follow the IEEE 802.11b/g/n standard, which operates in the unlicensed 2.4 GHz Industrial, Scientific and Medical (ISM) band. Figure 1 shows the IEEE 802.11 channels in the ISM band, where the bandwidth of each channel is about 22 MHz and the separation between every two adjacent channels is only 5 MHz; thus, neighboring channels overlap with each other. This band consists of eleven frequency channels with only three non-overlapping channels [1]. Therefore, careful channel assignment in multi-cell WLANs is very important. In a multi-cell WLAN, multiple interfering access points (APs) produce a considerable increase in collisions. In this situation, the objective of channel assignment is to assign a channel to each AP in order to reduce interference and thus maintain an acceptable throughput.

Fig. 1: Channels for IEEE 802.11 in the 2.4 GHz ISM band [2].

The channel assignment problem for IEEE 802.11 multi-cell WLANs has been investigated by many researchers; a survey of channel assignment algorithms for WLANs can be found in [2]. The authors in [3] developed a mathematical model that defines the amount of interference between overlapping channels in multi-cell WLAN networks, and presented a dynamic channel assignment algorithm that aims to minimize the total interference at each AP. However, that algorithm is greedy and does not find a global solution. The channel allocation model presented in [4] is based on minimizing the total interference among different APs while maintaining the Signal-to-Interference Ratio (SIR) at all users above a predefined threshold. This may require the channel assignment algorithm to be repeated each time a user enters, leaves or even moves within an AP service area: whenever the SIR at any user becomes less than the predefined threshold, the channel assignment should be modified.

The contribution of this paper lies in presenting an optimization model for solving the channel assignment problem where the objective is to minimize the total interference at all APs in the network, considering only non-overlapping channels, in order to improve overall network throughput. The proposed model is formulated as a mixed integer linear program. Practically, the proposed algorithm can be applied in the installation phase or after any modification of the topology.

II. THE PROPOSED MODEL

The proposed channel assignment algorithm aims to minimize the total interference at all APs in the network, considering only the non-overlapping channels. The main idea is that minimizing the total interference at the AP level will improve the overall network throughput.

In particular, consider a network consisting of N APs, where each AP j transmits power Ptj. We define the co-channel interference factor δij between APi and APj as an indicator of whether there is mutual interference between APs i and j (δij = 1) or not (δij = 0), i.e.,

  δij = 1 if fi = fj, and δij = 0 otherwise,   (1)

which can be put in the following form:

  δij = max(0, 1 - |fi - fj|),   (2)

where fi and fj are the channels assigned to APi and APj, respectively. In order to get rid of the modulus function in (2), the co-channel interference factor can be expressed by the following linear inequality:

  δij >= 1 - (Z+ij + Z-ij),  δij >= 0,   (3)

where Z+ij and Z-ij are auxiliary variables representing the positive and negative parts of (fi - fj), with Z+ij - Z-ij = fi - fj. This will assure that Z+ij + Z-ij equals the modulus |fi - fj|. In order to guarantee that at least one of the values Z+ij and Z-ij is zero, which is required for the replacement of the modulus part in (2) by Z+ij + Z-ij, an EITHER-OR constraint (4) is defined as in [5]:

  Z+ij <= Δ Qij,  Z-ij <= Δ (1 - Qij),   (4)

where Qij is a subsidiary binary variable and Δ is an adequately large upper bound (e.g., 100) for both Z+ij and Z-ij. Thus, fi - fj = Z+ij

and |fi - fj| = Z+ij when Qij = 1, while fi - fj = -Z-ij and |fi - fj| = Z-ij when Qij = 0.
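The Z+/Z- decomposition above can be checked numerically: for integer channel numbers, Z+ij + Z-ij reproduces |fi - fj|, and feeding it into (2) recovers the co-channel indicator. A small sketch (the channel pairs are hypothetical examples, and the function names are ours):

```python
# Check of the modulus linearization used in (2)-(4): split fi - fj into
# its positive part Z+ and negative part Z-, so that Z+ + Z- = |fi - fj|
# and delta_ij = max(0, 1 - |fi - fj|) flags co-channel APs.
# The channel pairs below are hypothetical examples.

def decompose(fi, fj):
    """Return (Z+, Z-) with Z+ - Z- = fi - fj and at least one of them 0."""
    diff = fi - fj
    z_plus = max(diff, 0)
    z_minus = max(-diff, 0)
    return z_plus, z_minus

def cochannel_factor(fi, fj):
    """Eq. (2): 1 if both APs share a channel, 0 otherwise (integer fi, fj)."""
    z_plus, z_minus = decompose(fi, fj)
    return max(0, 1 - (z_plus + z_minus))

for fi, fj in [(1, 1), (1, 6), (11, 6), (6, 6)]:
    z_plus, z_minus = decompose(fi, fj)
    assert z_plus + z_minus == abs(fi - fj)   # modulus recovered
    print(fi, fj, cochannel_factor(fi, fj))   # 1 only when fi == fj
```

In the MILP itself the solver is free to choose Z+ij and Z-ij, which is why the EITHER-OR constraint (4) is needed to force one of them to zero.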

In order to impose non-overlapping channel assignment, the following constraint can be used:

  fi = 1 + 5ki,  ki ∈ {0, 1, 2},   (5)

where ki is an auxiliary variable that is used to restrict the allowable channels to the non-overlapping ones, namely 1, 6 and 11.

In this paper, we consider the following simplified channel path loss model [6]:

  L(dij) = LFS(do) + 35 log10(dij/do) dB,   (6)

where do is the reference distance for the antenna far field, dij is the distance between APi and APj, and LFS(do) is the free space path loss at distance do, which is given by

  LFS(do) = 20 log10(4π do / (λ sqrt(Gt Gr))) dB,   (7)

where λ is the carrier wavelength and Gt and Gr are the transmit and receive antenna gains in the line-of-sight direction, respectively. In the simulation, it is assumed that do = 5 m, Gt = Gr = 3 dBi and the AP transmit power equals 20 dBm.

Define βij = Ptj / L(dij) as the received power at APi from APj. Then, the proposed channel assignment can be found by solving the following integer linear program:

  min  Σi Σj≠i βij δij
  s.t. δij + Z+ij + Z-ij >= 1
       δij >= 0
       Z+ij - Z-ij - fi + fj = 0
       Z+ij - Δ Qij <= 0
       Z-ij + Δ Qij <= Δ
       Qij ∈ {0, 1}
       fi - 5ki = 1,  ki ∈ {0, 1, 2}.   (8)

In this optimization model, the objective is to find the optimum non-overlapping channels fi that minimize the total AP interference in the network, taking into account that only the non-overlapping channels are permissible. The first constraint in (8) is a linear inequality representing the co-channel interference factor (2). The second constraint is presented to assure that the first constraint is equivalent to equation (2), as declared before. The third and fourth constraints play the role of the EITHER-OR constraint. The fifth constraint assures that only the non-overlapping channels (1, 6 and 11) are allowed.

III. NUMERICAL RESULTS

In this section, we present simulations to compare the performance of the proposed channel assignment algorithm with those of both single channel assignment and the pick-first greedy algorithm [3]. In the pick-first greedy algorithm, we implemented 100 iterations to guarantee that the algorithm converges. In the single channel assignment, we assumed that all APs are assigned channel 11. The simulation is performed on a 2.4 GHz processor. The free-ware optimization solver LP_SOLVE [7] is used to solve the optimization model (8).

Figure 2 illustrates an example of a topology consisting of nine APs. The behavior of this topology is described in Table I. It is clear that the proposed algorithm provides total interference which is less than that of the single channel assignment by 22.21 dB. Moreover, the proposed algorithm provides total interference which is less than that of the pick-first algorithm by about 2.13 dB.

Fig. 2: Topology of 9 APs (AP positions plotted over building width vs. building length, in meters).

TABLE I: Results for 9 APs.
  AP ID | Channel fi (Proposed / Pick-first) | Interference in dBm (Proposed / Pick-first / Single Ch)
  AP1   | 11 / 6  | -60.30 / -59.77 / -49.74
  AP2   |  6 / 11 | -61.04 / -59.42 / -44.16
  AP3   |  6 / 6  | -61.44 / -55.15 / -45.46
  AP4   | 11 / 6  | -58.86 / -54.66 / -32.64
  AP5   |  1 / 11 | -59.23 / -58.36 / -48.86
  AP6   |  6 / 11 | -66.10 / -62.04 / -49.48
  AP7   | 11 / 1  | -60.75 / -62.63 / -46.25
  AP8   |  1 / 1  | -65.23 / -69.34 / -50.64
  AP9   |  1 / 1  | -60.05 / -62.81 / -32.71
  Total interference (dBm): -51.36 / -49.23 / -29.16

IV. CONCLUSION

An optimum channel assignment algorithm for multi-cell WLANs is proposed, where the objective is to minimize the total interference at the AP level, considering only the non-overlapping channels. The proposed model has been formulated as an integer program. Simulation results reveal that the proposed channel assignment algorithm outperforms both the single channel assignment and the greedy pick-first algorithm. In future work, we will implement relaxation techniques to reduce the complexity of the proposed integer program.

REFERENCES
[1] "IEEE Standard for Information technology - Telecommunications and information exchange between systems - Local and metropolitan area networks - Specific requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications," IEEE Std 802.11-2007 (Revision of IEEE Std 802.11-1999), p. C1, 2009.
[2] S. Chieochan, E. Hossain, and J. Diamond, "Channel assignment schemes for infrastructure-based 802.11 WLANs: a survey," IEEE Communications Surveys & Tutorials, vol. 12, no. 1, pp. 124-136, 2010.
[3] R. Akl and A. Arepally, "Dynamic channel assignment in IEEE 802.11 networks," in Portable Information Devices, 2007 (PORTABLE07), IEEE International Conference on, 2007, pp. 1-5.
[4] M. Boulmalf, T. Aouam, and H. Harroud, "Dynamic channel assignment in IEEE 802.11g," in Wireless Communications and Mobile Computing Conference, 2008 (IWCMC'08), 2008, pp. 864-868.
[5] J. Bisschop, AIMMS - Optimization Modeling, Lulu.com, 2006.
[6] A. Goldsmith, Wireless Communications, Cambridge University Press, 2005.
[7] M. Berkelaar, K. Eikland, P. Notebaert et al., "lp_solve: open source (mixed-integer) linear programming system," SourceForge.


Chunk-based Resource Allocation in MISO-OFDMA Systems with Fairness Provision


Mahmoud M. Selim1,3, Osamu Muta2, Hossam M. H. Shalaby1, Hiroshi Furukawa3


1 (Egypt-Japan University of Science and Technology (E-JUST)): Department of Electronics and Communications Engineering,
New Borg El Arab, Alexandria, Egypt, mahmoud.sleem@ejust.edu.eg, shalaby@ieee.org
2 (Kyushu University): Center for Japan-Egypt Cooperation in Science and Technology, Fukuoka-shi, Fukuoka, Japan,
muta@ait.kyushu-u.ac.jp
3 (Kyushu University): Graduate School of Information Science and Electrical Engineering, Fukuoka-shi, Fukuoka, Japan,
furuhiro@ait.kyushu-u.ac.jp

K users are uniformly distributed, with K > T. System chunks are denoted by c, c = 1, 2, ..., C, where C = N/M. The corresponding average channel magnitude vector encountered by user k, k = 1, 2, ..., K, over any chunk c is then given by the chunk average H̄_{k,c},

Abstract: This paper proposes a chunk-based resource allocation (RA) algorithm for Multi-Input Single-Output Orthogonal Frequency Division Multiple-Access (MISO-OFDMA) systems. Our proposed algorithm allocates resources chunk-by-chunk to maximize the system sum rate under both average bit error rate (BER) and minimum user rate constraints with fairness provision. Simulation results reveal that our proposed algorithm provides an efficient trade-off between system sum rate and fairness compared to three reference algorithms in the literature.

I. INTRODUCTION

Orthogonal frequency division multiple-access (OFDMA) is currently exploited in most modern wireless standards to permit wide-band data services [1]. Moreover, using multiple antennas at the transmitter in OFDMA systems (i.e., MIMO-OFDMA) increases system capacity, but complicates the resource allocation (RA) process due to the existence of the spatial domain and the interference generated by sub-carrier sharing. To make the RA process more tractable, a group of M contiguous sub-carriers is formed into one resource unit (known as a chunk) and RA is done on a chunk basis rather than a sub-carrier basis.

where H_{k,c} = [ h_{k,(c-1)M+1}, ..., h_{k,cM} ] is the T x M

B. Power and bit allocation

The user power on sub-carrier n of chunk c, denoted p_{c,n}^k, among the group of users selected for chunk c is allocated through water-filling as p_{c,n}^k = [mu - 1/b_{k,c}]+, where b_{k,c} = 1/{[(H_c H_c^H)^{-1}]_{k,k}} is the effective channel gain of user k over chunk c under ZFBF, H_c is the associated channel matrix, and mu is obtained by solving the water-
Many works have addressed optimal chunk-based RA under different constraints: a bit error rate (BER) constraint [2] and a fairness constraint among users [3]. We propose a chunk-based RA algorithm for the downlink transmission of multi-user MISO-OFDMA systems that maximizes system capacity with joint consideration of BER, quality-of-service (QoS), and fairness among users. Our proposed algorithm exploits spatial correlation for user selection among chunks and zero-forcing beamforming (ZFBF) to cancel inter-user interference. The performance of our proposed algorithm, in terms of several system metrics, is compared to the algorithms in [2, 3] and the round robin (RR) algorithm, which allocates resources fairly among users.

complex channel matrix and h_{k,n} is the T x 1 complex channel gain vector of sub-carrier n. To increase system capacity, simultaneous transmission to multiple users through different transmit antennas is allowed, and a group of users is selected for transmission on every chunk based on the proposed algorithm explained in detail in Section III. Inter-user interference (IUI) amongst the selected users due to frequency sharing is handled through ZFBF.

Keywords: resource allocation; MISO-OFDMA; multi-user; chunk-based; ZFBF

filling equation Sum_k [mu - 1/b_{k,c}]+ = P, where P is the sub-carrier power.

Adaptive Quadrature Amplitude Modulation (QAM) is exploited to provide link adaptation and satisfy the average BER constraint. The associated modulation level M_{k,c} in {0, 4, 16, 64} is chosen such that the average BER over the associated chunk c, given by [4],

  BER_{k,c} = (1/M) Sum_{n=(c-1)M+1}^{cM} 0.2 exp( -1.6 gamma_{k,n} / (M_{k,c} - 1) ),    (1)

is below a threshold level BERth, where gamma_{k,n} is the associated signal-to-noise ratio (SNR). The modulation level is assumed to be the same within all sub-carriers of the chunk.
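The power and bit allocation just described can be sketched end-to-end; a minimal illustration, assuming bisection to find the water level mu and using the BER model of Eq. (1) (the function names and example numbers are ours, not the paper's):

```python
import math

def water_fill(b, P, iters=100):
    """Water-filling: p_k = max(mu - 1/b_k, 0) with sum_k p_k = P.
    b: effective channel gains of the users selected for the chunk;
    the water level mu is found by bisection."""
    lo, hi = 0.0, P + max(1.0 / g for g in b)      # mu is bracketed in [lo, hi]
    for _ in range(iters):
        mu = (lo + hi) / 2.0
        if sum(max(mu - 1.0 / g, 0.0) for g in b) > P:
            hi = mu
        else:
            lo = mu
    return [max(mu - 1.0 / g, 0.0) for g in b]

def avg_chunk_ber(snrs, level):
    """Average BER over a chunk's sub-carriers, per the model of Eq. (1):
    BER ~ 0.2 * exp(-1.6 * snr / (level - 1)) for square QAM."""
    return sum(0.2 * math.exp(-1.6 * g / (level - 1)) for g in snrs) / len(snrs)

def pick_level(snrs, ber_th=1e-3, levels=(64, 16, 4)):
    """Largest modulation level in {64, 16, 4} whose average chunk BER is
    below ber_th; 0 means no transmission on this chunk."""
    for m in levels:
        if avg_chunk_ber(snrs, m) < ber_th:
            return m
    return 0

powers = water_fill([2.0, 1.0, 0.25], P=10.0)  # stronger users get more power
level = pick_level([30.0, 28.0, 33.0, 25.0])   # linear per-sub-carrier SNRs
```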

II. SYSTEM MODEL

A. Channel model
We consider a single-cell multi-user OFDMA-based system with N sub-carriers served by one central base station (BS) equipped with T transmit antennas and K single-antenna users.

III. PROPOSED CHUNK-BASED ALLOCATION ALGORITHM

A sub-optimal chunk-based allocation algorithm is proposed as a low-complexity alternative to the optimal exhaustive-search solution. The algorithm is initialized with an empty user set for all available chunks. Chunks are allocated sequentially, one by one, and



during every chunk allocation, the group of users which did not yet achieve a predefined minimum rate forms the candidate set. Then, the user with the best channel condition over this chunk is selected for transmission over this chunk and removed from the set of candidate users.
In a subsequent step, additional user selection continues by searching, among the set of candidate users, for the users with minimum spatial correlation to the previously selected users and selecting the one which maximizes the cumulative rate over this chunk. User selection is terminated if the cumulative rate does not increase or the number of users selected so far becomes T. The algorithm terminates when all chunks are allocated.
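The per-chunk selection loop above can be sketched as follows; the spatial-correlation measure and the ZFBF rate model are simplified stand-ins for the paper's exact metrics:

```python
import numpy as np

def sum_rate_zf(H, sel, p=1.0):
    """ZFBF sum rate for the selected users with equal power p per user
    (an illustrative rate model, not the paper's exact one)."""
    G = np.linalg.pinv(H[sel])                     # T x |sel| zero-forcing precoder
    gains = 1.0 / np.linalg.norm(G, axis=0) ** 2   # effective channel gains
    return float(np.sum(np.log2(1.0 + p * gains)))

def select_users(H, T, candidates):
    """Greedy per-chunk user selection: seed with the best-channel candidate,
    then add the least spatially correlated candidate while the cumulative
    sum rate keeps growing and fewer than T users are selected."""
    cand = list(candidates)
    sel = [max(cand, key=lambda k: np.linalg.norm(H[k]))]
    cand.remove(sel[0])
    rate = sum_rate_zf(H, sel)
    while cand and len(sel) < T:
        def max_corr(k):                           # worst correlation to selected users
            return max(abs(np.vdot(H[k], H[s])) /
                       (np.linalg.norm(H[k]) * np.linalg.norm(H[s])) for s in sel)
        k = min(cand, key=max_corr)
        new_rate = sum_rate_zf(H, sel + [k])
        if new_rate <= rate:                       # cumulative rate stopped improving
            break
        sel.append(k)
        cand.remove(k)
        rate = new_rate
    return sel

H = np.random.default_rng(0).standard_normal((4, 2))  # K=4 users, T=2 antennas
chosen = select_users(H, T=2, candidates=[0, 1, 2, 3])
```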
IV. SIMULATION & RESULTS

A. Simulation Environment
We simulate the downlink of a multi-user MISO-OFDMA system with T = 4 antennas and a bandwidth of 100 MHz divided into N = 1024 sub-carriers, under a Rayleigh fading channel model with an exponential power decay profile (PDP). The number of users, K, is set to 10; the BER constraint, BERth, is set to 10^-3; and the minimum user rate per sub-carrier, Rmin/N, is set heuristically to 0.5. We compare our proposed algorithm with the algorithm in [2], which maximizes sum rate under power and average BER constraints only, the algorithm in [3], which maximizes sum rate under power, BER, and proportional rate constraints among users, and the round robin algorithm.

Fig. 1. Average sum rate per sub-carrier vs. SNR.

B. Sum rate performance
Fig. 1 shows the average sum rate per sub-carrier for the different algorithms against SNR in dB. Our proposed algorithm considers QoS by preserving a minimum rate for each user, so the sum rate at low SNR values is slightly lower than those of the reference algorithms in [2, 3]. At high SNR values, our proposed algorithm substantially outperforms the algorithm with proportional rate constraints in [3] and slightly exceeds the algorithm in [2].

Fig. 2 Fairness index vs. number of users.

sum rate and fairness compared to three different algorithms in


the literature.
REFERENCES

C. Fairness performance
To further show the effectiveness of our proposed algorithm in terms of fairness among users, Fig. 2 shows Jain's fairness index (FI) [5], defined as FI = (Sum_{k=1}^{K} R_k)^2 / (K Sum_{k=1}^{K} R_k^2),

[1] R. Prasad, OFDM for Wireless Communications Systems. Artech House, 2004.
[2] V. D. Papoutsis and A. P. Stamouli, "Chunk-based resource allocation in multicast MISO-OFDMA with average BER constraint," IEEE Communications Letters, vol. 17, no. 2, pp. 317-320, 2013.

against the number of users, K, for the different algorithms. It is clear from Fig. 2 that our proposed algorithm substantially outperforms the two reference algorithms in [2, 3], with almost stable performance as the number of users increases. Further simulation results reveal that the fairness index of the reference algorithm in [2] continues to decrease linearly as the number of deployed users increases, which hinders its efficiency.
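Jain's index used in Fig. 2 is simple to compute; a one-function sketch:

```python
def jain_fairness(rates):
    """Jain's fairness index: FI = (sum r_k)^2 / (K * sum r_k^2), 1/K <= FI <= 1."""
    K = len(rates)
    s = float(sum(rates))
    return s * s / (K * sum(r * r for r in rates))

print(jain_fairness([1.0, 1.0, 1.0, 1.0]))  # prints 1.0: equal rates are perfectly fair
```

The index equals 1 when all users receive the same rate and drops toward 1/K as allocations concentrate on a single user.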
[3] V. D. Papoutsis and S. A. Kotsopoulos, "Chunk-based resource allocation in distributed MISO-OFDMA systems with fairness guarantee," IEEE Communications Letters, vol. 15, no. 4, pp. 377-379, 2011.
[4] A. J. Goldsmith and S.-G. Chua, "Variable-rate variable-power M-QAM for fading channels," IEEE Transactions on Communications, vol. 45, no. 10, pp. 1218-1230, 1997.

V. CONCLUSION

We proposed a chunk-by-chunk allocation algorithm for the downlink transmission of multi-user MISO-OFDMA systems. The proposed algorithm maximizes sum rate under an average BER constraint. It also considers QoS by preserving a minimum user rate during chunk allocation. Our proposed algorithm provides an efficient trade-off between system

[5] R. Jain, D.-M. Chiu, and W. R. Hawe, A Quantitative Measure of Fairness and Discrimination for Resource Allocation in Shared Computer Systems. Eastern Research Laboratory, Digital Equipment Corporation, 1984.


Limited Feedback Interference Alignment Design for Heterogeneous Networks: Comparative Study
Mohamed Rihan, Maha Elsabrouty, Osamu Muta, Hiroshi Furukawa
The femtocell access points are BS1 and BS3; see the system model in [3] for details. Each femtocell BS (FBS) serves one user per cell and the macrocell BS (MBS) serves two users simultaneously. Each served user receives d independent data streams s_k^(i), i = 1, ..., d, along linearly independent precoding vectors V_k from its corresponding BS. As a typical antenna configuration for macrocell-femtocell scenarios [1], we assume that each FBS is equipped with M = 2 and the MBS with 2M = 4 transmit antennas, and all users have M = 2 receive antennas. Each user decodes its desired signal arriving from its corresponding BS by multiplying the received signal by the receive post-coding matrix W_k. The signal at user k is given by Eq. (1).

Abstract: In this paper, we propose a limited feedback-based interference alignment (IA) scheme suitable for two-tier macrocell-femtocell heterogeneous networks. First, an analytical expression for the total system sum-rate loss due to the employment of limited feedback channel versions is derived. Then, a comparative simulation study is conducted between two IA schemes employed in our proposed limited feedback system, namely, the Hierarchical IA (HIA) scheme and the Iterative Reweighted Least Squares (IRLS) IA scheme. Simulation results confirm the severe effect of CSI quantization on IA performance. Additionally, the obtained results show that the IRLS-based IA scheme is more robust to quantization errors than the HIA scheme.

I. INTRODUCTION

Heterogeneous networks (HetNets) are considered a promising technology for cellular networks to extend coverage and capacity [1]. However, the existence of HetNets is accompanied by large intercell interference (ICI). Many IA techniques have been proposed to recover from such ICI [1][2]. In [1], the authors proposed a Hierarchical IA (HIA) technique. In HIA, the transmit weights for the femtocell BSs (FBSs) are calculated first, followed by those of the macrocell BS (MBS). All the transmit weights can be calculated in closed form by separating the calculations of the FBSs and MBS. In [2], we proposed a downlink interference mitigation framework based on two algorithms, namely, the restricted water-filling (RWF) algorithm and the IRLS-based IA algorithm. This framework showed excellent performance in HetNet scenarios compared with other IA techniques. The RWF algorithm is responsible for maximizing the downlink sum rate of the MBS on a restricted number of eigenmodes, leaving the other eigenmodes for the operation of the accompanying shared-spectrum femtocells [2]. These femtocells coordinate their transmissions to lie in directions that are free from MBS transmissions. However, neither the achievable performance of the HIA technique nor that of the IRLS-based IA technique has been clarified in a limited feedback environment.
In this paper, we make a comparative study of both the HIA technique and the IRLS-IA technique to clarify their achievable performance in a limited feedback environment. This is accompanied by evaluating, in closed form, the upper bound of the sum-rate loss obtained with limited feedback systems.
II. SYSTEM MODEL

The received signal for the i-th stream of user k is

  y_k^(i) = (W_k^(i))^H sqrt(p_k/d_k) H_{k,f(k)} V_k^(i) s_k^(i)                                  [desired signal stream]
          + (W_k^(i))^H Sum_{j=1, j!=i}^{d_k} sqrt(p_k/d_k) H_{k,f(k)} V_k^(j) s_k^(j)            [intra-user interference]
          + (W_k^(i))^H Sum_{m=1, m!=k}^{4} Sum_{l=1}^{d_m} sqrt(p_m/d_m) H_{k,f(m)} V_m^(l) s_m^(l)   [inter-user interference]
          + (W_k^(i))^H n_k    (1)

III. PROPOSED LIMITED FEEDBACK IA FRAMEWORK


The proposed vector-quantization-based limited feedback IA framework is summarized in Algorithm 1. The quantization process used in this study is random vector quantization (RVQ) based on the minimum chordal distance (CD). This quantization process is based on Eq. (2), where k and j are as denoted in Algorithm 1:

  e_kj = min_{c_i in C} ( 1 - |h̄_kj^H c_i|^2 )    (2)

where h̄_kj is the normalized channel vector to be quantized and c_i is one element of the codebook C. Through the chordal distance (CD) minimization model [4], each channel unit vector h̄_kj is expressed as the sum of two vectors: one in the direction of the quantization, ĥ_kj, and the other, s_kj, isotropically distributed in the nullspace of the quantization. This is expressed mathematically as [5]:

  h̄_kj = sqrt(1 - e_kj) ĥ_kj + sqrt(e_kj) s_kj    (3)

We consider a macrocell-femtocell heterogeneous network with one macrocell served by a base station (BS), BS2, and two shared-spectrum femtocells, each served by one femtocell access point.

Based on mathematical manipulations similar to those in [3][4][5], the total sum-rate loss is

  RL_tot = E{ R_tot - R̃_tot } = Sum_{k=1}^{4} E{ RL_k },    (4)

where R_k^(i) and R̃_k^(i) are the perfect-CSI and limited-feedback sum rates for the i-th stream of the k-th user,


The total system sum rate with perfect CSI is

  R_tot = Sum_{k=1}^{4} Sum_{i=1}^{d_k} log2( 1 + [ (p_k/d_k) |(W_k^(i))^H H_{k,f(k)} V_k^(i)|^2 ] / [ 1 + Sum_{j=1,j!=i}^{d_k} (p_k/d_k) |(W_k^(i))^H H_{k,f(k)} V_k^(j)|^2 + Sum_{m=1,m!=k}^{4} Sum_{l=1}^{d_m} (p_m/d_m) |(W_k^(i))^H H_{k,f(m)} V_m^(l)|^2 ] )    (5)

Algorithm 1: Limited-Feedback Algorithm for HetNets
(1) Each user k uses its codebooks C_{k,f(k)}, with B_{k,f(k)} bits, to quantize its cross-link CSIs H_{k,f(k)} into the quantized versions H̃_{k,f(k)}.
(2) Each user k sends the vector indexes of all its cross channels obtained in step (1) to its corresponding BS f(k).
(3) Each BS receives the channel indexes from all its served users and, using the same codebooks C_{k,f(k)}, constructs the quantized versions of the channels H̃_{k,f(k)}.
(4) Each FBS forwards the quantized channels of its femtocell users to the MBS through the backhaul links. The MBS adds the quantized cross-channel CSIs of its macrocell users and forwards all the quantized channels to the IA design central unit.
(5) The IA design central unit uses the quantized CSIs forwarded by the MBS to evaluate the IA transceivers (W̃_i, Ṽ_i), using either the IRLS algorithm in [2] or the HIA algorithm in [1].
(6) The IA design central unit forwards all IA transceivers (W̃_i, Ṽ_i) to the MBS.
(7) Each user obtains its precoder Ṽ_i hierarchically through its BS and its receiver matrix W̃_i from the MBS through the forward control channels.

After collecting terms, the upper bound on the sum-rate loss takes the form

  RL_UB = log2(e) Sum_{k=1}^{4} Sum_{i=1}^{d_k} [ Sum_{j=1,j!=i}^{d_k} (p_k/d_k) (M^2/(M^2-1)) beta( M^2/(M^2-1), 2^{B_{k,f(k)}} ) + Sum_{m=1,m!=k}^{4} Sum_{l=1}^{d_m} (p_m/d_m) (M^2/(M^2-1)) beta( M^2/(M^2-1), 2^{B_{k,f(m)}} ) ]    (6)

Fig. 2. Effect of the bit resolution value (B bits) on the total sum-rate loss at different SNR values: (a) SNR = 35 dB, (b) SNR = 25 dB, (c) SNR = 15 dB; each panel compares the loss-rate upper bound with the average loss rates of HIA and IRLS.

levels (B) and at different signal-to-noise ratios (15 dB, 25 dB, and 35 dB). This is because the IRLS-IA scheme relies on an optimization problem that aims to maximize the sum rate.

V. CONCLUSION
A limited feedback IA framework for heterogeneous networks has been proposed. The proposed framework is employed together with both the IRLS and HIA schemes. An expression for the sum-rate loss upper bound for the proposed HetNet scenario is derived. A comparative simulation study of the proposed limited feedback framework is carried out based on both the IRLS-IA and HIA schemes. The simulation results show that the IRLS IA scheme is more robust to interference misalignment, caused by the quantization process, than the HIA scheme.

Fig. 1. Average sum-rate performance (bits/sec/Hz) with different bit resolution levels in comparison with the perfect-CSI case: HIA with perfect CSI, and HIA/IRLS enabling RVQ with B = 3, 7, and 15 bits.

respectively, and R_tot (the total system sum rate with perfect CSI), obtained from Eq. (1), is expressed in Eq. (5). RL_k is the sum-rate loss for user k.
After long mathematical manipulation, we obtain the upper bound on the downlink sum-rate loss for the assumed HetNet scenario as Eq. (6) (details of the derivation are omitted due to the page limitation).


where beta(a, b) is the Beta function for constants a and b, M represents the number of antennas, the channel magnitude terms |(W_k^(i))^H H_{k,f(m)} V_m^(l)|^2 are those appearing in Eq. (5), and B_{k,f(k)} is the number of feedback bits between user k and its FBS, indexed as f(k).
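The Beta-function term can be checked against the simpler exponential bound of [4]; a sketch (identifying the expected RVQ quantization error with 2^B beta(2^B, dim/(dim-1)) follows Jindal's finite-rate-feedback analysis and is our assumption here):

```python
import math

def expected_rvq_error(B, dim):
    """E[e] = 2^B * Beta(2^B, dim/(dim-1)) for RVQ of a dim-dimensional unit
    vector with a 2^B-entry codebook, computed via lgamma for stability."""
    a, b = 2.0 ** B, dim / (dim - 1.0)
    return a * math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

# The Beta-function form is upper-bounded by 2^(-B/(dim-1)), so the sum-rate
# loss decays exponentially in the feedback bits B (here dim = M^2 = 4):
for B in (3, 7, 15):
    assert expected_rvq_error(B, 4) <= 2.0 ** (-B / 3.0)
```

This is consistent with Fig. 2: each extra feedback bit shrinks the loss by a constant factor, so all three curves fall steadily as B grows.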

REFERENCES
[1] W. Shin, W. Noh, K. Jang, and H.-H. Choi, "Hierarchical Interference Alignment for Downlink Heterogeneous Networks," IEEE Trans. on Wireless Communications, vol. 11, no. 12, pp. 4549-4559, Oct. 2012.
[2] M. Rihan, M. Elsabrouty, O. Muta, and H. Furukawa, "Iterative Interference Alignment in Macrocell-Femtocell Networks: A Cognitive Radio Approach," IEEE International Symposium on Wireless Communication Systems (ISWCS), Barcelona, Spain, August 2014.
[3] M. Rihan, M. Elsabrouty, O. Muta, and H. Furukawa, "Interference Alignment with Limited Feedback for Macrocell-Femtocell Two-Tier Heterogeneous Networks," Technical Report of IEICE RCS, RCS 2014-178, vol. 114, no. 254, October 2014.
[4] N. Jindal, "MIMO Broadcast Channels With Finite-Rate Feedback," IEEE Trans. on Information Theory, vol. 52, no. 11, pp. 5045-5060, Nov. 2006.
[5] R. Bhagavatula and R. W. Heath, "Adaptive Bit Partitioning for Multicell Interference Nulling with Delayed Limited Feedback," IEEE Trans. on Signal Processing, vol. 59, no. 8, pp. 3824-3836.

IV. SIMULATION RESULTS

In this section, we evaluate the performance of the HetNet scenario described in Sect. II. In Fig. 1, the sum-rate performance of the limited feedback system based on either HIA (black curves) or IRLS-IA (red curves) is evaluated at different bit resolution levels (3, 7, and 15). It is obvious that the IRLS-IA is more robust to the interference misalignment caused by the quantization process. It is also clear that, as the number of feedback bits increases, the performance approaches the perfect-CSI case (blue curve).
In Fig. 2, it is obvious that the sum-rate loss due to limited feedback in both algorithms obeys the upper bound expression we derived. Additionally, it is clear that our IRLS-IA algorithm causes a lower sum-rate loss than HIA at any bit resolution


A Reduced Complexity K-best Sphere Decoding Algorithm for MIMO Channels
Ibrahim Al-Nahhal1 , Masoud Alghoniemy2 , Osamu Muta3 , Adel B. Abd El-Rahman1 , Hiroshi Furukawa4
1 Egypt-Japan Univ. of Science and Technology, Egypt ({ibrahim.al-nahhal},{adel.bedair}@ejust.edu.eg).
2 University of Alexandria, Egypt (alghoniemy@alexu.edu.eg).
3 Center for Japan-Egypt Cooperation in Science and Tech., Kyushu Univ., Fukuoka, Japan (muta@ait.kyushu-u.ac.jp).
4 Graduate School of Information Science and Electrical Eng., Kyushu Univ., Fukuoka, Japan (furuhiro@ait.kyushu-u.ac.jp)

Abstract: A variant of the K-best (KB) MIMO decoding algorithm is proposed, namely, the reduced complexity K-best (RCKB). The RCKB provides a significant complexity reduction, up to 51.7%, with performance reminiscent of the traditional KB in well-conditioned channels. The complexity reduction results from discarding irrelevant nodes in the tree whose distance metrics exceed a predetermined radius at each tree level. Complexity analysis and simulation results are presented.

III. PROPOSED ALGORITHM

Fixing the number of nodes that survive at each tree level may result in visiting unnecessary nodes. To see this, consider the following eight distance metrics at the third tree level shown in Fig. 1, D = [0.2 8 9 9 9 9 10 10]; the KB algorithm [4] with K = 2 will choose the two smallest values, [0.2 8]. It is unlikely that the surviving path will emanate from the node with metric eight, especially near the end of the tree. Hence, the number of surviving nodes at each tree level should be varied adaptively. In particular, the radius lambda_i should be modified as we traverse the tree; we provide a heuristic for determining the pruned radius lambda_i at a specific tree level i. In particular,

Index Terms: MIMO systems, K-best, sphere decoder.

I. INTRODUCTION
In multi-input multi-output (MIMO) communication systems, the traditional K-best sphere decoder (KB) memorizes the best K nodes at each level of the search tree [1]. The chosen K nodes include irrelevant nodes that increase the decoding complexity without performance improvement; by discarding these irrelevant nodes, one can decrease the complexity without compromising the performance.
In this paper, a variation of the KB decoder for MIMO systems is proposed, namely, the reduced complexity K-best (RCKB). The RCKB provides lower complexity than the traditional KB algorithm without sacrificing its performance in well-conditioned channels. The reduction in complexity comes from discarding irrelevant nodes at every tree level according to a threshold that varies from one tree level to another.

  lambda_i = i K^2 d_i^min / sqrt(10^{SNR/10}),    i = 2M, 2M-1, ..., 2,    (3)

  N_i^RCKB = min( card(S_i^RCKB), K ),    i = 2M-1, ..., 2,    (4)

where card(S_i^RCKB) is the cardinality of the set

  S_i^RCKB = { j | d_i^j < lambda_i,  j in {1, 2, ..., N_{i+1}^RCKB sqrt(q)} }.    (5)

where x is the M x 1 transmitted vector, y is the N x 1 received vector, and H is the N x M channel matrix whose elements h_nm represent the Rayleigh flat-fading gains from transmitter m to receiver n; Lambda is the lattice whose points represent all possible codewords at the transmitter.
The sphere decoder (SD) reduces the computational complexity by limiting the search space to a sphere of radius lambda centered at the received signal vector y. The estimated signal vector x̂ should satisfy the radius constraint ||y - Hx̂|| < lambda [2]. The SD transforms the closest-point search problem into a tree-search problem by factorizing the channel matrix, H = QR, where Q is an N x N unitary matrix and R is an upper triangular matrix of size N x M. Thus, (1) can be rewritten as

  x_SD = arg min_{x in Lambda'} ||ỹ - Rx||^2,    (2)

where d_i^min is the minimum distance metric at tree level i. It is clear that in this case the sphere radius lambda_i is not fixed; it varies depending on the tree level i and the operating SNR for a specific K value. Note that equation (3) has no proof, but it provides good results.
The RCKB algorithm avoids visiting unnecessary nodes in order to reduce the complexity without affecting the performance. To achieve this, only N_i^RCKB nodes survive at each tree level i, where N_i^RCKB is given by Eq. (4).

II. BACKGROUND
In an additive white Gaussian noise (AWGN) environment, the maximum likelihood (ML) decoder is the optimum decoder; the ML solution finds the symbol estimate x_ML that minimizes the 2-norm:

  x_ML = arg min_{x in Lambda} ||y - Hx||^2,    (1)

for i = 2M-1, 2M-2, ..., 2. In essence, card(S_i^RCKB) is the number of nodes at level i that have distance metrics smaller than the pruned radius lambda_i.
Figure 1 illustrates a numerical example of the RCKB algorithm for 16-QAM signaling with 2 x 2 MIMO and K = 2 at SNR = 5 dB; the RCKB algorithm starts at the second-highest tree level, i = 3. From Eq. (3), lambda_3 = 6.75 d_3^min = 1.35. Then, according to (4), the number of surviving nodes at this tree level is N_3^RCKB = min(1, 2) = 1. Similarly, for the next tree level i = 2, lambda_2 = 35.6. According to (4), the number of surviving nodes is N_2^RCKB = min(4, 2) = 2, corresponding to the nodes with distance metrics [8 9]. It is clear that unlikely nodes have been discarded without affecting the solution.
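The level-by-level pruning of the worked example can be replayed directly; a sketch assuming the pruned-radius rule lambda_i = i K^2 d_i^min / sqrt(10^(SNR/10)), which reproduces the paper's numbers (lambda_3 = 6.75 d_3^min at SNR = 5 dB, K = 2):

```python
def pruned_radius(i, K, d_min, snr_db):
    """Pruned radius lambda_i = i * K^2 * d_i_min / sqrt(10^(SNR/10));
    this form reproduces the worked example (lambda_3 = 6.75 * d_3_min
    at SNR = 5 dB, K = 2) and is our reading of the garbled Eq. (3)."""
    return i * K * K * d_min / (10.0 ** (snr_db / 10.0)) ** 0.5

def surviving_nodes(metrics, i, K, snr_db):
    """Eqs. (4)-(5): at most K nodes whose metrics fall below lambda_i survive."""
    lam = pruned_radius(i, K, min(metrics), snr_db)
    return min(sum(1 for d in metrics if d < lam), K)

D3 = [0.2, 8, 9, 9, 9, 9, 10, 10]                 # level-3 metrics from Fig. 1
print(surviving_nodes(D3, i=3, K=2, snr_db=5))    # prints 1 (plain KB would keep 2)
```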

where ỹ = Q^H y and Lambda' is the subset of the lattice that lies inside the sphere of radius lambda centered at the received signal vector y. The distance metrics d(x) = ||ỹ - Rx||^2 can be computed recursively using partial Euclidean distances [2]. Note that this paper uses the real-form representation mentioned in [3].

IV. COMPLEXITY ANALYSIS

In what follows, we determine the complexity of the traditional KB and RCKB algorithms. Complexity in this paper is defined as the average number of visited nodes (VNs) required to find the solution.


Fig. 1. Tree representation of KB and RCKB for 16-QAM 2 x 2 MIMO and K = 2 at SNR = 5 dB, showing node metrics, pruned nodes, the nodes saved by RCKB over KB, and the common KB/RCKB solution at the leaf level.

To determine the complexity of the KB algorithm, tree levels are divided into two groups. The first group contains the tree levels where the number of available nodes per level, N_i^KB, is at most K, whereas the second group contains the tree levels where the number of available nodes per level exceeds K (see Fig. 1 for K = 6). Note that each surviving node is expanded into sqrt(q) child nodes in the next tree level. Then, the number of VNs for the first group (the same as the number of available nodes in this group) is

  N^KB- = sqrt(q) Sum_{j=0}^{P_K} (sqrt(q))^j,  with (sqrt(q))^{P_K} <= K,    (6)

where P_K is the number of tree levels in the first group for a specific K. Given that (sqrt(q))^{P_K} <= K < (sqrt(q))^{P_K+1}, and knowing q and K, we can determine P_K [5]:

  P_K = floor( ln(K) / ln(sqrt(q)) ),    (7)

where floor(.) denotes the floor operation.


For the second
group, each tree level has a fixed number of VNs,
NiKB+ = K q nodes. Then, total number of VNs in the second
group

N KB+ = (2M PK 1)K q.

Using PK from (7), the complexity of the KB,


of VNs
KB
CK
=

KB
CK
,

(8)

is total number

1 ( q)PK +1
+ (2M PK 1)K ,

1 q

(9)

B. Complexity of the RCKB

According to (4), at most K nodes survive at each level, so the complexity of the RCKB is upper bounded by the complexity of the KB. The lower bound is achieved when the minimum number of surviving nodes, N_i^RCKB = 1, is considered at all tree levels of the second group. Hence,

  sqrt(q) [ (1 - (sqrt(q))^{P_K+1}) / (1 - sqrt(q)) + (2M - P_K - 1) ] <= C_K^RCKB <= C_K^KB.

The percentage gain in complexity can be defined as

Fig. 2. Performance and complexity comparison (BER and average visited nodes vs. SNR) of RCKB and KB for 16-QAM over 4 x 4 MIMO channels, for K = 2, 4, and 6.

TABLE I. PERCENTAGE COMPLEXITY GAIN FOR THE RCKB

A. Complexity of the KB

  C_K^gain = [ (C_K^KB - C_K^RCKB) / C_K^KB ] x 100%.    (10)

V. SIMULATION
The performance of the proposed decoder is compared to the KB decoder. It is assumed that the transmitted power is independent of the number of transmit antennas, M, and equal to the average symbol energy in a Rayleigh-fading well-conditioned channel. Figure 2 illustrates the performance and complexity of the RCKB for 16-QAM

                       K    C_K^KB    C_K^RCKB    C_gain^max
3 x 3 MIMO, 16-QAM     2      44         32         27.2%
                       4      84         48         42.9%
                       6     116         56         51.7%
4 x 4 MIMO, 16-QAM     2      60         44         26.7%
                       4     116         68         41.4%
                       6     164         84         48.8%
4 x 4 MIMO, 64-QAM     2     120         88         26.7%
                       4     232        136         41.4%
                       6     344        184         46.5%

4 x 4 MIMO system, compared to the KB algorithm for different K values. As shown in Fig. 2, the RCKB decoder has performance identical to the traditional KB, with complexity savings of up to 48.8% in the case of K = 6 at higher SNR. Table I summarizes the complexity gains of the RCKB over the KB, for different constellation sizes and MIMO systems, using Eqs. (9) and (10) as well as simulation results not shown in this paper.

VI. CONCLUSIONS
We have proposed a modified K-best sphere decoding algorithm, namely, the reduced complexity K-best (RCKB). The RCKB achieves a complexity reduction compared to the KB without sacrificing its performance. We have provided a complexity analysis for the proposed and traditional algorithms. Simulation results have confirmed the improvement of the proposed decoder.

REFERENCES
[1] Z. Guo and P. Nilsson, "Algorithm and implementation of the K-best sphere decoding for MIMO detection," IEEE Journal on Selected Areas in Communications, pp. 491-503, 2006.
[2] Y. Hsuan Wu, Y. Ting Liu, H. Chang, Y. Liao, and H. Chang, "Early-pruned K-best sphere decoding algorithm based on radius constraints," ICC, pp. 4496-4500, 2008.
[3] M. O. Damen, H. El Gamal, and G. Caire, "On maximum-likelihood detection and the search for the closest lattice point," IEEE Transactions on Information Theory, pp. 2389-2402, 2003.
[4] R. Shariat-Yazdi and T. Kwasniewski, "Configurable K-best MIMO detector architecture," ISCCSP, pp. 1565-1569, 2008.
[5] R. Graham, D. Knuth, and O. Patashnik, Concrete Mathematics. Addison-Wesley, 1989.


MTS: Multicasting Tabu Search Mechanism with Near-Optimum Multicast Tree on OpenFlow
Alaa Allakany 1, Koji Okamura 2
1 Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan.
1 Faculty of Science, Kafrelsheik University, Egypt. Alaa_83moh@yahoo.com
2 Research Institute for Information Technology, Kyushu University, Japan.
The controller has a global view of the current status of the network and can interact with its network devices. All multicast management tasks, such as multicast tree computation and group management, are handled by this controller; since the controller has complete knowledge of the topology and the members of each group, it can create more efficient multicast trees than the distributed approach [3].

Abstract: Many multimedia applications, including video conferencing, content distribution, and multi-party games, require multipoint communication in order to reduce network traffic rates. Multicast has been used as an efficient and scalable technology for multimedia distribution. However, in the IP multicast architecture, the routers need to be involved in both forwarding and management, which causes some limitations, such as high bandwidth consumption, long latency, redundant tree calculation, and complex control and management of group communication. With the appearance of Software Defined Networking (SDN), represented by the OpenFlow technique, network control and management become accessible to remote network administrators. The centralized controller is a core part of our proposed multicasting approach: it is responsible for handling member requests, constructing the multicast tree, and maintaining multicast state, while the routers just forward packets. In this paper we propose a Multicasting Tabu Search approach (MTS) based on the tabu search algorithm, which constructs a near-optimum multicast tree and rapidly joins new members to the current tree.

Various multicast mechanisms and algorithms have been proposed. The authors in [4] provide a mechanism to compute multicast trees centrally by flooding group membership information to all multicast routers. MOSPF has a scalability problem: all routers have to compute a multicast tree per multicast group whenever a new multicast group appears or receivers join or leave multicast groups. The authors in [5] suggested high-level primitives (an API) based on OpenFlow to provide more friendly development of multicasting networks. These primitives offer a simplified implementation of OpenFlow multipoint forwarding, but do not consider changes in multicast groups. The paper [6] proposes a clean-slate multicast approach, logically centralized based on SDN, with in-advance processing of all routes from each possible source. Its authors aim to reduce event delays from the source to each destination but do not consider minimizing the total number of edges in the constructed multicast tree. Most multicast tree construction algorithms, namely heuristic algorithms, assume centralized computation; applying them in IP multicasting is therefore inefficient, because it is a distributed system, but OpenFlow enables us to use these algorithms thanks to its centralized control and programmability. The advantage of our proposed approach is the construction of a near-optimum multicast tree and rapid processing of events (any member joining or leaving the multicast group).

Keywords: Multicast Routing, SDN, OpenFlow.

I. INTRODUCTION

IP multicast is a distribution paradigm that sends IP packets to multiple receivers in a single transmission, in a one-to-many or many-to-many fashion. This reduces source server load and increases network capacity savings. IP multicast supports a variety of applications such as IPTV streaming, video conferencing, multi-location backups and online multiplayer gaming [1]. IP multicasting still faces some problems. Firstly, traditional multicast routing algorithms require routers to participate in data forwarding and control management. In addition, the multicast routers need to maintain per-group state, which introduces a lot of control overhead and adds substantial complexity to routers. Secondly, routers construct and update the multicast tree in a distributed manner; each router has only local or partial information on the network topology and group membership, and a high number of messages must be exchanged between neighbouring routers to update their multicast trees every time a client joins or leaves a multicast group. It is therefore difficult to build an efficient multicast tree, due to the lack of global information.

II. DESIGN AND IMPLEMENTATION OF AN OPENFLOW CONTROLLER FOR MULTICASTING

A. MTS Architecture Based on OpenFlow

Figure 1 shows the system architecture of the proposed MTS. There are two main components: the controller and the OpenFlow switches (forwarders). The controller manages the multicast group state, constructs the multicast tree and handles the host requests sent from the forwarders, while the forwarders only need to receive instructions from the controller and forward data. In the next subsections we discuss in detail the functions of the modules in the OpenFlow controller.

Recently, SDN has been presented as a networking approach that facilitates decoupling the control plane from the data plane using a remote controller. The OpenFlow protocol [2] defines the communication between OpenFlow switches and the controller of the network.

C. Rapid Update of the Multicast Tree

It is inefficient to recalculate multicast trees whenever receivers join or leave multicast groups. For efficiency, this module caches backup paths covering all switches in the network. Figure 3 shows an example of how this module rapidly updates the multicast tree when a new member joins at switch number 8; in our method we cache 3 backup paths for each destination. Using these backup paths we can quickly find three neighbor multicast trees for the current tree, and the one with minimum cost (the second multicast tree, with cost 15) is chosen for joining the new member.
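The join procedure described above can be sketched in a few lines. This is our illustration, not the authors' implementation; the graph, link costs and paths below are invented for the example.

```python
# Sketch: when a new member joins, extend the current tree with one of
# the k cached backup paths and keep the cheapest resulting tree.

def cheapest_join(tree_edges, backup_paths, cost):
    """Return (total_cost, edge_set) of the minimum-cost candidate tree.

    tree_edges   -- set of edges (u, v) in the current multicast tree
    backup_paths -- cached paths (lists of nodes) ending at the new member
    cost         -- dict mapping an undirected edge (u, v) to its link cost
    """
    def edge(u, v):
        return (u, v) if u < v else (v, u)

    best = None
    for path in backup_paths:
        # Candidate tree = current tree plus the edges of this backup path.
        candidate = set(tree_edges)
        candidate.update(edge(u, v) for u, v in zip(path, path[1:]))
        total = sum(cost[e] for e in candidate)
        if best is None or total < best[0]:
            best = (total, candidate)
    return best

# Hypothetical example: 3 cached backup paths toward joining switch 8.
cost = {(1, 2): 4, (2, 8): 5, (1, 3): 2, (3, 8): 9, (1, 8): 20}
tree = {(1, 2), (1, 3)}
paths = [[2, 8], [3, 8], [1, 8]]
total, new_tree = cheapest_join(tree, paths, cost)
```

Because only the cached paths are examined, the join is a constant-size comparison rather than a full tree recomputation.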

Figure 1. An overview of the design of OpenFlow Controller

1- Network information module: This module collects topology information, obtains and stores sender and receiver information (including the locations of devices), and provides this information to the multicast tree construction and update modules.
2- Multicast tree construction module: When the controller receives a new multicast group notification, the network information module first processes the messages to obtain sender and receiver information, then notifies the multicast tree computation module to construct the minimum multicast tree. In this step we use the tabu search algorithm as the heuristic for constructing the multicast tree, as described in section B.
3- Update multicast tree module: Whenever the controller receives a join or leave message for a multicast group, this module rapidly updates the multicast tree, as described in section C.

Figure 3. Example showing how the multicast tree update module works

B. The Proposed Multicast Algorithm

For constructing the multicast tree we use the tabu search (TS) heuristic, which searches for an optimum solution among neighbor solutions. The advantage of TS is its capability of escaping from local optima. Figure 2 shows the TS steps for constructing a near-optimum multicast tree.
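The mechanism can be sketched as a generic tabu-search skeleton (our illustration, not the authors' code): it evaluates neighbor solutions, forbids recently made moves via a tabu list, and keeps the best solution found so far, with an aspiration criterion that overrides the tabu status of a move that beats the global best. Here a "solution" is the set of extra branch nodes used to span the group, and the cost function is a toy stand-in for the real multicast-tree cost.

```python
from collections import deque

def tabu_search(nodes, tree_cost, iters=50, tenure=5):
    current = frozenset()              # start: no extra branch nodes
    best, best_cost = current, tree_cost(current)
    tabu = deque(maxlen=tenure)        # recently toggled nodes
    for _ in range(iters):
        neighbors = []
        for n in nodes:                # neighborhood: toggle one node
            cand = current ^ {n}
            c = tree_cost(cand)
            # aspiration: take a tabu move only if it beats the best
            if n in tabu and c >= best_cost:
                continue
            neighbors.append((c, n, cand))
        if not neighbors:
            break
        c, n, current = min(neighbors)  # best admissible neighbor move
        tabu.append(n)
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost

# Toy usage: pretend the optimal branch-node set is {2, 4}.
cost = lambda s: len(s ^ {2, 4})
best, best_cost = tabu_search(range(1, 6), cost)
```

The tabu list lets the search walk through cost-increasing moves, which is exactly the escape-from-local-optima property the text attributes to TS.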

III. ESTIMATED RESULTS

The NOX controller will be used for our implementation due to its scalability, and Mininet will be used to build the network topology. We have two parameters for evaluation and comparison: the time needed to join a new host to the current multicast tree, and the construction of a near-optimum multicast tree.
Figure 2. TS flowchart and the proposed TS for near-optimum multicast tree

REFERENCES
[1] H. Holbrook, B. Cain, and B. Haberman, "Using Internet Group Management Protocol Version 3 (IGMPv3) and Multicast Listener Discovery Protocol Version 2 (MLDv2) for Source-Specific Multicast," RFC 4604, Aug. 2006.
[2] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, "OpenFlow: enabling innovation in campus networks," ACM SIGCOMM Computer Communication Review, 38(2):69-74, 2008.
[3] C. Marcondes, T. Santos, A. Godoy, C. Viel, and C. Teixeira, "CastFlow: Clean-slate multicast approach using in-advance path processing in programmable networks," IEEE Symposium on Computers and Communications (ISCC), Jul. 2012.
[4] J. Moy, "Multicast Extensions to OSPF," RFC 1584 (Historic), Internet Engineering Task Force, Mar. 1994. [Online]. Available: http://www.ietf.org/rfc/rfc1584.txt
[5] K.-K. Yap, T.-Y. Huang, B. Dodson, M. S. Lam, and N. McKeown, "Towards Software-Friendly Networks," in Proc. ACM Asia-Pacific Workshop on Systems (APSys), New York, 2010, pp. 49-54.
[6] C. A. C. Teixeira et al., "CastFlow: Clean-slate multicast approach using in-advance path processing in programmable networks," 2012 IEEE Symposium on Computers and Communications (ISCC), pp. 94-101, doi:10.1109/ISCC.2012.6249274.


Modeling the Impact of Clustering on the Lifetime of Wireless Sensor Networks

Farhad Mehdipour1, Mirza Ferdous Rahman2, Kazuaki J. Murakami2

1 EJUST Center, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
2 Department of Advanced Information Technology, Kyushu University, Fukuoka, Japan
Email: farhad@ejust.kyushu-u.ac.jp, {ferdous, murakami}@soc.ait.kyushu-u.ac.jp


I. INTRODUCTION

The amount of energy consumed by sensor nodes in a wireless sensor network (WSN) may vary widely. When the energy level of a sensor node falls below a certain threshold, the node dies. Consequently, the death of sensor nodes may shorten the lifetime of the WSN. Dividing the network into smaller partitions, each including a cluster head and a number of sensor nodes, may alleviate the problem and enhance the network lifetime. Unlike the traditional case, where every sensor transmits data directly to the destination, in a clustered network the data is transmitted by the cluster heads via several hops to the base station. This saves energy for the network. However, to reduce the node death rate, it may be necessary to re-cluster the network under certain conditions.

Fig.1. A WSN including clusters, where data is transmitted from sensors (black
nodes) via cluster heads (red nodes) to the base station

II. THE PROPOSED MODEL

In this section, the energy consumption of a WSN without and with clustering is formulated, taking various parameters into account. The terminology used in the following equations is:
Et: the total energy of the WSN, i.e. the available energy in the network components (sensor nodes).
Er: the amount of energy consumed during normal operation of the network (data transmission, data processing, etc.).
Ec: the amount of energy consumed during the clustering operation, such as the communication and negotiation phases for constructing clusters.
Pr: the power consumed for normal operation of the network.
Pc: the power consumed for the clustering operation.
tlf: the total lifetime of the WSN.
tc: the time spent for clustering.
nc: the total number of clusterings during the network lifetime.
α: the amount of improvement in the power consumption of the network after each clustering operation.
In a WSN equipped with clustering, the energy consumption comprises two major components: energy consumed during normal operation (Er) and during clustering (Ec):

Clustering is performed autonomously by the sensor nodes, and the cluster head and cluster members are decided based on negotiations among them. Many research works have addressed the clustering problem. According to the literature [1-3], a WSN with a clustering facility is more energy-efficient than a counterpart WSN lacking it. Various clustering techniques have been proposed, such as LEACH, EDACH and DHEED, that achieve noticeable improvements in network lifetime [1-3]. But previous work gives no recommendation on how frequently clustering should be done for the highest energy efficacy.
Clustering enhances the network lifetime by improving the way energy is consumed throughout the network. However, it consumes energy as well as time, so very frequent clustering may negatively impact the network lifetime. In this paper we formulate this problem and derive equations from which we can obtain the optimal number of clusterings that maximizes the network lifetime.

Et - nc × Ec = Er  (1)

In the case that clustering is performed once, in the initial phase of network operation, Eq. 2 expresses the energy model:

(Et - Pc × tc) × (1 + α) = Pr × tlf  (2)



Table 1: Parameters and their values used in the experiments

Total energy of WSN (Et): 1000 Joules
Number of clusterings (nc): from 1 to 100
Energy improvement factor (α): from 0.0 to 0.9
Clustering time (tc): 1 sec
Ratio of regular-operation power to clustering power (Pr/Pc): from ≈1 to ≫1

The equation can be extended to the following general form when clustering is run nc times:

[Et - (nc × Pc × tc)] × (Σ_{i=0}^{nc} α^i) = Pr × tlf  (4)

Accordingly, tlf can be represented as:

tlf = [Et - (nc × Pc × tc)] × (Σ_{i=0}^{nc} α^i) / Pr  (5)
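Eq. 5 is easy to evaluate numerically. The following is our own sketch, not the authors' MATLAB script; Et and tc follow Table 1, while Pr = Pc = 1 W and α = 0.5 are arbitrary choices for illustration.

```python
def lifetime(Et, nc, Pc, tc, alpha, Pr):
    """t_lf = [Et - nc*Pc*tc] * (sum of alpha**i for i = 0..nc) / Pr  (Eq. 5)."""
    geometric = sum(alpha ** i for i in range(nc + 1))
    return (Et - nc * Pc * tc) * geometric / Pr

# Assumed values: Et and tc from Table 1; Pr = Pc = 1 W is our choice.
Et, Pc, tc, Pr, alpha = 1000.0, 1.0, 1.0, 1.0, 0.5
curve = [lifetime(Et, nc, Pc, tc, alpha, Pr) for nc in range(1, 101)]
best_nc = 1 + max(range(len(curve)), key=curve.__getitem__)
```

For α = 0.5 the curve first rises, peaks at a small nc, and then declines, which is exactly the tradeoff between the gain of re-clustering and its energy overhead that the model predicts.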


Referring to Eq. 5, clustering is desirable for increasing the network lifetime, but it consumes some amount of network energy. Therefore, a tradeoff is needed between network lifetime and the number of clusterings. The vital question is: how frequent a clustering is most beneficial for the network? We have examined the effect of the various parameters on the network lifetime using the above model. Our study shows that the lifetime is optimal for a certain number of clusterings, and that further clustering degrades the network lifetime. The result of the analysis is explained in the next section.


Fig. 2. WSN lifetime vs. the number of clusterings, for α ranging from 0.0 to 0.9

III. EXPERIMENTAL RESULTS

We have run our experiments in MATLAB based on the parameters listed in Table 1. We assume that the total energy of the network is the sum of the available energy in all sensor nodes. The energy efficiency is improved by the factor α, which varies between 0 and 0.9 in our experiments, though very high values of α may not be realistic. In addition, we consider a variable range for the ratio of the power consumption of regular network operation to that of clustering. The ratio ranges from almost equal (Pr/Pc ≈ 1) to much higher values, where regular operation consumes considerably higher power than clustering (Pr/Pc ≫ 1).


where 1 + α represents the coefficient for the enhanced power efficiency of the network due to clustering. Further, in the case that the WSN is clustered two times during network operation:

(Et - 2 × Pc × tc) × (1 + α + α²) = Pr × tlf  (3)


In principle, the network lifetime is expected to increase when the energy efficiency factor α rises. This can be observed in Fig. 2 as α increases from 0 to larger values. The network lifetime increases if clustering is performed a certain number of times, but starts declining afterward. This is the behavior we expected to observe in the network: when clustering is performed too frequently, the overhead of the clustering operation degrades the lifetime. The optimal number of clusterings is shown for α = 0.9, 0.8 and 0.7 in Fig. 2.


Fig. 3. WSN lifetime vs. the number of clusterings for different α values, with Pr/Pc ≈ 1 and Pr/Pc ≫ 1

IV. CONCLUSION

Clustering a WSN can significantly improve the network's energy efficiency and lifetime. However, due to the energy overhead of clustering, it is essential to determine an optimal clustering frequency depending on the network condition. We consider a regular time interval for the clustering operation, though it may be necessary to re-cluster the network under certain conditions to avoid a high rate of sensor-node death.

Fig. 3 shows the case where we choose three different values of α as the low, medium and high energy efficiency factors. We also change the ratio Pr/Pc from ≈1 (almost equal power values) to much higher values (Pr/Pc ≫ 1), which indicate a negligible power consumption for the clustering operation. The two cases are depicted in Fig. 3 by dashed and solid lines of the same color, respectively. According to Fig. 3, the higher α is, the longer the network lifetime. More importantly, the power consumption of the clustering operation impacts the lifetime of the network substantially. As long as the power consumption of clustering is comparatively negligible, the lifetime is significantly prolonged. But when clustering consumes almost as much power as regular operation, the lifetime declines dramatically, even for higher energy efficiencies, i.e. higher values of α.

REFERENCES
[1] W. Heinzelman, A. Chandrakasan and H. Balakrishnan, "Energy-Efficient Communication Protocol for Wireless Microsensor Networks," Proceedings of the 33rd Hawaii International Conference on System Sciences (HICSS '00), 2000.
[2] Kyung Tae Kim and Hee Yong Youn, "Energy-Driven Adaptive Clustering Hierarchy (EDACH) for Wireless Sensor Networks," EUC Workshops, LNCS 3823, pp. 1098-1107, 2005.
[3] Manju Bala and Lalit Awasthi, "Proficient DHEED Protocol for Maximizing the Lifetime of WSN and Comparative Performance Investigations with Various Deployment Strategies," International Journal of Advance Science and Technology, Vol. 45, August 2012.


Implementing a High-Throughput, Configurable and Parameterizable Packet-Filtering Firewall on FPGA

Mostafa Safaie1, Hamid Noori2, Farhad Mehdipour3

1 Electrical Engineering Department, Engineering Faculty, Ferdowsi University of Mashhad, Iran, msafaie@stu.um.ac.ir
2 Computer Engineering Department, Engineering Faculty, Ferdowsi University of Mashhad, Iran, hnoori@um.ac.ir
3 E-JUST Center, Graduate School of Information Science and Electronics Engineering, Kyushu University, Japan

The proposed firewall inspects data packets using four header fields: source IP address, destination IP address, source port and destination port. However, it is easily extendable to other fields as well.


B. Hardware Modules
The proposed firewall is composed of three main modules. 1) The IPv4 controller: the heart of the design; it extracts the required fields from the Ethernet packets and forwards them to the memory controller. 2) The memory controller: a small finite state machine (FSM) that detects the type of data on the input port (IP, port, or neither) and forwards the data to the memory modules. 3) The memory modules: the IP TCAM and Port TCAM modules store the firewall rule tables and determine whether the extracted fields match the rules within a single clock cycle. A TCAM allows a third matching element, "X" or "don't care," besides "1" and "0," which adds flexibility to the search. For example, by placing Xs in a few least significant bits of an IP address, we can make the packet classification process match hundreds of IP addresses using a single memory address.
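TCAM-style ternary matching is easy to model in software. The sketch below is illustrative only (a TCAM does all comparisons in parallel in one cycle; this loop is sequential), and the example rule is invented.

```python
# Each rule stores a value and a mask; mask bits cleared to 0 act as
# "don't care" (X) positions, matching both 0 and 1 in the key.

def ternary_match(key, rules):
    """Return the index of the first rule whose (value, mask) matches key."""
    for i, (value, mask) in enumerate(rules):
        if key & mask == value & mask:
            return i
    return None

def ip(a, b, c, d):
    """Pack a dotted-quad IPv4 address into a 32-bit integer."""
    return (a << 24) | (b << 16) | (c << 8) | d

# Hypothetical rule: Xs in the low 8 bits match all of 192.168.1.0/24.
rules = [(ip(192, 168, 1, 0), 0xFFFFFF00)]
```

Masking out the 8 least significant bits is exactly the "hundreds of IP addresses from a single memory address" case mentioned above (256 addresses per entry).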

Keywords: network firewall, FPGA, TCAM.

I. INTRODUCTION

Security is one of the most significant aspects of data communication, and modern network applications demand high-throughput operation on data traffic. Firewalls monitor the data traffic passing through them to determine whether a packet coming from an external network, or going out from the internal network, is allowed or denied. This decision is made based on the firewall's security policy, i.e. a user-predefined set of rules [1].
Packet classification is the main functionality of a firewall. It is accomplished by comparing the firewall rules to the information extracted from different fields of the Ethernet packet headers [2].


An important issue for a firewall is how to configure the rules. There are different methods, each with advantages and disadvantages in terms of area, speed, simplicity and security. Below, we describe the two techniques employed in our system:

In this paper we propose a high-throughput, configurable and parameterizable packet-filtering firewall on FPGA, in which the classification rules are stored in TCAMs. The need for high throughput leads us to use TCAMs despite their inefficiency in area and power [3]. Our proposed design is modular and can easily be extended to various protocols. The hardware is described in VHDL and implemented on Cyclone IV-E Altera FPGAs. It has been fully evaluated using the Marvell 88E1111 Ethernet PHY chip embedded on the Altera DE2-115 board.
Our current implementation supports both TCP and UDP protocols for the incoming packets.

Abstract— Security is a growing concern brought about by the explosive development of computer networks. Firewalls are building blocks forming the first line of defense in a network security architecture. We have implemented a firewall on a Field-Programmable Gate Array (FPGA). The firewall takes advantage of Ternary Content-Addressable Memories (TCAMs) to store the rule tables. Two different methods to update the firewall rules in the TCAM modules are presented and verified. The proposed firewall achieves an operating frequency of 185 MHz and is able to filter up to 52 100-Mbps Ethernet channels for the worst-case packet size, while utilizing only 1% of the available hardware resources of an Altera Cyclone IV FPGA.

1. Predefined value: This is one of the methods examined in our firewall and in other publications, such as [5]. In this method, when the system powers up, an FSM initializes the TCAMs. Although this technique requires very few hardware resources, it provides no reconfigurability.
2. Configure-through-packets: This is a new method that we propose and evaluate in our system. We configure the firewall rule tables via packets passing through the firewall: the administrator of the private network builds a specific Ethernet frame and sends it to the firewall. The firewall detects this frame, extracts some predefined fields from it and treats them as new firewall rules. The major concern is to find a unique frame structure that secures the firewall configuration against any accidental rule table alterations. We chose a unique combination of several packet header fields; this combination must not occur in regular packets on the line. It is achieved by assigning unusual

II. ARCHITECTURE OF THE PROPOSED HARDWARE

A. Assumptions
There are many encapsulation protocols for the payload of an Ethernet frame. These protocols are usually distinguished by a two-byte field in the Ethernet frame header called EtherType. In the current implementation, we assume that all frames are IPv4. The Protocol field of the IPv4 packet header declares the encapsulation protocol of the IPv4 packet payload (e.g., TCP, UDP, and ICMP).



values, such as Version = 5 and TTL = 1. The firewall compares these specific fields against the predefined values, then extracts the new rules and loads them into the TCAM modules. This method offers both low resource usage and a highly configurable security policy, although it might introduce a vulnerability if not properly established. However, we believe that defining a sufficiently distinctive data packet reduces the chance of unintended reconfiguration.
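The configure-through-packets check can be sketched as follows. This is our illustration, not the paper's VHDL: the field combination uses Version = 5 and TTL = 1 as described above, while the reserved protocol number and the payload layout are assumptions made for the example.

```python
def is_config_frame(version, ttl, protocol):
    # Unusual combination that should not occur in regular traffic:
    # IPv4 Version field forced to 5 and TTL of 1 (as in the text),
    # plus a reserved protocol number (0xFD) assumed for illustration.
    return version == 5 and ttl == 1 and protocol == 0xFD

def extract_rule(payload):
    """Interpret the first 12 payload bytes as (src_ip, dst_ip, sport, dport)."""
    src = int.from_bytes(payload[0:4], "big")
    dst = int.from_bytes(payload[4:8], "big")
    sport = int.from_bytes(payload[8:10], "big")
    dport = int.from_bytes(payload[10:12], "big")
    return src, dst, sport, dport

rule = extract_rule(bytes([10, 0, 0, 1, 10, 0, 0, 2, 0, 80, 1, 187]))
```

Only frames passing `is_config_frame` would ever reach `extract_rule`, which mirrors the gating role of the unusual header combination in the proposed hardware.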
III. EXPERIMENTAL RESULTS

A. Device Utilization
The size of the rule tables plays an important role in resource utilization. Larger TCAMs require more logic elements (LEs) and memory blocks (Fig. 1). Increasing the number of rules from 128 to 1024 causes the operating frequency to decrease from 185 MHz to 165 MHz.

Figure 2. Operating Frequencies for Different FPGA Families

network firewall adds only 1.5% to the processing time, and is 1.3 to 2.6 times faster than the system presented in [4]. We achieved a 185 MHz operating frequency for the implemented system with 128 rules. We consider the worst-case frame length to be 84 bytes. With these assumptions, the proposed firewall can filter up to 52 100-Mbps Ethernet channels (185 MB/s × (84 B / 24 B) × 8 b ≈ 52 × 100 Mbps).
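One way to read that figure (our arithmetic, reconstructing the bracketed expression): at 185 MHz the firewall consumes one byte per cycle (185 MB/s), only the first 24 B of each worst-case 84 B frame need inspection, so the filtered line rate scales up by 84/24.

```python
header_rate_Bps = 185e6           # one byte per cycle at 185 MHz
frame_bytes, inspected_bytes = 84, 24
line_rate_bps = header_rate_Bps * (frame_bytes / inspected_bytes) * 8
channels = line_rate_bps / 100e6  # equivalent number of 100 Mbps channels
```

This yields roughly 5.2 Gb/s, i.e. about 52 channels of 100 Mbps, consistent with the abstract's claim.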

We also synthesized our firewall system with each TCAM containing 256 entries on a Stratix III chip. According to the results, the proposed design uses 2174 LEs (less than 1% of the available resources), whereas the firewall system presented in [6], containing 224 rules, utilizes around 2400 LEs. The firewall presented in [4] utilizes 11% of the hardware resources with only 16 rules on the same FPGA chip. Our proposed hardware uses fewer hardware resources while offering larger rule tables.

Some works, like [6], use embedded memories instead of TCAMs to store the firewall rules, which requires the design and implementation of a search algorithm over the rules. If we assume an average of 112 clock cycles to search a memory containing 224 rules, the abovementioned work could support up to 3 100-Mbps Ethernet channels, even though its target FPGA is Stratix III technology, which is more advanced than Cyclone IV-E. This huge gap between our hardware and the one presented in [6] is due to employing TCAMs in our hardware.

B. Throughput
Due to the fixed number of clock cycles required to classify a packet, a higher operating frequency enables the firewall to process more packets per unit time and thus offer higher throughput. Fig. 2 depicts the maximum operating frequencies for different TCAM sizes in the three mentioned families of Altera FPGA chips. Generally, the operating frequency decreases for larger rule tables, but in some cases the synthesis results show a rise in frequency, which might be a consequence of the optimization algorithms applied by the synthesis tool.

ACKNOWLEDGMENT
We thank Mr. Mahmoud Fathi of Ferdowsi University of Mashhad for his noteworthy assistance in configuring the Ethernet PHY chips, and we also thank the Laboratory of Embedded Systems of Ferdowsi University of Mashhad for providing the Altera FPGA boards and tools.

For a single 100 Mbps Ethernet channel, a maximum-length frame of 1538 bytes corresponds to 123.04 µs of processing time. The proposed firewall, with an operating frequency of 185 MHz, needs 1.92 µs to classify the frame. Therefore, the proposed

Figure 1. Required LEs for different FPGA families.

REFERENCES

[1] M. J. Ranum, "Thinking about firewalls," Proceedings of the Second World Conference on System Management and Security, 1994.
[2] G. S. Jedhe, A. Ramamoorthy and K. Varghese, "A scalable high throughput firewall in FPGA," 16th International Symposium on Field-Programmable Custom Computing Machines, California, 2008.
[3] J. Lunteren and T. Engbersen, "Fast and scalable packet classification," IEEE Journal on Selected Areas in Communications, Vol. 21, Issue 4, pp. 560-571, 2003.
[4] R. Ajami and A. Dinh, "Design a hardware network firewall on FPGA," 24th Canadian Conference on Electrical and Computer Engineering (CCECE), Canada, 2011.
[5] A. A. McEwan and J. Saul, "A high speed reconfigurable firewall based on parameterizable FPGA-based content addressable memories," The Journal of Supercomputing, Vol. 19, Issue 1, pp. 93-103, 2001.
[6] S. Ezzati, H. R. Naji, A. Chegini and P. Habibmehr, "A new method of hardware firewall implementation on SOC," International Conference for Internet Technology and Secured Transactions (ICITST), London, 2010.


An Alternative Digital Forensic Investigation Steps for Cloud Investigation Processes

Vinesha Selvarajah1,3, Mueen Uddin2, Shinichi Matsumoto1,3, Junpei Kawamoto1,3, Kouichi Sakurai1,3

1 Faculty of Information Science and Electrical Engineering, Kyushu University, Kyushu, Japan; vinesh@itslab.inf.kyushu-u.ac.jp, kawamoto@inf.kyushu-u.ac.jp, sakurai@csce.kyushu-u.ac.jp
2 Department of Computer Science, University Malaysia Pahang, Pahang, Malaysia; mueenmalik9516@gmail.com
3 Institute of Systems, Information Technologies and Nanotechnologies (ISIT), Kyushu, Japan; smatsumoto@isit.or.jp

Abstract— Cloud computing marks a new era in the digital world, offering an unlimited storage service both privately and publicly. This, however, has drawn issues such as threats and attacks against cloud storage servers. The geographically distant locations of client and server have a great impact on the forensic field, in particular on the ability to perform investigations on the server from the client side. In this paper, a standard procedural implementation is proposed to replace the traditional digital forensic investigation methodology and make it adaptable to the cloud environment.

A. Case Scenario
Case: User A is a cloud user of the XYZ Cloud Service company. Recently, she felt that her information on the cloud was modified or deleted without her knowledge, and she suspects that her account has been compromised. She then sought help from the police to investigate the matter and serve justice. We designed the scenario based on the recent security breach between hackers and the famous public cloud service provider Dropbox, where hundreds of usernames and passwords, including pictures, videos and other files, were prepared to be leaked by the attackers, who requested bitcoins in exchange [6].

Keywords: cloud forensics, forensic methodology, crime reconstruction, cloud forensic investigation.

I. INTRODUCTION

Based on the above scenario, we propose a new stakeholder to ensure the proper flow of the steps involved in the digital forensic investigation process for cloud investigations:

Over the last decade and a half, since cloud technology was introduced and implemented, many have looked at it as an advantage, making use of the resources it provides, especially the availability of an unlimited amount of storage space. In early research, [1] discovered that one of the major issues related to cloud forensics is location transparency: information that is kept on a cloud server may be replicated in several different locations. Further research in Liverpool [2] also discusses that the main flaw of the cloud, from the perspective of evidence acquisition, is the remote locations of the data centers. Adding to this research, [3] agrees with the earlier researchers on multi-server locations and explains how data centers are vulnerable to attacks, or can be dominated by hackers, without leaving behind footprints. [3] suggests that the answer to this problem is to image the records and files on the data centers to aid forensic investigation processes. Reference [4] adds the suggestion of a private cloud server, created and utilized only when needed to aid forensic investigation; however, evidence acquisition in a large data store acts as a barrier to this suggestion. Recent research [5] states that physically acquiring objects from an indirect environment is uncertain because customers and data centers are spread around the world.
II. PROPOSED METHOD

International Digital Forensic Association (IDFA): a proposed international association to keep track of, accredit and authorize individuals or investigation firms who intend to work in the digital investigation field, by giving them a license to do so. The association should additionally liaise with the legal authorities of the participating countries to ensure that an internationally systemized legal framework is established, for the purpose of smoothing the digital investigation procedure in cloud computing.
According to our review of the major areas related to forensic investigation processes in the cloud, a specific investigation method is needed to close the gap in current cloud forensic investigation methodology. In this section we illustrate our method using a case scenario.

B. Proposed Model


Figure 1. Cloud Forensic Investigation Steps



We propose the model shown in Figure 1 as a new set of steps resolving the issues of the current cloud forensic investigation process.

Authorization and Preparation of Legal Documents

Step 1: There are two possible situations. 1. User A detects or suspects that some of her details have been accessed, deleted or modified, and becomes a victim of fraud. 2. User A is the victim of a compromised cloud account whose details were used to initiate another crime. The latter is the more common issue, whereby the user of cloud services ends up as a victim, or holds information that may be essential for a crime investigation.
Step 2: The victim engages a legal private investigator or investigation firm to proceed with the matter. Note that the investigator or investigation firm has to be legally registered with the "International Digital Forensic Association".
Step 3: The private investigator gathers the necessary information regarding User A (the victim) and prepares a document to be sent to the police department and the Higher Commissioner of the respective country, applying for a search warrant and an affidavit to obtain permission to access the cloud server, similar to the authorization process in the traditional forensic investigation steps.
Step 4: The investigator then communicates with the CSP to obtain access to the backup cloud server in order to perform acquisition of data. The investigator provides the CSP with the search warrant letter and affidavit, together with the registration ID of the individual investigator as registered under the International Digital Forensic Association, for authorization and clarification purposes.
Step 5: The CSP performs a single-step check with the IDFA to authorize the investigator's request.
Step 6: The IDFA receives the request and checks the status of the investigator, of the search warrant and of the case with the reporting legal department of the respective country.
Step 7: The legal authority of the respective country checks the status of the case and approves the content and request submitted by the IDFA.
Step 8: The IDFA gathers all information, checks and statuses, then approves the request so that the CSP can allow the investigation to proceed.
Step 9: Once the requests are approved, the CSP grants temporary access for a restricted amount of time to its backup/forensic server/IaaS layer so that the investigator can access the information needed on the server. The CSP generates access details for the investigator to log on to the backup server, then terminates the access upon expiry of the access period. The investigator proceeds with the acquisition of the necessary information and continues with the examination and analysis phase, leading to case reconstruction and reporting of the investigation.
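The nine steps above form a linear authorization pipeline with two hard gates: IDFA registration (Step 5) and warrant validation by the legal authority (Steps 6-7). A minimal sketch of that flow follows; the paper defines no implementation, so all class, stage and field names here are our own hypothetical illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Stage(Enum):
    # stage names are ours, mapped loosely onto Steps 1-9 above
    LEGAL_DOCUMENT = auto()   # Steps 1-3: report, warrant and affidavit preparation
    CSP_REQUEST = auto()      # Step 4: investigator contacts the CSP
    IDFA_AUTH = auto()        # Step 5: CSP checks investigator with the IDFA
    LEGAL_CHECK = auto()      # Steps 6-7: IDFA and legal authority validate the case
    CSP_VALIDATED = auto()    # Step 8: IDFA approves the CSP's request
    TEMP_ACCESS = auto()      # Step 9: time-limited backup-server access
    ACQUISITION = auto()
    EXAMINATION = auto()
    REPORTING = auto()

ORDER = list(Stage)

@dataclass
class Investigation:
    investigator_id: str
    warrant_id: str
    stage: Stage = Stage.LEGAL_DOCUMENT

    def advance(self, idfa_registry, valid_warrants):
        # the two hard gates: IDFA registration and warrant validity
        if self.stage is Stage.IDFA_AUTH and self.investigator_id not in idfa_registry:
            raise PermissionError("investigator not registered with the IDFA")
        if self.stage is Stage.LEGAL_CHECK and self.warrant_id not in valid_warrants:
            raise PermissionError("search warrant not validated by the legal authority")
        i = ORDER.index(self.stage)
        if i + 1 < len(ORDER):
            self.stage = ORDER[i + 1]
        return self.stage
```

An investigation simply advances stage by stage; it cannot pass the IDFA or legal gates unless the corresponding registries confirm it, mirroring the checks in Steps 5-8.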

Figure 2. Cloud Forensic Investigation Steps (flow: Authorization and Preparation of Legal Document → Initiate Request to CSP → IDFA Authentication → IDFA-Legal Authority Check → Validate CSP Request → Temporary Access Approval → Evidence Collection and Preservation → Evidence Identification → Examination and Analysis → Reporting).

C. Proposed Cloud Forensic Investigation Steps
New cloud forensic investigation steps can be constructed to replace the traditional forensic investigation steps, as shown in Figure 2. The highlighted components illustrate the elements that need to be rooted into the traditional forensic investigation methodology so that it becomes applicable to cloud forensic investigation.

III. CONCLUSION
Cloud computing easily accommodates clients' need to reduce the cost of maintaining servers and hardware. However, this technology complicates the investigation process because the cloud servers are located elsewhere geographically. Focusing on the steps to acquire evidence will make the investigation process more effective, so that the integrity of the evidence can be protected and validated for use in court in further proceedings. Our method results in a systematic flow of forensic investigation steps in cloud forensics. The limitation of the proposed method is that it is only a conceptual model, and considerable effort is required to bring the method into practice and ensure that the flow of forensic investigations is systemized.

REFERENCES
[1] S. D. Wolthusen, "Overcast: Forensic Discovery in Cloud Environments," in IMF '09, Fifth International Conference on IT Security Incident Management and IT Forensics, Gjøvik, Norway, 15-17 September 2009, pp. 3-9.
[2] D. Reilly, C. Wren, and T. Berry, "Cloud Computing: Forensic Challenges for Law Enforcement," in 2010 International Conference for Internet Technology and Secured Transactions (ICITST), Liverpool, UK, 8-11 November 2010, pp. 1-7.
[3] Y. Cheng, "Cybercrime forensic system in cloud computing," in 2011 International Conference on Image Analysis and Signal Processing (IASP), Shanghai, China, 21-23 October 2011, pp. 612-615.
[4] S. Ludwig, P. N. Parviz, and D. Mohit, "Cloud Computing and Computer Forensics for Business Application," Journal of Technological Research, vol. 44, no. 8, pp. 23-32. [Online]. Available: http://www.aabri.com/manuscripts/11935.pdf
[5] G. Hong, J. Bo, and S. Ting, "Forensic Investigation in Cloud Environments," in 2012 International Conference on Computer Science and Information Processing (CSIP), Shanghai, China, 24-26 August 2012, pp. 248-251.
[6] R. T. Buchanan, "Dropbox Password Leaked: Hundreds of accounts hacked after third party security breach," 2014. [Online]. Available: http://www.independent.co.uk/life-style/gadgets-and-tech/nearly-sevenmillion-dropbox-passwords-hacked-pictures-and-videos-leaked-inlatest-thirdparty-security-breach-9792690.htm


Authentication Protocols in PMIPv6 Using Non-tamper-Resistant Smart Cards


Mojtaba Alizadeh 1, Mazdak Zamani 2, Sabariah Baharun 3, Kouichi Sakurai 4

1,4 Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan, malizadeh@ieee.org, sakurai@csce.kyushu-u.ac.jp
2 Advanced Informatics School, Universiti Teknologi Malaysia, Malaysia, mazdak@utm.my
3 Malaysia-Japan International Institute of Technology, Universiti Teknologi Malaysia, Malaysia, drsabariah@utm.my

In PMIPv6, mobility services are provided without mobile node (MN) involvement in signaling communications. This protocol has been used as part of different wireless networks such as WiMAX and 3GPP2, as well as LAN networks, because of its low mobility signaling over wireless links [9].

Abstract-Mobility management protocols support mobility for roaming mobile nodes in order to provide seamless connectivity. Proxy Mobile IPv6 is a network-based localized mobility management protocol that is more suitable for resource-constrained devices than other mobility management schemes. The IETF standardized this protocol in 2008 and specified it in RFC 5213 [1]. An authentication procedure, which has a key role in protecting the network against different security threats, is not specified in this standard. In the last few years, several authentication methods have been proposed; however, using smart cards under the non-tamper-resistance assumption has not been considered. In this paper, we discuss the possibility of using tamper-proof smart cards in Proxy Mobile IPv6 authentication approaches.

Keywords-Proxy Mobile IPv6 (PMIPv6); Authentication; Security; Impersonation Attack


The main mobility entities in a PMIPv6 domain are the Local Mobility Anchor (LMA) and the Mobile Access Gateway (MAG). The LMA supports MN connectivity, and the MAG, typically running on the access router, accomplishes mobility management on behalf of the MN. Consequently, the MN in PMIPv6 does not need a modified protocol stack to support PMIPv6. All mobility signaling is managed by the MAG and LMA, which establish a bi-directional tunnel to conduct the traffic sent to/from the MN. From the MN's view, the entire PMIPv6 domain appears as its home network [7]. The details of the authentication mechanism are not specified in this standard; therefore, many researchers have proposed different authentication mechanisms for PMIPv6 [10-15]. Although the proposed authentication methods reduce packet loss rate and latency, their vulnerability to different attacks under the non-tamper-resistance assumption for mobile devices has not been considered. In this paper, we discuss the vulnerability of the current authentication schemes under this assumption.

I. INTRODUCTION

Nowadays, enormous growth can be seen in wireless and mobile devices, as most people employ mobile devices to access various services such as multimedia applications, video conferencing, file sharing, and browsing the internet anytime, anywhere [2]. The growth is expanding even though mobile devices face various problems in using wireless services, such as the low computing power of mobile terminals, insufficient channel capacity and complex security problems. Mobile IPv6 (MIPv6) [3] adds roaming capabilities for mobile nodes in IPv6 networks. This standard function permits mobile devices to travel between networks while keeping the mobile nodes connected to the network [4].

II. TAMPER-PROOF SMART CARD

In this section, we discuss the possibility of using tamper-resistant smart cards in PMIPv6. According to the literature [16-22], smart cards are generally not tamper-proof. Several examples are provided to prove this statement, as follows. According to Khan et al. [17] and Rhee et al. [20], mobile devices such as PDAs, smartphones, and laptops are not tamper-proof. Furthermore, the vulnerability of the user information inside mobile devices is discussed in several studies [22]. Different approaches have been proposed to break smart card security. For example, Kocher et al. [23] reported that it is possible to extract a smart card's secret key for the cryptographic algorithm in use by monitoring the power consumption of the smart card. Their power analysis attack [24] shows that all smart cards are vulnerable to this attack.
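To illustrate the power-analysis idea, the following toy sketch (ours, not taken from [23] or [24]) runs a difference-of-means attack on simulated traces. The Hamming-weight leakage model, the Gaussian noise level and the use of a 4-bit S-box are all simplifying assumptions; the S-box itself is the published PRESENT cipher S-box.

```python
import random

# PRESENT cipher's 4-bit S-box (a real, published S-box)
SBOX = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]

def hw(x):
    return bin(x).count("1")

def power_trace(p, key, rng, noise=0.5):
    # assumed leakage model: power ~ Hamming weight of the S-box output + noise
    return hw(SBOX[p ^ key]) + rng.gauss(0, noise)

def dpa_guess(plaintexts, traces):
    # difference-of-means on every predicted output bit, summed per key guess;
    # the correct guess partitions traces consistently with the real leakage
    best, best_score = None, -1.0
    for g in range(16):
        score = 0.0
        for bit in range(4):
            g0, g1 = [], []
            for p, t in zip(plaintexts, traces):
                (g1 if (SBOX[p ^ g] >> bit) & 1 else g0).append(t)
            if g0 and g1:
                score += abs(sum(g1) / len(g1) - sum(g0) / len(g0))
        if score > best_score:
            best, best_score = g, score
    return best

rng = random.Random(42)
secret = 0xB
pts = [rng.randrange(16) for _ in range(4000)]
trs = [power_trace(p, secret, rng) for p in pts]
recovered = dpa_guess(pts, trs)
```

With a few thousand noisy traces, the mean difference peaks at the correct key guess, which is exactly the effect the monitoring attack on real cards exploits at much larger scale.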

The Mobile IPv6 protocol suffers from several problems such as packet loss, delay, and signaling cost. Therefore, different host-based mobility management protocols, like Hierarchical Mobile IPv6 (HMIPv6) [5] and Fast Handover for Mobile IPv6 (FMIPv6) [6], and network-based mobility management protocols, like Proxy Mobile IPv6 (PMIPv6) [1], have been proposed to improve the performance of MIPv6. Compared with HMIPv6 and FMIPv6, PMIPv6 achieves lower handover latency and signaling cost [7].


Fault-based cryptanalysis is another kind of attack on smart cards, which was reported in a Bellcore press release [25]. The concept of this attack is that an attacker induces a certain type of fault in the mobile device and then extracts the

The Network-based Localized Mobility Management (NETLMM) working group of the Internet Engineering Task Force (IETF) [8] standardized Proxy Mobile IPv6 (PMIPv6) in 2008 [1].


embedded secrets based on the incorrect responses from the


mobile device.

A. Tamper-proof Device (TPD)
In this part, we discuss the possibility of implementing a tamper-proof architecture in mobile devices for PMIPv6. IEEE 1609.2 [26] specifies that crucial secret information must be protected by storing it in a tamper-resistant Hardware Security Module (HSM), called a Tamper-proof Device (TPD). Multiple layers of physical security are provided in a TPD to achieve a high degree of tamper resistance. A TPD can process cryptographic algorithms using its own processor, and has its own battery and clock for time stamps. This kind of device is accessible only by authorized users and, in case of scanning or tampering, can zeroize its memory for security protection.

As far as our knowledge goes, using a TPD is the only way to protect the secrets inside a smart card. However, it is almost impossible to implement in PMIPv6 because of the cost of this device. According to [27], a typical TPD costs at least a few thousand dollars, which is too expensive to deploy widely.

In conclusion, it is not reasonable to implement a TPD in mobile devices in PMIPv6 to improve the security of authentication mechanisms in this kind of network. Therefore, all the authentication schemes in PMIPv6 are vulnerable to various attacks, like impersonation attacks, under the assumption of non-tamper-proof smart cards. Finally, it is critical to propose a suitable authentication method that is not designed on the basis of the tamper-proof assumption for smart cards.
III. CONCLUSION

Proxy Mobile IPv6 is a network-based localized mobility management protocol that is more suitable for resource-constrained devices than other mobility management schemes. Many authentication mechanisms have been proposed for this network since it was introduced by the IETF. In this paper, we discussed that all the proposed authentication methods are based on the assumption of tamper-proof smart cards, which, as we discussed, is almost impossible to implement in PMIPv6. Therefore, we conclude that previous schemes are prone to several attacks when tamper-proof smart cards are not available, and it is crucial to propose a suitable authentication method that does not rely on the tamper-proof characteristics of smart cards.


ACKNOWLEDGMENT
This work was supported by Malaysia-Japan International
Institute of Technology (MJIIT) center at Universiti Teknologi
Malaysia.

REFERENCES
[1] S. Gundavelli, K. Leung, V. Devarapalli, K. Chowdhury, and B. Patil, "Proxy Mobile IPv6," IETF RFC 5213, 2008, pp. 1-92.
[2] I. Soto, C. J. Bernardos, M. Calderón, and T. Melia, "PMIPv6: A network-based localized mobility management solution," The Internet Protocol Journal, vol. 13, pp. 2-15, 2010.
[3] D. Johnson, C. Perkins, and J. Arkko, "Mobility support in IPv6," IETF RFC 3775, 2004.
[4] N. G. Mphil and N. R. M. Sc, "A Survey on Mobility Management Protocols for Improving Handover Performance," vol. 10, pp. 53-57, 2014.
[5] H. Soliman, L. Bellier, K. Elmalki, and C. Castelluccia, "Hierarchical mobile IPv6 (HMIPv6) mobility management," IETF RFC 5380, 2008.
[6] R. Koodli, "Mobile IPv6 fast handovers," IETF RFC 5568, 2009.
[7] K. Ki-Sik, L. Wonjun, H. Youn-Hee, S. Myung-Ki, and Y. HeungRyeol, "Mobility management for all-IP mobile networks: mobile IPv6 vs. proxy mobile IPv6," IEEE Wireless Communications, vol. 15, pp. 36-45, 2008.
[8] H. Modares, A. Moravejosharieh, J. Lloret, and R. B. Salleh, "A Survey on Proxy Mobile IPv6 Handover," IEEE Systems Journal, 2014.
[9] Q. Jiang, J. Ma, G. Li, and A. Ye, "Security Enhancement on an Authentication Method for Proxy Mobile IPv6," in Proceedings of the 2011 International Conference on Informatics, Cybernetics, and Computer Engineering, Melbourne, Australia, vol. 110, L. Jiang, Ed. Springer Berlin Heidelberg, 2012, pp. 345-352.
[10] C. Ming-Chin, L. Jeng-Farn, and C. Meng-Chang, "SPAM: A Secure Password Authentication Mechanism for Seamless Handover in Proxy Mobile IPv6 Networks," IEEE Systems Journal, vol. 7, pp. 102-113, 2013.
[11] M.-C. Chuang and J.-F. Lee, "SF-PMIPv6: A secure fast handover mechanism for Proxy Mobile IPv6 networks," Journal of Systems and Software, vol. 86, pp. 437-448, 2013.
[12] H. Zhou, H. Zhang, and Y. Qin, "An authentication method for proxy mobile IPv6 and performance analysis," Security and Communication Networks, vol. 2, pp. 445-454, 2009.
[13] M. Youngsong, K. Miyoung, and K. Gye-Young, "Mutual Authentication Scheme in Proxy Mobile IP," in International Conference on Computational Sciences and Its Applications, ICCSA '08, 2008, pp. 65-72.
[14] L. Joong-Hee, L. Jong-Hyouk, and C. Tai-Myoung, "Ticket-Based Authentication Mechanism for Proxy Mobile IPv6 Environment," in 3rd International Conference on Systems and Networks Communications, ICSNC '08, 2008, pp. 304-309.
[15] J.-H. Lee and J.-M. Bonnin, "HOTA: Handover optimized ticket-based authentication in network-based mobility management," Information Sciences, vol. 230, pp. 64-77, 2013.
[16] C. G. Ma, D. Wang, and S. D. Zhao, "Security flaws in two improved remote user authentication schemes using smart cards," International Journal of Communication Systems, vol. 27, pp. 2215-2227, Oct. 2014.
[17] M. K. Khan and S. Kumari, "Cryptanalysis and Improvement of an Efficient and Secure Dynamic ID-based Authentication Scheme for Telecare Medical Information Systems," Security and Communication Networks, vol. 7, pp. 399-408, 2014.
[18] J. Xu, W.-T. Zhu, and D.-G. Feng, "An improved smart card based password authentication scheme with provable security," Computer Standards & Interfaces, vol. 31, pp. 723-728, 2009.
[19] Y.-y. Wang, J.-y. Liu, F.-x. Xiao, and J. Dan, "A more efficient and secure dynamic ID-based remote user authentication scheme," Computer Communications, vol. 32, pp. 583-585, 2009.
[20] H. S. Rhee, J. O. Kwon, and D. H. Lee, "A remote user authentication scheme without using smart cards," Computer Standards & Interfaces, vol. 31, pp. 6-13, 2009.
[21] C.-I. Fan, Y.-C. Chan, and Z.-K. Zhang, "Robust remote authentication scheme with smart cards," Computers & Security, vol. 24, pp. 619-628, 2005.
[22] T. S. Messerges, E. A. Dabbish, and R. H. Sloan, "Examining smart-card security under the threat of power analysis attacks," IEEE Transactions on Computers, vol. 51, pp. 541-552, 2002.
[23] P. Kocher, J. Jaffe, and B. Jun, "Introduction to differential power analysis and related attacks," 1998. [Online]. Available: www.cryptography.com/resources/whitepapers/DPATechInfo.pdf
[24] P. Kocher, J. Jaffe, and B. Jun, "Differential power analysis," in Advances in Cryptology (CRYPTO '99), 1999, pp. 388-397.
[25] D. Boneh, R. DeMillo, and R. Lipton, "New threat model breaks crypto codes," Press Release, Bellcore, p. 115, 1996.
[26] "IEEE Standard for Wireless Access in Vehicular Environments: Security Services for Applications and Management Messages," IEEE Std 1609.2-2013 (Revision of IEEE Std 1609.2-2006), 2013, pp. 1-289.
[27] M. Riley, K. Akkaya, and K. Fong, "A survey of authentication schemes for vehicular ad hoc networks," Security and Communication Networks, vol. 4, pp. 1137-1152, 2011.


MANETs Performance Analysis Under DoS Attack


at Different Routing Protocols


Alaa Zain 1, Heba A. El-khobby 2, Hatem M. Abd Elkader 3, Mustafa M. AbdelNaby 4

1 Dep. of Electronics and Electrical Communication Eng., Faculty of Engineering, Tanta University, Tanta, Egypt, alaazain1986@gmail.com
2 Dep. of Electronics and Electrical Communication Eng., Faculty of Engineering, Tanta University, Tanta, Egypt, h_khobby@yahoo.com
3 Dept. of Information Systems, Information and Technology Institute, Menoufia University, Menoufia, Egypt, hatem6803@yahoo.com
4 Dep. of Electronics and Electrical Communication Eng., Faculty of Engineering, Tanta University, Tanta, Egypt, mnaby45@gmail.com

Abstract-The main purpose of this paper is to study the performance of different MANET routing protocols under Denial of Service (DoS) attacks. Two different categories of MANET routing protocols are considered: Optimized Link State Routing (OLSR), a proactive routing protocol, and Ad hoc On-Demand Distance Vector (AODV), a reactive routing protocol, were selected to perform the proposed scenarios. The performance of the MANET under attack is studied to find out which protocol is more vulnerable to the DoS attack and how large the impact of the attack is on both protocols. Selected performance measures are considered for this comparative analysis of DoS attacks: throughput, delay and network load are taken into account.

Keywords: MANET, Routing Protocols, Attacks, DoS, AODV.

III. SIMULATION SETUP

The simulation setup comprises two scenarios, each with 16 mobile nodes moving at a constant speed of 10 meters per second. A total of 8 scenarios have been developed. Our goal was to determine which protocol shows less vulnerability in case of a DoS attack. We chose the AODV, DSR and OLSR routing protocols, which are reactive and proactive protocols, respectively. For AODV and OLSR, in the first scenario the malicious node's buffer size is lowered to a level which increases packet drops; in the second scenario there is a jamming attack on the network.

IV. RESULTS AND STATISTICS

Two types of attack scenarios and global discrete event statistics (DES) are involved in the OPNET 17.5 simulation [7].

I. INTRODUCTION

A MANET is exposed to various types of attacks because such networks are decentralized and distributed in nature; communication takes place via multi-hop nodes, battery power is low, and dependency on other nodes is a further characteristic of sensor networks [1-3]. A Denial of Service (DoS) attack is produced by the unintentional failure of nodes or by malicious action. This attack is a pervasive threat to most networks. The simplest DoS attack tries to exhaust the resources available to the victim node by sending extra unnecessary packets, thus preventing legitimate network users from accessing the services or resources to which they are entitled [4]. A DoS attack refers not only to an adversary's attempt to subvert, disrupt, or destroy a network, but also to any event that diminishes a network's capability to provide a service [5].

V. RESULT OF THE FIRST SCENARIO

A. Throughput
From Figure 2, it is obvious that the throughput of OLSR is high compared to that of AODV, because of the fewer routing forwarding and routing traffic. Here, the malicious node dropped the data rather than forwarding it to the destination, thus affecting throughput. The same is observed in the case of AODV: without attack, its throughput is higher than with attack, because of the packets discarded by the malicious node.
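The blackhole-style effect described above (a malicious node discarding packets instead of forwarding them) can be reproduced in a few lines. The drop probability, packet size and duration below are illustrative assumptions of ours, not values taken from the OPNET scenarios:

```python
import random

def simulate_throughput(n_packets, packet_bits, duration_s, drop_prob, seed=0):
    # a malicious intermediate node drops each packet with probability drop_prob
    rng = random.Random(seed)
    delivered = sum(1 for _ in range(n_packets) if rng.random() >= drop_prob)
    return delivered * packet_bits / duration_s  # throughput in bits per second

no_attack = simulate_throughput(10000, 1024 * 8, 60.0, drop_prob=0.0)
attack = simulate_throughput(10000, 1024 * 8, 60.0, drop_prob=0.4)
```

Throughput falls roughly in proportion to the drop probability, which is the qualitative behavior visible in Figure 2.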


II. CLASSIFICATION OF MANETS ROUTING PROTOCOLS

As shown in Figure 1, there are three types of MANET routing protocols [6]: proactive, reactive and hybrid protocols.

Figure 1. Classification of MANETs Routing Protocols (reactive, proactive and hybrid protocols).

Figure 2. Throughput of (AODV, DSR and OLSR) with DoS attack.



B. Network load
The network load graphs of OLSR, DSR and AODV with and without a DoS attack are shown in Figure 3. The network load of OLSR is much higher than that of AODV and DSR. Under attack, OLSR carries less network load than without attack, but it still remains higher than that of AODV, as shown in Figure 3.

Figure 3. Network load of (AODV, DSR and OLSR) with and without attack.

VI. RESULT OF THE SECOND SCENARIO

A. Throughput
From Figure 4, it is obvious that the throughput of AODV is high compared to that of DSR, because of the fewer routing forwarding and routing traffic. Here the malicious node jammed the network, thus affecting throughput. Comparing AODV and DSR under attack, the throughput of AODV remains higher than that of DSR.

Figure 4. Throughput of (AODV, DSR and OLSR) with DoS attack.

B. Packet end-to-end delay
Packet end-to-end delay under a jamming attack and without attack depends on the routing protocol. In Fig. 5, the delay of OLSR, DSR and AODV is high when there is an attack on the network nodes. Comparatively, OLSR is more affected than AODV because of the nature of its routing protocol.

Figure 5. Packet end-to-end delay of AODV, DSR and OLSR with attack.

C. Network load
The network load graphs of OLSR, DSR and AODV with and without a DoS attack are shown in Figure 6. The network load of OLSR is much higher than that of AODV, since under attack a node cannot send its packets, i.e., packet discarding leads to a reduction of network load.

Figure 6. Network load of (AODV, DSR and OLSR) with and without attack.

VII. CONCLUSION
In our study we compared the MANET performance under DoS attack in two different scenarios with respect to the performance parameters of end-to-end delay, throughput and network load. We analyzed the intrusion on two protocols; OLSR and AODV suffer more severe effects when there is a higher number of malicious nodes. The degradation in delay under attack is larger for OLSR than for AODV. For network load, however, the effect of the malicious node on AODV is smaller than on OLSR. Based on our research and analysis of the simulation results, we draw the conclusion that AODV is less vulnerable to denial of service attacks than OLSR.

REFERENCES
[1] A. D. Wood and J. Stankovic, "Denial of service in sensor networks," IEEE Computer Magazine, vol. 35, no. 10, pp. 54-62, October 2002.
[2] Y. Yu, K. Li, W. Zhou, and P. Li, "Trust mechanisms in wireless sensor networks: attack analysis and countermeasures," Journal of Network and Computer Applications, vol. 35, no. 3, pp. 867-880, 2012.
[3] I. Khalil, S. Bagchi, C. N. Rotaru, and N. B. Shroff, "UnMask: utilizing neighbor monitoring for attack mitigation in multihop wireless sensor networks," Ad Hoc Networks, vol. 8, no. 2, pp. 148-164, 2010.
[4] K. Biswas and Md. Liaqat Ali, "Security threats in Mobile Ad-Hoc Network," Master Thesis, Blekinge Institute of Technology, Sweden, 22nd March 2007.
[5] B. Wu, J. Chen, J. Wu, and M. Cardei, "A Survey of Attacks and Countermeasures in Mobile Ad Hoc Networks," in Y. Xiao, X. Shen, and D.-Z. Du (Eds.), Wireless/Mobile Network Security, Springer, 2006.
[6] S. A. Soomro, "Denial of service attacks in wireless ad hoc networks," Journal of Information / Communication Technology, vol. 4, pp. 1-10, February 2011.
[7] OPNET Technologies, www.opnet.com.


User Selection Algorithm with RF Beamforming for


Millimeter-wave MU-MIMO Systems
Keisuke HIROTA, Yuyuan CHANG, Gia Khanh TRAN, Kiyomichi ARAKI
Graduate School of Engineering, Tokyo Institute of Technology
Ookayama 2-12-1, Meguro-ku, Tokyo, 152-8550 Japan
E-mail: {hirota,chang,khanhtg,araki}@mobile.ee.titech.ac.jp

Abstract-Recently the 60 GHz band WLAN system has become attractive. We examine a multiuser MIMO (multiple-input multiple-output) - OFDM (orthogonal frequency division multiplexing) system that spatially multiplexes 4 users in that system, and the target system performance is set to 6 Gbps per user. In the 60 GHz WLAN system, beamforming with RF circuits is employed to compensate for the large path loss and to reduce interference between users. But it also changes the channel information between each transmitter and receiver, so user selection should also consider the variation of beam patterns made by the RF circuits. We modified the user selection algorithms to take the RF beam pattern selection into account, and the performances of the modified algorithms are estimated by simulation. We describe the channel model adopted in the simulation and propose two types of user selection algorithm including beam pattern selection. The first one is capacity-based selection and the second one is a SUS (semi-orthogonal user selection)-based algorithm. The system performance of both algorithms can achieve 6 Gbps per user when the total number of users is more than 8, with low computational complexity.

Keywords-IEEE802.11ad, Channel modeling, Multiuser MIMO, OFDM, User selection

Fig. 1. Millimeter-wave channel model with one LOS component and several clusters (received power [dB] versus time of arrival, with exponential cluster power decay and exponential ray power decay).

TABLE I. EXTRACTED PARAMETERS
Cluster pdtc: 9.03 [ns]
Forward rays pdtc: 4.50 [ns]
Backward rays pdtc: 7.50 [ns]
Forward K-factor, kf: 7.42 [dB]
Backward K-factor, kb: 11.8 [dB]
Cluster arrival rate: 0.15 [ns^-1]
Forward ray arrival rate: 0.47 [ns^-1]
Backward ray arrival rate: 0.39 [ns^-1]
No. of forward rays, Nf: 1
No. of backward rays, Nb: 4

I. INTRODUCTION
Recently, due to the increasing data traffic, the unlicensed 60 GHz band has become attractive because of its larger bandwidth, and has been standardized in standards such as IEEE802.11ad [1]. According to the standards, there are four channels from 57.24 GHz to 65.88 GHz. Each channel is used by one user and all antennas adopt beamforming with RF circuits. To further improve the channel efficiencies of the millimeter-wave channels, we examine a multiuser MIMO (multiple-input multiple-output) - OFDM (orthogonal frequency division multiplexing) system that multiplexes 4 users spatially. In this paper we investigate two modified user selection algorithms by simulation employing an established channel model, where the user selection algorithms also include beam pattern selection for each antenna element.

II. CHANNEL MODELING

A. CIR (Channel Impulse Response) Model
We consider a channel model for a small conference room environment. The model is described in Fig. 1. The average power of clusters and rays decays exponentially, and the occurrence of clusters and rays follows a Poisson distribution [2]. There are several parameters: power decay time constants (pdtc) and arrival rates for clusters, forward rays and backward rays. From ray-tracing simulation and clustering [3], these parameters can be extracted by the MMSE (Minimum Mean Squared Error) method. Finally we obtain the parameters listed in Table I.

B. Angular Profile
In the 60 GHz wireless communication system, beamforming in the RF domain is adopted, so the channel model must include an angular profile. Characterization of the angular profiles of the clusters and of the rays in each cluster is needed for this channel model. From ray tracing, the angular profile of the clusters follows a uniform distribution in the horizontal plane, with two kinds of uniform distributions for the first cluster and the other clusters, respectively (as shown in Table II). The angular profile of the rays in each cluster follows a Laplace distribution (as shown in Table III).
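Using the Table I values, the cluster/ray generation just described (Poisson arrivals with exponential power decay, in the Saleh-Valenzuela style of [2]) can be sketched as follows. This is our simplification: it generates only forward rays with a fixed count, and omits the K-factors, backward rays and angular profile.

```python
import math
import random

def gen_cir(duration_ns=60.0, Lambda=0.15, lam_f=0.47,
            gamma_c=9.03, gamma_f=4.50, rays_per_cluster=4, seed=0):
    # Cluster arrivals are Poisson with rate Lambda [1/ns]; each cluster's power
    # decays as exp(-t/gamma_c), and ray power within a cluster as exp(-tau/gamma_f).
    rng = random.Random(seed)
    taps = []  # list of (arrival time [ns], power gain) pairs
    t = 0.0
    while True:
        t += rng.expovariate(Lambda)            # cluster inter-arrival time
        if t > duration_ns:
            break
        cluster_gain = math.exp(-t / gamma_c)   # exponential cluster power decay
        tau = 0.0
        for _ in range(rays_per_cluster):
            tau += rng.expovariate(lam_f)       # ray inter-arrival within cluster
            taps.append((t + tau, cluster_gain * math.exp(-tau / gamma_f)))
    return taps
```

Each returned tap is one ray; plotting gain against arrival time reproduces the doubly exponential decay profile sketched in Fig. 1.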

TABLE II. THE RANGE OF THE UNIFORM DISTRIBUTIONS
                AoD [deg]    AoA [deg]
First cluster   [140,175]    [140,175]
Other clusters  [95,120]     [60,85]

TABLE III. SCALE PARAMETER OF LAPLACE DISTRIBUTION
Horizontal AoD   Vertical AoD   Horizontal AoA   Vertical AoA
32.7 [deg]       8.24 [deg]     39.7 [deg]       18.5 [deg]



III. MULTIUSER MIMO SYSTEM WITH RF BEAMFORMING

In this section, we consider how to select the RF beam patterns and users in the multiuser MIMO-OFDM system. The system includes one access point (AP) with 4 antenna arrays, and the beam pattern of each array can be controlled with phase shifters (here we call the antenna arrays beamforming units (BF units)). There are several single-antenna users in the coverage of the AP. The numerical complexity of the beam pattern and user selection method should also be considered, because it increases with the number of sub-carriers and beam patterns, as well as with the total number of users.
A. Access Point Structure
The 60 GHz access point supports RF beamforming. The antenna configuration is set as shown in Fig. 2. Each BF unit can steer a beam with a 100-degree -3 dB beamwidth.

TABLE IV. SIMULATION SETTINGS
No. of FFT points: 512
No. of subcarriers: 336
GI length: 128
Sampling frequency: 2.64 [GHz]
Transmit power: 10 [dBm]
Noise power: -174 [dBm/Hz]
Channel model: this work (Sec. II)
Precoding: ZF

TABLE V. NUMERICAL COMPLEXITY
E-search: 5.2 × 10^12
C-based: 2.8 × 10^5
SUS-based: 1.6 × 10^3

C. Multiuser MIMO-OFDM Simulation
To compare the performances of the algorithms, a multiuser MIMO-OFDM simulation is conducted, where the channel model introduced in Section II is used together with the steering vectors of the BF units shown in Fig. 2. The simulation settings are shown in Table IV. The performance results are shown in Fig. 3, and the numerical complexity results for K = 16 are shown in Table V. The simulation results show that the selection algorithms achieve similar performance. This is because the number of beam patterns is limited to four; accordingly, the same beam patterns appear once four users are selected.


Fig. 2. Antenna configuration at the AP with four BF units; each unit has four beam patterns.

B. Selection Algorithm
We combine RF beamforming selection with two user selection algorithms. The first one is a capacity-based algorithm [4], shown in Algorithm 1, where K is the number of existing users and F is the set of beam patterns.
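The capacity-based greedy idea (add, at each step, the user whose inclusion maximizes the ZF sum capacity, and stop as soon as capacity would decrease) can be sketched without the RF beam patterns as follows. The ZF capacity expression and all names here are our simplification for illustration, not the paper's exact Algorithm 1:

```python
import numpy as np

def zf_sum_capacity(H, snr=10.0):
    # ZF sum capacity for the stacked channel matrix H (selected users x Tx antennas):
    # column k of pinv(H) is user k's ZF precoder; its power sets the effective gain
    Hp = np.linalg.pinv(H)
    gains = 1.0 / np.sum(np.abs(Hp) ** 2, axis=0)
    return float(np.sum(np.log2(1.0 + snr * gains)))

def greedy_user_selection(channels, max_users=4):
    # channels: dict user -> (1 x Nt) channel row vector
    selected, best_c = [], 0.0
    remaining = set(channels)
    while remaining and len(selected) < max_users:
        cand = {u: zf_sum_capacity(np.vstack([channels[v] for v in selected + [u]]))
                for u in remaining}
        u_star = max(cand, key=cand.get)
        if cand[u_star] < best_c:
            break  # adding any further user would reduce sum capacity
        selected.append(u_star)
        best_c = cand[u_star]
        remaining.remove(u_star)
    return selected, best_c
```

In the paper's algorithm the candidate evaluation additionally fixes the BF-unit beam patterns toward already-selected users, but the greedy add-then-terminate structure is the same.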

40
35
30
Exhaustive search

25

Capacitybased
SUSbased

20

Target capacty

Algorithm 1 Capacity-based selection

T1 = {1, ..., K}, i = 1, So = ∅, B = {1, ..., |F|}
for i = 1 to 4 do
    for all k ∈ Ti do
        if i = 1 then
            All BF units form beams toward each user k
            Selected beam pattern index bk ∈ B
        else if i = 2 then
            Two BF units form beams toward the selected user,
            and the remaining BF units form beams toward each user k
        else if i = 3 then
            Two BF units form beams toward each of the selected users,
            and the remaining BF units form beams toward each user k
        else
            Three BF units form beams toward the selected users,
            and the remaining BF unit forms a beam toward each user k
        end if
        Calculate channel capacity C_k^i
    end for
    π(i) = arg max_{k ∈ Ti} C_k^i
    if C_{π(i)}^i < C_{π(i-1)}^{i-1} then
        Algorithm terminated
    end if
    So ← So ∪ {π(i)}, T_{i+1} = {k ∈ Ti : k ≠ π(i), bk ≠ b_{π(i)}}
end for

The second algorithm is based on the SUS (Semi-orthogonal User Selection) algorithm [5]. In the SUS-based algorithm, the difference between the directions of the beam patterns is set to 90 degrees for all combinations.

Fig. 3. Simulation result: channel capacity [bps/Hz] versus number of users for exhaustive search, capacity-based, and SUS-based selection, with the target capacity.

IV. CONCLUSION

The performance of the proposed user selection algorithms with RF beamforming decreases by 25% compared with the exhaustive search. However, the numerical complexity of the proposed algorithms is reduced remarkably.

ACKNOWLEDGMENT

This work was supported by "Research and development of radio spectrum resources" of the Ministry of Internal Affairs and Communications, Japan.

REFERENCES

[1] IEEE Std. IEEE 802.11ad [Online]. Available: http://standards.ieee.org/findstds/standard/802.11ad-2012.html
[2] IEEE 802.11ad, "Channel Models for 60 GHz WLAN Systems," IEEE 802.11-09/0334r8, May 2010.
[3] H. Sawada, Y. Shoji and K. Sato, "A clustering method of arrival waves suitable for analyzing propagation characteristics," presented at the GSMMW2008, Nanjing, China, Apr. 2008.
[4] Z. Shen, R. Chen, J. G. Andrews, R. W. Heath, Jr. and B. L. Evans, "Low Complexity User Selection Algorithms for Multiuser MIMO Systems With Block Diagonalization," IEEE Trans. on Signal Process., vol. 54, no. 9, Sep. 2006.
[5] T. Yoo and A. Goldsmith, "On the Optimality of Multiantenna Broadcast Scheduling Using Zero-Forcing Beamforming," IEEE J. on Select. Areas Commun., vol. 24, no. 3, pp. 528-541, Mar. 2006.


Wideband RF CMOS Variable Attenuator Using Single-Stage Π-Topology
Nusrat Jahan1, I. L. Abdalla2, Ramesh K. Pokharel1, Takana Kaho3
1 Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan
2 Egypt-Japan University of Science and Technology, Egypt
3 NTT Network Innovation Laboratories, NTT Corporation, Japan
E-mail: nusrat05cuet@gmail.com

Abstract—This paper presents a digitally controlled wideband variable attenuator design. It has a novel single-stage Π-topology, designed and simulated with a 0.18 μm CMOS technology model. The attenuation dynamic range (DR) is 18 dB over a frequency range from DC to 4 GHz with an attenuation step size of 2 dB. The worst-case return loss is -8.2 dB across the frequency band at maximum gain.


Comparing the Π- and T-topologies, the Π-topology showed the broader frequency response in our examination. In addition, there is a trade-off between impedance matching and higher attenuation in the T-topology attenuator. Thus we chose the Π-topology. The schematic of the proposed variable attenuator is shown in Fig. 1. It uses a single-stage Π-topology with five consecutive stages. To improve the input/output impedance matching of the attenuator, two shunt branch pairs consisting of transistors (Mx/My) and resistors (3R/10R) are used. We optimized the parameters by S-parameter simulation; the resistance value R was set to 10 Ω, and each FET switch's gate width was set to 2 μm. The designed attenuator provides 6 dB to 24 dB of attenuation in 2 dB steps with simple shift-register control bits Vc and V̄c, where V̄c is the complement of Vc.

Keywords—variable attenuator, digitally controlled attenuation, 0.18 μm CMOS, wideband attenuator.

INTRODUCTION

Recently, mobile communication systems have become an indispensable part of the daily lives of millions of people. Next-generation mobile communication requires higher data rates using many frequency bands and MIMO with many antennas. Wideband RF circuits are a viable solution in such multi-band access points and user terminals. These devices require precise RF gain control to form beams in phased-array antennas and to limit the incident power to receiver circuits. For precise gain control, the traditional variable gain amplifier (VGA) and variable FET attenuator are good candidates [1, 2]. The latter has the advantages of low power consumption, bi-directionality, and stability against unwanted oscillation and thermal variation. Since mobile user terminals require low-power, low-cost, and small-size circuits, CMOS technology has become popular for realizing RF circuits on chip. This paper describes a wideband variable attenuator design in 0.18 μm CMOS technology.

ATTENUATOR DESIGN

There are conventional attenuators using Π, T, and bridged-T topologies with adjustment of the series and shunt resistances [3]. In the Π-attenuator, minimum attenuation occurs when the series resistance is small and the shunt resistances are large, set by controlling the FET switches. In that case, the loss at the lowest frequencies comes only from the nonzero on-resistance of the series switch. As this resistance gets smaller, the minimum insertion loss of the attenuator gets smaller. At higher frequencies, there is additional loss caused by the parasitic capacitances to ground, so minimizing these capacitances reduces the insertion loss. Similarly, the series components of the T-attenuator at the minimum gain setting are completely on and the shunt component is turned off.
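For reference, the resistor values of a classical matched Π-pad follow directly from the target attenuation. The sketch below shows the textbook equations for ideal resistors, not the FET implementation used in this design:

```python
def pi_pad(atten_db, z0=50.0):
    """Matched Π-attenuator resistor values for a given attenuation (dB).

    Returns (series arm, each shunt arm) in ohms for a system impedance z0.
    """
    k = 10 ** (atten_db / 20.0)             # voltage attenuation ratio
    r_shunt = z0 * (k + 1) / (k - 1)        # each shunt arm
    r_series = z0 * (k * k - 1) / (2 * k)   # series arm
    return r_series, r_shunt
```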


Fig. 1: Proposed attenuator with five stages


Table 1: The gain of the attenuator

Control bias                                      Attenuation
Vc     V1     V2     V3     V4     V5             (dB)
Low    High   Low    Low    Low    Low            -6
Low    Low    High   Low    Low    Low            -8
Low    Low    Low    High   Low    Low            -10
Low    Low    Low    Low    High   Low            -12
Low    Low    Low    Low    Low    High           -14
High   High   Low    Low    Low    Low            -16
High   Low    High   Low    Low    Low            -18
High   Low    Low    High   Low    Low            -20
High   Low    Low    Low    High   Low            -22
High   Low    Low    Low    Low    High           -24


When the transistors Mx are turned off, i.e., Vc is 0 V and V̄c is 1.8 V, the attenuation stages provide the lower attenuation states from 6 dB to 14 dB in 2 dB steps. Similarly, when transistors My are turned off (Vc is 1.8 V and V̄c is 0 V), the five stages provide the higher attenuation states from 16 dB to 24 dB in 2 dB steps. The proposed attenuator uses six digital control voltages. Table 1 shows the combinations of the digital control voltages used to select the attenuation states. The high and low voltages are 1.8 V and 0 V, respectively.
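The control scheme above amounts to a simple mapping from the control bits to an attenuation state; a sketch (function name ours):

```python
def attenuation_db(vc_high, stage):
    """Attenuation selected by the control bits of Table 1.

    vc_high: True when Vc = 1.8 V (higher-attenuation bank, My off).
    stage:   which of V1..V5 is driven high (1..5).
    """
    if not 1 <= stage <= 5:
        raise ValueError("stage must be in 1..5")
    base = 16 if vc_high else 6          # 16 dB bank vs. 6 dB bank
    return -(base + 2 * (stage - 1))     # 2 dB per additional stage
```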

Figure of Merit (FoM) = [Bandwidth (GHz) × Max. attenuation (dB)] / Step size (dB)

Our designed attenuator achieves a wide bandwidth and a simple control technique with low power consumption, a small step size, and a high FoM.
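Reading the FoM as bandwidth × maximum attenuation / step size, a definition inferred from the tabulated values, the entries for this work and [4] in Table 2 can be reproduced:

```python
def attenuator_fom(bw_ghz, max_atten_db, step_db):
    """FoM as inferred from Table 2: bandwidth (GHz) x max attenuation (dB) / step (dB)."""
    return bw_ghz * max_atten_db / step_db
```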
Table 2: Attenuator Performance Comparison

Parameter              [4]             [5]                 This work
Bandwidth (GHz)        DC-5            0.4-0.8             DC-4
Step size (dB)         3/6             N/A                 2
Max attenuation (dB)   -24             -48                 -24
Return loss (dB)       > 14            > 12                > 10
Noise figure (dB)      N/A             Max 48 / Min 5.8    Max 24.5 / Min 6.1
Control mode           Discrete step   Discrete step       Discrete step
Technology             0.16 μm CMOS    65 nm CMOS          0.18 μm CMOS
FoM                    20              6.4/3.2             48

SIMULATION RESULTS

The attenuator has been designed and simulated using the TSMC 0.18 μm CMOS technology model. Fig. 2 shows the simulated frequency response of the proposed attenuator when changing the attenuation from 6 dB to 24 dB. As the attenuator is designed symmetrically, the return losses at the input and output are approximately the same. Fig. 3 shows the input/output return loss versus frequency at the minimum and maximum attenuation. Table 2 shows the performance summary of the proposed digital attenuator compared with conventional work. We calculate the attenuator's figure of merit.


CONCLUSIONS

A digitally controlled wideband variable attenuator has been presented. The designed circuit achieves good performance over the entire band with acceptable return loss. The worst return loss is -8.2 dB at 4 GHz when the attenuation level becomes -14 dB.

ACKNOWLEDGMENT

This work was supported by the Funding Program for World-Leading Innovative R&D on Science and Technology and a Grant-in-Aid for Scientific Research (B) (KAKENHI-B). This work was also partly supported by the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Cadence Corporation and Agilent Corporation.
Fig. 2: Insertion loss vs. frequency
Fig. 3: Input/output return loss vs. frequency

REFERENCES

[1] M. S. Oude Alink, E. A. M. Klumperink, A. B. J. Kokkeler, M. C. M. Soer, G. J. M. Smit, and B. Nauta, "A CMOS-Compatible Spectrum Analyzer for Cognitive Radio Exploiting Crosscorrelation to Improve Linearity and Noise Performance," IEEE Trans. Circuits Syst. I, vol. 59, no. 3, pp. 479-492, Mar. 2012.
[2] A. Maxim, R. K. Poorfard, R. A. Johnson, P. J. Crawley, J. T. Kao, Z. Dong, M. Chennam, T. Nutt, and D. Trager, "A Fully Integrated 0.13-μm CMOS Digital Low-IF DBS Satellite Tuner Using a Ring Oscillator-Based Frequency Synthesizer," IEEE J. Solid-State Circuits, vol. 42, no. 5, pp. 967-982, May 2007.
[3] Y. Huang, W. Woo, Y. Yoon, and C.-H. Lee, "Highly Linear RF CMOS Variable Attenuators With Adaptive Body Biasing," IEEE J. Solid-State Circuits, vol. 46, no. 5, May 2011.
[4] W. Cheng, M. S. O. Alink, A. J. Annema, G. J. M. Wienk, and B. Nauta, "A Wideband IM3 Cancellation Technique for CMOS Π- and T-Attenuators," IEEE J. Solid-State Circuits, vol. 48, no. 2, Feb. 2013.
[5] A. Youssef and J. Haslett, "Digitally-Controlled RF Passive Attenuator in 65 nm CMOS for Mobile TV Tuner ICs," in Proc. of IEEE Int. Symp. on Circuits and Systems, pp. 1999-2002, May 2010.


Compact Modeling of Phase-Locked Loop Frequency Synthesizer for Post-Layout Simulation Time Reduction

Zhipeng Liu, Kazuaki Murakami, Ramesh Pokharel, Lechang Liu
Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
lzp@soc.ait.kyushu-u.ac.jp
Abstract—Compact modeling of a phase-locked loop frequency synthesizer using system identification is proposed to reduce post-layout simulation time. The model contains autoregressive exogenous models for the charge pump and the loop filter, with a lookup table for nonlinearity compensation, and a radial basis function neural network for the voltage-controlled oscillator with its nonlinear frequency-voltage relationship, thereby reducing the post-layout simulation time to 26% of the original circuit's simulation time.

Keywords—Autoregressive exogenous (ARX) model; neural network; phase-locked loop; system identification

B. Charge Pump and Loop Filter Model Design


In practice, all systems are nonlinear and the output is a
nonlinear function of the input variables. However, a linear
model is often sufficient to accurately describe the system
dynamics if the signal amplitude is small. The training set for
the model extraction should include all possible operating
modes of the circuits and therefore a pseudo-random bit
sequence (PRBS) is used as the charge-pump input. The PRBS
input is implemented with a linear feedback shift register
(LFSR) in HSPICE. The linear part of the charge pump and
loop filter model is obtained by describing the input-output
relationship as a difference equation

I. INTRODUCTION

In complex analog or mixed-signal integrated circuits, such as a phase-locked loop (PLL) frequency synthesizer, post-layout simulation is time consuming and computationally expensive because the circuit simulators need to construct large dynamical models by combining conservation laws with constitutive relations for each device. This research aims to reduce simulation time for post-layout verification and circuit optimization. To reduce post-layout simulation time, an alternative approach using system identification is proposed in this paper. System identification refers to finding a dynamical model of low complexity that delivers the best match for a collection of dynamical input-output data [1]. The system identification procedure is: first collect training data, then select a model set, then pick the "best" model in this set. If the model first obtained cannot pass the model validation tests, it is necessary to go back and revise the various steps of the procedure.
y(t) + a_1 y(t-1) + ... + a_na y(t-na) = b_1 u(t-1) + ... + b_nb u(t-nb) + e(t)    (1)


where the system's input and output at time t are denoted by u(t) and y(t), respectively, and e(t) is the model residual. Equation (1) is called the autoregressive exogenous (ARX) model. From the definition of the ARX model, there is no constant term in equation (1), and thus an additional bias is required to capture the arbitrary offset between the input and the output signals. The bias value is equal to the initial VCO control voltage. In addition, to prevent the PLL from failing to lock, an ideal integrator is inserted into the model. The parameters of this ARX model are estimated in the presence of the integrator by the least-squares method.

A linear model is sufficient to accurately describe the dynamics of the charge pump and the loop filter for a low-amplitude VCO control voltage. However, for a large-amplitude control voltage, the additional flexibility of a nonlinear model is required. An ideal charge pump should provide a current of constant amplitude, but in practice the charge pump current decreases as the VCO control voltage increases, because the drain current of a transistor decreases with decreasing source-drain voltage. To compensate for the decreased current, the upper right of Fig. 1 shows the proposed charge pump and loop filter modeling approach based on nonlinearity compensation.

Theoretically, the loop filter can be viewed as a proportional-integral controller, and thus a dual-path model with a proportional path and an integral path is proposed, where the integral path is compensated. The nonlinear compensation of the integral path is implemented with a 1-D lookup table. The lookup table maps the input to an output value by looking up or interpolating the table values with cubic-spline interpolation.
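The ARX parameter estimation reduces to a linear least-squares problem. The following sketch, in our own notation and omitting the integrator and the lookup-table compensation, shows the idea:

```python
import numpy as np

def fit_arx(u, y, na=2, nb=2):
    """Least-squares fit of the ARX model of equation (1):
    y(t) + a_1*y(t-1) + ... + a_na*y(t-na) = b_1*u(t-1) + ... + b_nb*u(t-nb) + e(t).

    Returns the estimated coefficient arrays (a, b).
    """
    n = max(na, nb)
    rows, targets = [], []
    for t in range(n, len(y)):
        # Regressor: [-y(t-1), ..., -y(t-na), u(t-1), ..., u(t-nb)]
        past_y = [-y[t - i] for i in range(1, na + 1)]
        past_u = [u[t - j] for j in range(1, nb + 1)]
        rows.append(past_y + past_u)
        targets.append(y[t])
    theta, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(targets), rcond=None)
    return theta[:na], theta[na:]
```

A PRBS input, as used in the paper, gives the persistent excitation needed for the least-squares problem to be well conditioned.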

II. PLL MATHEMATICAL MODEL DESIGN

As shown in Fig. 1, the PLL frequency synthesizer is divided into three partitions for model design. The first partition contains the phase-frequency detector (PFD) and the frequency divider. The second partition contains the charge pump and the loop filter; the main purpose of this partition is to establish the dynamics of the feedback loop. The third partition contains only the voltage-controlled oscillator (VCO), because the simulation time scale of the VCO is vastly different from that of the loop filter.
A. PFD and Frequency Divider Model Design
This partition is essentially digital and thus a functional
model with a transport delay is sufficient. The frequency
divider is modeled with an ideal counter and the PFD is
modeled with ideal D flip-flops due to the nature of digital
circuits. An additional delay model is appended to construct
the transport delay of the digital circuits.
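The ideal-counter divider described above can be sketched as follows (illustrative only):

```python
def frequency_divider(clock_edges, n):
    """Ideal divide-by-n counter: emits one output edge per n input edges."""
    out, count = [], 0
    for t in clock_edges:
        count += 1
        if count == n:
            out.append(t)   # output edge coincides with every n-th input edge
            count = 0
    return out
```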


Figure 2. Comparison of the proposed model's VCO control voltage with the original circuit and the conventional PLL model.
TABLE I. MODEL IDENTIFICATION TIME AND SIMULATION TIMES

Figure 1. Proposed mathematical model for PLL frequency synthesizer.

Table values are estimated from the data of the normalized charge pump current's dependency on the VCO control voltage. The ARX model parameters are estimated in the presence of the lookup table by the nonlinear least-squares method. The identification accuracy of the nonlinear model is 97.2%. Finally, this model passed the validation test using a different PRBS input; the model accuracy is 97%.
C. Voltage Controlled Oscillator Model Design
The bottom part of Fig. 1 shows the proposed model for the VCO with its nonlinear frequency-voltage relationship. It contains a radial basis function neural network (RBFNN) for the nonlinear frequency-voltage mapping, a waveform generator for frequency-to-phase conversion, and a hard limiter for cosine-wave to square-wave conversion.

original circuit and the conventional PLL model [4]. To clearly show each ripple of the waveform, the waveform is split into two parts at the middle of the simulation time. It can be seen that each ripple width of the proposed model completely coincides with the original circuit simulation results, while the conventional model fails to capture the cycle slip [5]. Table I summarizes the model identification time for the PLL and the simulation times for Fig. 2. The proposed design process requires an additional 2802 s for model identification; even so, the total time consumption for post-layout simulation is reduced to 26% of the original circuit's, with an accuracy of 93%.

In the field of mathematical modeling, a radial basis function neural network (RBFNN) is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. Radial basis function networks are used for function approximation. Function approximation is somewhat more complex because there is no obvious choice for the centers. The training is typically done in two phases: first fixing the widths and centers, and then the weights. This can be justified by considering the different nature of the nonlinear hidden neurons versus the linear output neuron. The distance block in the radial basis layer accepts the input x (Vctrl) and the center vector, and produces a vector of 32 elements in Fig. 1. These elements improved the identification accuracy of the VCO model to 99.8%.
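A Gaussian RBFNN output is a weighted sum of radial activations. A minimal sketch follows, where the center, width, and weight names are our own illustrative choices; the 32-element layer of Fig. 1 would correspond to 32 centers:

```python
import numpy as np

def rbf_net(x, centers, widths, weights):
    """RBF network output for a scalar input x (e.g. Vctrl).

    Gaussian activations: phi_i = exp(-((x - c_i)/s_i)^2); output = weights . phi.
    """
    phi = np.exp(-(((x - centers) / widths) ** 2))
    return float(weights @ phi)
```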
REFERENCES

[1] L. Ljung, System Identification: Theory for the User, 2nd ed. Englewood Cliffs, NJ, USA: Prentice Hall, 1999.
[2] B. Bond, Z. Mahmood, Y. Li, R. Sredojevic, A. Megretski, V. Stojanovic, Y. Avniel, and L. Daniel, "Compact modeling of nonlinear analog circuits using system identification via semidefinite programming and incremental stability certification," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 8, pp. 1149-1162, Aug. 2010.
[3] L. Liu, T. Sakurai, and M. Takamiya, "A Charge-Domain Auto- and Cross-Correlation Based Data Synchronization Scheme with Power- and Area-Efficient PLL for Impulse Radio UWB Receiver," IEEE J. Solid-State Circuits, vol. 46, no. 6, pp. 1349-1359, June 2011.
[4] B. Razavi, "Modeling and simulation," in Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design. New York, NY, USA: IEEE Press, 1996.
[5] L. Liu and R. Pokharel, "Post-Layout Simulation Time Reduction for Phase-Locked Loop Frequency Synthesizer Using System Identification Techniques," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 33, no. 11, pp. 1751-1755, Nov. 2014.

III. EXPERIMENTAL SETUP AND RESULTS

The proposed PLL frequency synthesizer model for post-layout simulation time reduction is shown in Fig. 1. The layout data of the PLL frequency synthesizer from [3] is used for parameter estimation of each building block. Fig. 2 shows the simulated VCO control voltage compared with the


4.0 to 9.0 GHz 0.18 μm CMOS Power Amplifier for UWB Applications

H. Mosalam1, A. Allam1, H. Jia2, A. Abdelrahman1 and R. Pokharel2
1 Electronics and Communications Engineering Dept., Egypt-Japan University of Science and Technology, New Borg Al-Arab, Alexandria, Egypt
2 E-JUST Center, Kyushu University, Nishi-ku 819-0395, Fukuoka, Japan
E-mail: hamed.mosalam@ejust.edu.eg

Abstract—The design of a 4-9 GHz, two-stage CMOS power amplifier (PA) for ultra-wideband (UWB) applications is presented in this paper. The PA, fabricated in TSMC 0.18 μm CMOS technology, has a power gain S21 of 13.5 ± 0.7 dB, an input return loss S11 less than -5.0 dB, and an output return loss S22 less than -8 dB over the frequency range of interest. The two-stage PA achieves an average power added efficiency (PAE) and output 1-dB compression of 10% and 2 dBm, respectively, over the whole band. In addition, the PA achieves an excellent low measured group delay of 190 ± 60 ps with a power consumption of 21 mW.

Fig. 1. Schematic of the proposed UWB-PA.

I. INTRODUCTION

The Federal Communications Commission (FCC) released the frequency range from 3.1 to 10.6 GHz for UWB applications in 2002 [1]. UWB systems have very high data rates and very low radiated power. For UWB systems, group delay has a major effect on system performance, as it is a useful measure of time-domain signal distortion [2]. In this paper, a two-stage UWB-PA with flat gain and minimum group delay variation is designed and fabricated to operate from 4 GHz to 9 GHz using TSMC 0.18 μm CMOS technology.
Keywords: Ultra-wideband (UWB), Power Amplifier (PA), Power Added Efficiency (PAE), Group Delay (GD).

II. CIRCUIT DESCRIPTION

Fig. 1 shows the schematic of the proposed two-stage UWB-PA. The stagger-tuning concept is applied [2], and gain flatness is achieved using two different center frequencies for the two stages, as seen in Fig. 2. The tuning frequencies of the first and second stages are 8 GHz and 5 GHz, respectively. The first stage consists of a current-reuse cascaded common-source configuration, where the source degeneration inductor Ls1 and gate inductor Lg1 are used to achieve good input impedance matching over the required bandwidth. The amplified signal from the common-source amplifier M1 is passed to the gate of transistor M2 through a low-impedance path formed by capacitor C1 and inductor L1. Inductor L2 creates a high-impedance path to block the RF signal. The second stage is a common-source amplifier designed to maximize the power added efficiency and enhance the gain at the beginning of the band, as seen in Fig. 2. The output of transistor M2 is passed to the second stage through an inter-stage matching circuit designed and optimized using source-pull to achieve high flat gain, small group delay variation, and maximum power added efficiency.
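The stagger-tuning idea can be illustrated with a toy model: cascading two stages whose gain peaks sit at different center frequencies flattens the overall response. The band-pass profile below is purely illustrative and is not the measured behavior of this PA:

```python
import numpy as np

def stage_gain_db(f_ghz, f0_ghz, peak_db=8.0, q=1.5):
    """Toy band-pass gain profile of a single tuned stage (illustrative only)."""
    x = q * (f_ghz / f0_ghz - f0_ghz / f_ghz)   # normalized detuning
    return peak_db - 10 * np.log10(1 + x ** 2)

# Cascading (i.e. summing in dB) stages tuned at 8 GHz and 5 GHz
# trades the individual peaks for a flatter overall response:
freqs = np.linspace(4.0, 9.0, 11)
total_db = stage_gain_db(freqs, 8.0) + stage_gain_db(freqs, 5.0)
```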

Fig. 2. Illustration of stagger tuning for the proposed UWB-PA.

III. MEASUREMENT RESULTS

The proposed UWB-PA has been designed and fabricated in 0.18 μm CMOS technology. Fig. 3 shows a micrograph of the proposed 4.0 to 9.0 GHz UWB-PA. The correspondence between the simulated and measured S-parameters is displayed in Figs. 4 and 5. As can be seen in Fig. 4, the measured gain (|S21|) is high and flat, at 13.5 ± 0.7 dB over the frequency band of interest. This gain flatness is achieved by adjusting the tuning frequencies of the first and second stages. The reverse isolation (|S12|), as presented in Fig. 4, is less than -45 dB over the required band. As displayed in Fig. 5, a measured input return loss (|S11|) of -5.0 ~ -13 dB and a measured output return loss (|S22|) of -8 ~ -12 dB are realized.


This broadband input and output impedance matching enhances gain flatness and improves group delay variation. As presented in Fig. 6, an excellent small group delay variation of 60 ps is attained across the entire band from 4 to 9 GHz. In UWB-PA design, linearity is a more important criterion than output power level [1]. As shown in Fig. 7, the average measured output gain compression point (POut1dB) and power added efficiency are 2 dBm and 10%, respectively, at 4, 6, and 8 GHz. Finally, the UWB-PA is unconditionally stable and consumes only 21 mW of DC power.
Table I shows a summary of the proposed UWB-PA performance in comparison with other published UWB-PAs.

Fig. 6. Comparison of simulated and measured group delay variation.

ACKNOWLEDGMENT
The authors would like to thank the Ministry of Higher Education (MoHE) mission department and the Egypt-Japan University of Science and Technology (E-JUST) for funding this work. This work was also partly supported by a Grant-in-Aid for Scientific Research (B) from JSPS KAKENHI (Grant no. 23360159).

Fig. 7. Measured P1dB and power added efficiency.


Fig. 3. UWB-PA die photograph with an area of 0.77 mm².

TABLE I
Summary of the proposed UWB-PA performance in comparison with other published UWB-PAs.

Parameter        [2]       [3]       [4]       This work
Freq. (GHz)      3-10      3-7       3-10      4-9
|S11| (dB)       < -10     < -6      < -10     < -5.0
|S22| (dB)       < -14     < -7      < -10     < -8
Gain (dB)        11 ± 0.6  14 ± 0.5  10 ± 0.8  13.5 ± 0.7
Gd (ps)          86        178       250       60
PAE (%)          NA        NA        NA        10
OP1dB (dBm)      NA        NA        5.6       2
Area (mm²)       0.77      0.88      1.76      0.77
Power (mW)       100       24        84        21

Fig. 4. Comparison of measured and simulated power gain (|S21|) and reverse isolation (|S12|).


Fig. 5. Comparison of measured and simulated input return loss (|S11|) and output return loss (|S22|).

REFERENCES

[1] Federal Communications Commission, "Revision of Part 15 of the Commission's Rules Regarding Ultra-Wideband Transmission Systems," First Report and Order, ET Docket 98-153, FCC 02-48, April 2002.
[2] R. Sapawi, R. Pokharel, S. A. Z. Murad, A. Anand, N. Koirala, H. Kanaya and K. Yoshida, "Low Group Delay 3.1-10.6 GHz CMOS Power Amplifier for UWB Applications," IEEE Microwave and Wireless Components Letters, vol. 22, no. 1, pp. 41-43, Jan. 2012.
[3] S. A. Z. Murad, R. K. Pokharel, A. I. A. Galal, R. Sapawi, H. Kanaya, and K. Yoshida, "An Excellent Gain Flatness 3.0-7.0 GHz CMOS PA for UWB Applications," IEEE Microwave and Wireless Components Letters, vol. 20, no. 9, September 2010.
[4] C. Lu, A. V. Pham and M. Shaw, "A CMOS power amplifier for full-band UWB transmitters," 2006 IEEE Radio Frequency Integrated Circuits (RFIC) Symposium, pp. 400, 11-13 June 2006.


Noh-Guide: Guidance System for Novice Audience on Noh Stage

Hiroyuki Nakamura1, Masataka Maruta2, Yuta Sano2, Tsunenori Mine3
1 Graduate School of ISEE, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
2 Department of EECS, School of Engineering, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
3 Faculty of ISEE, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan

A Noh stage involves many actors, divided by parts and schools [2]. There is a main actor called "Shite", his supporters called "Waki", singers called "Jiutai", musicians called "Hayashi", and comical actors called "Kyogen". Stories of Noh are based on classical literature, such as the tales of the Heike and the Genji from the Heian and Kamakura periods. On the other hand, Kyogen [3] is played by only two or three actors. Its comical character attracts even novice audiences. In addition, as it is performed as conversation in the casual language of the Muromachi period, without chorus or instruments, audiences can easily follow the dialogue [4].

Abstract—Noh, a traditional stage art in Japan, was selected as an Intangible Cultural Heritage by UNESCO in 2008. However, it is not popular among Japanese people because it uses the ancient language of the Heian period. It is not easy for audiences, in particular novice audiences of Noh, to understand the language. To overcome this problem, we propose a real-time Noh guidance system called Noh-Guide. We demonstrated our trial system at a real Noh performance in September 2014. The demonstration results show that our system helped every audience member to understand the Noh performance better.
Keywords: guidance system; traditional stage art

I. INTRODUCTION

B. Aim of this system

We developed Noh-Guide to help every audience member understand the Noh performance by using three types of information, provided as subtitles, during the stage. We want to verify that the subtitles benefit users, and that both novice and experienced audiences can watch the stage with mobile devices without sacrificing their attention and mood.

Among the many Japanese traditional performing arts, Kabuki and Noh [1] were selected as Intangible Cultural Heritages by UNESCO; Noh was established in the Muromachi period (1338-1573). Noh has been regarded as the oldest traditional stage art in the world. Although Noh was popular in the Muromachi and Edo periods (1603-1868), it was not accepted by the majority, and today few people have been to a Noh theater. At the same time, from the point of view of Noh actors, there are many difficulties in fostering young actors because of the Noh masters' aging. For young audiences and actors, Noh has many difficulties and complexities to understand; in particular, the literary part is very difficult for novice audiences. For these reasons, we developed a mobile guidance system called Noh-Guide. This system helps every audience member to understand a stage performance of Noh: during the acting, the Noh-Guide system provides the necessary guidance. Our system provides three types of information, literary and colloquial sentences with detailed explanatory texts, which we call Singing, Casual, and Commentary, respectively. The information is selectable on the audience's own mobile devices. In this paper, we present an overview of the system, then show and discuss some highlighted results of the experiment conducted on September 19, 2014. Finally, we conclude the paper and discuss future work.
C. System overview
The prototype of Noh-Guide is built with HTML5, a Web browser, and a Web server. The participants are the actors, the audience, and a system controller; the system controller operates the subtitles.


II. MOTIVATION

A. Difficulty and Complexity of Noh

One of the difficulties in understanding a Noh performance is listening to its singing. Noh singing is based on the literary style of the Muromachi period, and since the music and voices are mixed, catching them individually is even more difficult.


Figure 1. View of the system


B. Results
The answers to the questionnaires indicate the difficulty of Noh, the usefulness of our system, and the importance of helpful subtitles, as shown in Table 1. The effectiveness of Noh-Guide is shown through these answers. The program that most appealed to the examinees was "Futari Hakama", as we assumed, because the program was Kyogen and performed by a living national treasure. We also gathered free-format comments from the examinees to evaluate the effect of our system. According to the comments, most examinees felt that our system helped them to understand the Singing and the actors' actions. However, some comments also addressed problems of the system: when the examinees focused on their mobile devices, they were not able to watch and concentrate on the stage.

As mentioned earlier, our system provides three types of text information: Casual, Singing, and Commentary. Audiences can choose any of the three texts whenever they want, and the system keeps showing the corresponding text during the stage performance. Noh-Guide also provides fast-forward and back functions, so the audience can adjust the text to the current scene.
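The subtitle logic described above can be sketched as a small controller class; all names here are illustrative and not the actual Noh-Guide implementation:

```python
class SubtitleController:
    """Minimal sketch of the Noh-Guide subtitle logic.

    The controller advances a shared cue index; each audience device renders
    the current cue in its selected mode: 'singing', 'casual', or 'commentary'.
    """

    def __init__(self, cues):
        self.cues = cues    # list of dicts: one text per mode for each cue
        self.index = 0

    def forward(self):
        # Fast-forward, clamped to the last cue.
        self.index = min(self.index + 1, len(self.cues) - 1)

    def back(self):
        # Back turn, clamped to the first cue.
        self.index = max(self.index - 1, 0)

    def text(self, mode):
        # The text shown on a device depends only on its selected mode.
        return self.cues[self.index][mode]
```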


C. Discussion
Table 1 shows a part of the results of the experiment. Examinees found the Casual subtitle more helpful than the Singing one. We believe the reason is that Singing is difficult for most audiences to understand, and the Casual language helps them follow the Noh performance. We conclude that letting the audience understand the meaning of the performance is an important key to making them enjoy it.
Figure 2 shows actions of the main actor. The acting is linked with a song and background music, where the jumping action means avoiding waves over the sea. If novice audiences cannot understand the meaning of the singing, they cannot infer accurate information about the action either. The most important result is that our system was able to help all the examinees understand the meaning of the Singing.

Figure 2. Actions of the Shite on the stage of YASHIMA: a) jumping, b) attacking
Table 1. Effects of Subtitle

  Question                                                    Answer
  Which is the more helpful subtitle?                         Casual / Singing / Other
  Do you feel the subtitle is suitable to the performance?    Yes / No / Other
  Does Noh-Guide help to understand this stage?               Yes: 10
  Is Noh-Guide easy to control?                               Yes / No / Other

III. EXPERIMENT

A. Experiments
We demonstrated Noh-Guide at the Maibayashi stage of "Kiyotsune" in the Kumamoto Prefectural Theater on September 19th, 2014. The target stage was part of an event called "Kumamoto Noh Zanmai". A Maibayashi consists of Shite, Hayashi, and Jiutai.
The number of examinees was 10: 8 women and 2 men, with an average age of 31. Every examinee brought their own mobile device, an iPhone or an Android, and watched the stage with it. We gathered their impressions through questionnaires. At the same time, we tried to gather their system interaction logs; however, this attempt failed because a few devices went down during the performance.

IV. CONCLUSION AND FUTURE WORK

In this paper, we discussed Noh-Guide and the results of an experiment conducted on a real stage. Unfortunately, our system sometimes failed and stopped during the actual performance: because we built the system as a Web application, it was difficult to manage the wireless network connections of the audience's mobile devices at the theater. The system is similar to the earphone guide system of Kabuki performances [5]; however, the earphone guide is only available at the Kabuki Theater in Tokyo. Since our system can be used in any theater, it can guide Noh audiences anytime and anywhere. We will improve the system and experiment again with the next version.

ACKNOWLEDGMENT
We thank Ichiro Nakamura, who provided the data necessary to construct this system. We would also like to thank the Kumamoto Prefectural Theater for giving us the chance to conduct this experiment.
REFERENCES
[1] A. Waley, The No Plays of Japan, ISBN 0-8048-1198-8.
[2] The Nohgaku Performers' Association: http://www.nohgaku.or.jp/
[3] K. Kojima, M. Yanagida, and I. Nakayama, "Variability of vibrato: a comparative study between Japanese traditional singing and bel canto," Speech Prosody 2004, International Conference, 2004.
[4] Z. Serper, "Japanese Noh and Kyogen Plays: Staging Dichotomy," Comparative Drama (2005): 307-360.
[5] KABUKI-ZA: http://www.kabuki-za.co.jp/


Evaluating the Effectiveness of Hardware Accelerators in Datacenters

Seyed Mortez Nabavinejad, Maziar Goudarzi
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Abstract— The proliferation of datacenters and their ever-growing applications has motivated the use of hardware accelerators such as GPUs and FPGAs, due to their higher throughput compared with conventional CPUs. We have extended the SimWare simulator to support hardware accelerators and then conducted experiments to evaluate whether or not hardware accelerators can reach this goal.

II. METHODOLOGY

A. Hardware Accelerators and Heat Recirculation


When a server is active and uses energy, it obviously gives out hot air. This hot air recirculates in the datacenter, and other servers draw it in along with the cold air coming from the cooling unit. The hotter the recirculated air, the more power the cooling system must use to neutralize its effect. Fig. 1 illustrates the heat recirculation phenomenon in a typical datacenter.

Keywords— datacenter; hardware accelerator; energy consumption

I. INTRODUCTION

Nowadays, the deployment of datacenters is on the increase. Constructing a datacenter from scratch, or extending a current one by adding new servers, is costly, so some datacenter owners have been using, or have decided to use, hardware accelerators as a cheap and reasonable alternative. They install the accelerators on existing servers and use them to increase the overall throughput expeditiously [1]. There is no doubt that accelerators, if used efficiently, can outperform conventional CPUs in computation-intensive applications and improve throughput or decrease application runtime. The question, however, is whether they can decrease operational expenses such as energy cost and lead to better financial prospects for datacenter owners. The concern is twofold: first, some accelerators, such as GPUs, consume power voraciously; second, this extra power consumption exacerbates the thermal effect of servers on each other. Both side effects increase the power consumption of the cooling system. It is therefore important to choose accelerators carefully, so that their advantage (reduced runtime) outweighs their disadvantage (increased power consumption) and yields reasonable financial revenue.


Adding one or more hardware accelerators to a server increases its power consumption and, consequently, the air the server gives out becomes hotter. So a hardware accelerator not only draws extra power, increasing the total power consumption of the datacenter, but also worsens the thermal effect and leads to more power consumption in the cooling unit; these effects must be considered when deciding to use hardware accelerators.


B. SimWare Simulator
SimWare [2] is a novel holistic simulator that measures the power consumption of a datacenter by considering its most influential components: servers, the cooling system, fans, and the thermal effect of servers on each other due to heat recirculation.


This simulator tries to simulate every aspect of a datacenter in detail. For example, it uses a heat recirculation matrix, obtained by a precise simulator called BlueTool [3], to calculate the impact that the power consumption of one server has on the inlet temperature of other servers. Considering this effect is important, since it increases the power consumption of the cooling unit.
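The role of the heat recirculation matrix can be sketched abstractly as follows. The matrix values, temperatures, and power figures are illustrative assumptions, not data from SimWare, BlueTool, or the paper.

```python
# Sketch of the heat-recirculation model described above: each server's
# inlet temperature is the supplied cold-air temperature plus a weighted
# sum of every server's power draw, with weights taken from a heat
# recirculation matrix D. All numbers here are illustrative assumptions.

def inlet_temperatures(t_supply, power, d):
    """t_in[i] = t_supply + sum_j d[i][j] * power[j]  (deg C; power in W)."""
    n = len(power)
    return [t_supply + sum(d[i][j] * power[j] for j in range(n))
            for i in range(n)]

# Two servers; each recirculates a little of its hot exhaust to the other.
D = [[0.00, 0.01],
     [0.02, 0.00]]                    # deg C per watt of the other server
base = inlet_temperatures(15.0, [200.0, 200.0], D)
accel = inlet_temperatures(15.0, [270.0, 200.0], D)  # +70 W accelerator on server 0
# Server 0's accelerator raises server 1's inlet temperature; removing
# that recirculated heat is the extra cooling cost discussed above.
```

Simulators such as SimWare obtain D from CFD-backed tooling (BlueTool); the sketch only shows why one server's accelerator inflates another server's inlet temperature.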

To examine and evaluate the effectiveness of hardware accelerators in a way that includes both their pros and cons, we have extended one of the most powerful existing simulators, which takes into account all the desired parameters. Our extension upgrades the simulator so that it can support hardware accelerators and consider their effects on the desired parameters. Our experiments show that it is important to use accelerators intelligently and carefully; otherwise the costs will exceed the benefits and accelerator usage becomes unfruitful. The rest of the paper comprises an introduction to the simulator and our extensions to it, along with experimental results that show the effectiveness of accelerators under different scenarios.


We have expanded this simulator with new features: support for hardware accelerators and accounting for their effect on power consumption and on the thermal behaviour that makes the cooling system consume more power. With this extended version of SimWare, we are able to conduct various experiments and evaluate the effectiveness of hardware accelerators.
III. EXPERIMENTAL SETUP AND RESULTS

For our experiments, we used the SHARCNET utilization trace, a Standard Workload Format [5] trace from experimental datacenters and parallel processing systems. The trace contains 8000 jobs; for each job, features such as job ID, amount of memory, number of



Figure 1. Heat recirculation in datacenter [4]

CPUs, CPU utilization, runtime, wait time, and others are recorded.
We assumed that one hardware accelerator is attached to each server and that every job can use it. Each accelerator can handle only one job at a time, and a job using the accelerator obtains a 10X speedup. This 10X speedup is hypothetical, and in reality each job might obtain a different speedup; but since we want to compare accelerators with different power consumption, and the speedup is the same for all of them, this number is justifiable.
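Our reading of this accelerator extension can be sketched as follows. The class and method names are ours, not SimWare's API; the model only captures the one-job-at-a-time and 10X-speedup assumptions stated above.

```python
# Sketch of the accelerator extension described above: each server owns
# one accelerator that serves a single job at a time; a job that obtains
# the accelerator finishes 10x faster but adds the accelerator's power
# draw while it runs. Illustration only, not SimWare's actual code.

SPEEDUP = 10.0

class Server:
    def __init__(self, p_core, p_accel):
        self.p_core = p_core
        self.p_accel = p_accel
        self.accel_busy = False

    def start_job(self, runtime):
        """Return (execution_time, power_while_running) for one job."""
        if not self.accel_busy:
            self.accel_busy = True          # accelerator granted to this job
            return runtime / SPEEDUP, self.p_core + self.p_accel
        return runtime, self.p_core         # accelerator occupied: run on CPU

s = Server(p_core=130.0, p_accel=70.0)
fast = s.start_job(100.0)    # wins the accelerator
slow = s.start_job(100.0)    # accelerator still busy
```

This is exactly the trade-off measured in the experiments: the first job runs tenfold faster but at higher instantaneous power, while concurrent jobs see no benefit.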

Figure 2. Effect of hardware accelerator on energy consumption

The left circle in Fig. 2 marks the energy consumption of the datacenter when there is no hardware accelerator, and the right one marks the energy consumption when the hardware accelerator's power consumption is around 70W. Beyond the 70W point, employing a hardware accelerator worsens the energy consumption.
From these results, we can conclude that power-hungry hardware accelerators such as GPUs, which typically consume more than 100W, must be used carefully; otherwise they may even worsen the energy consumption and consequently increase the total cost of the datacenter.

Our main goal here is to compare different accelerators, such as FPGAs and GPUs, and to determine under what circumstances (power consumption and speedup) a hardware accelerator can be beneficial. For the hardware accelerators, we considered different amounts of power consumption, from 20W (ML605 Virtex-6) up to 375W (Tesla K80 GPU). We also measured the energy consumption of the datacenter in the case where there is no hardware accelerator. The specifications of the datacenter during the simulations are presented in Table 1.

There are open avenues for researchers who want to explore hardware/software co-design opportunities in datacenters. One interesting challenge is to determine the most efficient number of accelerators that can be used in a datacenter such that both energy consumption and execution time are improved.

The results show that although the accelerators decrease the total execution time of the jobs (from 170712 seconds down to 166068 seconds, under the condition that they give a 10X speedup to any job running on them), only some of them decrease the energy consumption compared with the situation where there is no accelerator. Fig. 2 shows the energy consumption of the whole datacenter after executing the trace.

Another interesting direction is the placement of hardware accelerators on servers considering the heat recirculation effect. Here the goal is to choose the servers for which attaching hardware accelerators has the least thermal effect on the other servers.

We can conclude that sole attention to increased throughput is misleading; the power consumption of an accelerator, along with its heat recirculation effect, must be considered. In Fig. 2 we can see that if the power consumption of the accelerator is more than 70W, the energy consumption of the datacenter will surpass the no-accelerator case, even if the accelerator provides a 10X speedup for all the jobs it executes.
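The comparison logic behind such a break-even point can be sketched with a toy model. Every parameter below (accelerated fraction of runtime, cooling-overhead model) is an illustrative assumption; the sketch shows the trend of Fig. 2, not the paper's simulated numbers.

```python
# Toy break-even model for the trend discussed above: with a fixed
# speedup, total energy grows with accelerator power, so past some
# wattage the accelerated datacenter consumes more than the baseline.
# accel_frac (fraction of compute time actually accelerated) and the
# cooling-overhead model are assumptions, not the paper's data.

def relative_energy(p_accel, p_server=130.0, speedup=10.0, accel_frac=0.3):
    """Energy (IT + cooling) of the accelerated run relative to baseline."""
    def energy(power, runtime):
        cooling = 0.3 + power / 1000.0   # assumed: overhead grows with power
        return power * runtime * (1.0 + cooling)
    t = (1.0 - accel_frac) + accel_frac / speedup   # normalised runtime
    return energy(p_server + p_accel, t) / energy(p_server, 1.0)

# A modest accelerator saves energy; a power-hungry GPU can cost more
# than it saves, even at the same speedup.
```

A ratio below 1.0 means the accelerator is beneficial; sweeping `p_accel` locates the break-even wattage for a given set of assumptions.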
TABLE I. DATACENTER SPECIFICATIONS DURING SIMULATIONS

  No. of Chassis:                        50
  No. of Servers in each Chassis:        10
  No. of Cores in each Server:           10
  Power Consumption of Cores (total):    130W

REFERENCES
[1] http://www.eetimes.com/document.asp?doc_id=1324372
[2] S. Yeo and H.-H. S. Lee, "SimWare: A Holistic Warehouse-Scale Computer Simulator," Computer, vol. 45, pp. 48-55, 2012.
[3] http://impact.asu.edu/BlueTool
[4] A. Pahlavan, M. Momtazpour, and M. Goudarzi, "Data center power reduction by heuristic variation-aware server placement and chassis consolidation," in Computer Architecture and Digital Systems (CADS), 2012 16th CSI International Symposium on, pp. 150-155.
[5] www.cs.huji.ac.il/labs/parallel/workload/swf.html


High Performance Two-Select Arbiter


Maher Abdelrasoul(1), Mohammed Sayed(1), Victor Goulart(1,2)
(1) ECE Department, Egypt-Japan University of Science and Technology (E-JUST), Egypt
(2) Center for Japan-Egypt Cooperation in Science and Technology, Kyushu University, Japan
email: {maher.salem, mohammed.sayed}@ejust.edu.eg, victor.goulart@acm.org

Abstract— The arbitration circuit plays an important role in defining system performance and latency in systems having shared resources. In this paper, we focus on the problem of granting two requesters simultaneously. We propose a new solution that fixes a bug in the three-dimensional programmable two-select circuit, the fastest arbitration circuit in the literature. We implemented all architectures concerning two-select arbitration in 65 nm CMOS technology. Our circuit shows the smallest delay and, on average, the second smallest area among all compared circuits.
I. INTRODUCTION

Arbiters are usually found in systems that have shared resources with many requesters, and they are used to control access to those resources. Strong-fairness arbiters are usually preferred, to avoid starvation at any requester. The most widely known arbiter is the Round Robin Arbiter (RRA), which is simple in design, fast, and strongly fair [1]. There is much research on arbiter design aiming at fast arbitration circuits. Most of it assumes that the shared resource cannot accept access from more than one requester [2]. However, some techniques allow more than one requester to access the same resource at the same time, under additional control. In our work, we focus on the two-select arbiter, which grants no more than two requesters among many. The architectures presented in this paper can, with little modification, be used as multi-select arbiters for systems able to access many resources at the same time. Our architecture shows the best performance among the working architectures we are aware of.

II. RELATED WORK

In the state of the art, only two works have targeted two-select RRAs. The first is the 3-dimensional programmable 2-select (3DP2S) arbiter, proposed in [3]. The arbitration circuit of the 8-point 3DP2S is shown in Fig. 1. It consists of log2(8) stages of unit blocks (UBs) and an edge-detector stage. The UB of 3DP2S is a thermometer-coded adder saturated at 2, i.e., whenever the sum of the two inputs is larger than 2, the output is 2. Further, the UB also takes two pointer bits (the input priorities) and outputs their OR. The pointer bits control the adder function: if the right input's priority is logic high, the UB does not add its inputs and passes the right input as it is. Therefore, the result of the additions propagates through the stages to the paths of the requesters that have lower priority. The result of the three stages is divided into two vectors: the left bits and the right bits. Finally, the edge detector detects the zero-one transition. The detected one in the first vector represents the first active requester, while the detected one in the second vector represents the second active requester. The main problem in this algorithm occurs when the highest-priority requester is active: the sequence of additions results in a vector of ones, which, when passed to the edge-detector stage, yields a vector of zeros. This bug was not mentioned in the original paper [3].

Fig. 1: 8-point 3DP2S arbitration circuit.

The bug in the 3DP2S architecture was fixed in [4]; we will name that circuit 3DP2S_Ozu. Its idea is based on ANDing the priority vector with the input request vector to indicate whether the highest-priority requester is active. If it is active, the first grant vector will be the priority vector itself, since the priority vector grants the highest-priority requester. Otherwise, the grant vector generated by the circuit is used. The control is done through a multiplexer circuit. The fixed circuit is shown in Fig. 2.

Fig. 2: 3DP2S_Ozu modified arbitration circuit.
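The unit block's behaviour can be sketched as follows. This is a behavioural model of the description above, not the authors' gate-level design.

```python
# Behavioural sketch of the 3DP2S unit block (UB): a thermometer-coded
# adder saturating at 2, whose addition is bypassed when the right
# input's pointer (priority) bit is high. Illustration only.

def unit_block(left, right, p_left, p_right):
    """Return (count, pointer): the saturated sum of the two input
    counts and the OR of their pointer bits."""
    pointer = p_left | p_right
    if p_right:                      # priority boundary: pass right input
        return right, pointer
    return min(left + right, 2), pointer
```

Cascading log2(N) stages of such blocks yields the per-path counts from which the edge detector extracts the first two active requesters.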

Another two-select arbiter, named RRA-2pick-Ozu, was proposed in [4]. Its architecture is based on the fixed priority encoder (FPE) idea [5]; the FPE picks the first two active requesters at its input. It was shown that the RRA-2pick-Ozu architecture achieves a significant reduction in area, whereas the 3DP2S architecture has higher performance for arbiters with fewer than 32 requesters.
III. PROPOSED SOLUTION

We have worked on the 3DP2S to solve its bug in a more efficient way and improve its performance. Our idea is also based on ANDing the priority vector with the input request vector to indicate whether the highest-priority requester is active. However, our circuit is different: it ANDs the priority vector with the request vector and then ORs the result with the grant vector generated by the arbitration circuit. Our circuit, named 3DP2S_E, is shown in Fig. 3. Comparing it with 3DP2S_Ozu, we expect our circuit to improve both delay and area. In terms of area, we have replaced the OR tree and the multiplexer circuits by a bitwise OR circuit. Figs. 2 and 3 show the critical paths of the two circuits as dashed lines: our circuit increases the critical path of the 3DP2S circuit by only one OR gate, whereas in 3DP2S_Ozu the critical path is increased by a multiplexer. Both our circuit and the 3DP2S_Ozu circuit contain a bitwise ANDing circuit; however, our circuit uses a bitwise ORing circuit instead of the OR tree and multiplexer circuits used in 3DP2S_Ozu. In the next section, a complete evaluation study and numerical results are presented.
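At the bit level, the two fixes can be contrasted as follows. This is our illustration using Python integers as bit-vectors, not the papers' netlists.

```python
# Bit-level contrast of the two bug fixes. Vectors are Python ints with
# one bit per requester; 'grant' is what the buggy 3DP2S core emits
# (all zeros when the highest-priority requester is active).

def fix_ozu(request, priority, grant):
    """3DP2S_Ozu: a multiplexer selects the priority vector whenever the
    highest-priority requester is active."""
    return priority if (request & priority) else grant

def fix_proposed(request, priority, grant):
    """3DP2S_E (proposed): bitwise AND the priority and request vectors,
    then bitwise OR the result into the circuit's grant vector."""
    return grant | (request & priority)
```

Both produce the same grants; the difference is purely structural: the OR merge costs one gate on the critical path, while the multiplexer-based selection costs more delay and area.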

Fig. 4: Delay results.

Fig. 5: Area results.

Fig. 3: 3DP2S arbitration circuit with our modification.

IV. EVALUATION AND RESULTS

We evaluated the performance of the proposed 3DP2S_E architecture against both the 3DP2S_Ozu and RRA_2pick_Ozu architectures, for arbiter sizes from 4 to 32, in terms of delay and area. We described the three architectures in VHDL and verified the correctness of each description by simulating it with the ISim tool of Xilinx 14.5. Synthesis was done using the Cadence Encounter RTL Compiler RC11.10 with TSMC 65 nm technology. The delay and area results are shown in Figs. 4 and 5, respectively. In terms of delay, the RRA_2pick_Ozu circuit shows a high delay for all arbiter sizes except the 32-point arbiter. Keeping the competition between 3DP2S_E and 3DP2S_Ozu, our design (3DP2S_E) shows the smallest delay, with an average reduction of 5% compared to the 3DP2S_Ozu circuit. Regarding area, although the RRA_2pick_Ozu circuit still has the advantage of the smallest area, our circuit, as expected, shows an average reduction of 4% in area over 3DP2S_Ozu.

V. CONCLUSIONS

In this paper, we presented a complete literature review on the problem of granting two requesters in the same cycle. We focused on a bug in the circuit with the smallest delay for arbiters with low numbers of requesters, and we reviewed the only solution presented in the literature. We proposed a new solution that fixes the bug and implemented the design in ASIC CMOS technology. Our proposed design shows the best delay for arbiter sizes from 4 to 32, and it has a smaller area than the other 3DP2S-based solution.

REFERENCES
[1] W. J. Dally and B. Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks," in Proc. of 38th Design Automation Conference (DAC), pp. 684-689, 2001.
[2] M. Abdelrasoul, M. Ragab, and V. Goulart, "Impact of Round Robin Arbiters on Router's Performance for NoCs on FPGAs," IEEE International Conference on Circuits and Systems (ICCAS), pp. 59-64, 2013.
[3] J. S. Ahn, D. K. Jeong, and S. Kim, "Fast Three-Dimensional Programmable Two-Selector," Electronics Letters, vol. 40, no. 18, 2004.
[4] H. F. Ugurdag, F. Temizkan, O. Baskirt, and B. Yuce, "Fast Two-Pick n2n Round-Robin Arbiter Circuit," Electronics Letters, vol. 48, no. 13, 2012.
[5] P. Gupta and N. McKeown, "Designing and Implementing a Fast Crossbar Scheduler," IEEE Micro, vol. 19, no. 1, pp. 20-28, 1999.


An Efficient Data Dependence Profiling Method for Parallelisation on Embedded Systems

Mostafa M. Abbas(1) and Ahmed El-Mahdy(1,2)
(1) Computer Science and Engineering Department, E-JUST, Alexandria, Egypt
(2) On-leave from Computer and Systems Engineering Dept., Alexandria University, Alexandria, Egypt
{mostafa.abbas, ahmed.elmahdy}@ejust.edu.eg


Abstract— By incorporating multi-core and GPU technologies into embedded systems, their performance is rapidly improving. Unfortunately, the execution of applications on these systems suffers from tight constraints on computation and energy resources; it is therefore important for embedded applications to execute efficiently. We aim to develop and integrate a low-overhead profiling subsystem into the LLVM compilation system. The profiling subsystem will make use of device idle time and the large storage space generally available on embedded devices to perform incremental dependence profiling, which gives suitable information for dynamic parallelism analysis, potentially allowing the parallel resources of such emerging systems to be exploited.

II. RELATED WORK

Recent work on dynamic dependence analysis includes the SD3 system, which mainly targets loop parallelism [1]. An interesting aspect of SD3 is its focus on optimising the profiling phase, performing linear interpolation on memory accesses to reduce memory overhead, and parallelising the profiler itself to speed up the whole process.
A recent development based on SD3 is called Multi-slicing [2]. It uses compiler-provided information on memory aliasing to cluster memory accesses into slices that can be analyzed independently. This partitioning is then used to parallelise the analysis by running several distinctly instrumented versions of the program, one for each slice.

Keywords- LLVM compiler; profiling; HPC.

I. INTRODUCTION
With the incorporation of multi-core processors and embedded GPUs, the performance capabilities of embedded systems are rapidly improving. However, exploiting parallelism is even harder than on traditional systems, owing to tighter constraints on serial performance, memory, and power. Moreover, embedded applications generally need to meet real-time constraints, which further increases application development complexity. Therefore, developers must utilise tools that help determine application and system performance bottlenecks and characteristics. Such information can help in both manual and automatic compiler optimisations.

Kremlin [3] uses the notion of self-parallelism, which in turn is based on evaluating an amount of work (apparently instruction counts) and on critical-path analysis. Kremlin also uses a shadow memory, which seems to keep several versions (availability times) of memory accesses.
Kim's thesis [4] also covers privatization, pipelining, and other enabling transformations.
The POSH system [5] is a representative example of a mixed system, combining static analysis to select potential parallel tasks with dynamic data to refine this selection. This is the closest to our approach.

Software profiling plays an important role in optimising performance. According to the 80-20 principle, software generally spends 80% of its time executing 20% of its code; it is thus important to identify the most time-consuming kernels so that developers and compilers can optimise them to enhance software performance.

The ALCHEMIST system [6] uses the notion of distance, which gives a measure of the effectiveness of parallelising a construct, as well as identifying the transformations necessary to facilitate such parallelisation. ALCHEMIST builds an execution index tree at run time. This tree is used to differentiate among multiple instances of the same static construct, and it leads to improved accuracy in the computed profile, useful for better identifying constructs that are amenable to parallelisation.

With the incorporation of parallel resources into embedded systems, the task of profiling becomes more complicated. The profile needs to identify the dependence relations among program statements so as to allow parallelisation of the software. Running such profiling on the embedded system itself has the advantage of decreasing development costs (e.g., no need for a detailed system simulator with its associated high development and execution costs), as well as allowing for possible on-system dynamic compilation.

The Parwiz tool [7] is used to capture all data dependences occurring at run time, and to build from them a dependence graph covering all sequential constraints between program structures, at various levels of resolution.

The rest of this extended abstract is organized as follows: Section II discusses related work; Section III introduces our profiling methodology; finally, Section IV concludes and discusses future work.


Prospector [8] is a dynamic binary-instrumentation approach that detects frequently executed loops and the data dependences they carry.

It is worth noting that none of the above systems targets the embedded-system platform.

III. METHODOLOGY
We aim to develop and integrate a low-overhead profiling subsystem into the well-known LLVM compilation system. The system allows for just-in-time (JIT) compilation, enabling us to implement dynamic code analysis. To decrease the perceived profiling overhead, the profiling subsystem will make use of device idle time and the large storage space generally available on embedded devices to perform incremental profiling, which yields suitable information for parallelism analysis.
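The kind of dynamic data-dependence detection the subsystem performs can be sketched minimally as follows. This is an illustration of the general shadow-map technique, not the actual LLVM instrumentation: a map records the last iteration that wrote each address, so a read from a later iteration exposes a loop-carried read-after-write dependence.

```python
# Minimal shadow-map sketch of dynamic loop-carried dependence detection.

def profile_loop(trace):
    """trace: list of (iteration, op, address).
    Returns the set of loop-carried RAW dependences as
    (source iteration, sink iteration, address) triples."""
    last_write = {}                    # address -> iteration of last write
    deps = set()
    for it, op, addr in trace:
        if op == "read" and addr in last_write and last_write[addr] < it:
            deps.add((last_write[addr], it, addr))
        elif op == "write":
            last_write[addr] = it
    return deps

# a[i] = a[i-1] + 1 : every iteration reads the previous iteration's write.
trace = [(0, "write", 100),
         (1, "read", 100), (1, "write", 104),
         (2, "read", 104), (2, "write", 108)]
```

Real profilers add compression and sampling on top of this idea precisely because recording every access, as done here, is what makes naive dependence profiling expensive.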

Figure 1. A proposed diagram of our system.

Our proposed subsystem aims to detect frequently executed loops and the data dependences they carry, and to exploit them in the parallelisation process.

A typical parallelisation process on a loop-intensive program consists of the following four steps:

(1) Finding candidate loops for parallelisation,
(2) Analyzing data dependences in the loops,
(3) Parallelising the loops, and
(4) Verifying and optimising the parallelised loops.

Profiling targets the first and second steps; its results can guide parallelisation of the code by programmers and compilers.
Our proposed subsystem (Fig. 1) consists of two phases of analysis: static and dynamic. The static phase is used to select potential parallel tasks (loops) and to analyze data dependence; it reduces the run-time analysis overhead by narrowing the profiling space to tasks and to specific statements within tasks. The dynamic phase is used to refine the selected tasks and to complement the weaknesses of the static (conservative) data-dependence analysis, since all memory addresses are resolved at runtime.
The dynamic analysis will make use of adaptive sampling. Sampling technology [9][10] collects a program's running context without modifying the source code during compilation. As an embedded OS can generate many asynchronous interrupts, we can exploit this characteristic and sample the running context at a certain frequency to obtain a library of samples; developers can then analyze these samples together with other program data to profile embedded software. The sampling frequency will be guided by the static analysis phase.

IV. CONCLUSIONS AND FUTURE WORK

This extended paper proposes a new profiling methodology for guiding the parallelisation of applications on embedded systems. The methodology combines the static and dynamic profiling utilised in the literature for general systems, while relying on more extensive static analysis to filter the runtime dynamic analysis, on adaptive sampling, and on exploiting the idle-time characteristics of embedded systems.

Our future work will include implementing the full system using the LLVM compilation framework. That would be followed by a detailed performance and cost study running typical high-performance applications on the embedded platform. The final goal is to develop a parallelising compilation system for embedded systems and to integrate it with the proposed profiling methodology.

ACKNOWLEDGMENT

This research has been supported by the Ministry of Higher Education (MoHE) of Egypt through an MSc fellowship.

REFERENCES
[1] M. Kim, H. Kim, and C.-K. Luk, "SD3: A Scalable Approach to Dynamic Data-Dependence Profiling," in 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2010, pp. 535-546.
[2] H. Yu and Z. Li, "Multi-slicing: A Compiler-supported Parallel Approach to Data Dependence Profiling," in Proceedings of the 2012 International Symposium on Software Testing and Analysis, New York, NY, USA, 2012, pp. 23-33.
[3] S. Garcia, D. Jeon, C. M. Louie, and M. B. Taylor, "Kremlin: Rethinking and Rebooting Gprof for the Multicore Age," in Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, New York, NY, USA, 2011, pp. 458-469.
[4] M. Kim, "Dynamic program analysis algorithms to assist parallelization," Ph.D. dissertation, Georgia Institute of Technology, 2012.
[5] W. Liu, J. Tuck, L. Ceze, W. Ahn, K. Strauss, J. Renau, and J. Torrellas, "POSH: A TLS Compiler That Exploits Program Structure," in Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, NY, USA, 2006, pp. 158-167.
[6] X. Zhang, A. Navabi, and S. Jagannathan, "Alchemist: A Transparent Dependence Distance Profiling Infrastructure," in International Symposium on Code Generation and Optimization (CGO 2009), 2009, pp. 47-58.
[7] A. Ketterlin and P. Clauss, "Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization," in 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2012, pp. 437-448.
[8] M. Kim, C.-K. Luk, and H. Kim, "Prospector: Discovering Parallelism via Dynamic Data-Dependence Profiling," Georgia Institute of Technology, Technical Report TR-2009-001, 2009.
[9] L. Fagui, L. Shengwen, X. Ran, and L. Chunwei, "A Low-overhead Method of Embedded Software Profiling," in ISECS International Colloquium on Computing, Communication, Control, and Management (CCCM 2009), 2009, vol. 4, pp. 436-439.
[10] M. Bach, M. Charney, R. Cohn, E. Demikhovsky, T. Devor, K. Hazelwood, A. Jaleel, C.-K. Luk, G. Lyons, H. Patil, and A. Tal, "Analyzing Parallel Programs with Pin," Computer, vol. 43, no. 3, pp. 34-41, Mar. 2010.


An Improved Performance K-best Sphere Decoding Algorithm for MIMO Channels
Ibrahim Al-Nahhal(1), Masoud Alghoniemy(2), Osamu Muta(3), Adel B. Abd El-Rahman(1), Hiroshi Furukawa(4)
(1) Egypt-Japan Univ. of Science and Technology, Egypt ({ibrahim.al-nahhal, adel.bedair}@ejust.edu.eg)
(2) University of Alexandria, Egypt (alghoniemy@alexu.edu.eg)
(3) Center for Japan-Egypt Cooperation in Science and Tech., Kyushu Univ., Fukuoka, Japan (muta@ait.kyushu-u.ac.jp)
(4) Graduate School of Information Science and Electrical Eng., Kyushu Univ., Fukuoka, Japan (furuhiro@ait.kyushu-u.ac.jp)

Abstract— A variant of the K-best (KB) MIMO decoding algorithm, namely the improved-performance K-best (IPKB), is proposed. The IPKB achieves a noticeable performance improvement of up to 1 dB while its complexity remains similar to that of the traditional KB. The performance improvement results from considering more nodes in the upper tree levels while discarding a similar number of nodes near the end of the search tree. Complexity analysis and simulation results are presented.
Index Terms— MIMO systems, K-best, sphere decoder.

N_i^{IPKB} =
    max(card(I_i^{IPKB}), K),              i ∈ U,
    K,                                     i = M + 1,            (3)
    2K − max(card(I_{i+M}^{IPKB}), K),     i ∈ L,

I_i^{IPKB} = { j | d_i^j < β_i,  j ∈ {1, 2, …, 2K − 1} },  i ∈ U,    (4)

where card(I_i^{IPKB}) is the cardinality of the set I_i^{IPKB}, i.e., the number of nodes at level i whose distance metrics are smaller than the pruned radius β_i. In particular, we provide a heuristic for determining the pruned radius β_i at a specific tree level i:

    β_i = (i K^2 d_min^i) / 10^(SNR/10),

In additive white Gaussian noise environment (AWGN), the maximum


likelihood (ML) decoder is the optimum decoder where the ML solution
M L that minimizes the 2-norm of
finds the symbol estimate x
(1)

i K 2 dmin
i
,
i =
10SN R/10

where x is the M 1 transmitted vector, y is the N 1 received vector,


and H is N M channel matrix whose elements, hnm , represent the
Rayleigh flat-fading gains from transmitter m to receiver n, is the
lattice whose points represent all possible codewords at the transmitter.
The sphere decoder (SD) reduces the computational complexity by
limiting the search space inside a sphere of radius centered at the
, should satisfy
received signal vector y. The estimated signal vector, x
the radius constraint y H
x < [3]. The SD transforms the closestpoint search problem into a tree-search problem by factorizing the channel
matrix, H = QR, where Q is a N N unitary matrix and R is an upper
triangular matrix of size N M . Thus, (1) can be rewritten as

IP KB

), K)
max(card(i
= K

2K max(card(IP KB ), K)
i+M

KB
= {j|dji < i , j {1, 2, , 2K 1}, i U }
IP
i

II. BACKGROUND

SD = arg min 
x
y Rx ,

The IPKB achieves performance improvement over the traditional KB


algorithm without sacrificing its complexity. Performance improvement is
the result of visiting more nodes in the upper tree levels, while discarding
the same number of discarded nodes in the lower tree levels in order to
preserve the complexity. Lower tree levels are defined as L = {i|2 i <
M +1} while upper tree levels are defined as U = {i|M +1 < i 2M }.
It should be noted that the middle level M + 1 represents a neutral level
where its nodes are kept unmodified (see Fig. 1). In particular, discarded
nodes from a lower level i L are reused in the corresponding upper
level i + M U . The number of surviving nodes in each tree level,
NiIP KB , is given by

In multi-input multi-output (MIMO) communication systems, the traditional K-best sphere decoder (KB) memorizes the best K-nodes at each
level of the search tree [1].
In this paper, a variation of the KB decoder for MIMO systems
is proposed, namely, improved performance K-best (IPKB). The IPKB
provides performance improvement without complexity increase. Unlike
[2] that presents an algorithm for a reduction in complexity without
performance degradation, this paper provides performance improvement
without complexity increase. The chosen K-nodes include irrelevant
nodes that increase the decoding complexity without performance improvement; by discarding these irrelevant nodes, one can decrease the
complexity without compromising the performance as mentioned in [2].
We can invest the discarded irrelevant nodes in some tree levels by
increasing the visited nodes (VNs) in other tree levels with the same
number of discarded nodes, in order to improve the performance at the
same complexity value.

a
n

III. T HE I MPROVED P ERFORMANCE K- BEST A LGORITHM (IPKB)

I. I NTRODUCTION

M L = arg min y Hx2 ,


x

F
S

metrics d(x) = 
y Rx can be computed recursively using partial
Euclidean distances [3].
Alternatively, the KB algorithm traverses the tree in a breadth-first
search strategy at each level. Where, a decision is made on a level-by-level
basis by keeping only the best K-nodes that corresponds to the smallest
K-distance metrics. Note that this paper uses real form representation
mentioned in [4].

i = 2M, 2M 1, , 2

(5)

is the minimum distance metric at tree level i. It is clear that


where dmin
i
in this case, the radius value of the sphere, i , is not fixed and varies
depending on the tree level, i, and the operating SNR for a specific K
value. Note that the maximum number of excess nodes in the upper levels
Ui cant exceed K 1; similarly, the maximum number of discarded
nodes in the lower levels Li cant exceed K 1.
Figure 1 illustrates a numerical example for the IPKB for 16-QAM
signaling with 2 2 MIMO and K = 2 at SNR = 5dB. According
KB
to (5), (3), and (4), 4 = 9dmin = 1.35, card(IP
) = 3; hence
4
N4IP KB = max(3, 2) = 3, then the number of excess nodes compared
to the traditional KB is only one node at this level. In order to preserve
the same complexity, one node is discarded from the first tree level, i = 2
and reused in the fourth level, i = 4, as shown in Fig. 1. Note that the
number of VNs in Fig. 1 is the same as the number of VNs if we take
two nodes from each tree level (i.e., K = 2).

(2)

is the subset of the lattice that lies inside the


where y
= QH y and
sphere of radius centered at the received signal vector y. The distance

U
V

K
H
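The breadth-first K-best search that the algorithm in Section II builds on can be sketched as a toy, real-valued, hard-output decoder; the matrices and symbol alphabet below are example values, not the authors' implementation:

```python
def k_best_detect(R, y_tilde, symbols, K):
    """Traditional K-best tree search (breadth-first sphere decoding).

    R        -- upper-triangular factor from H = QR (real-valued model)
    y_tilde  -- Q^T y
    symbols  -- real symbol alphabet per dimension, e.g. (-3, -1, 1, 3) for 16-QAM
    K        -- number of surviving nodes kept per tree level
    """
    n = len(y_tilde)
    # each candidate is (partial Euclidean distance, partial symbol vector)
    survivors = [(0.0, [])]
    for level in range(n - 1, -1, -1):          # detect from the last row upward
        children = []
        for ped, partial in survivors:
            for s in symbols:                    # expand every surviving node
                x = [s] + partial
                # residual of row `level` given the symbols decided so far
                r = y_tilde[level] - sum(
                    R[level][level + j] * x[j] for j in range(len(x)))
                children.append((ped + r * r, x))
        survivors = sorted(children)[:K]         # keep only the best K nodes
    return survivors[0][1]                       # vector with the smallest PED

# 2x2 real-valued example: R upper triangular, transmitted vector (1, -1)
R = [[2.0, 0.5],
     [0.0, 1.5]]
y = [1.5, -1.5]                                  # noiseless R @ (1, -1)
print(k_best_detect(R, y, (-3.0, -1.0, 1.0, 3.0), K=2))   # -> [1.0, -1.0]
```

The IPKB variant differs only in the per-level budget: instead of a fixed K, the slice bound follows Eq. (3).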


[Figure: the tree shows the node distance metrics at each level, with the lower, neutral, and upper levels marked, the annotation "increase nodes at the upper level and discard the same number of nodes from the lower level", λ_4 = 9 d_4^min = 1.35, and the IPKB solution path at SNR = 5 dB.]

Fig. 1. Tree representation of IPKB for 16-QAM 2×2 MIMO and K = 2 at SNR = 5 dB.

[Figure: BER-versus-SNR curves together with complexity curves (average visited nodes) for K = 2, 4, 6 and IPKB2, IPKB4, IPKB6.]

Fig. 2. Performance and complexity comparison of IPKB and KB for 16-QAM over 4×4 MIMO channels.

IV. COMPLEXITY ANALYSIS

In what follows, we determine the complexity of the traditional KB and IPKB algorithms. Complexity in this paper is defined as the average number of visited nodes (VNs) required to find the solution.

A. Complexity of the KB

To determine the complexity of the KB algorithm, tree levels are divided into two groups. The first group contains the tree levels where the number of available nodes per level satisfies N_i^KB ≤ K, whereas the second group contains the tree levels where the number of available nodes per level is greater than K. Note that each surviving node is expanded into √q child nodes in the next tree level. The number of VNs for the first group (the same as the number of available nodes in this group) is

  N^KB− = Σ_{j=1}^{P_K} (√q)^j,  (6)

where P_K is the number of tree levels in the first group for a specific K. Given that (√q)^{P_K} ≤ K < (√q)^{P_K+1}, and knowing q and K, we can determine P_K [6]:

  P_K = ⌊ ln(K) / ln(√q) ⌋,  (7)

where ⌊·⌋ is the floor operation. For example, consider 16-QAM for the 2×2 MIMO system shown in Fig. 1. In the case of K = 6, from Eqs. (6) and (7), the number of tree levels in the first group is P_K = ⌊1.29⌋ = 1, and the total number of nodes in the first group is N^KB− = 4.

For the second group, each tree level has a fixed number of VNs, N_i^KB+ = K√q nodes. The total number of VNs in the second group is then

  N^KB+ = (2M − P_K − 1) K √q.  (8)

Using P_K from (7), the complexity of the KB, C_K^KB, is the total number of VNs:

  C_K^KB = √q [ (1 − (√q)^{P_K}) / (1 − √q) + (2M − P_K − 1) K ].  (9)

B. Complexity of the IPKB

By construction, the complexity of the IPKB equals that of the KB algorithm because both algorithms visit the same number of nodes. Hence,

  C_K^IPKB = C_K^KB.  (10)

V. SIMULATION RESULTS AND DISCUSSIONS

The performance of the proposed decoder is compared to that of the KB decoder. It is assumed that the transmitted power is independent of the number of transmit antennas, M, and equals the average symbol energy in a Rayleigh fading channel.

The performance improvement of the IPKB over the traditional KB is illustrated in Fig. 2 for 16-QAM and a 4×4 MIMO system; the improvement is on the order of 1 dB in the case of K = 4, at the same KB complexity shown in the same figure. The improvement comes from discarding irrelevant nodes from the lower tree levels and reinvesting the same number of discarded nodes to increase the visited nodes in the upper tree levels. Thus, the total number of VNs stays the same while the performance is enhanced.

VI. CONCLUSIONS

We have proposed a modified K-best sphere decoding algorithm, namely, the improved performance K-best (IPKB). The IPKB achieves a performance improvement over the KB without increasing its complexity. We have provided a complexity analysis for the proposed algorithm. Simulation results have confirmed the improvements of the proposed decoder.

REFERENCES
[1] Z. Guo and P. Nilsson, "Algorithm and Implementation of the K-best Sphere Decoding for MIMO Detection," IEEE Journal on Selected Areas in Communications, pp. 491-503, 2006.
[2] I. Al-Nahhal, M. Alghoniemy, O. Muta, A. B. Abd El-Rahman, and H. Furukawa, "A Reduced Complexity K-best Sphere Decoding Algorithm for MIMO Channels," JEC-ECC, 2015.
[3] Y. Hsuan Wu, Y. Ting Liu, H. Chang, Y. Liao, and H. Chang, "Early-Pruned K-best Sphere Decoding Algorithm Based on Radius Constraints," ICC, pp. 4496-4500, 2008.
[4] M. O. Damen, H. El Gamal, and G. Caire, "On maximum-likelihood detection and the search for the closest lattice point," IEEE Transactions on Information Theory, pp. 2389-2402, 2003.
[5] R. Shariat-Yazdi and T. Kwasniewski, "Configurable K-best MIMO Detector Architecture," ISCCSP, pp. 1565-1569, 2008.
[6] R. Graham, D. Knuth, and O. Patashnik, Concrete Mathematics, Addison-Wesley, 1989.
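A short script can evaluate the closed-form complexity of Eqs. (6)-(9) and check it against the paper's 16-QAM, K = 6 example (P_K = 1 and a four-node first group); this is a sketch of the stated formulas, not the authors' simulator:

```python
import math

def kb_complexity(q, M, K):
    """Visited-node count of the K-best decoder, per Eqs. (6)-(9).

    q -- constellation size (e.g. 16 for 16-QAM); each node has sqrt(q) children
    M -- number of transmit antennas (real-valued tree of 2M - 1 expansion levels)
    K -- number of surviving nodes per level
    """
    sq = math.sqrt(q)
    P_K = math.floor(math.log(K) / math.log(sq))              # Eq. (7)
    first_group = sum(sq ** j for j in range(1, P_K + 1))     # Eq. (6)
    second_group = (2 * M - P_K - 1) * K * sq                 # Eq. (8)
    total = sq * ((1 - sq ** P_K) / (1 - sq)
                  + (2 * M - P_K - 1) * K)                    # Eq. (9)
    return P_K, first_group, second_group, total

# Paper's example: 16-QAM, 2x2 MIMO, K = 6 -> P_K = 1, first group = 4 nodes
P_K, n_first, n_second, C = kb_complexity(q=16, M=2, K=6)
print(P_K, n_first)                          # -> 1 4.0
assert abs(n_first + n_second - C) < 1e-9    # Eq. (9) sums Eqs. (6) and (8)
```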


Challenges in Machine Learning Driven Compilation


Arnaldo J. Cruz¹, Antoine Trouve¹﹐², Hiroki Fukuyama¹, Hadrien Clarke², Kazuaki J. Murakami¹﹐²,
Masaki Arai³, Tadashi Nakahira³, and Eiji Yamanaka⁴
¹ Kyushu University
² Institute of Systems, Information Technologies and Nanotechnologies
³ Fujitsu Laboratories Limited
⁴ Fujitsu Limited

Abstract — Modern-day compilers rely on heuristics to choose an optimization strategy to apply to an input program, with the objective of improving its performance. An optimization strategy is defined as a sequence of code optimization techniques and their respective parameters. Nevertheless, this approach has turned out to be suboptimal, and researchers have proposed using machine learning (ML) driven compilation as a way of improving optimization scenario selection. This paper first outlines the different ML driven compilation approaches that have been proposed in related works. Then it identifies from the state of the art the five major challenges to be resolved in order to make ML driven compilation practical.

I. Introduction

The compiler's job is to translate human-readable code into machine code that makes efficient use of hardware resources. To apply sequences of optimizations, compilers rely on pre-defined optimization levels and static performance models of the system to estimate whether a given optimization would be beneficial. An optimization level is a fixed optimization scenario that is applied to any input program, mostly ignoring its specificities. Still, modern compilers fall short in selecting the optimizations that yield the highest speedups. As an illustrative example, we tested the Intel ICC compiler's ability to optimize 750 tensor contraction programs with a relatively small optimization space of 9505 strategies. We observed that for 59% of our test programs there was at least one optimization strategy that would provide more than 5% speedup over the one chosen by ICC, oftentimes resulting in more than twice the speedup.

More recent compiler research has shown promising results in the application of machine learning (ML) techniques to improve compiler performance. In this alternative approach, the hardware and compiler are treated as a black box, and a tunable model is trained with benchmark programs to learn to recognize which types of programs benefit from which optimizations. This paper outlines the different ML driven compilation approaches that have been proposed in related works and the challenges that need to be resolved to make the approach practical.

II. What is ML driven compilation?

Machine learning (ML) is a field of computer science concerned with the development and application of algorithms that can automatically find structure within data and predict future outcomes. In ML driven compilation, a traditional compilation flow is extended with an ML subsystem, which we call a predictor, that aids it in choosing an appropriate optimization strategy for a given input program. In this work we define an optimization strategy as a sequence of optimizations and their parameters, and the optimization space as the set of optimization strategies that can be chosen. The predictor utilizes a numerical characterization of the input program to tell the compiler which optimization strategy to apply, in order to produce a speedup greater than the compiler could achieve by itself.

III. Techniques for ML driven compilation

Iterative optimization methods compile and execute a program for a set of optimization strategies and keep the one with the highest performance. A naïve application of iterative optimization would find the best strategy by executing each point in the optimization space. In practice, optimization strategies are arbitrarily long, and their number increases exponentially with the number of optimization techniques. As a consequence, the compilation time would also increase exponentially, making this method impractical. Although there are methods that use derivative-free optimization to reduce the search space [1] [2], iterative optimization still requires the programmer to prepare test inputs so that the compiler can execute the program. This is too disruptive compared to traditional compilation flows, and not always possible.
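The exhaustive form of iterative optimization can be sketched as follows; the flags and the cost model standing in for real compilation and execution are hypothetical:

```python
import itertools

def iterative_optimize(compile_and_run, flag_options):
    """Exhaustive iterative optimization: try every strategy, keep the fastest.

    compile_and_run -- callable(strategy) -> measured runtime; in a real setting
                       this compiles the program with that strategy and executes
                       it on representative test inputs
    flag_options    -- per-optimization parameter choices; the optimization
                       space is their Cartesian product
    """
    best_strategy, best_time = None, float("inf")
    for strategy in itertools.product(*flag_options):
        t = compile_and_run(strategy)        # one compile+execute per point
        if t < best_time:
            best_strategy, best_time = strategy, t
    return best_strategy, best_time

# Hypothetical 2-optimization space: unroll factor x vectorization on/off.
# The stand-in cost model favours unroll = 4 with vectorization enabled.
def fake_compile_and_run(strategy):
    unroll, vectorize = strategy
    return abs(unroll - 4) + (0.0 if vectorize else 2.0) + 1.0

space = [(1, 2, 4, 8), (False, True)]
print(iterative_optimize(fake_compile_and_run, space))   # -> ((4, True), 1.0)
```

Even this toy space has 8 points; with tens of optimizations and parameters, the loop body (a full compile plus execution) runs an exponential number of times, which is exactly the impracticality noted above.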


Regression is the most commonly used ML technique in the related literature [3]. It is similar to iterative compilation, but rather than executing the program, it predicts the performance of each point in the optimization space from a set of software characteristics extracted from the program to be optimized. As a consequence, it is more compatible with a standard compilation flow. However, we have found this method to have low accuracy, and it is impractical because the prediction time grows at the same rate as the optimization space, that is, exponentially.

Classification is the second most common technique for ML driven compilation; it outputs an optimization scenario directly from a set of software characteristics. This is in contrast to regression, which tries to solve the more complex problem of performance prediction. Optimization parameters can be predicted independently [5], or in order when optimizations are interdependent [4]. Like regression, classification does not disrupt the standard compilation flow as the iterative method does. The limitation of classification is that it is challenging to find an appropriate optimization strategy encoding to be output by the predictor.
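As a hedged illustration of a classification-style predictor, a 1-nearest-neighbour toy that maps software characteristics directly to a strategy label; the feature names and strategy labels are invented for the example:

```python
def predict_strategy(features, training_set):
    """Toy classifier: map a program's software characteristics to the
    optimization strategy label of its nearest training neighbour.

    features     -- numeric characterization of the program to optimize
    training_set -- list of (feature_vector, best_known_strategy) pairs
                    gathered offline from benchmark programs
    """
    def dist(a, b):
        # squared Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, strategy = min(training_set, key=lambda fs: dist(fs[0], features))
    return strategy

# Hypothetical characteristics: (loop nest depth, branch density, memory-op ratio)
training = [
    ((3, 0.1, 0.6), "unroll+vectorize"),
    ((1, 0.5, 0.2), "if-convert"),
    ((2, 0.2, 0.8), "tile+prefetch"),
]
print(predict_strategy((3, 0.15, 0.5), training))   # -> unroll+vectorize
```

Note that the output here is a whole-scenario label, which sidesteps performance prediction but presupposes exactly the strategy encoding that the text identifies as the hard part.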

IV. Challenges in ML driven compilation

Challenge 1: Compilation Time. How to decouple the growth in prediction time from the optimization space's exponential growth? Iterative techniques require program execution and are thus disruptive to the compilation flow. Regression is a plausible solution, but the exponential growth in prediction time is a limitation of this method.

Challenge 2: Software characteristics. Regression and classification techniques rely on software characteristics in order to associate a program to be optimized with a beneficial strategy. How to best encode the input program in order to predict a beneficial optimization strategy with high accuracy? Programs can be characterized either dynamically during execution, semi-dynamically during assembly generation, or statically from high-level or intermediate representations. Higher levels are more practical to profile, at the expense of being more distant from the final dynamic behavior. Designing the software characteristics goes hand in hand with determining whether we are trying to predict performance, as in regression, or comparing programs, as in classification.

Challenge 3: Problem modeling. How to define the predictor's input and output in order to maximize prediction performance? An optimization strategy can be fixed, and each optimization parameter predicted independently. Another alternative is to predict a scenario one optimization at a time. If using regression, the predictor's input may be the software characteristics and the scenario for which to predict the performance; in this case a scenario encoding must also be defined. The scenario input can be avoided by training one predictor for each scenario; however, the number of optimization scenarios may be too large for this technique to be practical. Classification is an alternative approach, but it is harder to model and is prone to a strongly unbalanced training set, which ML techniques do not handle well.

Challenge 4: Generality. How varied are the types of programs for which the predictor can accurately find beneficial optimizations? This entails first determining which application domain would benefit the most from ML driven compilation. The next step is identifying a source from which to mine programs to train and test the predictor. There are different sources that can be mined to generate the software characteristics, including benchmarks, auto-generated code, and crowd-sourcing [2]. Once we have a program source to mine, we need some method to ensure enough program varieties are covered [6].

Challenge 5: Reproducibility. In the ML driven compilation research field there is a general lack of trust in published results because the experimental data is usually not readily accessible for independent investigation. Moreover, much research effort is lost because the tools employed are often developed ad hoc and not made publicly available. This is further complicated because in ML driven compilation there is no standard methodology or metrics for evaluating and comparing different prediction approaches. Therefore there is a need for experimental-environment sharing services and predictor evaluation methodologies to enable the collaboration that can lead to resolving the challenges presented in this work.

References
[1] F. Agakov et al., "Using machine learning to focus iterative optimization," International Symposium on Code Generation and Optimization (CGO), pp. 295-305, March 2006.
[2] G. Fursin and O. Temam, "Collective optimization: a practical collaborative approach," ACM Transactions on Architecture and Code Optimization (TACO), vol. 7, no. 4, 2010.
[3] E. Park et al., "Using graph-based program characterization for predictive modeling," International Symposium on Code Generation and Optimization (CGO), pp. 196-206, 2012.
[4] S. Kulkarni and J. Cavazos, "Mitigating the compiler optimization phase-ordering problem using machine learning," OOPSLA, 2012.
[5] A. Trouvé et al., "Using Machine Learning in order to Improve Automatic SIMD Instruction Generation," The eighth international workshop on automatic performance tuning (iWAPT), 2013.
[6] K. Hoste and L. Eeckhout, "Microarchitecture-independent workload characterization," IEEE Micro, vol. 27, no. 3, pp. 63-72, 2007.


Pattern-Driven Branchless Code Generation


Reem Elkhouly¹, Ahmed El-Mahdy², Amr Elmasry³
¹ CSE Department, E-JUST, Alexandria, Egypt (reem.elkhouly@ejust.edu.eg)
² CSE Department, E-JUST, on leave from Alexandria University, Alexandria, Egypt (ahmed.elmahdy@ejust.edu.eg)
³ Computer and Systems Engineering Department, Alexandria University, Alexandria, Egypt (elmasry@mpi-inf.mpg.de)
Abstract

Control-dependence elimination, namely if-conversion, is essential when generating efficient parallel code from serial code. While many if-conversion optimisation heuristics have been proposed in the literature, few have investigated the effectiveness of detected-pattern-guided transformations. In our research we tackle this problem by exploring the optimisation space for a set of representative kernels, focusing on frequent branches. We have implemented our technique as an LLVM prototype tested on the Intel x86 platform. Compared to some well-known optimisation techniques, our idea focuses on extracting a pattern that can identify the profitable conversions. We hereby highlight the performance-improvement opportunities that may be investigated via new learning techniques.

I. INTRODUCTION

Making computers run faster is a major goal in the fields of computer architecture and compilers. A key performance driver is providing parallel execution at various granularities. With the multicore shift, larger parallelism granularities are generally sought. Single-core performance also remains important, as it is a major factor limiting scaling. Accelerating single-core performance mainly relies on instruction-level parallelism, where various independent instructions are executed in the same processing cycle. The degree of parallelism is inherently limited by the data-flow characteristics of the running program. Moreover, control flow significantly hinders exploiting the true dependences manifested by the data flow, reducing the achievable degree of parallelism [1], [2]. When branches are highly mispredicted, converting control dependence into data dependence via predicated execution is a solution. In this model, instructions are generally guarded by predicates, thereby eliminating control flow [3]. This approach thus relies on if-conversion optimisations to convert conditional branches into predicated instructions, allowing further potential parallelisation subject to the inherent data-flow dependences. However, predication comes at the extra cost of executing nullified instructions, which can potentially degrade performance for large if-then bodies. Moreover, branches interact in terms of allowing for different execution schedules, for which finding the optimal schedule is generally a hard combinatorial search problem. In our work we revisit the problem of deciding which branches to convert.
II. METHODOLOGY

[Figure: source code (program.c) → LLVM bitcode (program.bc) → time profiling → hotspot functions → pre-conversion optimization (mem2reg, ..) → LLVM bitcode in SSA form (program.bc) → BitmaskControlledIfConversion → optimized assembly code (program.s) → executable (program.o).]

Fig. 1. Code flow through the preparation, analysis, and optimization phases.

In particular, we consider representative, frequently executed kernels from selected SPEC-CPU2006 benchmarks [4]. We exhaustively try all possible combinations of if-converting the conditional branch instructions and report the obtained performance on an x86 processor. Moreover, we measure the effectiveness of some commonly used heuristics by comparing their performance with the optimal strategy while changing the corresponding metrics that the heuristics deploy. We will use these observations to extract a pattern for profitable conversions using techniques such as Monte-Carlo tree search. Figure 1 describes the implemented prototype, where the source code is initially converted into the bitcode format. The code is then prepared by some pre-if-conversion optimisations that allow the if-conversion to perform well [5].
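The exhaustive mask exploration can be sketched as a small driver loop; the cost model standing in for "rebuild with LLVM and time the binary" is hypothetical:

```python
import itertools

def best_ifconversion_mask(branches, measure_runtime):
    """Exhaustively explore the if-conversion space of a kernel.

    branches        -- names of the conditional branches eligible for conversion
    measure_runtime -- callable(mask) -> runtime of the kernel built with that
                       subset of branches if-converted; a real harness would
                       rebuild with LLVM and time the resulting binary
    """
    results = {}
    for bits in itertools.product((0, 1), repeat=len(branches)):
        mask = "".join(map(str, bits))       # selective mask, e.g. "0110"
        results[mask] = measure_runtime(mask)
    best = min(results, key=results.get)     # mask with the lowest runtime
    return best, results

# Hypothetical cost model: converting b0 helps, converting b1/b2 hurts.
def fake_runtime(mask):
    gain = {0: -0.3, 1: 0.1, 2: 0.4}
    return 10.0 + sum(gain[i] for i, bit in enumerate(mask) if bit == "1")

best, all_runs = best_ifconversion_mask(["b0", "b1", "b2"], fake_runtime)
print(best)   # -> 100
```

With n candidate branches the loop runs 2^n builds, which is feasible only for the small, hot kernels selected by the time-profiling step above.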


[Figure: runtime (s) of every selectively if-converted version ("selected ifcvt") plotted against the if-conversion selective mask, with the clang -O3 runtime shown for reference: (a) bzip2, (b) mcf, (c) astar.]

Fig. 2. Runtimes of all selective ifcvt versions along with the clang -O3 output runtime.

[Figure: Clang -O3, best ifcvt, and worst ifcvt runtimes for each benchmark (bzip2, mcf, astar), as a percentage of the unoptimized runtime.]

Fig. 3. Best, worst, and clang -O3 output runtimes as a ratio of the unoptimized output runtime.

[Figure: runtime-to-optimal ratio plotted against (a) the average number of instructions per basic block and (b) the average if depth.]

Fig. 4. Heuristics tested on bzip2.

III. EXPERIMENTAL RESULTS AND ANALYSIS

Subfigure 2(a) shows the runtime of each version of the bzip2 program against the mask that represents the combination of un/converted ifs. Interestingly, most of the configurations are significantly better than the -O3 optimisation. Subfigure 2(b) shows the results for the mcf case; the optimal is within 2% of the -O3 result. Finally, Subfigure 2(c) shows the results for the astar case, where -O3 is almost the same as the optimal. For a comparative view, Figure 3 shows the optimal, worst, and -O3-optimised program runtimes relative to the unoptimised one for each benchmark. Figure 4(a) shows the distribution of average block size (for the if-then block) against the percentage of the optimal execution time achieved for the bzip2 case. The average block size is computed as the average size of all blocks in the set of converted branches. While there is a trend of lower performance as block sizes increase (as would be expected and is exploited by the heuristics), performance improves again for block sizes greater than 37; more interestingly, the optimal occurs in the mid-range. Figure 4(b) shows the results for the position metric for the bzip2 case. We computed the position of each branch with respect to its frequent execution path from the function entry point; for a set of converted branches, we compute their average depth. It is generally believed that deeper branches are more suitable for conversion. The figure shows that, while the best result indeed happens at depth 4 (the deepest), a comparable result happens at a shallow depth (1.5). Moreover, the results do not show a trend of improvement towards deeper depths. Thus, this application largely provides a counterexample to the typical use of heuristics. We believe that interactions among branches are responsible for this discrepancy with the heuristics.
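The two heuristic metrics studied in Fig. 4 (average if-then block size and average depth over a set of converted branches) reduce to simple averages over the selected set; the branch data below is hypothetical:

```python
def conversion_metrics(converted, block_size, depth):
    """Metrics the heuristics rely on, for one set of converted branches.

    converted  -- branch ids selected for if-conversion
    block_size -- branch id -> size of its if-then block (instructions)
    depth      -- branch id -> depth along the frequent execution path
                  from the function entry point
    """
    n = len(converted)
    avg_block = sum(block_size[b] for b in converted) / n
    avg_depth = sum(depth[b] for b in converted) / n
    return avg_block, avg_depth

# Hypothetical kernel with four candidate branches.
sizes = {"b0": 12, "b1": 37, "b2": 50, "b3": 20}
depths = {"b0": 1, "b1": 2, "b2": 4, "b3": 3}
print(conversion_metrics(["b0", "b1", "b3"], sizes, depths))   # -> (23.0, 2.0)
```

Both values depend only on the per-branch properties and not on which branches are converted together, which is exactly why they cannot capture the branch interactions discussed above.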
IV. CONCLUSION AND FUTURE WORK

Through our work, we found promising opportunities for performance gains from choosing the right conversions. In the future, we will work on deducing the patterns that enable identifying such conversions. We will use Monte-Carlo tree search and support vector machines. We will also target GPUs.
REFERENCES
[1] J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren, "Conversion of control dependence to data dependence," in Proceedings of the 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, ser. POPL '83. New York, NY, USA: ACM, 1983, pp. 177-189.
[2] M. S. Lam and R. P. Wilson, "Limits of control flow on parallelism," SIGARCH Comput. Archit. News, vol. 20, no. 2, pp. 46-57, Apr. 1992.
[3] S. A. Mahlke, R. E. Hank, J. E. McCormick, D. I. August, and W.-M. Hwu, "A comparison of full and partial predicated execution support for ILP processors," in Computer Architecture, 1995. Proceedings., 22nd Annual International Symposium on. IEEE, 1995, pp. 138-149.
[4] Standard Performance Evaluation Corporation. SPEC Benchmarks, http://www.spec.org/. Accessed Oct. 10, 2014.
[5] The LLVM Compiler Infrastructure. http://www.llvm.org/. Accessed Oct. 10, 2014.


Design Methodology Deduction of 3D NoC Implemented with Multiplexed TSVs

Mostafa Said¹, Farhad Mehdipour², Kazuaki Murakami², Mohamed El-Sayed¹
¹ Department of Electronics and Communications, Egypt-Japan University of Science and Technology (E-JUST), Alexandria, Egypt
² Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
Emails: mostafa.saied@ejust.edu.eg, farhad@ejust.kyushu-u.ac.jp, murakami@ait.kyushu-u.ac.jp, m.ragab@ejust.edu.eg

Abstract—The emerging 3D integration technology significantly overcomes the limitations of the 2D integration process. The use of very short Through-Silicon Vias (TSVs), much shorter than the average wire length, introduces a significant reduction in routing area, power consumption, and delay. However, 3D technology suffers from extremely low yield. It has been shown in the literature that reducing the TSV count has a considerable effect on improving yield. The TSV multiplexing technique called TSVBOX was introduced in [1] to reduce the TSV count without affecting the direct benefits of TSVs. The TSVBOX introduces some delay to the signals to be multiplexed. In this paper, we deduce a design methodology for TSVBOX-based 3D Network-on-Chip (NoC) to overcome the impact of the TSVBOX delay degradation on system validity.

Fig. 1: (a) TSVBOX, (b) SEL signal, (c) 3×3×2 3D NoC.

I. INTRODUCTION

The extra fabrication steps in 3D technology may result in some faulty TSVs [2], which decreases the total process yield significantly. The probability of having faulty TSVs increases as the total TSV count increases. Fig. 1a displays the TSV multiplexing technique introduced in [1], which reduces the number of TSVs by half by multiplexing each pair of 3D signals into one signal and passing it through one TSV instead of two. Due to this significant reduction in TSVs, the yield analysis done in [1] revealed a very high improvement over conventional 3D-ICs. The SEL signal is used to control a multiplexer/demultiplexer assembly (Fig. 1b), which introduces some delay to one of the multiplexed signals, in addition to the parasitics of the TSVBOX itself. Such delay may affect the functional validity of the system to be implemented; however, this problem has not been studied yet. In this paper we select the 3D NoC as our target system architecture for applying the TSVBOX. Thereafter, we deduce a design methodology to check the validity of the TSVBOX technique.

II. THE TARGET 3D NOC ARCHITECTURE

Fig. 1c shows our target 3×3×2 3D NoC architecture. For the conventional 3D NoC, the whole 3D data bus width is N+2 (N is the packet size), where the two extra bits are required for the handshaking protocol signals: request (REQ) and acknowledgement (ACK). For the TSVBOX case, the data bits of the packet are multiplexed, and hence N/2 + 4 TSVs are required, where another two extra TSVs are needed for SEL and its complement SEL̄.

III. 3D INTERCONNECT CIRCUIT MODELING

SystemC-A is used for our 3D NoC implementation. Processor cores, routers, and intra-layer interconnects are modeled as high-level architectures, while for the inter-layer interconnects (TSVs and the TSVBOX) a low-level circuit implementation is used to accurately determine the different delays.

A. Conventional and TSVBOX 3D signal path modeling

The conventional 3D-IC 3D signal path is shown in Fig. 2. The conventional 3D interconnect delay (T_d-Conv) can be approximated using the Elmore delay as follows:

  T_d-Conv = ln(V_DD / |V_thP|) · R_dr-Conv (2C_W + C_TSV + C_L).  (1)

Fig. 2: Conventional 3D NoC 3D path circuit model.

Fig. 3 shows the TSVBOX circuit model, where the equivalent RC parasitic circuits of the transistors in the MUX, the DeMUX, and the SEL path are involved. The TSVBOX delay can be approximated according to the following equation:

  T_d-TSVBOX = ln(V_DD / |V_thP|) · [ R_dr-TSVBOX C_PN + (R_dr-TSVBOX + R_PN)(4C_PN + 2C_W + C_TSV) + (R_dr-TSVBOX + 2R_PN)(C_PN + C_L) ],  (2)

where C_PN = C_dbP + C_dbN and R_PN = R_onP R_onN / (R_onP + R_onN), while the delay of the SEL (SEL̄) signal can be approximated by:

  T_d-SEL = ln(V_DD / (V_DD − max(V_thN, |V_thP|))) · R_dr-SEL (2C_W + C_TSV + 2N C_g).  (3)

IV. MINIMUM CLOCK DURATION AND DESIGN FLOWS

As shown in Fig. 4, we can choose SEL to be the clock signal itself. SEL̄ has a faster charging rate (Fig. 4b) to avoid a concurrent ON state of the TSVBOX switches between T1 and T2 (Fig. 4a). The period t = [0.5T_CLK, T_CLK] can be divided into two smaller periods: T_d-SEL, and the remaining time until the clock edge, T_rem. During T_rem, the V2 signal is required to reach an acceptable level (0 or 1); therefore we must select T_rem ≥ T_d-TSVBOX. Based on these observations, the minimum clock period for the TSVBOX can be expressed as follows:

  T_CLK-min = 2(T_d-SEL + T_d-TSVBOX).  (4)

Fig. 5 introduces the design flows for the 3D interconnect in the case of conventional and TSVBOX-based 3D NoCs. As shown, the first step in both design flows is calculating the technology-dependent constants and parameters. All other steps are concerned with the driver designs. The driver resistance can be considered the average of R_onP and R_onN:

  R_dr = (R_onP/K_P + R_onN/K_N) / 2 = 1.5 (R_onP + R_onN) / (2K_N).  (5)

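The delay model of Eqs. (1)-(4) is easy to exercise numerically. The following sketch simply transcribes the equations; the parameter values used below are illustrative placeholders, not the paper's 180 nm values.

```python
import math

def td_conv(vdd, vth_p, r_dr, c_w, c_tsv, c_l):
    # Eq. (1): Elmore delay of the conventional 3D signal path.
    return math.log(vdd / abs(vth_p)) * r_dr * (2 * c_w + c_tsv + c_l)

def td_tsvbox(vdd, vth_p, r_dr, r_pn, c_pn, c_w, c_tsv, c_l):
    # Eq. (2): Elmore delay of the TSVBOX path, with the switch parasitics
    # C_PN and R_PN added along the RC ladder.
    return math.log(vdd / abs(vth_p)) * (
        r_dr * c_pn
        + (r_dr + r_pn) * (4 * c_pn + 2 * c_w + c_tsv)
        + (r_dr + 2 * r_pn) * (c_pn + c_l))

def td_sel(vdd, vth_n, vth_p, r_dr_sel, c_w, c_tsv, c_g, n_bits):
    # Eq. (3): delay of the SEL signal, which drives 2N gate loads.
    return (math.log(vdd / (vdd - max(vth_n, abs(vth_p))))
            * r_dr_sel * (2 * c_w + c_tsv + 2 * n_bits * c_g))

def t_clk_min(td_sel_val, td_tsvbox_val):
    # Eq. (4): minimum clock period of the TSVBOX-based link.
    return 2 * (td_sel_val + td_tsvbox_val)
```

With equal driver resistances, the TSVBOX path evaluates to a longer delay than the conventional path, which is why Eq. (4) trades clock period for the saved TSVs.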

Fig. 3: TSVBOX-based 3D NoC 3D path circuit model.


Fig. 4: Minimum clock duration for TSVBOX-based 3D NoC.


where (KP, KN) are the pMOS and nMOS transistor sizes, respectively. For all drivers, we choose KP = 1.5KN to achieve equal currents during charging and discharging. For the conventional 3D NoC, Td-Conv is first selected such that Td-Conv <= TCLK. Then Rdr-Conv is calculated using Eq. 1. However, since the driver cannot be smaller than minimum size, if the calculated value of Rdr-Conv is larger than the maximum driver resistance, Rdr-Conv must be set to the maximum driver resistance Rdr-max. Finally, KN-Conv can be determined from Eq. 5. The same steps are followed for the TSVBOX case.
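The driver-sizing steps shared by both flows in Fig. 5 (pick a target delay, compute the driver resistance, clamp it to the minimum-size driver's resistance, then derive the transistor sizes from Eq. 5) can be sketched as follows; the function name and the `delay_per_ohm` factoring are our own shorthand, not the paper's notation.

```python
def size_driver(target_delay, delay_per_ohm, r_on_p, r_on_n, r_dr_max):
    """Sketch of the driver-sizing steps in Fig. 5.

    delay_per_ohm: the capacitance sum times the ln(.) factor from
    Eq. (1)/(2), so that delay = R_dr * delay_per_ohm.
    r_on_p, r_on_n: ON resistances of unit-size pMOS and nMOS transistors.
    """
    r_dr = target_delay / delay_per_ohm
    # A driver cannot be smaller than minimum size, so its resistance is
    # clamped to the maximum (minimum-size) driver resistance Rdr-max.
    r_dr = min(r_dr, r_dr_max)
    # Eq. (5) with K_P = 1.5 K_N:
    #   R_dr = (R_onP/K_P + R_onN/K_N) / 2 = (R_onP/1.5 + R_onN) / (2 K_N)
    k_n = (r_on_p / 1.5 + r_on_n) / (2 * r_dr)
    k_p = 1.5 * k_n
    return r_dr, k_n, k_p
```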
V. SIMULATION RESULTS, CONCLUSIONS, AND FUTURE WORK

Two clock periods are selected for TCLK-3D: 2.5 and 9.3 nsec for the conventional and TSVBOX-based 3D NoCs, respectively. Based on TCLK-3D, all conventional and TSVBOX delays and their associated driver sizes are selected according to the design flows of Fig. 5 and are listed in Tables I and II, where the different parasitic values are calculated based on [3]-[6]. As shown, the errors between the theoretical and simulation delays do not exceed 14%, which indicates the acceptable accuracy of the Elmore-delay models. The TSVBOX-based 3D NoC was then run with 3600 packets, and all packets were received successfully at their destinations, which demonstrates the validity of our design methodology. In future work, the TSVBOX-based 3D NoC will be studied extensively under different simulation scenarios and traffic patterns to show the possible pros and cons of our methodology.
TABLE I: Theoretical and simulation 3D signal delays for conventional and TSVBOX paths.

  3D signal delay   Unit   Theoretical   Simulation   |error|
  Td-Conv           nsec   2.499         2.45         0.04%
  Td-TSVBOX         nsec   4.1455        4.15         0.1%
  Td-SEL            nsec   0.5           0.5          0.0%
  Td-SEL'           nsec   0.1383        0.12         13.23%

Fig. 5: Design flows for 3D interconnect of TSVBOX-based (left) and conventional (right) 3D NoCs.
TABLE II: Driver sizes and their equivalent ON resistances.

  Design parameter          Unit   Values
  (KN-Conv, KP-Conv)        -      (1, 1.5)
  (KN-TSVBOX, KP-TSVBOX)    -      (1, 1.5)
  (KN-SEL, KP-SEL)          -      (2.7526, 4.1289)
  (KN-SEL', KP-SEL')        -      (9.9531, 14.93)
  Rdr-Conv                  kOhm   15.51
  Rdr-TSVBOX                kOhm   15.51
  Rdr-SEL                   kOhm   5.6344
  Rdr-SEL'                  kOhm   1.5582
REFERENCES
[1] M. Said, F. Mehdipour, and M. El-Sayed, "Improving Performance and Fabrication Metrics of Three-Dimensional ICs by Multiplexing Through-Silicon Vias," DSD'13, pp. 581-586, 2013.
[2] I. Loi, S. Mitra, T. Lee, S. Fujita, and L. Benini, "A Low-overhead Fault Tolerance Scheme for TSV-based 3D Network on Chip Links," ICCAD'08, pp. 598-602, 2008.
[3] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, Addison-Wesley, 2011.
[4] A. Papanikolaou, D. Soudris, and R. Radojcic, Three Dimensional System Integration, Springer, 2011.
[5] J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, Prentice Hall, 2003.
[6] http://www.itrs.net/reports.html


General-Purpose Word-Parallel Pattern Recognition Processor


for the k Nearest-Neighbor Algorithm with High-Speed, Low-Power
S. Yamasaki, F. An, and H. J. Mattausch

Research Institute for Nanodevice and Bio Systems, Hiroshima University, Higashi-Hiroshima, 739-8527, Japan
Phone: +81-82-424-6265, E-mail: yamasaki-shogo@hiroshima-u.ac.jp
Abstract- A learning and pattern recognition processor for the k nearest neighbor (k-NN) algorithm uses a nearest-Euclidean-distance search associative memory and a classification circuit for the k nearest neighbors. A SoC design example of the complete k-NN system is implemented in 180 nm CMOS with 32 reference vectors and 2~8 classes. A small silicon area of 3.51 mm2, low power consumption of 5.02 mW and a maximum operating frequency of 42.9 MHz (at Vdd = 1.8 V) are achieved. Compared to an implementation in software, the chip operates with a factor of 1000 better energy efficiency.

1. Introduction
In recent years, a variety of application developments which require pattern matching, such as face/object recognition in images or voice recognition, have become hot topics [1, 2]. Many users desire that these applications could be implemented on advanced mobile devices such as smartphones.
Conventionally, pattern matching and other computation-intensive processes are off-loaded from the mobile device to the Cloud, i.e. to data centers with large parallel servers. Due to the required data transmission between terminal and Cloud, this method involves relatively large time delays. Furthermore, a Cloud connection isn't available everywhere and sometimes may be disrupted. Also, the total power consumption is very high due to data transmission and external server processing.
Important classifiers for recognition are based on the k-nearest-neighbor (k-NN) algorithm. For flexible integrated k-NN hardware, which combines a high search speed for the most similar data in a reference database with low power consumption, high search reliability and accurate recognition, wide application in mobile devices can be expected. Furthermore, such a VLSI chip has the potential of becoming an application-specific standard product (ASSP) for intelligent systems with recognition capability. Unfortunately, there is no previous example of an efficient VLSI integration of the k-NN algorithm without algorithm simplification and reduced matching accuracy. This paper reports a flexible low-power recognition SoC (System on Chip) based on the k-NN algorithm.

2. k-NN realization with searching by clock counting, and expansion method for feature-vector dimensions
k-NN is a method of statistical classification based on the multiple reference samples closest in the feature space, which is often used in pattern recognition. The number of reference data may be very large, and the consistency of the k-NN algorithm is fairly reliable. However, the computation amount is large because all data must be compared, which is one of the reasons that no efficient VLSI realization of k-NN exists. Here, we adopt the distance-search method by clock counting, which allows a fully-parallel low-power Euclidean-distance search [3, 4]. Fig. 1 shows the block diagram of the k-NN algorithm integration with associative memory operating in the clock domain. After calculating the distances between each component of the feature vectors for reference and input samples, these component distances are expressed as a number of clock cycles. During the distance search by clock counting, match signals are received from the reference-data rows in the sequence of their distances to the input sample. In this manner, fully-parallel error-free comparison of data becomes possible. On the other hand, a disadvantage of the clock-counting search is that it takes a long time if the distance of the most similar data is large. The worst-case clock number for the SFCC (Straight-Forward Clock Counting) method increases exponentially with the number N of feature-vector component bits. The previously reported CCR (Clock Counting Reduction) method [3] achieves a reduction to a linear increase by starting the search from the highest-value bit and including lower-value bits in the search sequentially after a match for the higher-value bits has been found.
Match signals are received from the reference-sample rows in the sequence of their distances to the input sample. By summing the match-signal number, the discovery of the K-th nearest neighbor is recognized and the search is terminated by a stop signal. After this, the recognized-class identification by majority vote is carried out with an identification circuit that can be freely set to the number K of interest. One evaluated concept identifies the class of each of the K nearest neighbors by its storage location in the associative memory. However, with this concept the reference data has to be sorted, and the data number for each class becomes inflexible. Figure 2 shows the preferred architecture for the k-NN clustering circuit, which enables complete flexibility with respect to the number of classes, the number of reference data for each class and the storage locations of the reference data in the associative memory. This flexibility is achieved by a look-up table stored in an SRAM, which can be changed for each application and specifies the class information of the reference data in each row of the associative memory. The Match Signal Storage Registers on the left are connected to Match Signal Detecting Circuits which sequentially identify the row numbers of the K nearest neighbors. For nearest-neighbor row i, an act_i signal is generated and used to read the correct class information from the look-up table. This class information then controls a de-multiplexer to increase the status of the corresponding class counter by 1. The class identification and counting process is finished when the nextR signal is asserted at the output of the last Match Signal Detecting Circuit. Afterwards, a comparator at the output of the class counters identifies the class with the highest counting status as the recognition result. The associative-memory rows where the stored data is not among the K nearest neighbors are skipped in this clustering process. Therefore, the clustering process finishes with the recognition result after K clock cycles.
The required number of feature-vector dimensions is different for each application, for example face recognition, character recognition or fingerprint authentication. An architecture that can handle a different number of dimensions in the same hardware is required for a SoC with standardization potential. A bit slice of the developed DEC (Dimension Extension Circuit) for enabling dimension extension is shown in Fig. 3. With this DEC circuit, additional component distances are added sequentially at the time of their input and stored in the lower-right DFF. In this way, a flexible feature-vector dimensionality in the range of 8~2042 dimensions is achieved.

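The clustering scheme described above can be captured in a small behavioral model: rows "fire" in order of increasing distance (the clock-counting search), the first K firings are voted through a class look-up table, and the fullest counter wins. The function and variable names are ours; the hardware details (match-signal registers, detecting circuits) are abstracted away.

```python
def knn_classify(distances, class_lut, k):
    """Behavioral sketch of the k-NN clustering circuit (Fig. 2).

    distances: per-row distance of each stored reference vector to the input
               (the hardware obtains these by clock counting; here they are given).
    class_lut: SRAM look-up table mapping each associative-memory row to a class.
    k: number of nearest neighbors to vote over.
    """
    # Clock counting: a row's match signal fires when the clock counter
    # reaches its distance, so rows match in order of increasing distance;
    # the search stops once k match signals have been summed.
    order = sorted(range(len(distances)), key=lambda row: distances[row])
    nearest_rows = order[:k]

    # Majority vote: each act_i signal reads the row's class from the LUT
    # and increments that class counter by 1.
    counters = {}
    for row in nearest_rows:
        cls = class_lut[row]
        counters[cls] = counters.get(cls, 0) + 1
    # The comparator returns the class with the highest counter value.
    return max(counters, key=counters.get)
```

Because the class information comes from the look-up table rather than from sorted storage, reference vectors of different classes can sit in arbitrary rows, which is exactly the flexibility the text attributes to Fig. 2.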

3. Evaluation of fabricated prototype SoC

We implemented a full-custom design for k-NN in 180 nm CMOS technology and integrated it on the learning and recognition SoC. A chip photograph of the SoC, identifying the portions performing k-NN, is shown in Fig. 4. Specifications of the chip are given in Table 1. It has 32 rows, and the dimension number of the learned data can be extended from 8 up to 2042 dimensions. The number of bits in each dimension is 8. The total area including the associative memory is 3.51 mm2. The additional part that runs the k-NN occupies only 0.445 mm2. Thus, the recognition capability could be extended efficiently from the 1-NN search of the plain associative memory to the full k-NN. The critical path that determines the maximum frequency of 42.9 MHz at 1.8 V power supply is the match-signal propagation in the distance evaluator (DE). The power consumption of the part for distance calculation is 5.02 mW. The estimated performance data, when used in conjunction with the associative memory, are summarized in Table 1. We have further compared with an implementation in software (assumed to run on an Intel Core i7-4770K). The software is represented by a C-language calculation of the k nearest-distance neighbors, compiled into assembly code with parallel comparison. From the assembly-code instruction number, the speed performance and energy consumption of the software implementation are estimated. This means that the software performance is a best-case estimate, not taking into account losses due to branch and cache misses. Nevertheless, a significant advantage for the SoC was observed in the power-delay product (Fig. 5). Error-free and high-speed processing is possible compared with a previous sequential processing method [5]. From the above evaluation it can be concluded that this architecture is effective for implementation in mobile devices. Further, as a fully digital circuit, the developed architecture is scalable to advanced process technologies. This means that within the same area and at the same cost, it is possible to handle more training data, achieve a higher processing frequency and lower energy consumption; i.e., scaling to the technology of the Intel Core i7-4770K processor would improve the SoC performance by orders of magnitude.

Figure 1: VLSI architecture of k-NN recognition system.


4. Conclusion
We propose a learning and recognition SoC with an implementation of the k-nearest-neighbor (k-NN) method, for which no efficient hardware implementation was previously available. The maximum clock frequency is 42.9 MHz (at 1.8 V) when all 8-bit vector components are used. Power consumption is as low as 5.02 mW (at 42.9 MHz, 1.8 V).
The SoC has the advantage of processing the data for all reference vectors in parallel with high reliability. High-speed processing and high flexibility in the reference-vector number of each class and the storage location of each reference vector are achieved with the help of class counters and a look-up table. Additionally, extension of the feature-vector dimensionality is possible, which allows the versatility of executing different applications on this SoC hardware. Compared to software implementations with sequential processing, high speed, low power consumption and small chip area are realized. Thus, the requirements for application in mobile devices with limited battery capacity are fulfilled. Because the proposed architecture is realized with a completely digital circuit, it can easily be implemented in advanced, scaled fabrication processes to achieve further large improvements in search speed and power consumption.
References
[1] D. R. Tveter, The Pattern Recognition Basis of Artificial Intelligence, IEEE Press, Piscataway, NJ, 1997.
[2] A. Ahmadi et al., Expert Syst. Appl., vol. 38, no. 4, pp. 3499-3513, 2011.
[3] T. Akazawa et al., "Word-Parallel Coprocessor Architecture for Digital Nearest Euclidean Distance Search," JJAP 2012, pp. 267-270, 2013.
[4] S. Sasaki et al., "Digital Associative Memory for Word-Parallel Manhattan-Distance-Based Vector Quantization," ESSCIRC 2012, pp. 185-188, 2012.
[5] M. A. Abedin et al., "Realization of K-Nearest-Matches Search Capability in Fully-Parallel Associative Memories," IEICE Trans. Fundamentals, vol. E90-A, no. 6, pp. 1240-1243, 2007.

Figure 2: Clustering circuit with large flexibility for implementation of different applications.

Figure 3: Bit slice of the Dimension Extension Circuit (DEC).

Table 1: Specifications of the recognition SoC.

Figure 4: Die photo of the recognition SoC implementing k-NN in 180 nm CMOS.

Figure 5: Comparison of this work and an implementation of the algorithm by software.


Adaptive Search Window Algorithm for Integer


Motion Estimation Unit in HEVC Encoder


Ahmed Medhat1, Ahmed Shalaby1, Mohammed S. Sayed1, Maha Elsabrouty1, Farhad Mehdipour2
1 ECE Department, Egypt-Japan University of Science and Technology (E-JUST), Alexandria, Egypt
{ahmed.abdelsalam, ahmed.shalaby, mohammed.sayed, maha.elsabrouty}@ejust.edu.eg
2 Center for Japan-Egypt Cooperation in Science and Technology, Kyushu University, Fukuoka, Japan
farhad@ejust.kyushu-u.ac.jp

Abstract- This paper presents an Adaptive Search Window Algorithm (ASWA) for integer-pixel motion estimation in High Efficiency Video Coding (HEVC), in order to achieve the best performance in terms of time savings, bit-rate resources and PSNR quality. Experimental results show that the proposed algorithm achieves a significant reduction of up to 41% in encoding time. This improvement is accompanied by a negligible PSNR loss of 0.002 dB and a bit-rate increase of 0.82%.

Keywords- HEVC, motion estimation, inter prediction, fast integer-pixel motion estimation, adaptive search window.

I. INTRODUCTION

The new High Efficiency Video Coding (HEVC) standard was introduced with the target of doubling the compression efficiency with respect to the previous standard, H.264. It can achieve a 50% bit-rate saving compared to H.264 for the same video quality [1]. Motion Estimation (ME) entails the major computational load in a video encoder; in most implementations, it takes up to 90% of the encoding time [1]. Although HEVC provides a simple inter-prediction process, the overhead of the ME block employed is larger compared to H.264. Typically, as in H.264, for every Prediction Block (PB) in HEVC, the Block Matching Algorithm (BMA) finds the best matching block within a certain search window. The full search algorithm is the superior technique for motion compensation in terms of Compression Ratio (CR), Bit Rate (BR) and Peak Signal-to-Noise Ratio (PSNR). On the other hand, full search is the most computationally demanding algorithm, especially for High Definition (HD) videos. Many fast algorithms have been proposed to reduce the computational complexity consumed by the full search. Nevertheless, most of these algorithms cause a significant degradation in terms of BR and PSNR.

Fast search algorithms for ME can be classified into three main groups: 1) early termination added to the full search using certain stop criteria, 2) an adaptive search window for the full search, and 3) fast BMAs such as diamond search, hexagon search and star search. Algorithms in group one, i.e. early-termination algorithms, add an early termination condition to the full search in an attempt not to check all possible candidates, to reduce computation needs and encoding time. However, the drawback is that this early termination range is fixed, and some blocks may never reach it even if full search is used. The second group adopts an adaptive search strategy with a variable-size search window [6]. These algorithms minimize the necessary computations of full search by reducing the number of possible candidates to be checked, but most of them increase the bit rate by a notable percentage. In the third group of algorithms, fast BMAs with different strategies, sizes and shapes are used to enhance the search speed, such as the Diamond [5], Hexagon and Three-Step search algorithms.

II. PROPOSED ADAPTIVE SEARCH WINDOW ALGORITHM

In video sequences, the motion of a PB can be classified into three different categories. The first category is stationary motion, where there is no motion at all. In this case, the best matching candidate in the search window is the center point of the search window. About 40%-60% of frame motion falls under the category of stationary motion. Quasi-stationary motion, the second category, represents 30%-40% of the frame motion; here the best matching candidate lies within about 2 pixels of the center point [2]. The third category is rapid motion, and it represents displacements in the frame position by more than 2 pixels. From these statistics, it can be seen that more than 70% of PBs have motion vectors around the center point of their search windows within a small range of pixels. Therefore, it is rational to perform the BMA within a small search window in order to decrease the computational complexity of the ME unit.

A simple way to speed up the full search in the ME unit is to adapt the search window where the BMA is applied. The optimal search window range is the range that gives results with less computational complexity without affecting the PSNR, BR or the compression performance. Therefore, the proposed Adaptive Search Window Algorithm (ASWA) chooses the best search window range for each PB depending on the motion vector and bit-cost value of the previous PB in the same frame. The proposed algorithm has two possible fixed ranges: the original search window range, and the reduced search window range, which is the original search window divided by four. Therefore, the search range for any given PB is chosen between two candidates, i.e., 64 and 16 pixels. More specifically, the search window range is chosen on the basis of the Motion Vector (MV) obtained for the previous PB in the same frame. If the MV of the previous PB in the same frame is near the center point of its search window, then the small search window range is chosen for the next PB.
Statistics for various video sequences show that there is a strong correlation between the Sum of Absolute Differences (SAD) of the collocated prediction blocks in consecutive frames [3]. In HEVC, bit cost is the criterion used to decide the best matched block. Therefore, ASWA includes an early termination algorithm that depends on the cost function of the previously processed prediction block in the same frame. The early termination test checks whether the cost function of the current candidate is less than a threshold T; if so, the current candidate is accurate enough to stop the search. T is calculated as T = α * previous_cost, where α represents the correlation parameter between the cost of the previous prediction block and the current one.

Generally, ASWA is based on a normal BMA search from the edges of the search window; however, it determines whether to use the large normal search window or the small search window based on the parameters above. In addition, ASWA uses an early termination criterion to achieve much better performance with a smaller number of search points on average. The ASWA flow chart is shown in Fig. 1. Based on experimental analysis, α is set to two as a fixed value in all our simulations, and the adaptive small search window size is set to 16. These values of α and the adaptive search window size achieve the smallest possible PSNR loss and BR increase.
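The two decisions ASWA makes per PB can be sketched as below. The window sizes 64 and 16 and the default α = 2 follow the text; the "near the center" threshold of 2 pixels is our assumption, drawn from the motion statistics quoted above rather than from the paper's flow chart.

```python
def aswa_search_range(prev_mv, full_range=64, small_range=16, near_thresh=2):
    """Window choice: if the previous PB's motion vector is near the window
    center, search the next PB in the reduced window (assumed threshold)."""
    if abs(prev_mv[0]) <= near_thresh and abs(prev_mv[1]) <= near_thresh:
        return small_range
    return full_range

def early_terminate(candidate_cost, prev_cost, alpha=2):
    """Early-termination test: stop searching once a candidate's bit cost
    drops below T = alpha * previous_cost."""
    return candidate_cost < alpha * prev_cost
```

A quasi-stationary previous block thus steers the encoder into the 16-pixel window, and a cheap-enough candidate ends the scan before all window positions are visited.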
IV. EXPERIMENTAL RESULTS AND DISCUSSION

A software analysis, using the x265 open-source code project for HEVC [4], was carried out on a Windows 7 OS platform with an Intel Xeon X5690 at 3.46 GHz and 96 GB RAM. Table I shows that our proposed ASWA achieves around 41.19% time saving with respect to full search. In addition, ASWA preserves the PSNR quality of the full search and needs just 0.82% extra bit rate. Table I also shows that ASWA exploits its idea about stationary and slow-motion video and saves a large amount of time in video sequences where there is no rapid motion. Table II compares our proposed ASWA with previous fast searches proposed for HEVC in [5]-[6]. ASWA achieves better time saving than [5] with better PSNR and BR. Compared with [6], ASWA does not achieve the same time saving, but achieves notably better PSNR and BR. Furthermore, ASWA can achieve better time saving by changing the α and adaptive-search-window-size parameters.

Fig. 1. Adaptive Search Window Algorithm (ASWA) Flow Chart

TABLE I. RESULTS OF ASWA VERSUS FULL SEARCH ALGORITHM

  Sequence (Resolution)    Time (%)   BR (%)   PSNR (dB)
  Duck (1280x720)          -55.04     0.93     -0.003
  Ice (1280x720)           -46.39     0.67     0
  Blue_Sky (1280x720)      -32.67     0.88     -0.011
  Shields (1280x720)       -39.24     0.87     0
  Stockholm (1280x720)     -28.73     0.52     0
  Park_Joy (1280x720)      -45.06     1.03     0
  Average                  -41.19     0.82     -0.002

TABLE II. RESULTS OF ASWA VERSUS PREVIOUS FAST SEARCH ALGORITHMS

  Search Method        Time (%)   BR (%)   PSNR (dB)
  Belghith et al. [5]  -40.68     1.10     -0.120
  Anand Paul [6]       -77.43     3.44     -0.183
  Proposed ASWA        -41.19     0.82     -0.002

V. CONCLUSION

The proposed Adaptive Search Window Algorithm reduces the search window size of a PB if the MV of the previous PB is near the center point of its search window. In addition, it uses a stop criterion to terminate the search once it finds an acceptable bit cost. ASWA showed better performance in terms of time saving, BR and PSNR compared to full search and other fast search algorithms. ASWA speeds up the ME unit by 41.19% with respect to full search with a trivial degradation of PSNR and BR.

REFERENCES
[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649-1668, December 2012.
[2] H.-Y. Huang and S.-H. Chang, "Block motion estimation based on search pattern and predictor," IEEE Symposium on Computational Intelligence for Multimedia Signal and Vision Processing, pp. 47-51, April 2011.
[3] L. Bo, L. Wei, and T. Ming, "A fast block matching algorithm using smooth motion vector field adaptive search technique," J. Comput. Sci. & Technol., vol. 18, no. 1, pp. 14-21, January 2003.
[4] x265 team of ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG) through JCT-VC reference software, http://x265.org/index.html
[5] F. Belghith, H. Kibeya, H. Loukil, M. Ayed, and N. Masmoudi, "A new fast motion estimation algorithm using fast mode decision of high-efficiency video coding standard," J. Real-Time Image Process., February 2014.
[6] A. Paul, "Adaptive search window for high efficiency video coding," Springer J. Sign. Process. Syst., September 2013.


High Performance 2D-DCT Architecture


for HEVC Encoder


Maher Abdelrasoul1, Mohammed S. Sayed1, Maha Elsabrouty1, Victor Goulart1,2
1 ECE Department, Egypt-Japan University of Science and Technology (E-JUST), Egypt
2 Center for Japan-Egypt Cooperation in Science and Technology, Kyushu University, Japan
email: {maher.salem, mohammed.sayed, maha.elsabrouty}@ejust.edu.eg, victor.goulart@acm.org

Abstract- Large integer DCT, with sizes 16x16 and 32x32, is one of the key new features of the new HEVC standard. In this paper, we propose a new optimized architecture for the integer DCT in the HEVC encoder. The proposed architecture is fully pipelined, with bitwidth-optimized adders. We map all architectures onto 65 nm CMOS technology. Compared to the state-of-the-art architecture, our proposed architecture for the 16-DCT and 32-DCT decreases delay by 22.5% and 51.8% while increasing area by 19% and 7.5%, respectively. Our optimized circuits are able to process 8K UHD at a 30 fps rate in real time.
I.

multipliers with shifters and adders. In addition, they exploited


the similarities in the coefficients of the transformation matrix.
However, to the best of our knowledge, no work yet targets to
optimize the DCT implementation itself through pipelining the
critical paths and optimizing the adders bitwidths. Such
optimization is necessary since the computational complexity
of the DCT module increase dramatically in the new HEVC
standard. Therefore, we used the recently published DCT
architecture in [7] as a basis and optimized it by pipelining the
critical paths and optimizing the adders bitwidths. We aim at
higher clock frequencies to increase the number of blocks that
can be processed per second. This will allow higher video
resolutions to be processed timely.

INTRODUCTION AND RELATED WORK

As shown in Fig. 1, we adopted the architecture proposed in


[7]. This base architecture is divided into three processing
stages. The first stage is an Input Adder Unit (IAU), which
adds and subtracts the input values. This IAU is useful in
adding inputs, which have a common multiplier. The first
output of the IAU unit is used to calculate the even N/2-point
DCT. The second output is used to calculate the remaining odd
N/2-point DCT. The second stage in the design is the Shift Add
Unit (SAU). The SAU is a MCM circuit. The N-point DCT has
N/2 SAU blocks, each SAU multiply its inputs by N/2 integers.
The third stage of the design is the Output Add Unit (OAU).
The OAU is used to execute the final additions on the outputs
of different SAUs to produce the final odd coefficients.

One of the most important blocks in any video


encoder/decoder is the transformation block. In the last
standard on video compression HEVC, integer Discrete Cosine
Transform (DCT) is used in to reduce the computational
complexity and to eliminate the error produced by floating
point approximations involved in the traditional DCT [1]. In
HEVC, new big transform block sizes (16x16 and 32x32) and
new integer DCT transformation matrixes are used. This makes
it important to work on building new integer DCT architectures
to achieve high performance in terms of throughput and used
resources. Integer DCT is an integer approximation of the DCT
[2]. It is used to simplify the calculations and to improve
transmission correctness. The DCT operation can be
represented as a set of multiple constant multiplications
(MCMs). Generally, the 2D-DCT is implemented through a
two-separated 1D-DCT implementation [7].
In the state-of-art, there is a very large and wide research on
DCT. Recently, several different transform cores for integer
DCT have been introduced. However, most of them are
targeting H.264/AVC [8, 9]. There are a few publications
detailing the hardware implementations of the DCT transforms
to fulfill the requirements of the new standard HEVC. In [5]
three flexible architectures are proposed to perform 1D-DCT
operation for any DCT size. In [6], an architecture based on the
similarities of the integer DCT values that are multiplied by the
input values was proposed. In [7], DCT architecture was
proposed based on MCM. Their architecture is modular such
that it enables reusability for the N/2-point DCT module in
implementing the N-point DCT.

Fig. 1. General Architecture of N-point integer DCT

Our first contribution is applying pipelining to the circuit.
Any N-point DCT consists of four main blocks. Therefore, we
have divided the N-point DCT into a number of stages equal
to the number of N/2-point DCT stages plus one stage for the IAU.
The smallest DCT, the 4-point DCT, is divided into three stages,
for the IAU, SAU, and OAU, because the calculation of the 2-point
DCT does not need an IAU block. This pipelined circuit is
our base architecture.
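As a sketch of the stage count implied by this recursive decomposition (assuming the three-stage 4-point base case stated above):

```python
# Stage count of the recursive N-point DCT pipeline: an N-point DCT
# reuses the N/2-point pipeline and adds one IAU stage; the 4-point
# base case needs three stages (IAU, SAU, OAU).

def pipeline_stages(n):
    if n == 4:
        return 3                          # base case: IAU + SAU + OAU
    return pipeline_stages(n // 2) + 1    # one extra IAU stage per doubling

stages = {n: pipeline_stages(n) for n in (4, 8, 16, 32)}   # {4: 3, 8: 4, 16: 5, 32: 6}
```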

II. OUR PROPOSED OPTIMIZED ARCHITECTURE

Most of the previous work focused on implementing the integer DCT algorithm in an efficient way by replacing the



As mentioned above, the integer DCT implementation basically
depends on MCM, which takes place in the SAUs.
Several algorithms can be used to generate circuit topologies
with shifters and adders as an MCM process [3]. After
pipelining the DCT architecture, the critical path becomes the
SAU circuits. Using the Hcub algorithm [4], an optimized
shift-add structure is generated with the smallest possible
depth. In addition, we optimized the adders' bitwidths:
each adder in the SAU circuits is designed with the
minimum number of bits that can represent its output. This
optimization has two benefits: first, it reduces the delay of each
adder and, consequently, the delay of the whole circuit; second,
it decreases the area of each adder and hence the total area of
the circuit.
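The multiplierless idea behind the SAUs and the bitwidth rule can be sketched as follows. The plain binary decomposition here is only illustrative; the actual circuits use Hcub-generated adder graphs that share intermediate terms.

```python
# Multiplierless constant multiplication and minimal adder bitwidth.
# 83 = 64 + 16 + 2 + 1, so x*83 becomes (x<<6) + (x<<4) + (x<<1) + x.

def shift_add_mul(x, c):
    """Multiply x by a positive constant c using only shifts and adds."""
    acc, shift = 0, 0
    while c:
        if c & 1:
            acc += x << shift   # add the shifted partial product
        c >>= 1
        shift += 1
    return acc

def output_bits(in_bits, c):
    """Minimal signed bitwidth of x*c for an in_bits-wide signed x."""
    worst = abs(c) << (in_bits - 1)       # worst-case output magnitude
    return (worst - 1).bit_length() + 1   # two's-complement width
```

Sizing each SAU adder with a rule like `output_bits`, rather than a uniform worst-case width, is the source of the adder delay and area savings described above.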

Fig. 3. Delay for different designs of N-point integer DCT

After optimizing the SAU circuits, the critical path
becomes the OAUs. In the N-point DCT, the OAU adds N/2 inputs.
Therefore, log2(N) - 1 cycles can be used to pipeline the OAU,
which is implemented as a tree structure such that only
one adder is allocated in each pipeline stage. Fig. 2 shows the
pipelining stages of the N-point DCT. Pipelining the OAU does
not increase the overall number of pipeline stages.
Fig. 4. Area of different designs of N-point integer DCT

IV. CONCLUSIONS
In this paper, we presented a new optimized architecture for the
2D-DCT using pipelining and adder-bitwidth optimizations.
Implementation results for the proposed architecture on a
CMOS 65 nm technology show a good improvement in delay
compared with the state-of-the-art architecture, especially for the
large DCT block sizes added in the new standard. For the 16-point
and 32-point DCTs, the proposed architecture decreases the
delay by 22.5% and 51.8%, respectively, at the cost of
increasing the area by 19% and 7.5%, respectively. The proposed
architecture can process 8K UHD video in real time.

Fig. 2. Pipeline stages of 32-point integer DCT with pipelined OAU

III. EVALUATION AND RESULTS

In this section, the performance of the base integer 2D-DCT
architecture is evaluated against the same architecture
with an optimized SAU, and with both an optimized SAU and a
pipelined OAU, in terms of delay and area. The architectures
are described in Verilog RTL and synthesized using the Cadence
Encounter RTL Compiler RC11.10 with an ASIC CMOS 65 nm
technology.

REFERENCES
[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Trans. on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, 2012.
[2] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Comput., vol. C-23, pp. 90-93, 1974.
[3] http://spiral.ece.cmu.edu/mcm/gen.html (last visited 15-1-2015)
[4] Y. Voronenko and M. Püschel, "Multiplierless Multiple Constant Multiplication," ACM Transactions on Algorithms (TALG), vol. 3, no. 2, May 2007.
[5] S. Y. Park and P. K. Meher, "Flexible Integer DCT Architectures for HEVC," IEEE International Symposium on Circuits and Systems (ISCAS), 2013.
[6] W. Zhao, T. Onoye, and T. Song, "High-performance multiplierless transform architecture for HEVC," IEEE International Symposium on Circuits and Systems (ISCAS), 2013.
[7] P. K. Meher, S. Y. Park, B. K. Mohanty, K. S. Lim, and C. Yeo, "Efficient Integer DCT Architectures for HEVC," IEEE Transactions on Circuits and Systems for Video Technology, 2013.
[8] W. H. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Trans. Commun., vol. COM-25, no. 9, pp. 1004-1009, 1977.
[9] C.-P. Fan, "Fast 2-dimensional 4x4 forward integer transform implementation for H.264/AVC," IEEE Trans. Circuits Syst. II, vol. 53, no. 3, pp. 174-177, 2006.

The performance gained by the optimizations on the DCT
circuit is presented in terms of delay and area in Fig. 3 and
Fig. 4, respectively. The SAU optimization shows a significant
reduction in area. This reduction grows with the DCT size,
as larger DCTs increase the number and the sizes of the SAUs.
For the OAU optimization, as expected, the delay of the circuits
decreased, mainly for larger DCTs where the OAU has more
logic levels. Pipelining the OAU blocks exposes the benefit of
optimizing the SAU, which became the critical path after the
pipelining process. The proposed optimizations decrease the
delay by approximately 27.9%, 22.5%, and 51.8% for the
8-point, 16-point, and 32-point DCTs, respectively. Further,
they reduce the area by approximately 5.1% and 5.5% for the
4-point and 8-point DCTs, and increase it by 19% and 7.5% for
the 16-point and 32-point DCTs. Our proposed 2D-DCT
architecture is able to process video at 8K UHD resolution
(7680x4320 pixels/frame) at a 30 fps rate.


Frequency-Scan Capacitive Touch Panel for Bioelectrical Sensor


Reiji Hattori1,2, Ryota Yoneda1, and Yuhei Morimoto1
1 Kyushu University: Interdisciplinary Graduate School of Engineering Sciences, Fukuoka, Japan
2 Kyushu University: Art, Science and Technology Center for Cooperation Research, Fukuoka, Japan

Abstract— A novel touch sensing system that simultaneously acts as a
bioelectrical sensor is presented. Frequency scanning is a powerful
method for detecting the touch action and for obtaining bioelectrical
information. This system enables new touch sensor applications such
as multi-user operation and bioelectrical sensing.

Keywords— User differentiation; Multiuser touch panel;
Bioelectrical sensor; Soft biometrics; Frequency sweep; Resonant
frequency detection

I. INTRODUCTION

The multi-touch function of recent touch sensor panels has been a
great success in smartphones and tablet PCs. It eliminates
all buttons from the device, resulting in a simple and
smart design. Therefore, adding a new function to the touch panel
has the potential to bring about innovation and create new
markets. Pen writing, hover touch, and 3-D touch are examples
of such new functions, and they are intensively researched and
developed by many companies and laboratories.
In this work, we present another new function, which can
simultaneously measure the skin resistance of the touched finger
and the touching position. By measuring the skin resistance, we
expect new touch panel applications such as user
differentiation in multi-user usage [1]. If we improve the
sensitivity or precision, the touch panel can be used as a
bioelectrical sensor such as a sweat-rate sensor, a heart-rate
monitor, a skin-age checker, and so on.
II. METHODOLOGY
Figure 1. Principle of skin resistance measuring. (a) Electrode configuration,
(b) Touch Panel sectional view and equivalent circuit, (c) Measured
impedance spectrum.


Figure 1 explains how to measure the skin resistance of the
touched finger. We can use the commonly used mutual-capacitive
touch panel with electrodes in a diamond-shaped configuration,
as shown in Fig. 1(a). The equivalent circuit illustrating the
measuring principle is shown in Fig. 1(b), with the sectional view
defined by the cutting line A-A in Fig. 1(a). The finger skin
resistance, Rf, is connected electrically to the TX and RX touch
panel electrodes through the floating capacitances, Cf, generated
between TX/RX and the finger. When an adequate inductance, L, is
put in series with this circuit, the total impedance, Z, is
described by the following equation:


Z = j2πfL + 2 / (j2πfCf) + Rf ,    (1)

where f is the frequency of the wave generator. According to
this equation, the frequency response of the total impedance
magnitude, |Z|, is shown in Fig. 1(c), where Rf is given by the




minimum value at the resonant frequency, fc, which suggests
that Rf is obtained by measuring the minimum value of the
frequency response of |Z|. Note that the obtained value of Rf
is, in principle, free from the effect of Cf, which means that we
can eliminate the effect of changing touching conditions,
resulting in robustness of the system. In addition, Cf is
estimated from fc according to the following equation:
Cf = 2 / (L (2πfc)²) ,    (2)

which suggests that the touching area can be estimated by Cf.
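Equations (1) and (2) can be exercised numerically: sweep f, locate the nadir of |Z|, read Rf from the minimum and Cf from the resonant frequency. The component values below are illustrative assumptions, not measured panel values.

```python
# Sweep Eq. (1) over frequency and find the nadir of |Z|: its value
# gives the skin resistance Rf, and its frequency gives Cf via Eq. (2).
import math

L  = 127e-6    # resonant inductor (H), assumed value
Cf = 100e-12   # finger-to-electrode floating capacitance (F), assumed
Rf = 2e3       # finger skin resistance (ohm), assumed

def Z(f):
    w = 2 * math.pi * f
    return 1j * w * L + 2 / (1j * w * Cf) + Rf        # Eq. (1)

# scan 1-3 MHz in 1-kHz steps, as in the reported system
freqs = [1e6 + k * 1e3 for k in range(2001)]
f_c = min(freqs, key=lambda f: abs(Z(f)))             # nadir frequency

Rf_est = abs(Z(f_c))                                  # nadir value -> Rf
Cf_est = 2 / (L * (2 * math.pi * f_c) ** 2)           # Eq. (2) -> Cf
```

Because the minimum of |Z| equals Rf regardless of Cf, the readout is insensitive to how much of the finger touches the electrode, which is the robustness the text describes.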


The idea of this detecting circuit was invented based on the
technique called Swept Frequency Capacitive Sensing (SFCS)
proposed by M. Sato [2], but the circuit was improved in the
following two points: (1) the detecting system is modified to
apply to the mutual-capacitive panel by employing a
multiplexer; (2) the detecting signal range is enlarged by a TIA
(trans-impedance amplifier). A detecting system using
high-Q oscillation with an installed inductor was also proposed as a
3D gesture-sensing system by Y. Hu [3]. That system is similar
to ours in this respect, but our system is the first proposal
applied to a mutual-capacitive panel.

Figure 3. Measured spectrum of panel impedance obtained with the proposed circuit.

Figure 4 shows the relationship between the measured C and R
calculated from the nadir position of the impedance spectra. When
the finger is not approaching, that is, without touch, C
corresponds to Cm and R corresponds to Rl. As the finger
approaches the touch panel, C increases because Cf
increases, and R also increases but saturates at a certain value.
This saturation means that the R value corresponds to the sum
of Rf and Rl, and the value does not change even if the Cf value,
that is, the touching area, changes. This fact indicates the
robustness of this measuring system.

Figure 2. Block diagram of the skin-resistance-measuring touch panel system.

Figure 2 shows the block diagram of the whole system
employed in this work. The touch panel has 8x8 arrays of TX
and RX electrodes with a diamond configuration. In the driving
circuit, the applied AC voltage is generated by an AD5932
(Analog Devices: Programmable Frequency Scanning
Waveform Generator) and scanned from 1 MHz to 3 MHz in
frequency. The 8:1 multiplexer for scanning the TX lines is
placed before the resonant inductor, L, to reduce the effect of
the injection capacitance of the multiplexer. The signal
processing circuit scans the RX electrodes and converts the
AC signal current to a DC logarithmic voltage. In the
communication system, a CPU (dsPIC) controls the AD5932,
converts the analog voltage to a digital signal, and sends the
data to a PC through Bluetooth.


With touching, the nadir shifts to a lower frequency, indicating Rf
and Cf. Therefore, by detecting the nadir value and frequency, we
can recognize Rf and Cf. To increase the detection speed,
the scanned frequency region should be constrained to be as narrow as
possible. In our system, the frequency scanning period can be
shortened to less than 8 ms. One frame period, that is, the period to
measure all array points, was about 512 ms.

III. RESULTS

Figure 4. Relationship between measured C and R calculated from the nadir position on impedance spectra.

Figure 3 shows the measured impedance spectrum obtained
with the proposed circuit. Without touching, the spectrum has
a nadir whose value corresponds to the line resistance, Rl, at
the frequency set by the mutual capacitance, Cm. Rl and
Cm mainly consist of the resistance of the touch panel electrodes
and the capacitance between TX and RX, respectively.

REFERENCES
[1] R. Yoneda, K. Kyoung, and R. Hattori, "User differentiation system with Projected Capacitive Touch Panel," IMID 2014 Digest of Technical Papers, p. 252 (2014).
[2] M. Sato, I. Poupyrev, and C. Harrison, "Touché: enhancing touch interaction on humans, screens, liquids, and everyday objects," Proc. of CHI, pp. 483-492 (2012).
[3] Y. Hu, L. Huang, W. Rieutort-Louis, J. Sanz Robinson, S. Wagner, J. C. Sturm, and N. Verma, "3D Gesture Sensing System for Interactive Displays Based on Extended-range Capacitive Sensing," IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 212-213 (2014).


Strong resonant coupling for short-range wireless power transfer using defected ground structures


Sherif Hekal1, Adel B. Abdel-Rahman1, Ahmed Allam1, Ramesh K. Pokharel2, H. Kanaya2, and H. Jia2
1 ECE Department, Egypt-Japan University of Science and Technology, Alexandria, Egypt, sherif.hekal@ejust.edu.eg
2 Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan

Abstract— In this paper, we present a new idea for highly efficient
short-range wireless power transfer based on defected ground
structures via strongly coupled magnetic resonance. An
equivalent circuit model for using H-slot defected ground
structures as coupled resonators is introduced. Measurement
results for the newly proposed structure show a power transfer
efficiency of 82% at a 9 mm distance between the driver and load
resonators. Experimental measurements show good
agreement with theoretical and simulation results.

In this paper, we present a modification that can be applied
to the short-range wireless power transfer system shown in
Fig. 1. The power transfer efficiency is enhanced by using
strong resonant coupling through a high-Q intermediate
resonator. Electromagnetic and circuit simulators are used to
design and simulate the transmission, reflection, and coupling
performance of the proposed structures. Measurement results
confirm the validity of the simulation results.

Keywords— wireless power transfer; defected ground structures;
EM resonant coupling; strong resonant coupling

I. INTRODUCTION
The technology of wireless power transfer (WPT) has
attracted great interest for its wide range of potential applications,
such as RFIDs, biomedical implants, wireless buried sensors, and
portable electronic devices. The implementation methods of
WPT can be classified into two main techniques: near-field and
far-field. Short-range and mid-range WPT can be achieved
with near-field coupling, which can be divided into three
types: inductive coupling, EM resonant coupling, and strong
resonant coupling. Inductive coupling is the most popular
technique for high power transfer and is usually applied in the
lower frequency region [1]. At higher frequencies, the resonant
type becomes a good choice. Resonant circuits focus the power
at a certain frequency, so the power transfer efficiency can be
improved [2]. Strong coupling uses intermediate resonators
with high Q-factors to increase the total efficiency of the
WPT system [3]. Far-field technology is usually applied for
long-distance transfer based on radiated waves such as radio
waves, microwave links, or laser beams.

II. TWO COUPLED H-SLOT RESONATORS

In this section, we discuss the new idea of using DGSs in
WPT applications. Figs. 1(a) and (b) show the implementation
of a short-range WPT system using two H-slot coupled
resonators set back-to-back, and their equivalent circuit models,
respectively. The simulated and measured power transfer
efficiencies are shown in Fig. 1(c).


Most near-field WPT methods depend on lumped
elements such as inductors and capacitors. For low-frequency
applications, these elements are bulky and lossy. Quasi-lumped
elements based on defected ground structures (DGS) can act
as resonant circuits [4]. These structures have been proposed
for RF/microwave applications to implement band-pass and
band-stop filters with low profiles. These compact structures
have small dimensions and low cost, which makes them suitable
for high-frequency, small-size applications such as portable
electronic devices and biomedical implants. As shown in Fig.
1, we have verified that DGSs, acting as resonators, can be
coupled at definite distances to transfer power. The new
proposed WPT system using H-slot coupled resonators can
transfer power with an efficiency of 85% at a distance of 3.5 mm
and a resonant frequency of 1.43 GHz. Design in the GHz range is
introduced to obtain miniaturized structures that can be embedded
in portable and biomedical devices.

Figure 1. New proposed WPT system using two H-slot coupled resonators.
(a) EM simulator implementation. (b) Equivalent circuit. (c) Simulated and
measured power transfer efficiency.

III. STRONG RESONANT COUPLING

The power transfer efficiency for two and three coupled
resonators is given by (1) and (2), respectively, where Qd,
Ql, and Qt are the Q-factors of the driver, load, and transmitter
(intermediate) resonators [5].




η = k²QdQl / (1 + k²QdQl)        (1)

η = ηdt · ηtl = [k1²QdQt / (1 + k1²QdQt)] · [k2²QtQl / (1 + k2²QtQl)]        (2)

The coupling performance is examined by changing the
separation distances h1 and h2. EM simulation results show
that at distances h1 = 3 mm and h2 = 5 mm, power
transfer can be achieved with an efficiency of 85% at a resonant
frequency of 1.4 GHz. The equivalent circuit parameters (R, L,
and C) for each resonator are extracted from their EM
simulation results as in [4], and the series capacitance is given by
Cs = tan(βS)/(ω0Z0). By substitution, we get CS1 = CS2 = 1.15 pF,
L1 = L3 = 3.2 nH, L2 = 4.75 nH, C1 = C3 = 3 pF, C2 = 2.7 pF, R1
= R3 = 3.2 kΩ, and R2 = 2.5 kΩ. Fig. 3 shows good agreement
between the measured and simulated S-parameters for the new
proposed WPT system using strong resonant coupling.

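A quick numerical reading of (1) and (2) shows why the high-Q intermediate resonator helps; the coupling coefficients and Q-factors below are illustrative, not the fabricated design's values.

```python
# Eq. (1): two directly coupled resonators; Eq. (2): driver -> high-Q
# intermediate -> load, as the product of the two stage efficiencies.

def eta_two(k, Qd, Ql):
    x = k ** 2 * Qd * Ql
    return x / (1 + x)                                 # Eq. (1)

def eta_three(k1, k2, Qd, Qt, Ql):
    return eta_two(k1, Qd, Qt) * eta_two(k2, Qt, Ql)   # Eq. (2)

# weak direct coupling across a large gap...
direct = eta_two(k=0.005, Qd=40, Ql=40)
# ...versus relaying through a high-Q transmitter resonator
relayed = eta_three(k1=0.05, k2=0.05, Qd=40, Qt=300, Ql=40)
```

Even with modest stage couplings, each k²·Q·Q product stays large because of the intermediate resonator's Q-factor, so the product of the two stage efficiencies far exceeds the direct-link efficiency; this is the mechanism behind the measured 82% versus 5%.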
Fig. 2(a) shows the proposed structure for the short-range WPT
system strongly coupled using a single intermediate resonator.
The distance between the driver and the transmitter is h1, and the
distance between the transmitter and the load is h2. The driver and
load resonators are H-slot DGS resonators, where the top layer is a
feed line of width 1.9 mm and length 10 mm ended by a stub
of length S = 9.5 mm. The stub acts as a series capacitance CS for
impedance matching. The bottom layer of each of the three resonators
is a ground plane defected by an H-slot, as shown in Fig. 2(b),
using the dimensions in Table I. A capacitor of 1.4 pF is
added across the slot to increase the equivalent capacitance,
yielding resonators with high Q-factor and compact size.
TABLE I. DESIGN PARAMETERS OF STRONGLY COUPLED WPT

Resonator type | L=W (mm) | Lslot (mm) | Wslot (mm) | LH (mm)
Driver/Load    | 20       | 6          | 1          | 14
Transmitter    | 20       | 10         | 1.9        | 14

IV. EXPERIMENTAL SETUP AND RESULTS

The measurement setup for the fabricated WPT system
using a network analyzer (Agilent N5227A) is shown in Fig.
4(a). It can be inferred from Fig. 4(b) that an efficiency of 82%
can be achieved at a distance of 9 mm with strong resonant
coupling, compared to an efficiency of 5% with traditional
systems.

Figure 4. Measured S-parameters for the proposed WPT system with/without strong resonant coupling.

Figure 2. Strong resonant coupling. (a) Schematics of driver, TX, and load
resonators. (b) Bottom layer of H-slot resonator. (c) Equivalent circuit.

Figure 3. S-parameters of the proposed structure using strong resonant coupling

The proposed structure, shown in Fig. 2(a), is simulated in an
EM simulator (CST Microwave Studio) using the design
parameters in Table I. Foam layers of permittivity εr = 1.2 and
thickness h1 and h2 are used to separate the resonators.

REFERENCES
[1] C.-J. Chen, T.-H. Chu, C.-L. Lin, and Z.-C. Jou, "A Study of Loosely Coupled Coils for Wireless Power Transfer," IEEE Trans. Circuits Syst. II Express Briefs, vol. 57, no. 7, pp. 536-540, Jul. 2010.
[2] B. L. Cannon, J. F. Hoburg, D. D. Stancil, and S. C. Goldstein, "Magnetic resonant coupling as a potential means for wireless power transfer to multiple small receivers," IEEE Trans. on Power Electron., vol. 24, no. 7, pp. 1819-1825, Jul. 2009.
[3] A. Kurs, A. Karalis, R. Moffatt, J. D. Joannopoulos, P. Fisher, and M. Soljacic, "Wireless energy transfer via strongly coupled magnetic resonances," Science, vol. 317, pp. 83-85, 2007.
[4] D.-J. Woo, T.-K. Lee, J.-W. Lee, C.-S. Pyo, and W. Choi, "Novel U-slot and V-slot DGSs for bandstop filter with improved Q factor," IEEE Trans. Microw. Theory Tech., vol. 54, no. 6, pp. 2840-2847, Jun. 2006.
[5] A. K. RamRakhyani and G. Lazzi, "On the design of efficient multi-coil telemetry system for biomedical implants," IEEE Trans. Biomed. Circuits Syst., vol. 7, no. 1, pp. 11-23, Feb. 2013.


Mach-Zehnder Interferometric Phase Stabilization for Optoelectronic Carrier Generation

K. Sakuma1, S. Takeuchi1, Y. Fujimura1, J. Haruki1, K. Kato1, S. Hisatake2, T. Nagatsuma2
1 Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
2 Graduate School of Engineering Science, Osaka University, Toyonaka, Japan
In this method, since the generated 100 GHz carrier is also
dithered, the induced phase noise might degrade the receiver
sensitivity in the case of future phase-shift keying or quadrature
amplitude modulations.

Abstract— We present a phase stabilization system for
optoelectronic carrier generation for future THz wireless
transmission. The system has a Mach-Zehnder interferometric
configuration, and we transmitted 1-Gbit/s data on a 12.5-GHz
carrier to confirm the effectiveness of the concept. We found that
the noise floor observed in the BER characteristics in our
previous work disappeared with optimized control.

Previously, we devised a phase stabilization system without
dithering, in which two optical paths were configured as a
Mach-Zehnder interferometer (MZI), and stabilized the phase
fluctuation between two optical carriers by controlling their
interfered intensity extracted with an optical band-pass filter
[3]. With this phase stabilization system, we achieved a data
transmission of 11 Gbit/s based on on-off keying modulation at
100 GHz. However, the BER characteristics had a noise floor
below 10^-9.

Keywords— coherent wireless transmission; sub-THz wave; optical frequency comb; phase stabilization

I. INTRODUCTION

Recently, there has been considerable interest in THz
waves for short-distance, high-data-rate wireless communication
[1]. Fig. 1 shows the conventional approach to THz
carrier generation. In this system, two different optical carriers
from an optical frequency comb (OFC) are extracted by an
optical filter such as an arrayed-waveguide grating (AWG);
they are combined with an optical coupler (OC) and
converted to the THz carrier as a beat signal at a
photomixer. Here, the key to generating a low-phase-noise
THz carrier has been the reduction of path-length fluctuation
in the optical fibers between the AWG and the OC, which is
caused by temperature fluctuation and acoustic noise. (Note that
a special filter could extract the two optical carriers into one
optical path, in which case the path difference cannot occur, but
future high-capacity phase-shift keying or quadrature
amplitude modulation would still need separated paths
because each path is modulated.) To reduce the phase
noise of the THz carrier, we should detect the difference in
optical path lengths and control it to be constant.


In this paper, in order to precisely control the phase
fluctuation with an interfered lightwave, we eliminate optical
noise by using an optical filter with a much narrower pass band
than in the previous experiment. As a result, we obtain
BER characteristics without the noise floor.
II. PHASE STABILIZATION SYSTEM

Fig. 2 shows the conceptual diagram of the coherent
carrier generation system with our phase stabilization scheme.
The single-mode lightwave (lasing frequency ν0) output
from a narrow-linewidth laser is phase-modulated at a
frequency of 6.25 GHz with an electro-optic modulator (EOM) to
generate the OFC. Two optical carriers of different
frequencies (ν0 − 6.25 GHz, ν0 + 6.25 GHz) are extracted
from the OFC by an optical filter, so that we obtain two
lightwaves with an optical frequency difference of 12.5 GHz.
This filter is also designed to divide the lightwave at ν0 into
the two optical paths. The two optical paths configure an MZI
for the lightwave at ν0, and thus the intensity at ν0 is varied
by light interference. The coupled lightwaves after the OC
consist of ν0 − 6.25 GHz, ν0 + 6.25 GHz, and ν0. The
lightwave at ν0 is extracted with the fiber Bragg grating
(FBG), which works as the band-pass filter, and the optical
circulator. Here, the passband of the FBG is sufficiently
narrow (< 1 GHz) compared with that of a conventional filter
(about 35 GHz). The two optical lightwaves are converted to a
12.5-GHz carrier by a photomixer, and the lightwave at ν0 is
interfered and detected with a photodetector (PD). The
detected interfered intensity is fed back to be constant using

Figure 1. Conventional configuration for THz carrier generation.

One of the practical approaches for detecting the difference
between the two optical path lengths is to dither the optical path
lengths and deduce the relative optical phase fluctuation [2]. It has
successfully demonstrated error-free data transmission of up
to 12.5 Gbit/s based on on-off keying modulation at 100 GHz.




the phase shifter (PS) through a PID controller, so as to stabilize
the phase fluctuation of the 12.5-GHz carrier.
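The feedback principle can be sketched with a toy discrete-time model: the PD intensity follows 0.5(1 + cos Δφ), and a simple integral controller (a stand-in for the actual PID loop) drives the phase shifter to cancel a slow drift. The gain and drift rate below are illustrative, not the experimental values.

```python
# Toy model of the MZI lock: an integral controller steers the phase
# shifter so the normalised interference intensity stays at the
# quadrature set point while the optical path slowly drifts.
import math

def intensity(dphi):
    return 0.5 * (1 + math.cos(dphi))   # normalised interfered power

setpoint = intensity(math.pi / 2)       # lock at quadrature (steepest slope)
phase_shifter = 0.0
drift = 0.0
errors = []
for _ in range(2000):
    drift += 0.002                                  # slow path-length drift (rad)
    dphi = math.pi / 2 + drift - phase_shifter      # residual interferometer phase
    err = intensity(dphi) - setpoint                # PD reading vs set point
    phase_shifter += -1.6 * err                     # integral action
    errors.append(abs(dphi - math.pi / 2))

residual = max(errors[-100:])   # steady-state phase error (rad)
```

The controller absorbs the full accumulated drift while the residual phase error stays orders of magnitude smaller, which is the qualitative behaviour of Figs. 4 and 5.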

Figure 5. Phase noise spectrum.

Figure 2. Mach-Zehnder Interferometric phase stabilization system.

III. EXPERIMENTAL SETUP AND RESULTS
Fig. 6 shows the BER characteristics of the received 1-Gbit/s
data with the MZI phase stabilization system as a function
of the photocurrent at the photomixer. For comparison, it also
shows the BER characteristics of the reference system, in which
the two optical carriers are extracted into the same optical path.
Here, the photocurrent can be regarded as proportional to the
square root of the THz power. We achieved a power penalty of
only 14% in photocurrent at a BER of 10^-10.


Fig. 3 shows the experimental setup used to test the
effectiveness of our phase stabilization system in a coherent
data transmission. At the transmitter, the two lightwaves
extracted through the FBG are modulated with on-off keying at
a bit rate of 1 Gbit/s by another EOM. The data stream consists
of a pseudo-random bit sequence (PRBS) of length 2^10 − 1.
The photomixer then generates the 12.5-GHz beat signal carrying
the 1-Gbit/s on-off keying.
Fig. 4 shows the optical interfered intensity at ν0 detected
by the PD. The PD output voltage without control varied
largely due to the phase fluctuation. We estimated that the
voltage variation of 10.8 V corresponds to a phase fluctuation
of π or less. Then, the PD voltage was fed back to the PS so as
to control the voltage to be constant. We kept the voltage
variation to less than 350 mV, which is estimated to correspond
to phase stabilization with an accuracy of 0.03π or less. Fig. 5
shows the relative phase noise intensity obtained by Fourier
transform of the waveforms in Fig. 4. Compared with the phase
noise intensity without control, the controlled one decreased
significantly in the region below 400 Hz. These results mean that
the phase fluctuation between the two lightwaves was stabilized
by the proposed system.

Figure 6. BER characteristics at 1 Gbit/s.

IV. CONCLUSION

We have presented an MZI phase stabilization system for
optoelectronic carrier generation. We controlled the phase
fluctuation with a low-optical-noise interfered lightwave
extracted with a narrow-passband optical filter and
transmitted 1-Gbit/s data based on on-off keying modulation
on a 12.5-GHz carrier. Consequently, we eliminated the noise
floor in the BER characteristics. Future work will apply this
method to 300-GHz carrier generation and high-capacity
phase-shift keying or quadrature amplitude modulations.

At the receiver, the modulated 12.5-GHz wave is
transmitted through a coaxial cable to a mixer. The local
oscillator (LO) signal is generated by a frequency doubler,
which doubles the 6.25-GHz frequency. The data signal,
extracted as the IF signal, is amplified by a low-noise amplifier
(LNA) and detected by the error detector.

ACKNOWLEDGMENT
The authors thank NTT Laboratories for their experimental
support. A part of this work was supported by the Strategic
Information and Communications R&D Promotion Programme
(SCOPE) 2014, from the Ministry of Internal Affairs and
Communications, Japan, and CREST/JST.
Figure 3. Data transmission setup of the sub-THz carrier.

Figure 4. Interfered light intensity detected with the photodetector.

REFERENCES

[1] T. Nagatsuma, S. Horiguchi, Y. Minamikata, Y. Yoshimizu, S. Hisatake, S. Kuwano, N. Yoshimoto, J. Terada, and H. Takahashi, "Terahertz wireless communications based on photonics technologies," Optics Express, vol. 21, no. 20, pp. 23736-23747, 2013.
[2] Y. Yoshimizu, S. Hisatake, S. Kuwano, J. Terada, N. Yoshimoto, and T. Nagatsuma, "Wireless transmission using coherent terahertz wave with phase stabilization," IEICE Electronics Express, vol. 10, no. 18, pp. 1-8, 2013.
[3] S. Takeuchi, K. Kato, S. Hisatake, and T. Nagatsuma, "Coherent sub-THz carrier frequency transmission with novel Pseudo-Mach-Zehnder interferometric phase stabilization," in Proc. 2014 International Topical Meeting on Microwave Photonics (MWP 2014), TuEF-1, pp. 404-406.


Fine wavelength stabilization of the DFB laser by photo-mixing with reference wavelengths
Atsushi Saeki, Kazutoshi Kato
Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
Abstract— Lasers for future WDM systems such as the
flexible grid need highly accurate wavelength tuning,
corresponding to an optical frequency stability within ±0.1
GHz. Instead of using a conventional wavelength locker,
which detects the optical frequency with optical filters, we
propose a microwave photonics approach in which beat
signals are generated by photo-mixing with reference lights and
filtered by a microwave filter. In this method, the frequency
accuracy depends on the accuracy of the microwave filter.
Experimental results show that the optical frequency is stabilized
at the grid with an accuracy of ±0.1 GHz by using the
microwave filter.

systems, by generating beat signals with the reference
lights and deducing the optical frequency deviation with the
microwave filter. We successfully demonstrate wavelength
stabilization of the DFB laser within ±0.1 GHz.

Keywords— WDM; wavelength stabilization; photo-mixing; beat signal; DFB laser; feedback control.

I. INTRODUCTION
The wavelength division multiplexing (WDM)
transmission technology has promoted the expansion of
large-capacity optical communication. In addition to the large
capacity, WDM has another advantage: it transmits differently
formatted signals with different bandwidths. Figure 1(a)
illustrates three lights with different wavelengths transmitted
through an optical fiber in a WDM system. In the conventional
WDM system, in which the grid spacing is 50 GHz or 100 GHz,
one channel is allocated even for small-capacity data traffic.
This is illustrated by the red band (the actually used band) and
the blue band (the allocated band) in the figure. Thus, for future
high-spectral-efficiency transmission, the conventional WDM
channel would be inefficient. To increase the spectral
efficiency of WDM transmission, WDM channels on the
flexible grid have been defined by the ITU [1].
At the flexible grid, the channels are allocated on the basis
of 6.25-GHz spacing. Each channel can occupy the bandwidth
of 12.5 GHz n (n is a natural number) with its center at the
grid. Figure 1(b) shows an example that the channels with
37.5-GHz, 12.5-GHz, and 25-GHz bands are allocated on the
grids. We can see the advantage of the flexible grid that each
channel occupies the minimum bandwidth for the signal being
transported. Since the center optical frequency should be
aligned on the 6.25-GHz-spacing grid, the flexible grid system
needs the optical source which has its optical-frequency
stability within 0.1 GHz.
In this paper, we propose a novel technique to accurately
monitor the optical frequency of the distributed feedback
(DFB) laser, which is the common light source for the WDM

Fig. 1. Bandwidths of the data at (a) the conventional WDM


grid and (b) the flexible grid
II. HIGH-RESOLUTION WAVELENGTH MONITORING TECHNIQUE
In the conventional WDM system, the optical frequency of the laser has been measured with optical filters [2]. The accuracy of the measured optical frequency depends on the properties of the optical filters, whose resolution is typically around 0.2 GHz. The flexible grid needs a resolution of much less than 0.1 GHz.
To obtain enough resolution for the flexible grid, we proposed to detect the optical beat signals between the laser and reference optical frequencies [3]. Figure 2 shows the principle of the method. First, we prepare two reference lights with 6.25-GHz spacing whose center frequency is just on the target optical frequency. Next, we generate the beat signals between these three optical frequencies by using a photodiode. Here, when the optical frequency of the laser deviates from the target optical frequency by Δf, the beat signals contain the frequencies 3.125 GHz + Δf and 3.125 GHz − Δf. Then, we filter the beat signals through a band pass filter (BPF) with 3.125-GHz center frequency, from which we observe larger power when the beat signals are closer to 3.125 GHz. Namely, we observe larger power when the optical frequency of the laser is closer to the target frequency.
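The detection principle above can be sketched numerically. The following Python snippet is an illustrative model only, not the authors' experimental code; the Lorentzian shape and the 0.2-GHz BPF bandwidth are assumptions made for the sketch:

```python
def detected_rf_power(df_ghz, f_bpf=3.125, bw=0.2):
    """Detected RF power for a laser detuned by df (GHz) from the grid.

    The two reference lights spaced 6.25 GHz apart produce beat tones
    at f_bpf + df and f_bpf - df; each tone is weighted here by an
    assumed Lorentzian band-pass response centered at f_bpf.
    """
    def bpf(f):
        return 1.0 / (1.0 + ((f - f_bpf) / (bw / 2)) ** 2)
    return bpf(f_bpf + df_ghz) + bpf(f_bpf - df_ghz)

# The detected power is maximal when the laser sits on the grid (df = 0)
# and falls off symmetrically as |df| grows, which is what the feedback
# loop exploits by steering the laser toward maximum RF power.
powers = {df: detected_rf_power(df) for df in (0.0, 0.05, 0.1, 0.5)}
assert powers[0.0] > powers[0.05] > powers[0.1] > powers[0.5]
```

Maximizing this detected power with respect to the laser control variable (the LD temperature in the experiment) therefore pulls the optical frequency onto the target grid.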


III. EXPERIMENTAL RESULTS
Figure 3 shows the block diagram of the wavelength stabilization system based on the proposed method. To confirm the effectiveness of the proposed method, the experiment was carried out on a 5.8-GHz-spacing grid with an in-house 5.8-GHz-center BPF instead of a 6.25-GHz-spacing one. First, we generated 5.8-GHz-spacing reference lights. Then, the reference lights were multiplexed with the light from a DFB laser, and these lights were photo-mixed by the photodiode.
Figure 4(a) shows the spectrum of the beat signals observed by a spectrum analyzer when Δf = 0. It has a single peak at 2.9 GHz. On the other hand, Fig. 4(b) shows the spectrum of the beat signals when Δf ≠ 0. It has two peaks whose frequencies are 2.9 GHz + Δf and 2.9 GHz − Δf. The RF power measured by the RF detector decreased as |Δf| increased, from which we confirmed that we could stabilize the optical frequency of the laser by keeping the RF power at its maximum. We therefore controlled the LD temperature so that the RF power stays at the maximum. Figure 5 shows the Max Hold trace for thirty minutes with the feedback control. The width of the Max Hold trace indicates the wavelength deviation. The FWHM of the trace is less than 0.1 GHz. This shows that the optical frequency can be stabilized within 0.1 GHz by our proposed method.

Fig. 2. The principle of the proposed method
Fig. 3. The block diagram of the wavelength stabilization system
Fig. 4. The spectrum of the beat signals (a) when Δf = 0 and (b) when Δf ≠ 0
Fig. 5. The Max Hold trace for thirty minutes

IV. CONCLUSION
We proposed a novel technique to stabilize the optical frequency of the DFB laser with high accuracy. In this method, the beat signals with the reference lights were generated by photo-mixing and filtered by the microwave filter. We demonstrated that the optical frequency was stabilized with an accuracy of better than 0.1 GHz. These results show that the proposed method is suitable for the lasers in flexible grid WDM communication.

ACKNOWLEDGEMENT
The authors thank NTT Laboratories for their experimental support. This work was supported by JSPS KAKENHI Grant Number 26420308 and the Strategic Information and Communications R&D Promotion Programme (SCOPE) 2014, from the Ministry of Internal Affairs and Communications, Japan.

REFERENCES
[1] ITU-T Recommendation G.694.1 (02/2012), "Spectral grids for WDM applications: DWDM frequency grid."
[2] H. Ishii, K. Kasaya, H. Oohashi, Y. Shibata, H. Yasaka, and K. Okamoto, "Widely wavelength-tunable DFB laser array integrated with funnel combiner," IEEE J. Sel. Top. Quantum Electron., vol. 13, no. 5, pp. 1089-1094, 2007.
[3] A. Saeki, H. Onji, S. Takeuchi, and K. Kato, "Wavelength stabilization of flexible grid lasers by monitoring frequency difference from reference wavelengths," 34th LSJ Conference, January 2014. (in Japanese)


Nonlinear Capacity of Few-Mode Fibers using the Gaussian Noise Model

Abdulaziz E. El-Fiqi¹, Abdallah Ismail¹, Ziad A. El-Sahn², Hossam M. H. Shalaby¹,², and Ramesh K. Pokharel³
abdulaziz.elfiqi@ejust.edu.eg, abdallah.Ali@ejust.edu.eg, ziad.elsahn@ieee.org, shalaby@ieee.org, pokharel@ed.kyushu-u.ac.jp
¹ Egypt-Japan University of Science and Technology (E-JUST), Alexandria 21934, Egypt
² Photonics Group, Electrical Engineering Department, Alexandria University, Alexandria 21544, Egypt
³ E-JUST Center, Kyushu University, Fukuoka, Japan

Abstract— A closed-form expression for the nonlinear capacity of a few-mode fiber (FMF) is formulated analytically by extending the Gaussian noise (GN) model. The effects of different nonlinearity penalties on the system capacity are then evaluated via simulations.

I. INTRODUCTION
Space-division multiplexing (SDM) is a promising degree of freedom for increasing the transmission capacity, which is rapidly approaching its fundamental limit in single-mode fibers [1]. Few-mode fibers (FMFs) are remarkable channels for SDM techniques. However, the nonlinear interaction between different propagation modes in FMFs is a major source of performance limitation, which must be characterized before it can be mitigated. Few analytical efforts have been made to model the nonlinear propagation in multi-mode fibers [2], [3]. In this paper, we extend the GN model developed for single-mode fibers [4] to address the impact of the different nonlinearities in FMFs. In [3], a general integral formula for the cross-modal nonlinear interaction was proposed for multimode fibers. In this work, however, a simple closed-form expression (with less computational complexity) for the nonlinear capacity of FMFs is derived for the case of a weak linear coupling regime among the different spatial modes. In addition, the effect of different nonlinearity penalties for various constellation orders is investigated.
II. PROPOSED GN-MODEL FOR FEW-MODE FIBERS
The signal propagation of mode p in a FMF has already been described in [5]. It is divided into a linear part (dispersion and attenuation) and a nonlinear part, given by

    N_p = jγ [ (8/9) f_pppp |A_p|² + (4/3) Σ_{h≠p} f_pphh |A_h|² ] A_p.

Here A_p is the field envelope of mode p, γ is the fiber nonlinearity coefficient, f_pppp is the intra-modal nonlinear coefficient tensor of mode p, and f_pphh is the inter-modal nonlinear coefficient tensor between the p and h spatial modes. The calculated values of these tensors have been reported in [1].
The GN model for single-mode fibers assumes that the nonlinearity source can be modeled as an additive Gaussian noise which is statistically independent from both the amplifier noise and the transmitted signal [4]. It also assumes the transmitted signal to be a wavelength-division multiplexed (WDM) comb signal with N_ch channels. These assumptions can be extended to FMFs based on the fact that the interaction between any two orthogonal polarization modes is equivalent to that between two spatial modes [6]. Therefore, the performance of a FMF link per mode can be determined by the optical signal-to-noise ratio as OSNR_p = P_tx_p / (P_ASE + P_NL_p), where P_tx_p is the launch power per mode, P_ASE is the amplified-spontaneous-emission (ASE) noise power, and P_NL_p is the nonlinear interference power per mode.
From Shannon's capacity relation for the unconstrained additive white Gaussian noise channel of a single-polarization single-mode fiber, we can formulate an extended relation for the dual-polarization few-mode fiber capacity per mode as

    C_p = (2 R_s / B_ch) log₂( 1 + (B_n / R_s) OSNR_p )   bits/symbol/mode,   (1)

where B_n is the noise bandwidth of 12.48 GHz (equivalent to the reference 0.1-nm resolution for OSNR calculation), R_s is the signaling rate, and B_ch is the WDM channel bandwidth. After a rigorous mathematical analysis, we derive the nonlinear interference power formula by integrating its power spectral density (PSD) over the WDM bandwidth B_w = N_ch B_ch. This PSD is obtained by statistically averaging the squared absolute value of the nonlinear optical field. Next, by assuming a rectangular WDM channel spectrum whose bandwidth B_ch has the same value as the signaling rate R_s in the Nyquist case, a closed-form expression for the nonlinear interference power per mode can be obtained at the center channel frequency [7]. Finally, the overall capacity of the few-mode fiber is formulated as the summation of all mode capacities. The final expression of the overall capacity is shown in (2), where M is the number of spatial modes, L_eff_p = (1 − e^(−α_p L_s))/α_p is the span effective length of a fiber with span length L_s and mode fiber-loss coefficient α_p, P_tx is the total launch power, β_2p is the mode group-velocity dispersion (GVD), N_s is the number of fiber spans, F is the amplifier noise factor, h is Planck's constant, ν is the center channel frequency, and G is the amplifier gain.
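As a numeric illustration of (1), the sketch below evaluates the per-mode capacity from given signal, ASE, and nonlinear interference powers. The power values used here are placeholders chosen for the example, not results from the paper:

```python
import math

def capacity_per_mode(p_tx_mw, p_ase_mw, p_nl_mw,
                      rs_ghz=32.0, bch_ghz=32.0, bn_ghz=12.48):
    """Eq. (1): dual-polarization capacity per mode in bits/symbol.

    OSNR_p = P_tx / (P_ASE + P_NL); the B_n/R_s factor converts the
    OSNR (referenced to the 12.48-GHz noise bandwidth) to an SNR in
    the signal bandwidth (B_ch = R_s in the Nyquist case).
    """
    osnr = p_tx_mw / (p_ase_mw + p_nl_mw)
    return (2.0 * rs_ghz / bch_ghz) * math.log2(1.0 + (bn_ghz / rs_ghz) * osnr)

# The linear regime (P_NL = 0) gives an upper bound; adding the
# nonlinear interference power lowers the effective OSNR and capacity.
c_linear = capacity_per_mode(1.0, 0.01, 0.0)
c_nonlinear = capacity_per_mode(1.0, 0.01, 0.02)
assert c_nonlinear < c_linear
```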
III. MODEL RESULTS
In this section, we apply the proposed model to a system with the following FMF parameters [1]: α_p = 0.22 dB/km, β_2p = 21.2 ps²/km, and γ_p = 1.3 W⁻¹km⁻¹, for three modes (LP01, LP11a,b). For the WDM system, the specifications are assumed as R_s = 32 GBaud (that is, a net symbol rate of 25 GBaud plus 20% overhead for forward error correction (FEC) and network protocols [4]) and N_ch = 5. The used



    C = Σ_{p=1}^{M} (2 R_s / B_ch) log₂( 1 + P_tx_p / ( B_n N_s [ (G − 1) F hν + P_NL_p ] ) ),   (2)

where P_NL_p is the closed-form nonlinear interference power per mode derived above, proportional to γ² P_tx³ L_eff_p² / M³, weighted by the tensor combination of f_pppp² (intra-modal) and Σ_{h≠p} f_pphh² (inter-modal), and containing the characteristic GN-model logarithmic factor log(π² B_w² |β_2p| L_eff_p) [7].

The used amplifier is an erbium-doped fiber amplifier (EDFA) with a noise figure of 6 dB and a gain that compensates the fiber span loss: G = e^(α_p L_s).

Fig. 1: Capacity versus fiber maximum reach at different nonlinear penalties for a FMF of Ls = 100 km. (Curves: linear Shannon limit; intra-modal, inter-modal, and co-propagation limits for LP01 and LP11a,b.)

Fig. 2: Capacity versus channel launch power per mode at different nonlinear penalties for a FMF of Ls = 100 km and Ns = 5. (Panels (a)-(c) compare 4-, 16-, and 64-QAM against the linear and nonlinear Shannon limits for the intra-modal, inter-modal, and total penalties.)

Fig. 1 shows the degradation of the FMF capacity with the fiber maximum reach for the different nonlinearity penalty limits at the optimal launch power. The nonlinear penalty effect is greater in the fundamental mode LP01 than in the degenerate modes (LP11a, LP11b) for both the inter-modal and the intra-modal limits. Also, the inter-modal penalty is more significant than the intra-modal one in all spatial modes. This penalty variation is related to the different spatial interactions and the fiber effective areas of the different modes.
The capacity for different QAM constellation levels (4, 16, 64) is compared to both the nonlinear and linear Shannon limits in Fig. 2. The impact of the nonlinearities does not appear at low constellation levels (4-QAM). However, at moderate levels (16-QAM), the different nonlinearity penalties become significant and prevent the FMF capacity from reaching its maximum value (24 b/symbol for a 3-mode dual-polarized signal). In addition, the inter- and intra-modal impacts are approximately equal, as shown in Figs. 2(b) and 2(c). At high constellation levels, the impact of the nonlinearities becomes more significant for the different penalties. Furthermore, the inter-modal impact becomes greater than the intra-modal one by 1.5% at the nonlinear Shannon limit. These nonlinearity penalties are clear in the nonlinear Shannon capacity curves for the different nonlinearity limits. The optimal launch power (the top point of each curve in Fig. 2) depends only on the penalty limit (i.e., the nonlinear tensor values), not on the constellation order.

IV. CONCLUSIONS
The GN model has been extended to FMFs in order to estimate the effects of different nonlinearity penalties. A closed-form formula for the nonlinear FMF capacity has been obtained. Using this formula, it has been verified that the performance degradation due to the inter-modal penalty is greater than that due to the intra-modal ones. In addition, the nonlinear impact on the fundamental mode is greater than that on the degenerate modes.

REFERENCES
[1] I. Kaminow, T. Li, and A. E. Willner, Eds., Optical Fiber Telecommunications Volume VIB, Sixth Edition: Systems and Networks. Amsterdam; Boston: Academic Press, May 2013.
[2] F. Ferreira, S. Jansen, P. Monteiro, and H. Silva, "Nonlinear semi-analytical model for simulation of few-mode fiber transmission," IEEE Photonics Technology Letters, vol. 24, no. 4, pp. 240-242, 2012.
[3] G. Rademacher, S. Warm, and K. Petermann, "Analytical description of cross-modal nonlinear interaction in mode multiplexed multimode fibers," IEEE Photonics Technology Letters, vol. 24, no. 21, pp. 1929-1932, 2012.
[4] A. Carena, V. Curri, G. Bosco, P. Poggiolini, and F. Forghieri, "Modeling of the impact of nonlinear propagation effects in uncompensated optical coherent transmission links," Journal of Lightwave Technology, vol. 30, no. 10, pp. 1524-1539, 2012.
[5] S. Mumtaz, R. Essiambre, and G. P. Agrawal, "Nonlinear propagation in multimode and multicore fibers: Generalization of the Manakov equations," Journal of Lightwave Technology, vol. 31, no. 3, pp. 398-406, 2013.
[6] A. Mecozzi, C. Antonelli, and M. Shtaif, "Nonlinearities in space-division multiplexed transmission," in Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), Mar. 2013, pp. 1-3.
[7] A. E. El-Fiqi, A. Ismail, Z. A. El-Sahn, H. M. H. Shalaby, and R. K. Pokharel, "Evaluation of nonlinear interference in few-mode fiber using the Gaussian noise model," submitted to CLEO 2015 (unpublished).


Low-Complexity Perceptual-Based Compressed Video Sensing

Sawsan A. Elsayed, Maha M. Elsabrouty
Electronics and Communication Engineering, Egypt-Japan University of Science and Technology (E-JUST), New Borg El-Arab City, Alexandria, Egypt, {sawsan.abdelsalam, maha.elsabrouty}@ejust.edu.eg

Abstract— This paper proposes a compressed video sensing system that efficiently exploits spatial and temporal redundancies in the video signal. Our system utilizes perceptual features of human eyes to focus the measurements and recovery on the most pronounced visual coefficients in video frames. A low-complexity residual-based recovery is proposed to exploit inter-frame correlation. Simulation results demonstrate the efficiency of our proposed system both in quality and complexity.

Keywords— compressed sensing; video coding; perceptual weighting.

I. INTRODUCTION
Compressed sensing (CS) is an emerging technology that enables acquisition and sampling of signals that admit sparsity in some basis with a much lower sampling rate than the traditional Nyquist rate [1]. CS has found remarkable potential in many diverse applications [2], especially for resource-limited acquisition devices such as sensor nodes in wireless sensor networks (WSN). Video processing is one of the applications that can maximally benefit from CS theory, since video signals are characterized by an inherently vast amount of correlation in both the spatial and temporal directions.
The standard CS framework is to acquire a small number of random samples y = Φx from a signal x that has a sparse representation s in some orthonormal basis Ψ, such that x = Ψs, where Φ is the sensing matrix and the number of measurements is much smaller than the signal dimension. Different types of efficient and low-complexity sensing matrices have been proposed [3]. Solving the sparsity-promoting ℓ1 problem has been proved to guarantee accurate recovery [1, 4]:

    ŝ = arg min ‖s‖₁   s.t.   y = ΦΨs,   (1)

where ŝ is the optimal recovered sparse representation of the signal. The recovered signal can then be obtained as x̂ = Ψŝ.
When prior information about the signal sparsity structure is available, weighted ℓ1 minimization proves to improve the recovery performance of CS [5] by focusing the recovery on the pronounced coefficients. Moreover, by focusing not only the recovery but also the measurements on these pronounced coefficients, the recovered signal can be more accurate [6]. Our recently proposed work [7] demonstrates the efficiency of embedding a perceptual-based weighting strategy into the CS framework for intra-frames. It mainly utilizes the structural sparsity of the 2D-DCT transform and focuses the measurements and recovery on the 2D-DCT low-frequency coefficients. In this paper, we propose further improvements by exploiting the inter-frame correlation among successive frames. Residual-based recovery is utilized here, as it has shown good performance by recovering only the residual between the required frame and its predicted frame [8, 9]. Utilizing residual-based recovery together with our perceptual-based recovered frames is anticipated to further improve the performance over other works in the literature.

II. METHODOLOGY
The video sequence is grouped into a number of groups of pictures (GOPs). Each GOP consists of one key frame and a number of non-key frames. The sampling rate for key frames is chosen to be higher than that of non-key frames. The high-quality perceptual-based reconstructed key frames are utilized to obtain a prediction frame that can be used as side information (SI) for the non-key frame recovery. For the non-key frame reconstruction, instead of recovering the frame itself, we recover the residual between the required frame and its predicted SI. Due to the similarity among successive video frames, the residual frames prove to be sparser than the full frames themselves. This feature leads to the anticipated improvement in recovery quality. The procedure is explained as follows.
The measurement vectors for both key and non-key frames are obtained using our previously proposed perceptual CS system [7] as:

    y = Φ_w x,   (2)

where Φ_w is the perceptual-based sensing matrix built from the weighting matrix W defined in [7], so that the measurements concentrate on the visually weighted transform coefficients. The key frames are reconstructed by solving the following weighted ℓ1 problem, which focuses the recovery on the visually pronounced coefficients [7]:

    ŝ = arg min ‖Ws‖₁   s.t.   ‖y − Φ_w Ψs‖₂ ≤ ε,   (3)

where ε is a small tolerance error. Then, the non-key frame predictions (a.k.a. side information


(SI)) are generated from simple motion-compensated interpolation of the previously reconstructed key frames. Let x_SI be the SI generated for a non-key frame. We propose to acquire from the SI a measurement vector of the same form as that of the non-key frame, y_SI = Φ_w x_SI. The residual frame is then reconstructed from the residual measurement vector y_r = y − y_SI using the perceptual weighted recovery as in (3), replacing the full signals with the residual signal as below:

    ŝ_r = arg min ‖W s_r‖₁   s.t.   ‖y_r − Φ_w Ψ s_r‖₂ ≤ ε,   (4)

where ŝ_r is the optimal reconstructed residual signal transform. The optimal reconstructed residual frame can be obtained as x̂_r = Ψŝ_r. Then the reconstructed non-key frame can be obtained as:

    x̂ = x_SI + x̂_r.   (5)

III. EXPERIMENTAL SETUP AND RESULTS
In our simulation, for accurate SI we choose a GOP size of 2. The measurement rate for key frames is selected to be higher than that of non-key frames; for example, the key-frame rate can be set to 0.7, with a correspondingly lower non-key-frame rate so as to meet the target average rate. We exploit the 2D-DCT transform as the sparsity basis and the scrambled block Hadamard ensemble (SBHE) as the sensing matrix [10]. Our proposed system is applied to different publicly available video sequences [11] with different types of motion (100 frames each, CIF resolution 288x352, Y component). The video sequences considered here are "Foreman" as a high-motion video and "News" as a low-motion video. The proposed system is evaluated on an HP laptop (Core i5, 4 GB RAM), and the software used is Matlab 2013a. System evaluation is demonstrated using PSNR and reconstruction time.

Figure 1: PSNR for (a) News and (b) Foreman sequences
Figure 2: Reconstruction time for (a) News and (b) Foreman sequences

ACKNOWLEDGMENT
This work has been supported by the Egyptian Ministry of Higher Education (MoHE) and Egypt-Japan University of Science and Technology (E-JUST).
REFERENCES
[1] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289-1306, Apr. 2006.
[2] S. Qaisar, R. M. Bilal, W. Iqbal, M. Naureen, and S. Lee, "Compressive sensing: from theory to applications, a survey," J. Commun. Networks, vol. 15, no. 5, pp. 1-14, 2013.
[3] T. T. Do, L. Gan, N. H. Nguyen, and T. D. Tran, "Fast and efficient compressive sensing using structurally random matrices," IEEE Trans. Signal Process., vol. 60, no. 1, pp. 139-154, 2012.
[4] E. Candes, J. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Commun. Pure Appl. Math., vol. 59, no. 8, pp. 1207-1223, 2006.
[5] M. P. Friedlander, H. Mansour, R. Saab, and O. Yilmaz, "Recovering compressively sampled signals using partial support information," IEEE Trans. Inf. Theory, vol. 58, no. 2, pp. 1122-1134, 2011.
[6] H. Mansour and O. Yilmaz, "Adaptive compressed sensing for video acquisition," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 3465-3468.
[7] S. A. Elsayed and M. M. Elsabrouty, "Perceptually weighted compressed sensing for video acquisition," in 5th International Conference on Pervasive and Embedded Computing and Communication Systems (PECCS), 2015, in press.
[8] S. Mun and J. E. Fowler, "Residual reconstruction for block-based compressed sensing of video," in Data Compression Conference (DCC), Mar. 2011, pp. 183-192.
[9] A. Wang, M. Zhao, S. Deng, and X. Zhang, "An efficient residual-based distributed compressive video sensing," Comput. Inf. Syst., vol. 16, pp. 5732-5737, 2011.
[10] L. Gan, T. T. Do, and T. D. Tran, "Fast compressive imaging using scrambled block Hadamard ensemble," in Proc. EUSIPCO, 2008.
[11] Arizona State University, "YUV video sequences." [Online]. Available: http://trace.eas.asu.edu/yuv/. [Accessed: 26-Nov-2014].

Fig. 1 shows the PSNR curves at different average measurement rates for the different systems. It can be seen that, for both video sequences, applying the previously proposed intra-perceptual weighting strategy gives a remarkable improvement over the standard CS system. Moreover, exploiting inter-frame correlation through residual-based recovery gives an additional remarkable improvement over the intra-perceptual-based system. Since the accuracy of the SI is higher for low-motion videos than for high-motion videos, the improvement achieved by the inter-frame system is larger for low-motion videos, as can be seen in Fig. 1(a) for the "News" sequence. In addition to its quality gains, our proposed inter-frame perceptual compressed video sensing system requires less reconstruction time than our previous intra-perceptual-based system. Fig. 2 shows the reconstruction time of the different systems. The standard CS system consumes the least reconstruction time, and the intra-perceptual system consumes the most. However, the proposed inter-perceptual-based system achieves a reconstruction time comparable to that of the standard CS system.
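The residual-based recovery pipeline can be sketched end-to-end with a generic ℓ1 solver. The snippet below is a toy model of the idea, not the paper's implementation: it uses a Gaussian sensing matrix and an identity sparsity basis in place of the SBHE and 2D-DCT, and plain rather than perceptually weighted ℓ1 recovery (via ISTA):

```python
import numpy as np

def ista_l1(Phi, y, lam=0.05, n_iter=300):
    """Recover a sparse vector s from y = Phi @ s by iterative soft
    thresholding (ISTA) for the l1-regularized least-squares problem."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1/L, L = Lipschitz constant
    s = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        g = s + step * Phi.T @ (y - Phi @ s)                       # gradient step
        s = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)   # shrinkage
    return s

rng = np.random.default_rng(0)
n, m = 256, 128
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # stand-in sensing matrix

# A "key frame" prediction (SI) and a non-key frame differing by a
# sparse residual, mimicking the similarity of successive frames.
x_si = rng.standard_normal(n)
residual = np.zeros(n)
residual[rng.choice(n, 8, replace=False)] = rng.standard_normal(8)
x = x_si + residual

# Sense both, subtract to get residual measurements, recover only the
# residual, then add back the SI as in Eq. (5).
y_r = Phi @ x - Phi @ x_si
x_hat = x_si + ista_l1(Phi, y_r)

assert np.linalg.norm(x_hat - x) / np.linalg.norm(x) < 0.1
```

Because the residual is far sparser than the frame itself, it is recoverable from few measurements, which is the source of both the quality and the runtime gains reported above.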


Reducing SAO Encoding Complexity by Eliminating Infrequent Types

Sayed Elgendy, Ahmed Shalaby, Mohammed S. Sayed
ECE Department, Egypt-Japan University of Science and Technology (E-JUST), Alexandria, Egypt
{elsayed.elgendy, ahmed.shalaby, mohammed.sayed}@ejust.edu.eg

Abstract— HEVC has adopted Sample Adaptive Offset (SAO) as a new in-loop filter block. SAO can significantly improve coding efficiency; however, it requires intensive operations to find the best SAO parameters for each CTB. Real-time and low-power video encoders still require a more efficient SAO encoding algorithm. In this paper, the statistics of SAO modes are explored and an analysis of frequently used types is carried out. Moreover, the effect of eliminating the rarely used modes on video quality is studied in terms of PSNR. Based on the statistical analysis of the modes, we propose an algorithm that reduces the SAO encoding time by 40.6% with only a 0.05% YUV PSNR reduction on average.

Fig. 1. The 4 SAO EO classes and 32 BO bands

Keywords— High Efficiency Video Coding (HEVC), H.265, in-loop filter, sample adaptive offset (SAO), SAO encoder.

I. INTRODUCTION
The rapidly increasing demand for high-resolution video has pushed the development of effective compression techniques. High Efficiency Video Coding (HEVC) was jointly developed by MPEG and VCEG as a video compression standard that aims to reduce the bit rate by 50% in comparison with H.264/MPEG-4 AVC at the same quality [1]. HEVC adopted the in-loop filter in its main profile to reduce artifacts generated by quantization and block-based processing, such as blocking artifacts, color biases, and ringing artifacts. It is used on both sides, the encoder and the decoder. The in-loop filter mainly consists of three blocks: the Deblocking Filter (DBF), Sample Adaptive Offset (SAO), and Adaptive Loop Filter (ALF). The SAO filter aims to improve visual quality by preventing ringing artifacts near object edges. It reduces the mean sample distortion by adaptively adding an offset value to each sample [2]. Although the SAO encoding time is lower than that of other coding modules, real-time video encoding still requires efficient algorithms for SAO encoding.

TABLE I. EO CATEGORIES

Category | Offset Sign | Selection logic                              | Condition
1        | Positive    | C < 2 neighbors                              | C < A && C < B
2        | Positive    | C < 1 neighbor and C = the other neighbor    | (C < A && C = B) || (C = A && C < B)
3        | Negative    | C > 1 neighbor and C = the other neighbor    | (C > A && C = B) || (C = A && C > B)
4        | Negative    | C > 2 neighbors                              | C > A && C > B
0        | None        | None of the above (e.g., C > 1 neighbor and C < the other neighbor) | None of the above

In BO, the sample classification depends only on the current pixel level. The pixel-level range is categorized into 32 mutually exclusive bands, as shown in Fig. 1(b). Only the offsets for four consecutive bands and the starting band position are signaled to the decoder. Moreover, HEVC provides two merge modes, UP and LEFT, in which the current Coding Tree Block (CTB) inherits its SAO parameters from the UP or LEFT CTB.
The encoding procedure works on each CTB to estimate the best SAO type and its related offsets. This procedure is accomplished in three phases: statistics collection, parameter determination, and sample reconstruction. In the statistics collection phase, the encoder collects CTB-based statistics for 48 classifications, divided into 16 EO categories and 32 BO bands, based on: 1) the pixel-wise sum of differences between the output of the deblocking filter and the original samples, and 2) the number of samples belonging to each classification. In the parameter determination phase, the encoder obtains the offset, distortion, and cost for each of the 48 classifications. The optimum offset is chosen through an iterative procedure that optimizes the rate distortion. The optimum SAO type and offsets are selected by comparing the modes of all 48 classifications, merge left, merge up, and OFF; the mode with the minimum cost is then selected for the current CTB. In the reconstruction phase, the final output samples are produced by applying the selected SAO type and offsets.
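The BO classification step described above can be sketched as follows. This is an illustrative sketch of the standard band mapping for 8-bit samples, not the authors' encoder code:

```python
def bo_band(pixel):
    """Map an 8-bit sample (0-255) to one of 32 equal-width BO bands."""
    return pixel >> 3  # 256 levels / 32 bands = 8 levels per band

def apply_bo(pixel, start_band, offsets):
    """Add the signaled offset if the sample falls in one of the four
    consecutive bands beginning at start_band; otherwise pass through."""
    band = bo_band(pixel)
    if start_band <= band < start_band + 4:
        return max(0, min(255, pixel + offsets[band - start_band]))
    return pixel

# Samples in bands 10..13 receive an offset; all others are untouched.
assert bo_band(85) == 10
assert apply_bo(85, 10, [3, -2, 1, 0]) == 88
assert apply_bo(200, 10, [3, -2, 1, 0]) == 200
```

Only the four offsets and the starting band position need to be signaled, which keeps the BO side-information cost low.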

II. SAO OVERVIEW AND RELATED WORK
In HEVC, SAO is applied after the DBF and has two modes of operation: Edge Offset (EO) mode and Band Offset (BO) mode. In EO, sample classification depends on the directional constellation of the samples. Additionally, sample categorization depends on the value of the current sample with respect to its neighboring pixel values. Sample patterns are classified into four classes, EO 0, EO 90, EO 135, and EO 45, as shown in Fig. 1(a). Each class is categorized into five mutually exclusive categories based on the value of the current sample with respect to its neighboring pixel levels, as shown in Table I, where C refers to the current sample and A, B refer to the neighboring samples [2].
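The categorization logic of Table I can be written directly as a small classifier. The sketch below is illustrative (the function name is ours, not HM code), following the conditions in the table:

```python
def eo_category(a, c, b):
    """EO categorization of Table I: c is the current sample,
    a and b are its two neighbors along the chosen EO direction."""
    if c < a and c < b:
        return 1  # local minimum -> positive offset
    if (c < a and c == b) or (c == a and c < b):
        return 2  # concave corner -> positive offset
    if (c > a and c == b) or (c == a and c > b):
        return 3  # convex corner -> negative offset
    if c > a and c > b:
        return 4  # local maximum -> negative offset
    return 0      # monotonic or flat -> no offset

assert eo_category(5, 2, 5) == 1
assert eo_category(5, 5, 7) == 2
assert eo_category(4, 6, 6) == 3
assert eo_category(3, 8, 4) == 4
assert eo_category(1, 5, 9) == 0
```

Running this classifier for all four EO directions over a CTB is exactly what makes the statistics collection phase expensive, since every sample is categorized 16 times (4 classes × 4 non-zero categories).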

In [3], an exploration of various SAO encoding policies and some techniques for complexity reduction was carried out. In [4], another technique is used to reduce the rate-distortion cost


TABLE II.

calculation for all classes by exploiting intra prediction mode


information.
III. SAO STATISTICS AND CANDIDATES REDUCTION

The statistics collection and parameter determination phases consume most of the processing time, dominating around 90% of the SAO encoding time [3]. The processing time of these two phases mainly depends on the total number of candidate SAO types. In this work, we aim to study SAO types in depth and provide statistics of frequently used types for different video classes. Moreover, we study the effect of eliminating rarely used candidates on video quality in terms of PSNR, and we measure the computational complexity reduction in terms of SAO encoding time.

IV. EXPERIMENTAL RESULTS AND DISCUSSION

In our experiments, we used the HEVC reference software test model version 16.2 (HM16.2) [5], with the common test conditions and test sequences of [6]. Experiments are carried out with the specifications shown in Table II for the Random Access (RA) configuration as a case study. Fig. 2 shows a pie chart of the ratio of total EO to BO SAO types for a sample output of the test sequence PeopleOnStreet with resolution 2560x1600. It can be seen that 77% of the total SAO types are concentrated in EO types. A detailed histogram analysis of this sequence is shown in Fig. 4. In addition, we measured the ratio for the other test sequences listed in [6]; Fig. 3 shows that the average EO ratio over all classes is around 72% of the total SAO types.

Based on this statistics exploration, we suggest reducing complexity and saving time by eliminating infrequent SAO types such as BO. Hence, another experiment is carried out to show the effect of BO elimination. Table III shows the total SAO execution time (absolute and relative) for class A videos with the RA configuration, the average PSNR for each of the Y, U and V components, and the total average YUV PSNR. The results show that the total execution time is reduced by 40.6% for an average YUV PSNR reduction of only 0.05%. Hence, we expect that computational complexity will be reduced as well, with a trivial effect on video quality.

TABLE II. EXPERIMENTAL CONDITIONS

Anchor:                 HM 16.2 main configuration
Prediction structure    All intra (AI), Random access (RA),
(GOP structure):        Low delay with P picture (LP),
                        Low delay with B picture (LB)
Target sequences:       6 sequences of 6 classes
Operating system:       Windows 7
Machine specification:  Intel Xeon X5690 at 3.46 GHz CPU and 96 GB RAM

Fig. 2. Class A (RA)
Fig. 3. All classes
Fig. 4. SAO types histogram

TABLE III. EXECUTION TIME AND PSNR

                     SAO ON    SAO BO OFF   Reduction   Relative percent
SAO execution time   230       136.7        93.3        40.6%
Y_PSNR               34.2011   34.1833      -0.0178     -0.05%
U_PSNR               41.3052   41.2858      -0.0194     -0.05%
V_PSNR               41.9512   41.9161      -0.0351     -0.08%
YUV_PSNR             35.4638   35.4463      -0.0175     -0.05%

V. CONCLUSION AND FUTURE WORK

Exploration of SAO types has been conducted for various classes of test sequences with different configurations. The experimental results show that there are SAO candidate types that consume considerable time in the statistics collection and parameter determination phases yet are rarely used in the reconstruction phase. Eliminating these candidates can reduce computational complexity with a negligible reduction in PSNR. In future work, our aim is to implement an adaptive SAO algorithm that eliminates infrequently used SAO candidate types.

ACKNOWLEDGEMENT

We would like to thank Egypt-Japan University of Science and Technology (E-JUST) for the continuous support and the Egyptian Ministry of Higher Education for funding this work.

REFERENCES
[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649-1668, Dec. 2012.
[2] C.-M. Fu, E. Alshina, and A. Alshin, "Sample adaptive offset in the HEVC standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, Dec. 2012.
[3] Y. Choi and J. Joo, "Exploration of practical HEVC/H.265 sample adaptive offset encoding policies," IEEE Signal Processing Letters, vol. 22, no. 4, pp. 465-468, 2015.
[4] J. Joo, Y. Choi, and K. Lee, "Fast sample adaptive offset encoding algorithm for HEVC based on intra prediction mode," in IEEE Third International Conference on Consumer Electronics - Berlin (ICCE-Berlin), 2013.
[5] HM-16.2 reference software, https://hevc.hhi.fraunhofer.de/trac/hevc/browser/tags/HM-16.2
[6] F. Bossen, "Common HM test conditions and software reference configurations," Joint Collaborative Team on Video Coding, Document JCTVC-L1100, Jan. 2013.


The Performance of One-bit Perceptual Compressed Sensing for Audio Signal Compression

Hossam M. Kasem and Maha Elsabrouty
School of Electrical and Communications Engineering
Egypt-Japan University of Science and Technology
New Borg El-Arab, Alexandria, Egypt
Email: hossam.kasem@ejust.edu.eg

Osamu Muta and Hiroshi Furukawa
Center for Japan-Egypt Cooperation in Science and Technology
Kyushu University
744 Motooka, Nishi-ku, Fukuoka-shi, Fukuoka-ken, Japan
Email: muta@ait.kyushu-u.ac.jp

Abstract: Compressed Sensing (CS) has many applications in the field of multimedia processing, reducing the number of measurements required to acquire signals that are sparse or compressible in some basis. CS shows good perceptual quality of the restored signal even when the signal is not completely sparse, and even at high compression ratios. In this paper, we propose to use 1-bit CS, representing each measurement by only one bit (i.e., its sign), for audio compression using a perceptual model. Using 1-bit CS reduces the transmission overhead due to the large reduction in the transmitted bits. This leads to a reduction in the required data rate, i.e., the number of transmitted bits per second, and saves channel capacity, with a reasonable performance. Simulation results show reasonable performance for 1-bit compressed sensing compared to classical CS.

I. INTRODUCTION

Sparsity in an audio signal can be even more pronounced when only the perceptually relevant frequencies are considered. The masking phenomena of the human auditory system can render some frequency components inaudible and thus irrelevant to the compression; this fact increases the sparsity of the signal. This sparsity can be exploited by applying perceptual masking to the audio signal. In [1], different perceptual masking setups are proposed to take the perceptual properties of the audio signal into account. From [1], we can conclude that using perceptual masking at the encoder side improves the quality of the recovered audio signal and saves the time required to recover the signal at the decoder side.

Practical implementation of digital communication and signal processing systems requires representing the signal in finite precision. In fact, studying quantization effects and fixed-point representation is one of the key bridges linking theoretical signal processing algorithms to their implementation platforms. In the extreme case of 1-bit quantized CS, only the signs of the measurements are kept. More specifically, in analog-to-digital (A/D) conversion, acquiring 1-bit measurements of an analog signal only requires a comparator with a zero reference voltage, which can be implemented using inexpensive and fast hardware that is robust to amplification of the signal and other errors.

The objective of this paper is to clarify the effect of one-bit CS in audio signal compression for digital transmission systems and to evaluate the achievable performance in terms of the Mean Opinion Score (MOS).

The rest of the paper is organized as follows. Section II presents the quantized perceptual system model for audio compression along with the quantization procedures used. The simulation results and discussion are presented in Section III. Finally, the conclusion is presented in Section IV.

II. PERCEPTUAL SYSTEM MODEL

In [1], we aimed to enhance the signal sparsity using the psychoacoustic model. The main motivation is to consider the audio samples that are important to the human ear. Towards this end, several setups are proposed in [1]. In the first setup, we apply auditory masking to form a new sparsifying matrix at the encoder side. This can be translated into creating new sparsifying bases that favor the frequency components that are more influential to the human hearing experience. Using these sparsifying bases results in an improvement in the perceptual quality of the recovered signal.

A. Perceptual Weighting at the Encoder Side

Fig. 1. The block diagram for the proposed system.

The quantized perceptual system model at the transmitter side is shown in Fig. 1. The audio signal is processed in frames, each of duration 30 ms. The frames are first passed through a psychoacoustic model represented in weighting matrix form, namely H. The chosen model is the one used in ISO/IEC 11172-3, i.e., the MPEG-1 Layer 1 standard. A similar perceptual processing setup is used in [1]. The data is then processed through the sensing matrix. The new part in this work is passing the measurements resulting from the perceptual CS setup through a quantizer before applying them to the front end of the encoder. The quantized CS signal y_CS can be represented as:

y_CS = Q(HΨf) + ε₁ = Q(Sf) + ε₁    (1)



The resulting samples are divided into frames; each frame contains 1321 samples and has a duration of 30 ms. One faithful measure of the perceptual quality of an audio signal is the Perceptual Evaluation of Speech Quality (PESQ); the output of this test is called the Mean Opinion Score (MOS). A signal with a higher MOS is less distorted and more similar to the original signal. The simulations run on a Panasonic workstation with an Intel Core i7 processor at 2.9 GHz.

where Q(·) is the quantizer, H is the perceptual weighting matrix, S = HΨ is the new perceptual sparsifying matrix, Ψ is an N×N matrix whose columns are orthonormal basis functions, f is the sparse coefficient vector, and ε₁ follows the distribution N(0, σ₁²).

At the decoder side, four different reconstruction methods are used. First, the standard l1 recovery algorithm can be used. This method is described by (2):

f̂ = argmin ‖f‖₁  s.t.  y_CS = Q(HΨf) + ε₁    (2)


The second method is the original IRl1, with steps similar to the algorithm in [2]. The third is IRl1 with a constant perceptual weighting matrix; its steps are similar to IRl1 [2], the only difference being in step 2, where the optimization problem changes to Eq. (3):

f̂_l = argmin ‖W_l f‖₁  s.t.  ‖H⁻¹(y_CS − Q(HΨf))‖₂ ≤ ε    (3)

TABLE I. MOS VALUES OF DIFFERENT RECONSTRUCTION ALGORITHMS (USING THE CVX TOOL)

Sequence      Quant.   l1 Stand.  IRl1     l1 Stand.  IRl1       IRl1 w/ Perc  FPC [3]             1-Bit FPC [4]
                       No Perc    No Perc  With Perc  With Perc  and Const. H  No Perc  With Perc  No Perc  With Perc
BachHymn      2-Bits   1.2        1.1      2.06       1.35       1.38          1.45     1.91       0.14     1.64
              4-Bits   1.6        1.21     3.39       1.25       2.36          1.24     2.31
Piano         2-Bits   1.59       0.73     1.6        0.78       0.84          1.07     1.58       1.23     1.85
              4-Bits   1.7        0.77     2.75       0.88       2.2           1.11     1.6
Folk Music    2-Bits   0.9        0.92     1.21       0.93       0.97          0.56     0.8        0.01     0.59
              4-Bits   1          0.14     2.32       0.3        1.52          0.8      0.92
Bach Partita  2-Bits   0.94       0.33     1.65       0.7        1.41          0.84     1.41       0.8      1.14
              4-Bits   1.1        0.62     2.84       0.84       2.36          0.9      1.74

(1-bit CS uses a single measurement bit, so its MOS values do not depend on the 2-/4-bit quantization setting.)

Table I shows the Mean Opinion Score (MOS) values for our proposed system setup at compression ratio CR = M/N = 60% (taking only 40% of the samples of the original signal). Table I confirms that using perceptual masking at the encoder side improves the quality of the recovered signal compared with the non-perceptual case; this is clear from the MOS values of the perceptual audio signals, which are higher than those of the non-perceptual signals by 1-2 degrees. The results in Table I also confirm that the l1 standard recovery algorithm with the perceptual audio signal produces better performance than the other recovery algorithms. Finally, the results show reasonable MOS degrees for perceptual 1-bit CS considering the number of bits used to represent the measurements: classical CS uses 2 and 4 bits per measurement, twice and four times the number of bits used in 1-bit CS, yet the difference in MOS between classical CS and 1-bit CS is around 0.5 degree in the 2-bit case and 1-1.5 degrees in the 4-bit case. From these results, 1-bit CS produces a reasonable performance relative to the number of bits used to represent the measurements.

where ε = α‖HΨf‖₂ with 0 ≤ α ≤ 1. W_l is the weighting matrix, equal to the inverse of the absolute amplitude of the previously recovered vector. The constant H can be chosen as the inverse of the absolute threshold of hearing. The fourth method used is Fixed Point Continuation (FPC), as in the algorithm in [3].

B. Perceptual 1-bit Compressed Sensing Model

The perceptual 1-bit CS model is similar to the one shown in Fig. 1; the main difference is the quantization process. The quantization in perceptual 1-bit CS is done using a single comparator which compares the measurement values with a reference voltage (equal to 0) to extract the sign information as follows:

y_1bit = sgn(Af),    (4)

where the function sgn(·) denotes the sign of the variable, element-wise, and zero values are assigned to +1. At the decoder side, the received measurements do not contain any information about the amplitude of the signal. Consequently, the reconstruction algorithms above (i.e., l1 standard, IRl1 and IRl1 with constant H) cannot be used for recovery, because all of them need information about the amplitude of the signal; the same holds for the FPC algorithm. A modified FPC algorithm, explained in more detail in [4], is used to recover the signal from the perceptual 1-bit measurements. The modified FPC algorithm differs from FPC in two respects: first, it uses a one-sided penalty instead of a constraint on the amplitude of the signal; second, it adds a re-normalization step at each iteration.
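The sign measurement in (4) can be illustrated with a toy example (the matrix sizes and values below are illustrative, not the paper's setup): any positive rescaling of the signal leaves y_1bit unchanged, which is exactly why the amplitude-dependent recovery algorithms above cannot be applied to 1-bit measurements.

```python
import random

def one_bit_measure(A, f):
    """y_1bit = sgn(A f), element-wise; zero maps to +1 as in the text."""
    y = []
    for row in A:
        m = sum(a * x for a, x in zip(row, f))
        y.append(1 if m >= 0 else -1)
    return y

random.seed(0)
n, m = 8, 6
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
f = [random.gauss(0, 1) for _ in range(n)]

y = one_bit_measure(A, f)
y_scaled = one_bit_measure(A, [3.0 * x for x in f])
print(y == y_scaled)  # True: signs are invariant to positive scaling
```

Recovery from such measurements therefore fixes the signal norm by convention, which is what the re-normalization step of the modified FPC algorithm [4] accomplishes.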

III. SIMULATION AND DISCUSSION

In this section, we perform numerical simulations to study the effect of using 1-bit CS on the quality of the recovered audio signal versus using multiple bits (i.e., 2 and 4 bits) to quantize the measurements. Our proposed perceptual system is tested using 10-second music pieces sampled at 44100 samples/s. We tested our model on four different audio signals from the EBU SQAM discs, commonly used for the assessment of audio coders.

IV. CONCLUSION

This paper has presented a system setup for perceptual audio compression using perceptual 1-bit CS. We compared the performance of 1-bit CS with classical CS, which uses 2 and 4 bits to quantize the measurements. The simulation results assert that it is possible to represent and recover the original signal from quantized CS measurements. Experiments have confirmed that 1-bit CS produces reasonable MOS degrees considering the number of bits used to represent the measurements.

REFERENCES
[1] H. M. Kasem and M. Elsabrouty, "Perceptual compressed sensing and perceptual sparse fast Fourier transform for audio signal compression," in Fifteenth International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2014, pp. 444-448.
[2] M. G. Christensen and B. L. Sturm, "A perceptually reweighted mixed-norm method for sparse approximation of audio signals," in Forty-Fifth Asilomar Conf. on Signals, Systems and Computers (ASILOMAR), 2011, pp. 575-579.
[3] E. T. Hale, W. Yin, and Y. Zhang, "A fixed-point continuation method for l1-regularized minimization with applications to compressed sensing," Rice University, CAAM Tech. Report TR07-07, 2007.
[4] P. T. Boufounos and R. G. Baraniuk, "1-Bit compressive sensing," in 42nd Annu. Conf. on Information Sciences and Systems (CISS), 2008.


Optimized Quantization and Scaling of Layered LDPC Scaled Min-sum Decoder

Ahmed A. Emran, Maha Elsabrouty, Osamu Muta, Hiroshi Furukawa
Egypt-Japan University of Science and Technology (E-JUST), New Borg Al-Arab, Alexandria, Egypt
Email: {ahmed.emran, maha.elsabrouty}@ejust.edu.eg
Center for Japan-Egypt Cooperation in Science and Technology, Kyushu University, Fukuoka-shi, Fukuoka, Japan
Email: muta@ait.kyushu-u.ac.jp
Graduate School of Information Science and Electrical Engineering, Kyushu University
Email: furuhiro@ait.kyushu-u.ac.jp
Abstract: Developing fast-converging LDPC decoding is desirable in many state-of-the-art systems that require high throughput and low error rates. In addition, quantization is one of the important aspects of practical LDPC implementation. In this paper, we combine both targets and propose a highly efficient scaled min-sum layered implementation. The work jointly optimizes the scaling factor of the LDPC decoder along with the quantization step to provide optimized performance. The simulation results show the performance improvement of using optimal scaling with floating point (without quantization), as well as the overall performance enhancement of using both optimal scaling and quantization parameters compared with previous literature results.

I. INTRODUCTION

LDPC codes have seen increased incorporation in state-of-the-art systems. The second-generation Digital Video Broadcasting standard (DVB-S2) [1] has adopted LDPC codes as the inner coding scheme. It is widely recognized that LDPC codes provide improved performance [2]. Recently, layered decoding of LDPC codes [3] has been utilized to reduce the decoding time and complexity of the legacy flooding decoding.

Quantization is a major step in practical LDPC decoder implementations and can have a severe effect on decoder performance [4]-[6]. In [4], non-uniform quantization is proposed to lower the error floor of the LDPC decoder. However, this comes at the expense of complexity. To avoid this complexity, we adopt uniform quantization in this paper and target a lower error floor through the use of Discrete Density Evolution (DDE) optimization. In [5], DDE is used with the quantization of the product-sum belief propagation algorithm. The authors of [6] used the distribution of the input Log-Likelihood Ratio (LLR) to design the uniform quantization step of a scaled min-sum LDPC decoder. However, most quantization design techniques consider only BPSK, although the distribution of the QAM de-mapper output is not the same as that of the BPSK de-modulator output. To deal with this problem, optimization of the quantization step for each used constellation size is necessary. In [7], we proposed a fast converging algorithm named Generalized Simplified Variable Scaled (GSVS) min-sum decoding.

The original contribution of this paper is developing GSVS-min-sum LDPC decoding for layered implementation with finite precision. In addition, we propose using DDE to jointly optimize the quantization step of the uniform quantizer Δq, the initial scaling factor α₀ and the scaling factor updating step S. Finally, we propose to use the optimal quantization and scaling parameters for each constellation size and code rate.

The rest of the paper is organized as follows: Section II presents the necessary background. Section III presents the proposed optimization strategy. The simulation environment and results are displayed and discussed in Section IV. Finally, the paper is concluded in Section V.
II. BACKGROUND ON LDPC DECODING AND QUANTIZATION

An (n, k) LDPC code is a binary code characterized by a sparse parity check matrix H ∈ F₂^(m×n), where m = n − k. It can be represented by a Tanner graph, which contains variable nodes with indexes j ∈ {1, ..., n} and check nodes with indexes i ∈ {1, ..., m}. LDPC codes are efficiently decoded by scaled min-sum iterative decoding. In each iteration, messages are passed between variable nodes and check nodes. Messages passed from check nodes are scaled by a correcting factor α ≤ 1, which is constant over all iterations in traditional scaled min-sum decoding. In [7], the authors proposed to change α as the iterations progress using a simple updating rule. For an iteration itti, the corresponding α is named α_itti and is calculated by (1):

α_itti = 1 − (1 − α₀) · 2^(−(⌈itti/S⌉ − 1))    (1)

where itti takes values {1, 2, 3, ...}, α₀ is the initial scaling factor, S is the updating step which controls the increasing rate of α_itti, and ⌈itti/S⌉ is the smallest integer greater than or equal to itti/S.
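Reading ⌈itti/S⌉ as the ceiling of itti/S, the updating rule (1) can be sketched as follows (the parameter values used here are illustrative):

```python
import math

def gsvs_alpha(itt, alpha0, S):
    """Scaling factor of iteration itt per Eq. (1):
    alpha_itt = 1 - (1 - alpha0) * 2**(-(ceil(itt/S) - 1))."""
    return 1.0 - (1.0 - alpha0) * 2.0 ** (-(math.ceil(itt / S) - 1))

# Starts at alpha0 and approaches 1 every S iterations:
print([round(gsvs_alpha(i, 0.75, 2), 4) for i in (1, 2, 3, 4, 5, 25)])
# [0.75, 0.75, 0.875, 0.875, 0.9375, 0.9999]
```

The factor thus starts at α₀ in the first iteration and halves its distance to 1 every S iterations, which matches the intuition that early min-sum messages are overestimated and need stronger scaling than later ones.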
Representing decoding messages in finite precision is essential for practical implementation. To avoid the complexity of non-uniform quantization, we adopt uniform quantization in this paper. For a specific number of bits per message, the quantization step design can severely degrade LDPC decoding performance: a large quantization step increases the error probability, especially in the waterfall region, while a small quantization step decreases the representable range and increases the error floor [4]. This trade-off between waterfall performance and error floor increases the importance of uniform quantization step design, so as to obtain the best waterfall performance with the lowest error floor.
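The trade-off above can be made concrete with a b-bit uniform quantizer with saturation (a minimal sketch; the paper's decoder uses 5-bit messages): a large step coarsens small LLRs, hurting the waterfall region, while a small step shrinks the representable range, raising the error floor.

```python
def quantize_llr(x, step, bits):
    """Uniformly quantize an LLR to a signed (bits)-bit level with
    saturation, returning the reconstructed value level * step."""
    max_level = 2 ** (bits - 1) - 1          # e.g. 15 for 5 bits
    level = round(x / step)
    level = max(-max_level, min(max_level, level))
    return level * step

# With 5 bits and step 0.5, the representable range is +/-7.5:
print(quantize_llr(3.2, 0.5, 5), quantize_llr(40.0, 0.5, 5))  # 3.0 7.5
```

Any channel LLR beyond max_level · step saturates, which is the mechanism behind the error-floor effect noted in [4].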


III. PROPOSED SCALING AND QUANTIZATION STEP DESIGN
In this paper, we extend the usage of GSVS-min-sum LDPC decoding to layered implementation with finite precision and, in addition, jointly optimize the GSVS-min-sum parameters (α₀, S) with the uniform quantization step Δq. Consequently, for each constellation size, an optimal quantization step is used for the LDPC decoder input (the channel LLR U_ch) to achieve the maximum decoding performance.

We propose to use DDE with the Nelder-Mead optimization method to jointly optimize the quantization step, the initial scaling factor of GSVS-min-sum and the updating step of GSVS-min-sum (Δq, α₀, S). The DDE (after our modifications to make it valid for any constellation size and for layered implementation) is used to evaluate the performance of a set of parameters by calculating (Eb/N0)_min, the minimum Eb/N0 that can achieve a pre-specified BER using this set of parameters. The Nelder-Mead method is used to solve our 3-dimensional optimization problem of finding the set of parameters (Δq, α₀, S)_opt that minimizes (Eb/N0)_min (the same as with the GSVS-min-sum parameter optimization in [7]), so we obtain the optimal set of (Δq, α₀, S) for any LDPC code with any used constellation size. Joint optimization of these three variables is used because it achieves better results than optimizing them separately.
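The structure of this search can be sketched as follows. The DDE evaluation itself is outside the scope of this sketch, so a synthetic stand-in objective with a coupled optimum is used in its place, and the paper's Nelder-Mead search is replaced by a plain random search; only the interface "candidate (Δq, α₀, S) → (Eb/N0)_min" is the point here.

```python
import random

def surrogate_ebn0_min(dq, a0, S):
    """Stand-in for the DDE evaluation: a synthetic (Eb/N0)_min with a
    coupled minimum at (dq, a0, S) = (0.5, 0.75, 2). The real objective
    in the paper is computed by the modified DDE, not this function."""
    return (1.5 + (dq - 0.5) ** 2 + (a0 - 0.75) ** 2
            + 0.1 * (S - 2) ** 2 + 0.5 * (dq - 0.5) * (a0 - 0.75))

def joint_search(trials=2000, seed=0):
    """Jointly search (dq, a0, S); the paper uses Nelder-Mead instead."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        cand = (rng.uniform(0.1, 1.0), rng.uniform(0.5, 1.0),
                rng.randint(1, 8))
        val = surrogate_ebn0_min(*cand)
        if best is None or val < best[0]:
            best = (val, cand)
    return best

val, (dq, a0, S) = joint_search()
print(round(val, 3), S)
```

Because the surrogate couples Δq and α₀, a coordinate-wise (separate) optimization of the same function would generally stop above the joint minimum, which mirrors the paper's observation that joint optimization outperforms optimizing the parameters separately.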

IV. SIMULATION ENVIRONMENT AND RESULTS

To verify the performance of our proposed optimization technique, we simulate the layered decoding performance of the DVB-S2 LDPC long code (n = 64800) with rate 2/3, using the maximum parallelism size P = 360 and 5 bits to represent the belief propagation messages, with at most 25 iterations. To illustrate the advantage of our technique, we simulate both BPSK and 256-QAM. In Fig. 1:

- Dashed lines represent the BER and solid lines represent the WER.
- The lines with marks (square or diamond) represent the performance of our GSVS min-sum decoding algorithm with different quantization schemes. Lines without marks show the performance of the scaled min-sum decoding algorithm with a constant scaling factor α_itti = 0.75 for all iterations (used as the scaling factor in most of the literature).
- Floating-point simulation results are shown to illustrate the scaling factor effect (without quantization).
- For both BPSK and 256-QAM, it is clear that our GSVS min-sum (diamond-marked blue line) has better BER and WER performance than the constant scaling factor (blue unmarked line). The reason for this advantage is that α_itti = 0.75 is far from the optimal scaling factor for each iteration.
- The overall advantage of the proposed joint scaling and quantization optimization is shown by comparing our scaling and quantization parameters (black square-marked line) with the scaling and quantization parameters of [6] (red unmarked line). For both BPSK and 256-QAM, the proposed quantization design with the proposed layered GSVS-min-sum has superior performance over the fixed-point results in the literature.
- The gap between our results and the floating-point performance is about 0.1 dB for BPSK and about 0.2 dB for 256-QAM.

Fig. 1. BER (dashed) and WER (solid) of the long code with rate 2/3, modulated by (a) BPSK and (b) 256-QAM. Curves: Δq of [6] with α = 0.75; our Δq with GSVS; floating point with α = 0.75; floating point with GSVS.

V. CONCLUSION

The work in this paper targeted the dual objective of developing a fast-converging LDPC decoding algorithm and designing an optimized quantization scheme to reduce the loss between the floating-point implementation and its practical fixed-point version. Simulation results show the superior performance of our optimized parameters, especially with large constellation sizes, and also show the need for using different quantization step and scaling factor parameters for each constellation size and code rate.

REFERENCES
[1] ETSI EN 302 307, "Digital Video Broadcasting (DVB); Second generation framing structure, channel coding and modulation systems for Broadcasting, Interactive Services, News Gathering and other broadband satellite applications," 2006.
[2] D. J. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399-431, 1999.
[3] M. M. Mansour and N. R. Shanbhag, "High-throughput LDPC decoders," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 11, no. 6, pp. 976-996, 2003.
[4] X. Zhang and P. H. Siegel, "Quantized min-sum decoders with low error floor for LDPC codes," in IEEE International Symposium on Information Theory (ISIT), 2012, pp. 2871-2875.
[5] R. Zarubica, R. Hinton, S. G. Wilson, and E. K. Hall, "Efficient quantization schemes for LDPC decoders," in IEEE Military Communications Conference (MILCOM), 2008, pp. 1-5.
[6] C. Marchand, L. Conde-Canencia, and E. Boutillon, "Architecture and finite precision optimization for layered LDPC decoders," Journal of Signal Processing Systems, vol. 65, no. 2, pp. 185-197, 2011.
[7] A. A. Emran and M. Elsabrouty, "Generalized simplified variable-scaled min sum LDPC decoder for irregular LDPC codes," in IEEE 25th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2014, pp. 892-896.


Optimal Reference Block Partitions for Multipath Fading Channels with Doppler Shift

Dongshin Yang, Tatsuro Higuchi and Yutaka Jitsumatsu
Dept. of Informatics, Kyushu University,
Motooka 744, Nishi-ku, Fukuoka, 819-0395, Japan.
Email: {yang, higuchi, jitumatu}@me.inf.kyushu-u.ac.jp

Abstract: A timing synchronization method proposed by Schmidl and Cox uses a reference block consisting of two identical parts, while another method proposed by Shi and Serpedin uses a reference block consisting of four parts with the third part multiplied by −1. The accuracy of the estimated timing offsets of the latter method is higher than that of the former. In this paper, the optimal number of partitions is investigated for single- and multi-path channels. We conclude that the optimal number of partitions largely depends on the maximum multi-path delay.

Keywords: OFDMA; timing synchronization; multi-path fading

I. Introduction

Time and frequency synchronization is an essential issue for accurate data communications in OFDMA systems with large Doppler shift [1]. A transmitted signal in an OFDMA system consists of a reference block and a data block. The Schmidl-Cox (S&C) method [2], in which a reference block consists of two identical parts, is used as a coarse synchronization method. A reference block of the Shi and Serpedin (S&S) method [3] consists of four identical parts. A separate paper [4] shows that the optimal number of partitions for a multi-path fading channel with a Doppler shift is approximately given by M = √(N/2) for 60 ≤ N ≤ 240, where M is the number of partitions and N is the length of the reference block. In this paper, we investigate into how many parts the reference block should be divided to maximize the synchronization performance in a multi-path environment with Doppler shift. The effects of M are investigated for N = 480 and 960.

(This work is supported by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 25820162.)

II. Timing Synchronization

A. Channel model

A multi-path fading channel with a Doppler shift can be modelled as follows. Consider a discrete-time and time-invariant system with impulse response h_ℓ (ℓ = 0, 1, ..., L_T − 1), where L_T denotes the maximum multi-path delay. Let ε = N f_D T_s ∈ {0, 1, ..., N − 1} be a normalized frequency offset, where f_D is the Doppler frequency, T_s is the sampling interval and N is the size of the Discrete Fourier Transform (DFT) for the OFDMA system, which is equal to the length of a reference block. The discrete-time received signal at the m-th time instance is expressed by

r_m = e^(j2πεm/N) Σ_{ℓ=0}^{L_T−1} h_ℓ s_{m−ℓ−τ} + w_m,    (1)

where s_m is the transmitted signal, τ is a timing offset between the received signal and the reference signal, and w_m is a proper complex Gaussian noise with zero mean and variance σ². We assume that the filter coefficients h_ℓ are complex-valued and follow a Rayleigh fading channel. Note that τ, ε and h_ℓ are assumed to be fixed for one DFT block.

B. Shi-Serpedin method
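Under this reading of (1), the received samples can be generated as below (a sketch; the channel taps, offsets and sizes are illustrative, and the signal is taken as zero outside its support):

```python
import cmath
import random

def received(s, h, eps, tau, N, noise_std=0.0, rng=random):
    """Generate r_m = e^{j 2 pi eps m / N} * sum_l h[l] * s[m - l - tau] + w_m
    per Eq. (1), with s treated as zero outside its support."""
    LT = len(h)
    out = []
    for m in range(tau + len(s) + LT - 1):
        acc = 0j
        for l in range(LT):
            idx = m - l - tau
            if 0 <= idx < len(s):
                acc += h[l] * s[idx]
        phase = cmath.exp(2j * cmath.pi * eps * m / N)
        w = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        out.append(phase * acc + w)
    return out

random.seed(1)
s = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(16)]
r = received(s, h=[1.0, 0.5j], eps=3, tau=5, N=16)
print(len(r))  # 22 = tau + len(s) + L_T - 1
```

With eps = 0, tau = 0 and a single unit tap, the function reduces to the identity on s, which is a convenient sanity check.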

There is a drawback in the S&C method: its timing metric exhibits a large plateau around the peak, which reduces the accuracy of the estimated delay. On the other hand, Shi and Serpedin used a reference block composed of four identical parts with the third part multiplied by −1 (see Fig. 1). The S&S method exhibits a smaller plateau than the S&C method, which shows that the synchronization performance of M = 4 is better than that of M = 2. It is then natural to ask: can we get better performance by increasing the number of repetitive parts beyond four?

Fig. 1: Transmitted signals for the Shi-Serpedin method

We denote the sign pattern by (d_0, d_1, ..., d_{M−1}), while the repetitive part B consists of X_0, X_1, ..., X_{L−1}, where L = N/M. Then, the transmitted signal is expressed by

s_{n+iL} = d_i X_n    (2)

for 0 ≤ n ≤ L − 1, 0 ≤ i ≤ M − 1. The timing metric is given by [5]:

Λ(θ) = Σ_{k=1}^{M−1} λ_k |Φ_k(θ)|,    (3)

where the λ_k are control parameters for the estimation and the {Φ_k} are auto-correlation functions, defined by

Φ_k(θ) = Σ_{i=0}^{M−k−1} d_i d_{i+k} Σ_{n=0}^{L−1} r*_{n+θ+iL} r_{n+θ+(i+k)L},    (4)

where r*_n denotes the complex conjugate of r_n. Note that the effect of the normalized frequency offset ε is eliminated by taking the absolute value of Φ_k(θ) in (3). The parameter θ that attains the maximum value of Λ(θ) is selected as the estimate of τ, i.e.,

τ̂ = arg max_θ Λ(θ).    (5)
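The metric (3)-(5) can be exercised on a noiseless toy example: build a reference block of M = 4 parts with the S&S sign pattern, delay it by τ samples, and search for the θ that maximizes Λ(θ). All sizes and values below are illustrative, and uniform weights λ_k = 1 are assumed:

```python
import random

def timing_metric(r, d, L, theta):
    """Lambda(theta) = sum_{k=1}^{M-1} |Phi_k(theta)| (lambda_k = 1), with
    Phi_k(theta) = sum_i d_i d_{i+k} sum_n conj(r[n+theta+iL]) r[n+theta+(i+k)L]."""
    M = len(d)
    total = 0.0
    for k in range(1, M):
        phi = 0j
        for i in range(M - k):
            for n in range(L):
                phi += (d[i] * d[i + k]
                        * r[n + theta + i * L].conjugate()
                        * r[n + theta + (i + k) * L])
        total += abs(phi)
    return total

random.seed(2)
L, M, tau = 8, 4, 5
d = (1, 1, -1, 1)  # S&S sign pattern for M = 4
X = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(L)]
block = [d[i] * X[n] for i in range(M) for n in range(L)]
r = [0j] * tau + block + [0j] * 10  # noiseless reception delayed by tau

est = max(range(tau + 10), key=lambda th: timing_metric(r, d, L, th))
print(est)  # the metric peaks at the true delay tau = 5
```

At perfect alignment (θ = τ), each Φ_k sums to (M − k) · Σ_n |X_n|², while misaligned θ values produce only weak cross-correlations, so the argmax recovers τ in this noiseless setup.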

III. Simulation Results

We perform numerical simulations. Fig. 2 shows the results of a Monte Carlo simulation of the variances of the timing estimation errors¹ at an SNR of 20 dB; for low SNR cases such as 0 dB and 5 dB, the variances of the timing estimation errors are large. The value of Λ(θ) is independent of ε, so we put ε = 0 in the simulation. The curves of the variance versus M are categorized into three cases: (a) monotone decreasing; (b) first decreasing, taking a minimum value at M = M*, and then increasing; (c) difficult to judge between (a) and (b). The result of the categorization is shown in Table I.

Fig. 2: The variance of the timing estimation error for L_T = 1 to 10, where (a) N = 480 and (b) N = 960, the SNR is 20 dB and the delay is τ = 30.

TABLE I: Types of the shapes of the variance of estimated timing errors versus M

N \ L_T    1   2   3   4   5   6   7   8   9   10
480        a   a   a   a   a   b   c   c   b   b
960        a   a   a   b   a   a   b   b   b   c

For case (b), the variance of τ̂ is minimized by M = M*. On the other hand, for case (a), it is minimized by M = N, but there exists an M = M_0 such that the variance of τ̂ takes almost the same value for all M ≥ M_0. Since a large M requires a long computational time, M = M_0 is recommended for cases (a) and (c). The values of M* or M_0 for N = 480 and 960 and for L_T = 1, 2, ..., 10 are reported in Table II. We summarize that the recommended M value depends on L_T rather than N: 1) M = 12 if L_T is small (L_T = 1, ..., 3); 2) M = 8 if L_T is middle (L_T = 4, ..., 7); and 3) M = 4, 5 or 6 if L_T is large (L_T = 8, ..., 10).

TABLE II: Recommended values for M

N \ L_T    1   2   3   4   5   6   7   8   9   10
480        12  12  15  8   8   6   8   5   4   5
960        12  12  12  12  8   8   8   4   6   5

IV. Conclusion

A reference block was investigated by optimizing the performance of a timing synchronization method [5], which is a generalization of [2] and [3]. A separate paper [4] shows that the optimal number of partitions for a multi-path fading channel with a Doppler shift is approximately given by M = √(N/2) for 60 ≤ N ≤ 240, where N is the length of the reference block. On the other hand, M < √(N/2) for N = 480 and 960. We conclude that the recommended M value largely depends on the maximum delay parameter L_T and is almost independent of N (480 or 960).

[1] M. Morelli, I. Scott, C.-C. J. Kuo, and M.-O. Pun, Synchronization Techniques for Orthogonal Frequency Division Multiple Access (OFDMA): A
Tutorial Review, Proc. of the IEEE, Vol.95, pp.1394-1427, 2007.
[2] T. M. Schmidl and D. C. Cox, Robust Frequency and Timing Synchronization for OFDM, IEEE Trans. Commun., Vol.45, No.12, pp.1613-1621,
Dec. 1997.
[3] K. Shi and E. Serpedin, Coarse Frame and Carrier Synchronization of
OFDM Systems: A New Metric and Comparison, IEEE Trans Wireless
Commun., Vol.3, No.4, pp.1271-1284, July 2004.
[4] T. Higuchi and Y. Jitsumatsu, Design Criteria of Preamble Sequence
for Multipath Fading Channels with Doppler Shift, 17th Int. Symp. on
Wireless Personal Multimedia Commun.(WPMC2014), 2014.
[5] M. Ruan, M. C. Mark, and Z. Shi, Training Symbol Based Coarse Timing
Synchronization in OFDM Systems, IEEE Tran. Wireless Commun., Vol.8,
No.5, pp.2558-2569, 2009.

1 Here, the variance is given by 1 T ( ( ))2 , where is a bias of


i
i
i
i=1 i
T
i.e., 1 T (i i ). Compensation of the bias is another issue to be solved.
,
i=1
T

121

.indd

121

2015/03/11

11:01:13

A Hybrid DQPSK-MPPM Technique for High Sensitivity Optical Communication Systems

Ahmed E. Morra and Hossam M. H. Shalaby

Egypt-Japan University of Science and Technology (E-JUST), Alexandria 21934, Egypt
ahmed.morra@ejust.edu.eg, shalaby@ieee.org

Abstract—A new class of advanced optical modulation formats, which performs better than traditional ones and is suitable for high-sensitivity transmission, is proposed. The technique is based on a combination of the MPPM and DQPSK modulation techniques.

I. INTRODUCTION

One of the most important issues in many optical communications systems is the receiver sensitivity. Indeed, when the receiver sensitivity is increased, fewer signal photons per bit need to be transmitted to achieve a given bit-error rate (BER) [1]. One of the preeminent modulation schemes for increasing the receiver sensitivity in optical communications systems is direct-detection differential quadrature phase-shift keying (DD-DQPSK) [2]. DQPSK is one of the most popular receivers for multilevel phase-modulated optical communications systems and is more bandwidth efficient than differential binary phase-shift keying (DBPSK), but at the price of increased complexity. DD-DQPSK can be demodulated using an optical delay demodulator, so it avoids the need for an optical local oscillator [2]; this significantly simplifies the receiver implementation. In this paper we propose a hybrid differential quadrature phase-shift keying-multipulse pulse-position modulation (DQPSK-MPPM) technique, assuming optical amplifier-noise limited systems, in an attempt to further increase the receiver sensitivity of optical communications systems. The key idea is to use DQPSK on top of an energy-efficient modulation scheme, e.g., MPPM, in order to gain the advantages of both schemes. It turns out that the proposed system enhances the performance of traditional DBPSK, DQPSK, and MPPM techniques.

II. HYBRID DQPSK-MPPM SYSTEM MODEL

Our proposed hybrid DQPSK-MPPM transmitter is shown in Fig. 1. The transmitter sends data symbols within time frames. Each time frame has a duration T and is composed of M disjoint slots. Optical pulses (each of pulsewidth τ = T/M) are signaled within n slots of each time frame. A block of ⌊log2 C(M, n)⌋ + 2n bits, where C(M, n) denotes the binomial coefficient, is transmitted each time frame as follows. The first ⌊log2 C(M, n)⌋ bits are encoded using the MPPM scheme; these bits identify the positions of the n pulses within the frame. Each MPPM optical pulse is then DQPSK modulated using an additional two bits. That is, compared with traditional DQPSK, instead of transmitting a consecutive stream of DQPSK pulses (each with a relatively low power), we transmit a smaller number of high-power DQPSK pulses, whose positions within the frames are identified using additional data bits. An example of the transmitted signal of a hybrid DQPSK-MPPM scheme with M = 4 and n = 2 is shown in Fig. 2.

Fig. 1: Block diagram of the hybrid DQPSK-MPPM transmitter.

Fig. 2: An example of the transmitted signal of a hybrid DQPSK-MPPM scheme with M = 4 and n = 2.

At the receiver side, the received signal is first split into two branches using a 3-dB coupler (Fig. 3). The lower branch is composed of a traditional direct-detection MPPM receiver, in order to identify the positions of the received n pulses within the frame. In the upper branch, the DQPSK data is directly detected.

Fig. 3: Receiver of the hybrid DQPSK-MPPM technique adopting DQPSK optical delay detection.

In the upper branch, the DD-DQPSK demodulation requires the received optical signal to be split through two asymmetric interferometers with a phase difference of π/2 [2]. As shown in the figure, the received optical signal is further divided into two parts; one part is variably delayed depending on the positions of the previous and current signal slots being compared. If the previous and current signal slots being compared exist in the same frame, the delay is (m2 − m1)τ, where m1 ∈ {0, 1, ..., M − 2} and m2 ∈ {m1 + 1, m1 + 2, ..., M − 1}
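The frame arithmetic above can be sketched numerically. Here `frame_bits` assumes the customary floor of log2 C(M, n) for the MPPM position bits, and `dqpsk_delay_slots` follows the two delay cases described in the text; both helper names are illustrative, not the authors' code:

```python
from math import comb, floor, log2

def frame_bits(M, n):
    """Bits per hybrid DQPSK-MPPM frame: floor(log2 C(M, n)) MPPM
    position bits plus 2 DQPSK bits per pulse."""
    return floor(log2(comb(M, n))) + 2 * n

def dqpsk_delay_slots(m1, m2, M, same_frame):
    """Interferometer delay, in units of the slot width tau, between the
    previous pulse (slot m1) and the current pulse (slot m2)."""
    return (m2 - m1) if same_frame else (M - m1 + m2)

print(frame_bits(4, 2))  # M = 4, n = 2 example of Fig. 2 -> 6
```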


are the positions of the previous and current signal slots, respectively. On the other hand, if the previous and current signal slots being compared exist in different frames, the delay is (M − m1 + m2)τ, where m1, m2 ∈ {0, 1, ..., M − 1}. The output of the DQPSK receiver depends on the phase difference between any two neighboring pulses and is used by the decision circuit to determine the DQPSK bits. It should be noticed that the delay by two time frames in the upper branch is to guarantee the availability of information about both m1 and m2 from the lower branch. As seen from Fig. 3, the BER of the hybrid system depends on both the current and previous frames. We obtain an upper bound on the BER of the proposed hybrid DQPSK-MPPM technique by considering the worst-case scenario. That is, we assume that all the n positions are incorrectly decoded whenever an MPPM frame is incorrectly detected. This upper bound can be written as [3]:

    BER_Hybrid ≤ (1 / (N + 2n)) [ (N 2^N / (2 (2^N − 1))) SER_MPPM
        + n (1 − SER_MPPM) SER_MPPM (1 − 2 BER_DQPSK)
        + (1 − SER_MPPM) 2n BER_DQPSK ],        (1)

where SER_MPPM is the symbol-error rate (SER) of the MPPM data, BER_DQPSK is the BER of the DQPSK data bits on top of the current MPPM frame, and N = ⌊log2 C(M, n)⌋. SER_MPPM is given by [4] with slight modifications, and BER_DQPSK can be found in [2].

III. NUMERICAL RESULTS

In Figs. 4 and 5 we plot the BER versus the average received optical power for both hybrid DQPSK-MPPM and traditional systems. All systems under comparison are assumed to have the same transmission data rate (Rb) and bandwidth, except for the traditional DQPSK system (a comparison with traditional DQPSK cannot be made under the same Rb and bandwidth simultaneously); we assume that the traditional DQPSK system has the same Rb but half the bandwidth of the other systems under comparison. It can be seen from both figures that the performance of the hybrid system improves as M increases; indeed, the energy efficiency of the system improves with increasing M. It can also be seen that the proposed DQPSK-MPPM system performs better than the corresponding traditional DBPSK, DQPSK, and MPPM systems. Specifically, from Fig. 4, for the proposed DQPSK-MPPM system with M = 20 and n = 4, there is an improvement of about 2.4 dB at BER = 10⁻⁹ when compared to the polarized DBPSK system. And there is an improvement of about 1.5 dB at BER = 10⁻⁹ for the hybrid DQPSK-MPPM system (with M = 36 and n = 3) when compared with the traditional MPPM system (with M = 36 and n = 5). The reason behind this improvement is that the hybrid system has a higher peak power per slot as compared to the corresponding traditional systems under the aforementioned constraints.

Fig. 4: Average bit-error rate versus average received optical power for the proposed DQPSK-MPPM systems (M = 8, n = 2; M = 14, n = 3; M = 20, n = 4) and the polarized traditional DBPSK and DQPSK systems, with σn² = 1.6 × 10⁻⁵ A² and Rb = 1/τ.

Fig. 5: Average bit-error rate versus average received optical power for both hybrid DQPSK-MPPM (M = 22, n = 2; M = 36, n = 3) and traditional MPPM (M = 26, n = 4; M = 36, n = 5) systems, with σn² = 1.6 × 10⁻⁵ A² and Rb = 1/2τ.

IV. CONCLUSION

A hybrid DQPSK-MPPM modulation technique has been proposed for high-sensitivity optical communications systems. A simple detection mechanism, based on direct-detection DQPSK receivers, has been proposed and studied. Our results reveal that the proposed technique is more power efficient than traditional ones and has an improved BER and receiver sensitivity.
REFERENCES
[1] X. Liu, S. Chandrasekhar, T. H. Wood, R. W. Tkach, P. J. Winzer, E. C. Burrows, and A. R. Chraplyvy, "M-ary pulse-position modulation and frequency-shift keying with additional polarization/phase modulation for high-sensitivity optical transmission," Optics Express, vol. 19, no. 26, pp. B868–B881, Dec. 2011.
[2] K.-P. Ho, Phase-Modulated Optical Communication Systems. Springer, July 2005.
[3] A. E. Morra, H. M. H. Shalaby, and Z. Kawasaki, "A hybrid DPSK-MPPM technique for high sensitivity optical transmission," in Proc. IEEE Photonics Conference (IPC 2014), San Diego, CA, Oct. 2014, pp. 615–616.
[4] A. E. Morra, H. S. Khallaf, H. M. H. Shalaby, and Z. Kawasaki, "Performance analysis of both shot- and thermal-noise limited multipulse PPM receivers in gamma-gamma atmospheric channels," J. Lightw. Technol., vol. 31, no. 19, pp. 3142–3150, Oct. 2013.


Performance Evaluation of E-SDM-OFDM Systems Using Adaptive Peak Cancellation under Restriction of Out-Band Radiation Power

Tomoya KAGEYAMA, Osamu MUTA, Haris GACANIN and Hiroshi FURUKAWA

Department of Electrical Engineering and Computer Science, School of Engineering, Kyushu University
Center for Japan-Egypt Cooperation in Science and Technology, Kyushu University
Alcatel-Lucent Bell N.V., Belgium
Graduate School of Information Science and Electrical Engineering, Kyushu University
744 Motooka, Nishi-ku, Fukuoka-shi, Fukuoka-ken, 819-0395 Japan
Abstract—In multicarrier modulation systems such as orthogonal frequency division multiplexing (OFDM), the reduction of the high peak-to-average power ratio (PAPR) is a challenging problem. Recently, an adaptive peak cancellation scheme was proposed to reduce the out-of-band leakage power as well as the in-band distortion power (EVM), while keeping the PAPR of the transmitted signal below a predetermined permissible value. In this paper, we evaluate and discuss the performance of MIMO-OFDM systems using eigenbeam SDM in terms of bit error rate (BER), complementary cumulative distribution function (CCDF), and computational complexity. Our computer simulation results confirm the effectiveness of the adaptive peak cancellation under the restriction of out-of-band power radiation.

Keywords—OFDM, peak-to-average power ratio (PAPR), PAPR reduction, E-SDM-MIMO, ACLR, EVM
I. INTRODUCTION

One of the major drawbacks of orthogonal frequency division multiplexing (OFDM) systems is that the transmit signal exhibits a high peak-to-average power ratio (PAPR), which causes nonlinear distortion at the power amplifier. As solutions, several techniques have been proposed, such as amplitude peak limiters, e.g., clipping and filtering (C&F) [1], and phase control, e.g., partial transmit sequences. In wireless communication systems it is desirable to employ a simple PAPR reduction technique that does not require any additional signal processing on the receiver side, so the former approach, such as C&F, is attractive. However, the C&F operation causes undesired nonlinear distortion, i.e., in-band distortion and out-of-band radiation, which are measured by the error vector magnitude (EVM) and the adjacent channel leakage power ratio (ACLR), respectively. In wireless communication systems, the values of EVM and ACLR are strictly restricted to below pre-defined system requirements.

To deal with this problem, we have proposed an adaptive peak amplitude cancellation method for PAPR reduction of OFDM signals [2]. In this method, the maximum amplitude exceeding a given threshold level is iteratively suppressed. This is done by adding a peak cancellation (PC) signal, which is a scaled OFDM symbol whose subcarriers add coherently at a given time instance, while both ACLR and EVM are restricted to below pre-defined levels. The objective of this paper is to evaluate and discuss the performance of MIMO-OFDM systems using eigenbeam SDM (E-SDM) in terms of the bit error rate (BER), the complementary cumulative distribution function (CCDF), and the computational complexity.
II. SYSTEM MODEL

A. E-SDM-OFDM system

Figure 1 shows the E-SDM-OFDM system considered in this paper, where M, N, and K denote the numbers of transmit antennas, receive antennas, and data streams, respectively. W_t^l and W_r^l denote the weighting matrices of the transmit and receive spatial filters on the l-th subcarrier, respectively, where l = 1, ..., L and L is the number of subcarriers. In this figure, the transmit data vector of the l-th subcarrier, x^l = [x_1^l, ..., x_k^l, ..., x_K^l]^T, is multiplexed by the weighting matrix W_t^l = [w_t1^l, ..., w_tk^l, ..., w_tK^l], where w_tk^l = [w_tk1^l, ..., w_tkm^l, ..., w_tkM^l]^T is the k-th weighting vector of W_t^l and the superscript T denotes transposition. H^l denotes the M × N channel matrix on the l-th subcarrier, where each element corresponds to the impulse response of the path between the m-th transmit antenna and the n-th receive antenna.

Fig. 1. E-SDM-OFDM system block diagram.

Fig. 2. PC signal waveform and its spectrum: (a) time-domain waveform; (b) frequency spectrum.

On the receiver side, after the FFT, the received signal is demultiplexed by multiplying with the weighting matrix W_r^l = [w_r1^l, ..., w_rk^l, ..., w_rK^l]^T, where the n-th weighting vector of the l-th subcarrier is denoted as w_rn^l = [w_rn1^l, ..., w_rnk^l, ..., w_rnK^l]^T. Hence, the demultiplexed signal vector y^l = [y_1^l, ..., y_k^l, ..., y_K^l]^T of the l-th subcarrier is given as

    y^l = W_r^l H^l W_t^l x^l + W_r^l n = diag(λ^l) x^l + W_r^l n,    (1)

where the superscript H denotes the conjugate transpose and n = [n_1, ..., n_n, ..., n_N]^T is the additive white Gaussian noise (AWGN) vector at the receive antennas. diag(λ^l), with λ^l = [λ_1^l, ..., λ_k^l, ..., λ_K^l], is the diagonal matrix whose diagonal elements are eigenvalues of the channel matrix H^l.

B. ACLR and EVM

ACLR is defined as the adjacent channel leakage power normalized by the average power of the transmit signal. In this paper, we define the required value of ACLR as -50 dB. EVM is defined as the normalized mean square error between the ideal signal constellation points and the actually transmitted ones. The in-band distortion caused by PAPR reduction can be evaluated by measuring the EVM.

III. PAPR REDUCTION SCHEME

In our PAPR reduction method, the maximum signal amplitude exceeding a given threshold level is suppressed by adding a PC signal. The PC signal is a scaled OFDM symbol whose subcarriers add up in phase at a given time instance of the symbol duration; the PC signal is generated by scaling the following basic function g(t):

    g(t) = (1/N) Σ_{k=1}^{N} h(t) e^{jω_k t},    (2)

where N is the number of subcarriers, ω_k is the angular frequency of the k-th subcarrier signal, and h(t) is the impulse
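Equation (2) phase-aligns all N subcarriers at t = 0, so |g(t)| has a sharp main peak at the origin. A small numerical sketch, taking h(t) = 1 over the evaluated interval and a subcarrier spacing of 2π/T (both assumptions of the sketch, not values fixed by the paper):

```python
import numpy as np

N = 64                       # number of subcarriers
T = 1.0                      # OFDM symbol duration
t = np.linspace(-T / 2, T / 2, 1025)

# g(t) = (1/N) sum_k h(t) exp(j w_k t), with h(t) = 1 here and
# angular frequencies w_k = 2*pi*k/T.
w = 2.0 * np.pi * np.arange(1, N + 1) / T
g = np.exp(1j * np.outer(t, w)).sum(axis=1) / N

# All subcarriers are in phase at t = 0, so |g| peaks there with value 1.
print(int(np.argmax(np.abs(g))) == len(t) // 2)  # -> True
```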



response of the transmit band-pass filter. We assume that h(t) is a square wave expressed by h(t) = 1 for 0 ≤ t ≤ T, where T is the OFDM symbol duration. Let x(t) denote the complex multicarrier signal. The maximum peak amplitude |x(t0)| is detected at time t = t0. If |x(t0)| exceeds a given threshold value A_th, the peak amplitude is reduced to A_th by adding the PC signal to the transmit signal. The added PC signal is expressed as

    p_r(t − t0) = −A_p e^{jφ} g̃(t − t0),    (3)

where g̃(t) = w(t)g(t), w(t) is a windowing function that truncates the time-domain waveform of g(t), A_p = |x(t0)| − A_th, and φ is the phase of the maximum peak amplitude |x(t0)|. In the E-SDM-OFDM system, a PC signal is individually added to the transmit signal at each antenna. The transmit signal at the m-th antenna after adding the i-th PC signal is given as

    x_m^(i)(t) = x_m^(i−1)(t) + p_r^(i)(t − t0^(i)),    (4)

where x_m^(i)(t) is the transmit signal after adding the i-th PC signal p_r^(i)(t − t0^(i)), and t0^(i) denotes the time instance at which the i-th maximum peak amplitude is detected. We then estimate the distortion power of the transmit signals by using estimation equations for the increments of ACLR and EVM. The above operations are continued until the maximum amplitude |x(t0)| in each frame is below the threshold A_th or the number of iterations reaches a given number N_it. If either the estimated ACLR value or the estimated EVM value exceeds its acceptable value in the i-th iteration, the PAPR reduction procedure stops before adding the i-th PC signal.

Fig. 3. CCDF of normalized instantaneous power of the transmit signal.
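The iteration of Eqs. (3) and (4) can be sketched as follows. For brevity the windowed pulse g̃(t) is collapsed to a single-sample correction at the detected peak, and the ACLR/EVM stopping checks are omitted; the threshold and iteration count are illustrative:

```python
import numpy as np

def peak_cancel(x, a_th, n_it):
    """Iteratively suppress the largest peak of the complex baseband
    signal x down to the threshold a_th, one peak per iteration, in the
    spirit of Eqs. (3)-(4)."""
    x = x.copy()
    for _ in range(n_it):
        t0 = int(np.argmax(np.abs(x)))   # time index of the current maximum peak
        peak = abs(x[t0])
        if peak <= a_th:
            break                        # all peaks are below the threshold
        # PC signal collapsed to one sample: subtract (|x(t0)| - a_th) e^{j*phase}.
        x[t0] -= (peak - a_th) * np.exp(1j * np.angle(x[t0]))
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal(256) + 1j * rng.standard_normal(256)
y = peak_cancel(x, a_th=2.0, n_it=256)
print(np.max(np.abs(y)) <= 2.0 + 1e-9)  # -> True
```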
IV. PERFORMANCE EVALUATION

The performance of the proposed system is evaluated by computer simulations. The system block diagram is the same as in Fig. 1. We consider a 4 × 2 E-SDM-MIMO-OFDM system with 64 subcarriers. The number of streams is 2, and the number of FFT points is 512. Each subcarrier is modulated with QPSK. The transmit signal is affected by independent attenuated 12-path Rayleigh fading and AWGN. In this paper, the required EVM is set to -20 dB for QPSK. For comparison purposes, we also evaluate a modified C&F method.

Figure 3 shows the CCDFs of the instantaneous power of the OFDM signal normalized by the average power, where ACLR and EVM meet the required values defined in the system. Let P(z) denote the density function of the signal amplitude; then the CCDF is given as the probability that a random variable Z exceeds a certain value z, i.e.,

    CCDF(z) = Prob(Z ≥ z) = ∫_z^∞ P(t) dt.    (5)

For comparison, the CCDF of the C&F method with a time-domain filter is also shown. From Fig. 3, it can be seen that the instantaneous power of the transmit signal is reduced by around 3.0 dB at CCDF = 10⁻⁴ by using the proposed method. It can also be confirmed that the proposed method achieves better PAPR performance than the C&F method: the normalized instantaneous power with the proposed method is lower by around 1.3 dB at CCDF = 10⁻⁴.

Figure 4 shows the BER performance of E-SDM-OFDM systems using the proposed PAPR reduction method under the requirements of ACLR and EVM, where the ACLR and EVM requirements are set to -50 dB and -20 dB for QPSK. The ideal case means the ideal BER performance without PAPR reduction. For comparison, the BER performance of the C&F method with a time-domain filter is also shown. From this figure, it can be confirmed that the proposed method shows better BER performance than the C&F method.

To evaluate the computational complexity, the number of multiplications for PAPR reduction per OFDM symbol is defined as the required complexity, where the number of additions is not taken into consideration for simplicity of discussion. The computational complexity of the proposed method and of the C&F method with time-domain filtering is expressed as

    C_o = ⟨N N_it⟩      (proposed method),
    C_o = ⟨N_tap N_th⟩  (C&F with time-domain LPF),

where ⟨x⟩ denotes the average of x, N and N_it denote the window size of the windowing function and the number of PC signal additions, and N_tap and N_th are the number of taps of the time-domain filter and the number of signal samples that exceed the threshold, respectively. In the proposed method, N complex multiplications per PC addition are needed to generate the PC signal.

Figure 5 shows the number of required multiplications per OFDM symbol for PAPR reduction as a function of the achievable PAPR, where the normalized instantaneous power at CCDF = 10⁻⁴ is defined as the achievable PAPR. From this figure, we can see that the proposed PAPR reduction scheme achieves a lower computational complexity than the C&F method with a time-domain LPF.

Fig. 4. Bit error rate performance of the OFDM system with PAPR reduction.

Fig. 5. Computational complexity of the PAPR reduction processing.

V. CONCLUSION

In this paper, we have evaluated the performance of MIMO-OFDM systems using E-SDM and adaptive peak cancellation. Simulation results show that the proposed method is effective in reducing the peak transmit amplitude in E-SDM-OFDM systems under the requirements of ACLR and EVM.

ACKNOWLEDGMENT

This work was partially supported by the Support Center for Advanced Telecommunications Technology Research, Foundation.

REFERENCES
[1] J. Armstrong, "Peak-to-average power reduction for OFDM by repeated clipping and frequency domain filtering," Electron. Lett., vol. 38, no. 5, pp. 246–247, Feb. 2002.
... "the Requirements of ACLR and EVM for MIMO-OFDM Systems," IEEE PIMRC 2012, Sept. 2012.
[2] O. Muta et al., "A peak limiter based PAPR reduction technique under restrictions of out-of-band radiation and in-band distortion for OFDM signals," Technical Report of IEICE RCS, 2014-12.

Performance of Single Carrier Transmission Systems Using Nonlinearity Mitigated A/D Converter

Syota FUKUSHIGE, Osamu MUTA, Daisuke KANEMOTO and Hiroshi FURUKAWA

Graduate School of Information Science and Electrical Engineering, Kyushu University
Center for Japan-Egypt Cooperation in Science and Technology, Kyushu University
Electrical and Electronic Engineering, University of Yamanashi
744 Motooka, Nishi-ku, Fukuoka-shi, Fukuoka-ken, 819-0395 Japan
4-3-11, Takeda, Kofu, Yamanashi, 400-8511 Japan
Abstract—To develop small base and relay nodes, it is desirable to improve the power efficiency of the power amplifier and to reduce the required hardware complexity of the analog receiver circuits; the analog-to-digital converter (ADC) and the related analog hardware designs are important factors in simplifying the transceiver circuits at each node. To mitigate the nonlinearity of a low-resolution ADC and thereby reduce the required analog hardware complexity, we have proposed two nonlinearity mitigation techniques, i.e., the dither ADC and the hysteresis ADC. In this paper, we investigate the effect of these linearity-enhanced A/D conversion techniques on the achievable performance of single carrier offset quadrature amplitude modulation (OQAM) systems in which either the dither ADC or the hysteresis ADC is adopted. Simulation results prove that both the dither ADC and the hysteresis ADC are effective in improving the BER performance of OQAM systems using a low-resolution ADC.

Fig. 1. System block diagrams of the transmitter and the receiver.

I. INTRODUCTION

Radio relay transmission is a promising technique for constructing broadband wireless communication networks, where the communications traffic from/to the base nodes connected to a relay node is handed to/from the wire-line system. In such a network, it is desirable to improve the power efficiency of the power amplifier and to reduce the required hardware complexity of the analog receiver circuits; the analog-to-digital converter (ADC) and the related analog hardware designs are important factors in simplifying the transceiver circuits at each node.

Single carrier transmission is a promising candidate for power-efficient wireless communications, since it achieves a lower peak-to-average power ratio (PAPR), which improves the power efficiency of the transmit power amplifier. From the viewpoint of low-PAPR characteristics, single carrier offset quadrature amplitude modulation (OQAM) is attractive. On the other hand, to reduce the analog hardware complexity of the receiver circuits, it is required to minimize the resolution of the ADC while mitigating the influence of quantization errors; i.e., there is a trade-off between hardware complexity and nonlinearity in the ADC. To mitigate the influence of the nonlinearity of a low-resolution ADC, we have proposed two nonlinearity-mitigated A/D conversion techniques, i.e., the dither ADC and the hysteresis ADC [1].

In this paper, we investigate the effect of the nonlinearity-mitigated A/D conversion techniques on the achievable performance of single carrier OQAM systems, where the dither ADC and the hysteresis ADC [1] are adopted on the receiver side. In addition, we extend the proposed ADCs to the multi-bit quantization case and evaluate their performance. Using these techniques, the performance degradation caused by the nonlinearity of a low-resolution ADC is expected to be mitigated.

Keywords—Single carrier, OQAM, A/D converter, dither, hysteresis.

II. PROPOSED SYSTEM

A. System Model

Figure 1 shows the block diagrams of the transmitter and the receiver considered in this paper. We consider offset quadrature amplitude modulation (OQAM) as a power-efficient single carrier transmission scheme, where the transmit signal is band-limited by a pulse-shaping filter whose frequency transfer function is the square root of a raised cosine roll-off function with roll-off factor α. On the receiver side, the received radio signal is affected by multipath Rayleigh fading and additive white Gaussian noise (AWGN). After being frequency down-converted to an intermediate frequency (IF) and passed through an analog band-pass filter (BPF) to eliminate out-of-band noise, the signal is digitized by an IF-sampling ADC, and the digitized signal is then frequency down-converted to the baseband in the digital domain. After that, nonlinear equalization based on maximum likelihood sequence estimation (MLSE) is adopted to detect the data. Figure 2 shows the block diagram of the MLSE, where squared errors are calculated for all possible candidate sequences that replicate the received signals corresponding to all possible data sequences. The data sequence that results in the minimum mean squared error is selected as the output data (see Ref. [4] for details). In this study, for simplicity of discussion, we assume that the channel state information is known on the receiver side.

Fig. 2. Block diagram of the MLSE.

Fig. 3. Block diagrams of the proposed one-bit ADCs: (a) dither ADC; (b) hysteresis ADC.

Fig. 4. Examples of input and output signals at the ADC.

TABLE I: Simulation parameters

    Modulation:                 OQPSK, O16QAM
    Demodulation:               Coherent detection
    Channel model:              6-path Rayleigh fading
    ADC sampling frequency:     f = 32 fs
    Number of quantization bits: Q = 1, 2, 3
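The exhaustive search of Fig. 2 can be sketched as follows. The two-tap channel and the half-step quantizer standing in for the ADC model are toy assumptions of this sketch, not the models used in the paper:

```python
import itertools
import numpy as np

def mlse(received, candidates, replica):
    """Select the candidate bit sequence whose replica has the minimum
    squared error from the received samples (Fig. 2)."""
    errors = [float(np.sum(np.abs(received - replica(c)) ** 2)) for c in candidates]
    return candidates[int(np.argmin(errors))]

# Toy stand-ins: BPSK symbols through a 2-tap channel, followed by a
# coarse half-step quantizer playing the role of the ADC model.
h = np.array([1.0, 0.4])
def replica(bits):
    s = 2.0 * np.array(bits, dtype=float) - 1.0
    return np.round(2.0 * np.convolve(s, h)[: len(s)]) / 2.0

true_bits = (1, 0, 1, 1, 0)
rx = replica(true_bits)  # noise-free received samples for the sketch
cands = list(itertools.product((0, 1), repeat=5))
print(mlse(rx, cands, replica))  # -> (1, 0, 1, 1, 0)
```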

B. Enhanced A/D Conversion Techniques for Nonlinearity Mitigation

This subsection introduces two enhanced A/D conversion techniques for mitigating the nonlinearity of a low-resolution ADC, i.e., the dither ADC and the hysteresis ADC [1].

1) Dither ADC: Dither-based techniques have been investigated to mitigate the nonlinearity of ADCs in delta-sigma A/D conversion for audio applications [2], where a dither signal is added to the input signal of the ADC in order to mitigate the influence of quantization noise. In this paper, we propose to use the dither technique to mitigate the influence of a low-resolution ADC in wireless receivers. Figure 3(a) shows the block diagram of the dither ADC, where a pseudo-noise sequence is generated as an analog signal and added to the input signal of the ADC. The magnitude of the dither signal can be optimized by choosing a suitable value of α.

2) Hysteresis ADC: Comparator hysteresis is a phenomenon whereby the current ADC output is partially fed back to the input side and causes a DC offset of the input signal at the next sampling time [3]. In the hysteresis ADC, the hysteresis effect is explicitly utilized to mitigate the influence of the nonlinearity caused by a low-resolution ADC. Figure 3(b) shows the block diagram of the proposed hysteresis ADC, which consists of a hysteresis comparator as a one-bit-resolution ADC and an analog sample-and-hold (S/H) circuit as a chopper. Note that the hysteresis ADC can be straightforwardly extended to the multi-bit resolution case. The key idea for linearity enhancement in the hysteresis ADC is to invert the hysteresis sign of the comparator every two samples. This inverted hysteresis effect randomizes the quantization error of a low-resolution ADC and enables the receiver to obtain an effect similar to that of the dither ADC without adding a specific dither signal. The value of the feedback factor α in the comparator can be adjusted to a desired value by an external bias voltage. Figure 4 shows examples of the digitized output of the hysteresis ADC for a sinusoidal analog input signal in the cases of α = 0, 0.3, and 0.9, where the possible output states change according to the amount of feedback α. Note that an ADC without the hysteresis effect is equivalent to the case α = 0. From these results, we can confirm that the possible output states increase to three states in the case 0 < α < 1.

Fig. 5. BER performance of the single carrier OQAM system: (a) OQPSK; (b) O16QAM. (Dotted lines: dither ADC; solid lines: hysteresis ADC.)
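The two converters of Fig. 3 can be mimicked with loose behavioral models; these are illustrative sketches, not circuit-accurate models of [1], and the parameter alpha plays the role of the dither magnitude and of the comparator feedback factor, respectively:

```python
import numpy as np

def dither_adc(x, alpha, rng):
    """One-bit quantizer with an additive pseudo-noise dither of
    amplitude alpha added before the comparator (Fig. 3(a))."""
    return np.sign(x + alpha * rng.uniform(-1.0, 1.0, size=len(x)))

def hysteresis_adc(x, alpha):
    """One-bit comparator whose previous decision is fed back with
    weight alpha, the feedback sign being inverted every two samples
    (Fig. 3(b))."""
    y = np.empty(len(x))
    prev = 1.0
    for i, xi in enumerate(x):
        s = -1.0 if (i // 2) % 2 else 1.0   # invert hysteresis sign every 2 samples
        prev = float(np.sign(xi + s * alpha * prev)) or 1.0
        y[i] = prev
    return y

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200, endpoint=False)
x = np.sin(2.0 * np.pi * 3.0 * t)
print(set(np.unique(hysteresis_adc(x, 0.3))) <= {-1.0, 1.0})  # -> True
```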

III. PERFORMANCE EVALUATION

The performance of the OQAM system using the proposed ADCs is evaluated by computer simulations. The system block diagram is the same as Fig. 1, where the MLSE and the proposed ADCs are used on the receiver side. The simulation parameters are summarized in Table I. The channel is assumed to be quasi-static, equal-level 6-path Rayleigh fading with an RMS delay spread of τ/T = 0.81. The receiver adopts a sixth-order Butterworth filter as the band-pass filter (BPF) followed by the ADC. The IF sampling frequency fsample at the ADC is given as fsample = 16 fs, where fs is the symbol frequency defined as fs = 1/T.

Figure 5 shows the bit error rate (BER) performance of OQPSK and O16QAM systems using the proposed ADCs for various values of α, where the number of quantization bits of the ADC is Q = 1, 2, and 3. In this figure, dotted and solid lines correspond to the cases with the dither ADC and the hysteresis ADC, respectively. For comparison, the BER performance in the case of the ideal ADC is also shown. From these figures, it can be seen that both the dither and hysteresis techniques achieve better BER performance than the conventional ADC (i.e., α = 0). We can also confirm that the OQAM system using the hysteresis ADC shows better performance than the system using the dither ADC when a proper value of the feedback factor α is used.

IV. CONCLUSION

This paper investigated the achievable performance of single carrier OQAM systems using the enhanced ADCs, where two dither-based ideas are utilized to mitigate the nonlinearity of the ADC. Simulation results prove the effectiveness of the enhanced ADCs in single carrier OQAM systems.

REFERENCES
[1] D. Kanemoto, O. Muta, et al., "Linearity enhancement technique for one bit A/D converter in wireless communication devices," ISCE 2014, June 2014.
[2] N. Yamasaki, "The application of large amplitude dither to the quantization of wide range audio signals," J. Acoust. Soc. Jpn. (E), 1983.
[3] R. Jacob Baker, CMOS Circuit Design, Layout, and Simulation, Third Edition, Wiley-IEEE Press, 2010.
[4] E. M. Mohamed, O. Muta, and H. Furukawa, "Adaptive Channel Estimation for MIMO-Constant Envelope Modulation," IEICE Trans. Commun., vol. E95-B, no. 7, pp. 2393–2404, July 2012.

