Assistants
Oktay Bahceci
oktayb@kth.se
Abstract
Intelligent Personal Assistants (IPAs) are implemented and used in Operating
Systems, the Internet of Things (IoT), and a variety of other systems. Many
implementations of IPAs exist today, and companies such as Apple, Google and
Microsoft all have their own implementations as a major feature of their
operating systems and devices. With the use of Natural Language Processing
(NLP), Machine Learning (ML), Artificial Intelligence (AI), and prediction
models from these fields of Computer Science (CS), as well as theory and
techniques from Human-Computer Interaction (HCI), IPAs are becoming more
intelligent and relevant. This paper aims to analyse and compare the current
major implementations of IPAs in order to determine which implementation is
the most developed at this moment in time and is contributing to the
sustainable future of AI.
Keywords: Intelligent Personal Assistant; Statistical Learning; Natural
Language Processing; Machine Learning; Artificial Intelligence;
Human-Computer Interaction.
1 Introduction
1.1 Scope and Objectives
2 Background
2.1 Artificial Intelligence
2.1.1 Agents
2.1.2 Rational agent
2.1.3 Intelligent agent
2.2 Statistical Learning
2.2.1 Machine Learning
2.2.2 Natural Language Processing
2.2.3 Sentiment Analysis
2.2.4 Supervised Machine Learning
2.2.5 Unsupervised Machine Learning
2.2.6 Artificial Neural Networks
3 Methods
3.1 Literature Study
5 Discussion
5.1 Limitations
6 Conclusion
6.1 Future research
1. Introduction
Intelligent Personal Assistants (IPAs) and their current software
implementations exist in a range of applications, usually integrated into the
Operating System (OS) by different developers and organizations, such as in
personal computers, mobile computers and the Internet of Things (IoT). The
IPAs are implemented in different programming languages and behave
differently, but all fall under the same branches of computer science and
concern themselves with similar problems [1].
Natural Language Processing (NLP) and Human-Computer Interaction (HCI) have
always been essential parts of these types of systems, and more recently, as
the systems develop further, Machine Learning (ML) and Artificial
Intelligence (AI) have become essential for the functionality of IPAs [1].
Apple, Google and Microsoft are organizations that have their own
implementations of these kinds of systems and are continuously integrating
their implementations across all of their devices and services [2, 3, 4].
2. Background
This chapter introduces the concepts, theories and models that are relevant
to Intelligent Personal Assistants. The first part covers a few definitions
of essential parts of AI, followed by the fundamentals of some statistical
learning methods and, finally, an introduction to and discussion of
Human-Computer Interaction and its relevance to Natural Language Processing
in the context of IPAs.
2.1.1. Agents
An agent is something that acts; the word comes from the Latin verb agere,
which means to do. Computer programs always act and do something, but an
agent is expected to do more. An agent is expected to be able to act
autonomously, analyze and adapt to its environment and to change, persist
over a longer time period and, finally, create, understand and pursue goals
[1].
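As a minimal illustrative sketch (not taken from [1]; the class, method and
function names are hypothetical), the properties listed above can be shown
with a thermostat-like agent that perceives its environment, keeps state over
time and acts toward a goal in a toy simulated environment:

```python
from dataclasses import dataclass, field


@dataclass
class ThermostatAgent:
    """Hypothetical agent that pursues a goal temperature autonomously."""
    goal: float                                   # the goal the agent pursues
    history: list = field(default_factory=list)   # percepts persist over time

    def perceive(self, temperature: float) -> None:
        # Record the percept so that the agent keeps state between steps.
        self.history.append(temperature)

    def act(self) -> str:
        # Choose the action that moves the environment toward the goal.
        current = self.history[-1]
        if current < self.goal - 0.5:
            return "heat"
        if current > self.goal + 0.5:
            return "cool"
        return "idle"


def run(agent: ThermostatAgent, temperature: float, steps: int) -> list:
    """Simulate a toy environment that reacts to the agent's actions."""
    actions = []
    for _ in range(steps):
        agent.perceive(temperature)
        action = agent.act()
        actions.append(action)
        # The environment changes in response to the chosen action.
        temperature += {"heat": 1.0, "cool": -1.0, "idle": 0.0}[action]
    return actions


if __name__ == "__main__":
    print(run(ThermostatAgent(goal=21.0), temperature=18.0, steps=5))
```

The sketch only covers perceiving, persisting and goal pursuit; creating and
understanding new goals, which the definition also requires, is beyond such a
toy example.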
2.2. Statistical Learning
This chapter includes definitions of certain fields that are relevant for
the creation of Intelligent Personal Assistants. Artificial Intelligence,
Machine Learning and Sentiment Analysis are introduced together with their
contexts and relevance to IPAs.
degree of agreement among raters. According to recent studies, the human
agreement rate in sentiment analysis is around 79-80% [9, 10, 11]. No IPA in
enterprise form considers any of the mentioned aspects, and they will
therefore not be discussed further.
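To make the notion of an agreement rate concrete, the following sketch
computes the raw percentage agreement between two raters. The function name
and the labels are invented for illustration and are not data from
[9, 10, 11].

```python
def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters assigned the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty item set")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Invented sentiment labels for five texts (illustration only).
rater_a = ["pos", "neg", "neu", "pos", "neg"]
rater_b = ["pos", "neg", "pos", "pos", "neg"]

if __name__ == "__main__":
    # The raters agree on 4 of the 5 items.
    print(f"{percent_agreement(rater_a, rater_b):.0%}")
```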
3. Methods
This chapter describes the research methods and data collection approaches
used. Furthermore, the methods used for Sentiment and Data Analysis are
described.
following native applications that come with the operating systems:
reminders, weather, stocks, messaging, email, calendar, contacts, notes,
music, clocks, web browser, Wolfram Alpha, and Apple Maps [14].
• Support for Docker containers
• Native isolation between tasks with Linux Containers
• Multi-resource scheduling (memory, CPU, disk, and ports)
• Java, Python and C++ APIs for developing new parallel applications
• Web UI for viewing cluster state [17].
It can therefore be concluded that most of Siri’s processing, if not all of
it, takes place on this backend, and that Siri is by implication heavily
dependent on an Internet connection.
4.2. SDK
Siri uses the concept of domains to classify which domain an utterance
belongs to and what the user’s intent is. With the release of iOS 10, Apple
created the Siri SDK for developers, SiriKit, giving them the ability to
implement Siri usage in their own apps. SiriKit support is divided into
domains, each of which defines one or more tasks that can be performed. In
order to support SiriKit, apps must support at least one of the following
domains [18]:
• VoIP calling
• Messaging
• Payments
• Photo
• Workouts
• Ride booking
• CarPlay (automotive vendors only)
• Restaurant reservations (requires additional support from Apple)
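The idea of assigning an utterance to a domain can be illustrated with a toy
keyword classifier. This is a sketch only: the keyword table and the function
name are hypothetical, and it does not reflect how Siri or SiriKit actually
resolves intents.

```python
import re

# Hypothetical keyword table. The domain names follow the list above;
# the keywords are invented for illustration.
DOMAIN_KEYWORDS = {
    "VoIP calling": {"call", "dial"},
    "Messaging": {"message", "text"},
    "Payments": {"pay", "transfer"},
    "Photo": {"photo", "photos", "picture"},
    "Workouts": {"workout", "run"},
    "Ride booking": {"ride", "taxi"},
}


def classify_domain(utterance):
    """Return the domain whose keywords overlap the utterance most, or None."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    scores = {domain: len(words & keywords)
              for domain, keywords in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # No keyword matched: the utterance belongs to no supported domain.
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    for utterance in ("Send a message to Anna", "What is the weather?"):
        print(utterance, "->", classify_domain(utterance))
```

An utterance that matches no domain returns None, which mirrors the
restriction that an app can only handle requests within the domains it
supports.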
4.3. Google Assistant
Google Assistant, previously known as Google Now and by various other names,
is Google’s intelligent personal assistant and was released on July 9, 2012.
Google Assistant uses a natural language user interface to answer questions,
make recommendations, and perform actions by delegating requests to web
services, which makes the service heavily dependent on Internet access.
Google recently released its own application for iOS with restricted
features, mostly because of the OS and SDK restrictions that iOS imposes on
third-party developers [2].
Little is known about Google Assistant’s system, its architecture and the
technologies used, what language it is implemented in and what dependencies
it has. Currently, Google Inc. keeps its IPA under closed development, and no
SDK or other tools exist for accessing its features. Therefore, Google’s
contribution to this area is out of scope for this paper and will not be
discussed further.
4.4. Microsoft Cortana
Cortana is Microsoft’s intelligent personal assistant. It is able to access
and change reminders, recognize natural speech, and answer questions using
information from Bing. Cortana is able to act as a personal assistant for
the stock applications of its operating system, but is dependent on Internet
access for fetching information from Bing [4].
Currently, Cortana supports seven languages: English, French, German,
Italian, Spanish, Chinese, and Japanese.
Cortana is the only personal assistant under discussion that is
cross-platform and has an open SDK. It is implemented on the following
platforms: Windows Phone 8.1, Windows 10, Windows 10 Mobile, Microsoft Band,
Microsoft Band 2, Android, Xbox One, iOS and Cyanogen OS [4].
Cortana was also the only one of the IPAs under discussion that offered an
SDK for third-party developers before the release of iOS 10 and SiriKit.
4.5. SDK
Microsoft has opened its SDK for Cortana to third-party developers. It gives
developers much freedom to integrate Cortana into their own applications and
even supports the definition of new actions. This section focuses on the
features of the Cortana SDK.
4.5.1. Speech
Microsoft provides the following speech platforms and services for apps.
Windows speech. A set of UWP APIs that enable both speech recognition and
speech synthesis across multiple languages on all Windows 10-based devices,
including IoT hardware, phones, tablets, and PCs. Cortana on Windows uses
these speech APIs.
Speech recognition. Recognizes real-time audio from the built-in microphone,
from a source other than the built-in microphone such as a Bluetooth headset,
or from a file.
Speech synthesis. Converts text into audio.
context. Actions are, however, restricted to Windows 10 Desktop and Mobile,
and Android.
Developers can define their own actions from scratch, or select from the two
predefined actions, ordering food and sending messages. Examples of custom
actions include, but are not limited to, “Get nutrition info” and “Turn on
the lights” [19]. This makes the Cortana SDK highly flexible and capable of
handling a wide spectrum of intents and actions, which the other IPAs from
Apple and Google are not.
However, developers need to register their actions. Registration is free of
charge, and a submitted action will most likely be reviewed by a developer
working for the Cortana Team at Microsoft.
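The register-then-review workflow described above can be pictured as a small
registry in which a custom action only becomes callable once it has been
approved. The sketch below is hypothetical throughout: the class, its methods
and the approval step are invented for illustration and do not correspond to
the Cortana SDK.

```python
class ActionRegistry:
    """Hypothetical registry: actions are usable only after review."""

    def __init__(self):
        self._handlers = {}     # action name -> callable
        self._approved = set()  # names that have passed review

    def register(self, name, handler):
        # Registering is free but does not make the action available yet.
        self._handlers[name] = handler

    def approve(self, name):
        # Stands in for the manual review step.
        if name not in self._handlers:
            raise KeyError(f"unknown action: {name}")
        self._approved.add(name)

    def invoke(self, name, **kwargs):
        # Unreviewed (or unknown) actions are rejected.
        if name not in self._approved:
            raise PermissionError(f"action not approved: {name}")
        return self._handlers[name](**kwargs)


registry = ActionRegistry()
registry.register("Turn on the lights", lambda room: f"Lights on in {room}")

if __name__ == "__main__":
    registry.approve("Turn on the lights")
    print(registry.invoke("Turn on the lights", room="kitchen"))
```

Separating registration from approval is one simple way to model a platform
that accepts arbitrary developer-defined actions without exposing them to
users unreviewed.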
5. Discussion
This chapter presents an analysis of the results and a discussion of the
limitations and methodological constraints, together with a conclusion and
future work for this thesis. The implementational and computational
limitations are discussed with a focus on restrictions on time, data quantity
and machine learning implementations. Finally, the conclusions drawn from the
results are discussed, with advice for future research in these areas.
5.1. Limitations
The current popular IPAs from Apple, Microsoft and Google are almost all
under closed, confidential development, and therefore the architecture and
technologies used by these companies are unknown. The exceptions are Apple,
which recently released a Software Development Kit (SDK) for its IPA Siri
with iOS 10, and Microsoft, which released an SDK for its IPA Cortana.
6. Conclusion
The purpose of this paper was to research the capabilities of some of the
current IPAs that exist in the market today. It can be concluded that the
most flexible assistant today is Cortana, which is available for all of the
popular mobile operating systems, including iOS, Android and Windows 10
Mobile. Measuring intelligence and technologies is not possible at this time,
mainly because the IPAs under discussion are under closed development.
However, Microsoft’s Cortana is the most flexible agent in that it has the
richest SDK and allows third-party developers to customize and create their
own actions for their agent. On the ethics side, Microsoft also comes out
ahead. Microsoft has established an ethical, well-formed agent that is open
to developers worldwide, and it does not blindly add new features to its
agent without first confirming the new actions. This concern is discussed at
length in Superintelligence, which argues that a stable, common ground needs
to be established for the AI in question before the public can start tweaking
the source code for the intelligence [20]. With the most complex agent,
Microsoft will hopefully be open to working with Artificial Intelligence
groups such as OpenAI, which aims to promote and develop friendly AI in such
a way as to benefit, rather than harm, humanity as a whole [21].
References
[1] S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 3rd
Edition, Prentice Hall, 2009.
[4] Microsoft, Cortana - meet your personal assistant - microsoft - usa,
https://www.microsoft.com/en-us/mobile/experiences/cortana/,
(Accessed on 03/11/2016).
[13] D. Das, M. Shorif Uddin, Data mining and Neural network Techniques
in Stock Market Prediction: A Methodological Review, International
Journal of Artificial Intelligence & Applications 4 (9) (2011) 117–127.
URL http://www.airccse.org/journal/ijaia/papers/4113ijaia09.pdf