
An Exhaustive Meta-analytical Study of the History, Evolution and Development of ‘Saraiki NLP’

JATINDERKUMAR R. SAINI¹

Narmada College of Computer Application,
Zadeshwar, Bharuch, Gujarat – 392 011, India
¹saini_expert@yahoo.com

Abstract. This paper is the first formal attempt worldwide to present a chronicle, spanning 200 years, of Natural Language Processing (NLP) research activities in the resource-scarce Saraiki language, spoken mostly in Pakistan and India. It presents the single major bibliography to date, comprising nearly 100 research instances of ‘NLP of Saraiki Language’ or ‘Saraiki NLP’ (terms used here for the first time in the research community). The listed Saraiki NLP research instances are classified into one of six proposed categories: Corpus Development, Evolution, Machine Translation System, Phonetics, Script and Social Science. The state of the art of the Saraiki NLP sub-domains is presented along with recommendations for potential sub-domains. As a best-attempt exhaustive, annotated and classified bibliography of Saraiki NLP, the paper is intended to be a stepping stone and one-stop resource for future Saraiki NLP research activities.

Keywords: language, Multani, Natural Language Processing (NLP), Saraiki, Seraiki, Siraiki

(Received October 28th, 2018 / Accepted December 5th, 2018)

1 Introduction

The Saraiki language is spoken by almost 20 million people in Pakistan as a first language, while almost 78,000 people in India also speak it [68]. It is mostly spoken in various parts of Pakistan as well as North India. People who have migrated from North India to other parts of India are also found to speak Saraiki. The language is similar to Punjabi and Sindhi and has many dialects, such as Multani (the most commonly found dialect), Riasati, Jafri, Hindki and Thali. Saraiki belongs to the Indo-Aryan sub-family of the Indo-Iranian language family. In 2007, based on the percentage of the world population who are native speakers, Saraiki was ranked 60th out of 100 languages spoken worldwide [92].

The current research work has been done with the intention of creating a bibliography of Saraiki Natural Language Processing (NLP) research tasks as well as finding, listing and directing possible future research avenues for Saraiki NLP. The paper brings home the current state of the art of Saraiki NLP. It intends to be a one-stop resource for the various NLP research activities carried out to date for the Saraiki language. The period of almost 200 years for which the journey of research in the Saraiki language has been analyzed ranges from 1818 to 2018. For this purpose, the related research works were searched on various platforms, such as major journal publishers, local journals, and indexing and abstracting service providers. These platforms included, in alphabetical order, EBSCO, IEEE, ProQuest, ScienceDirect, Springer and SSRN, to name a few. For the retrieved records, mere inclusion of the word ‘Saraiki’ in a research paper was not treated as a sufficient criterion for inclusion in the current work. A direct or indirect contribution to the NLP field was treated as a requisite for consideration. In this process, many general papers and articles were also found for the languages of Pakistan and North India. Each of the references listed in the current work has been compiled from multiple sources and is detailed here elaborately enough to act as a first step for further research in the field. Also, all the non-English

INFOCOMP, v. 17, no. 2, p. 41-55, December 2018


Saini An Exhaustive Meta-analytical Study of the History, Evolution and Development of ‘Saraiki NLP’

titles have also been provided with best-attempt near-equivalent English titles of the papers. Further, the source language of each paper has also been specified.

Given that hardly any Saraiki NLP research works are available, the current research work has not differentiated among the various dialects of the Saraiki language. The word ‘Saraiki’ has been found to be spelled in multiple ways, including ‘Saraiki’, ‘Seraiki’ and ‘Siraiki’. The current work spells it uniformly as ‘Saraiki’, without attributing any difference in language characteristics to the way it is spelled. The current work also does not distinguish between the ‘Podhohar’ and ‘Pothohari’ dialects of Punjabi, nor between the differently spelled ‘Pashto’ and ‘Pushto’ language. Further, ‘Saraiki’ may refer to a language, a culture, a community, a dialect, etc., all named ‘Saraiki’, but the current work refers to the language only, unless explicitly stated. The current work also does not elaborate on Saraiki tribes or Saraiki nationalism, which seeks to establish a separate province in Pakistan. Finally, the current work also does not distinguish between the ‘Multani’ and ‘Saraiki’ languages.

The rest of the paper is presented in fourteen sections. There are six sections listing the bibliographic references to various Saraiki NLP research works, focusing on Saraiki Language Corpus Development, Saraiki Language Evolution, Saraiki Language Machine Translation System (MTS), Saraiki Language Phonetics, Saraiki Language Script and Saraiki Language Social Science, respectively. Each of these six sections is followed by a sub-section containing remarks on the preceding section as well as a discussion of the existing research work and possible future research in that area. The second-last section of the paper body elaborates on the findings, in addition to the remarks already listed in the foregoing six sub-sections. Concluding remarks are presented in the last section of the paper body.

It is noteworthy that the line demarcating the classification of a research work into one category instead of another has been found to be thin. This is because there are research works which may contribute to multiple categories of the NLP tasks listed here. For instance, the work of Rasoolpuri [62] details the script as well as the phonetics of the Saraiki language and has therefore been classified in both of these categories. It is also notable that a paper with a single-word title classified in one category may contribute to many sub-fields of that category. The Saraiki language work instances presented here under various categories may not all represent true research works from the perspective of novelty, but they definitely provide a first step in that direction. Most of the works on the Saraiki language also do not bear the characteristics of true NLP research because they were proposed before the formal definitions of NLP were finalized and widely put into practice.

2 Saraiki Language Corpus Development

The first available resource for the Saraiki language is by Adward [2], and it is possibly the oldest of its kind. It is available in the form of a glossary of the Multani language. Monnier [46], who is actually the printer and perhaps not the author of the book comparing the Sindhi and Punjabi languages with Multani, presents a detailed discussion of Saraiki proverbs and grammatical usage. The book intends to provide almost exhaustive coverage of the usage of the Saraiki language in the day-to-day routine activities carried out specifically by Saraiki people. Although it is clearly evident on going through the book that most of the activities it mentions have become a thing of the past, the reader encounters a candid treatise of the Saraiki language as used in olden times. Brien [13, 14] too has presented a glossary of the Multani language along with its revised version. Brien and Mughal [15] have presented a Saraiki-Urdu dictionary. Shackle [69] has presented an encyclopaedia of the Saraiki language.

A translation of the holy book Quran is provided by Abbasi [1] in the form of a handwritten manuscript in the Saraiki language. A similar Saraiki translation is also provided by Molana [45]. A translation of the Holy Bible in Saraiki is also available [82]. A corpus of Saraiki poetry is provided by Jampuri [32]. Skemp [83] has collected and presented a translation of various Multani stories.

Parveen and Anjum [55] have proposed the development of a bank of words translated from the English language into six other languages, namely Punjabi, Saraiki, Balochi, Urdu, Sindhi and Pashto. They have aimed to develop it as a language resource through which learners can better understand language. They propose to implement their vocabulary word-bank for school-going children, for their better education and for the betterment of the language too. The Russian and English language works of Samarnov [87] and Shackle [88], respectively, have been translated into Saraiki, and in a way this too has provided the research community with a parallel corpus for the Saraiki language.

Rauf et al. [63] have presented a speech corpus developed by them for the purpose of an Automatic Speech Recognition system for Urdu. They have recorded district names in six accents namely Punjabi, Urdu,


Saraiki, Pashto, Sindhi and Balochi. They have also taken care of various challenging factors in recording, including the disturbances that creep in while recording.

Amritsari [84] has also created a Saraiki language corpus consisting of the proverbs, sayings and verses of the Saraiki language. Akhter [3], in her thesis, has presented an exhaustive study of the idioms and proverbs of the Saraiki language. Particularly noteworthy in her thesis is the clear differentiation she draws between idioms and proverbs, unlike many other authors who fail to bring out this differentiation and present an ambiguous or mingled picture by treating idioms and proverbs alike.

2.1 Remarks on Saraiki Language Corpus Development

The researcher in the current work has classified all attempts of linguists and researchers dealing with the presentation of an encyclopaedia, dictionary, book translations, various compilations and collections (specifically of poetry, proverbs and idioms), as well as the presentation of a language glossary, in the category of ‘Saraiki Language Corpus Development’. This is done because, from the perspective of NLP, specifically the statistical school of thought, a corpus is the first step of NLP, and any focused collection of language data is a corpus of that language. The corpora listed here hence provide scope for further statistical NLP research in the Saraiki language. A single corpus, a combination of corpora or all the corpora listed here together could be used for further research. Specifically, machine learning based research work as well as a formal attempt at corpus building for the Saraiki language could be undertaken on these lines. Also, there is definite scope for the development of more parallel corpora consisting of paragraphs and documents instead of just words. This task could in turn aid the development of a Saraiki MTS too.

The researcher in the current work believes that the works of the various litterateurs and researchers listed in the above section are actually pseudo NLP contributions, because the authors did not intend to contribute towards NLP. The development of a corpus for a language is evidently a full-fledged research project in its own right. Hence, the current research work terms the various works listed above ‘pseudo NLP works’. However, it is also found that the research works of Parveen and Anjum [55] and of Rauf et al. [63] are both true NLP research works and the only exact NLP contributions as far as ‘Saraiki Language Corpus Development’ is concerned. This is so because, from the NLP perspective, Parveen and Anjum [55] have actually proposed and created a parallel corpus with the L1 language being English and the L2 to Ln languages being Punjabi, Saraiki, Balochi, Urdu, Sindhi and Pashto, corresponding to a value of n equal to 6. Similarly, Rauf et al. [63] have also presented a parallel corpus with the L1 language being Urdu and the L2 to Ln languages being Punjabi, Saraiki, Pashto, Sindhi and Balochi, corresponding to a value of n equal to 5. Though these two research instances are similar, there are two major differences. Firstly, the value of n is 6 in the former case and 5 in the latter; secondly, the former relates to a parallel corpus for text while the latter relates to a parallel corpus for speech. The various translations presented by authors of religious holy books like the Quran and the Bible, as well as those of Saraiki stories, are all instances of the creation of a parallel corpus from the perspective of Saraiki NLP.

3 Saraiki Language Evolution

This category presents the Saraiki language works which directly indicate the authors’ intention of presenting the evolutionary stages of the Saraiki language. The word ‘directly’ has been used here emphatically because most of the works themselves have used the word ‘irteqa’, meaning ‘evolution’, for the Saraiki language. For instance, the works of Bahawalpuri [9], Nasir [50] and Zami [79] fulfil this criterion. In a broader sense, the other similar works present the historical growth and development of the language as well as insights into Saraiki linguistics. The works of Bakhsh [11], Gilani [18], Hussain [26], Kalanchvi [38], Mughal [49], Pervaiz [56], Qais [57], Rasoolpuri [59], Sajjad [67] and Wagha [78], as well as Haq [21], Kalanchvi [36] and Rasoolpuri [61], respectively fulfil these criteria.

Gilani [18] has provided a very interesting discussion on the origin and evolution of the word ‘India’ as well as the evolution of the Saraiki language itself. He has strongly advocated the influence of Persian and other rulers, as well as their languages, on the evolution of the Saraiki language. Wagha [78] has discussed the growth and development of the Saraiki language in the specific context of Pakistan. Haq [20] has presented a comparison of the Multani language with Urdu and depicted their relationships. Haq [23] has presented and discussed a comparative view of the Saraiki language and other languages spoken in the areas where Saraiki is spoken. Hussain and Farooq [24] have lamented the situation of the Punjabi and Saraiki languages, where the people speaking one of these languages have termed the other a dialect of their own language. Their paper argues that both languages have a regional identity and differ significantly from historical as well as linguistic perspectives. They have also


focused on the separate identity and existence of the Saraiki language, presenting reasons like its inclusion in the ‘Paishacha’ and ‘Dardic’ group of languages as well as the similarity to and impact of other languages like Pashto, Punjabi, Pothohari, Sindhi and Persian on the vocabulary of the Saraiki language.

Hussain and Khan [25] have argued for the close linkage and relationship of the Pushto and Saraiki languages. They have also discussed the morphological and syntactic similarity of these two languages. Hussain [26] has argued that the Saraiki language has links with ‘Pashachi’ and ‘Dardik Alsana’. He has further added that there is a lot of Islamic influence on the Saraiki language and that it has hence largely benefited from the Arabic and Persian languages, which have contributed many words and proverbs to it. Singh [90] has presented a detailed discussion comparing the Majhi and Multani variants of the Punjabi language. He has also claimed that the Multani language has a large percentage of Perso-Arabic words as well as many diminutives.

An ethnologue of the Multani language has been presented by Lewis et al. [42]. Khan [89] has provided a detailed discussion focusing on the genealogy of the word ‘Saraiki’ as well as the proliferation of the Saraiki language. Raza [64] has provided etymological evidence, based on a linguistic and phonological discussion, about the derivation of the word Saraiki. He advocates that neither Siraikī nor Seraikī is a valid form of the word. He has also traced the origin of the word Saraiki to the word Sauvīra and claimed that the word Saraiki is not derived from the word Siro.

Wikipedia [68], on its page on the Saraiki dialect, mentions that the Saraiki language shares a large amount of morphology and vocabulary with the Punjabi language. It also mentions that Saraiki shares many grammatical features with the Sindhi language. It adds that even though Saraiki has similarities with the Punjabi and Sindhi languages, its phonology and accent usage are quite distinct.

Singh and Kaur [74] have reviewed the dialects of the Punjabi language. In their review, they have found that the main dialects of Punjabi are Majhi, Malwai, Doabi, Powadhi, Multani and Podhohar. They have further advocated that Saraiki is the same as Multani and that it is the dialect of the Punjabi language which originated in Pakistan.

3.1 Remarks on Saraiki Language Evolution

Apart from depicting the Saraiki language’s roots from the cultural, geographical and linguistic angles, the litterateurs and researchers have attempted various works to trace its origin and have presented elaborate discussions on the history, growth, development, decline and issues of the Saraiki language. There is clear evidence from all of the instances listed above that at times the works have focused on the Saraiki community, with the Saraiki language studied on the sidelines, and at times the Saraiki language has been studied with specific concentration, with the cultural dimension on the sidelines. In either case, the evolutionary study of the Saraiki language has been treated by the researcher in the current work as the major element qualifying a work for the category of ‘Saraiki Language Evolution’.

The works presented in this category have depicted the evolution and derivation of the Saraiki language and have also presented the linkage, relationship and influence of other languages on it. To summarize the contributions of the various authors, the current research work finds that the Saraiki language has been linked with various languages, namely, Arabic, Doabi, Majhi, Malwai, Multani, Pashto, Persian, Pothohari, Powadhi, Punjabi, Sindhi and Urdu.

4 Saraiki Language MTS

Bhaiya [12] has presented a description of the rules of usage of the Saraiki language. The grammatical rules for the usage of the Saraiki language have also been presented by Haq [22]. Hussaini [29, 30] has published knowledge books on the usage of the Saraiki language. Shackle [72] has presented a reference grammar of the Saraiki language. Khawaja [85] has deployed an online blog for the Saraiki language. Though his work is not a formal research work, the concepts presented about the verbs, tenses and grammatical usage of the Saraiki language could act as a key contribution to the development of a Saraiki Part-of-Speech (POS) tagger and, in turn, a Saraiki MTS.

Raza [65] has presented the notion of developing a transfer-based Machine Translation System (MTS) for the Saraiki-Urdu language pair. He has also lamented the lack of the resources required to implement otherwise sophisticated artificial intelligence techniques for designing an MTS. The paper also discusses the problems and opportunities for MTS for such languages. In another paper, Raza [66] has presented an interesting comparison among the Persian, Urdu and Saraiki languages.


The basis used for comparison is adposition usage in the respective language. On the sidelines of the paper, the author has also provided a detailed description of basic adpositions and compound adpositions.

4.1 Remarks on Saraiki Language MTS

From the perspective of NLP, only one true MTS application has been proposed, by Raza [65]. Even he has lamented the lack of resources for the development of a fully functional MTS for the Saraiki language. It is notable that another work of Raza [66], dealing with adpositions, is actually a work on Part-of-Speech (POS) tagging. Through the current work, the researcher has classified, compiled and presented in a single location various works dealing with the Saraiki language and its grammar. It is noteworthy that for the development of an accurate MTS, it is very important to have a clear comprehension of the rules of the language used in the MTS, whether as a source or a destination language. Hence, the researcher believes that there is a lot of scope for the development of an MTS which makes use of the Saraiki language. The listing presented here, claimed to be exhaustive, of works providing rules for the usage of the Saraiki language is expected to act as an extremely useful resource for the future development of a Saraiki MTS, specifically a rule-based one.

5 Saraiki Language Phonetics

Awan et al. [8] have provided a detailed discussion of the Saraiki language. They have discussed the varieties of the Saraiki language and lamented the fact that most of the work done on it has come from non-native speakers of the language. The key contributions of their paper include a thorough comparison of the vowels and consonants of the Saraiki and English languages. The variety of Saraiki they used for comparison is Multani. A detailed study of the usage of injective consonants in the Saraiki language has been presented by Bahl [10].

Farooq [16] has provided a phonetics-based comparison of accents of the Urdu language spoken in Pakistan. The six different languages used for comparison by the author are Sindhi, Punjabi, Saraiki, Urdu, Balochi and Pashto. In a similar work, Afsheen et al. [93] classified accents of the Urdu language among the Punjabi, Urdu, Pashto, Saraiki and Sindhi languages using Mel Frequency Cepstral Coefficients (MFCCs) and formant features. Farooq and Hussain [94] have presented an analytic study of the phonetic influence of six languages, namely Urdu, Punjabi, Sindhi, Balochi, Pashto and Saraiki, on the pronunciation of the vowels of true Urdu.

Gardezi and Hassan [17] have discussed the Saraiki language and presented an introductory note on its poetics. Jumani et al. [34] have presented a detailed discussion on the influence of the Saraiki language on the pronunciation of English words by speakers whose native language is Saraiki.

Latif [41] has worked extensively on the phonemic inventory of the Saraiki language and has also proposed an acoustics-based analysis of the voiced implosives of the language. He laments the lack of corpora available for research on the Saraiki language and experimented on the Multani dialect of the Siraiki language for his research. In a similar work, Moran et al. [47] have presented a sound inventory for the Seraiki language. Rasoolpuri [62] has also presented a description of the sounds of Saraiki language words. Mughal [48] has provided a detailed description of the peculiar sounds used in the Saraiki language. Malik et al. [44] have proposed the use of Finite State Machines with the aim of finding a solution to the problem of scriptural translation. They state that their model, based on finite state transducers and Universal Intermediate Transcription, is reported on the Saraiki, Punjabi, Hindi and Urdu languages but can also be extended to other languages. They have emphasized the significance of subjective evaluation of such models before implementing them and rolling them out for usage in real-world scenarios.

Nasir [51] has provided a detailed description of nasality in the Saraiki language. He has explained the level of nasality in language segments, the phonotactics of the language, the nature of word-medial nasals, and the relation between contextual and independent nasalization. Nasir [52] has presented a phonetic discussion of the Saraiki language in comparison with the English language. An in-depth study and analysis of nasalization in the Saraiki language in the context of the phonetic properties of the language has been provided by Syed [75]. The research work of Syed [76] aimed to examine the Saraiki language's sensitivity towards the stress required for the heaviest syllable in a word. He also studied left-headedness, which poses another scenario for the stress requirement. He concluded that many seemingly contradictory modifications take place, specifically for the Saraiki words which are borrowed from the Arabic language.

5.1 Remarks on Saraiki Language Phonetics

There are a good number of instances that could be truly called 'research works' and are available for Phonetic


work on the Saraiki language. The current work has classified all such instances under the common class of 'Saraiki Language Phonetics', which includes research works dealing with the comparison of the Saraiki language's vowels and consonants, the study of injective consonants, the comparison of accent with Urdu, scriptural translation, phonotactics, poetics, varieties, nasality and the phonetics of loan-words from the Arabic language. The influence of the Saraiki language on the English of speakers whose native language is Saraiki has been discussed. A comparison of nasality in the English and Saraiki languages has also been presented.

The research works also include lamentation over many Saraiki language works being done by non-native speakers of the language, as well as over the lack of a formal corpus for speech and phonetic research in the Saraiki language. At the same time, it has been found through the current work that a good number of speech corpora have been developed by researchers for the Saraiki language. There is now a definite need to standardize these corpora and create a uniform common corpus for the speech and phonetics related NLP tasks in the Saraiki language.

The Saraiki language is very distinctive in characteristic dimensions like pronunciation, nasality and phonetics. In fact, it is differentiated from many other dialects on the basis of these features. Hence, the current work advocates that there is a lot of research scope in the Saraiki language along these dimensions.

6 Saraiki Language Script

Pandey [53] submitted a formal preliminary proposal to encode the Multani script. He [54] also submitted a formally finalized proposal to encode the Multani script. He has also presented a comparison of Multani, Khudawadi, Khojki, Gurmukhi, and Devanagari letters with the Multani script of the Saraiki language. Rasoolpuri [62] has presented a description of the written letters of the Saraiki language. Jan and Saleem [33] have deployed a Neural Network (NN) to recognize written characters of the Saraiki script. They have used MATLAB with segmentation of the character set of the language and achieved 85% accuracy in their Optical Character Recognizer (OCR).

Anonymous researchers, programmers and website developers have contributed informally to the development of Saraiki language script-based research. This includes the presentation of Saraiki fonts and keyboards for download for the Windows and Android operating systems [5], Saraiki alphabets with their corresponding Gurmukhi equivalents [6] and Saraiki online transliteration [7].

6.1 Remarks on Saraiki Language Script

The researcher believes that Saraiki language script-based NLP research is the only area where both extremes exist, from completely formal NLP research to completely non-formal research. The Unicode encoding of the Saraiki script and the usage of NN and MATLAB for Saraiki script OCR are true script-based NLP research works for the Saraiki language. Given that the listed instances of script-based NLP research in the Saraiki language number only in the single digits, there is a lot of scope for Saraiki script OCR development and improvement.

7 Saraiki Language Social Science

Alam and Gill [4] have presented a study of a small group of subjects to argue that the mother tongue has a definite influence on learners of other languages. They have also highlighted, through a case study of higher-education students at a university, that students try to grasp concepts in any language but mentally process them in their mother language only. The languages used in their research work are Pashto, Saraiki and English. Gilani [19] has discussed the different varieties of translations of the Holy Quran. The language the author focuses on is Saraiki, and the translations have been studied on different bases like lexis, poetic and semi-idiomatic.

Hussain et al. [27] have highlighted the importance of oral traditions and specifically focused on the way in which traditional values are transferred from one generation to the next through the use of folktales. They have used three different folktales for a case study of Saraiki farmers. They have also proposed recommendations for the educational policy for the Saraiki-speaking region of Pakistan. Hussain et al. [28] have studied the folktales of the Saraiki language and tried to determine the gender usage in such folktales. They have found that such folktales are largely dominated by the male gender. Khan and Shami [80] have elaborated on the benefits that Europeans gained from the knowledge they acquired from the Saraiki language. Perveen [81] has discussed communication about HIV using the Pakistani languages, including Saraiki.

Khokhlova [40] has lamented the death of a language which is spoken by the majority of people in the country. The language lamented by the author is the Punjabi language, wherein the Saraiki language has been included as a dialect of Punjabi. The


author has used the term 'intellectual death' of the language for the Punjabi language's
decreasing usage in the country. Mahar et al. [43] have depicted the crane bird as the symbol of
affection and love through the analysis of poetry written in the Saraiki language. They have also
cited examples from the Saraiki language's Sufi and contemporary poets, namely Baba Fariduddin
Ganj Shakar, Ashoo Lal and others, and have discussed the change in the symbolism of the bird with
time. Shackle [86] has presented a discussion on the styles and themes in the mystical poetry of
the Saraiki language of the Sind region of Pakistan. Shackle [91] has presented a thesis based
upon the study of Saraiki poetry written during the period from 1750 to 1900 by seven poets. He
has further narrowed the scope of research by considering only poets from the Upper Sind and
South-West Punjab regions of the Indian sub-continent.

Shackle [70] has presented a discussion on the Saraiki language in the context of the language
movement in Pakistan. Shackle [71], way back in the late 70s, lamented the lack of research focus
given to languages like Saraiki. He has specifically focused on the unique style of writing
poetry in the Saraiki language. In the context of planning a language for a nation and the
problems that a language can pose, Rahman [58] has discussed the Saraiki language movement in
Pakistan.

Syed et al. [77] have conducted an experiment to show the attrition of the Saraiki language by the
other languages spoken in the areas inhabited by the Saraiki-speaking population. It is noteworthy
that the other language the researchers focused on was Hindi, and its influence as a dominating
language has been studied in detail in the context of the reduction in the use of Saraiki, even by
its native speakers, against their usage of Hindi.

7.1 Remarks on Saraiki Language Social Science

The researcher in the current work has placed in the category of ‘Saraiki Language Social Science’
those research works which deal with the Saraiki language but whose NLP focus is negligible and
which have typical social science relevance. It is noteworthy that the said works do have NLP
focus and relevance, but it is either limited or indirect. Hence a research work which merely
mentions the Saraiki language is not considered for classification in any category in the current
work. In fact, numerous such works exist but have not been counted in the current work. The
research works of Imtiaz [31], Kalanchavi [35], Kalanchvi [37], Khan [39], Rasoolpuri [60] and
Shoukat [73] are a few examples of the same. It is notable that this does not indicate that these
research works are insignificant in totality; it indicates only that they have not been treated
as relevant for the current work.

The research works in the current category mainly include cases of the influence of the mother
tongue on learners' minds, attrition of the Saraiki language, transfer of oral traditions through
the Saraiki language, Saraiki poetry symbolism and the language movement in Pakistan with specific
focus on the Saraiki language.

8 Additional Remarks and Findings

This section presents findings in addition to the remarks already listed above in the respective
sections on Saraiki NLP. It has been found that Saraiki literature, and hence its language, unlike
many other languages of the world, is very rich in agricultural terms, proverbs, folktales and
poetry (‘shaayari’ as well as ‘dohrey’ forms). The author believes that it is the language of the
common man and is mostly used in its purest form in the remote interior parts of the villages.
Though influenced by many other languages, like any other language in the world, it has still
remained largely aloof from possible alien-language attacks. The context of this statement becomes
more evident from the following scenario. There are natural languages like Hindi, Punjabi, etc.
whose speakers freely use words of other languages (mostly English) during conversation. The
author believes that this is the process of slow death of the language. This is so because each
substitution of a word from another language in a conversation tends to kill the original word,
which may completely cease to exist for the next generation. In this context, Saraiki still has a
continuing legacy, though it is dwindling with time. Obviously, languages do not remain constant;
new words keep being added and old words keep being removed from usage. But there is a difference
between these two cases. The first case corresponds to the language wearing away due to
substitution by words of other languages and the decreasing native-language vocabulary of the
speaker. In the second case, old words get removed because of time. For instance, a word like
‘dhonkni’, meaning ‘Blowpipe or Aerator, used in kitchens’, will not be heard of in times to come
as people have stopped using it owing to the advent of new technologies like gas-stoves and
electric-stoves. In either case, it is strongly felt that unless there is a dictionary of a
language, the word will be completely wiped out from the world. It is assumed that the dictionary
contained the word and that new editions of the dictionary too retained that word.

It is notable here that though resources like magazines, news-papers, TV channels and YouTube


videos exist for the Saraiki language, there has been no formal attempt by any researcher to date
to standardize them uniformly and create a specifically focused corpus. This is also the reason
for their non-inclusion in the ‘Saraiki Language Corpus Development’ category. A lot of work on
the language movement in Pakistan has been found but not included in the current research work as
it does not contribute to core Saraiki NLP.

The maximum work on the Saraiki language has been done by Muslim authors, whereas Christian
authors rank second. Hindu and Sikh authors both rank third as far as the community-wise number of
research instances is concerned. From a geographical perspective, the maximum number of research
works on the Saraiki language hailed from Pakistan, followed by the United Kingdom (UK), the
United States of America (USA), India and France, in descending order of the number of research
works. These findings have a logical justification because the Saraiki language is spoken mostly
in Pakistan, which is a Muslim-majority nation. After Pakistan, the maximum number of Saraiki
language speakers are present in India. Also, the Saraiki language is similar to the Punjabi
language and hence Sikh research workers for the Saraiki language have been found too. It is
notable that irrespective of community, viz. Muslim, Hindu, etc., the native language of a person
could be Punjabi. The Saraiki language research instances by Christian authors and those from the
UK could be understood from the fact that both India and Pakistan were under British rule
pre-1947. The country of origin and community for the authors of three research works could not be
established as the works were presented online in the form of websites and not through the formal
way of publication of research findings through theses, books or research papers.

Out of the 94 NLP research works found corresponding to the Saraiki language, as many as 32
(nearly 35%) were found to be scripted in Urdu, Shahmukhi Punjabi, Gurmukhi Punjabi or the Saraiki
language itself, one was found to be scripted in French, whereas all others were found to be
scripted in Roman English script. It is notable here that the author of the current research work
is himself a native speaker of Punjabi and Saraiki. A few words from Urdu were understood with
the help of the translator facility of Google Inc. Examples of such words include ‘irteqa’ meaning
‘evolution’, ‘lisaaniyaat’ meaning ‘linguistics’ and ‘saqaafat’ meaning ‘culture’. The author of
the current research work strongly believes that Saraiki language NLP research work being done by
non-Saraiki speakers, as well as the non-availability of many important Saraiki language works in
international languages like English, are the two main reasons responsible for the non-inclusion
of Saraiki language NLP research in the otherwise main-stream of the field. To the best of the
author’s knowledge, the current work is the first formal research work worldwide which has
actually used the coined terms ‘Saraiki Language NLP’ and ‘Saraiki NLP’.

It has been found from the analysis of the various research works presented in the current paper
that comparison of the Punjabi and Sindhi languages with the Saraiki language contributes to more
than 50% of the total research works based on comparison of some language with the Saraiki
language. Table 1 presents the statistics on such comparisons of various languages by the
researchers with the Saraiki language. Table 1, being sorted on the number of research works in
descending order, also indicates the proximity of languages like Punjabi, Sindhi, Pashto and Urdu
to the Saraiki language. It also paves a path for future research exploring the comparison and
linkage of other languages which are less-researched and less-studied but otherwise known to be
closely related to the Saraiki language. Such languages include Persian, Arabic, Balochi, Majhi,
Pothohari and Dogri. It is notable that the statistics on comparison of various languages with the
Saraiki language presented in Table 1 deal with language comparisons which focus on nasality,
phonetics, accent, vocabulary and grammar of the language. Only one research instance, by Pandey
[53], has been found which formally presents a detailed comparison of the script of the Saraiki
language with the Multani, Khudawadi, Khojki, Gurmukhi and Devanagari scripts.

Table 1: Number of Research Works based on Comparison of Saraiki Language

Sr. No.   Compared Language   Number of Research Works
1         Punjabi             7
2         Sindhi              7
3         Pashto              5
4         Urdu                5
5         Persian             3
6         Arabic              2
7         Balochi             2
8         English             2
9         Majhi               1
10        Pothohari           1
          Total               35

Out of a total of 94 Saraiki NLP research papers listed in the current work, Figure 1 provides a
time period based distribution of number of such research


Figure 1: Distribution of Number of Saraiki NLP Research Works Over Time Period

Figure 2: Distribution of Category-wise Number of Saraiki NLP Research Works Over Selected Time Period
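
The counts plotted in Figure 1 and Figure 2 are the ones tabulated in Table 2. As a minimal sketch, and not part of the original study, the period-wise and category-wise totals of Table 2 can be cross-checked with a few lines of Python. The variable and function names below are illustrative assumptions; only the numbers are taken from the "Total" column and the "Total" row of Table 2.

```python
# Row totals copied from the "Total" column of Table 2 (one entry per period).
PERIOD_TOTALS = {
    "1815-1819": 1, "1880-1884": 2, "1900-1904": 1, "1915-1919": 1,
    "1925-1929": 1, "1935-1939": 1, "1960-1964": 1, "1965-1969": 3,
    "1970-1974": 5, "1975-1979": 6, "1980-1984": 4, "1985-1989": 4,
    "1990-1994": 2, "1995-1999": 6, "2000-2004": 6, "2005-2009": 3,
    "2010-2014": 20, "2015-2019": 13, "Unknown": 14,
}

# Column totals copied from the "Total" row of Table 2.
# C = Corpus Development, E = Evolution, M = MTS,
# O = Social Science, P = Phonetics, S = Script.
CATEGORY_TOTALS = {"C": 17, "E": 26, "M": 8, "O": 20, "P": 16, "S": 7}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def rank_categories(totals):
    """Return the category codes sorted by descending number of works."""
    return sorted(totals, key=totals.get, reverse=True)


# Both ways of summing the table should give the same grand total.
total_by_period = sum(PERIOD_TOTALS.values())
total_by_category = sum(CATEGORY_TOTALS.values())

# Works with a known publication period, and works dated 2010 or later.
dated_works = total_by_period - PERIOD_TOTALS["Unknown"]
works_since_2010 = PERIOD_TOTALS["2010-2014"] + PERIOD_TOTALS["2015-2019"]

if __name__ == "__main__":
    print("Grand total by period:  ", total_by_period)
    print("Grand total by category:", total_by_category)
    print("Works with known period:", dated_works)
    print("Works since 2010:       ", works_since_2010,
          f"({percent(works_since_2010, total_by_period)}% of all works)")
    print("Categories by count:    ", rank_categories(CATEGORY_TOTALS))
```

Both sums come to 94, and the category ranking comes out as E, O, C, P, M, S.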


works for a period of nearly 200 years. It is remarkable that the 1960s triggered active Saraiki
NLP research activities, a trend that has more or less continued till the first decade of the
21st century. The last decade, from 2010 till the date of final compilation of this paper in
November 2018, has seen a sharp increase in the number of such activities. As the number of
non-zero research works for each year has been included in the graphical representation itself, a
separate table has not been provided for the same. The graphical representation also shows 14
research works for which information about the year of publication is not available.

Table 2: Period-wise Category-wise Number of Saraiki Language NLP Research Works

Year C E M O P S Total
1815-1819 1 1
1880-1884 2 2
1900-1904 1 1
1915-1919 1 1
1925-1929 1 1
1935-1939 1 1
1960-1964 1 1
1965-1969 1 2 3
1970-1974 2 2 1 5
1975-1979 3 1 2 6
1980-1984 1 1 1 1 4
1985-1989 2 2 4
1990-1994 2 2
1995-1999 1 1 3 1 6
2000-2004 1 2 2 1 6
2005-2009 1 1 1 3
2010-2014 1 3 1 5 8 2 20
2015-2019 2 6 1 1 2 1 13
Unknown 3 2 1 4 1 3 14
Total 17 26 8 20 16 7 94

This leads us to another important question of analyzing the type of activity that was undertaken
during the period of nearly 200 years. Figure 2 presents a graphical representation of the
distribution of the type of Saraiki activity over the period of time. Figure 2 should be viewed in
color for better comprehension. Even though the data represented in Figure 2 coincides with the
data represented in Figure 1, for simplicity, the years having zero values for the number of
research activities have been removed in Figure 2. This was not done for Figure 1 as its main
purpose was to find, understand and analyze the distribution of Saraiki NLP research activities
over a period of time and hence to reveal a clear time-based trend. The graphical representation
in Figure 2, similar to Figure 1, also shows 14 research works for which information about the
year of publication is not available. Also, in order to make the figure more readable, the
skipped period has been marked with the tilde sign (i.e. ~) in the figure. From Figure 2, it is
clear that during the initial periods of the early nineteenth century, the Saraiki language
activities began with what we can today term as ‘Saraiki Language Corpus Development’. This was
followed by developments in the field of tracing the origins of the language, called ‘Saraiki
Language Evolution’. The remaining period was then dominated by activities comprising ‘Saraiki
Language Social Science’ with sporadically scattered instances of ‘Saraiki Language MTS’ as well
as ‘Saraiki Language Phonetics’ and ‘Saraiki Language Script’. It is also clear from Figure 2 that
the latest period has seen parallel research instances in various categories dealing with Saraiki
NLP.

It is also evident from Figure 2 that for the entire period under analysis, the maximum number of
research instances have been found for the ‘Saraiki Language Evolution’ category, followed
respectively by ‘Saraiki Language Social Science’, ‘Saraiki Language Corpus Development’, ‘Saraiki
Language Phonetics’, ‘Saraiki Language MTS’ and ‘Saraiki Language Script’. It is also notable that
the maximum number of core NLP activities have been found for the category of ‘Saraiki Language
Phonetics’ which, when compared with activities in the category of ‘Saraiki Language Script’, has
been found to be more than twice the latter. For further clarity about the statistical values
represented graphically in Figure 2, the data is also presented in tabular form in Table 2. The
common legend of abbreviations used for Figure 2 and Table 2 includes ‘Saraiki Language Corpus
Development’ represented by ‘C’, ‘Saraiki Language Evolution’ represented by ‘E’, ‘Saraiki
Language MTS’ represented by ‘M’, ‘Saraiki Language Social Science’ represented by ‘O’, ‘Saraiki
Language Phonetics’ represented by ‘P’ and ‘Saraiki Language Script’ represented by ‘S’,
respectively.

9 Conclusion

To the best of the author’s knowledge and study of the field, this is the first formal attempt to
present a classified and annotated bibliography of the NLP tasks of the Saraiki language of
Pakistan and India. This is also the first research activity covering a period of 200 years and
presenting a bibliography of more than 90 research works. Through the current work, the usage of
the terms ‘Saraiki Language NLP’ as well as ‘Saraiki NLP’


has also been made for the first time in the research community. The resource-scarceness of the
Saraiki language is evident from the fact that the popular translator facility by Google provides
translation to and from 103 languages. These include close relatives of the Saraiki language like
Pashto, Sindhi, Punjabi and Urdu, though Saraiki itself is not included. Even though not listed in
any research paper so far, the current work advocates the possible comparison of dialects of the
Saraiki language too. These dialects can include Multani, Riasti, Jafri, Hindki, Thali and
Bahawalpuri. There is also a possibility, which the research world has not seen yet, of comparison
of the Saraiki language with other closely related languages like Persian, Arabic, Balochi, Majhi,
Pothohari and Dogri.

A collection of nearly 100 NLP based works on the Saraiki language was compiled, covering a period
of nearly 200 years. The collected data was subjected to thorough analysis and it was found that,
though not much NLP research work has been done for the Saraiki language, the research instances
found in the process could be divided into six different categories corresponding to the Saraiki
language’s Corpus Development, Evolution, MTS, Phonetics, Script and Social Science. It is
concluded based on the analysis of the current work that a good number of attempts have been made
to trace the origins of the Saraiki language, study the influence of other languages on it,
compare it with other languages and analyze the social impact of its usage. Urdu, English,
Punjabi, Sindhi, Russian and French languages have been found to have parallel corpus with the
Saraiki language. Hence, on the side lines of these works, parallel corpus development,
non-parallel corpus development as well as the study of rules of usage of the Saraiki language
too have been studied and presented. These in turn have proved to be useful in current times for
typical Saraiki NLP activities like the study of phonetics, the study of nasality, MTS, POS as
well as OCR. The current work strongly recommends and predicts core Saraiki NLP activities dealing
with the development of Saraiki language resources and approaches for WordNet, Named Entity
Recognition (NER), Coreference Resolution, Morphological Segmentation and Word Sense
Disambiguation (WSD), to name a few.

Through the current research work, it is concluded that the area of Saraiki language phonetics has
drawn maximum attention of researchers so far while MTS and OCR are the most upcoming areas. It is
further concluded that in spite of various formal and informal attempts to research the Saraiki
language, it still remains a resource-poor language slowly proceeding towards decline. It is also
concluded, though, that the last decade has seen a remarkable increase in Saraiki NLP activities.

The current work laments the lack of sufficient research activities for Saraiki NLP but it does
not lament the lack of NLP activities of any other language. It does not advocate the inadequacy
or pre-eminence of any one language over any other language. It is just a research work undertaken
with an academic perspective to communicate the first results in the area of Saraiki language NLP
activities. One of the objectives of the paper is definitely to promote and encourage NLP research
activities in the Saraiki language. It is believed that the current work will have an excellent
ripple effect in the domain of NLP research activities of the languages of Pakistan, Afghanistan
as well as North India. In the current work, the best attempt has been made to present an
exhaustive coverage of the NLP activities of the Saraiki language. The current work is intended to
be a stepping stone and one-stop resource for NLP research in the Saraiki language. This paper,
being authored in the English language unlike many previous papers in the domain of Saraiki NLP,
is also an attempt to bring the Saraiki language into main stream research and put it on the
international platform for further exploration and research.

References

[1] Abbasi A.W., Quran Kareem Ma tarjama Manzoom (Translated from Urdu: The Holy Quran Translation), Saraiki hand written manuscript
[2] Adward, A Glossary of Multani Language, 1881
[3] Akhter N., Saraiki Zarb-ul-amsaal aur Saraiki Vasaib: Aik Tehqeeqi-o-tanqeedi Jaiza (Translated from Urdu: Saraiki Proverbs and Saraiki Culture: A Research and Critical Survey), published by Multan’s Saraiki Adabi Board, MPhil thesis accepted by Bahauddin Zakariya University, Multan
[4] Alam M.F., Gill A., Pragmatic Transfer: An Interlanguage Study Of Pashto And Siraiki English Language Learners, Gomal University Journal of Research, Special Issue III, ISSN: 1019-8180, pages 143-151
[5] Anonymous, Download Saraiki font and keyboard for Windows and Android, Available online: http://saraiki.jimdo.com
[6] Anonymous, Saraiki Alphabet with Gurmukhi equivalents, Available online: http://www.user.unihannover.de/nhtcapri/westernpanjabialphabet.html


[7] Anonymous, Saraiki online transliteration, Available online: http://www.sanlp.org/saraikiMT/saraikiMT.aspx
[8] Awan M.S., Baseer A., Sheeraz M., Outlining Saraiki Phonetics: A Comparative Study of Saraiki and English Sound System, Language in India, ISSN: 1930-2940, vol. 12, issue 7, July 2012; pages 120-136
[9] Bahawalpuri Z., Saraiki Zuban ka Irteqa (Translated from Urdu: The Evolution of the Saraiki Language), Markaz-e-Saraiki Zuban-o-Adab, Bahawalpur, 1970
[10] Bahl P., Étude de phonétique historique et expérimentale des consonnes injectives du Multani (Translated from French: A study of historical and experimental phonetics of injective consonants in Multani), MA thesis, A. Maisonneuve, Paris, 1936
[11] Bakhsh H.F.E., Saraiki zabaan kee vajah e tasmiyyah (Translated from Urdu: The Name Story of the Saraiki Language), Daily Bashaarat, Muzaffargarh, December 1960
[12] Bhaiya, Ahmad B., Siraiki Qavaid tey zubandani (Translated from Urdu: The Rules and Linguistics of the Saraiki Language), Siraiki Adbi Majlis, Bahawalpur, Pakistan, 1998
[13] Brien O., A Glossary of the Multani Language, Punjab Government Press, 1988
[14] Brien O., A Glossary of the Multani Languages, Revised and Re-arranged by Wilson, J and Kaul, Pandit Hari Kishen, Lahore, Pakistan, 1903
[15] Brien O., Mughal S., Siraiki Urdu Dictionary, Jhoke Publishers, Multan, Pakistan, 2001
[16] Farooq M., An Acoustic Phonetic Study of Six Accents of Urdu in Pakistan, a thesis accepted for M. Phil in Applied Linguistics, Department of English Language and Literature, School of Social Sciences and Humanities, University of Management and Technology, Lahore, Pakistan, 2014
[17] Gardezi, Hassan N., Siraiki Language and its poetics: An Introduction, Sangat Publishers, London 1996
[18] Gilani M.H., Historical Background of Saraiki Language, Pakistan Journal of Social Sciences, vol. 33, issue 1, 2013, pages 61-76
[19] Gilani M.H., The Saraiki Translations of the Holy Quran with its Varieties, Pakistan Journal of Social Sciences, vol. 33, issue 2, 2013, pages 255-264
[20] Haq A.M., Multani Zuban Aur Uska Urdu Se Taalluq (Translated from Urdu: Multani Language and its Relation with Urdu), Urdu Academy, Bahawalpur, 1967
[21] Haq M.A., Saraiki Dian Mazeed Lisani Tehqeeqan (Translated from Urdu: More Insights into Saraiki Linguistics), Saraiki Adabi Board, Multan, 1985
[22] Haq M.A., Saraiki Zaban dey Qaidey Qanoon (Translated from Urdu: The Grammatical Rules of Saraiki Language), Saraiki Adabi Board, Multan, 1984
[23] Haq M.A., Saraiki aur us ki Hamsaya Ilaqai Zubanein (Translated from Urdu: The Saraiki Language and Its Neighboring Regional Languages), Saraiki Adbi Board, Multan, 1977
[24] Hussain S.S., Farooq M., The Story of Language Conflicts in Pakistan: A Case Study of Saraiki and Punjabi Languages, Pakistan Journal of Social Sciences, vol. 35, issue 1, 2015, pages 287-295
[25] Hussain S.S., Khan B.M., Integrated Linguistic Links of Pushto and Saraiki Languages, Pakistan Journal of Social Sciences, vol. 36, issue 2, 2016, pages 653-663
[26] Hussain S.S., The Growth of Saraiki Language, Pakistan Journal of Social Sciences, vol. 36, issue 1, 2016, pages 387-396
[27] Hussain Z., Asif S.I., Rehman R.A., Hanif M., Transferring Values to the Saraiki Farmers through Folk Tales, Interdisciplinary Journal Of Contemporary Research In Business, Institute of Interdisciplinary Business Research, vol. 4, issue 11, March 2013, pages 670-679
[28] Hussain Z., Haneef M., Asif M., Naz S., Gender Construction in Saraiki Folktales: A Case Study from District Muzzafar Garh, Applied Sciences and Business Economics, ISSN: 2312-9832, vol. 2, issue 2, pages 8-15
[29] Hussaini N.A.Z., Ma’aarif e Saraiki (Translated from Urdu: The Knowledge of Saraiki Language), Bahawalpur, 1972
[30] Hussaini N.A.Z., Maarif Saraiki (Translated from Urdu: The Knowledge of Saraiki Language), Saraiki Abadi Board, Multan, 1972
[31] Imtiaz A.S., Siraiki Language and Ethnic Identity, Journal of Research, Faculty of Languages and Islamic Studies, Bahauddin Zakariya University, Multan, Pakistan, vol. 7, 2005
[32] Jampuri, Saraiki Poetry, Bazm-e-Saqafat, Multan, 1968
[33] Jan M.T., Saleem Y., Optical Character Recognition (OCR) System For Saraiki Language Using Neural Networks, University of Engineering and Technology Taxila. Technical Journal, vol. 21, issue 3, September 2016, pages 106-109
[34] Jumani N.B., Rehman R., Iqbal M.J., Chishti S., Effects of Native Language Seraiki on English Language Pronunciation, International Journal of


Business and social Sciences, vol. 2, issue 8, 2011
[35] Kalanchavi D., Saraiki Aur Uss ke Nasar (Translated from Urdu: The curriculum of the Saraiki Language), Academy Saraiki Adab, Bahawalpur, 1982
[36] Kalanchvi D., Saraiki Lisaniat (Translated from Urdu: Saraiki Linguistics), Academy of Saraiki Literature, Bahawalpur, 2008
[37] Kalanchvi D., Saraiki tarjameen wala Quran Shareef (Translated from Urdu: The Holy Book Quran in the Saraiki Language), Ibn-e-Kaleem publishers, Multan, 2000
[38] Kalanchvi D., Saraiki Zaban tey Adab (Translated from Urdu: The Saraiki Language and Literature), Saraiki Adabi Majlis, Bahawalpur, 1987
[39] Khan J. A., Saraiki Saqafat (Translated from Urdu: The Saraiki Culture), Roohani Art Press, Multan, 1995
[40] Khokhlova L.V., Majority language death, Language Documentation & Conservation Special Publication No. 7, Language Endangerment and Preservation in South Asia, ed. by Cardoso H.C., January 2014, pages 19-45
[41] Latif A., Phonemic Inventory of Siraiki Language and Acoustic Analysis of Voiced Implosives, Center for Language Engineering, Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, Pakistan
[42] Lewis P.M., Simons G.F., Fennig C.D., eds., Saraiki, Ethnologue, 19 ed., 2016, Available online: http://www.ethnologue.com/language/skr
[43] Mahar M.A., Hussain S.S., Shah A., Diverse Mentions of Crane Bird in Saraiki Poetry, Pakistan Journal of Social Sciences, vol. 36, issue 1, 2016, pages 445-450
[44] Malik M.G.A., Boitet C., Bhattacharyya P., Finite-state Scriptural Translation, published by Association for Computational Linguistics, ACM, USA in the proceedings of the 23rd International Conference on Computational Linguistics (COLING '10), Beijing, China, August 2010, pages 791-800
[45] Molana S.M., Par Awal Mutarjam Ba muhawra Bazaboon-e-Saraiki (Translated from Urdu: The Translation of Idioms of the Farmers of Saraiki Language), Lahore Gilani Press, 1925
[46] Monnier J.A., Glossary of the Multani language compared with Punjabi and Sindhi, printed at the Punjab Government Civil Secretariat Press, Lahore, 1881
[47] Moran S., McCloy D., Wright R., eds., Seraiki sound inventory, Max Planck Institute for Evolutionary Anthropology, Leipzig, 2014
[48] Mughal S., Saraiki diyaan khaas aavaazaan dee kahaanee (Translated from Punjabi: The story of Peculiar Sounds in Saraiki), Jhok Publishers, Multan, 2002
[49] Mughal S., Saraiki Zuban De Hawaley (Translated from Punjabi: The Review Description of the Saraiki Language), (Hussain S.S., Intervieweur), February, 2010
[50] Nasir N.K., Saraiki Shaire Daa Irtiqa (Translated from Punjabi: The Evolution of the Sharayi style of Poetry of the Saraiki Language), Saraiki Adbi Board, Multan, 1977
[51] Nasir S., Nasals and Nasality in Saraiki, Language in India, ISSN: 1930-2940, vol. 12, issue 7, July 2012; pages 323-347
[52] Nasir S.A., Voice onset time of plosives in Saraiki: Implications for the acquisition of English aspiration contrast, ELF Annual Research Journal, Department of English Shah Abdul Latif University Khairpur, Sindh, vol. 14, 2012
[53] Pandey A., Preliminary Proposal to Encode the Multani Script in ISO/IEC 10646, ISO/IEC JTC1/SC2/WG2 N4027, L2/11-124, Unicode, April 2011
[54] Pandey A., Proposal to Encode the Multani Script in ISO/IEC 10646, ISO/IEC JTC1/SC2/WG2 N4159, L2/12-316, Unicode, September 2012
[55] Parveen S., Anjum R.Y., Vocabulary Bank for English Text Books in Major Pakistani Languages, 2nd International Conference of the Linguistic Association of Pakistan organized at Institute of Languages, University of the Punjab, Lahore, 2016
[56] Pervaiz S.H., Saraiki zabaan o adab kee muxtasar taareex (Translated from Urdu: A brief history of the Saraiki language and literature), National Language Authority, Islamabad, 2001
[57] Qais F., Saraiki Zuban da Wewra (Translated from Urdu: The Description of the Saraiki Language), Nawa-e-Kaleem, Kot Mithan, 1993
[58] Rahman T., The Siraiki language movement in Pakistan, Language Planning and Language Problems, vol. 19, issue 1, 1995
[59] Rasoolpuri A., Saraiki Rasm-ul-Khat ki Mukhtasar Tareekh (Translated from Urdu: Brief History of the Saraiki Script), Multan: Bazm-e-Saqafat, 1976
[60] Rasoolpuri M.A., Saraiki-Adab-Wich-Ma’anee-da-Pandh (Translated from Urdu: The Saraiki Literature), Saraiki Publications, Rasoolpur, 2014
[61] Rasoolpuri M.A., Saraiki zabaan ate lisaaniyaat (Translated from Urdu: Saraiki Language and Linguistics), Jhok Publishers, Multan, 2010


[62] Rassoolpuri M.A., Saraiki Zaban Unda Rasmulkhat Te Awazan (Translated from Urdu: The Script and Sounds of the Saraiki Language), 1980, page 72
[63] Rauf S., Hameed A., Habib T., Hussain S., District names speech corpus for Pakistani Languages, published by IEEE in the proceedings of Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (OCOCOSDA/CASLRE), Shanghai, China, October 2015, DOI: 10.1109/ICSDA.2015.7357893
[64] Raza G., Etymology of the Saraiki Language Name, Journal of Linguistics & Literature, vol. 1, issue 1, 2016, pages 61-81
[65] Raza G., Prospects of Machine Translation for Pairs of Pakistani Languages, 2nd International Conference of the Linguistic Association of Pakistan organized at Institute of Languages, University of the Punjab, Lahore, 2016
[66] Raza G., Reduction of Compound Adpositions in Persian, Urdu and Saraiki, presented at the Sixth International Contrastive Linguistics Conference, Berlin, Germany, October, 2010
[67] Sajjad H., Saraiki Zuba-o-Adab ki Mukhtasir Tareekh (Translated from Urdu: A Brief History of the Language and Literature of Saraiki Language), Muqtadra Qaumi Zuban, Islamabad, 2001
[68] Saraiki Dialect, Wikipedia Inc., Available online: https://en.wikipedia.org/wiki/Saraiki_language
[69] Shackle C., Siraiki language, Encyclopedia Britannica, 2014
[70] Shackle C., Siraiki, A language movement in Pakistan, Modern Asian Studies, 11-3, 1977
[71] Shackle C., The Multani Marsiya, Der Islam, Journal of the History and Culture of the Middle East, De Gruyter, vol. 55, issue 2, January 1978, pages 281-311, DOI:10.1515/islm.1978.55.2.281
[72] Shackle C., The Siraiki Language of Central Pakistan A Reference Grammar, School of Oriental and African Studies, University of London, London, UK, 1976
[73] Shoukat M., Aao Saraiki Parhoon te Saraiki Likhoon (Translated from Hindi: Come to Read and Write Saraiki Language), Jhoke Publishers, Multan, 2004
[74] Singh A., Kaur J., Review for Dialects in Punjabi Language, International Journal of Innovations & Advancement in Computer Science, ISSN: 2347–8616, vol. 5, issue 8, August 2016, pages 25-30
[75] Syed N. A., Nasalization in Saraiki, MA thesis accepted by University of Essex, UK, 2009
[76] Syed N.A., Quantity-sensitivity and left-headedness: A study of Arabic loanword adaptations in Saraiki, 2nd International Conference of the Linguistic Association of Pakistan organized at Institute of Languages, University of the Punjab, Lahore, 2016
[77] Syed N.A.R., Malik R., Abbas M.Y.A., Role of markedness and a dominant language in L1 attrition, Available online: http://www.phonology.info/OCP/abstracts/Syed_OCP.pdf
[78] Wagha A., The Siraiki Language Its Growth and Development, Derawar Publications, Islamabad, Pakistan 1990
[79] Zami B.A., Saraiki zabaan kaa irtiqaa (Translated from Urdu: Evolution of the Saraiki language), Markaz Saraiki Zaban ka Adab, Bahawalpur, 1970
[80] Khan A.H., Shami J.U., The European Concept of the Siraiki Language and its Political Significance, The Journal of Political Science, Department of Political Science, Government College, Lahore, Pakistan, 1996, vol. 19, issue 1 and 2, pages 97-105
[81] Perveen R., Mind your language: HIV communication in Pakistani languages
[82] Anonymous, The Translation of the Holy Bible in Siraiki, 1818
[83] Skemp F.W., Multani Stories: Collected and Translated by Skemp, Journal of the Royal Asiatic Society, vol. 50, issue 3-4, October 1918, pages 626-628, published online March 2011, DOI: 10.1017/S0035869X00052205
[84] Amritsari M., Multani Proverbs, Sayings and Verses (Saraiki Muhaware), Available online: http://gupshup.org/gs/cultureartsliteratureandlinguistics/612611multaniproverbssayingsversessaraikimuhaware.html
[85] Khawaja Z.F., Saraikipedia, Available online: http://saraikipedia.blogspot.in
[86] Shackle C., Styles and themes in Siraiki Mystical Poetry of Sind, Bazme-Saqafat, Multan, pages 5-6
[87] Samarnov U.A., The Lehnda Language (originally in Russian; translated into English by Sipan E.H.T. and translated into Saraiki by Kalanchavi D., Saraiki Bagho Bahran, Bahawalpur Academy, Siraiki Adab, 1995)
[88] Shackle C., A Century of Seraiki Studies (originally in English; translated into Saraiki by Kalanchavi D., Siraiki Mutalay de So Saal, Bahawalpur Academy, Siraiki Adabi Majlis, Bahawalpur, 1986)
[89] Khan H.A., Siraiki Identity Construction in South Punjab

INFOCOMP, v. 17, no. 2, p. 41-55, December 2018



[90] Singh H., A Comparative Study of Majhi and Multani, PhD thesis submitted to and accepted by Punjab University, Chandigarh, December 1968
[91] Shackle C., Siraiki and Siraiki Literature, c. 1750-1900, in Upper Sind and South-West Panjab, an unpublished Ph.D. thesis, School of Oriental and African Studies, University of London, 1972
[92] Duong L.T., Natural Language Processing for Resource-Poor Languages, a PhD thesis accepted by The Department of Computing and Information System, The University of Melbourne, Melbourne, Australia, October 2017
[93] Afsheen, Irtza S., Farooq M., Hussain S., Accent Classification among Punjabi, Urdu, Pashto, Saraiki and Sindhi Accents of Urdu Language, Center for Language Engineering, Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, Pakistan, Available online: http://www.cle.org.pk/Publication/papers/2014/clt14_submission_25.pdf, 2014
[94] Farooq M., Hussain S., Acoustic Analysis of Corner Vowels in Six Indigenous Languages of Pakistan, Journal of Research in Social Sciences, ISSN: 2305-6533, vol. 6, issue 2, pages 18-46, June 2018

