
From the book: Integration-Ready Architecture and Design, by Jeff Zhuk, Cambridge University Press, ISBN 0521525837

http://javaschool.com/about/publications.html
Dave: Open the pod bay door, Hal...
Hal: I'm sorry, Dave. I'm afraid I can't do that.
- A conversation between a man and a computer from the movie "2001: A Space Odyssey" by Stanley Kubrick

Voice Technologies on the way to a Natural User Interface

This chapter is about speech technologies and related APIs: VoiceXML, SALT, Java Speech API, MS Speech SDK. It looks into unified scenarios with audio and video interface definitions; considers design and code examples; and introduces important skills for a new world of wired and wireless speech applications.

What is a Natural User Interface?

Is it another set of tags and rules covered by a nice name? Absolutely not! This time end users, not a standards committee, determine what they prefer for their methods of interaction. A natural user interface (NUI) allows end users to give their preferences at the time of the service request, and to change them flexibly.

Are you a "computer" person? My guess is that you are, because you are reading this book. "Computer literate" folks like you and me enjoy exploring the capacities of computer programs via traditional interfaces. Even so, there are still times, such as on vacation, on the go, and in the car, when even we would prefer "hands free" conversations over using keyboards to access computerized services.

One person prefers handwriting, and someone else is comfortable with typing. One would like to forget keywords and use common sense terminology instead. Can a computer understand that "find" is the same as "search" and "Bob" is actually "Robert"? Can it understand that someone has chosen a foreign (non-English) language to interact with it? A natural user interface will be able to offer all these possibilities.

Some of these complex tasks can be addressed in a unified way with Audio-Video Interface (AVI) scenarios. First, let us look into Java details of the voice interface, one part of a NUI. A significant part of the population would like a voice interface as the most natural and preferred way of interaction.

Speaking with style

Chapter 4 introduced text-to-speech conversion. We used the FreeTTS Java Speech API implementation to write simple Java code for a voice component, but the sound of the voice may not have been terribly impressive. What can the Java Speech API (JSAPI) [1] offer to enhance a voice component? Here are two lines of the source (from chapter 4) that actually speak for themselves:

    Voice talker = new CMUDiphoneVoice();
    talker.speak(speech);

Remember that the Voice class is the main talker. We can create an object of the Voice class with four features: voice name, gender, age, and speaking style. The voice name and speaking style are both String objects, and the synthesizer determines the allowed contents of those strings.

The gender of a voice can be GENDER_FEMALE, GENDER_MALE, GENDER_NEUTRAL, or GENDER_DONT_CARE. Gender neutral means some robotic or artificial voices. We can use the "don't care" value if the feature is not important and we are "OK" with any available voice. The age can be AGE_CHILD (up to 12 years), AGE_TEENAGER (13-19), AGE_YOUNGER_ADULT (20-40), AGE_MIDDLE_ADULT (40-60), AGE_OLDER_ADULT (60+), AGE_NEUTRAL, and AGE_DONT_CARE.

Here is an example of a woman's voice selection with a "greeting" voice style:

    new Voice("Julie", Voice.GENDER_FEMALE, Voice.AGE_YOUNGER_ADULT, "greeting");

Not all the features of the Java Speech API are implemented yet, as of the time of writing. Here is the reality check: the match() method of the Voice class can test whether an engine implementation has suitable properties for the voice. Fig.12-1.java illustrates the point.
    /**
     * The getVoices() method creates a set of voices according to the initial
     * age and gender parameters and checks whether each voice is available.
     * If a requested age or gender is not available, the default ("don't care")
     * value is set for that feature.
     * @param gender requested genders (Voice.GENDER_* constants), may be null
     * @param ages requested ages (Voice.AGE_* constants), may be null
     * @return voices, one Voice per (age, gender) combination
     */
    public Voice[] getVoices(int[] gender, int[] ages) {
        // make sure that at least the defaults are set
        if (gender == null) {
            gender = new int[] { Voice.GENDER_DONT_CARE };
        }
        if (ages == null) {
            ages = new int[] { Voice.AGE_DONT_CARE };
        }
        Voice[] voices = new Voice[ages.length * gender.length]; // all combinations
        SynthesizerModeDesc desc = new SynthesizerModeDesc(Locale.ENGLISH);
        Voice[] availableVoices = desc.getVoices();
        if (availableVoices == null) {
            availableVoices = new Voice[0]; // the descriptor lists no voices
        }
        // try to set requested voices
        for (int i = 0; i < ages.length; i++) {
            for (int j = 0; j < gender.length; j++) {
                int k = i * gender.length + j; // current voice index
                voices[k] = new Voice();       // all features start as "don't care"
                // start with gender; fall back to the default if nothing matches
                voices[k].setGender(gender[j]);
                if (!isAvailable(voices[k], availableVoices)) {
                    voices[k].setGender(Voice.GENDER_DONT_CARE);
                }
                // continue with age
                voices[k].setAge(ages[i]);
                if (!isAvailable(voices[k], availableVoices)) {
                    voices[k].setAge(Voice.AGE_DONT_CARE);
                }
            }
        }
        // at this point all voices are set to requested or default parameters
        return voices;
    }

    /** Returns true if some available voice has every feature set in the required voice. */
    private boolean isAvailable(Voice required, Voice[] availableVoices) {
        for (int n = 0; n < availableVoices.length; n++) {
            if (availableVoices[n].match(required)) {
                return true;
            }
        }
        return false;
    }

[Fig.12-1] The getVoices() method creates a set of voices according to initial age and gender parameters. If the requested age or gender is not available, the default voice parameters are set.
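The listing fills one array slot for every (age, gender) pair. As a minimal sketch of how such pairs can be numbered without collisions, the class below uses row-major indexing, i * genderCount + j. The class and method names are hypothetical and plain strings stand in for the JSAPI constants, so the example runs without a speech engine.

```java
// Hypothetical helper, not part of JSAPI: enumerates every (age, gender) pair.
class VoiceCombinations {

    // Row-major slot for the pair (ages[i], genders[j]).
    static int index(int i, int j, int genderCount) {
        return i * genderCount + j;
    }

    // Builds one label per combination; strings stand in for Voice objects.
    static String[] combine(String[] ages, String[] genders) {
        String[] result = new String[ages.length * genders.length];
        for (int i = 0; i < ages.length; i++) {
            for (int j = 0; j < genders.length; j++) {
                result[index(i, j, genders.length)] = ages[i] + "/" + genders[j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] voices = combine(
            new String[] {"AGE_TEENAGER", "AGE_MIDDLE_ADULT"},
            new String[] {"GENDER_FEMALE", "GENDER_MALE"});
        for (String voice : voices) {
            System.out.println(voice);
        }
    }
}
```

Two ages and two genders give four distinct slots, one per actor voice.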

We can use this method in the scenario of multiple actors that have different genders and ages.

    <actors name="age" value="AGE_TEENAGER | AGE_MIDDLE_ADULT" />
    <actors name="gender" value="GENDER_FEMALE | GENDER_MALE" />

These two scenario lines define arrays of age and gender arguments that produce four actor voices.

Java Speech API Markup Language

An even more intimate control of voice characteristics can be achieved by using the Java Speech API Markup Language [2] (JSML). JSML is a subset of XML that allows applications to annotate text that is to be spoken with additional information. We can set the prosody rate, or speed of speech. For example:

    Your <emphasis>United Airlines</emphasis> flight is scheduled tonight
    <prosody rate="-20%">at 8:80pm</prosody>
    It is almost 4pm now. Good time to get ready.

This friendly reminder emphasizes the airline company name, and slows down the voice speed by 20% while pronouncing the departure time. According to the Java Speech API, the synthesizer's speak() method understands JSML.

JSML has more element names, or tags, in addition to emphasis and prosody. For example, the div tag marks text content structures such as paragraphs and sentences.

    Hello <div type="paragraph">How are you</div>

The sayas tag provides important hints to the synthesizer on how to treat the text that follows. For example, "3/5" can be treated as a number or as a date.

    <sayas class="date:dm">3/5</sayas>  <!-- spoken as "May third" -->
    <sayas class="number">1/2</sayas>  <!-- spoken as "half" -->
    <sayas class="phone">12345678901</sayas>  <!-- spoken as "1-234-567-8901" -->
    <sayas class="net:email">jeff.zhuk@javaschool.com</sayas>  <!-- spoken as "Jeff dot Zhuk at Javaschool dot com" -->
    <sayas class="currency">$9.95</sayas>  <!-- spoken as "nine dollars ninety five cents" -->
    <sayas class="literal">IRS</sayas>  <!-- spoken as character-by-character "I.R.S" -->
    <sayas class="measure">65kg</sayas>  <!-- spoken as "sixty five kilograms" -->

In addition, a voice tag specifies the speaking voice. For example:

    <voice gender="male" age="20"> Do you want to send fax?</voice>

You can see that we can define the speaking voice in our Java code (see Fig.12-1) or in JSML. Which way is preferable? For the famous "Hello World" application, it is easier to specify voice characteristics directly in the code. JSML would be the better answer for real-life, production-quality applications; it gives more control of speech synthesis.

JSML Factory as one of the AVIFactory implementations

JSML-based text can be generated on the fly by an appropriate lightweight presentation layer component of the application. No adjustment of the core services is required. We can use XML Style Sheet Language for Transformations (XSLT) to automatically convert core service contents into HTML or JSML forms. Fig.12-2 shows the Object Model Diagram for such an application.
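As a rough illustration of the XSLT-based conversion described above, the sketch below uses the standard javax.xml.transform API to turn a small service result into JSML. The flight XML, its element names, the stylesheet, and the FlightToJsml class are all invented for this example; they are assumptions, not part of the chapter's application.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical presentation-layer helper: core service XML in, JSML text out.
class FlightToJsml {

    // Invented service output; a real service would define its own schema.
    static final String SERVICE_XML =
        "<flight><airline>United Airlines</airline><time>8:30pm</time></flight>";

    // Stylesheet that wraps the service data in JSML emphasis and prosody tags.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/flight'>"
      + "<jsml>Your <emphasis><xsl:value-of select='airline'/></emphasis> flight departs "
      + "<prosody rate='-20%'>at <xsl:value-of select='time'/></prosody></jsml>"
      + "</xsl:template></xsl:stylesheet>";

    static String toJsml(String serviceXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(serviceXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toJsml(SERVICE_XML));
    }
}
```

Swapping in a different stylesheet would produce HTML from the same service data, which is why the core service needs no changes.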

[Fig.12-2] The AVIFactory interface on the left side of Fig.12-2 has multiple implementations for audio and video presentation formats. Any AVIFactory implementation transforms data into JSML, HTML, or other presentation formats. The ScenarioPlayer class plays a selected scenario and uses the ServiceConnector object to invoke any service that retrieves data according to a scenario. The ScenarioPlayer class creates the proper AVIFactory presenter object to transform data into the proper audio or video presentation format. In this chapter, we will look into an example of the AudioFactory, while in chapter 14 we will consider implementation examples of the ScenarioPlayer and other classes.

Speech Synthesis Markup Language

Is JSML the best way to represent voice data? We have to ask several more questions before we can answer this one. For example, what are the current standards in the speech synthesis and recognition area?

Speech Synthesis Markup Language (SSML) [3] is an upcoming W3C standard; the SSML Specification Version 1.0 might be already approved by now. JSML and SSML are not exactly the same (surprise!). Both are XML-based definitions for voice synthesis characteristics. Do not panic! It is relatively easy to map SSML to JSML for at least some voice characteristics. An example of an SSML document is provided in Fig.12-3.xml.
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!-- SSML Document -->
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
      <sentence>
        Message from <prosody rate="-30%"> Deena Malkina </prosody>
        delivered <say-as type="date"> 02/13/2003 </say-as>
        research paper on <prosody rate="-30%"> SSML </prosody>
      </sentence>
    </speak>

[Fig.12-3] The differences between SSML and JSML are very visible, but the similarities are even more impressive. Here is a brief review of basic SSML elements.

audio - allows insertion of recorded audio files
break - an empty element to set the prosodic boundaries between words
emphasis - increases the level of stress with which the contained text is spoken
mark - a specific reference point in the text
paragraph - indicates the paragraph structure of the document
phoneme - provides the phonetic pronunciation for the contained text
prosody - controls the pitch, rate, and volume of the speech output
say-as - indicates the type of text contained in the element; for example, date or number
sentence - indicates the sentence structure of the document
speak - includes the document body, the required root element for all SSML documents
sub - substitutes a string of text that should be pronounced in place of the text contained in the element
voice - describes speaking voice characteristics

Here is an example of another SSML document:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
      <sentence>
        Your friendly reminder
        <prosody pitch="high" rate="-20%" volume="loud">it is only
        <say-as type="number"> 5 </say-as> days left till the Valentine day</prosody>
      </sentence>
    </speak>

From SSML to JSML

A simple example of mapping SSML to JSML is provided with the SSMLtoJSMLAudioFactory class, which implements the AVIFactory interface and represents a version of the AudioFactory, Fig.12-4.java.
    // SSMLtoJSMLAudioFactory
    package com.its.connector;

    import javax.speech.*;
    import javax.speech.synthesis.*;
    import java.util.Locale;

    /**
     * The SSMLtoJSMLAudioFactory class transforms SSML data into JSML format
     * and presents the data
     * @author Jeff.Zhuk@JavaSchool.com
     */
    public class SSMLtoJSMLAudioFactory implements AVIFactory, Speakable {
        private String source;
        private String jsml; // formatted JSML string
        private Synthesizer synthesizer;
        // private Voice[] voices; // optional
        // private int[] ages = {Voice.AGE_TEENAGER, Voice.AGE_MIDDLE_ADULT};
        // private int[] gender = {Voice.GENDER_FEMALE, Voice.GENDER_MALE};
        // parallel arrays: SSMLmap[i] is replaced with JSMLmap[i]
        private String[] SSMLmap = {"<sentence>", "</sentence>", "<say-as type", "</say-as>"};
        private String[] JSMLmap = {"<div type=\"sent\">", "</div>", "<sayas class", "</sayas>"};

        /**
         * The init() method initializes the source data
         * @param source
         */
        public void init(String source) {
            this.source = source;
        }

        /**
         * The initComponents() method initializes the synthesizer
         * The method can optionally use getVoices() to init voice components
         */
        public void initComponents() {
            try {
                // Create a synthesizer for the English language
                synthesizer = Central.createSynthesizer(new SynthesizerModeDesc(Locale.ENGLISH));
                // Get it ready to speak
                synthesizer.allocate();
                synthesizer.resume();
                // voices = getVoices(gender, ages); // optional
            } catch (Exception e) {
                System.out.println("SSMLtoJSMLAudioFactory.init:ERROR creating synthesizer");
            }
        }

        /**
         * The getHeader() method provides a proper presentation header
         * @return header
         */
        public String getHeader() {
            return "<?xml version=\"1.0\"?>\n<jsml>\n";
        }

        /**
         * The getBody() method provides a proper body in the presentation format
         * @return body
         */
        public String getBody() {
            // extract body from the original source
            String body = Stringer.getStringBetweenTags(source, "speak");
            // replace the SSMLmap cases with the JSML tags
            // Stringer.replaceIgnoreCase is similar to the replaceAll() of String in 1.4
            for (int i = 0; i < SSMLmap.length; i++) {
                body = Stringer.replaceIgnoreCase(body, SSMLmap[i], JSMLmap[i]);
            }
            return body;
        }

        /**
         * The getFooter() method provides a proper footer
         * @return footer
         */
        public String getFooter() {
            return "</jsml>";
        }
        // more methods

[Fig.12-4] The SSMLtoJSMLAudioFactory class implements the AVIFactory interface. There are at least six methods that must be provided in this class. Fig.12-4 displays five of them. The init() method initializes an original source. The initComponents() method creates the synthesizer object and can optionally use the getVoices() method considered in Fig.12-1 to initialize voice components. The getHeader() method returns the standard JSML header. The getFooter() method returns the standard JSML footer.

The getBody() method extracts speakable text from the original SSML source. The "speak" tags frame this text. We use the Stringer.getStringBetweenTags() method for body extraction. The next move is to map SSML tags to the appropriate JSML tags. We use two string array-maps: the SSMLmap array and the corresponding JSMLmap array. A simple loop uses the Stringer.replaceIgnoreCase() method to map these tags. (We will consider the Stringer class with its methods in chapter 14.)

Play JSML

Note that the SSMLtoJSMLAudioFactory class implements the Speakable interface. The Speakable interface is the spoken version of the toString() method of the Object class. Implementing the Speakable interface means implementing the getJSMLText() method. The getJSMLText() method as well as the play() method are shown in Fig.12-5.java
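The tag mapping performed by getBody() can be tried in isolation. The sketch below is a stand-in under stated assumptions: the book's Stringer class is not shown in this chapter, so the hypothetical TagMapper class approximates getStringBetweenTags() and replaceIgnoreCase() with java.util.regex, using the same two parallel arrays.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the Stringer helpers; not the book's implementation.
class TagMapper {

    static final String[] SSML_MAP = {"<sentence>", "</sentence>", "<say-as type", "</say-as>"};
    static final String[] JSML_MAP = {"<div type=\"sent\">", "</div>", "<sayas class", "</sayas>"};

    // Returns the text between the first <tag ...> and the last </tag>,
    // or the unchanged source when the tag pair is not found.
    static String bodyBetween(String source, String tag) {
        Pattern pattern = Pattern.compile(
            "<" + tag + "\\b[^>]*>(.*)</" + tag + ">",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(source);
        return matcher.find() ? matcher.group(1) : source;
    }

    // Replaces every SSML_MAP[i] with JSML_MAP[i], ignoring letter case.
    static String map(String body) {
        for (int i = 0; i < SSML_MAP.length; i++) {
            body = Pattern.compile(Pattern.quote(SSML_MAP[i]), Pattern.CASE_INSENSITIVE)
                          .matcher(body)
                          .replaceAll(Matcher.quoteReplacement(JSML_MAP[i]));
        }
        return body;
    }

    public static void main(String[] args) {
        String ssml = "<speak version=\"1.0\"><sentence>Due "
                    + "<say-as type=\"date\">02/13/2003</say-as></sentence></speak>";
        System.out.println(map(bodyBetween(ssml, "speak")));
    }
}
```

Plain string replacement like this only covers tags with a one-to-one counterpart; elements whose attributes differ in meaning between SSML and JSML would need real XML processing.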
        /**
         * The getJSMLText() method returns the JSML string
         * @return jsml
         */
        public String getJSMLText() {
            jsml = getHeader() + getBody() + getFooter();
            return jsml;
        }

        /**
         * The play() method uses the factory properties to present the content
         */
        public void play() {
            if (synthesizer == null) {
                initComponents();
            }
            try {
                // the second argument is an optional SpeakableListener
                synthesizer.speak(getJSMLText(), null);
            } catch (JSMLException e) {
                System.out.println("SSMLtoJSMLAudioFactory.play:ERROR invalid JSML");
            }
        }

        /**
         * The getChoices() method returns an XML string with expected user input
         * @return xml
         */
        public String getChoices() {
            return null;
        }

        // The getVoices(int[] gender, int[] ages) method is the same as listed in Fig.12-1
    } // end of the class

[Fig.12-5] With the source code presented in Fig.12-4, the getJSMLText() method can be implemented as a single line that concatenates the header, the transformed body, and the footer of the JSML string. The play() method uses the factory properties to present the JSML content. The play() method starts by checking whether the synthesizer is ready to talk. Then it uses the synthesizer object to invoke its speak() method, passing the JSML string as one of the arguments. The other argument is a SpeakableListener object. This object can be used to receive and handle events associated with the spoken text. We do not plan to use the SpeakableListener object in our example, so we pass null as the second argument. It is possible to use the normal mechanisms for attachment and removal of listeners with the addSpeakableListener() and removeSpeakableListener() methods.

Speech Recognition with Java

Speech technologies are not limited to speech synthesis. Speech recognition technologies have matured to the point that the default message "Please repeat your selection" is not so commonplace anymore, and human-computer conversation can go beyond multiple-choice menus. There are recognizer programs for personal usage, corporate sales, and other purposes. Most personal recognizers support dictation mode. They are speaker-dependent, requiring "program training" that creates a "speaker profile" with a detailed map of the user's speaking patterns and accent. Then the program uses this map to improve recognition accuracy.

The Java Speech API offers a Recognizer that may, optionally, provide a SpeakerManager object that allows an application to manage the SpeakerProfiles of that Recognizer. The getSpeakerManager() method of the Recognizer interface returns the SpeakerManager if this option is available for this Recognizer. Recognizers that do not maintain speaker profiles - known as speaker-independent recognizers - return null for this method. A single recognizer may have multiple SpeakerProfiles for one user, and may store the profiles of multiple users.

The SpeakerProfile class is a reference to data stored with the recognizer. A profile is identified by three values: its unique id, its name, and its variant. The id and the name are self-explanatory. The variant identifies a particular enrollment of a user, and becomes useful when one user has more than one enrollment, or SpeakerProfile. Additional data stored by a recognizer with the profile might include:

- Speaker data such as name, age, gender, etc.
- Speaker preferences
- Data about the words and word patterns of the speaker (language models)
- Data about the pronunciation of words by the speaker (word models)
- Data about the speaker's voice and speaking style (acoustic models)
- Records of previous training and usage history

Speech recognition systems (SRS) can listen to users and, to some degree, recognize and translate their speech to words and sentences. Current speech technologies have to constrain speech context with grammars. Today, the systems can achieve "reasonable" recognition accuracy only within these constraints.

The Java Speech Grammar Format

The Java Speech Grammar Format (JSGF) [4] is a platform- and vendor-independent way of describing a rule grammar (also known as a command and control grammar or regular grammar). A rule grammar specifies the types of utterances a user might say. For example, a service control grammar might include "Service" and "Action" commands.

A voice application can be based on a set of scenarios. Each scenario knows the context and provides appropriate grammar rules for the context. Grammar rules can be provided in a multilingual manner. For example:

    <greetings.english.hello>
    <greetings.russian.privet>
    <greetings.deutsch.gutenTag>

"Hello," "Privet," and "GutenTag" are tokens in the grammar rules. Tokens define expected words that may be spoken by a user. The world of tokens forms a vocabulary or lexicon. Each record in the vocabulary defines the pronunciation of the token.

A single file defines a single grammar with its header and body. The header defines the JSGF version and (optionally) encoding. For example:

    #JSGF V1.0 ISO8859-5;

The grammar starts with the grammar name, which is similar to Java package names. For example:

    grammar com.its.scenarios.examples.greetings;

We can also import grammar rules and packages as is usually done in Java code:

    import <com.its.scenarios.examples.cyc.*>; // talk to knowledge base

The grammar body defines rules as a rule name followed by its definition-token. The definition can include several alternatives separated by "|" characters. For example:

    <login> = login ;
    <find> = find | search | get | lookup ;
    <new> = new | create | add ;
    <command> = <find> | <new> | <login> ;

We can use the Kleene star (after Stephen Cole Kleene, originator) or the "+" character to set expectations that the user can repeat a word multiple times.

    <agree> = I * agree | yes | OK ; // "I agree" and "agree" - both covered
    <disagree> = no + ; // no can be spoken 1 or more times

The Kleene star and the plus operator are both unary operators in JSGF. There is also the tag unary operator that helps to return application-specific information as the result of recognition. For example:

    <service> = (mail | email) {email} | (search | research | find) {find} ;
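The tagged rule above maps several spoken words to one returned value. The following minimal sketch imitates that behavior with a plain lookup table. It is not a JSGF parser and does not use the Java Speech API; the ServiceTags class and its tagFor() method are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of JSGF tags: several alternatives, one returned tag.
class ServiceTags {

    private static final Map<String, String> TAGS = new HashMap<>();

    static {
        // (mail | email) {email}
        for (String word : new String[] {"mail", "email"}) {
            TAGS.put(word, "email");
        }
        // (search | research | find) {find}
        for (String word : new String[] {"search", "research", "find"}) {
            TAGS.put(word, "find");
        }
    }

    // Returns the tag for a recognized word, or null if the rule does not cover it.
    static String tagFor(String spokenWord) {
        return TAGS.get(spokenWord.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(tagFor("mail"));
        System.out.println(tagFor("research"));
    }
}
```

The application code then only has to handle the two tag values, regardless of which synonym the user actually said.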

The system returns the word "email" if "mail" or "email" was spoken. In the case when one of the words "search," "research," or "find" was spoken, the system returns the word "find."

The mainstream of speech recognition technologies lies outside of the Java Speech API today. (This may be different next year.) One example is the open source Sphinx [5] project written in C++ by a group from Carnegie Mellon University. In the Sphinx system, recognition takes place in three passes (the last two are optional): lexical-tree Viterbi search, flat-structured Viterbi search, and global best-path search.

Improving Sphinx recognition rate with training

Sphinx can be trained to better satisfy a specific client with the SphinxTrain program. Even after training, the rate of accuracy for Sphinx II is about 50%, and for Sphinx 3, delivered at the end of 2002, the rate is about 70%. In comparison, the rate of accuracy in Microsoft's Speech SDK recognition engine is 95% after voice training and microphone calibration.

Microsoft Speech Software Development Kit

The Microsoft Speech Software Development Kit (SDK) [6] is based on the Microsoft Speech API (SAPI), a layer of software that allows applications and speech engines to communicate in a standardized way. The MS Speech SDK provides both text-to-speech (TTS) and speech recognition (SR) functionality. Fig.12-6, below, illustrates the TTS synthesis process.

[Fig.12-6] The main blocks that participate in the text-to-speech conversion are:

ISpVoice - The interface used by the application to access TTS functionality
ISpTTSEngineSite - The engine interface to speak data and queue events
ISpObjectWithToken - The interface to create and initialize the engine
ISpTTSEngine - The interface to call the engine
ISpTokenUI - The way for the SAPI control panel to access the User Interface

The Speech Recognition Architecture looks even simpler in Fig.12-7 below.

[Fig.12-7] The main speech recognition blocks interact in the following way.

1. The engine uses the ISpSREngineSite interface to call SAPI to read audio, and returns recognition results.
2. SAPI calls the engine using the methods of the ISpSREngine interface to pass details of recognition grammars. SAPI also uses these methods to start and stop recognition.
3. The ISpObjectWithToken interface provides a mechanism for the engine to query and edit information about the object token.
4. ISpTokenUI represents User Interface components that are callable from an application.

SAPI 5 synthesis markup is not exactly SSML; it is closer to the format published by the SABLE Consortium. SAPI XML tags provide functionality such as volume control and word emphasis. These tags can be inserted into text passed into ISpVoice::Speak and into text streams of format SPDFID_XML, which are then passed into ISpVoice::SpeakStream and auto-detected (by default) by the SAPI XML parser. In the case of an invalid XML structure, a speak error may be returned to the application. We can change rate and volume attributes in real time using ISpVoice::SetRate and ISpVoice::SetVolume.

Volume

The Volume tag controls the volume of a voice and requires just one attribute, Level: an integer between zero and one hundred. The tag can be empty, to apply to all following text, or it can frame content, to which alone it applies.

    <volume level="50">This text should be spoken at volume level fifty.
      <volume level="100"> This text should be spoken at volume level one hundred. </volume>
    </volume>
    <volume level="80"/>All text which follows should be spoken at volume level eighty.

Rate

The Rate tag defines the rate of a voice with one of two attributes, Speed and AbsSpeed. The Speed attribute defines a relative increase or decrease of the speed value, while AbsSpeed defines its absolute rate value: an integer between negative ten and ten. The tag can be empty, to apply to all following text, or it can frame content, to which alone it applies.

    <rate absspeed="5"> This text should be spoken at rate five.
      <rate speed="-5"> Decrease the rate to level zero. </rate>
    </rate>
    <rate absspeed="10"/> Speak the rest with the rate 10.

Pitch

In a very similar manner, the Pitch tag controls the pitch of a voice with one of two attributes, Middle and AbsMiddle; an integer between negative ten and ten can represent an absolute as well as a relative value.

    <pitch absmiddle="5"> This text should be spoken at pitch five.
      <pitch middle="-5"> This text should be spoken at pitch zero. </pitch>
    </pitch>
    <pitch absmiddle="10"/> All the rest should be spoken at pitch ten.

Zero represents the default level for rate and pitch values; the default volume level is one hundred.

Emph

The Emph tag instructs the voice to emphasize a word or section of text. The Emph tag cannot be empty.

    Your <emph>American Airline </emph> flight departs at <emph>eight </emph> tonight

Voice

The Voice tag defines a voice by its Age, Gender, Language, Name, Vendor, and VendorPreferred attributes, which can be Required or Optional. These correspond exactly to the required and optional attributes parameters of the ISpObjectTokenCategory_EnumerateTokens and SpFindBestToken functions. If no voice is found that matches all of the required attributes, no voice change will occur. Optional attributes are treated differently. In this case, an exact match is not necessarily expected. A voice that is closer to the provided attributes will be selected over one that is less similar. Example:

    The default voice should speak this sentence.
    <voice required="Gender=Female;Age!=Child"> A female non-child should speak this sentence, if one exists.
      <voice required="Age=Teen"> A teen should speak this sentence. If a female, non-child teen voice is present,
      this voice will be selected over a male teen voice, for example. </voice>
    </voice>

Let us consider a demonstration program that uses the text-to-speech and speech recognition facilities of the Microsoft Speech SDK.

Speech technology to decrease network bandwidth

The application is a conference between multiple clients over the Internet. The application utilizes speech technology to significantly decrease network bandwidth. Client programs intercept a user's speech, translate it to text, and send ASCII text over the Internet to the server-dispatcher. The server broadcasts the text it receives to the other clients (a regular chat schema). Client programs receive text from the server and convert it back to speech. Fig.12-8 illustrates the application with a diagram.

[Fig.12-8] More details:

- Speech recognition and TTS are done on the client side
- The client recognizes a phrase
- Plain text is transmitted between the client and the server
- The client program appends meta-data like the user's name in SSML format
- The server side can be implemented in C++, C#, or Java, using TCP/IP sockets

An example of the TalkingClient program can be found in Fig.12-9.cpp.
    /*****************
     ***TalkingClient.cpp***
     *****************/
    #include <iostream>
    #include <string>
    #include "Socket.h"
    #include <windows.h>
    #include <sapi.h>
    #include <stdio.h>
    #include <string.h>
    #include <atlbase.h>
    #include "sphelper.h"
    using namespace std;

    // Provided by Deena Malkina with use and modification of Microsoft Speech SDK examples

    const int SERVER_PORT = 0; // placeholder: set to the port of the server-dispatcher

    // Blocks until the recognition context delivers an event and returns its result
    inline HRESULT ReadVoiceData(ISpRecoContext * voiceDataContext,
                                 ISpRecoResult ** recognitionData) {
        HRESULT successLevel = S_OK;
        CSpEvent speechEvent;
        while (SUCCEEDED(successLevel) &&
               SUCCEEDED(successLevel = speechEvent.GetFrom(voiceDataContext)) &&
               successLevel == S_FALSE) {
            successLevel = voiceDataContext->WaitForNotifyEvent(INFINITE);
        }
        *recognitionData = speechEvent.RecoResult();
        if (*recognitionData) {
            (*recognitionData)->AddRef();
        }
        return successLevel;
    }

    int main(int argc, char * argv[]) {
        USES_CONVERSION;
        HRESULT successLevel = E_FAIL;
        // user name can be obtained from argv[1] or from system.properties
        wstring userName = L"Deena";          // from user profile
        const wstring userGender = L"Female"; // from user profile
        const wstring userAge = L"Teen";      // from user profile
        if (argc == 2) { // hopefully there is a voice profile for this name
            userName = A2W(argv[1]);
        }
        const wstring voiceTag = L"<voice optional=\"Gender=" + userGender +
            L";Age=" + userAge + L";Name=" + userName + L"\">";
        bool comInitialized = false;
        try {
            // connect to the recipient, for example "ipserve.com", via SocketClient
            SocketClient sender("ipserve.com", SERVER_PORT);
            // initialize Speech Engine
            if (SUCCEEDED(successLevel = ::CoInitialize(NULL))) {
                comInitialized = true;
                CComPtr<ISpRecoContext> context;
                CComPtr<ISpRecoGrammar> grammar;
                successLevel = context.CoCreateInstance(CLSID_SpSharedRecoContext);
                if (context &&
                    SUCCEEDED(successLevel = context->SetNotifyWin32Event()) &&
                    SUCCEEDED(successLevel = context->SetInterest(
                        SPFEI(SPEI_RECOGNITION), SPFEI(SPEI_RECOGNITION))) &&
                    SUCCEEDED(successLevel = context->SetAudioOptions(SPAO_RETAIN_AUDIO, NULL, NULL)) &&
                    SUCCEEDED(successLevel = context->CreateGrammar(0, &grammar)) &&
                    SUCCEEDED(successLevel = grammar->LoadDictation(NULL, SPLO_STATIC)) &&
                    SUCCEEDED(successLevel = grammar->SetDictationState(SPRS_ACTIVE))) {
                    // define the break signal that will send the sentence to the server
                    const WCHAR * const breakSign = L"okay";
                    const WCHAR * const exitSign = L"exit the program";
                    CComPtr<ISpRecoResult> resultObject;
                    printf("You can start talking.\nSay \"%s\" to send your phrase.\n",
                           W2A(breakSign));
                    while (SUCCEEDED(successLevel = ReadVoiceData(context, &resultObject))) {
                        grammar->SetDictationState(SPRS_INACTIVE);
                        CSpDynamicString resultingText;
                        bool stop = false;
                        if (SUCCEEDED(resultObject->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE,
                                                            TRUE, &resultingText, NULL))) {
                            printf("Said by %s: %s\n", W2A(userName.c_str()), W2A(resultingText));
                            // send text to server: voiceTag + resultingText + "</voice>"
                            wstring textToSend = voiceTag + (const WCHAR *)resultingText + L"</voice>";
                            sender.SendLine(W2A(textToSend.c_str()));
                            resultObject->SpeakAudio(NULL, 0, NULL, NULL);
                            stop = (_wcsicmp(resultingText, breakSign) == 0);
                        }
                        resultObject.Release();
                        if (stop) {
                            break;
                        }
                        grammar->SetDictationState(SPRS_ACTIVE);
                    }
                }
            } // COM pointers are released here, before CoUninitialize
        } catch (const char * e) {
            cerr << e << endl;
        }
        if (comInitialized) {
            ::CoUninitialize();
        }
        return successLevel;
    }

[Fig.12-9]

The TalkingClient program takes the user's name as an optional argument. If the argument is not provided, the "default" voice will be used. The main routine starts with the socket connection to a recipient. We then initialize the speech engine and define the break signal that will be used to indicate when to send to the server what the user has said. In the example, the break signal is the word "okay." The main processing happens in the while loop. The speech engine recognizes the user's sentence and prints it on the screen. Then the program sends the resulting text to the server with the additional SAPI XML voice tag:

<voice optional="Gender=userGender;Age=userAge;Name=userName">resultingText</voice>

The other part of the application is presented in Fig.12-10.cpp.
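The message format described above can be sketched in a few lines of Python. This is only an illustration of the tag layout, not the author's client: the function name `build_voice_message` is hypothetical, and escaping the text with the standard `xml.sax.saxutils` helpers is an assumption about what a receiving XML parser would expect.

```python
from xml.sax.saxutils import escape, quoteattr


def build_voice_message(text, gender, age, name):
    """Wrap recognized text in the <voice optional="..."> tag (hypothetical helper)."""
    profile = f"Gender={gender};Age={age};Name={name}"
    # quoteattr adds the surrounding quotes and escapes the attribute value
    return f"<voice optional={quoteattr(profile)}>{escape(text)}</voice>"


if __name__ == "__main__":
    print(build_voice_message("Open the pod bay door", "Female", "Teen", "Deena"))
```

The receiving side can then read the `optional` attribute to pick a voice profile and fall back to the default voice when no profile matches.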
/**
 * ListeningClient.cpp
 * By Deena.Malkina@javaschool.com
 * with use and modification of Microsoft Speech SDK examples
 */
#include "Socket.h"
#include <string>
#include <iostream>
#include <windows.h>
#include <sapi.h>
#include <stdio.h>
#define _ATL_APARTMENT_THREADED
#include <atlbase.h>
// You may derive a class from CComModule and use it if you want to override something,
// but do not change the name of _Module
extern CComModule _Module;
#include <atlcom.h>
#include <string.h>
//#include <atlbase.h>
#include "sphelper.h"
using namespace std;

int main() {
  const string machine = "javaschool.com";
  const int port = HII4;
  ISpVoice * pVoice = NULL;
  if (FAILED(::CoInitialize(NULL)))
    return FALSE;

  HRESULT hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&pVoice);
  try {
    SocketClient s(machine, port);
    string textToSpeak;
    if (SUCCEEDED(hr)) {
      while (1) {
        // read one char at a time
        string c = s.ReceiveChar();
        if (c.empty()) break;
        cout << c;
        cout.flush();
        textToSpeak.append(c);
        if (c == "." || c == "!" || c == "?") {
          // end of a sentence: transform textToSpeak to WCHAR and speak it
          hr = pVoice->Speak(CA2W(textToSpeak.c_str()), 0, NULL);
          textToSpeak.clear();
        }
      } // end of while loop
      pVoice->Release();
      pVoice = NULL;
    }
    // Reset or Uninitialize
    ::CoUninitialize();
  } catch (const char * s) {
    cerr << s << endl;
  } catch (string s) {
    cerr << s << endl;
  } catch (...) {
    cerr << "unhandled exception\n";
  }
  char T;
  cin >> T;
  return TRUE;
}

[Fig.12-10]

The ListeningClient requirements:
- Receive text from the server
- Transform text to speech using the voice profile if available
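The second requirement implies cutting the incoming text into units that can be handed to the speech synthesizer one at a time. A minimal Python sketch of that step follows, assuming text arrives character by character and that ".", "!" and "?" end a speakable unit; the generator name `speakable_units` is hypothetical.

```python
def speakable_units(chars, terminators=".!?"):
    """Yield one sentence at a time from an iterable of single characters."""
    buffer = []
    for ch in chars:
        buffer.append(ch)
        if ch in terminators:
            sentence = "".join(buffer).strip()
            if sentence:
                yield sentence
            buffer = []
    # whatever is left when the stream ends is spoken as a final unit
    tail = "".join(buffer).strip()
    if tail:
        yield tail


if __name__ == "__main__":
    for unit in speakable_units("Hi there. How are you? Fine"):
        print(unit)
```

Speaking sentence by sentence keeps latency low: the synthesizer can start on the first sentence while the rest of the message is still arriving.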

The ListeningClient program starts in a very similar manner. It uses the ReceiveLine method of the SocketClient to listen to messages coming from the server. The program converts every unit of speech into voice and displays the line on the screen. Examples of Socket classes for Windows can be found online [7]. Appendix 3, Sources, provides examples of text-to-speech and speech recognition programs written in C# using SAPI5.

Standards for scenarios for speech applications

Let us stop this overview of the parts of speech recognition technology for a minute. They are all important. At the same time, some of them are more important for system programmers who do the groundwork. Other pieces of the technology target application developers. Application developers can use this groundwork to describe application flow and write interpretation scenarios. Our next step is to write scenarios for speech applications. Let us consider current and upcoming standards that can help.

Note that the Microsoft .NET Speech SDK uses SSML, which is a part of the W3C Speech Interface Framework (unlike the Microsoft Speech SDK, which uses SAPI XML). SSML is a markup language to define text-to-speech processing, which is the simplest part of speech technology.

There are two markup languages able to describe a complete speech interface: Speech Application Language Tags (SALT) [8], a relatively new upcoming standard, and VoiceXML [9], a well established technology with multiple implementations. VoiceXML was developed for telephony applications as a high-level dialog markup language that integrates the speech interface with data and control flow. Unlike VoiceXML, SALT offers a lower level interface that strictly focuses on speech tags, but targets multiple devices, including but not limited to telephone systems.

VoiceXML, as well as SALT, uses such standards of the W3C Speech Interface Framework as SSML and the Speech Recognition Grammar Specification (SRGS) [10]. SALT also includes recommendations on Natural Language Semantics Markup Language (NLSML) [11] as a recognition result format, and Call Control XML (CCXML) [12] as a call control language. In a nutshell: NLSML is an XML-based markup for representing the meaning of a natural language utterance, and CCXML provides telephony call control support for VoiceXML, SALT, and other dialog systems. NLSML uses an XForms data model for the semantic information being returned in the interpretation. (See NLSML and XForms overviews in Appendix 2, XML Crossroads.)

SALT provides facilities for multi-modal applications that can include not only voice but also screen interfaces. SALT also gives developers the freedom to embed SALT tags into other languages. This allows for more flexibility in writing speak-and-display scenarios.

Speech Application Language Tags

SALT consists of a relatively small set of XML elements. Each XML element has associated attributes and DOM object properties, events and methods. One can write speech interfaces for voice-only and multi-modal applications using SALT with HTML, XHTML, and other standards. SALT controls dialog scenarios through the DOM event model that is popular in web software.

Three top-level elements in SALT are: <listen ...>, <prompt ...>, and <dtmf ...>. The first two XML elements define speech engine parameters:
- <listen ...> configures the speech recognizer, executes recognitions, and handles speech input events
- <prompt ...> configures the speech synthesizer and plays out prompts

The third XML element plays a significant role in call control for telephony applications:
- <dtmf ...> configures and controls dual-tone multi-frequency (DTMF) signaling in telephony applications

Telephony systems use DTMF to signal which key has been pressed by a client. Regular phones usually have 12 keys: ten decimal digit keys and the additional "#" and "*" keys. Each key corresponds to a different pair of frequencies.

The listen and dtmf elements may contain <grammar> and <bind> elements. The listen element can also include the <record> element.
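To make the "pair of frequencies" idea concrete, here is a small Python sketch of the standard 12-key DTMF layout: each key is the intersection of one low-group (row) tone and one high-group (column) tone. The table and constant names are illustrative and are not part of SALT.

```python
# Standard DTMF tone groups for a 12-key telephone keypad (values in Hz)
ROW_TONES = (697, 770, 852, 941)    # low group, one per keypad row
COLUMN_TONES = (1209, 1336, 1477)   # high group, one per keypad column
KEYPAD = ("123", "456", "789", "*0#")

# Map every key to its (row tone, column tone) pair
DTMF = {
    key: (ROW_TONES[r], COLUMN_TONES[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}

if __name__ == "__main__":
    for key in "5#":
        low, high = DTMF[key]
        print(f"key {key}: {low} Hz + {high} Hz")
```

Because no two keys share the same pair, a receiver only has to detect which two tones are present to know which key was pressed.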
The <grammar> element defines grammars. A single <listen> element can include multiple grammars. The <listen> element can have methods to activate an individual grammar before starting recognition. SALT itself is independent of the grammar formats, but for interoperability it recommends supporting at least the XML form of the W3C Speech Recognition Grammar Specification.

The <bind> element can inspect the results of recognition and provide conditional copy-actions. The <bind> element can cause the relevant data to be copied to values in the containing page. A single <listen> element may contain multiple binds. <bind> can have a conditional test attribute as well as a value attribute. <bind> uses XPath syntax (see Appendix-XML on XPath and other XML standards mentioned in the book) in its value attribute to point to a particular node of the result. <bind> uses an XML pattern query in its conditional test attribute. If the condition is true, the content of the node is bound into the page element specified by the targetElement attribute.

The onReco event handler with script programming can provide even more complex processing. The onReco handler and the <bind> elements are triggered on the return of a recognition result.

The <record> element can specify parameters related to speech recording. <bind> or scripted code can process the results of recording, if necessary.

A spoken message scenario

Fig.12-11.xml demonstrates a scenario in which dialog flow is provided with a client-side script.
<!-- HTML -->
<html xmlns:salt="urn:saltforum.org/schemas/020124">
<body onload="askForService()">
<form id="messageForm" action="http://javaschool.com/school/public/knowledge/SALT/message" method="post">
  <input name="fromTextBox" type="text" />
  <input name="subjectTextBox" type="text" />
  <input name="recipientTextBox" type="text" />
  <input name="messageTextBox" type="text" />
</form>
<!-- Speech Application Language Tags -->
<salt:prompt id="askName"> What is your name? </salt:prompt>
<salt:prompt id="askSubject"> What is the subject? </salt:prompt>
<salt:prompt id="askRecipient"> Who is the recipient? </salt:prompt>
<salt:prompt id="askMessage"> What is your message? </salt:prompt>
<salt:prompt id="repeatDefault" onComplete="askForService()"> Please repeat your answer. </salt:prompt>
<salt:listen id="nameRecognition" onReco="setName()" onNoReco="repeatDefault.Start()">
  <salt:grammar src="spokenMessage.xml" />
</salt:listen>
<salt:listen id="subjectRecognition" onReco="setSubject()" onNoReco="repeatDefault.Start()">
  <salt:grammar src="spokenMessage.xml" />
</salt:listen>
<salt:listen id="recipientRecognition" onReco="setRecipient()" onNoReco="repeatDefault.Start()">
  <salt:grammar src="spokenMessage.xml" />
</salt:listen>
<salt:listen id="messageRecognition" onReco="setMessage()" onNoReco="repeatDefault.Start()">
  <salt:grammar src="spokenMessage.xml" />
</salt:listen>
<!-- script -->
<script>
// settings are based on user's answers
function setName() {
  messageForm.fromTextBox.value = nameRecognition.text;
  askForService();
}
function setSubject() {
  messageForm.subjectTextBox.value = subjectRecognition.text;
  askForService();
}
function setRecipient() {
  messageForm.recipientTextBox.value = recipientRecognition.text;
  askForService();
}
function setMessage() {
  messageForm.messageTextBox.value = messageRecognition.text;
  messageForm.submit();
}
// the main script: prompt for the first value that is still missing
function askForService() {
  if (messageForm.fromTextBox.value == "") {
    askName.Start();
    nameRecognition.Start();
  } else if (messageForm.subjectTextBox.value == "") {
    askSubject.Start();
    subjectRecognition.Start();
  } else if (messageForm.recipientTextBox.value == "") {
    askRecipient.Start();
    recipientRecognition.Start();
  } else if (messageForm.messageTextBox.value == "") {
    askMessage.Start();
    messageRecognition.Start();
  }
}
</script>
</body>
</html>

[Fig.12-11]

The scenario is actually an HTML page with embedded SALT tags and script functions. The askForService() script activates the SALT <listen> and <prompt> tags. For example, askName.Start() prompts the user with, "What is your name?", and the following nameRecognition.Start() examines the recognition results. The askForService() script executes the relevant prompts and recognitions until all values are obtained. Successful message recognition triggers the submit() function, which submits the message to the recipient. The user's name can serve not only as the user's signature, but can also invoke a chosen voice profile, if available, on the recipient's side.

Did you notice the reference to the spokenMessage.xml grammar file that supports the scenario in the code? How do we define grammar?
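The control flow of the scenario, stripped of SALT and the browser, is a simple slot-filling loop: prompt for the first empty field, store the answer on recognition, and ask again when nothing is recognized. The Python sketch below illustrates only that logic; the field names and the `run_dialog` helper are hypothetical, and a list of canned replies stands in for the speech recognizer.

```python
FIELD_ORDER = ["from", "subject", "recipient", "message"]


def next_empty_field(form):
    """Return the first field that still has no value, or None when all are set."""
    for name in FIELD_ORDER:
        if not form.get(name):
            return name
    return None


def run_dialog(replies):
    """Feed recognition results (None = nothing recognized) through the dialog."""
    form, asked = {}, []
    replies = iter(replies)
    field = next_empty_field(form)
    while field is not None:
        asked.append(field)            # the prompt for this field is played
        reply = next(replies)          # the recognition result for that prompt
        if reply is not None:
            form[field] = reply        # recognition succeeded: store the value
        field = next_empty_field(form) # a failed recognition re-asks the same field
    return form, asked


if __name__ == "__main__":
    form, asked = run_dialog(["Deena", None, "Hello", "Dave", "Open the door"])
    print(asked)
    print(form)
```

Once every field has a value the loop ends, which corresponds to the point where the page submits the form.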

Grammar Definition

First, let us look into the existing Command and Control features of the MS Speech SDK. The Command and Control features of Speech API 5 (SAPI 5) are based on context-free grammars (CFGs). A CFG defines a specific set of words, and the sentences that are valid for recognition by the speech recognition (SR) engine. The CFG format in SAPI 5 uses XML to define the structure of grammars and grammar rules. SAPI 5-compliant SR engines expect grammar definitions in a binary format produced by any CFG/Grammar compiler; for example, gc.exe, the SAPI 5 grammar compiler that is included in the Speech SDK. Compilation is usually done before application run-time, but can be done at run-time.

Here is an example of a file that provides grammar rules to navigate through mail messages ("next", "previous") and to retrieve the currently selected email ("getMail").
<GRAMMAR LANGID="409">
  <DEFINE>
    <ID NAME="VID_MailNavigationRules" VAL="1"/>
    <ID NAME="VID_MailReceiverRules" VAL="2"/>
  </DEFINE>
  <RULE ID="VID_MailNavigationRules">
    <L>
      <P VAL="next">
        <O>Please *+</O>
        <P>next</P>
        <O>message|email|mail</O>
      </P>
      <P VAL="previous">
        <O>Please *+</O>
        <P>previous|last|back</P>
        <O>message|email|mail</O>
      </P>
    </L>
  </RULE>
  <RULE ID="VID_MailReceiverRules" TOPLEVEL="ACTIVE">
    <O>Please</O>
    <P>
      <L>
        <P VAL="getMail">Retrieve</P>
        <P VAL="getMail">Receive</P>
        <P VAL="getMail">Get</P>
      </L>
    </P>
    <O>the mail</O>
  </RULE>
</GRAMMAR>
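One way to see what such a grammar gives the application is to read it as a table from spoken phrases to semantic values. The Python sketch below does this for a fragment of the grammar with the standard `xml.etree.ElementTree` parser. It only inspects the XML text and is not how a SAPI 5 engine compiles or uses the file; the helper name `phrase_values` is hypothetical.

```python
import xml.etree.ElementTree as ET

# A fragment of the mail grammar, embedded here so the sketch is self-contained
GRAMMAR = """
<GRAMMAR LANGID="409">
  <RULE ID="VID_MailReceiverRules" TOPLEVEL="ACTIVE">
    <O>Please</O>
    <P>
      <L>
        <P VAL="getMail">Retrieve</P>
        <P VAL="getMail">Receive</P>
        <P VAL="getMail">Get</P>
      </L>
    </P>
    <O>the mail</O>
  </RULE>
</GRAMMAR>
"""


def phrase_values(grammar_xml):
    """Map each phrase that carries a VAL attribute to that value."""
    root = ET.fromstring(grammar_xml)
    table = {}
    for phrase in root.iter("P"):
        value = phrase.get("VAL")
        if value is not None and phrase.text:
            table[phrase.text.strip().lower()] = value
    return table


if __name__ == "__main__":
    print(phrase_values(GRAMMAR))
```

All three verbs resolve to the same value, so the application handles one "getMail" command regardless of which word the user chose.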

Appendix 3, Sources, provides more examples (along with C# program source code) for a speech application based on SAPI5. The grammar file can be dynamically loaded and compiled at run-time. This would decrease the number of choices for any current recognition, and improve recognition quality.

VoiceXML

The last but definitely not the least important technology on the list is VoiceXML. Although SALT and VoiceXML have different targets, in some ways they compete in the speech technology arena. Unlike SALT, which is relatively new, VoiceXML started in 1995, within an AT&T project called Phone Markup Language (PML). The VoiceXML Forum was formed in 1998-1999 by AT&T, IBM, Lucent, and Motorola. At that time Motorola had developed VoxML, and IBM was developing its own SpeechML. The VoiceXML Forum helped integrate the efforts. Since then VoiceXML has had a history of successful implementations by multiple vendors.

Unlike SALT, which is a royalty-free upcoming standard, VoiceXML can be subject to royalty payments. Several companies, including IBM, Motorola, and AT&T, have indicated that they could have patent rights in VoiceXML.

This brief overview of VoiceXML is based on the VoiceXML 2.0 Specification submitted to W3C in the beginning of 2003.

What is VoiceXML?

VoiceXML is designed for creating dialog scenarios with digitized audio, speech recognition, and DTMF key input. VoiceXML can record spoken input, telephony, and mixed initiative conversations. The mixed conversation is an extended case of the most common type of computer-human conversations directed by the computer. The main target of VoiceXML is web-based development and content delivery to interactive speech applications.

The VoiceXML interpreter renders VoiceXML documents audibly, just as a web browser renders HTML documents visually. However, standard web browsers run on the local machine, whereas the VoiceXML interpreter runs at a remote hosting site. Fig.12-12 displays the enterprise application with multi-modal access to business services.

[Fig.12-12]

Like HTML web pages, VoiceXML documents have web URLs and can be located on any web server. VoiceXML pages deliver the service content via speech applications using computer telephony protocols like JTAPI, TAPI, H.323 and SIP (Session Initiation Protocol, most widely accepted by the industry).

Main components of speech recognition systems

Speech Recognition Systems (SRS) in general and VoiceXML systems in particular rely on high-performance server side hardware and software located on or connected to the web container. The web container is the architecture tier responsible for communicating with clients over HTTP and dispatching client requests to proper business services. In this case, speech recognition services become the client that intercepts voice flow and translates it into HTTP streams.

The key hardware factors for delivering reliable, scalable VoiceXML applications are:
- Telephony Connectivity
- Internet Connectivity
- Scalable Architecture
- Caching and Media Streaming
- CODECs - combinations of analog to digital (A/D) with digital to analog (D/A) signal converters

Progress in hardware technologies such as the high-speed, low-power consumption digital signal processor (DSP) has substantially contributed to improving CODEC conversion efficiency.

The SRS platform contains intelligent caching technology that minimizes network traffic by caching VoiceXML, audio files, and compiled grammars. The VXML Platform makes extensive use of load balancing, resource pooling, and dynamic resource allocation. SRS servers use multi-threaded C++ implementations, delivering the most performance from available hardware resources. To prevent unnecessary re-compilation of grammars, the VoiceXML platform uses a high-performance indexing technique to cache and re-use previously compiled grammars.

Voice services offer the following software components to implement an end-to-end solution for phone-accessible Web content:
- Telephony platform - software modules for text-to-speech, voice recognition, menuing system, parsing engines, and DTMF
- Open Standards support - open system architecture in compliance with industry standards: VoiceXML, WAP, WML, XHTML, SSML, SRGS, NLSML, etc.
- WAP solution - support for using Wireless Application Protocol to deliver web and audio content to new web phones, enabling seamless integration between web and audio content
- Voice application and activation - user interface and logic (such as personalization) for accessing back-end audio content, and web and email databases for easy phone access

Briefly about the main telephony protocols:
- JTAPI: The Java Telephony API supports telephony call control from consumer devices to call centers.
- TAPI: The Telephony Application Programming Interface was created by Microsoft and Intel to provide computer telephony services.
- H.323: This standard for call signaling, multimedia transport and control is widely implemented for point-to-point and multi-point voice and videoconferencing over Integrated Services Digital Network (ISDN), Public Switched Telephone Network (PSTN) or Signaling System 7 (SS7), and 3G mobile networks.
- SIP: The Session Initiation Protocol is commonly used for voice and video calls over Internet Protocol.

What is the VoiceXML architecture and how does it work?

A document server (a Web server) contains VoiceXML documents or VXML pages with dialog based scenarios. (I try to use the word "scenario" on every other page, but sometimes the word sneaks in-between.) The document server responds to a client request by sending the VoiceXML document to a Speech Recognition System, or a VoiceXML implementation platform (the VoiceXML interpreter).

A voice service scenario is a sequence of interaction dialogs between a user and an implementation platform. Document servers perform business logic, database and legacy system operations, and produce VoiceXML documents that describe interaction dialogs. User input affects dialog interpretation by the VoiceXML interpreter. The VoiceXML interpreter transforms user input into requests submitted to a document server. The document server replies with other VoiceXML documents describing new sets of dialogs.

What does the VoiceXML document look like?
A VoiceXML document can describe:
- Output of synthesized speech (text-to-speech).
- Output of audio files.
- Recognition of spoken input.
- Recognition of DTMF input.
- Recording of spoken input.
- Control of dialog flow.

The VoiceXML language requires a common grammar format, namely the XML Form of the W3C Speech Recognition Grammar Specification (SRGS), to facilitate interoperability.

A voice application is a collection of one or more VoiceXML documents sharing the same application root document. A VoiceXML document is composed of one or more dialogs. The application entry point is the first VoiceXML document that the VoiceXML interpreter loads when it starts the application.

The developer's task is to provide voice commands to the user in the most comfortable way while offering clearly distinguished possibilities of responses expected from the user through voice and/or telephone keys.

There are two kinds of dialogs: forms and menus. Forms define an interaction that collects field values. Each field may specify a grammar with expected inputs for that field. A menu commonly asks the user to choose one of several options, and then uses the choice to transition to another dialog. Fig.12-13.vxml presents a very simple example of a VoiceXML document.
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd"
      version="2.0">
  <form>
    <field name="service">
      <prompt>Would you like to read your mail, send a message, or check your calendar?</prompt>
      <grammar src="com.its.services.grxml" type="application/srgs+xml"/>
    </field>
    <block>
      <submit next="http://javaschool.com/school/public/speech/vxml/service.jsp"/>
    </block>
  </form>
</vxml>

[Fig.12-13]

This VoiceXML document provides a form dialog that offers the user a choice of services to fill the service field. Expected answers are provided in the grammar document "com.its.services.grxml".

Each dialog has one or more speech and/or DTMF grammars associated with it. Most of the speech applications today are machine directed. In machine directed applications, a single dialog grammar is active at any time: the grammar associated with the current user dialog. In mixed initiative applications, the user and the machine alternate in determining what to do next. In this case, more than one dialog grammar can be active, and the user can say something that matches another dialog's grammar. Mixed initiative adds flexibility and power to voice applications.

VoiceXML can handle events not covered by the form mechanism described above. There are default handlers for the predefined events; in addition, developers can override these handlers with their own event handlers in any element that can throw an event. The platform throws events, for example, when the user does not respond, does not respond intelligibly, requests help, etc. The <catch>, <error>, <help>, <noinput>, and <nomatch> elements are examples of event handlers.

For example, the catch element can detect a disconnect event and provide some action upon the event:

<catch event="connection.disconnect.hangup">
  <submit namelist="disconnect" next="http://javaschool.com/school/public/speech/vxml/exit.jsp"/>
</catch>

Applications can support help by putting the help keyword in a grammar in the application root document.

<help>
  <prompt>Say "Retry" to retry authorization, or "Register" to hear the registration instructions. Say "Exit" or "Goodbye" to exit.</prompt>
  <listen/>
</help>

A list of VoiceXML elements

<assign> - Assign a variable a value
<audio> - Play an audio clip within a prompt
<block> - A container of (non-interactive) executable code
<catch> - Catch an event
<choice> - Define a menu item
<clear> - Clear one or more form item variables
<disconnect> - Disconnect a session
<else> - Used in <if> elements
<elseif> - Used in <if> elements
<enumerate> - Shorthand for enumerating the choices in a menu
<error> - Catch an error event
<exit> - Exit a session
<field> - Declares an input field in a form
<filled> - An action executed when fields are filled
<form> - A dialog for presenting information and collecting data
<goto> - Go to another dialog in the same or different document
<grammar> - Specify a speech recognition or DTMF grammar
<help> - Catch a help event
<if> - Simple conditional logic
<initial> - Declares initial logic upon entry into a mixed initiative form
<link> - Specify a transition common to all dialogs in the link's scope
<log> - Generate a debug message
<menu> - A dialog for choosing amongst alternative destinations
<meta> - Define a metadata item as a name/value pair
<metadata> - Define metadata information using a metadata schema
<noinput> - Catch a noinput event
<nomatch> - Catch a nomatch event
<object> - Interact with a custom extension
<option> - Specify an option in a <field>
<param> - Parameter in <object> or <subdialog>
<prompt> - Queue speech synthesis and audio output to the user
<property> - Control implementation platform settings
<record> - Record an audio sample
<reprompt> - Play a field prompt when a field is re-visited after an event
<return> - Return from a subdialog
<script> - Specify a block of ECMAScript client-side scripting logic
<subdialog> - Invoke another dialog as a subdialog of the current one
<submit> - Submit values to a document server
<throw> - Throw an event
<transfer> - Transfer the caller to another destination
<value> - Insert the value of an expression in a prompt
<var> - Declare a variable
<vxml> - Top-level element in each VoiceXML document

Fig.12-14.vxml introduces a typical VoiceXML document that initiates a brief phone conversation.
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd"
      version="2.0">
  <form id="training">
    <field name="course">
      <grammar type="application/srgs+xml" src="/grammars/training.grxml"/>
      <prompt>Which course do you want to take? Here is the list of courses:
        <!-- list of courses offered -->
      </prompt>
      <filled>
        <if cond="course == 'operator' ">
          <goto next="http://javaschool.com/school/public/speech/vxml/operator.vxml" />
        </if>
      </filled>
      <noinput>
        I could not hear you.
        <reprompt/>
      </noinput>
      <nomatch count="1">
        Please select any Java, Wireless, or Ontology course from the list.
        <reprompt/>
      </nomatch>
      <nomatch count="2">
        <prompt>
          I am sorry, we have so many types of training courses, but not this one.
          I would recommend you to start with the Ontology Introduction course at this time.
        </prompt>
      </nomatch>
      <nomatch count="3">
        I switch you to the operator. Hopefully you will find the course you want. Good luck.
        <goto next="http://javaschool.com/school/public/speech/vxml/operator.vxml" />
      </nomatch>
    </field>
    <block>
      <submit next="http://javaschool.com/school/public/speech/vxml/training.jsp"/>
    </block>
  </form>
</vxml>

[Fig.12-14]

The source code starts with the standard XML declaration, and then has VXML reference lines. The next thing we see is a form that looks almost exactly like an HTML form. In fact, the form has exactly the same purpose - to collect information from a user into the form fields. This form has a single field named "course." The program prompts the user to choose one of the training courses.

<prompt>Which course do you want to take?</prompt>

The grammar line above the prompt defines a grammar rules file that will try to resolve the answer.

<grammar type="application/srgs+xml" src="/grammars/training.grxml"/>

The user might want to talk to a human being. In this case, the grammar rules might resolve the user's desire and return the "operator" word as the user's selection. The program uses an <if> element to check on this condition.

<if cond="course == 'operator' ">

If this condition is true, the program will use the <goto> element to jump to another document that transfers the caller to the operator. Note that all tags are properly closed, as should be done in any XML file.
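The point about properly closed tags can be checked mechanically before a document ever reaches a VoiceXML interpreter. The Python sketch below uses the standard `xml.etree.ElementTree` parser to test well-formedness only; it does not validate against the VoiceXML schema, and the two sample documents are shortened illustrations rather than the full listing.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The empty element is closed with "/>" as XML requires
GOOD = '<vxml version="2.0"><form><field name="course"><reprompt/></field></form></vxml>'
# HTML-style unclosed tag: <reprompt> is never closed
BAD = '<vxml version="2.0"><form><field name="course"><reprompt></field></form></vxml>'

if __name__ == "__main__":
    print(is_well_formed(GOOD), is_well_formed(BAD))
```

A check like this is cheap to run on every generated page, which matters when document servers produce VoiceXML dynamically.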

Looking down the code below the "if" element, we find <noinput> and <nomatch> event handlers. If the user produces no input during the default time, the program plays the prompt again using the <reprompt/> element.

<noinput>
  I could not hear you.
  <reprompt/>
</noinput>

The most interesting script starts when a user selection is not expected. In this case, the <nomatch> event handler is fired. This element can optionally have a counter, which we use here to try to provide a more appropriate response, and possibly decrease the user's discomfort. The very first "nomatch" element will provide an additional hint to the user and reprompt the original message.

<nomatch count="1">
  Please select a Java, Wireless, or Ontology course from the list.
  <reprompt/>
</nomatch>

The next time the user makes a strange selection, the program offers its candid advice.

<nomatch count="2">
  <prompt>
    I am sorry, we have so many types of training courses, but not this one.
    I would recommend for you to start with the Ontology Introduction course at this time.
    Will that work for you?
  </prompt>
</nomatch>

The third "nomatch" event will switch the user to the operator.

<nomatch count="3">
  I will switch you to the operator. Hopefully, you will find the course you want. Good luck.
  <goto next="http://javaschool.com/school/public/speech/vxml/operator.vxml" />
</nomatch>

But what if the user was successful in the course selection?

In this case, the selected course value will fill the "course" field and the value will be submitted to the training page.

<block>
  <submit next="http://javaschool.com/school/public/speech/vxml/training.jsp"/>
</block>

The last two lines close the form and the VoiceXML document.

</form>
</vxml>

Wow! How does VoiceXML do the transfer operation? Here is the code.

<!-- Transfer to the operator -->
<!-- Say it first -->
Transferring to the operator according to your request.
<!-- Play music while transfer -->
<!-- Wait up to 60 seconds for the transfer -->
<transfer dest="tel:+1-234-567-8901" transferaudio="music.wav" connecttimeout="60s">
</transfer>

The code extract first says, "Transferring to the operator according to your request," and then actually tries to transfer the user to the operator. The transfer element in our example turns on some music and sets the timeout to 60 seconds for the transfer. There is also an essential part of the transfer element - the telephone number of the operator.

Here is another transfer example, in which the program catches the "busy" event.

<transfer maxlength="60" dest="8005558355">
  <catch event="event.busy">
    <audio> busy </audio>
    <goto next="_home"/>
  </catch>
</transfer>

The <link> element below navigates to mail.vxml whenever the user says "mail".

<link next="mail.vxml">
  <grammar type="application/srgs+xml" root="root" version="1.0">
    <rule id="root" scope="public">mail</rule>
  </grammar>
</link>

This example provides in-line grammar rules, unlike most of the following examples, where we reference grammar rules files.

The <subdialog> element helps to create reusable dialog components and decompose an application into multiple documents.

<subdialog name="compose" src="newmail.vxml">
  <filled>
    <!-- The "compose" subdialog returns 3 variables below.
         These variables must be specified in the "return" element of the "compose" -->
    <assign name="to_address" expr=" compose.to_address"/>
    <assign name="subject" expr=" compose.subject"/>
    <assign name="message" expr=" compose.body"/>
  </filled>
</subdialog>

Fig.12-15.vxml provides an example of the "new_mail" service request.
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd"
      version="2.0">
  <form id="new_mail">
    <!-- two variables collected by the "compose" subdialog -->
    <var name="to_name"/>
    <var name="message"/>
    <subdialog name="compose" src="compose.vxml">
      <filled>
        <!-- The "compose" subdialog returns its status and two variables below.
             The status and other variables must be specified
             in the "return" element of the "compose" -->
        <if cond="compose.status == 'OK'">
          <assign name="to_name" expr="compose.to_name"/>
          <assign name="message" expr="compose.message"/>
        <else/>
          Sorry, the system cannot deliver the message.
          <exit/>
        </if>
      </filled>
    </subdialog>
    <field name="subject">
      <grammar type="application/srgs+xml" src="/grammars/mail_subject.grxml"/>
      <prompt> What is the subject of your message? </prompt>
      <filled>
        <submit next="http://javaschool.com/school/public/speech/send_mail.jsp"/>
      </filled>
    </field>
  </form>
</vxml>

[Fig.12-15]

The example uses the "compose" subdialog to fill two fields for the new mail form. The "compose" subdialog returns its status and two requested fields. If the returned status is not "OK," the service says that the message cannot be delivered and exits. Otherwise, the service assigns returned values to the "to_name" and "message" fields, and prompts the user for the message subject.

It often happens that mail goes out without any subject. The subject can serve as communication meta-data, which makes even more sense today when computer systems are increasingly involved in the communication process. With this last field, the service is ready to rock-n-roll and submits all the data to the long URL provided in the "submit" element.

The new_mail service listing also illustrates the usage of "if-else" elements with the conditional actions described above. Fig.12-16 displays the "compose" subdialog.
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
                          http://www.w3.org/TR/voicexml20/vxml.xsd"
      version="2.0">
  <form id="compose">
    <var name="status" expr="'not_known_name'"/>
    <field name="to_name">
      <grammar type="application/srgs+xml" src="/grammars/names.grxml"/>
      <prompt>
        What is your <prosody rate="-40%">recipient</prosody> name?
      </prompt>
      <help>
        Please say first and last name of your message recipient.
        First name first. For example: John Smith.
        <reprompt/>
      </help>
      <nomatch>
        <return namelist="status"/>
      </nomatch>
    </field>
    <field name="message">
      <grammar type="application/srgs+xml" src="/grammars/phone_numbers.grxml"/>
      <prompt>
        Provide your <emphasis>message now</emphasis>
      </prompt>
    </field>
    <block>
      <assign name="status" expr="'OK'"/>
      <return namelist="status to_name message"/>
    </block>
  </form>
</vxml>

[Fig.12-16]

In the "compose" dialog, the prompt asks for a recipient's name. Apparently, the grammar rules behind the scene are working hard to recover the email address from the list of available names. The user can ask for help to hear more detailed prompt messages. If the name recognition fails, the "nomatch" element returns the initial status value, "not_known_name", back to the "new_mail" service.

In the best-case scenario, when the name recognition succeeds, the "compose" dialog prompts the user to fill (answer) the "message" field and then sets the status value to "OK". The "compose" dialog then returns the "OK" status and two variables (the "to_name" and the "message") back to the "new_mail" service.

The "forward_mail" service will reuse the same "compose" subdialog to collect the "to_name" and "message" fields. Fig.12-17 shows the "forward_mail" VoiceXML page.
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
                          http://www.w3.org/TR/voicexml20/vxml.xsd"
      version="2.0">
  <form id="forward_mail">
    <!-- two parameters related to the original mail passed from the mail_service -->
    <var name="subject"/>
    <var name="old_message"/>
    <!-- two variables collected by the "compose" subdialog -->
    <var name="to_name"/>
    <var name="message"/>
    <subdialog name="compose" src="compose.vxml">
      <filled>
        <!-- The "compose" subdialog returns its status and two variables below.
             The status and other variables must be specified
             in the "return" element of the "compose" -->
        <if cond="compose.status == 'OK'">
          <assign name="to_name" expr="compose.to_name"/>
          <assign name="message" expr="compose.message"/>
          <!-- use ECMAScript to prepare subject and body fields -->
          <!-- subject will start with "FW: " -->
          <!-- body will include not only current but also original "old_message" -->
          <return namelist="to_name subject message"/>
        <else/>
          Sorry, the system cannot deliver the message.
          <exit/>
        </if>
      </filled>
    </subdialog>
  </form>
</vxml>

[Fig.12-17]

The "forward_mail" listing includes two additional variables: "subject" and "old_message." These variables are passed as parameters that the "mail_service" dialog extracts from the original mail. The "forward_mail" service behaves similarly to the "new_mail" service. If the "compose" subdialog returns an "OK" status with the two requested fields (the "to_name" and the "message"), the "forward_mail" service returns all the data, including the fields that came as parameters from the original email, to the calling dialog, which submits them to the final URL. If the status returned by the "compose" subdialog is not as cheerful, the "forward_mail" service will not forward the message but will exit instead.

Parameters can be passed with the <param> elements of a <subdialog>. These parameters must be declared in the subdialog using <var> elements, as displayed in Fig.12-17. The "mail_service" dialog passes the parameters to the "forward_mail" service with the following lines:

<form>
  <subdialog name="forward_mail" src="forward_mail.vxml">
    <param name="subject" expr="'Hello'"/>
    <param name="old_message" expr="'How are you?'"/>
    <filled>
      <submit next="http://javaschool.com/school/public/speech/mail.jsp"/>
    </filled>
  </subdialog>
</form>
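The parameter passing and status handling described above can be modeled outside a VoiceXML platform. The following is only a sketch in plain ECMAScript, not a VoiceXML API: the function names compose and forwardMail, their arguments, and the sample values are hypothetical stand-ins for the compose.vxml and forward_mail.vxml documents.

```javascript
// Hypothetical model of the subdialog calls; not a VoiceXML platform API.
// "compose" stands in for compose.vxml. It receives a recognized name
// (or null when recognition fails) and a message, and returns only the
// variables it names, like <return namelist="...">.
function compose(recognizedName, message) {
  let status = 'not_known_name';          // initial value of <var name="status">
  if (recognizedName === null) {
    return { status };                    // <nomatch>: return namelist="status"
  }
  status = 'OK';
  return { status, to_name: recognizedName, message };
}

// "forwardMail" stands in for forward_mail.vxml. The params object plays
// the role of the <param> elements; each key matches a <var> in the callee.
function forwardMail(params, recognizedName, message) {
  const { subject, old_message } = params;
  const result = compose(recognizedName, message);
  if (result.status !== 'OK') {
    return null;                          // the dialog exits without forwarding
  }
  return { to_name: result.to_name, subject, message: result.message, old_message };
}

// Assumed sample values, mirroring the <param> lines above.
const params = { subject: 'Hello', old_message: 'How are you?' };
const sent = forwardMail(params, 'John Smith', 'See the note below.');
const failed = forwardMail(params, null, 'See the note below.');
console.log(sent, failed);
```

The sketch keeps the same contract as the listings: the callee decides what it returns, and the caller checks the status before using any other returned value.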

Looking into the PROMPT examples in Fig.12-16, we can see the tags we learned before as SSML elements. No wonder. The VoiceXML 2.0 Specification models the content of the <prompt> element based on the W3C Speech Synthesis Markup Language 1.0 (SSML), and makes available the following SSML elements:

<audio> - Specifies audio files to be played and text to be spoken.
<break> - Specifies a pause in the speech output.
<desc> - Provides a description of a non-speech audio source in <audio>.
<emphasis> - Specifies that the enclosed text should be spoken with emphasis.
<lexicon> - Specifies a pronunciation lexicon for the prompt.
<mark> - Ignored by VoiceXML platforms.
<metadata> - Specifies XML metadata content for the prompt.
<paragraph> (alias <p>) - Identifies the enclosed text as a paragraph, containing zero or more sentences.
<phoneme> - Specifies a phonetic pronunciation for the contained text.
<prosody> - Specifies prosodic information for the enclosed text.
<say-as> - Specifies the type of text construct contained within the element.
<sentence> (alias <s>) - Identifies the enclosed text as a sentence.
<sub> - Specifies replacement spoken text for the contained text.
<voice> - Specifies voice characteristics for the spoken text.

The following example in Fig.12-18.vxml uses the <record> element to collect an audio recording from the user.
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0"
      xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
                          http://www.w3.org/TR/voicexml20/vxml.xsd">
  <form>
    <property name="bargein" value="true"/>
    <record name="msg" beep="true" maxtime="10s"
            finalsilence="3000ms" dtmfterm="true" type="audio/x-wav">
      <prompt timeout="5s">
        Record your audio message after the beep.
      </prompt>
      <noinput>
        I didn't hear anything, please try again.
      </noinput>
    </record>
    <submit next="http://javaschool.com/school/public/speech/recording.jsp"
            enctype="multipart/form-data" method="post" namelist="msg"/>
  </form>
</vxml>

[Fig.12-18]

This example also uses the bargein property, which controls whether a user can interrupt a prompt. Setting the bargein property to "true" allows the user to interrupt the program, introducing a mixed initiative. The program prompts the user to record her or his message. A reference to the recorded audio is stored in the "msg" variable.

There are several important settings in the record element, including timeouts and dtmfterm. The recording stops under one of the following conditions: a final silence of more than 3 seconds occurs, a DTMF key is pressed, the maximum recording time of 10 seconds is exceeded, or the caller hangs up. The audio message will be sent to the web server via the HTTP POST method with enctype="multipart/form-data".

Another example, in Fig.12-19.vxml, demonstrates a VoiceXML feature that allows the user to enter text messages using a telephone keypad.
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0"
      xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
                          http://www.w3.org/TR/voicexml20/vxml.xsd">
  <form id="key_message">
    <object name="message" classid="builtin://keypad_text_input">
      <prompt>
        Enter your message with the telephone keys.
        Press star for a space, and the pound sign to end the message.
      </prompt>
    </object>
    <block>
      <assign name="document.key_message" expr="message.text"/>
      <goto next="#send_message"/>
    </block>
  </form>
</vxml>

[Fig.12-19]

VoiceXML supports platforms with telephone keys. In the example above, the user is prompted to type the message. The <block> element copies the message to the variable document.key_message. This example shows the usage of the <object> element, which delivers its result to the document as an ECMAScript [13] value.

ECMAScript

Developed under the European Computer Manufacturers Association (ECMA), ECMAScript was modeled after JavaScript but designed to be application-independent. The language was divided into two parts: a domain-independent core and a domain-specific object model. ECMAScript defines a language core, leaving the design of the domain object model to specific vendors.

The <object> element, presented in the example, can have the following attributes:

name - When the object is evaluated, it sets this variable to an ECMAScript value whose type is defined by the object.
expr - The initial value of the form item variable; the default is the ECMAScript value "undefined". If initialized to a value, then the form item will not be visited unless the form item variable is cleared.
cond - An expression that must evaluate to true after conversion to boolean in order for the form item to be visited.
classid - The URI specifying the location of the object's implementation. The URI conventions are platform-dependent.
codebase - The base path used to resolve relative URIs specified by classid, data, and archive. It defaults to the base URI of the current document.
codetype - The content type of data expected when downloading the object specified by classid. The default is the value of the type attribute.
data - The URI specifying the location of the object's data. If it is a relative URI, it is interpreted relative to the codebase attribute.
type - The content type of the data specified by the data attribute.
archive - A space-separated list of URIs for archives containing resources relevant to the object, which may include the resources specified by the classid and data attributes.
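The codebase rule in the list above can be illustrated with ordinary URI resolution. The sketch below is an assumption-laden illustration rather than platform code: the helper name resolveObjectUris, the attribute values, and the example.com URLs are all hypothetical, and the standard URL class is used only to show how relative references would resolve.

```javascript
// Hypothetical helper, not part of any VoiceXML platform API.
// Resolves the URI-valued attributes of an <object> element against its
// codebase, falling back to the document's base URI when codebase is absent.
function resolveObjectUris(attrs, documentBaseUri) {
  const base = attrs.codebase
    ? new URL(attrs.codebase, documentBaseUri).href
    : documentBaseUri;
  const resolve = (uri) => new URL(uri, base).href;
  return {
    classid: attrs.classid ? resolve(attrs.classid) : undefined,
    data: attrs.data ? resolve(attrs.data) : undefined,
    // archive is a space-separated list of URIs
    archive: (attrs.archive || '').split(/\s+/).filter(Boolean).map(resolve),
  };
}

// Assumed sample attribute values and document location.
const resolved = resolveObjectUris(
  { codebase: 'objects/', classid: 'keypad.jar', data: 'layout.xml', archive: 'a.jar b.jar' },
  'http://example.com/speech/vxml/main.vxml'
);
console.log(resolved);
```

Without a codebase attribute, the same helper resolves data and archive entries against the document's own URI, which matches the default described in the list.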

ECMAScript provides scripting capabilities for Web-based client-server architecture and makes it possible to distribute computation between the client and server. Each Web browser and Web server that supports ECMAScript supports (in its own way) the ECMAScript execution environment. Some of the facilities of ECMAScript are similar to those of the Java and Self [14] languages.

An ECMAScript program is a cluster of communicating objects that consist of an unordered collection of properties with their attributes. Attributes, like "ReadOnly", "DontEnum", "DontDelete", or "Internal", determine how each property can be used. For example, a property with the "ReadOnly" attribute cannot be changed by ECMAScript programs, "DontEnum" properties cannot be enumerated in programming loops, attempts to delete "DontDelete" properties will be ignored, and "Internal" properties are not directly accessible via the property access operators.

ECMAScript properties are containers for objects, primitive values, or methods. A primitive value is a member of one of the following built-in types: Undefined, Null, Boolean, Number, and String. ECMAScript defines a collection of built-in objects that include the following object names: Global, Object, Function, Array, String (yes, there are objects with the same names as built-in primitive types), Boolean, Number, Math, Date, RegExp, and several Error object types.

ECMAScript in VoiceXML Documents

Fig.12-20.vxml presents ECMAScript embedded into the "forward_mail" subdialog.
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
                          http://www.w3.org/TR/voicexml20/vxml.xsd"
      version="2.0">
  <form id="forward_mail">
    <!-- two parameters related to the original mail passed from the mail_service -->
    <var name="subject"/>
    <var name="old_message"/>
    <!-- two variables collected by the "compose" subdialog -->
    <var name="to_name"/>
    <var name="message"/>
    <subdialog name="compose" src="compose.vxml">
      <filled>
        <!-- The "compose" subdialog returns its status and two variables below.
             The status and other variables must be specified
             in the "return" element of the "compose" -->
        <if cond="compose.status == 'OK'">
          <assign name="to_name" expr="compose.to_name"/>
          <assign name="message" expr="compose.message"/>
          <!-- use ECMAScript to prepare subject and body fields -->
          <script>
            <![CDATA[
              subject = 'FW: ' + subject;
              message = message + '\n----- Original message ----\n' + old_message;
            ]]>
          </script>
          <!-- return all data -->
          <return namelist="to_name subject message"/>
        <else/>
          Sorry, the system cannot deliver the message.
          <exit/>
        </if>
      </filled>
    </subdialog>
  </form>
</vxml>

[Fig.12-20]

Several lines of ECMAScript give the final touch to the "forward_mail" dialog. The subject of the forwarded message will start with "FW:" and the body of the message will include not only the current message provided by the user, but also the original message that the user wants to forward to another recipient.

Grammar rules

According to the VoiceXML 2.0 Specification, platforms must support the XML Form and should support the Augmented BNF (ABNF) Form of the W3C Speech Recognition Grammar Specification (SRGS), although VoiceXML platforms may choose to support grammar formats other than SRGS. The <grammar> element may specify an inline grammar or an external grammar. Fig.12-21 demonstrates an example of an inline grammar.
<grammar mode="voice" xml:lang="en-US" version="1.0" root="course">
  <!-- Selection of one of the training courses -->
  <rule id="course" scope="public">
    <one-of>
      <item> Java Introduction </item>
      <item> Advanced Java </item>
      <item> Wireless Introduction </item>
      <item> Java Microedition </item>
      <item> Speech Technologies </item>
      <item> Ontology Introduction </item>
      <item> Integration Technologies </item>
      <item> Knowledge and Service Integration </item>
      <item> Natural User Interface </item>
    </one-of>
  </rule>
</grammar>

[Fig.12-21]

This simple example provides inline grammar rules for the selection of one of many items. In a similar manner, VoiceXML allows developers to provide DTMF grammar rules.

<grammar mode="dtmf" weight="0.3"
         src="http://javaschool.com/school/public/speech/vxml/dtmf.number"/>

The grammar above references an external DTMF grammar file. The extract below shows inline DTMF grammar rules.

<grammar mode="dtmf" version="1.0" root="root">
  <rule id="root" scope="public">
    <one-of>
      <item> 1 2 3 </item>
      <item> # </item>
    </one-of>
  </rule>
</grammar>

The VoiceXML interpreter evaluates its own performance. The application.lastresult$ variable holds information about the last recognition:

application.lastresult$[i].confidence can vary from 0.0 to 1.0. A value of 0.0 indicates minimum confidence.
application.lastresult$[i].utterance keeps the raw string of words (or digits for DTMF) that were recognized for this interpretation.
application.lastresult$[i].inputmode stores the last mode value (dtmf or voice).
application.lastresult$[i].interpretation contains the last interpretation result.

This self-evaluation feature can be used to provide an additional confirmation prompt when necessary.
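The confidence check described above can be tried in plain ECMAScript. This is a minimal sketch under stated assumptions: the lastresult array is a hand-made stand-in for application.lastresult$ (assumed to list interpretations best first), and the helper names, sample utterances, and threshold values are hypothetical.

```javascript
// Hypothetical stand-in for application.lastresult$: an array of
// interpretations, best first, each with the four properties named above.
const lastresult = [
  { confidence: 0.62, utterance: 'advanced java', inputmode: 'voice', interpretation: 'Advanced Java' },
  { confidence: 0.31, utterance: 'java introduction', inputmode: 'voice', interpretation: 'Java Introduction' },
];

// The threshold is an assumed, application-chosen value between 0.0 and 1.0.
function needsConfirmation(results, threshold) {
  if (results.length === 0) {
    return true;                          // nothing recognized: ask again
  }
  return results[0].confidence < threshold;
}

// Builds the text of an additional confirmation prompt from the raw utterance.
function confirmationPrompt(results) {
  return 'Did you say ' + results[0].utterance + '?';
}

if (needsConfirmation(lastresult, 0.7)) {
  console.log(confirmationPrompt(lastresult));
}
```

A lower threshold means fewer confirmation prompts but more silently accepted recognition errors, so the value is a tuning decision for each application.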

In VoiceXML markup, such a check can take the following form:

<if cond="application.lastresult$.confidence &lt; 0.7">
  <goto nextitem="confirmationdialog"/>
<else/>
  ...
</if>

Resources and Caching

A VoiceXML interpreter fetches VoiceXML documents and other resources, such as audio files, grammars, scripts, and objects, using powerful caching mechanisms. Unlike a visual browser, a VoiceXML interpreter lacks end user controls for cache refresh, which is controlled only through appropriate use of the maxage and maxstale attributes in vxml documents.

The maxage attribute indicates that the document is willing to use content whose age is no greater than the specified time in seconds. If the maxstale attribute is assigned a value, then the document is willing to accept content that has exceeded its expiration time by no more than the specified number of seconds.

Metadata

VoiceXML does not require metadata information. However, it provides two elements in which metadata information can be expressed: <meta> and <metadata>, with the recommendation that metadata is expressed using the <metadata> element, with information in Resource Description Framework (RDF).

Similarly to HTML, the <meta> element can contain a metadata property of the document expressed by the pair of attributes, name and content.

<meta name="generator" content="http://JavaSchool.com"/>

The <meta> element can also specify HTTP response headers with the http-equiv and content attributes.

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>

A VoiceXML document can include the <metadata> element using the Dublin Core version 1.0 RDF schema [15]. Fig.12-22.vxml provides an example of a VoiceXML document with the <metadata> element.
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0"
      xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
                          http://www.w3.org/TR/voicexml20/vxml.xsd">
  <metadata>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#"
        xmlns:dc="http://purl.org/metadata/dublin_core#">
      <!-- Metadata about the VoiceXML document -->
      <rdf:Description
          about="http://javaschool.com/school/public/speech/vxml/training.vxml"
          dc:Title="Training Courses"
          dc:Description="Training Courses List"
          dc:Publisher="ITS"
          dc:Language="en"
          dc:Date="2003-05-05"
          dc:Rights="Copyright 2003 Jeff Zhuk"
          dc:Format="application/voicexml+xml">
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <form id="training">
    <field name="course">
      <grammar type="application/srgs+xml" src="/grammars/training.grxml"/>
      <prompt>
        Which course do you want to take? Here is the list of courses:
        <!-- list of courses offered -->
      </prompt>
      <!-- conditional logic runs once the field is filled -->
      <filled>
        <if cond="course == 'operator'">
          <goto next="http://javaschool.com/school/public/speech/vxml/operator.vxml"/>
        </if>
      </filled>
      <noinput>
        I could not hear you.
        <reprompt/>
      </noinput>
      <nomatch count="1">
        Please select any Java, Wireless, or Ontology course from the list.
        <reprompt/>
      </nomatch>
      <nomatch count="2">
        <prompt>
          I am sorry, we have so many types of training courses but not this one.
          I would recommend you to start with the Ontology Introduction course at this time.
        </prompt>
      </nomatch>
      <nomatch count="3">
        I switch you to the operator. Hopefully you will find the course you want. Good luck.
        <goto next="http://javaschool.com/school/public/speech/vxml/operator.vxml"/>
      </nomatch>
    </field>
    <block>
      <submit next="http://javaschool.com/school/public/speech/vxml/training.jsp"/>
    </block>
  </form>
</vxml>

[Fig.12-22]

The <metadata> element provides hidden (and silent) information about the document, which nonetheless serves (or will serve) an extremely important role in the interconnected world. This information feeds search engines and helps end users find the document. The metadata element ends our voyage into VoiceXML technology, and also ends this chapter.

Summary

This chapter reviewed voice technologies, speech synthesis and recognition, related standards, and some implementations. VoiceXML-based technology is the most mature, and is prime-time ready for what it was designed for: telephony applications that offer menu-driven voice dialogs that eventually lead to services.

Data communications is growing, and wireless devices will begin to exchange more data packets outside than inside of the telephony world. At that point, the lightness and multi-modality of SALT will make it a stronger competitor.

Neither of these technologies was designed for a natural language user interface. One common limitation is the grammar rules standard defined by the SRGS. Such grammar rules fit perfectly into the multiple-choice world, but have no room for the thoughtful process of understanding.
Integrating Questions

What are the common features of speech application architectures?
What role does XML play in speech applications?

Case Study

1. Create a SALT file, similar to Fig.12-11.xml, that is related to a book order.
2. Create a grammar file to support ordering a book.
3. Describe an application at your workplace that can benefit from speech technology.

References

1. The Java Speech API - http://java.sun.com/products/java-media/speech
2. The Java Speech API Markup Language (JSML) - http://java.sun.com/products/java-media/speech/forDevelopers/JSML
3. Speech Synthesis Markup Language - http://www.w3.org/TR/speech-synthesis/
4. The Java Speech Grammar Format (JSGF) Specification - http://java.sun.com/products/java-media/speech/forDevelopers/JSGF
5. Sphinx, open source speech recognition project - http://sourceforge.net/projects/cmusphinx/
6. Microsoft Speech SDK - http://download.microsoft.com/download/speechSDK/
7. Rene Nyffenegger, A C++ Socket Class for Windows, online, Internet, Dec. 20, 2002 - http://www.adp-gmbh.ch/win/misc/sockets.html
8. Speech Application Language Tags (SALT) Technical White Paper, online, SALTforum, Internet, 01/20/2002 - http://www.saltforum.org/spec.asp
9. VoiceXML - www.voicexml.org/spec.html
10. Speech Recognition Grammar Standard (SRGS) - http://www.w3.org/TR/speech-grammar
11. Natural Language Semantics Markup Language - http://www.w3.org/TR/nl-spec/
12. Call Control XML - http://www.w3.org/TR/ccxml/
13. Standard ECMA-262, ECMAScript Language Specification - http://www.ecma-international.org/
14. Ungar, David, and Smith, Randall B. Self: The Power of Simplicity. OOPSLA '87 Conference Proceedings, pp. 227-241, Orlando, FL, October 1987
15. "Dublin Core Metadata Initiative", a Simple Content Description Model for Electronic Resources - http://purl.org/DC/
