Applied Soft Computing 27 (2015) 11–27

Contents lists available at ScienceDirect

Applied Soft Computing

journal homepage: http://wendang.chazidian.com/locate/aso

A PSO-AB classifier for solving sequence classification problems

Chieh-Yuan Tsai a,b,*, Chih-Jung Chen a

a Department of Industrial Engineering and Management, Yuan-Ze University, Taiwan
b Innovation Center for Big Data and Digital Convergence, Yuan-Ze University, Taiwan

Article info

Article history:
Received 8 September 2013
Received in revised form 27 July 2014
Accepted 21 October 2014
Available online 30 October 2014

Keywords:
Sequence classification
Closed mining algorithm
Particle swarm optimization (PSO) algorithm
Adaptive boosting (AdaBoost)

Abstract

Recently, considerable attention has focused on compound sequence classification methods which integrate multiple data mining techniques. Among these methods, sequential pattern mining (SPM) based sequence classifiers are considered to be efficient for solving complex sequence classification problems. Although previous studies have demonstrated the strength of SPM-based sequence classification methods, the challenges of pattern redundancy, inappropriate sequence similarity measures, and hard-to-classify sequences remain unsolved. This paper proposes an efficient two-stage SPM-based sequence classification method to address these three problems. In the first stage, during the sequential pattern mining process, a sequential pattern is identified as redundant if it is a sub-sequence of other sequential patterns. A list of compact sequential patterns, excluding redundant patterns, is generated and used as representative features for the second stage. In the second stage, a sequence similarity measurement is used to evaluate partial similarity between sequences and patterns. Finally, a particle swarm optimization-AdaBoost (PSO-AB) sequence classifier is developed to improve sequence classification accuracy. In the PSO-AB sequence classifier, the PSO algorithm is used to optimize the weights in the individual sequence classifier, while the AdaBoost strategy is used to adaptively change the distribution of patterns that are hard to classify. The experiments show that the proposed two-stage SPM-based sequence classification method is efficient and superior to other approaches.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

The rapid development of computer and Internet technologies has allowed for the collection of huge amounts of sequence data in many fields. In bioinformatics, DNA, RNA and proteins are composed of sequences of molecule segments. In information retrieval, documents are composed of sequences of words. In economics, [...]. Network intrusion detection features sequences of TCP/IP packets. A sequence may represent a specific function, target, or class label. For example, a time series of ECG data may come from a healthy or ill person. A DNA sequence may belong to a gene coding area or a non-coding area. A sequence classification problem, therefore, seeks to assign the most probable class label to a given sequence by a generative sequence classifier. Many real-world applications such as protein classification [1–5], text classification [6–8], speech recognition [9–11] and image identification [12,13] belong to this domain.

* Corresponding author at: Department of Industrial Engineering and Management, Yuan-Ze University, Taiwan.

E-mail address: cytsai@saturn.yzu.edu.tw (C.-Y. Tsai).

Compound sequence classification methods seek to maximize classification accuracy by integrating multiple data mining techniques. Among these methods, sequential pattern mining (SPM) based sequence classifiers are considered to be efficient for solving complex sequence classification problems [14–17]. Typically, an SPM-based sequence classifier consists of two stages. The first stage applies the sequential pattern mining approach to extract frequent sequential patterns from a large database. The extracted sequential patterns, considered as representative features, are then used to build the classification model in the second stage.

Although previous studies have shown the strength of SPM-based sequence classification methods, these methods suffer from three problems. First, previous studies have simply taken all sequential patterns extracted from the first stage as the input features of the sequence classifier in the second stage. Although this approach is straightforward and easy to implement, a large number of extracted sequential patterns may result in impractically long training time. Moreover, redundant and non-discriminative patterns involved in the feature set might significantly degrade the classification performance. For example, in previous studies, if sequential patterns A–B–C, A–B–D, B–C–D, and A–B–C–D are derived in the first stage, they will all be considered as input features in the second stage. However, it is clear that A–B–C, A–B–D, and B–C–D are subsequences of A–B–C–D. If all four sequences are considered as features, feature redundancy might over-fit the sequence classifier and thus degrade classification accuracy [18]. Therefore, screening redundant sequential patterns and reducing the number of representative features is an important consideration in SPM-based sequence classification methods.

http://wendang.chazidian.com/10.1016/j.asoc.2014.10.029
1568-4946/© 2014 Elsevier B.V. All rights reserved.

Second, sequence similarity measurement plays a critical role in judging the distance between a sequence and a sequential pattern. Most previous studies incorporated gap constraints on consecutive sequence elements to determine sequence distinction. If a pattern is contained in a sequence, the similarity between the pattern and the sequence is 1; otherwise, it is 0. For example, if the gap is set as 1, the similarity between sequence A–B–C and pattern A–B is 1, while the similarity between sequence A–C–B and pattern A–B is 0. If the gap is set as 2, the similarity between sequence A–C–D–B and pattern A–B is 0, while the similarity between A–B–C–D (or A–C–B–D) and A–B is 1. However, this similarity measure cannot reveal partial matching between a sequence and a pattern, since “match” and “non-match” are the only options. In addition, it is difficult to determine an appropriate gap value with this approach.
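The binary, gap-constrained matching described above can be sketched as follows. This is a minimal illustration, assuming sequences of single items and assuming that a gap of g means consecutive matched elements lie at most g positions apart, which is the reading that reproduces the four examples in the text. The function name `gap_match` is hypothetical and not taken from the paper.

```python
def gap_match(sequence, pattern, max_gap):
    """Return 1 if `pattern` occurs in `sequence` with consecutive matched
    positions at most `max_gap` apart, and 0 otherwise (binary similarity)."""

    def extend(pos, k):
        # pattern[k - 1] was matched at index `pos`; try to match pattern[k].
        if k == len(pattern):
            return True
        last = min(pos + max_gap, len(sequence) - 1)
        for nxt in range(pos + 1, last + 1):
            if sequence[nxt] == pattern[k] and extend(nxt, k + 1):
                return True
        return False

    if not pattern:
        return 1
    starts = (i for i, item in enumerate(sequence) if item == pattern[0])
    return int(any(extend(i, 1) for i in starts))


# The four cases discussed in the text.
print(gap_match("ABC", "AB", 1), gap_match("ACB", "AB", 1))
print(gap_match("ACDB", "AB", 2), gap_match("ACBD", "AB", 2))
```

The result is always 0 or 1, which is exactly the limitation the paragraph points out: a sequence that almost matches a pattern scores the same as one that shares nothing with it.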

Third, sequential patterns extracted in the first stage are considered as the representative features of the classification model in the second stage. However, not all sequential patterns are equally important. Previous works [1,19] solved this problem by assigning weights to sequential patterns, where the weights are adaptively adjusted by optimization techniques. Although this approach succeeded in raising sequence classification accuracy, the models tend to over-fit [19] and were insufficiently sensitive to minority patterns, thus often resulting in incorrect class label predictions. Therefore, an iterative procedure that adaptively changes the distribution of training data by focusing more on previously misclassified minority patterns should help improve sequence classification accuracy.

To solve the above difficulties, an efficient sequential pattern mining-based sequence classification method is proposed. In the first stage, during the sequential pattern mining process, a sequential pattern is identified as redundant if it is a sub-sequence of other sequential patterns. A list of compact sequential patterns (excluding redundant patterns) is generated and used as representative features in the second stage. The second stage uses a sequence similarity measurement which can evaluate partial matching between sequences and patterns. Finally, a PSO-AdaBoost sequence classifier is developed to improve sequence classification accuracy. In the PSO-AdaBoost sequence classifier, the particle swarm optimization (PSO) algorithm is used to optimize the weights in the individual sequence classifier, while the AdaBoost strategy combines the individual classifiers into an ensemble that focuses on patterns that are hard to classify.

2. Literature review

The sequence classification problem is to assign the most probable class label to a given sequence by a generative classifier, and it arises in many real-world applications, such as protein function prediction, text classification and speech recognition. In protein function prediction research, a large amount of sequence data is classified into various categories corresponding to their role in the chromosomes, their structure, and/or their function [20]. De Souza Rodrigues et al. presented a methodology based on artificial neural networks for protein functional classification [21]. The research presents a new protein coding scheme, called Extended-Sequence Coding by Sliding Windows, to overcome some of the difficulties of the well-known Sequence Coding by Sliding Window method. Shi and Zhang presented the 6-state hidden Markov model (HMM), which holds fewer states, clear transition groups and fewer model parameters [22]. Considering the hierarchical structure of proteins based on the 6-state HMM, they proposed using the hierarchical hidden Markov model (HHMM), which not only maintains a clear biological meaning but also has fewer transitions. Text classification is the task of automatically sorting a set of documents into predefined classes. Wang et al. automatically constructed a thesaurus of concepts from Wikipedia [23]. They then introduced a unified framework to expand the “Bag of Words” (BOW) representation with semantic relations (synonymy, hyponymy, and associative relations), and demonstrated its efficacy in enhancing previous approaches for text classification. Zuo et al. proposed a new text classification model based on Markov network distance [24]. They represented the documents and categories using Markov networks to model relevant information in the documents. In speech recognition, Smaragdis and Raj extended non-negative representations of spectrograms to design a Markov selection model that can recognize sequences even when they are mixed together and without signal separation [25]. Lipeika addressed formant features in dynamic time warping-based speech recognition [26]. These features can be simply visualized and provide new insight into the causes of speech recognition errors. The features optimize the formant feature-based isolated word recognition performance by varying the recognition system’s processing parameters while identifying potential improvements to the recognition system. Yang et al. proposed a new image denoising scheme using support vector machine (SVM) classification in the shiftable complex directional pyramid (PDTDFB) domain [12]. The detail subbands of PDTDFB coefficients are denoised by using different parameters to control the multiscale and multidirectional anisotropic diffusion. Awad and Motai used the SVM technique to present dynamic classification as a new incremental framework for multiple-classifying video stream data [13]. This dynamic approach leads to an extension of SVM beyond its current static image-based learning capabilities.

Recently, some studies have used sequential pattern mining techniques to enhance computational efficiency for the sequence classification problem. Sequential pattern mining can find frequent sequential patterns within a large database. The extracted sequential patterns are considered to be important features and are used to build the classification model. Lesh et al. proposed an algorithm for sequence classification using frequent patterns as features in the classifier [15]. In their algorithm, subsequences are extracted and transformed into sets of features. Following feature extraction, general classification algorithms such as Naïve Bayes, SVM or neural networks can be used for classification. Lesh et al. also proposed a scalable feature mining algorithm to act as the preprocessor to select features for standard classification algorithms such as Winnow and Naïve Bayes [16]. By adapting scalable and disk-based data mining algorithms, they were able to classify the sequences efficiently. Exarchos et al. proposed a novel classification method for biological data, using cSPADE (Sequential PAttern Discovery using Equivalence classes) to analyze protein sequences [1]. cSPADE was used to extract the sequential patterns that characterize each class (protein fold). In addition, a classifier uses the extracted sequential patterns to classify proteins in the appropriate fold category. Exarchos et al. presented a novel methodology for sequence classification, based on sequential pattern mining and optimization algorithms [14]. The sequential pattern mining algorithm is applied to extract sequential patterns from a set of sequences. The score of every pattern with respect to each sequence is then calculated using a scoring function, and the score of each class under consideration is estimated by summing the specific pattern scores. Each score is updated and multiplied by a weight. The optimization technique is employed to estimate the weight values and achieve optimal classification accuracy. Tsai et al. took customer temporal behavior data, called time-interval sequences, as classification criteria and developed a two-stage classification framework [19]. In the first stage, time-interval sequential patterns are discovered from customer temporal databases. A time-interval sequence classifier optimized by the particle swarm optimization (PSO) algorithm was then developed to achieve high classification accuracy in the second stage.

Fig. 1. The proposed sequential pattern mining-based sequence classification method.

3. The sequential pattern mining-based sequence classification method

The proposed sequential pattern mining-based sequence classification method is illustrated in Fig. 1. Initially, sequences in the Sequence Database, SD, are divided into a training sequence dataset, TrainSD, and a testing sequence dataset, TestSD. In the first stage, sequences in TrainSD are input to the compact sequential pattern mining method to generate a set of compact sequential patterns, CSP. In the second stage, the boosting mechanism in the particle swarm optimization-AdaBoost (PSO-AB) sequence classification method adaptively changes the weight of each compact sequential pattern in CSP. In the $k$th round, a sample training pattern set, $STP_k$, is obtained according to the weight of each pattern in CSP. A PSO-based sequence classifier, called $\mathrm{PSOSeqClassifier}_k$, is thus built based on $STP_k$ and TrainSD, and tested using TestSD. In the $\mathrm{PSOSeqClassifier}_k$, a partial matching similarity measurement is developed to calculate the similarity value between a sequence and a pattern. Furthermore, particle swarm optimization (PSO) is used to update the weights in the $\mathrm{PSOSeqClassifier}_k$ to maximize the accuracy of the classification result.

3.1. Definition

A set of symbols and notations is defined as follows. A sequence database is $SD = \{\langle S_i, c_i \rangle \mid i = 1, \ldots, q\}$, where $S_i$ is a sequence and $c_i$ is a class label ($c_i \in \{1, 2, \ldots, n\}$). Let $I = \{i_1, i_2, \ldots, i_p\}$ be a set of items. A sequence $S$ is an ordered list of itemsets, denoted as $\langle s_1, s_2, \ldots, s_m \rangle$, where $s_j$ is an itemset and $s_j \subseteq I$. The length of a sequence corresponds to the number of itemsets in the sequence, while a $k$-sequence is a sequence that contains $k$ itemsets. Let sequence $\alpha$ be $\langle a_1, a_2, \ldots, a_n \rangle$ and $\beta$ be $\langle b_1, b_2, \ldots, b_m \rangle$. $\alpha$ is a subsequence of $\beta$ if there exist integers $1 \le i_1 < i_2 < \cdots < i_n \le m$ such that $a_1 \subseteq b_{i_1}, a_2 \subseteq b_{i_2}, \ldots, a_n \subseteq b_{i_n}$. Alternatively, $\beta$ is called a super-sequence of $\alpha$, or $\beta$ contains $\alpha$. The support of a sequence $\alpha$ is defined as the fraction of all sequences that contain $\alpha$. If the support of $\alpha$ is greater than or equal to a user-specified threshold min_sup, $\alpha$ is declared to be a sequential pattern (or a frequent sequence).

3.2. Compact sequential pattern mining

Previous sequential pattern mining algorithms emphasize the order of occurrence and generate the complete set of frequent sequential patterns satisfying a min_sup threshold. In fact, the complete mining strategy often generates a huge number of patterns, especially when min_sup is low. This is because if a pattern is frequent, each of its sub-patterns is frequent as well. For example, there are five sequences in the database as shown in Fig. 2(a). If min_sup is set as 2, nine sequential patterns as shown in Fig. 2(b) can be derived if the complete mining strategy is applied. It is clear that $\langle a, b \rangle$, $\langle a, c \rangle$ and $\langle b, c \rangle$ are sub-patterns of $\langle a, b, c \rangle$, while $\langle a, d \rangle$ and $\langle c, d \rangle$ are sub-patterns of $\langle a, c, d \rangle$. If the five sub-patterns are removed, the sequential patterns will be reduced to four as shown in Fig. 2(c), which is a much more compact solution. In fact, the four sequential patterns can still represent the important features in Fig. 2(a). Based on the above concept, if a pattern $\alpha$ is a sub-pattern of pattern $\beta$, pattern $\alpha$ is defined as a redundant pattern in this study.
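The notions of subsequence, support and redundant pattern can be made concrete with a short sketch. It assumes sequences are represented as tuples of `frozenset` itemsets; the helper names (`seq`, `is_subsequence`, `support`, `remove_redundant`) are hypothetical. Because the contents of Fig. 2 are not available here, the example uses only the seven patterns named in the text, not the full nine.

```python
def seq(*itemsets):
    """Build a sequence: seq("a", "bc") is the sequence <{a}, {b, c}>."""
    return tuple(frozenset(s) for s in itemsets)


def is_subsequence(alpha, beta):
    """True if every itemset of alpha is contained, in order, in some itemset of beta."""
    j = 0
    for a in alpha:
        # Greedily take the earliest itemset of beta that contains a.
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1
    return True


def support(pattern, database):
    """Fraction of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


def remove_redundant(patterns):
    """Drop every pattern that is a sub-pattern of a different pattern in the list."""
    return [p for p in patterns
            if not any(p != q and is_subsequence(p, q) for q in patterns)]


named = [seq("a", "b"), seq("a", "c"), seq("b", "c"), seq("a", "b", "c"),
         seq("a", "d"), seq("c", "d"), seq("a", "c", "d")]
compact = remove_redundant(named)  # keeps <a,b,c> and <a,c,d>
```

The quadratic scan in `remove_redundant` is only meant to illustrate the definition. The paper itself obtains compact patterns during mining with the Closed Mining algorithm rather than by filtering afterwards.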

Fig. 2. Frequent sequential patterns with min_sup = 40%.

To generate compact sequential patterns, the Closed Mining algorithm developed by Yan et al. is used in the first stage of the proposed method [27]. A sequential pattern $s$ is closed if there exists no super-pattern of $s$ with the same support in the database. The Closed Mining algorithm produces significantly fewer frequent patterns than the traditional complete sequence mining methods while preserving the same expressive power, since the whole set of frequent subsequences, together with their supports, can be easily derived from the mining result. In the Closed Mining algorithm, the itemset-extension and sequence-extension are used to extend every subsequence. Given two sequences, $s = \langle t_1, \ldots, t_m \rangle$ and $p = \langle t'_1, \ldots, t'_n \rangle$, $s \diamond p$ means $s$ concatenates with $p$. The itemset-extension adds $p$ to the last itemset of $s$, obtaining $s \diamond_i p = \langle t_1, \ldots, t_m \cup t'_1, \ldots, t'_n \rangle$ if $\forall k \in t_m, j \in t'_1$, $k < j$. The sequence-extension appends the sequence $p$ to $s$, obtaining $s \diamond_s p = \langle t_1, \ldots, t_m, t'_1, \ldots, t'_n \rangle$.

The pseudocode of the Closed Mining algorithm is shown in Fig. 3. The input to the algorithm is the training sequence dataset TrainSD and a user-specified threshold min_sup, while the output is the set of compact sequential patterns CSP. The algorithm first sorts every itemset and removes infrequent items, as shown in line 1. All frequent 1-item sequences are then stored in $S_1$ and are input to the subroutine CloSpan, which generates a superset of the closed frequent sequences. Finally, the non-closed sequences are eliminated from CSP, as shown in line 5. The CloSpan subroutine first scans $D$ once to find every frequent item $\alpha$ that can either be assembled into the last element of $s$ or appended to $s$ to form a sequential pattern, as shown in line 6. Lines 7–8 show the termination condition: when the number of sequences in the $s$-projected database is less than min_sup, there is no need to extend $s$ further. For each sequence $s$ and its projected database, it recursively performs itemset-extension (line 10) and sequence-extension (line 12) until all the frequent sequences are discovered.
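The two extension operations used by the algorithm can be sketched as below. This is only an illustration of the itemset-extension and sequence-extension definitions, not of the full CloSpan search with projected databases and pruning. Sequences are assumed to be tuples of `frozenset` itemsets with comparable items, and the function names are hypothetical.

```python
def itemset_extension(s, p):
    """Merge the first itemset of p into the last itemset of s.

    Allowed only if every item in the last itemset of s is smaller than
    every item in the first itemset of p.
    """
    last, first = s[-1], p[0]
    if not all(k < j for k in last for j in first):
        raise ValueError("items of p's first itemset must follow those of s's last itemset")
    return s[:-1] + (last | first,) + p[1:]


def sequence_extension(s, p):
    """Append the itemsets of p after the itemsets of s."""
    return s + p


s = (frozenset("a"), frozenset("b"))
p = (frozenset("c"), frozenset("d"))
by_itemset = itemset_extension(s, p)    # <{a}, {b, c}, {d}>
by_sequence = sequence_extension(s, p)  # <{a}, {b}, {c}, {d}>
```

The itemset-extension keeps the number of itemsets of `s` and grows its last itemset, whereas the sequence-extension lengthens the sequence. Applying both recursively is what lets the search reach every candidate pattern.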

3.3. PSO-AB sequence classification method

After completing compact sequential pattern mining in the first stage, the extracted sequential patterns in CSP are considered to be important features representing the sequences in TrainSD. However, some patterns are hard to classify since they are in the minority and irregular. The particle swarm optimization-AdaBoost (PSO-AB) sequence classification method is proposed to solve this hard-to-classify pattern problem. The boosting mechanism adaptively increases the weights of hard-to-classify patterns so that these patterns have more opportunities to be selected for building PSO-based sequence classifiers, referred to as PSOSeqClassifier. In each PSOSeqClassifier, PSO is used to optimize two sets of weights (one for patterns and one for classes) to maximize the classification accuracy of the PSOSeqClassifier.
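As a rough illustration of how PSO could search for such weights, the sketch below implements a generic global-best PSO that maximizes an arbitrary fitness function. The velocity update, the parameter values (`w`, `c1`, `c2`), the swarm size, the initial range and the toy fitness are all assumptions for illustration; this excerpt does not specify the particle encoding or the parameters used by the authors. In the proposed method the fitness would presumably be the classification accuracy obtained with a candidate vector of pattern and class weights.

```python
import random


def pso_maximize(fitness, dim, n_particles=20, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic global-best PSO; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    # Positions start uniformly in [0, 1]; velocities start at zero.
    pos = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Toy fitness with a known optimum at (0.5, 0.5, 0.5), standing in for accuracy.
toy_fitness = lambda x: -sum((xi - 0.5) ** 2 for xi in x)
best_weights, best_fit = pso_maximize(toy_fitness, dim=3)
```

Each particle is a candidate weight vector, so for a classifier with pattern weights and class weights the dimension would be the total number of weights.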

3.3.1. PSO-AB sequence classification method

The PSO-AB sequence classification method uses the Adaptive Boosting (AdaBoost) mechanism [28]. Let the set of compact sequential patterns be $CSP = \{\langle csp_i, c_i \rangle \mid i = 1, \ldots, m\}$, where $csp_i$ denotes the $i$th compact sequential pattern and $c_i$ is the class label associated with $csp_i$, with $c_i \in \{1, 2, \ldots, n\}$. In addition, the set of weights for the corresponding patterns is $\{w_1, w_2, \ldots, w_m\}$. Initially, equal weights (i.e. $1/m$) are assigned to all compact sequential patterns so that they have the same probability of being chosen for training. The number of boosting rounds, $N$, is determined by the user. In each round, a sample training pattern set ($STP_k$) is obtained according to the weights of the compact sequential patterns. Based on the sample training patterns in $STP_k$, a $\mathrm{PSOSeqClassifier}_k$ is built (see Section 3.3.2 for details). The error rate of each classifier $\mathrm{PSOSeqClassifier}_k$ is calculated as:

$$\varepsilon_k = \frac{1}{m}\left[\sum_{i=1}^{m} w_i \times I\big(\mathrm{PSOSeqClassifier}_k(csp_i) \neq c_i\big)\right] \tag{1}$$

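where $I(\cdot)$ is an indicator function that returns the value 1 if its argument is true and 0 otherwise. If $\varepsilon_k$ is larger than 0.5, the algorithm resets the weights of all examples to $1/m$ and goes back to the beginning of the boosting stage. Otherwise, the importance of a $\mathrm{PSOSeqClassifier}_k$ is given by

$$\alpha_k = \frac{1}{2}\ln\left(\frac{1-\varepsilon_k}{\varepsilon_k}\right) \tag{2}$$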

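Note that $\alpha_k$ has a large positive value if the error rate is close to 0 and a large negative value if the error rate is close to 1. $\alpha_k$ is also used to update the weight of each sequence. Let $w_i^{(k)}$ denote the weight assigned to example $\langle csp_i, c_i \rangle$ during the $k$th boosting round. The weights are updated by:

$$w_i^{(k+1)} = \frac{w_i^{(k)}}{Z_k} \times \begin{cases} \exp(-\alpha_k) & \text{if } \mathrm{PSOSeqClassifier}_k(csp_i) = c_i \\ \exp(\alpha_k) & \text{if } \mathrm{PSOSeqClassifier}_k(csp_i) \neq c_i \end{cases} \tag{3}$$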

where $Z_k$ is the normalization factor used to ensure that $\sum_i w_i^{(k+1)} = 1$. The PSO-AB sequence classification method is summarized in Fig. 4. The input to the algorithm is the compact sequential patterns in CSP, training sequences in TrainSD, testing sequences in TestSD, and the number of boosting rounds, $N$. The output is the ensemble classifier

$$\text{PSO-AB}(S) = \arg\max_{c} \sum_{k=1}^{N} \alpha_k \, I\big(\mathrm{PSOSeqClassifier}_k(S) = c\big) \tag{4}$$

where $S$ is a new sequence to be classified.
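The boosting quantities in Eqs. (1) to (4) can be sketched in a few lines. This is a minimal illustration in which each base classifier is assumed to be a plain callable returning a class label; all function names are hypothetical. The error rate follows Eq. (1) as printed, including the 1/m factor, and the resampling of $STP_k$ and the reset rule for error rates above 0.5 are left out.

```python
import math


def error_rate(weights, predictions, labels):
    """Eq. (1): (1/m) times the weighted count of misclassified patterns."""
    m = len(weights)
    return sum(w for w, p, c in zip(weights, predictions, labels) if p != c) / m


def importance(eps):
    """Eq. (2): classifier importance; assumes 0 < eps < 1."""
    return 0.5 * math.log((1.0 - eps) / eps)


def update_weights(weights, predictions, labels, alpha):
    """Eq. (3): shrink weights of correct patterns, grow misclassified ones, normalize."""
    raw = [w * math.exp(-alpha if p == c else alpha)
           for w, p, c in zip(weights, predictions, labels)]
    z = sum(raw)  # normalization factor Z_k
    return [r / z for r in raw]


def ensemble_predict(classifiers, alphas, sequence):
    """Eq. (4): class with the largest importance-weighted vote."""
    votes = {}
    for clf, a in zip(classifiers, alphas):
        c = clf(sequence)
        votes[c] = votes.get(c, 0.0) + a
    return max(votes, key=votes.get)


weights = [0.25, 0.25, 0.25, 0.25]
predictions, labels = [1, 1, 2, 2], [1, 1, 2, 1]   # last pattern misclassified
eps = error_rate(weights, predictions, labels)
alpha = importance(eps)
new_weights = update_weights(weights, predictions, labels, alpha)
```

After the update the misclassified pattern carries the largest weight, so it is more likely to be drawn into the next sample training pattern set.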

3.3.2. PSO sequence classifier

Let the sample training patterns $STP_k$ be separated into $n$ subsets $\{TP_1, TP_2, \ldots, TP_n\}$, where the class labels of patterns in $TP_c$ all belong to class $c$. If a new sequence $S$ is similar to patterns in $TP_c$, $S$ will be more likely to be assigned to class $c$. However, not all patterns in $TP_c$ are equally important. Thus, the $j$th pattern $csp_{c,j}$ in $TP_c$ should be assigned an importance weight $pw_{c,j}$ to reflect its own importance. Similarly, each $TP_c$ should be assigned an importance weight $cw_c$ to reflect its influence on the final classification result. Based on the majority voting scheme, the PSO sequence classifier, denoted as PSOSeqClassifier, is modeled as:

$$\mathrm{PSOSeqClassifier}(S) = \arg\max_{c = 1, 2, \ldots, n} \left[ cw_c \times \sum_{j = 1, 2, \ldots, |TP_c|} pw_{c,j} \times Sim(csp_{c,j}, S) \right] \tag{5}$$

where $Sim(csp_{c,j}, S)$ denotes the similarity between the compact sequential pattern $csp_{c,j}$ and sequence $S$.
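The weighted vote in Eq. (5) can be sketched as follows. The similarity function is passed in as a parameter, and the stand-in `toy_sim` used here (1 if the pattern is a subsequence of single items, else 0) is an assumption for illustration only, since the paper uses the partial-matching measure of Eq. (6). The function and argument names are hypothetical.

```python
def pso_seq_classify(sequence, patterns_by_class, pattern_weights, class_weights, sim):
    """Eq. (5): return the class c maximizing cw_c * sum_j pw_{c,j} * Sim(csp_{c,j}, S)."""
    scores = {}
    for c, patterns in patterns_by_class.items():
        total = sum(pw * sim(pat, sequence)
                    for pw, pat in zip(pattern_weights[c], patterns))
        scores[c] = class_weights[c] * total
    return max(scores, key=scores.get)


def toy_sim(pattern, sequence):
    """Stand-in similarity: 1.0 if pattern is a subsequence of sequence, else 0.0."""
    it = iter(sequence)
    return 1.0 if all(item in it for item in pattern) else 0.0


patterns_by_class = {1: [("a", "b")], 2: [("c",)]}
pattern_weights = {1: [1.0], 2: [1.0]}
label = pso_seq_classify(("a", "x", "b"), patterns_by_class, pattern_weights,
                         {1: 1.0, 2: 1.0}, toy_sim)
```

Raising a class weight can change the decision even when the pattern scores stay the same, which is why both weight sets are exposed to the optimizer.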

Fig. 3. Pseudocode of the Closed Mining algorithm.

In this research, the partial matching concept is used to develop the sequence similarity measurement. That is, the similarity between a sequence and a sequential pattern might be a real value between 0 and 1, where 1 indicates they are exactly the same and 0 indicates they have nothing in common. Given sequences $\alpha = \langle a_1, a_2, \ldots, a_p \rangle$ and $\beta = \langle b_1, b_2, \ldots, b_q \rangle$, the sequence similarity between $\alpha$ and $\beta$, $Sim(\alpha, \beta)$, can be represented as:

$$Sim(\alpha, \beta) = \frac{LCS(\alpha, \beta)}{\mathrm{Max}(|\alpha|, |\beta|)} \tag{6}$$

where $LCS(\alpha, \beta)$ is the length of the longest common subsequence, and $|\alpha|$ and $|\beta|$ are the lengths of sequences $\alpha$ and $\beta$, respectively. The length of the longest common subsequence for $\alpha$ and $\beta$ can be formulated as the following recurrence relation [29]:

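$$L[i,j] = \begin{cases} 0 & \text{if either } i = 0 \text{ or } j = 0; \\ L[i-1, j-1] + 1 & \text{if } a_i = b_j; \\ \max(L[i, j-1], L[i-1, j]) + \dfrac{|a_i \cap b_j|}{\mathrm{Max}(|a_i|, |b_j|)} & \text{if } a_i \neq b_j; \end{cases} \tag{7}$$

where $1 \le i \le p$ and $1 \le j \le q$. With this recurrence operation, $LCS(\alpha, \beta)$ can be found in $L[p, q]$. Let us take sequences $\alpha = \langle a, abc, c \rangle$ and $\beta = \langle a, cd, ab, adc, c \rangle$ as an example. Fig. 5 shows the visualization of the evaluation process. At first, the values for all entries with $i = 0$ or $j = 0$ are set as zero, as shown in Fig. 5(a). For $i = 1$ and $j = 1$, $L[1,1] = L[0,0] + 1 = 1$ since the first itemset in $\alpha$ ($a$) is the same as the first itemset in $\beta$ ($a$), as shown in Fig. 5(b). Next, for $i = 1$ and $j = 2$, since itemset $a$ in $\alpha$ is not the same as itemset $cd$ in $\beta$, $L[1,2] = \max(L[1,1], L[0,2]) + 0 = \max(1, 0) + 0 = 1$. For $i = 1$ and $j = 3$, since itemset $a$ in $\alpha$ is not the same as itemset $ab$ in $\beta$, $L[1,3] = \max(L[1,2], L[0,3]) + |a \cap ab| / \mathrm{Max}(|a|, |ab|) = \max(1, 0) + 1/\mathrm{Max}(1, 2) = 1.5$, as shown in Fig. 5(c). The process continues until $L[3,5] = 3.98$ is obtained, as shown in Fig. 5(d). Therefore, the similarity value between $\alpha$ and $\beta$ based on the similarity measurement in Eq. (6) is $3.98/5 = 0.796$.

Let a testing sequence dataset TestSD be $\{\langle TestS_1, testc_1 \rangle, \langle TestS_2, testc_2 \rangle, \ldots\}$, where $TestS_i$ is the $i$th testing sequence and $testc_i$ is the class label associated with $TestS_i$, with $testc_i \in \{1, 2, \ldots, n\}$. The prediction accuracy of the classifier is calculated as:

$$accuracy = \sum_{i = 1, \ldots, |TestSD|} I\big(\mathrm{PSOSeqClassifier}(TestS_i) = testc_i\big) \tag{8}$$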
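The partial-matching similarity of Eqs. (6) and (7) can be sketched as a small dynamic program. The sketch assumes sequences are lists of `frozenset` itemsets, that at least one of the two sequences is non-empty, and it uses exact fractions to avoid rounding effects; the function names are hypothetical.

```python
from fractions import Fraction


def lcs_score(alpha, beta):
    """Fill the table L of Eq. (7) and return L[p][q]."""
    p, q = len(alpha), len(beta)
    L = [[Fraction(0)] * (q + 1) for _ in range(p + 1)]  # row/column 0 stay zero
    for i in range(1, p + 1):
        for j in range(1, q + 1):
            a, b = alpha[i - 1], beta[j - 1]
            if a == b:
                # Identical itemsets extend the diagonal by a full match.
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                # Different itemsets contribute their relative overlap.
                overlap = Fraction(len(a & b), max(len(a), len(b)))
                L[i][j] = max(L[i][j - 1], L[i - 1][j]) + overlap
    return L[p][q]


def sim(alpha, beta):
    """Eq. (6): LCS score divided by the length of the longer sequence."""
    return lcs_score(alpha, beta) / max(len(alpha), len(beta))


alpha = [frozenset("a"), frozenset("abc"), frozenset("c")]
beta = [frozenset("a"), frozenset("cd"), frozenset("ab"), frozenset("adc"), frozenset("c")]
similarity = sim(alpha, beta)
```

For the worked example this sketch reproduces $L[1,3] = 1.5$. With exact fractions it gives $L[3,5] = 4$ and a similarity of 0.8, slightly above the 3.98 and 0.796 quoted in the text; the difference appears consistent with intermediate terms such as 1/3 and 2/3 being truncated to two decimals in the original computation.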
