A-PSO-AB-classifier-for-solving-sequence-classification-problems_2015_Applied-Soft-Computing
Applied Soft Computing 27 (2015) 11–27
Contents lists available at ScienceDirect
Applied Soft Computing
journal homepage: http://wendang.chazidian.com/locate/aso
A PSO-AB classifier for solving sequence classification problems
Chieh-Yuan Tsai a,b,*, Chih-Jung Chen a
a Department of Industrial Engineering and Management, Yuan-Ze University, Taiwan
b Innovation Center for Big Data and Digital Convergence, Yuan-Ze University, Taiwan
Article info
Article history:
Received 8 September 2013
Received in revised form 27 July 2014
Accepted 21 October 2014
Available online 30 October 2014
Keywords:
Sequence classification
Closed mining algorithm
Particle swarm optimization (PSO) algorithm
Adaptive boosting (AdaBoost)
Abstract
Recently, considerable attention has focused on compound sequence classification methods which integrate multiple data mining techniques. Among these methods, sequential pattern mining (SPM) based sequence classifiers are considered to be efficient for solving complex sequence classification problems. Although previous studies have demonstrated the strength of SPM-based sequence classification methods, the challenges of pattern redundancy, inappropriate sequence similarity measures, and hard-to-classify sequences remain unsolved. This paper proposes an efficient two-stage SPM-based sequence classification method to address these three problems. In the first stage, during the sequential pattern mining process, redundant sequential patterns are identified if the pattern is a sub-sequence of other sequential patterns. A list of compact sequential patterns is generated excluding redundant patterns and used as representative features for the second stage. In the second stage, a sequence similarity measurement is used to evaluate partial similarity between sequences and patterns. Finally, a particle swarm optimization-AdaBoost (PSO-AB) sequence classifier is developed to improve sequence classification accuracy. In the PSO-AB sequence classifier, the PSO algorithm is used to optimize the weights in the individual sequence classifier, while the AdaBoost strategy is used to adaptively change the distribution of patterns that are hard to classify. The experiments show that the proposed two-stage SPM-based sequence classification method is efficient and superior to other approaches.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
The rapid development of computer and Internet technologies has allowed for the collection of huge amounts of sequence data in many fields. In bioinformatics, DNA, RNA and proteins are composed of sequences of molecule segments. In information retrieval, documents are composed of sequences of words. In economics, […] network intrusion detection features sequences of TCP/IP packets. A sequence may represent a specific function, target, or class label. For example, a time series of ECG data may come from a healthy or ill person. A DNA sequence may belong to a gene coding area or a non-coding area. A sequence classification problem, therefore, seeks to assign the most probable class label to a given sequence by a generative sequence classifier. Many real-world applications such as protein classification [1–5], text classification [6–8], speech recognition [9–11] and image identification [12,13] belong to this domain.
* Corresponding author at: Department of Industrial Engineering and Management, Yuan-Ze University, Taiwan.
E-mail address: cytsai@saturn.yzu.edu.tw (C.-Y. Tsai).
Compound sequence classification methods seek to maximize classification accuracy by integrating multiple data mining techniques. Among these methods, sequential pattern mining (SPM) based sequence classifiers are considered to be efficient for solving complex sequence classification problems [14–17]. Typically, an SPM-based sequence classifier consists of two stages. The first stage applies the sequential pattern mining approach to extract frequent sequential patterns from a large database. The extracted sequential patterns, considered as representative features, are then used to build the classification model in the second stage.
Although previous studies have shown the strength of the SPM-based sequence classification methods, these methods suffer from three problems. First, previous studies have simply taken all sequential patterns extracted from the first stage as the input features of the sequence classifier in the second stage. Although this approach is straightforward and easy to implement, a large number of extracted sequential patterns may result in impractically long training time. Moreover, redundant and non-discriminative patterns involved in the feature set might significantly degrade the classification performance. For example, in previous studies, if sequential patterns A–B–C, A–B–D, B–C–D, and A–B–C–D are derived in the first stage, they will all be considered as input features in the second stage. However, it is clear that A–B–C, A–B–D,
http://wendang.chazidian.com/10.1016/j.asoc.2014.10.029
1568-4946/© 2014 Elsevier B.V. All rights reserved.
C.-Y. Tsai, C.-J. Chen / Applied Soft Computing 27 (2015) 11–27
and B–C–D are the subsequences of A–B–C–D. If all four sequences are considered as features, feature redundancy might over-fit the sequence classifier and thus degrade classification accuracy [18]. Therefore, screening redundant sequential patterns and reducing the number of representative features is an important consideration in SPM-based sequence classification methods.
Second, sequence similarity measurement plays a critical role in judging the distance between a sequence and a sequential pattern. Most previous studies incorporated gap constraints on consecutive sequence elements to determine sequence distinction. If a pattern is contained in a sequence, the similarity between the pattern and the sequence is 1; otherwise, it is 0. For example, if the gap is set as 1, the similarity between sequence A–B–C and pattern A–B is 1, while the similarity between sequence A–C–B and pattern A–B is 0. If the gap is set as 2, the similarity between sequence A–C–D–B and pattern A–B is 0, while the similarity between A–B–C–D (or A–C–B–D) and A–B is 1. However, this similarity measure cannot reveal partial matching between a sequence and a pattern, since “match” and “non-match” are the only options. In addition, using this approach it is difficult to determine an appropriate gap value.
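The binary, gap-constrained matching described above can be sketched as follows. This is only an illustration under an assumed reading of the gap value (the positions of two consecutive matched elements may differ by at most `gap`), which is consistent with the four examples in the text; the function name `gap_match` is hypothetical and does not come from the cited studies.

```python
def gap_match(sequence, pattern, gap):
    """Return 1 if `pattern` occurs in `sequence` with consecutive matched
    positions at most `gap` apart, else 0 (assumed reading of the gap)."""
    if not pattern:
        return 1

    def extend(pos, k):
        # pos: index in `sequence` where pattern[k - 1] was matched
        if k == len(pattern):
            return True
        last = min(pos + gap, len(sequence) - 1)
        for nxt in range(pos + 1, last + 1):
            if sequence[nxt] == pattern[k] and extend(nxt, k + 1):
                return True
        return False

    starts = (i for i, item in enumerate(sequence) if item == pattern[0])
    return int(any(extend(i, 1) for i in starts))


# The four examples from the text
print(gap_match("ABC", "AB", gap=1))   # A-B-C   vs A-B
print(gap_match("ACB", "AB", gap=1))   # A-C-B   vs A-B
print(gap_match("ACDB", "AB", gap=2))  # A-C-D-B vs A-B
print(gap_match("ACBD", "AB", gap=2))  # A-C-B-D vs A-B
```

The result is always 0 or 1, which is exactly the limitation noted above: a sequence that almost contains the pattern scores the same as one that shares nothing with it.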
Third, sequential patterns extracted in the first stage are considered as the representative features of the classification model in the second stage. However, not all sequential patterns are equally important. Previous works [1,19] solved this problem by assigning weights to sequential patterns where the weights are adaptively adjusted by optimization techniques. Although this approach succeeded in raising sequence classification accuracy, the models tend to over-fit [19] and were insufficiently sensitive to minority patterns, thus often resulting in incorrect class label predictions. Therefore, an iterative procedure to adaptively change the distribution of training data by focusing more on previously misclassified minority patterns should help improve sequence classification accuracy.
To solve the above difficulties, an efficient sequential pattern mining-based sequence classification method is proposed. In the first stage, during the sequential pattern mining process, redundant sequential patterns are identified if the pattern is a sub-sequence of other sequential patterns. A list of compact sequential patterns (excluding redundant patterns) is generated and used as representative features in the second stage. The second stage uses a sequence similarity measurement which can evaluate partial matching between sequences and patterns. Finally, a PSO-AdaBoost sequence classifier is developed to improve sequence classification accuracy. In the PSO-AdaBoost sequence classifier, the particle swarm optimization (PSO) algorithm is used to optimize the weights in the individual sequence classifier, while the AdaBoost strategy is used to ensemble patterns that are hard to classify using the individual classifiers.
2. Literature review
The sequence classification problem is to assign the most probable class label to a given sequence by a generative classifier and arises in many real-world applications, such as protein function prediction, text classification and speech recognition. In protein function prediction research, a large amount of sequence data is classified into various categories corresponding to either their role in the chromosomes, their structure, and/or their function [20]. De Souza Rodrigues et al. presented a methodology based on artificial neural networks for protein functional classification [21]. The research presents a new protein coding scheme, called Extended-Sequence Coding by Sliding Windows, to overcome some of the difficulties of the well-known method, Sequence Coding by Sliding Window. Shi and Zhang presented the 6-state hidden Markov model (HMM) which holds fewer states, clear transition groups and fewer model parameters [22]. Considering the hierarchical structure of proteins based on the 6-state HMM, they proposed using the hierarchical hidden Markov model (HHMM) which not only maintains a clear biological meaning, but also has fewer transitions.
Text classification is the task of automatically sorting a set of documents into predefined classes. Wang et al. automatically constructed a thesaurus of concepts from Wikipedia [23]. They then introduced a unified framework to expand the “Bag of Words” (BOW) representation with semantic relations (synonymy, hyponymy, and associative relations), and demonstrated its efficacy in enhancing previous approaches for text classification. Zuo et al. proposed a new text classification model based on Markov network distance [24]. They represented the documents and categories using Markov networks to model relevant information in the documents.
In speech recognition, Smaragdis and Raj extended non-negative representations of spectrograms to design a Markov selection model that can recognize sequences even when they are mixed together and without signal separation [25]. Lipeika addressed formant features in dynamic time warping-based speech recognition [26]. These features can be simply visualized and provide new insight into the causes of speech recognition errors. The features optimize the formant feature-based isolated word recognition performance by varying the recognition system's processing parameters while identifying potential improvements to the recognition system.
Yang et al. proposed a new image denoising scheme using support vector machine (SVM) classification in the shiftable complex directional pyramid (PDTDFB) domain [12]. The detail subbands of PDTDFB coefficients are denoised by using different parameters to control the multiscale and multidirectional anisotropic diffusion. Awad and Motai used the support vector machine (SVM) technique to present a dynamic classification as a new incremental framework for multiple-classifying video stream data [13]. This dynamic approach leads to an extension of SVM beyond its current static image-based learning capabilities.
Recently, some studies have used sequential pattern mining techniques to enhance computational efficiency for the sequence classification problem. Sequential pattern mining can find frequent sequential patterns within a large database. The extracted sequential patterns are considered to be important features and are used to build the classification model. Lesh et al. proposed an algorithm for sequence classification using frequent patterns as features in the classifier [15]. In their algorithm, subsequences are extracted and transformed into sets of features. Following feature extraction, general classification algorithms such as Naïve Bayes, SVM or neural networks can be used for classification. Lesh et al. proposed a scalable feature mining algorithm to act as the preprocessor to select features for standard classification algorithms such as Winnow and Naïve Bayes [16]. By adapting scalable and disk-based data mining algorithms, they were able to classify the sequences efficiently. Exarchos et al. proposed a novel classification method for biological data, using cSPADE (Sequential PAttern Discovery using Equivalence classes) to analyze protein sequences [1]. cSPADE was used to extract sequential patterns that characterize each class (protein fold). In addition, a classifier uses the extracted sequential patterns to classify proteins into the appropriate fold category. Exarchos et al. presented a novel methodology for sequence classification, based on sequential pattern mining and optimization algorithms [14]. The sequential pattern mining algorithm is applied to extract sequential patterns from a set of sequences. The score of every pattern with respect to each sequence is then calculated using a scoring function and the score of each class under consideration is estimated by summing the specific pattern scores. Each score is updated and multiplied by a weight. The optimization technique is employed to estimate the weight values and achieve optimal classification accuracy. Tsai et al. took customer
Fig. 1. The proposed sequential pattern mining-based sequence classification method.
temporal behavior data, called time-interval sequences, as classification criteria and developed a two-stage classification framework [19]. In the first stage, time-interval sequential patterns are discovered from customer temporal databases. A time-interval sequence classifier optimized by the particle swarm optimization (PSO) algorithm was then developed to achieve high classification accuracy in the second stage.
3. The sequential pattern mining-based sequence classification method
The proposed sequential pattern mining-based sequence classification method is illustrated in Fig. 1. Initially, sequences in the Sequence Database, SD, are divided into a training sequence dataset, TrainSD, and a testing sequence dataset, TestSD. In the first stage, sequences in TrainSD are input into the compact sequential pattern mining method to generate a set of compact sequential patterns, CSP. In the second stage, the boosting mechanism in the particle swarm optimization-AdaBoost (PSO-AB) sequence classification method adaptively changes the weight of each compact sequential pattern in CSP. In the $k$th round, a sample training pattern set, $STP_k$, is obtained according to the weight of each pattern in CSP. A PSO-based sequence classifier, called $PSOSeqClassifier_k$, is thus built based on $STP_k$ and TrainSD, and tested using TestSD. In $PSOSeqClassifier_k$, a partial matching similarity measurement is developed to calculate the similarity value between a sequence and a pattern. Furthermore, particle swarm optimization (PSO) is used to update the weights in $PSOSeqClassifier_k$ to maximize the accuracy of the classification result.
3.1. Definition
A set of symbols and notations is defined as follows. A sequence database is $SD = \{\langle S_i, c_i \rangle \mid i = 1, \ldots, q\}$, where $S_i$ is a sequence and $c_i$ is a class label ($c_i \in \{1, 2, \ldots, n\}$). Let $I = \{i_1, i_2, \ldots, i_p\}$ be a set of items. A sequence $S$ is an ordered list of itemsets, denoted as $\langle s_1, s_2, \ldots, s_m \rangle$, where $s_j$ is an itemset and $s_j \subseteq I$. The length of a sequence corresponds to the number of itemsets in the sequence, while a $k$-sequence is a sequence that contains $k$ itemsets. Let sequence $\alpha$ be $\langle a_1, a_2, \ldots, a_n \rangle$ and $\beta$ be $\langle b_1, b_2, \ldots, b_m \rangle$. $\alpha$ is a subsequence of $\beta$ if there exist integers $1 \le i_1 < i_2 < \cdots < i_n \le m$ such that $a_1 \subseteq b_{i_1}, a_2 \subseteq b_{i_2}, \ldots, a_n \subseteq b_{i_n}$. Alternatively, $\beta$ is called a super-sequence of $\alpha$, or $\beta$ contains $\alpha$. The support of a sequence $\alpha$ is defined as the fraction of all sequences that contain $\alpha$. If the support of $\alpha$ is greater than or equal to a user-specified threshold minsup, $\alpha$ is declared to be a sequential pattern (or a frequent sequence).
3.2. Compact sequential pattern mining
Previous sequential pattern mining algorithms emphasize the order of occurrence and generate complete frequent sequential patterns satisfying a minsup threshold. In fact, the complete mining strategy often generates a huge number of patterns, especially when minsup is low. This is because if a pattern is frequent, each of its sub-patterns is frequent as well. For example, there are five sequences in the database as shown in Fig. 2(a). If minsup is set as 2, nine sequential patterns as shown in Fig. 2(b) can be derived if the complete mining strategy is applied. It is clear that $\langle a, b \rangle$, $\langle a, c \rangle$ and $\langle b, c \rangle$ are sub-patterns of $\langle a, b, c \rangle$, while $\langle a, d \rangle$ and $\langle c, d \rangle$ are sub-patterns of $\langle a, c, d \rangle$. If the five sub-patterns are removed, the sequential patterns will be reduced to four as shown in Fig. 2(c), which is a much more compact solution. In fact, the four sequential patterns can still represent the important features in Fig. 2(a). Based on the above concept, if a pattern $\alpha$ is a sub-pattern of pattern $\beta$, pattern $\alpha$ is defined as a redundant pattern in this study.
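The notions of containment, support and redundant sub-patterns used above can be sketched in a few lines. This is a minimal illustration, assuming sequences are stored as tuples of `frozenset` itemsets; the helper names (`seq`, `is_subsequence`, `support`, `drop_sub_patterns`) and the toy database are hypothetical, and the contents of Fig. 2 are not reproduced here.

```python
def seq(*itemsets):
    """Build a sequence; each argument is a string of single-letter items."""
    return tuple(frozenset(s) for s in itemsets)


def is_subsequence(alpha, beta):
    """True if every itemset of alpha is a subset of a distinct itemset of
    beta, in order (greedy earliest match is sufficient)."""
    j = 0
    for itemset in beta:
        if j < len(alpha) and alpha[j] <= itemset:
            j += 1
    return j == len(alpha)


def support(alpha, database):
    """Fraction of database sequences that contain alpha."""
    return sum(is_subsequence(alpha, s) for s in database) / len(database)


def drop_sub_patterns(patterns):
    """Keep only patterns that are not a sub-pattern of another pattern."""
    return [p for p in patterns
            if not any(p != q and is_subsequence(p, q) for q in patterns)]


# Hypothetical toy database (not the one in Fig. 2)
database = [seq("a", "b", "c"), seq("a", "c", "d"), seq("a", "bc", "d"),
            seq("b", "d"), seq("c")]
print(support(seq("a", "c"), database))

# The seven patterns named in the text
patterns = [seq("a", "b"), seq("a", "c"), seq("b", "c"), seq("a", "b", "c"),
            seq("a", "d"), seq("c", "d"), seq("a", "c", "d")]
compact = drop_sub_patterns(patterns)
print(["".join("".join(sorted(i)) for i in p) for p in compact])
```

Of the seven patterns named in the text, only `<a, b, c>` and `<a, c, d>` survive the filter; the five sub-patterns are dropped.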
To generate compact sequential patterns, the ClosedMining algorithm developed by Yan et al. is used in the first stage of the proposed method [27]. A sequential pattern $s$ is closed if there exists no super-pattern of $s$ with the same support in the database. The ClosedMining algorithm produces significantly fewer frequent patterns than the traditional complete sequence mining methods while preserving the same expressive power, since the whole set of frequent subsequences, together with their supports, can be easily derived from the mining result. In the ClosedMining algorithm, the itemset-extension and sequence-extension are used to extend every subsequence. Given two sequences, $s = \langle t_1, \ldots, t_m \rangle$ and $p = \langle t'_1, \ldots, t'_n \rangle$, $s \diamond p$ means $s$ concatenates with $p$. The itemset-extension adds $p$ to the last itemset of $s$, obtaining $s \diamond_i p = \langle t_1, \ldots, t_m \cup t'_1, \ldots, t'_n \rangle$ if $\forall k \in t_m, j \in t'_1$, $k < j$. The sequence-extension appends the sequence $p$ to $s$, obtaining $s \diamond_s p = \langle t_1, \ldots, t_m, t'_1, \ldots, t'_n \rangle$.
Fig. 2. Frequent sequential patterns with minsup = 40%.
The pseudocode of the ClosedMining algorithm is shown in Fig. 3. The input to the algorithm is the training sequence dataset TrainSD and the user-specified threshold minsup, while the output is the set of compact sequential patterns CSP. The algorithm first sorts every itemset and removes infrequent items, as shown in line 1. All frequent 1-item sequences are then stored in $S_1$ and are input to the subroutine CloSpan, which generates a superset of the closed frequent sequences. Finally, the non-closed sequences are eliminated from CSP, as shown in line 5. The CloSpan subroutine first scans $D$ once to find every frequent item $\alpha$ that can either be assembled into an element of $s$ or be appended to $s$ to form a sequential pattern, as shown in line 6. Lines 7–8 show the termination condition: when the number of sequences in the $s$-projected database is less than minsup, there is no need to further extend $s$. For each sequence $s$ and its projected database, it recursively performs itemset-extension (line 10) and sequence-extension (line 12) until all the frequent sequences are discovered.
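The two extension operators and the final elimination of non-closed sequences can be illustrated as below. This is a sketch only: it is not the ClosedMining/CloSpan procedure of Fig. 3 (there is no projected database and no pruning), sequences are assumed to be tuples of `frozenset` itemsets, and all function names and the example support counts are hypothetical.

```python
def seq(*itemsets):
    """Build a sequence; each argument is a string of single-letter items."""
    return tuple(frozenset(s) for s in itemsets)


def sequence_extension(s, p):
    """s <>_s p: append the itemsets of p after the last itemset of s."""
    return s + p


def itemset_extension(s, p):
    """s <>_i p: merge the first itemset of p into the last itemset of s.
    Requires every item of the last itemset of s to precede every item of
    the first itemset of p."""
    if not all(k < j for k in s[-1] for j in p[0]):
        raise ValueError("items of p must follow the last itemset of s")
    return s[:-1] + (s[-1] | p[0],) + p[1:]


def is_subsequence(alpha, beta):
    """True if alpha is contained in beta (ordered itemset-subset match)."""
    j = 0
    for itemset in beta:
        if j < len(alpha) and alpha[j] <= itemset:
            j += 1
    return j == len(alpha)


def closed_only(supports):
    """Drop every pattern that has a super-pattern with the same support.
    `supports` maps pattern -> support count."""
    return {p: sup for p, sup in supports.items()
            if not any(q != p and sup_q == sup and is_subsequence(p, q)
                       for q, sup_q in supports.items())}


s, p = seq("a", "b"), seq("c")
print(itemset_extension(s, p) == seq("a", "bc"))
print(sequence_extension(s, p) == seq("a", "b", "c"))

# Hypothetical support counts: <a> is absorbed by <a, b> (same support)
supports = {seq("a"): 3, seq("a", "b"): 3, seq("a", "b", "c"): 2}
print(len(closed_only(supports)))
```

Note the difference from the redundancy rule of the previous paragraph: a sub-pattern is removed here only when a super-pattern has the same support, so `<a, b>` is kept although it is a sub-pattern of `<a, b, c>`.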
3.3. PSO-AB sequence classification method
After completing compact sequential pattern mining in the first stage, the extracted sequential patterns in CSP are considered to be important features representing the sequences in TrainSD. However, some patterns are hard to classify since they belong to a minority and are irregular. The particle swarm optimization-AdaBoost (PSO-AB) sequence classification method is proposed to solve this hard-to-classify pattern problem. The boosting mechanism adaptively increases the weights of hard-to-classify patterns so that these patterns have more opportunities to be selected for building PSO-based sequence classifiers, referred to as PSOSeqClassifier. In each PSOSeqClassifier, PSO is used to optimize two sets of weights (one for patterns and one for classes) to maximize the classification accuracy of the PSOSeqClassifier.
3.3.1. PSO-AB sequence classification method
The PSO-AB sequence classification method uses the Adaptive Boosting (AdaBoost) mechanism [28]. Let the set of compact sequential patterns be $CSP = \{\langle csp_i, c_i \rangle \mid i = 1, \ldots, m\}$, where $csp_i$ denotes the $i$th compact sequential pattern and $c_i$ is the class label associated with $csp_i$, with $c_i \in \{1, 2, \ldots, n\}$. In addition, the set of weights for the corresponding patterns is $\{w_1, w_2, \ldots, w_m\}$. Initially, equal weights (i.e. $1/m$) are assigned to all compact sequential patterns so that they have the same probability of being chosen for training. The number of boosting rounds, $N$, is determined by the user. In each round, a sample training pattern set ($STP_k$) is obtained according to the weights of the compact sequential patterns. Based on the sample training patterns in $STP_k$, a $PSOSeqClassifier_k$ is built (see Section 3.3.2 for details). The error rate of each classifier $PSOSeqClassifier_k$ is calculated as:
\[
\varepsilon_k = \frac{1}{m} \left[ \sum_{i=1}^{m} w_i \times I\big(PSOSeqClassifier_k(csp_i) \neq c_i\big) \right] \tag{1}
\]
where $I(\cdot)$ is an indicator function that returns the value 1 if its argument is true and 0 otherwise. If $\varepsilon_k$ is larger than 0.5, the algorithm resets the weights of all examples as $1/m$ and goes back to the beginning of the boosting stage. Otherwise, the importance of a $PSOSeqClassifier_k$ is given by
\[
\alpha_k = \frac{1}{2} \ln \frac{1 - \varepsilon_k}{\varepsilon_k} \tag{2}
\]
Note that $\alpha_k$ has a large positive value if the error rate is close to 0 and has a large negative value if the error rate is close to 1. $\alpha_k$ is also used to update the weight of each sequence. Let $w_i^{(k)}$ denote the weight assigned to example $\langle csp_i, c_i \rangle$ during the $k$th boosting round. The weights are updated by:
\[
w_i^{(k+1)} = \frac{w_i^{(k)}}{Z_k} \times
\begin{cases}
\exp(-\alpha_k) & \text{if } PSOSeqClassifier_k(csp_i) = c_i \\
\exp(\alpha_k) & \text{if } PSOSeqClassifier_k(csp_i) \neq c_i
\end{cases} \tag{3}
\]
where $Z_k$ is the normalization factor used to ensure that $\sum_i w_i^{(k+1)} = 1$. The PSO-AB sequence classification method is summarized in Fig. 4. The input to the algorithm is the compact sequential patterns in CSP, training sequences in TrainSD, testing sequences in TestSD, and the number of boosting rounds, $N$. The output is the ensemble classifier
\[
\text{PSO-AB}(S) = \arg\max_{c} \sum_{k=1}^{N} \alpha_k \, I\big(PSOSeqClassifier_k(S) = c\big) \tag{4}
\]
where $S$ is a new sequence to be classified.
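One boosting round of Eqs. (1)–(4) can be sketched as follows. This is a hedged illustration, not the procedure of Fig. 4: the base classifiers are assumed to be plain callables that return a class label (stand-ins for $PSOSeqClassifier_k$), all function names are hypothetical, and Eq. (1) is implemented as printed, including the $1/m$ factor.

```python
import math
from collections import defaultdict


def error_rate(weights, predictions, labels):
    """Eq. (1) as printed: (1/m) * sum of weights of misclassified patterns."""
    m = len(weights)
    wrong = sum(w for w, p, c in zip(weights, predictions, labels) if p != c)
    return wrong / m


def importance(eps):
    """Eq. (2): alpha_k = 0.5 * ln((1 - eps) / eps); requires 0 < eps < 1."""
    return 0.5 * math.log((1.0 - eps) / eps)


def update_weights(weights, predictions, labels, alpha):
    """Eq. (3): shrink weights of correctly classified patterns, grow the
    others, then divide by Z_k so that the new weights sum to 1."""
    raw = [w * math.exp(-alpha if p == c else alpha)
           for w, p, c in zip(weights, predictions, labels)]
    z = sum(raw)
    return [r / z for r in raw]


def ensemble_predict(classifiers, alphas, sequence):
    """Eq. (4): each classifier votes for its label with weight alpha_k."""
    votes = defaultdict(float)
    for clf, alpha in zip(classifiers, alphas):
        votes[clf(sequence)] += alpha
    return max(votes, key=votes.get)


# One round on four patterns; the last one is misclassified
weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, 2, 1]
predictions = [1, 1, 2, 2]
eps = error_rate(weights, predictions, labels)
alpha = importance(eps)
new_weights = update_weights(weights, predictions, labels, alpha)
print(eps, alpha, new_weights)
```

After the update, the misclassified pattern carries most of the total weight, so it is far more likely to be drawn into the next sample training pattern set.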
3.3.2. PSO sequence classifier
Let the sample training patterns $STP_k$ be separated into $n$ subsets $\{TP_1, TP_2, \ldots, TP_n\}$, where the class labels of patterns in $TP_c$ all belong to class $c$. If a new sequence $S$ is similar to patterns in $TP_c$, $S$ will be more likely to be assigned as class $c$. However, not all patterns in $TP_c$ are equally important. Thus, the $j$th pattern $csp_{c,j}$ in $TP_c$ should be assigned an importance weight $pw_{c,j}$ to reflect its own importance. Similarly, each $TP_c$ should be assigned an importance weight $cw_c$ to reflect its influence on the final classification result. Based on the majority voting scheme, the PSO sequence classifier, denoted as PSOSeqClassifier, is modeled as:
\[
PSOSeqClassifier(S) = \arg\max_{c = 1, 2, \ldots, n} \left[ cw_c \times \sum_{j = 1, 2, \ldots, |TP_c|} pw_{c,j} \times Sim(csp_{c,j}, S) \right] \tag{5}
\]
where $Sim(csp_{c,j}, S)$ denotes the similarity between the compact sequential pattern $csp_{c,j}$ and sequence $S$.
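The scoring rule of Eq. (5) can be sketched as follows, assuming the weights are already known; in the method itself they are tuned by PSO, which is not shown. The function names, the toy similarity `toy_sim` (a stand-in, not the measure defined in the paper) and the example patterns and weights are all hypothetical.

```python
def pso_seq_classify(sequence, patterns_by_class, pattern_weights,
                     class_weights, sim):
    """Eq. (5): return the class c maximizing
    cw_c * sum_j pw_{c,j} * Sim(csp_{c,j}, S)."""
    scores = {}
    for c, patterns in patterns_by_class.items():
        weighted = sum(pw * sim(csp, sequence)
                       for pw, csp in zip(pattern_weights[c], patterns))
        scores[c] = class_weights[c] * weighted
    return max(scores, key=scores.get)


def toy_sim(pattern, sequence):
    """Hypothetical stand-in similarity: share of pattern items that occur
    anywhere in the sequence (ignores order)."""
    return sum(item in sequence for item in pattern) / len(pattern)


# Hypothetical patterns and weights for two classes
patterns_by_class = {1: ["ab", "abc"], 2: ["cd"]}
pattern_weights = {1: [0.5, 0.5], 2: [1.0]}
class_weights = {1: 1.0, 2: 1.0}

print(pso_seq_classify("abd", patterns_by_class, pattern_weights,
                       class_weights, toy_sim))
print(pso_seq_classify("cdx", patterns_by_class, pattern_weights,
                       class_weights, toy_sim))
```

Passing `sim` as a parameter keeps the voting rule separate from the similarity measure, so any similarity function with the same two-argument signature can be plugged in.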
Fig. 3. Pseudocode of the ClosedMining algorithm.
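In this research, the partial matching concept is used to develop the sequence similarity measurement. That is, the similarity between a sequence and a sequential pattern might be a real value between 1 and 0, where 1 indicates they are exactly the same and 0 indicates they have nothing in common. Given sequences $\alpha = \langle a_1, a_2, \ldots, a_p \rangle$ and $\beta = \langle b_1, b_2, \ldots, b_q \rangle$, the sequence similarity between $\alpha$ and $\beta$, $Sim(\alpha, \beta)$, can be represented as:
\[
Sim(\alpha, \beta) = \frac{LCS(\alpha, \beta)}{Max(|\alpha|, |\beta|)} \tag{6}
\]
where $LCS(\alpha, \beta)$ is the length of the longest common subsequence, and $|\alpha|$ and $|\beta|$ are the lengths of sequences $\alpha$ and $\beta$, respectively. The length of the longest common subsequence for $\alpha$ and $\beta$ can be formulated as the following recurrence relation [29]:
\[
L[i, j] =
\begin{cases}
0 & \text{if either } i = 0 \text{ or } j = 0; \\
L[i-1, j-1] + 1 & \text{if } a_i = b_j; \\
\max(L[i, j-1], L[i-1, j]) + \dfrac{|a_i \cap b_j|}{Max(|a_i|, |b_j|)} & \text{if } a_i \neq b_j;
\end{cases} \tag{7}
\]
where $1 \le i \le p$ and $1 \le j \le q$. With this recurrence operation, $LCS(\alpha, \beta)$ can be found in $L[p, q]$. Let us take sequences $\alpha = \langle a, abc, c \rangle$ and $\beta = \langle a, cd, ab, adc, c \rangle$ as an example. Fig. 5 shows the visualization for the evaluation process. At first, the values for all $i = 0$ and $j = 0$ are set as zero, as shown in Fig. 5(a). For $i = 1$ and $j = 1$, $L[1,1] = L[0,0] + 1 = 1$ since the first itemset in $\alpha$ ($a$) is the same as the first itemset in $\beta$ ($a$), as shown in Fig. 5(b). Next, for $i = 1$ and $j = 2$, since itemset $a$ in $\alpha$ is not the same as itemset $cd$ in $\beta$, $L[1,2] = \max(L[1,1], L[0,2]) + 0 = \max(1, 0) + 0 = 1$. For $i = 1$ and $j = 3$, since itemset $a$ in $\alpha$ is not the same as itemset $ab$ in $\beta$, $L[1,3] = \max(L[1,2], L[0,3]) + |a \cap ab| / Max(|a|, |ab|) = \max(1, 0) + 1/Max(1, 2) = 1.5$, as shown in Fig. 5(c). The process continues until $L[3,5] = 3.98$ is obtained, as shown in Fig. 5(d). Therefore, the similarity value between $\alpha$ and $\beta$ based on the similarity measurement in Eq. (6) is $3.98/5 = 0.796$. (With exact fractions the recurrence gives $L[3,5] = 4$ and a similarity of 0.8; the reported 3.98 corresponds to truncating intermediate terms to two decimals.)
Let a testing sequence dataset TestSD be $\{\langle TestS_1, testc_1 \rangle, \langle TestS_2, testc_2 \rangle, \ldots\}$, where $TestS_i$ is the $i$th testing sequence and $testc_i$ is the class label associated with $TestS_i$, with $testc_i \in \{1, 2, \ldots, n\}$. The prediction accuracy of the classifier is calculated as:
\[
accuracy = \sum_{i = 1, \ldots, |TestSD|} I\big(PSOSeqClassifier(TestS_i) = testc_i\big) \tag{8}
\]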
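The partial-matching similarity of Eqs. (6) and (7) can be checked with a short dynamic-programming sketch. It assumes sequences are lists of non-empty Python sets and that at least one of the two sequences is non-empty; the names `lcs_length` and `sim` are hypothetical. Because it uses floating-point fractions rather than two-decimal truncation, it yields approximately 4 and 0.8 for the worked example instead of the 3.98 and 0.796 quoted in the text.

```python
def lcs_length(alpha, beta):
    """Eq. (7): weighted LCS table; returns L[p][q]."""
    p, q = len(alpha), len(beta)
    L = [[0.0] * (q + 1) for _ in range(p + 1)]  # row 0 and column 0 stay 0
    for i in range(1, p + 1):
        for j in range(1, q + 1):
            a, b = alpha[i - 1], beta[j - 1]
            if a == b:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                # partial credit for the items the two itemsets share
                partial = len(a & b) / max(len(a), len(b))
                L[i][j] = max(L[i][j - 1], L[i - 1][j]) + partial
    return L[p][q]


def sim(alpha, beta):
    """Eq. (6): LCS length divided by the longer sequence length."""
    return lcs_length(alpha, beta) / max(len(alpha), len(beta))


# Worked example from the text
alpha = [set("a"), set("abc"), set("c")]
beta = [set("a"), set("cd"), set("ab"), set("adc"), set("c")]
print(lcs_length(alpha[:1], beta[:3]))  # L[1,3]
print(round(lcs_length(alpha, beta), 6), round(sim(alpha, beta), 6))
```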