arXiv:1309.0911v1 [stat.ME] 4 Sep 2013

A BAYESIAN INFORMATION CRITERION FOR SINGULAR MODELS

MATHIAS DRTON AND MARTYN PLUMMER

Abstract. We consider approximate Bayesian model choice for model selection problems that involve models whose Fisher-information matrices may fail to be invertible along other competing submodels. Such singular models do not obey the regularity conditions underlying the derivation of Schwarz's Bayesian information criterion (BIC), and the penalty structure in BIC generally does not reflect the frequentist large-sample behavior of their marginal likelihood. While large-sample theory for the marginal likelihood of singular models has been developed recently, the resulting approximations depend on the true parameter value and lead to a paradox of circular reasoning. Guided by examples such as determining the number of components of mixture models, the number of factors in latent factor models or the rank in reduced-rank regression, we propose a resolution to this paradox and give a practical extension of BIC for singular model selection problems.

Key words and phrases. Bayesian information criterion, factor analysis, mixture model, model selection, reduced-rank regression, singular learning theory, Schwarz information criterion.

1. Introduction

Information criteria are classical tools for model selection. At a high level, they fall into two categories (Yang, 2005). On one hand, there are criteria that target good predictive behavior of the selected model; the information criterion of Akaike (1974) and cross-validation based scores are examples. The Bayesian information criterion (BIC) of Schwarz (1978), on the other hand, draws motivation from Bayesian approaches. From the frequentist perspective, it has been shown in a number of settings that the BIC is consistent. In other words, under optimization of BIC the probability of selecting a fixed most parsimonious true model tends to one as the sample size tends to infinity (e.g., Nishii, 1984, Haughton, 1988, 1989). From a Bayesian point of view, the BIC yields rather crude but computationally inexpensive approximations to otherwise difficult to calculate posterior model probabilities in Bayesian model selection/averaging; see Kass and Wasserman (1995), Raftery (1995), DiCiccio et al. (1997) or Hastie et al. (2009, Chap. 7.7).

In this paper, we are concerned with Bayesian information criteria in the context of singular model selection problems, that is, problems that involve models with Fisher-information matrices that may fail to be invertible. For example, due to the breakdown of parameter identifiability, the Fisher-information matrix of a mixture model with three component distributions is singular at a distribution that can be obtained by mixing only two components. This clearly presents a fundamental challenge for selection of the number of components. Other important examples of this type include determining the rank in reduced-rank regression, the number of factors in factor analysis or the number of states in latent class or hidden Markov models. More generally, all the classical hidden/latent variable models are singular.

As demonstrated by Steele and Raftery (2010) for Gaussian mixture models or Lopes and West (2004) for factor analysis, BIC can be a state-of-the-art method for singular model selection. However, while BIC is known to be consistent in these and other singular settings (Keribin, 2000; Drton et al., 2009, Chap. 5.1), the technical arguments in its Bayesian-inspired derivation do not apply. In a nutshell, when the Fisher-information is singular, the log-likelihood function does not admit a large-sample approximation by a quadratic form. Consequently, the BIC does not reflect the frequentist large-sample behavior of the Bayesian marginal likelihood of singular models (Watanabe, 2009). In contrast, this paper develops a generalization of BIC that is not only consistent but also maintains a rigorous connection to Bayesian model choice in singular settings. The generalization is honest in the sense that the new criterion coincides with Schwarz's when the model is regular.

The new criterion, which we abbreviate to sBIC, is presented in Section 3. It relies on theoretical knowledge about the large-sample behavior of the marginal likelihood of the considered models. Section 2 reviews the necessary background on this theory as developed by Watanabe (2009). Consistency of sBIC is shown in Section 4, and the connection to Bayesian methods is developed in Section 5. In the numerical examples in Section 6, sBIC achieves improved statistical inferences while keeping computational cost low. Concluding remarks are given in Section 7.

2. Background

Let Y_n = (Y_{n1}, ..., Y_{nn}) denote a sample of n independent and identically distributed observations, and let {M_i : i ∈ I} be a finite set of candidate models for the distribution of these observations. For a Bayesian treatment, suppose that we have positive prior probabilities P(M_i) for the models and that, in each model M_i, a prior distribution P(π_i | M_i) is specified for the probability distributions π_i ∈ M_i. Write P(Y_n | π_i, M_i) for the likelihood of Y_n under data-generating distribution π_i from model M_i. Let

(2.1)    L(M_i) := P(Y_n \mid M_i) = \int_{M_i} P(Y_n \mid \pi_i, M_i) \, dP(\pi_i \mid M_i)

be the marginal likelihood of model M_i. Bayesian model choice is then based on the posterior model probabilities

P(M_i \mid Y_n) \propto P(M_i) L(M_i), \quad i \in I.

The probabilities P(M_i | Y_n) can be approximated by various Monte Carlo procedures, see Friel and Wyse (2012) for a recent review, but practitioners also often turn to computationally inexpensive proxies suggested by large-sample theory. These proxies are based on the asymptotic properties of the sequence of random variables L(M_i) obtained when Y_n is drawn from a data-generating distribution π_0 ∈ M_i, and we let the sample size n grow.
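As a small aside (not from the paper), the normalization implicit in the display above is easy to carry out on the log scale; the helper below, with made-up log marginal likelihoods, turns prior probabilities P(M_i) and values of log L(M_i) into posterior model probabilities.

```python
import math

def posterior_model_probs(log_marginal_liks, prior_probs):
    """Normalize P(M_i | Y_n) ∝ P(M_i) L(M_i), working on the log scale."""
    log_post = [math.log(p) + ll for p, ll in zip(prior_probs, log_marginal_liks)]
    c = max(log_post)                      # log-sum-exp trick for numerical stability
    weights = [math.exp(lp - c) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Three candidate models with equal prior probability (illustrative numbers).
print(posterior_model_probs([-1520.3, -1512.8, -1514.1], [1/3, 1/3, 1/3]))
```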

In practice, a prior distribution P(π_i | M_i) is typically specified by parametrizing M_i and placing a distribution on the involved parameters. So assume that

(2.2)    M_i = \{ \pi_i(\omega_i) : \omega_i \in \Omega_i \}

with d_i-dimensional parameter space Ω_i ⊆ R^{d_i}, and that P(π_i | M_i) is the transformation of a distribution P(ω_i | M_i) on Ω_i under the map ω_i ↦ π_i(ω_i). The marginal likelihood then becomes the d_i-dimensional integral

(2.3)    L(M_i) = \int_{\Omega_i} P(Y_n \mid \pi_i(\omega_i), M_i) \, dP(\omega_i \mid M_i).

The observation of Schwarz and other subsequent work is that, under suitable technical conditions on the model M_i, the parametrization ω_i ↦ π_i(ω_i) and the prior distribution P(ω_i | M_i), it holds for all π_0 ∈ M_i that

(2.4)    \log L(M_i) = \log P(Y_n \mid \hat\pi_i, M_i) - \frac{d_i}{2} \log(n) + O_p(1).

Here, P(Y_n | π̂_i, M_i) is the maximum of the likelihood function, and O_p(1) stands for a remainder that is bounded in probability, i.e., uniformly tight as the sample size n grows. The first two terms on the right-hand side of (2.4) are functions of the data Y_n and the model M_i alone and may thus be used as a model score or proxy for the logarithm of the marginal likelihood.

Definition 2.1. The Bayesian or Schwarz's information criterion for model M_i is

BIC(M_i) = \log P(Y_n \mid \hat\pi_i, M_i) - \frac{d_i}{2} \log(n).

Briefly put, the large-sample behavior from (2.4) relies on the following properties of regular problems. First, with high probability, the integrand in (2.3) is negligibly small outside a small neighborhood of the maximum likelihood estimator of ω_i. Second, in such a neighborhood, the log-likelihood function log P(Y_n | π_i(ω_i), M_i) can be approximated by a negative definite quadratic form, while a smooth prior P(ω_i | M_i) is approximately constant. The integral in (2.3) may thus be approximated by a Gaussian integral, whose normalizing constant leads to (2.4). We remark that this approach also allows for estimation of the remainder term in (2.4), giving a Laplace approximation with error O_p(n^{-1/2}); compare, e.g., Tierney and Kadane (1986), Haughton (1988), Kass and Wasserman (1995), Wasserman (2000).
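As a quick numerical illustration of (2.4) and Definition 2.1 (our own example, not from the paper), take the regular one-parameter model in which Y_1, ..., Y_n are N(μ, 1) with a N(0, 1) prior on μ; the marginal likelihood is then available in closed form, and the gap between its logarithm and BIC stays bounded as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.normal(loc=0.3, scale=1.0, size=n)    # data from N(0.3, 1)
S, SS = y.sum(), (y ** 2).sum()

# Exact log marginal likelihood under the N(0, 1) prior on the mean mu:
# Y_n ~ N(0, I + 11^T), so log L = -n/2 log(2 pi) - 1/2 log(n+1) - 1/2 (SS - S^2/(n+1)).
log_marginal = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1) \
               - 0.5 * (SS - S ** 2 / (n + 1))

# BIC: maximized log-likelihood (at mu_hat = S/n) minus (d_i/2) log(n) with d_i = 1.
max_loglik = -0.5 * n * np.log(2 * np.pi) - 0.5 * (SS - S ** 2 / n)
bic = max_loglik - 0.5 * np.log(n)

print(log_marginal, bic, log_marginal - bic)  # the difference is the O_p(1) remainder
```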

A large-sample quadratic approximation to the log-likelihood function is not possible, however, when the Fisher-information matrix is singular. Consequently, the classical theory alluded to above does not apply to singular models. Indeed, (2.4) is generally false in singular models. Nevertheless, asymptotic theory for the marginal likelihood of singular models has been developed over the last decade, culminating in the monograph of Watanabe (2009). Theorem 6.7 in Watanabe (2009) shows that a wide variety of singular models have the property that, for Y_n drawn from π_0 ∈ M_i,

(2.5)    \log L(M_i) = \log P(Y_n \mid \pi_0, M_i) - \lambda_i(\pi_0) \log(n) + [m_i(\pi_0) - 1] \log\log(n) + O_p(1);

see also the introduction to the topic in Drton et al. (2009, Chap. 5.1). If the sequence of likelihood ratios P(Y_n | π̂_i, M_i)/P(Y_n | π_0, M_i) is bounded in probability, then we also have that

(2.6)    \log L(M_i) = \log P(Y_n \mid \hat\pi_i, M_i) - \lambda_i(\pi_0) \log(n) + [m_i(\pi_0) - 1] \log\log(n) + O_p(1).
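The right-hand side of (2.6) is easy to evaluate once λ_i(π_0) and m_i(π_0) are available. The short helper below (our own illustration with arbitrary numbers, not code from the paper) computes this approximation and contrasts its penalty with the BIC penalty (d_i/2) log(n) from (2.4).

```python
import math

def log_marginal_approx(max_loglik, n, lam, mult):
    """Right-hand side of (2.6) without the O_p(1) remainder:
    log P(Y_n | pi_hat_i, M_i) - lam*log(n) + (mult - 1)*log(log(n))."""
    return max_loglik - lam * math.log(n) + (mult - 1) * math.log(math.log(n))

# Arbitrary illustration: a parametrization with d_i = 24 parameters whose
# learning coefficient at the true distribution is lam = 9/2 with multiplicity 2.
n, max_loglik = 1000, -2300.0
print(log_marginal_approx(max_loglik, n, lam=4.5, mult=2))  # singular approximation
print(max_loglik - (24 / 2) * math.log(n))                  # BIC penalizes far more heavily
```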


For singular submodels of exponential families such as the reduced-rank regression and factor analysis models treated later, the likelihood ratios converge in distribution and are thus bounded in probability (Drton, 2009). For more complicated models, such as mixture models, likelihood ratios can often be shown to converge in distribution under compactness assumptions on the parameter space; compare, e.g., Azaïs et al. (2006, 2009). Such compactness assumptions also appear in the derivation of (2.5). We will not concern ourselves further with the details of these issues as the main purpose of this paper is to describe a statistical method that can leverage mathematical information in the form of equation (2.6).

The quantity λ_i(π_0) is known as the learning coefficient (or also real log-canonical threshold or stochastic complexity) and m_i(π_0) is its multiplicity. In the analytic settings considered in Watanabe (2009), it holds that λ_i(π_0) is a rational number in [0, d_i/2] and m_i(π_0) is an integer in {1, ..., d_i}. We remark that in singular models it is very difficult to estimate the O_p(1) remainder term in (2.6). We are not aware of any successful work on higher-order approximations in statistically relevant settings.

Example 2.1. Reduced-rank regression is multivariate linear regression subject to a rank constraint on the matrix of regression coefficients (Reinsel and Velu, 1998). Keeping only the most essential structure, suppose we observe n independent copies of a partitioned zero-mean Gaussian random vector Y = (Y_1, Y_2), with Y_1 ∈ R^N and Y_2 ∈ R^M, and where the covariance matrix of Y_2 and the conditional covariance matrix of Y_1 given Y_2 are both the identity matrix. The reduced-rank regression model M_i associated to an integer i ≥ 0 postulates that the N × M matrix π in the conditional expectation E[Y_1 | Y_2] = π Y_2 has rank at most i.
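For concreteness (our own sketch, not code from the paper), data from this set-up with N = 5, M = 3 and a true coefficient matrix of rank j can be simulated as follows; both the covariance of Y_2 and the conditional covariance of Y_1 given Y_2 are the identity, as assumed above.

```python
import numpy as np

def simulate_reduced_rank(n, N=5, M=3, true_rank=1, seed=0):
    """Draw n copies of (Y1, Y2) with E[Y1 | Y2] = pi0 @ Y2 and rank(pi0) = true_rank."""
    rng = np.random.default_rng(seed)
    # An N x M coefficient matrix of the requested rank (almost surely).
    pi0 = rng.normal(size=(N, true_rank)) @ rng.normal(size=(true_rank, M))
    Y2 = rng.normal(size=(n, M))               # covariance of Y2 is the identity
    Y1 = Y2 @ pi0.T + rng.normal(size=(n, N))  # conditional covariance is the identity
    return Y1, Y2, pi0

Y1, Y2, pi0 = simulate_reduced_rank(n=500, true_rank=1)
print(np.linalg.matrix_rank(pi0))  # 1
```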

In a Bayesian treatment, consider the parametrization π = ω_2 ω_1, with absolutely continuous prior distributions for ω_2 ∈ R^{N×i} and ω_1 ∈ R^{i×M}. Let the true data-generating distribution be given by the matrix π_0 of rank j ≤ i. Aoyagi and Watanabe (2005) derived the learning coefficients λ_i(π_0) and their multiplicities m_i(π_0) for this setup. In particular, λ_i(π_0) and m_i(π_0) depend on π_0 only through the true rank j. For a concrete instance, take N = 5 and M = 3. Then the multiplicity m_i(π_0) = 1 unless i = 3 and j = 0, in which case m_i(π_0) = 2. The values of λ_i(π_0) are:

            j = 0    j = 1    j = 2    j = 3
   i = 0      0
   i = 1     3/2      7/2
   i = 2      3       9/2       6
   i = 3     9/2     11/2     13/2     15/2

Note that the table entries for j = i are equal to dim(M_i)/2, where dim(M_i) = i(N + M − i) is the dimension of M_i, which can be identified with the set of N × M matrices of rank at most i. The dimension is also the maximal rank of the Jacobian of the map (ω_1, ω_2) ↦ ω_2 ω_1. The singularities of M_i correspond to the points where the Jacobian fails to have maximal rank. These have rank(ω_2 ω_1) < i. The fact that the singularities correspond to a drop in rank presents a challenge for model selection, which here amounts to selection of an appropriate rank.
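The table above can be reproduced programmatically. The sketch below encodes our reading of the case distinctions in Aoyagi and Watanabe (2005) for the configurations that occur here (model rank i ≥ true rank j with i + j < N + M); it is an unofficial reconstruction and should be checked against the original theorem before being relied on.

```python
from fractions import Fraction

def learning_coefficient(N, M, i, j):
    """Learning coefficient lambda_i(pi_0) and multiplicity m_i(pi_0) for
    reduced-rank regression with N x M coefficient matrices, model rank i and
    true rank j <= i.  Unofficial reconstruction of Aoyagi and Watanabe (2005);
    the case N + M <= i + j is not handled here."""
    if j > i:
        raise ValueError("true rank j must not exceed model rank i")
    if N + M <= i + j:
        raise NotImplementedError("case N + M <= i + j not covered by this sketch")
    if M + i < N + j:
        return Fraction(i * M + j * (N - i), 2), 1
    if N + i < M + j:
        return Fraction(i * N + j * (M - i), 2), 1
    num = 2 * (i + j) * (N + M) - (N - M) ** 2 - (i + j) ** 2
    if (N + M + i + j) % 2 == 0:
        return Fraction(num, 8), 1
    return Fraction(num + 1, 8), 2

# Reproduce the N = 5, M = 3 table; the diagonal equals dim(M_i)/2 = i(N+M-i)/2.
for i in range(4):
    print([learning_coefficient(5, 3, i, j) for j in range(i + 1)])
```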

Simulation studies on rank selection have shown that the standard BIC, with d_i = dim(M_i) in Definition 2.1, has a tendency to select overly small ranks; for a recent example see Cheng and Phillips (2012). The quoted values of λ_i(π_0) give a theoretical explanation, as the use of dimension in BIC leads to overpenalization of models that contain the true data-generating distribution but are not minimal in that regard. □

Determining learning coefficients can be a challenging problem, but progress has been made. For some of the examples that have been treated, we refer the reader to Aoyagi (2010a,b, 2009), Watanabe and Amari (2003), Watanabe and Watanabe (2007), Rusakov and Geiger (2005), Yamazaki and Watanabe (2003, 2005, 2004), and Zwiernik (2011). The use of techniques from computational algebra and combinatorics is emphasized in Lin (2011); see also Arnol'd et al. (1988), Vasil'ev (1979).

The mentioned theoretical progress, however, does not readily translate into practical statistical methodology because one faces the obstacle that the learning coefficients depend on the unknown data-generating distribution π_0, as indicated in our notation in (2.6). For instance, for the problem of selecting the rank in reduced-rank regression (Example 2.1), the Bayesian measure of model complexity that is given by the learning coefficient and its multiplicity depends on the rank we wish to determine in the first place. It is for this reason that there is currently no statistical method that takes advantage of theoretical knowledge about learning coefficients. In the remainder of this paper, we propose a solution for how to overcome the problem of circular reasoning and give a practical extension of the Bayesian information criterion to singular models.

3. New Bayesian information criterion for singular models

If the true data-generating distribution π_0 was known, then (2.6) would suggest replacing the marginal likelihood L(M_i) by

(3.1)    L'_{\pi_0}(M_i) := P(Y_n \mid \hat\pi_i, M_i) \cdot n^{-\lambda_i(\pi_0)} (\log n)^{m_i(\pi_0) - 1}.

The data-generating distribution being unknown, however, we propose to follow the standard Bayesian approach and to assign a probability distribution Q_i to the distributions in model M_i. We then eliminate the unknown distribution π_0 by marginalization. In other words, we compute an approximation to L(M_i) as

(3.2)    L'_{Q_i}(M_i) := \int_{M_i} L'_{\pi_0}(M_i) \, dQ_i(\pi_0).

The crux of the matter now becomes choosing an appropriate measure Q_i. Before discussing particular choices for Q_i, we stress that any choice for Q_i reduces to Schwarz's criterion in the regular case.

Proposition 3.1. If the model M_i is regular, then it holds for all probability measures Q_i on M_i that

L'_{Q_i}(M_i) = e^{BIC(M_i)}.

Proof. In our context, a regular model with d_i parameters satisfies λ_i(π_0) = d_i/2 and m_i(π_0) = 1 for all data-generating distributions π_0 ∈ M_i. Hence, the integrand in (3.2) is constant and equal to

L'_{\pi_0}(M_i) = e^{BIC(M_i)}. □
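To make (3.2) and Proposition 3.1 concrete, the following sketch (our own illustration with made-up numbers) evaluates log L'_{Q_i}(M_i) when Q_i has finite support, representing each support point by its learning coefficient and multiplicity. When every support point has λ = d_i/2 and m = 1, the value collapses to BIC(M_i), as the proposition states; the particular choice of Q_i advocated in the paper is developed in the remainder of Section 3 and is not reproduced here.

```python
import math

def log_L_prime(max_loglik, n, lam, mult):
    """Logarithm of (3.1): P(Y_n | pi_hat_i, M_i) * n^(-lam) * (log n)^(mult - 1)."""
    return max_loglik - lam * math.log(n) + (mult - 1) * math.log(math.log(n))

def log_L_prime_Q(max_loglik, n, support, weights):
    """Logarithm of (3.2) for a finitely supported Q_i; `support` lists the
    (learning coefficient, multiplicity) pairs of the support points."""
    terms = [math.log(w) + log_L_prime(max_loglik, n, lam, m)
             for (lam, m), w in zip(support, weights)]
    c = max(terms)                                   # log-sum-exp for stability
    return c + math.log(sum(math.exp(t - c) for t in terms))

n, max_loglik = 500, -1500.0

# Singular case: rank-2 model of Example 2.1, with Q_2 uniform (for illustration
# only) over the three possible true ranks j = 0, 1, 2, i.e. (lam, m) in
# {(3, 1), (9/2, 1), (6, 1)} as in the table above.
print(log_L_prime_Q(max_loglik, n, [(3, 1), (4.5, 1), (6, 1)], [1/3] * 3))

# Regular case with d_i = 12: every support point has lam = d_i/2 = 6, m = 1,
# so the result equals BIC(M_i) regardless of the weights (Proposition 3.1).
print(log_L_prime_Q(max_loglik, n, [(6, 1), (6, 1)], [0.2, 0.8]))
print(max_loglik - 6 * math.log(n))                  # BIC(M_i), same value
```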
