A BAYESIAN INFORMATION CRITERION FOR SINGULAR MODELS

MATHIAS DRTON AND MARTYN PLUMMER
Abstract. We consider approximate Bayesian model choice for model selection problems that involve models whose Fisher-information matrices may fail to be invertible along other competing submodels. Such singular models do not obey the regularity conditions underlying the derivation of Schwarz's Bayesian information criterion (BIC) and the penalty structure in BIC generally does not reflect the frequentist large-sample behavior of their marginal likelihood. While large-sample theory for the marginal likelihood of singular models has been developed recently, the resulting approximations depend on the true parameter value and lead to a paradox of circular reasoning. Guided by examples such as determining the number of components of mixture models, the number of factors in latent factor models or the rank in reduced-rank regression, we propose a resolution to this paradox and give a practical extension of BIC for singular model selection problems.
1. Introduction
Information criteria are classical tools for model selection. At a high level, they fall into two categories (Yang, 2005). On one hand, there are criteria that target good predictive behavior of the selected model; the information criterion of Akaike (1974) and cross-validation based scores are examples. The Bayesian information criterion (BIC) of Schwarz (1978), on the other hand, draws motivation from Bayesian approaches. From the frequentist perspective, it has been shown in a number of settings that the BIC is consistent. In other words, under optimization of BIC the probability of selecting a fixed most parsimonious true model tends to one as the sample size tends to infinity (e.g., Nishii, 1984, Haughton, 1988, 1989). From a Bayesian point of view, the BIC yields rather crude but computationally inexpensive approximations to otherwise difficult to calculate posterior model probabilities in Bayesian model selection/averaging; see Kass and Wasserman (1995), Raftery (1995), DiCiccio et al. (1997) or Hastie et al. (2009, Chap. 7.7).
In this paper, we are concerned with Bayesian information criteria in the context of singular model selection problems, that is, problems that involve models with Fisher-information matrices that may fail to be invertible. For example, due to the breakdown of parameter identifiability, the Fisher-information matrix of a mixture model with three component distributions is singular at a distribution that can be obtained by mixing only two components. This clearly presents a fundamental challenge for selection of the number of components. Other important examples of this type include determining the rank in reduced-rank regression, the number of factors in factor analysis or the number of states in latent class or hidden Markov models. More generally, all the classical hidden/latent variable models are singular. As demonstrated by Steele and Raftery (2010) for Gaussian mixture models or Lopes and West (2004) for factor analysis, BIC can be a state-of-the-art method for singular model selection. However, while BIC is known to be consistent in these and other singular settings (Keribin, 2000, Drton et al., 2009, Chap. 5.1), the technical arguments in its Bayesian-inspired derivation do not apply. In a nutshell, when the Fisher-information is singular, the log-likelihood function does not admit a large-sample approximation by a quadratic form. Consequently, the BIC does not reflect the frequentist large-sample behavior of the Bayesian marginal likelihood of singular models (Watanabe, 2009). In contrast, this paper develops a generalization of BIC that is not only consistent but also maintains a rigorous connection to Bayesian model choice in singular settings. The generalization is honest in the sense that the new criterion coincides with Schwarz's when the model is regular.

Keywords and phrases. Bayesian information criterion, factor analysis, mixture model, model selection, reduced-rank regression, singular learning theory, Schwarz information criterion.

arXiv:1309.0911v1 [stat.ME] 4 Sep 2013
The new criterion, which we abbreviate to sBIC, is presented in Section 3. It relies on theoretical knowledge about the large-sample behavior of the marginal likelihood of the considered models. Section 2 reviews the necessary background on this theory as developed by Watanabe (2009). Consistency of sBIC is shown in Section 4, and the connection to Bayesian methods is developed in Section 5. In the numerical examples in Section 6, sBIC achieves improved statistical inferences while keeping computational cost low. Concluding remarks are given in Section 7.
2. Background
Let Y_n = (Y_{n1}, ..., Y_{nn}) denote a sample of n independent and identically distributed observations, and let {M_i : i ∈ I} be a finite set of candidate models for the distribution of these observations. For a Bayesian treatment, suppose that we have positive prior probabilities P(M_i) for the models and that, in each model M_i, a prior distribution P(π_i | M_i) is specified for the probability distributions π_i ∈ M_i. Write P(Y_n | π_i, M_i) for the likelihood of Y_n under data-generating distribution π_i from model M_i. Let

(2.1)    L(M_i) := P(Y_n | M_i) = ∫_{M_i} P(Y_n | π_i, M_i) dP(π_i | M_i)

be the marginal likelihood of model M_i. Bayesian model choice is then based on the posterior model probabilities

    P(M_i | Y_n) ∝ P(M_i) L(M_i),    i ∈ I.
The probabilities P(M_i | Y_n) can be approximated by various Monte Carlo procedures, see Friel and Wyse (2012) for a recent review, but practitioners also often turn to computationally inexpensive proxies suggested by large-sample theory. These proxies are based on the asymptotic properties of the sequence of random variables L(M_i) obtained when Y_n is drawn from a data-generating distribution π_0 ∈ M_i, and we let the sample size n grow.
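To make the marginal likelihood in (2.1) concrete, here is a minimal sketch of the simplest Monte Carlo approach, prior sampling, for a toy Bernoulli model with a uniform Beta(1,1) prior. The model, prior and sample size are illustrative assumptions; they are chosen only because the integral then has a closed form (a Beta function) against which the estimate can be checked.

```python
import numpy as np
from math import lgamma

# Toy model M_i: i.i.d. Bernoulli(theta) with a uniform Beta(1,1) prior.
# The marginal likelihood (2.1) then equals the Beta integral B(k+1, n-k+1),
# which we compare against a naive prior-sampling Monte Carlo estimate.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.7, size=50)  # illustrative data
k, n = int(y.sum()), y.size

# Exact: int_0^1 theta^k (1 - theta)^(n-k) dtheta = B(k+1, n-k+1).
log_exact = lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)

# Monte Carlo: draw theta from the prior and average the likelihood,
# working on the log scale for numerical stability.
theta = rng.uniform(1e-12, 1 - 1e-12, size=200_000)
log_lik = k * np.log(theta) + (n - k) * np.log1p(-theta)
m = log_lik.max()
log_mc = m + np.log(np.mean(np.exp(log_lik - m)))

print(log_exact, log_mc)  # the two log marginal likelihoods should be close
```

Prior sampling is wasteful when the likelihood concentrates far from the bulk of the prior, which is one reason the review of Friel and Wyse (2012) covers more refined estimators.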
In practice, a prior distribution P(π_i | M_i) is typically specified by parametrizing M_i and placing a distribution on the involved parameters. So assume that

(2.2)    M_i = {π_i(ω_i) : ω_i ∈ Ω_i}

with d_i-dimensional parameter space Ω_i ⊆ R^{d_i}, and that P(π_i | M_i) is the transformation of a distribution P(ω_i | M_i) on Ω_i under the map ω_i → π_i(ω_i). The marginal likelihood then becomes the d_i-dimensional integral

(2.3)    L(M_i) = ∫_{Ω_i} P(Y_n | π_i(ω_i), M_i) dP(ω_i | M_i).
The observation of Schwarz and other subsequent work is that, under suitable technical conditions on the model M_i, the parametrization ω_i → π_i(ω_i) and the prior distribution P(ω_i | M_i), it holds for all π_0 ∈ M_i that

(2.4)    log L(M_i) = log P(Y_n | π̂_i, M_i) − (d_i/2) log(n) + O_p(1).

Here, P(Y_n | π̂_i, M_i) is the maximum of the likelihood function, and O_p(1) stands for a remainder that is bounded in probability, i.e., uniformly tight as the sample size n grows. The first two terms on the right-hand side of (2.4) are functions of the data Y_n and the model M_i alone and may thus be used as a model score or proxy for the logarithm of the marginal likelihood.
Definition 2.1. The Bayesian or Schwarz's information criterion for model M_i is

    BIC(M_i) = log P(Y_n | π̂_i, M_i) − (d_i/2) log(n).
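Definition 2.1 can be sketched in a few lines of code. The example below compares two nested Gaussian mean models, M_0 fixing the mean at zero (d_0 = 0) and M_1 with a free mean (d_1 = 1), both with known unit variance; the data and models are illustrative choices, not from the paper.

```python
import numpy as np

# BIC(M_i) = log P(Y_n | pi_hat_i, M_i) - (d_i / 2) * log(n), Definition 2.1.
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=100)  # data generated under the smaller model M_0
n = y.size

def gauss_loglik(y, mu):
    # log-likelihood of i.i.d. N(mu, 1) observations
    return -0.5 * np.sum((y - mu) ** 2) - 0.5 * y.size * np.log(2 * np.pi)

bic0 = gauss_loglik(y, 0.0) - (0 / 2) * np.log(n)       # d_0 = 0 free parameters
bic1 = gauss_loglik(y, y.mean()) - (1 / 2) * np.log(n)  # d_1 = 1, MLE = sample mean
print(bic0, bic1)
```

Since the log-likelihood gain from fitting the mean is exactly (n/2) ȳ², the score difference satisfies bic1 − bic0 = (n/2) ȳ² − (1/2) log n, so the larger model wins only when the fitted mean is far enough from zero to beat the dimension penalty.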
Briefly put, the large-sample behavior from (2.4) relies on the following properties of regular problems. First, with high probability, the integrand in (2.3) is negligibly small outside a small neighborhood of the maximum likelihood estimator of ω_i. Second, in such a neighborhood, the log-likelihood function log P(Y_n | π_i(ω_i), M_i) can be approximated by a negative definite quadratic form, while a smooth prior P(ω_i | M_i) is approximately constant. The integral in (2.3) may thus be approximated by a Gaussian integral, whose normalizing constant leads to (2.4). We remark that this approach also allows for estimation of the remainder term in (2.4), giving a Laplace approximation with error O_p(n^{−1/2}); compare e.g., Tierney and Kadane (1986), Haughton (1988), Kass and Wasserman (1995), Wasserman (2000).
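The Gaussian-integral step above can be checked numerically. As a sketch under illustrative assumptions (Bernoulli likelihood, uniform prior, counts k = 35 out of n = 50), the Laplace approximation replaces the integrand by a Gaussian centered at the MLE with curvature given by the observed information, and comes close to the exact Beta integral:

```python
from math import lgamma, log, pi

# Laplace approximation to the marginal likelihood integral (2.3) for a
# Bernoulli model with uniform prior; the exact value is a Beta integral.
k, n = 35, 50  # illustrative counts
log_exact = lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)

theta_hat = k / n  # maximum likelihood estimator
loglik_hat = k * log(theta_hat) + (n - k) * log(1 - theta_hat)
# Observed information: -l''(theta_hat) = n / (theta_hat * (1 - theta_hat)).
info = n / (theta_hat * (1 - theta_hat))
# Gaussian integral: exp(loglik_hat) * prior(theta_hat) * sqrt(2*pi / info),
# with the uniform prior density equal to 1.
log_laplace = loglik_hat + 0.5 * log(2 * pi / info)

print(log_exact, log_laplace)
```

Dropping the constant 0.5 log(2π/info) from log_laplace leaves exactly the two BIC terms of (2.4), which is why BIC approximates the log marginal likelihood only up to O_p(1).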
A large-sample quadratic approximation to the log-likelihood function is not possible, however, when the Fisher-information matrix is singular. Consequently, the classical theory alluded to above does not apply to singular models. Indeed, (2.4) is generally false in singular models. Nevertheless, asymptotic theory for the marginal likelihood of singular models has been developed over the last decade, culminating in the monograph of Watanabe (2009). Theorem 6.7 in Watanabe (2009) shows that a wide variety of singular models have the property that, for Y_n drawn from π_0 ∈ M_i,
(2.5)    log L(M_i) = log P(Y_n | π_0, M_i) − λ_i(π_0) log(n) + (m_i(π_0) − 1) log log(n) + O_p(1);

see also the introduction to the topic in Drton et al. (2009, Chap. 5.1). If the sequence of likelihood ratios P(Y_n | π̂_i, M_i)/P(Y_n | π_0, M_i) is bounded in probability, then we also have that

(2.6)    log L(M_i) = log P(Y_n | π̂_i, M_i) − λ_i(π_0) log(n) + (m_i(π_0) − 1) log log(n) + O_p(1).
For singular submodels of exponential families such as the reduced-rank regression and factor analysis models treated later, the likelihood ratios converge in distribution and are thus bounded in probability (Drton, 2009). For more complicated models, such as mixture models, likelihood ratios can often be shown to converge in distribution under compactness assumptions on the parameter space; compare e.g. Azaïs et al. (2006, 2009). Such compactness assumptions also appear in the derivation of (2.5). We will not concern ourselves further with the details of these issues as the main purpose of this paper is to describe a statistical method that can leverage mathematical information in the form of equation (2.6).
The quantity λ_i(π_0) is known as the learning coefficient (or also real log-canonical threshold or stochastic complexity) and m_i(π_0) is its multiplicity. In the analytic settings considered in Watanabe (2009), it holds that λ_i(π_0) is a rational number in [0, d_i/2] and m_i(π_0) is an integer in {1, ..., d_i}. We remark that in singular models it is very difficult to estimate the O_p(1) remainder term in (2.6). We are not aware of any successful work on higher-order approximations in statistically relevant settings.
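The practical effect of λ_i(π_0) ≤ d_i/2 is that the penalty implied by (2.6) can be much lighter than the BIC penalty from (2.4). The numbers below are an illustrative choice: d = 15 is the dimension of the rank-3 model of Example 2.1 (where N = 5, M = 3), paired with learning coefficient λ = 9/2 and multiplicity m = 2 for a rank-0 truth, consistent with that example.

```python
import numpy as np

# Penalty implied by (2.4) versus the penalty implied by (2.6), for
# illustrative values: d = 15 with lambda = 9/2 and m = 2.
n = 1000
d, lam, mult = 15, 4.5, 2
bic_penalty = (d / 2) * np.log(n)
singular_penalty = lam * np.log(n) - (mult - 1) * np.log(np.log(n))
print(bic_penalty, singular_penalty)  # BIC penalizes far more heavily
```

At n = 1000 the BIC penalty is roughly 51.8 while the singular penalty is roughly 29.2, which foreshadows the overpenalization discussed after Example 2.1.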
Example 2.1. Reduced-rank regression is multivariate linear regression subject to a rank constraint on the matrix of regression coefficients (Reinsel and Velu, 1998). Keeping only with the most essential structure, suppose we observe n independent copies of a partitioned zero-mean Gaussian random vector Y = (Y_1, Y_2), with Y_1 ∈ R^N and Y_2 ∈ R^M, and where the covariance matrix of Y_2 and the conditional covariance matrix of Y_1 given Y_2 are both the identity matrix. The reduced-rank regression model M_i associated to an integer i ≥ 0 postulates that the N×M matrix π in the conditional expectation E[Y_1 | Y_2] = πY_2 has rank at most i.
In a Bayesian treatment, consider the parametrization π = ω_2ω_1, with absolutely continuous prior distributions for ω_2 ∈ R^{N×i} and ω_1 ∈ R^{i×M}. Let the true data-generating distribution be given by the matrix π_0 of rank j ≤ i. Aoyagi and Watanabe (2005) derived the learning coefficients λ_i(π_0) and their multiplicities m_i(π_0) for this setup. In particular, λ_i(π_0) and m_i(π_0) depend on π_0 only through the true rank j. For a concrete instance, take N = 5 and M = 3. Then the multiplicity m_i(π_0) = 1 unless i = 3 and j = 0 in which case m_i(π_0) = 2. The values of λ_i(π_0) are:
          j=0    j=1    j=2    j=3
  i=0      0
  i=1     3/2    7/2
  i=2      3     9/2     6
  i=3     9/2   11/2   13/2   15/2

Note that the table entries for j = i are equal to dim(M_i)/2, where dim(M_i) = i(N + M − i) is the dimension of M_i, which can be identified with the set of N×M matrices of rank at most i. The dimension is also the maximal rank of the Jacobian of the map (ω_1, ω_2) → ω_2ω_1. The singularities of M_i correspond to the points where the Jacobian fails to have maximal rank. These have rank(ω_2ω_1) < i. The fact that the singularities correspond to a drop in rank presents a challenge for model selection, which here amounts to selection of an appropriate rank.
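The λ_i(π_0) values above can be generated from the case formula of Aoyagi and Watanabe (2005). The case split below reflects our reading of that result, specialized to N ≥ M, and should be treated as an illustrative sketch rather than a full statement of their theorem; it does reproduce the diagonal dim(M_i)/2 and the table for N = 5, M = 3.

```python
from fractions import Fraction

# Learning coefficients lambda_i(j) for reduced-rank regression with
# N = 5, M = 3, following our reading of Aoyagi and Watanabe (2005).
N, M = 5, 3

def learning_coefficient(i, j):
    """lambda_i(pi_0) for model rank i and true rank j <= i (sketch)."""
    if i - j < N - M:
        # "Shallow" case, including j = i, where the value is (Mi + Nj - ij)/2.
        return Fraction(M * i + N * j - i * j, 2)
    # "Deep" case, with a parity correction on M + N + i + j.
    s = i + j
    num = 2 * s * (M + N) - (M - N) ** 2 - s ** 2
    if (M + N + s) % 2 == 1:
        num += 1
    return Fraction(num, 8)

for i in range(4):
    print([learning_coefficient(i, j) for j in range(i + 1)])
```

On the diagonal the shallow case gives (Mi + Ni − i²)/2 = i(M + N − i)/2 = dim(M_i)/2, matching the remark above.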
Simulation studies on rank selection have shown that the standard BIC, with d_i = dim(M_i) in Definition 2.1, has a tendency to select overly small ranks; for a recent example see Cheng and Phillips (2012). The quoted values of λ_i(π_0) give a theoretical explanation as the use of dimension in BIC leads to overpenalization of models that contain the true data-generating distribution but are not minimal in that regard. □
Determining learning coefficients can be a challenging problem, but progress has been made. For some of the examples that have been treated, we refer the reader to Aoyagi (2010a,b, 2009), Watanabe and Amari (2003), Watanabe and Watanabe (2007), Rusakov and Geiger (2005), Yamazaki and Watanabe (2003, 2005, 2004), and Zwiernik (2011). The use of techniques from computational algebra and combinatorics is emphasized in Lin (2011); see also Arnol'd et al. (1988), Vasil'ev (1979).
The mentioned theoretical progress, however, does not readily translate into practical statistical methodology because one faces the obstacle that the learning coefficients depend on the unknown data-generating distribution π_0, as indicated in our notation in (2.6). For instance, for the problem of selecting the rank in reduced-rank regression (Example 2.1), the Bayesian measure of model complexity that is given by the learning coefficient and its multiplicity depends on the rank we wish to determine in the first place. It is for this reason that there is currently no statistical method that takes advantage of theoretical knowledge about learning coefficients. In the remainder of this paper, we propose a solution for how to overcome the problem of circular reasoning and give a practical extension of the Bayesian information criterion to singular models.
3. New Bayesian information criterion for singular models
If the true data-generating distribution π_0 was known, then (2.6) would suggest replacing the marginal likelihood L(M_i) by

(3.1)    L′_{π_0}(M_i) := P(Y_n | π̂_i, M_i) · n^{−λ_i(π_0)} (log n)^{m_i(π_0)−1}.
The data-generating distribution being unknown, however, we propose to follow the standard Bayesian approach and to assign a probability distribution Q_i to the distributions in model M_i. We then eliminate the unknown distribution π_0 by marginalization. In other words, we compute an approximation to L(M_i) as

(3.2)    L′_{Q_i}(M_i) := ∫_{M_i} L′_{π_0}(M_i) dQ_i(π_0).
The crux of the matter now becomes choosing an appropriate measure Q_i. Before discussing particular choices for Q_i, we stress that any choice for Q_i reduces to Schwarz's criterion in the regular case.
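To see what (3.2) involves computationally, here is a minimal sketch in which Q_i is a discrete measure putting weight q_j on representative distributions with learning coefficient λ_j and multiplicity m_j, so the integral collapses to a finite weighted sum evaluated by log-sum-exp. The uniform weights and the pairing with the rank-3 model of Example 2.1 are illustrative assumptions, not a construction taken from the paper.

```python
import numpy as np

def log_L_prime(loglik_hat, n, lam, m):
    # log of L'_{pi_0}(M_i) = P(Y_n | pi_hat) * n^(-lambda) * (log n)^(m - 1),
    # as in (3.1), given the maximized log-likelihood.
    return loglik_hat - lam * np.log(n) + (m - 1) * np.log(np.log(n))

def log_L_Q(loglik_hat, n, lams, ms, weights):
    # Discrete version of (3.2): a weighted sum of the L'_{pi_0} terms,
    # computed stably on the log scale via log-sum-exp.
    terms = [np.log(w) + log_L_prime(loglik_hat, n, lam, m)
             for w, lam, m in zip(weights, lams, ms)]
    a = max(terms)
    return a + np.log(sum(np.exp(t - a) for t in terms))

# Rank-3 model of Example 2.1: candidate true ranks j = 0, ..., 3.
lams = [9/2, 11/2, 13/2, 15/2]  # lambda_3(j) values from the example
ms = [2, 1, 1, 1]               # multiplicity 2 only at j = 0
q = [0.25] * 4                  # a uniform Q_i, purely for illustration
print(log_L_Q(-120.0, 500, lams, ms, q))  # -120.0 is a placeholder log-likelihood
```

When Q_i puts all its mass on a single candidate, log_L_Q reduces to a single (3.1) term, which is the degenerate case covered by Proposition 3.1 below for regular models.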
Proposition 3.1. If the model M_i is regular, then it holds for all probability measures Q_i on M_i that

    L′_{Q_i}(M_i) = e^{BIC(M_i)}.

Proof. In our context, a regular model with d_i parameters satisfies λ_i(π_0) = d_i/2 and m_i(π_0) = 1 for all data-generating distributions π_0 ∈ M_i. Hence, the integrand in (3.2) is constant and equal to

    L′_{π_0}(M_i) = e^{BIC(M_i)}. □