
Model selection and model averaging


Cambridge University Press

978-0-521-85225-8 - Model Selection and Model Averaging

Gerda Claeskens and Nils Lid Hjort

Excerpt

1 Model selection: data examples and introduction

This book is about making choices. If there are several possibilities for modelling data, which should we take? If multiple explanatory variables are measured, should they all be used when forming predictions, making classifications, or attempting to summarise analysis of what influences response variables, or will including only a few of them work equally well, or better? If so, which ones can we best include? Model selection problems arrive in many forms and on widely varying occasions. In this chapter we present some data examples … Later in the book we come back to these data and suggest some answers. A short preview of what is to come in later chapters is also provided.

1.1 Introduction

With the current ease of data collection, which in many fields of applied science has become cheaper and cheaper, there is a growing need for methods which point to interesting, important features of the data, and which help to build a model. The model we wish to construct should be rich enough to explain relations in the data, but on the other hand simple enough to understand, explain to others, and use. It is when we negotiate this balance that model selection methods come into play. They provide formal support to guide data users in their search for good models, or for determining which variables to include when making predictions and classifications.

Statistical model selection is an integral part of almost any data analysis. Model selection cannot be easily separated from the rest of the analysis, and the question 'which model is best' is not fully well-posed until supplementary information is given about what one plans to do or hopes to achieve given the choice of a model. The survey of data examples that follows indicates the broad variety of applications and relevant types of questions that arise.

Before going on to this survey we shall briefly discuss some of the key general issues involved in model selection and model averaging.


(i) Models are approximations: When dealing with the issues of building or selecting a model, it needs to be realised that in most situations we will not be able to guess the 'correct' or 'true' model. This true model, which in the background generated the data we collected, might be very complex (and almost always unknown). For working with the data it might be of more practical value to work instead with a simpler, but almost-as-good model: 'All models are wrong, but some are useful', as a maxim formulated by G. E. P. Box expresses this view. Several model selection methods start from this perspective.

(ii) The bias–variance trade-off: The balance and interplay between variance and bias is fundamental in several branches of statistics. In the framework of model fitting and selection it takes the form of balancing simplicity (fewer parameters to estimate, leading to lower variability, but associated with modelling bias) against complexity (entering more parameters in a model, e.g. regression parameters for more covariates, means a higher degree of variability but smaller modelling bias). Statistical model selection methods must seek a proper balance between overfitting (a model with too many parameters, more than actually needed) and underfitting (a model with too few parameters, not capturing the right signal).

(iii) Parsimony: 'The principle of parsimony' takes many forms and has many formulations, in areas ranging from philosophy, physics, arts, communication, and indeed statistics. The original Ockham's razor is 'entities should not be multiplied beyond necessity'. For statistical modelling a reasonable translation is that only parameters that really matter ought to be included in a selected model. One might, for example, be willing to extend a linear regression model to include an extra quadratic term if this manifestly improves prediction quality, but not otherwise.

(iv) The context: All modelling is rooted in an appropriate scientific context and is for a certain purpose. As Darwin once wrote, 'How odd it is that anyone should not see that all observation must be for or against some view if it is to be of any service'. One must realise that 'the context' is not always a precisely defined concept, and different researchers might discover or learn different things from the same data sets. Also, different schools of science might have different preferences for what the aims and purposes are when modelling and analysing data. Breiman (2001) discusses 'the two cultures' of statistics, broadly sorting scientific questions into respectively those of prediction and classification on one hand (where even a 'black box' model is fine as long as it works well) and those of 'deeper learning about models' on the other hand (where the discovery of a non-null parameter is important even when it might not help improve inference precision). Thus S. Karlin's statement that 'The purpose of models is not to fit the data, but to sharpen the questions' (in his R. A. Fisher memorial lecture, 1983) is important in some contexts but less relevant in others. Indeed there are differently spirited model selection methods, geared towards answering questions raised by different cultures.


(v) The focus: In applied statistics work it is often the case that some quantities or functions of parameters are more important than others. It is then fruitful to gear model building and model selection efforts towards criteria that favour good performance precisely for those quantities that are more important. That different aims might lead to differently selected models, for the same data and the same list of candidate models, should not be considered a paradox, as it reflects different preferences and different loss functions. In later chapters we shall in particular work with focussed information criteria that start from estimating the mean squared error (variance plus squared bias) of candidate estimators, for a given focus parameter.

(vi) Conflicting recommendations: As is clear from the preceding points, questions about 'which model is best' are inherently more difficult than those of the type 'for a given model, how should we carry out inference'. Sometimes different model selection strategies end up offering different advice, for the same data and the same list of candidate models. This is not a contradiction as such, but stresses the importance of learning how the most frequently used selection schemes are constructed and what their aims and properties are.

(vii) Model averaging: Most selection strategies work by assigning a certain score to each candidate model. In some cases there might be a clear winner, but sometimes these scores might reveal that there are several candidates that do almost as well as the winner. In such cases there may be considerable advantages in combining inference output across these best models.
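The trade-off in (ii) and the focussed viewpoint in (v) can be made concrete with a toy case. The following Python sketch is purely illustrative and is not the focussed information criterion developed in later chapters: it assumes $n$ independent $N(\mu, \sigma^2)$ observations with $\sigma$ known, a 'wide' model that estimates the focus parameter $\mu$ by the sample mean, and a 'narrow' submodel that fixes $\mu = 0$. The function names are hypothetical. The narrow estimator has no variance but squared bias $\mu^2$; the wide one has no bias but variance $\sigma^2/n$, so the narrow model has the smaller mean squared error exactly when $|\mu| < \sigma/\sqrt{n}$.

```python
import random
import statistics


def mse_narrow_wide(mu, sigma, n, reps=5000, seed=1):
    """Compare two estimators of the focus parameter mu, where Y_i ~ N(mu, sigma^2).

    wide model:   estimate mu by the sample mean (no bias, variance sigma^2 / n)
    narrow model: fix mu = 0 (no variance, bias -mu)
    Returns (exact MSE of narrow, Monte Carlo MSE of wide).
    """
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(reps):
        ybar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        squared_errors.append((ybar - mu) ** 2)
    mse_wide = statistics.fmean(squared_errors)  # should be close to sigma^2 / n
    mse_narrow = mu ** 2                          # squared bias only
    return mse_narrow, mse_wide


def choose_by_estimated_mse(ybar, sigma, n):
    """Pick the model with the smaller *estimated* MSE for mu.

    ybar^2 - sigma^2 / n is unbiased for mu^2 (the narrow model's squared bias);
    it is truncated at zero because a squared bias cannot be negative.
    """
    bias2_hat = max(ybar ** 2 - sigma ** 2 / n, 0.0)
    var_wide = sigma ** 2 / n
    return "narrow" if bias2_hat < var_wide else "wide"


for mu in (0.1, 0.5):
    narrow, wide = mse_narrow_wide(mu, sigma=1.0, n=25)
    print(f"mu={mu}: MSE narrow={narrow:.3f}, MSE wide={wide:.3f}")
```

With $\sigma = 1$ and $n = 25$ the wide model's MSE is $\sigma^2/n = 0.04$, so the narrow model should win for $\mu = 0.1$ (squared bias $0.01$) and lose for $\mu = 0.5$ (squared bias $0.25$); the simulated MSE of the wide estimator should come out close to $0.04$ in both cases.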
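Points (vi) and (vii) can be illustrated with two widely used scores, AIC and BIC. In the sketch below they are written in a 'larger is better' form, $2\ell_{\max} - 2p$ and $2\ell_{\max} - p\log n$, where $\ell_{\max}$ is the maximised log-likelihood and $p$ the number of parameters; this sign convention, the two hypothetical models A and B, and their log-likelihoods and estimates are assumptions made for illustration, not results from any data set. The numbers are chosen so that the two criteria disagree. The averaging step uses weights proportional to $\exp(\mathrm{AIC}/2)$, which is one common choice among several.

```python
import math


def aic(loglik, p):
    # "Larger is better" form: 2 * maximised log-likelihood - 2 * number of parameters.
    return 2.0 * loglik - 2.0 * p


def bic(loglik, p, n):
    # Same form, with the heavier penalty p * log(n).
    return 2.0 * loglik - p * math.log(n)


def smoothed_weights(scores):
    # Weights proportional to exp(score / 2); subtract the best score first for stability.
    best = max(scores.values())
    raw = {m: math.exp((s - best) / 2.0) for m, s in scores.items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}


# Hypothetical fits: name -> (maximised log-likelihood, number of parameters, estimate of a focus parameter)
fits = {"A": (-150.0, 2, 1.0), "B": (-147.0, 4, 1.4)}
n = 100

aic_scores = {m: aic(ll, p) for m, (ll, p, _) in fits.items()}
bic_scores = {m: bic(ll, p, n) for m, (ll, p, _) in fits.items()}
best_aic = max(aic_scores, key=aic_scores.get)  # AIC prefers the larger model B
best_bic = max(bic_scores, key=bic_scores.get)  # BIC prefers the smaller model A

# Model-averaged estimate of the focus parameter, using AIC-based weights
weights = smoothed_weights(aic_scores)
averaged = sum(weights[m] * est for m, (_, _, est) in fits.items())
print(best_aic, best_bic, {m: round(w, 3) for m, w in weights.items()}, round(averaged, 3))
# prints: B A {'A': 0.269, 'B': 0.731} 1.292
```

Rather than reporting only the winner under one criterion, the averaged estimate lets the runner-up contribute in proportion to how close its score is.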

1.2 Egyptian skull development

Measurements on skulls of male Egyptians have been collected from different archaeological eras, with a view towards establishing biometrical differences (if any) and more generally studying evolutionary aspects. Changes over time are interpreted and discussed in a context of interbreeding and influx of immigrant populations. The data consist of four measurements for each of 30 skulls from each of five time eras, originally presented by Thomson and Randall-Maciver (1905). The five time periods are the early predynastic (around 4000 b.c.), late predynastic (around 3300 b.c.), 12th and 13th dynasties (around 1850 b.c.), the Ptolemaic period (around 200 b.c.), and the Roman period (around 150 a.d.). For each of the 150 skulls, the following measurements are taken (all in millimetres): $x_1$ = maximal breadth of the skull (MB), $x_2$ = basibregmatic height (BH), $x_3$ = basialveolar length (BL), and $x_4$ = nasal height (NH); see Figure 1.1, adapted from Manly (1986, page 6). Figure 1.2 gives pairwise scatterplots of the data for the first and last time period, respectively. Similar plots are easily made for the other time periods. We notice, for example, that the level of the $x_1$ measurement appears to have increased while that of the $x_3$ measurement may have decreased somewhat over time. Statistical modelling and analysis are required to accurately validate such claims.


Fig. 1.1. The four skull measurements $x_1$ = MB, $x_2$ = BH, $x_3$ = BL, $x_4$ = NH; from Manly (1986, page 6).

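There is a four-dimensional vector of observations $y_{t,i}$ associated with skull $i$ and time period $t$, for $i = 1, \ldots, 30$ and $t = 1, \ldots, 5$, where $t = 1$ corresponds to 4000 b.c., and so on, up to $t = 5$ for 150 a.d. We use $\bar y_{t,\cdot}$ to denote the four-dimensional vector of averages across the 30 skulls for time period $t$. This yields the following summary measures:

$$
\begin{aligned}
\bar y_{1,\cdot} &= (131.37, 133.60, 99.17, 50.53),\\
\bar y_{2,\cdot} &= (132.37, 132.70, 99.07, 50.23),\\
\bar y_{3,\cdot} &= (134.47, 133.80, 96.03, 50.57),\\
\bar y_{4,\cdot} &= (135.50, 132.30, 94.53, 51.97),\\
\bar y_{5,\cdot} &= (136.27, 130.33, 93.50, 51.37).
\end{aligned}
$$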

Standard deviations for the four measurements, computed from averaging variance estimates over the five time periods (in the order MB, BH, BL, NH), are 4.59, 4.85, 4.92, 3.19. We assume that the vectors $Y_{t,i}$ are independent and four-dimensional normally distributed, with mean vector $\xi_t$ and variance matrix $\Sigma_t$ for eras $t = 1, \ldots, 5$. However, it is not given to us how these mean vectors and variance matrices could be structured, or how they might evolve over time. Hence, although we have specified that data stem from four-dimensional normal distributions, the model for the data is not yet fully specified.
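As a rough check on the visual impression that MB has increased and BL decreased, one can standardise the change in each era mean between $t = 1$ and $t = 5$ by an approximate standard error. The Python sketch below uses only the summary numbers quoted above and assumes, beyond what the text states, that each measurement has the same standard deviation in both eras and that the two samples of 30 skulls are independent, so the standard error of a difference is $\mathrm{sd}\sqrt{2/30}$. It ignores correlation between the four measurements and multiple comparisons, so it is no substitute for the modelling the text calls for; the function name is hypothetical.

```python
import math

LABELS = ("MB", "BH", "BL", "NH")
# Era means (mm) quoted in the text: t = 1 (early predynastic) and t = 5 (Roman period)
ybar_1 = (131.37, 133.60, 99.17, 50.53)
ybar_5 = (136.27, 130.33, 93.50, 51.37)
# Standard deviations averaged over the five eras, in the order MB, BH, BL, NH
sd = (4.59, 4.85, 4.92, 3.19)
n_per_era = 30


def standardised_changes(first, last, sds, n):
    """Difference in era means divided by its approximate standard error.

    Assumes (not stated in the text) a common standard deviation in both eras
    and independent samples of size n, so se = sd * sqrt(2 / n).
    """
    out = {}
    for name, a, b, s in zip(LABELS, first, last, sds):
        se = s * math.sqrt(2.0 / n)
        out[name] = (b - a) / se
    return out


z = standardised_changes(ybar_1, ybar_5, sd, n_per_era)
for name, value in z.items():
    print(f"{name}: {value:+.2f}")
```

Run as is, it prints `MB: +4.13`, `BH: -2.61`, `BL: -4.46` and `NH: +1.02`, one per line: the largest standardised changes are for MB (upwards) and BL (downwards), in the directions noted in the text.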

We now wish to find a statistical model that provides the clearest explanation of the main features of these data. Given the information and evolutionary context alluded to above, searching for good models would involve their ability to answer the following questions. Do the mean parameters (population averages of the four measurements) …


[Figure 1.2: pairwise scatterplots of MB, BH, BL and NH for two time periods]

Fig. 1.2. Pairwise scatterplots for the Egyptian skull data. First two rows: early predynastic period (4000 b.c.). Last two rows: Roman period (150 a.d.).
