Model selection and model averaging
Uploaded by: 丁国强 | Upload date: 2015-05-05
Cambridge University Press
978-0-521-85225-8 - Model Selection and Model Averaging
Gerda Claeskens and Nils Lid Hjort
Excerpt
1 Model selection: data examples and introduction
This book is about making choices. If there are several possibilities for modelling data, which should we take? If multiple explanatory variables are measured, should they all be used when forming predictions, making classifications, or attempting to summarise analysis of what influences response variables, or will including only a few of them work equally well, or better? If so, which ones can we best include? Model selection problems arrive in many forms and on widely varying occasions. In this chapter we present some data examples. Later in the book we come back to these data and suggest some answers. A short preview of what is to come in later chapters is also provided.
1.1 Introduction
With the current ease of data collection, which in many fields of applied science has become cheaper and cheaper, there is a growing need for methods which point to interesting, important features of the data, and which help to build a model. The model we wish to construct should be rich enough to explain relations in the data, but on the other hand simple enough to understand, explain to others, and use. It is when we negotiate this balance that model selection methods come into play. They provide formal support to guide data users in their search for good models, or for determining which variables to include when making predictions and classifications.
Statistical model selection is an integral part of almost any data analysis. Model selection cannot be easily separated from the rest of the analysis, and the question ‘which model is best’ is not fully well-posed until supplementing information is given about what one plans to do or hopes to achieve given the choice of a model. The survey of data examples that follows indicates the broad variety of applications and relevant types of questions that arise.
Before going on to this survey we shall briefly discuss some of the key general issues involved in model selection and model averaging.
(i) Models are approximations: When dealing with the issues of building or selecting a model, it needs to be realised that in most situations we will not be able to guess the ‘correct’ or ‘true’ model. This true model, which in the background generated the data we collected, might be very complex (and almost always unknown). For working with the data it might be of more practical value to work instead with a simpler, but almost-as-good model: ‘All models are wrong, but some are useful’, as a maxim formulated by G. E. P. Box expresses this view. Several model selection methods start from this perspective.
(ii) The bias–variance trade-off: The balance and interplay between variance and bias is fundamental in several branches of statistics. In the framework of model fitting and selection it takes the form of balancing simplicity (fewer parameters to estimate, leading to lower variability, but associated with modelling bias) against complexity (entering more parameters in a model, e.g. regression parameters for more covariates, means a higher degree of variability but smaller modelling bias). Statistical model selection methods must seek a proper balance between overfitting (a model with too many parameters, more than actually needed) and underfitting (a model with too few parameters, not capturing the right signal).
(iii) Parsimony: ‘The principle of parsimony’ takes many forms and has many formulations, in areas ranging from philosophy, physics, arts, communication, and indeed statistics. The original Ockham’s razor is ‘entities should not be multiplied beyond necessity’. For statistical modelling a reasonable translation is that only parameters that really matter ought to be included in a selected model. One might, for example, be willing to extend a linear regression model to include an extra quadratic term if this manifestly improves prediction quality, but not otherwise.
(iv) The context: All modelling is rooted in an appropriate scientific context and is for a certain purpose. As Darwin once wrote, ‘How odd it is that anyone should not see that all observation must be for or against some view if it is to be of any service’. One must realise that ‘the context’ is not always a precisely defined concept, and different researchers might discover or learn different things from the same data sets. Also, different schools of science might have different preferences for what the aims and purposes are when modelling and analysing data. Breiman (2001) discusses ‘the two cultures’ of statistics, broadly sorting scientific questions into respectively those of prediction and classification on one hand (where even a ‘black box’ model is fine as long as it works well) and those of ‘deeper learning about models’ on the other hand (where the discovery of a non-null parameter is important even when it might not help improve inference precision). Thus S. Karlin’s statement that ‘The purpose of models is not to fit the data, but to sharpen the questions’ (in his R. A. Fisher memorial lecture, 1983) is important in some contexts but less relevant in others. Indeed there are differently spirited model selection methods, geared towards answering questions raised by different cultures.
(v) The focus: In applied statistics work it is often the case that some quantities or functions of parameters are more important than others. It is then fruitful to gear model building and model selection efforts towards criteria that favour good performance precisely for those quantities that are more important. That different aims might lead to differently selected models, for the same data and the same list of candidate models, should not be considered a paradox, as it reflects different preferences and different loss functions. In later chapters we shall in particular work with focussed information criteria that start from estimating the mean squared error (variance plus squared bias) of candidate estimators, for a given focus parameter.
(vi) Conflicting recommendations: As is clear from the preceding points, questions about ‘which model is best’ are inherently more difficult than those of the type ‘for a given model, how should we carry out inference’. Sometimes different model selection strategies end up offering different advice, for the same data and the same list of candidate models. This is not a contradiction as such, but stresses the importance of learning how the most frequently used selection schemes are constructed and what their aims and properties are.
(vii) Model averaging: Most selection strategies work by assigning a certain score to each candidate model. In some cases there might be a clear winner, but sometimes these scores might reveal that there are several candidates that do almost as well as the winner. In such cases there may be considerable advantages in combining inference output across these best models.
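Points (ii), (iii) and (vii) can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and not taken from the text: the simulated data, the four polynomial candidates, the use of AIC in its common 'smaller is better' form as the score, and the exponential weighting used to combine the candidates. The passage above does not name a particular criterion or weighting scheme, so this should be read as one possible instance of scoring, selecting and averaging, not as the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a mildly curved signal observed with noise.
n = 60
x = np.linspace(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.5, size=n)


def design(x, degree):
    """Polynomial design matrix with columns 1, x, ..., x**degree."""
    return np.vander(x, degree + 1, increasing=True)


def fit_and_score(x, y, degree):
    """Least-squares fit and AIC (smaller is better) for a normal linear model."""
    X = design(x, degree)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    m = len(y)
    sigma2 = rss / m  # maximum likelihood estimate of the error variance
    loglik = -0.5 * m * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = X.shape[1] + 1  # regression coefficients plus the error variance
    return beta, 2.0 * k - 2.0 * loglik


# Candidate models, from too simple (underfitting) to possibly too rich.
candidates = {"constant": 0, "linear": 1, "quadratic": 2, "cubic": 3}
fits = {name: fit_and_score(x, y, deg) for name, deg in candidates.items()}
aic = {name: score for name, (_, score) in fits.items()}

# Selection: pick the single best-scoring candidate.
best = min(aic, key=aic.get)

# Averaging: weights proportional to exp(-AIC difference / 2), then normalised.
names = list(candidates)
delta = np.array([aic[name] - aic[best] for name in names])
weights = np.exp(-0.5 * delta)
weights /= weights.sum()

# Prediction at a new point from each candidate, and the weighted combination.
x0 = np.array([0.5])
preds = np.array(
    [(design(x0, candidates[name]) @ fits[name][0])[0] for name in names]
)
averaged = float(weights @ preds)

for name, w, p in zip(names, weights, preds):
    print(f"{name:9s} AIC={aic[name]:8.2f} weight={w:.3f} prediction={p:.3f}")
print(f"selected: {best}; model-averaged prediction at x0=0.5: {averaged:.3f}")
```

When one candidate scores far better than the rest, its weight is close to one and averaging reproduces selection. When several candidates score similarly, the weights spread out and the combined prediction draws on all of them.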
1.2 Egyptian skull development
Measurements on skulls of male Egyptians have been collected from different archaeological eras, with a view towards establishing biometrical differences (if any) and more generally studying evolutionary aspects. Changes over time are interpreted and discussed in a context of interbreeding and influx of immigrant populations. The data consist of four measurements for each of 30 skulls from each of five time eras, originally presented by Thomson and Randall-Maciver (1905). The five time periods are the early predynastic (around 4000 b.c.), late predynastic (around 3300 b.c.), 12th and 13th dynasties (around 1850 b.c.), the Ptolemaic period (around 200 b.c.), and the Roman period (around 150 a.d.). For each of the 150 skulls, the following measurements are taken (all in millimetres): x1 = maximal breadth of the skull (MB), x2 = basibregmatic height (BH), x3 = basialveolar length (BL), and x4 = nasal height (NH); see Figure 1.1, adapted from Manly (1986, page 6). Figure 1.2 gives pairwise scatterplots of the data for the first and last time period, respectively. Similar plots are easily made for the other time periods. We notice, for example, that the level of the x1 measurement appears to have increased while that of the x3 measurement may have decreased somewhat over time. Statistical modelling and analysis are required to accurately validate such claims.
Fig. 1.1. The four skull measurements x1 = MB, x2 = BH, x3 = BL, x4 = NH; from Manly (1986, page 6).
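There is a four-dimensional vector of observations $y_{t,i}$ associated with skull $i$ and time period $t$, for $i = 1, \ldots, 30$ and $t = 1, \ldots, 5$, where $t = 1$ corresponds to 4000 b.c., and so on, up to $t = 5$ for 150 a.d. We use $\bar{y}_{t,\bullet}$ to denote the four-dimensional vector of averages across the 30 skulls for time period $t$. This yields the following summary measures:

$$
\begin{aligned}
\bar{y}_{1,\bullet} &= (131.37,\ 133.60,\ 99.17,\ 50.53), \\
\bar{y}_{2,\bullet} &= (132.37,\ 132.70,\ 99.07,\ 50.23), \\
\bar{y}_{3,\bullet} &= (134.47,\ 133.80,\ 96.03,\ 50.57), \\
\bar{y}_{4,\bullet} &= (135.50,\ 132.30,\ 94.53,\ 51.97), \\
\bar{y}_{5,\bullet} &= (136.27,\ 130.33,\ 93.50,\ 51.37).
\end{aligned}
$$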
Standard deviations for the four measurements, computed from averaging variance estimates over the five time periods (in the order MB, BH, BL, NH), are 4.59, 4.85, 4.92, 3.19. We assume that the vectors $Y_{t,i}$ are independent and four-dimensional normally distributed, with mean vector $\xi_t$ and variance matrix $\Sigma_t$ for eras $t = 1, \ldots, 5$. However, it is not given to us how these mean vectors and variance matrices could be structured, or how they might evolve over time. Hence, although we have specified that data stem from four-dimensional normal distributions, the model for the data is not yet fully specified.
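As a rough arithmetic check on the summary measures quoted above, the following sketch computes the change in each mean from the first to the last time period and expresses it in units of the quoted standard deviations. The variable names are hypothetical, and dividing by a single standard deviation per measurement treats the spread as common to all eras, which is an assumption of this illustration only. The output is descriptive and does not replace the formal modelling that the text calls for.

```python
import numpy as np

# Era means quoted in the text; columns in the order MB, BH, BL, NH (millimetres).
era_means = np.array([
    [131.37, 133.60, 99.17, 50.53],  # t = 1, early predynastic
    [132.37, 132.70, 99.07, 50.23],  # t = 2, late predynastic
    [134.47, 133.80, 96.03, 50.57],  # t = 3, 12th and 13th dynasties
    [135.50, 132.30, 94.53, 51.97],  # t = 4, Ptolemaic period
    [136.27, 130.33, 93.50, 51.37],  # t = 5, Roman period
])

# Standard deviations quoted in the text, same column order.
# Assumption of this sketch: one value per measurement is used for every era.
sd = np.array([4.59, 4.85, 4.92, 3.19])

labels = ["MB", "BH", "BL", "NH"]
change = era_means[-1] - era_means[0]  # last era minus first era
change_in_sd_units = change / sd       # change relative to the quoted spread

for label, d, s in zip(labels, change, change_in_sd_units):
    print(f"{label}: change {d:+.2f} mm ({s:+.2f} standard deviations)")
```

The signs agree with the visual impression reported for the scatterplots: the MB mean (x1) is higher in the last era than in the first, and the BL mean (x3) is lower.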
We now wish to find a statistical model that provides the clearest explanation of the main features of these data. Given the information and evolutionary context alluded to above, searching for good models would involve their ability to answer the following questions. Do the mean parameters (population averages of the four measurements)
[Figure 1.2: pairwise scatterplot panels with axes MB, BH, BL and NH; image not reproduced]
Fig. 1.2. Pairwise scatterplots for the Egyptian skull data. First two rows: early predynastic period (4000 b.c.). Last two rows: Roman period (150 a.d.).