

gem5, GPGPUSim, McPAT, GPUWattch, "Your favorite simulator here" Considered Harmful

Tony Nowatzki, Jaikrishnan Menon, Chen-Han Ho, Karthikeyan Sankaralingam

University of Wisconsin-Madison

tjn@cs.wisc.edu, menon@cs.wisc.edu, ho9@wisc.edu, karu@cs.wisc.edu

Much as Dijkstra, in 1968, observed the dangers of relying on the goto statement, we observe that the dominant reliance on quantitative simulators is having a detrimental effect on our field. Over time, simulator tools have become more sophisticated. From the simple days of the now debunked SimpleScalar with its RUU-based OOO model with fixed DRAM latency, to the gem5 + DramSim + GPGPUSim + McPAT mashup simulator, we have come a long way in what architects are claiming as validated tools. We argue, though, that new generations of simulators are often overfitted to certain benchmarks or configurations for validation and can have significant modeling errors that researchers are not aware of. Though the existence of these errors is unsurprising, they can cause unaware users to derive incorrect conclusions. Simultaneously, and even more problematic, is that reviewers demand researchers inappropriately use these tools. We enumerate eight common, but not acknowledged or recognized pitfalls of simulators or simulator use, considering four modern simulation infrastructures. We propose that the evaluation standards for a work should match its "footprint," the breadth of layers which the technique affects, and conclude with our opinion on how to escape our field's simulate-or-reject mindset.

1. Introduction

For a number of years we have been familiar with the observation that the quality of architecture researchers is a decreasing function of the reliance on quantitative architecture simulators in the architecture papers they produce. More recently we discovered why the use of architecture simulators has such disastrous effects, and we became convinced that the architecture simulator should be abolished from all "higher level" architecture research. At that time we did not attach too much importance to this discovery; we now submit our considerations for publication.¹

Much as Dijkstra observed the era of reliance on the goto statement was having a negative effect, we observe the era of over-reliance on quantitative simulators is having a detrimental effect on the field, and should come to an end. We observe that simulation, in particular "detailed" tools that provide cycle-accurate performance estimates, area estimates, power and energy estimates, as a vehicle for architecture research is ubiquitous. From the simple days of the now debunked SimpleScalar [8] with its RUU-based OOO model + fixed memory to the gem5 + DramSim + GPGPUSim + McPAT mashup simulator, we have come a long way in what architects are claiming as validated simulators.² This level of added detail has led to the belief that we have better tools and are doing better and better quantitative evaluation. It has also led to the preponderance of papers relying on such tools and has created an implicit standard and template of how quantitative evaluation must be done.

Figure 1: The footprint of a technique (the scope of layers it interacts with), and the choice researchers face between appropriate evaluation and PC-compliant evaluation practices. (Figure labels: stack layers Algorithm, Application, Compiler, OS, IO, Mem Controller, Caches, Core Microarch, Circuits, Gates, Transistors, Physics; small, medium, and large footprints; "How do I evaluate my idea?"; "Best Research Approach?" with Basic Energy Characterization, Mathematical Proof, Custom First-Order Models, Program Analysis, Reasoned Arguments, Cycle Accurate Simulation; "Satisfy Program Committee?" with "Cycle Accurate Simulation".)

This reliance and belief in such detailed tools is hurting the field and creating various pitfalls. Part of the problem is that these tools are commonly overfitted for validation, meaning that their parameters are tuned such that they are accurate only on a small set of benchmarks or configuration parameters. The implication of overfitting is that simulator models capture the noise rather than the fundamental relationships and tradeoffs. In addition, simulator tools often have significant modeling errors which are not easily accessible by users. Overall, the reliance on simulators is creating many pitfalls both in technical aspects and in hurting the field by distorting reviewer expectations of what entails good quantitative evaluation.

¹ This paragraph is reproduced and criticisms are modified from Dijkstra's seminal A Case against the GOTO Statement [10]. Additions are in italics.

² We remark that not all simulators' authors themselves claim validation.

As a way forward for researchers, we believe that the correct approach depends on the footprint, or layers of the stack which the technique affects or relies on, and that there is no one-size-fits-all solution to architecture research. Figure 1 highlights how different techniques, represented by gray boxes, can affect different stack layers. Unfortunately, it is too often the case that researchers make the choice of research approach based on what will get their paper accepted, rather than what is the most scientific. We revisit a version of this figure with specific examples in Section 4. Most importantly, we believe that reviewers must recalibrate their evaluation standards, and appropriately gauge them to the footprint of the research. We believe this issue is important and vital now as more research in our field is moving toward larger footprints, evidenced by recent keynotes [7] and funding calls [21]. Restricting ourselves to an ill-suited one-size-fits-all approach could curtail scientific advancement of these efforts.

This paper enumerates eight common, but not acknowledged or recognized pitfalls, considering four modern simulation infrastructures: gem5 [5], McPAT [19], GPGPUSim V2.x [3], and GPUWattch [18]. We begin this paper with a section describing errors in popular simulators, which we use to substantiate the pitfalls. In discussing simulator errors and pitfalls, our goal is not to offend or criticize but to inform and provoke thoughtful discussion. We conclude with strategies which can allow us to escape our field's templatized simulate-or-reject mindset.

2. Errors in simulators

We begin by first outlining some example instances of errors in mainstream and popular simulators. We believe the existence of these errors should not be surprising, nor are they intended as an attack on particular simulators or simulator authors; any large body of code will have errors. We only bring attention to them to add some context to our pitfalls and aid in substantiation. If anything, our criticism is squarely aimed at users of such tools, for example Govindaraju et al. [12]. Error reports, which have been verified by at least one other person not affiliated with our research group, are available at http://www.cs.wisc.edu/vertical/sim-harmful. Their purpose is to point out the type of problems which can be detrimental if users are not aware. In this section, for each tool, we first present observations about an issue, then give our opinions on the issue's implications.

2.1. gem5

To be clear, the errors discussed in this section have only been verified on the X86 version of gem5, and the micro-op issues can only apply to X86. Also, some of the below errors have been communicated to the quite active gem5 community, and we believe these problems can be tackled without difficulty. We revisit the benefits of community driven tools in Section 4.2.

Conservative/Obscure Default for Writeback Mechanism: The gem5 OOO model only schedules instructions for issue if there are guaranteed to be enough "writeback buffers" for them, where the total buffers are calculated by writeback-width × writeback-depth. The default, across all ISAs, is a writeback-depth of 1. This means that if a few long latency instructions hold up writeback-buffer slots, then the effective issue width goes to 0. For an OOO 2-wide core with benchmarks that have long-latency memory references, adding 5 buffer slots increases performance by more than 5X. We do not believe this tradeoff is representative of real OOO designs, and this important parameter is not sufficiently defined in the documentation or source code.

Inconsistent Pipeline Replay Mechanism: gem5's OOO model for speculative instruction scheduling and pipeline replay appears to be both contradictory and unnecessarily conservative. To explain, a deeply pipelined OOO core must speculatively schedule instructions to enable back-to-back execution. When an unexpected latency occurs, the schedule for the miss-dependent instructions needs to be corrected. In gem5, when a load issues to a blocked cache, gem5 conservatively models the "correction" to the speculative schedule by flushing the entire pipeline. The larger issue is that after a pipeline flush, instructions are immediately rescheduled, even if the cache remains blocked. This leads to a cycle of repeated flushing of the entire pipeline. While the performance does not take a significant hit, the amount of energy can double on some benchmarks versus a design with a handful more MSHRs to prevent the cache from blocking.

To be consistent, an architecture which flushes the pipeline on a cache-block should also flush the pipeline on other variable-latency events. However, gem5 does not flush the pipeline on events like cache misses, which would have variable latency. In short, the pipeline replay mechanism is simultaneously both highly conservative and optimistic.

Inefficient/Mislabeled Micro-ops: gem5 micro-ops are optimized more for correctness and economy rather than efficiency. One example is that the same micro-op that performs conditional moves also performs regular register moves. This means that regular moves will incur the dynamic dependence and energy cost of reading the destination register, even though they are completely overwriting it. Also, though the gem5 flag register implementation has greatly improved in recent versions, a few instructions still require extra dependencies and register reads because of flag register grouping. One example is how logical instructions (like XOR) don't write the AF flag, but since it is grouped with the other flags, it must be read before written. This is arguably acceptable, but difficult to understand and access as a user.

An important yet fixable problem is that some micro-ops are [...] which (no fp [...] memory multiplies, up in the [...] data from gem5 to perform energy analysis on floating point code would produce incorrect results by potentially integer factors.

2.2. McPAT

Unclear/Overfitted Functional Unit (FU) Energy Modeling: In the McPAT model, if the core is OOO, then a small dynamic component of energy is added for each FU regardless of whether the FU is being used. This constant is cited as "average numbers from Intel 4G and 773Mhz (Wattch)". Why this occurs in OOO but not Inorder processors could be due to overfitting in validation. Another related example is for the per-access energy of an FU. If the processor is "embedded," then this power is divided by two, citing: "According to ARM data embedded processor has much lower per acc energy". Whether or not these (in our opinion) seemingly arbitrary decisions are valid, they are not easily accessible or decipherable by the user, who may therefore come to incorrect conclusions about the quantitative results.

Error in Pipeline and Clock Power: McPAT calculates an estimate of the pipeline and clock power considering switching factors in pipeline flip-flops. This power is not reported directly in McPAT; rather, it is distributed equally amongst the various processor structures, making it difficult to determine when there are errors. Figure 2 shows the dynamic power which the pipeline contributes for inorder and OOO processors (65nm), which can only be seen by instrumenting the McPAT source code. Our experiments show that this component of power is effectively dropped for all OOO core experiments lasting longer than a few cycles. The error appears to be introduced when converting between power and energy, where a factor of the number of cycles is lost for the OOO core only. This apparent error is in all versions of McPAT that we tested (from v0.7 to v1.1 March 2014). The implication of this error is that it creates uncertainty about the estimation of pipeline and clock power.³
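To make the described symptom concrete, here is a minimal sketch of how losing a factor of the cycle count during an energy-to-power conversion could make a power component vanish for long runs. It is not taken from the McPAT source; the function name, the `lose_cycle_factor` switch, and all numeric values are hypothetical.

```python
def average_power_w(energy_per_cycle_j, cycles, clock_hz, lose_cycle_factor=False):
    """Average power (watts) of a component over a run of `cycles` clock cycles."""
    runtime_s = cycles / clock_hz
    if lose_cycle_factor:
        # Hypothetical bug: the energy of ONE cycle is divided by the whole runtime.
        total_energy_j = energy_per_cycle_j
    else:
        total_energy_j = energy_per_cycle_j * cycles
    return total_energy_j / runtime_s


# Assumed values: 0.1 nJ per cycle of pipeline energy at 2 GHz.
correct = average_power_w(1e-10, 1_000_000, 2e9)
buggy = average_power_w(1e-10, 1_000_000, 2e9, lose_cycle_factor=True)
one_cycle_buggy = average_power_w(1e-10, 1, 2e9, lose_cycle_factor=True)

print(correct, buggy, one_cycle_buggy)
```

Under these assumptions the correct power does not depend on the run length, while the buggy value shrinks in proportion to the number of cycles and only matches the correct one for a single-cycle run, which would be consistent with a component that is "effectively dropped" in bulk-mode experiments.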

Figure 2: McPAT pipeline power for a 65nm idle processor.

³ For those aware of the exact details of McPAT, when it is used in a fine-grained mode (called every cycle, as opposed to the XML interface of calling at the end of millions of cycles of simulation), this issue will disappear. However, the XML bulk mode is the most prevalent usage of McPAT in literature.

2.3. GPGPUSim V2.x

In this subsection, we consider a widely adopted version of the GPGPUSim tool, and describe several missing or abstracted components of its architectural model. We discuss GPGPUSim V2.x, even though it is not the latest version of the tool, specifically because many researchers are still using this version [16, 17], and we believe the following claims about its modeling features can be made without controversy. GPGPUSim 3.x has fixed many of these issues (see slide 20 in the tutorial [2]).

Register File microarchitecture: The operand collector (single-ported register file banks + arbiter + X-bar + collector units) is modeled assuming fixed latency accesses to the SRAM with some additional queuing latency. It does not model low-level details, like contention, which impact performance in high-compute bandwidth scenarios.

Thread/warp/wavefront scheduling and dispatch: Thread scheduling is functional, and while a number of different warp scheduling schemes are implemented, these are not modeled in the microarchitecture; they are simply generated functionally.

Branch divergence structures and Branch Unit: Similar to thread dispatch, branch divergence tracking structures are functionally emulated as part of the abstract hardware model, and the branch unit microarchitecture is not modeled at the cycle-level.
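As an illustration of the register file point above, the toy sketch below contrasts a fixed-latency operand read with one that accounts for conflicts on single-ported banks. It is not GPGPUSim code: the register-to-bank mapping (`reg % num_banks`), the function names, and the instruction stream are assumptions, and queuing latency is ignored.

```python
from collections import Counter


def fixed_latency_cycles(instructions, latency=1):
    """Every instruction's operand read costs the same, regardless of banking."""
    return latency * len(instructions)


def bank_conflict_cycles(instructions, num_banks):
    """A single-ported bank serves one operand per cycle, so an instruction
    needs as many cycles as its most heavily used bank has operands."""
    total = 0
    for regs in instructions:
        per_bank = Counter(reg % num_banks for reg in regs)
        total += max(per_bank.values())
    return total


# Hypothetical source-register numbers for three instructions.
instrs = [(0, 4, 8), (1, 2, 3), (5, 9, 2)]
print(fixed_latency_cycles(instrs))     # 3
print(bank_conflict_cycles(instrs, 4))  # 6
```

With four banks, the first instruction reads three registers from the same bank and the third reads two, so the conflict-aware count is twice the fixed-latency count for this stream. A technique that changes register allocation or operand bandwidth would look identical under the fixed-latency abstraction.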

The effect of omitting the detailed modeling of these microarchitectural features, and accounting for them abstractly or functionally, is that it encourages architects not to reason about the microarchitectural feasibility of the proposed technique. For example, consider developing and evaluating a non-trivial warp scheduling technique in GPGPUSim V2.x. Its model would be a functional one, meaning that it would not capture the individual components of the hardware, their communication, or their pipeline stages. This would be tantamount to a CPU load-store queue design evaluation which functionally models the dependence predictor, while ignoring cache port contention etc. For the CPU domain, this mashup of high-level modeling and low-level simulation would not be considered sufficient to understand the effectiveness of a technique quantitatively.

2.4. GPUWattch

Given the straightforward reading of the GPUWattch [18] paper, its methodology has a form of modeling error which we call "mathematically irrelevant" modeling. We define this as a model which, when taken as a whole, contains mathematically-irrelevant sub-components. The first part of this subsection will describe how this form of error applies to the methodology (as presented) in Leng et al. [18]. Essentially, the detailed modeling using McPAT, empirical memory models and synthesis based models isn't meaningful to the final obtained prediction. However, additional unpresented details of the GPUWattch methodology help justify the detailed modeling. Therefore, we will subsequently discuss some of these details, and conclude with the implications for appropriate model usage.

GPUWattch Power Modeling (as presented): GPUWattch models the cycle-level power of GPU architectures by first using GPGPUSim to obtain activity factors. Then, GPUWattch calculates the dynamic power of a particular benchmark, $P_{bench}$, as the sum of the activity factors, $\alpha_{bench,comp}$, multiplied by the maximum power of the component, $\mathrm{PMAX}_{comp}$. The $\mathrm{PMAX}_{comp}$ power value is obtained through highly detailed modeling using a combination of McPAT-based modeling, empirical models, and synthesis-based models. The GPUWattch authors state that since McPAT is tuned to CPUs, and since there are many undocumented GPU features, they need to correct for this by adding an error term for each component, $x_{comp}$, and use least-squares estimation (linear regression) to estimate the errors. Their final model for dynamic power is:

$$P_{bench} = \sum_{comp} \alpha_{bench,comp} \times \mathrm{PMAX}_{comp} \times x_{comp} \tag{1}$$

In linear regression terminology, $\alpha_{bench,comp} \times \mathrm{PMAX}_{comp}$ are the explanatory or input variables, $x_{comp}$ are the regression coefficients and $P_{bench}$ is the dependent variable. At this point, the authors' methodology is as follows:

"We iteratively refine the power model on the basis of the sources of the various inaccuracies that LSE [regression] identifies. For instance, in our infrastructure (i.e., McPAT) the power estimation for certain components is biased toward CPU implementations. We narrow the resulting inaccuracy gap for the GPU power model by fixing our initial assumptions about the implementation and then applying the scaling factors that are obtained from LSE."

We contend that a direct interpretation of their methodology would be to run the regression in Equation 1 using measured values of $P_{bench}$ of some microbenchmarks to find the componentwise errors $x_{comp}$, then modify the source code to multiply the original componentwise power $\alpha_{bench,comp} \times \mathrm{PMAX}_{comp}$ by the "scaling factor" for that component, $x_{comp}$, to obtain the final power estimate. Note that this procedure is performed on a platform specific basis.⁴

⁴ Considering the XML files provided in the tool for GTX480 and Quadro FX5600, the scaling coefficients are the series of 32 param names starting at line 31 (TOT_INST, FP_INT, IC_H, etc.). In the source code, in gpgpu_sim_wrapper.c, these are used in methods like set_inst_power, set_regfile_power etc. to scale up the McPAT computed values.

If this methodology was directly applied, the computation of $\mathrm{PMAX}$ is mathematically irrelevant. Performing a linear transformation on the explanatory variables of a linear regression does not affect the error or prediction accuracy. In fact, running the below regression, which does not have $\mathrm{PMAX}_{comp}$ values, would be mathematically equivalent, and the resulting regression coefficients are simply scaled as follows: $x'_{comp} = \mathrm{PMAX}_{comp} \times x_{comp}$.

$$P_{bench} = \sum_{comp} \alpha_{bench,comp} \times x'_{comp} \tag{2}$$

What this means is that the $\mathrm{PMAX}_{comp}$ variables are mathematically meaningless to the final model. Therefore, users who apply the GPUWattch methodology as written will put unnecessary effort into detailed power modeling (which would include McPAT, empirical model and synthesis model development).

GPUWattch Power Modeling (as implemented): The methodology which is implemented actually does employ the detailed power modeling results during the scaling parameter selection for some purpose, as we explain next.⁵

⁵ The authors graciously provided us details on their methodology, and we take the blame if we have made any mistakes in reproducing it here.

First, instead of scaling the internal power values of McPAT's optimization framework, what is actually scaled are the activity counts which are fed as inputs to McPAT. This assumes that McPAT's choice of components would be unaffected by the different power scaling factor applied to various components.

Second, instead of automatic linear regression, they calculate the root mean square error of their predictions and manually modify the scaling coefficients to reduce the error. This alone would be the manual equivalent to linear regression, and hence would still have the mathematical irrelevance bug. However, the authors also bound the scaling coefficients by 10× and 50× for on-core and off-core components respectively (here, the authors explain that the bound is chosen based on the confidence in the original detailed model). The authors observe that without bounding the scaling coefficients, the error is actually less: a mathematically "better" model. However, the per-component breakdowns with pure linear regression do not match expected intuition (too big or negative scaling factors). Therefore, the bounds on scaling factors serve as a rough guideline in attaining a plausible power distribution. Overall, we believe it is possible to use a purely mathematical approach, applying the same type of rough intuition, to achieve the same quality of results without detailed power modeling like McPAT.

What are GPUWattch's appropriate use cases? The methodology behind the GPUWattch power model has implications for its appropriate usage. Our position is that it can only be appropriately employed when a physical artifact with measurable power numbers is available. For the two platforms which have configurations now, the GTX480 was released in 2010, and the Quadro FX5600 is even older. Generating a new GPUWattch configuration requires attaining detailed power measurements, including physically instrumenting the GPU power supply with sensing resistors, followed by an application of the manual error-minimization procedure described above.

The reason why a physical artifact is necessary is that the scaling factors, $x_{comp}$, are platform specific. As an example, consider the power of register file access in the GTX480 and FX5600. The McPAT scaling between the two designs does not capture their architectural differences, which shows up in the GPUWattch model as the ratio of their register file scaling factors, which is 1.7×.

If we want to consider a hypothetical GPU with different configuration parameters, without changing the scaling factors, we should not expect the GPUWattch power model to be valid. To explain, the average scaling factor magnitude is 22× for the GTX480, and 8× for Quadro FX5600. To claim that the hypothetical GPU configuration is valid, the argument that would have to be made is that McPAT gets the power wrong by an order of magnitude, but somehow can get the relative scaling of components correct. This is a position we believe is untenable without evidence. The lack of validated configurability would impede an accurate design space exploration.
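The size of the problem can be worked out directly from the numbers quoted above. The short sketch below assumes a scenario the text does not measure: scaling factors calibrated on one platform are reused where another platform's factors would have been the right ones. The helper name is hypothetical.

```python
def misestimate_ratio(factor_used, factor_needed):
    """Ratio by which an estimate is off when `factor_used` is applied
    where `factor_needed` would have been the calibrated value."""
    return factor_used / factor_needed


# Average scaling factor magnitudes quoted in the text: 22x (GTX480), 8x (Quadro FX5600).
average_ratio = misestimate_ratio(22.0, 8.0)
print(average_ratio)  # 2.75
```

Under that assumption, carrying the GTX480 factors over to an FX5600-like design would inflate the estimate by 2.75× on average, with the register file off by the 1.7× ratio cited above. A hypothetical configuration sits somewhere unknown relative to both calibrated points.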

Good uses for GPUWattch would include estimating the energy impact of policy changes which affect the activity factors, or adding components which have externally validated power characteristics (again, if the target architecture already has a GPUWattch power model). Revisiting the concept of footprint, these are both small-footprint evaluation scenarios. We clarify here that the authors of GPUWattch never mention design space exploration as something their tool is meant for. So again, our criticism is aimed at tool users rather than developers, and additionally the reviewer who now thinks energy estimation for GPU research is always doable.
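The "mathematically irrelevant" argument of this subsection can be checked numerically. The sketch below fits the regression with and without the per-component maximum power values and compares the results. The activity factors, maximum powers, and measured powers are made-up illustrative values, and the small exact solver stands in for any least-squares routine.

```python
from fractions import Fraction as F


def least_squares(rows, targets):
    """Exact least-squares coefficients for rows @ x ~= targets.

    Solves the normal equations (A^T A) x = A^T b by Gauss-Jordan
    elimination over rationals, so there is no rounding error.
    """
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T b].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    for col in range(n):
        # Pick a row with a non-zero pivot and normalise it.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def predict(rows, coeffs):
    return [sum(v * c for v, c in zip(row, coeffs)) for row in rows]


# Hypothetical data: 5 microbenchmarks x 3 components.
alpha = [[F(n, 20) for n in row]
         for row in [[10, 5, 2], [4, 12, 6], [14, 2, 10], [6, 8, 18], [12, 10, 4]]]
pmax = [F(20), F(8), F(35)]                      # assumed PMAX per component
measured = [F(31), F(47), F(52), F(66), F(58)]   # assumed measured power

# With PMAX: explanatory variables are alpha * PMAX.
rows_with_pmax = [[a * p for a, p in zip(row, pmax)] for row in alpha]
x = least_squares(rows_with_pmax, measured)

# Without PMAX: explanatory variables are alpha alone.
x_prime = least_squares(alpha, measured)

same_predictions = predict(rows_with_pmax, x) == predict(alpha, x_prime)
coeffs_rescaled = x_prime == [p * c for p, c in zip(pmax, x)]
print(same_predictions, coeffs_rescaled)
```

Both comparisons hold exactly for any full-rank activity matrix and non-zero maximum powers: the fitted coefficients absorb whatever per-component constant is placed in front of them, so the predictions cannot depend on how carefully that constant was modeled.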

3. Pitfalls

This section describes eight pitfalls of modern simulators and simulator usage. For each pitfall, we describe the high-level problem and substantiate our position with empirical evidence. We then give our opinions on how best to avoid the pitfall.

3.1. Pitfall 1: Errors in simulators are inaccessible to users

As outlined above, simulator tools can have significant abstraction, modeling, and specification errors. Furthermore, since simulators are distributed as C/C++ code with little specification, it is difficult for end users to even become aware of these errors. Without understanding whether the simulator is correctly capturing the particular phenomenon a designer is interested in, off-the-shelf usage renders such tools ineffective for even first-order analysis of effects.

Sometimes, the features of modeling tools are not easily accessible, making bugs difficult to find even with careful data analysis. That is because many of the features are obscured behind implicit assumptions, lack of documentation and lack of good reporting of results. One example is how the pipeline power is reported in McPAT. Since it is implicitly distributed amongst the individual components of the processor, what appears to be a significant error is obfuscated. Errors like this put researchers in a difficult position. Should they go fix the tool which is already validated? And what if another error in the opposite direction is canceling out the effects?

Suggestions: Authors should first validate and sanity check the simulator individually. Further, when it makes sense, they should consider building trace-driven tools that model the first order effects they are aware of, instead of using cycle-accurate tools. We believe it is better to have a tool with known abstraction errors than an unknown black box. Reviewers and the community need to change their mindset as well: having blind faith in "standard tools," while completely discounting other tools, is not appropriate. We revisit the issue of open versus in-house tools in Section 4.

3.2. Pitfall 2: False confidence from validation - over-generalization in simulator papers, or tool misuses

Simulator writers typically make narrow and factually consistent statements about validation, and some examples are below. However, the nature of validation is often misunderstood by users, and these tools are put to use in ways not intended, including making quantitative generalizations.

gem5's OOO model is widely used, but as observed in a recent paper [13] and our observations above, it has several specification errors. Though the gem5 authors themselves do not claim it as such, some do claim it is a "validated simulator." Clearly, this cannot be taken to mean that all effects are modeled. For instance, a technique that works on the instruction front-end must pay attention to gem5's baseline and first fix the specification error described here [13].

Considering McPAT, according to their own documentation and code comments, constants are sometimes chosen to match the validation targets. We agree this is a reasonable decision in some cases, especially when highly customized logic is employed (e.g. functional unit implementations). The danger is when researchers attempt to generalize the results outside the validated processors. These constants will likely not be appropriate.

For GPUWattch, it might be tempting for researchers to perform sensitivity studies by varying McPAT parameters. The path of least resistance would be to use the same scaling factors, instead of measuring the power of a known GPU and deriving new scaling factors using the GPUWattch methodology. For reasons described in the previous section, we argue that without obtaining new scaling factors, this type of sensitivity analysis would be inappropriate.

Suggestions: Use validated simulators with caution. Look for details on the simulator's design and factor those decisions
