
gem5, GPGPUSim, McPAT, GPUWattch, "Your favorite simulator here" Considered Harmful

Tony Nowatzki, Jaikrishnan Menon, Chen-Han Ho, Karthikeyan Sankaralingam

University of Wisconsin-Madison

tjn@cs.wisc.edu, menon@cs.wisc.edu, ho9@wisc.edu, karu@cs.wisc.edu

Much as Dijkstra, in 1968, observed the dangers of relying on the goto statement, we observe that the dominant reliance on quantitative simulators is having a detrimental effect on our field. Over time, simulator tools have become more sophisticated. From the simple days of the now debunked SimpleScalar with its RUU-based OOO model with fixed DRAM latency, to the gem5 + DramSim + GPGPUSim + McPAT mashup simulator, we have come a long way in what architects are claiming as validated tools. We argue, though, that new generations of simulators are often overfitted to certain benchmarks or configurations for validation and can have significant modeling errors that researchers are not aware of. Though the existence of these errors is unsurprising, they can cause unaware users to derive incorrect conclusions. Simultaneously, and even more problematically, reviewers demand that researchers use these tools inappropriately. We enumerate eight common, but not acknowledged or recognized pitfalls of simulators or simulator use, considering four modern simulation infrastructures. We propose that the evaluation standards for a work should match its "footprint," the breadth of layers which the technique affects, and conclude with our opinion on how to escape out of our field's simulate-or-reject mindset.

[Figure 1 labels: stack layers (Algorithm, Application, Compiler, OS, IO, Mem Controller, Caches, Core Microarch, Circuits, Gates, Transistors, Physics); technique footprints (Small Footprint, Medium Footprint, Large Footprint); evaluation approaches (Basic Energy Characterization, Mathematical Proof, Custom First-Order Models, Program Analysis, Reasoned Arguments, Cycle Accurate Simulation); the researcher's question "How do I evaluate my idea?" branches into "Best Research Approach?" and "Satisfy Program Committee?", the latter leading to "Cycle Accurate Simulation".]

1. Introduction

For a number of years we have been familiar with the observation that the quality of architecture researchers is a decreasing function of the reliance on quantitative architecture simulators in the architecture papers they produce. More recently we discovered why the use of architecture simulators has such disastrous effects, and we became convinced that the architecture simulator should be abolished from all "higher level" architecture research. At that time we did not attach too much importance to this discovery; we now submit our considerations for publication (footnote 1). Much as Dijkstra observed the era of reliance on the goto statement was having a negative effect, we observe the era of over-reliance on quantitative simulators is having a detrimental effect on the field, and should come to an end.

Footnote 1: This paragraph is reproduced and criticisms are modified from Dijkstra's seminal A Case against the GOTO Statement [10]. Additions are in italics.

We observe that simulation, in particular "detailed" tools that provide cycle-accurate performance estimates, area estimates, power and energy estimates, as a vehicle for architecture research is ubiquitous. From the simple days of the now debunked SimpleScalar [8] with its RUU-based OOO model + fixed memory to the gem5 + DramSim + GPGPUSim + McPAT mashup simulator, we have come a long way in what architects are claiming as validated simulators (footnote 2). This level of added detail has led to the belief that we have better tools and are doing better and better quantitative evaluation. It has also led to the preponderance of papers relying on such tools and has created an implicit standard and template of how quantitative evaluation must be done. This reliance and belief in such detailed tools is hurting the field and creating various pitfalls. Part of the problem is that these tools are commonly over-fitted for validation, meaning that their parameters are tuned such that they are accurate only on a small set of benchmarks or configuration parameters. The implication of overfitting is that simulator models capture the noise rather than the fundamental relationships and tradeoffs. In addition, simulator tools often have significant modeling errors which are not easily accessible by users. Overall, the reliance on simulators is creating many pitfalls both in technical aspects and in hurting the field by distorting reviewer expectations of what entails good quantitative evaluation.

Footnote 2: We remark that not all simulators' authors themselves claim validation.

Figure 1: The footprint of a technique (the scope of layers it interacts with), and the choice researchers face between appropriate evaluation and PC-compliant evaluation practices.

As a way forward for researchers, we believe that the correct approach depends on the footprint, or layers of the stack which the technique affects or relies on, and that there is no one-size-fits-all solution to architecture research. Figure 1 highlights how different techniques, represented by gray boxes, can affect different stack layers. Unfortunately, it is too often the case that researchers make the choice of research approach based on what will get their paper accepted, rather than what is the most scientific. We revisit a version of this figure with specific examples in Section 4. Most importantly, we believe that reviewers must recalibrate their evaluation standards, and appropriately gauge them to the footprint of the research. We believe this issue is important and vital now as more research in our field is moving toward larger footprints, evidenced by recent keynotes [7] and funding calls [21]. Restricting ourselves to an ill-suited one-size-fits-all approach could curtail scientific advancement of these efforts.

This paper enumerates eight common, but not acknowledged or recognized pitfalls, considering four modern simulation infrastructures: gem5 [5], McPAT [19], GPGPUSim V2.x [3], and GPUWattch [18]. We begin this paper with a section describing errors in popular simulators, which we use to substantiate the pitfalls. In discussing simulator errors and pitfalls, our goal is not to offend or criticize but to inform and provoke thoughtful discussion. We conclude with strategies which can allow us to escape out of our field's templatized simulate-or-reject mindset.

and we believe these problems can be tackled without difficulty. We revisit the benefits of community-driven tools in Section 4.2.

Conservative/Obscure Default for Writeback Mechanism: The gem5 OOO model only schedules instructions for issue if there are guaranteed to be enough "writeback buffers" for them, where the total buffers are calculated by writeback-width × writeback-depth. The default, across all ISAs, is a writeback-depth of 1. This means that if a few long latency instructions hold up writeback-buffer slots, then the effective issue width goes to 0. For an OOO 2-wide core with benchmarks that have long-latency memory references, adding 5 buffer slots increases performance by more than 5X. We do not believe this tradeoff is representative of real OOO designs, and this important parameter is not sufficiently defined in the documentation or source code.
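The gating rule described above can be sketched in a few lines. This is a toy model written for illustration, not gem5 code: the function name `issue_slots` and its parameters are hypothetical, and it assumes every in-flight instruction holds exactly one writeback-buffer slot until it completes.

```python
def issue_slots(issue_width, wb_width, wb_depth, in_flight):
    """Instructions that may issue this cycle under the gating rule above.

    Hypothetical model: an instruction issues only if a writeback-buffer
    slot is guaranteed for it, and each in-flight instruction holds one slot.
    """
    total_buffers = wb_width * wb_depth          # writeback-width x writeback-depth
    free_buffers = max(0, total_buffers - in_flight)
    return min(issue_width, free_buffers)


# 2-wide core with a writeback-depth of 1: two outstanding long-latency
# loads occupy both slots, so the effective issue width drops to 0.
print(issue_slots(issue_width=2, wb_width=2, wb_depth=1, in_flight=2))

# Same two outstanding loads with a deeper buffer pool: issue proceeds.
print(issue_slots(issue_width=2, wb_width=2, wb_depth=4, in_flight=2))
```

Under these assumptions, the stall comes purely from buffer accounting rather than from any data dependence, which is why a handful of extra slots can change performance so sharply.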

Inconsistent Pipeline Replay Mechanism: gem5's OOO model for speculative instruction scheduling and pipeline replay appears to be both contradictory and unnecessarily conservative. To explain, a deeply pipelined OOO core must speculatively schedule instructions to enable back-to-back execution. When an unexpected latency occurs, the schedule for the miss-dependent instructions needs to be corrected. In gem5, when a load issues to a blocked cache, gem5 conservatively models the "correction" to the speculative schedule by flushing the entire pipeline. The larger issue is that after a pipeline flush, instructions are immediately rescheduled, even if the cache remains blocked. This leads to a cycle of repeated flushing of the entire pipeline. While the performance does not take a significant hit, the amount of energy can double on some benchmarks versus a design with a handful more MSHRs to prevent the cache from blocking.

To be consistent, an architecture which flushes the pipeline on a blocked cache should also flush the pipeline on other variable-latency events. However, gem5 does not flush the pipeline on events like cache misses, which would have variable latency. In short, the pipeline replay mechanism is simultaneously both highly conservative and optimistic.
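The repeated-flush behaviour can be illustrated with a small counting model. This is a sketch under stated assumptions, not a description of gem5's implementation: `count_flushes`, `blocked_cycles`, `refill_latency`, and `wait_for_unblock` are hypothetical names, and the model tracks a single load that keeps reaching a cache blocked for a fixed number of cycles.

```python
def count_flushes(blocked_cycles, refill_latency, wait_for_unblock):
    """Count pipeline flushes caused by one load that reaches a blocked cache.

    Hypothetical toy model: after a flush the load re-issues once the
    pipeline has refilled (refill_latency cycles). If wait_for_unblock is
    False, the load is rescheduled immediately, so it can reach the cache
    while it is still blocked and trigger another flush.
    """
    if refill_latency <= 0:
        raise ValueError("refill_latency must be positive")
    flushes = 0
    cycle = 0
    while cycle < blocked_cycles:        # the load arrives at a blocked cache
        flushes += 1
        if wait_for_unblock:
            cycle = blocked_cycles       # replay only once the cache unblocks
        else:
            cycle += refill_latency      # replay as soon as the pipeline refills
    return flushes


# Cache blocked for 100 cycles, 10-cycle refill.
print(count_flushes(100, 10, wait_for_unblock=False))  # immediate rescheduling
print(count_flushes(100, 10, wait_for_unblock=True))   # replay after unblock
```

In both cases the load completes at about the same time, so run time barely changes, but every extra flush discards and re-executes in-flight work. That is consistent with the observation that energy, rather than performance, is what suffers.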

Inefficient/Mislabeled Micro-ops: gem5 micro-ops are optimized more for correctness and economy rather than efficiency. One example is that the same micro-op that performs conditional moves also performs regular register moves. This means that regular moves will incur the dynamic dependence and energy cost of reading the destination register, even though they are completely overwriting it. Also, though the gem5 flag register implementation has greatly improved in recent versions, a few instructions still require extra dependencies and register reads because of flag register grouping. One example is how logical instructions (like XOR) don't write the AF flag, but since it is grouped with the other flags, it must be read before written. This is arguably acceptable, but difficult to understand and access as a user.

An important yet fixable problem is that some micro-ops are

2. Errors in simulators

We begin by first outlining some example instances of errors in mainstream and popular simulators. We believe the existence of these errors should not be surprising, nor are they intended as an attack on particular simulators or simulator authors; any large body of code will have errors. We only bring attention to them to add some context to our pitfalls and aid in substantiation. If anything, our criticism is squarely aimed at users of such tools, for example Govindaraju et al. [12]. Error reports are available at http://www.cs.wisc.edu/vertical/sim-harmful, which have been verified by at least one other person not affiliated with our research group. Their purpose is to point out the type of problems which can be detrimental if users are not aware. In this section, for each tool, we first present observations about an issue, then give our opinions on the issue's implications.

2.1. gem5

To be clear, the errors discussed in this section have only been verified on the X86 version of gem5, and the micro-op issues can only apply to X86. Also, some of the below errors have been communicated to the quite active gem5 community,


which (nofp

memory multiplies, up in the data from gem5 to perform energy analysis on floating point code would produce incorrect results by potentially integer factors.

2.2. McPAT

Unclear/Overfitted Functional Unit (FU) Energy Modeling: In the McPAT model, if the core is OOO, then a small dynamic component of energy is added for each FU regardless of whether the FU is being used. This constant is cited as "average numbers from Intel 4G and 773Mhz (Wattch)". Why this occurs in OOO but not Inorder processors could be due to overfitting in validation. Another related example is for the per-access energy of an FU. If the processor is "embedded," then this power is divided by two, citing: "According to ARM data embedded processor has much lower per acc energy". Whether or not these (in our opinion) seemingly arbitrary decisions are valid, they are not easily accessible or decipherable by users, who may therefore come to incorrect conclusions about the quantitative results.

Error in Pipeline and Clock Power: McPAT calculates an estimate of the pipeline and clock power considering switching factors in pipeline flip-flops. This power is not reported directly in McPAT, rather, it is distributed equally amongst the various processor structures, making it difficult to determine when there are errors. Figure 2 shows the dynamic power which the pipeline contributes for inorder and OOO processors (65nm), which can only be seen by instrumenting the McPAT source code. Our experiments show that this component of power is effectively dropped for all OOO core experiments lasting longer than a few cycles. The error appears to be introduced when converting between power and energy, where a factor of the number of cycles is lost for the OOO core only. This apparent error is in all versions of McPAT that we tested (from v0.7 to v1.1 March 2014). The implication of this error is that it creates uncertainty about the estimation of pipeline and clock power (footnote 3).
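To see why losing a factor of the cycle count makes a power component vanish on long runs, consider the following sketch. It is a hypothetical illustration of this class of unit-conversion error, not McPAT source code; the function `pipeline_power_watts` and all of its parameters are assumptions made for the example.

```python
def pipeline_power_watts(energy_per_cycle_j, cycles, clock_hz, lose_cycle_factor=False):
    """Average pipeline power over a run of `cycles` cycles.

    Hypothetical illustration: the correct conversion multiplies the
    per-cycle energy by the cycle count before dividing by run time.
    With lose_cycle_factor=True that multiplication is skipped, which
    mimics a conversion that drops the number-of-cycles factor.
    """
    if cycles <= 0 or clock_hz <= 0:
        raise ValueError("cycles and clock_hz must be positive")
    run_time_s = cycles / clock_hz
    if lose_cycle_factor:
        total_energy_j = energy_per_cycle_j           # cycle factor lost
    else:
        total_energy_j = energy_per_cycle_j * cycles  # correct total energy
    return total_energy_j / run_time_s


# 1 nJ per cycle at 1 GHz over one million cycles.
correct = pipeline_power_watts(1e-9, 1_000_000, 1e9)
buggy = pipeline_power_watts(1e-9, 1_000_000, 1e9, lose_cycle_factor=True)
print(correct, buggy)
```

In this model the two results differ by exactly the cycle count, so the erroneous value only looks plausible for runs of a few cycles and shrinks toward zero as the simulated interval grows.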

Footnote 3: For those aware of the exact details of McPAT, when it is used in a fine-grained mode (called every cycle, as opposed to the XML interface of calling at the end of millions of cycles of simulation), this issue will disappear. However, the XML bulk mode is the most prevalent usage of McPAT in literature.

Figure 2: McPAT pipeline power for a 65nm idle processor.

2.3. GPGPUSim V2.x

In this subsection, we consider a widely adopted version of the GPGPUSim tool, and describe several missing or abstracted components of its architectural model. We discuss GPGPUSim V2.x, even though it is not the latest version of the tool, specifically because many researchers are still using this version [16, 17], and we believe the following claims about its modeling features can be made without controversy. GPGPUSim 3.x has fixed many of these issues (see slide 20 in the tutorial [2]).

Register File microarchitecture: The operand collector (single-ported register file banks + arbiter + X-bar + collector units) is modeled assuming fixed latency accesses to the SRAM with some additional queuing latency. It does not model low-level details, like contention, which impact performance in high-compute bandwidth scenarios.

Thread/warp/wavefront scheduling and dispatch: Thread scheduling is functional, and while a number of different warp scheduling schemes are implemented, these are not modeled in the microarchitecture; they are simply generated functionally.

Branch divergence structures and Branch Unit: Similar to thread dispatch, branch divergence tracking structures are functionally emulated as part of the abstract hardware model, and the branch unit microarchitecture is not modeled at the cycle-level.
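As one concrete case of what an abstracted model can hide, the register file item above can be contrasted with a model that accounts for bank contention. The sketch below is illustrative only and does not reproduce GPGPUSim code: both functions and their parameters are hypothetical, the fixed-latency variant ignores the queuing term mentioned above for simplicity, and the contention variant assumes each single-ported bank serves one read per cycle.

```python
from collections import Counter


def fixed_latency_cycles(operand_banks, access_latency=1):
    """Abstract model: all operand reads complete after one fixed latency."""
    return access_latency if operand_banks else 0


def contention_cycles(operand_banks, access_latency=1):
    """Contention-aware model: reads that map to the same single-ported
    bank are serialized, one per cycle (an assumption of this sketch)."""
    if not operand_banks:
        return 0
    worst_bank_reads = max(Counter(operand_banks).values())
    return access_latency + (worst_bank_reads - 1)


# Three source operands in three different banks: both models agree.
print(fixed_latency_cycles([0, 1, 2]), contention_cycles([0, 1, 2]))

# Three source operands in the same bank: only the second model sees the conflict.
print(fixed_latency_cycles([0, 0, 0]), contention_cycles([0, 0, 0]))
```

A technique whose benefit depends on operand placement would look identical in both cases under the abstract model, which is the kind of effect a functional or fixed-latency model cannot expose.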

The effect of omitting the detailed modeling of these micro-architectural features, and accounting for them abstractly or functionally, is that it encourages architects not to reason about the microarchitectural feasibility of the proposed technique. For example, consider developing and evaluating a non-trivial warp scheduling technique in GPGPUSim V2.x. Its model would be a functional one, meaning that it would not capture the individual components of the hardware, their communication, or their pipeline stages. This would be tantamount to a CPU load-store queue design evaluation which functionally models the dependence predictor, while ignoring cache port contention etc. For the CPU domain, this mashup of high-level modeling and low-level simulation would not be considered sufficient to understand the effectiveness of a technique quantitatively.

2.4. GPUWattch

Given the straightforward reading of the GPUWattch [18] paper, its methodology has a form of modeling error which we call "mathematically irrelevant" modeling. We define this as a model which, when taken as a whole, contains mathematically-irrelevant sub-components. The first part of this subsection will describe how this form of error applies to the methodology (as presented) in Leng et al. [18]. Essentially, the detailed modeling using McPAT, empirical memory models and synthesis based models isn't meaningful to the final obtained prediction. However, additional unpresented details of the GPUWattch methodology help justify the detailed modeling. Therefore, we will subsequently discuss some of these details, and conclude with the implications for appropriate model usage.

GPUWattch Power Modeling (as presented): GPUWattch models the cycle-level power of GPU architectures by first using GPGPUSim to obtain activity factors. Then, GPUWattch calculates the dynamic power of a particular benchmark, $P_{bench}$, as the sum of the activity factors $\alpha_{bench,comp}$, multiplied by the maximum power of the component, $PMAX_{comp}$. The $PMAX_{comp}$ power value is obtained through highly detailed modeling using a combination of McPAT-based modeling, empirical models, and synthesis-based models. The GPUWattch authors state that since McPAT is tuned to CPUs, and since there are many undocumented GPU features, they need to correct for this by adding an error term for each component, $x_{comp}$, and use least-squares estimation (linear regression) to estimate the errors. Their final model for dynamic power is:

$$P_{bench} = \sum_{comp} \alpha_{bench,comp} \times PMAX_{comp} \times x_{comp} \qquad (1)$$

In linear regression terminology, $\alpha_{bench,comp} \times PMAX_{comp}$ are the explanatory or input variables, $x_{comp}$ are the regression coefficients and $P_{bench}$ is the dependent variable. At this point, the authors' methodology is as follows:

"We iteratively refine the power model on the basis of the sources of the various inaccuracies that LSE [regression] identifies. For instance, in our infrastructure (i.e., McPAT) the power estimation for certain components is biased toward CPU implementations. We narrow the resulting inaccuracy gap for the GPU power model by fixing our initial assumptions about the implementation and then applying the scaling factors that are obtained from LSE."

We contend that a direct interpretation of their methodology would be to run the regression in Equation 1 using measured values of $P_{bench}$ of some microbenchmarks to find the componentwise errors $x_{comp}$, then modify the source code to multiply the original componentwise power $\alpha_{bench,comp} \times PMAX_{comp}$ by the "scaling factor" for that component, $x_{comp}$, to obtain the final power estimate. Note that this procedure is performed on a platform specific basis (footnote 4).

Footnote 4: Considering the XML files provided in the tool for GTX480 and Quadro FX5600, the scaling coefficients are the series of 32 param names starting at line 31 (TOT_INST, FP_INT, IC_H, etc.). In the source code, in gpgpu_sim_wrapper.c, these are used in methods like set_inst_power, set_regfile_power etc. to scale up the McPAT computed values.

If this methodology was directly applied, the computation of $PMAX_{comp}$ is mathematically irrelevant. Performing a linear transformation on the explanatory variables of a linear regression does not affect the error or prediction accuracy. In fact, running the below regression, which does not have $PMAX_{comp}$ values, would be mathematically equivalent, and the resulting regression coefficients are simply scaled as follows: $x'_{comp} = PMAX_{comp} \times x_{comp}$.

$$P_{bench} = \sum_{comp} \alpha_{bench,comp} \times x'_{comp} \qquad (2)$$

What this means is that the $PMAX_{comp}$ variables are mathematically meaningless to the final model. Therefore, users who apply the GPUWattch methodology as written will put unnecessary effort into detailed power modeling (which would include McPAT, empirical model and synthesis model development).

GPUWattch Power Modeling (as implemented): The methodology which is implemented actually does employ the detailed power modeling results during the scaling parameter selection for some purpose, as we explain next (footnote 5).

Footnote 5: The authors graciously provided us details on their methodology, and we take the blame if we have made any mistakes in reproducing it here.

First, instead of scaling the internal power values of McPAT's optimization framework, what is actually scaled are the activity counts which are fed as inputs to McPAT. This assumes that McPAT's choice of components would be unaffected by the different power scaling factor applied to various components.

Second, instead of automatic linear regression, they calculate the root mean square error of their predictions and manually modify the scaling coefficients to reduce the error. This alone would be the manual equivalent to linear regression, and hence would still have the mathematical irrelevance bug. However, the authors also bound the scaling coefficients by 10× and 50× for on-core and off-core components respectively (here, the authors explain that the bound is chosen based on the confidence in the original detailed model). The authors observe that without bounding the scaling coefficients, the error is actually less: a mathematically "better" model. However, the per-component breakdowns with pure linear regression do not match expected intuition (too big or negative scaling factors). Therefore, the bounds on scaling factors serve as a rough guideline in attaining a plausible power distribution. Overall, we believe it is possible to use a purely mathematical approach, applying the same type of rough intuition, to achieve the same quality of results without detailed power modeling like McPAT.

What are GPUWattch's appropriate use cases? The methodology behind the GPUWattch power model has implications for its appropriate usage. Our position is that it can only be appropriately employed when a physical artifact with measurable power numbers is available. For the two platforms which have configurations now, the GTX480 was released in 2010, and the Quadro FX5600 is even older. Generating a new GPUWattch configuration requires attaining detailed power measurements, including physically instrumenting the GPU power supply with sensing resistors, followed by an application of the manual error-minimization procedure described above.

The reason why a physical artifact is necessary is that the scaling factors, $x_{comp}$, are platform specific. As an example, consider the power of register file access in the GTX480 and FX5600. The McPAT scaling between the two designs does not capture their architectural differences, which shows up in the GPUWattch model as the ratio of their register file scaling factors, which is 1.7×.

If we want to consider a hypothetical GPU with different configuration parameters, without changing the scaling factors, we should not expect the GPUWattch power model to be valid. To explain, the average scaling factor magnitude is 22× for the GTX480, and 8× for Quadro FX5600. To claim that the hypothetical GPU configuration is valid, the argument that would have to be made is that McPAT gets the power wrong by an order of magnitude, but somehow can get the relative scaling of components correct. This is a position we believe is untenable without evidence. The lack of validated configurability would impede an accurate design space exploration.

Good uses for GPUWattch would include estimating the energy impact of policy changes which affect the activity factors, or in adding components which have externally validated power characteristics (again, if the target architecture already has a GPUWattch power model). Revisiting the concept of footprint, these are both small-footprint evaluation scenarios. We clarify here that the authors of GPUWattch never mention design space exploration as something their tool is meant for. So again, our criticism is aimed at tool users rather than developers, and additionally the reviewer who now thinks energy estimation for GPU research is always doable.
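The "mathematically irrelevant" argument made earlier in this subsection, that rescaling the explanatory variables of a linear regression by per-component constants changes only the coefficients and not the predictions, can be checked numerically. The following is a minimal sketch under stated assumptions: all numbers are made up for illustration, only two components are used so the least-squares fit has a closed form, and exact rational arithmetic is used so the comparison is not affected by rounding. The names `fit_two_coefficients` and `predict` are hypothetical helpers, not part of GPUWattch.

```python
from fractions import Fraction


def fit_two_coefficients(rows, targets):
    """Ordinary least squares for a two-column design matrix,
    solved in closed form from the 2x2 normal equations."""
    s11 = sum(a * a for a, _ in rows)
    s12 = sum(a * b for a, b in rows)
    s22 = sum(b * b for _, b in rows)
    t1 = sum(a * y for (a, _), y in zip(rows, targets))
    t2 = sum(b * y for (_, b), y in zip(rows, targets))
    det = s11 * s22 - s12 * s12
    return ((t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det)


def predict(rows, coeffs):
    """Model output for each benchmark: sum over components of input x coefficient."""
    return [a * coeffs[0] + b * coeffs[1] for a, b in rows]


# Made-up activity factors for four benchmarks and two components.
alpha = [(Fraction(1), Fraction(2)), (Fraction(2), Fraction(1)),
         (Fraction(3), Fraction(5)), (Fraction(4), Fraction(3))]
measured = [Fraction(10), Fraction(11), Fraction(27), Fraction(24)]  # made-up power
pmax = (Fraction(7), Fraction(3))                                    # made-up per-component maxima

# Regression with explanatory variables alpha * PMAX (the form with PMAX).
scaled = [(a * pmax[0], b * pmax[1]) for a, b in alpha]
x = fit_two_coefficients(scaled, measured)

# Regression with explanatory variables alpha alone (the form without PMAX).
x_prime = fit_two_coefficients(alpha, measured)

same_predictions = predict(scaled, x) == predict(alpha, x_prime)
coefficients_scaled = x_prime == (pmax[0] * x[0], pmax[1] * x[1])
print(same_predictions, coefficients_scaled)
```

Under these assumptions both checks hold for any nonzero choice of the per-component maxima: changing `pmax` changes the fitted coefficients but leaves every prediction, and therefore the fitting error, untouched.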

3. Pitfalls

This section describes eight pitfalls of modern simulators and simulator usage. For each pitfall, we describe the high-level problem and substantiate our position with empirical evidence. We then give our opinions on how best to avoid the pitfall.

3.1. Pitfall 1: Errors in simulators are inaccessible to users

As outlined above, simulator tools can have significant abstraction, modeling, and specification errors. Furthermore, since simulators are distributed as C/C++ code with little specification, it is difficult for end users to even become aware of these errors. Without understanding whether the simulator is correctly capturing the particular phenomenon a designer is interested in, off-the-shelf usage renders them ineffective for even first-order analysis of effects.

Sometimes, the features of modeling tools are not easily accessible, making bugs difficult to find even with careful data analysis. That is because many of the features are obscured behind implicit assumptions, lack of documentation and lack of good reporting of results. One example is how the pipeline power is reported in McPAT. Since it is implicitly distributed amongst the individual components of the processor, what appears to be a significant error is obfuscated. Errors like this put researchers in a difficult position. Should they go fix the tool which is already validated? And what if another error in the opposite direction is canceling out the effects?

Suggestions: Authors should first validate and sanity check the simulator individually. Further, when it makes sense, they should consider building trace-driven tools that model the first order effects they are aware of, instead of using cycle-accurate tools. We believe it is better to have a tool with known abstraction errors than an unknown black box. Reviewers and the community need to change their mindset as well: having blind faith in "standard tools," while completely discounting other tools is not appropriate. We revisit the issue of open versus in-house tools in Section 4.

3.2. Pitfall 2: False confidence from validation - over-generalization in simulator papers, or tool misuses

Simulator writers typically make narrow and factually consistent statements about validation, and some examples are below. However, the nature of validation is often misunderstood by users, and these tools are put to use in ways they were not intended for, including making quantitative generalizations.

gem5's OOO model is widely used, but as observed in a recent paper [13] and our observations above, it has several specification errors. Though the gem5 authors themselves do not claim it as such, some do claim it is a "validated simulator." Clearly, this cannot be taken to mean that all effects are modeled. For instance, a technique that works on the instruction front-end must pay attention to gem5's baseline and first fix the specification error described here [13].

Considering McPAT, according to their own documentation and code comments, constants are sometimes chosen to match the validation targets. We agree this is a reasonable decision in some cases, especially when highly customized logic is employed (e.g. functional unit implementations). The danger is when researchers attempt to generalize the results outside the validated processors. These constants will likely not be appropriate.

For GPUWattch, it might be tempting for researchers to perform sensitivity studies by varying McPAT parameters. The path of least resistance would be to use the same scaling factors, instead of measuring the power of a known GPU and deriving new scaling factors using the GPUWattch methodology. For reasons described in the previous section, we argue that without obtaining new scaling factors, this type of sensitivity analysis would be inappropriate.

Suggestions: Use validated simulators with caution. Look for details on the simulator's design and factor those decisions
