gem5, GPGPUSim, McPAT, GPUWattch, "Your favorite simulator here" Considered Harmful
Tony Nowatzki, Jaikrishnan Menon, Chen-Han Ho, Karthikeyan Sankaralingam
University of Wisconsin-Madison
tjn@cs.wisc.edu, menon@cs.wisc.edu, ho9@wisc.edu, karu@cs.wisc.edu
[Figure 1 graphic: "Stack Layers" (Algorithm, Application, Compiler, OS, IO, Mem Controller, Caches, Core Microarch, Circuits, Gates, Transistors, Physics) spanned by small-, medium-, and large-footprint techniques; research approaches (Basic Energy Characterization, Mathematical Proof, Custom First-Order Models, Program Analysis, Reasoned Arguments, Cycle Accurate Simulation); questions "Best Research Approach?", "How do I evaluate my idea?", "Satisfy Program Committee?"]
Much as Dijkstra, in 1968, observed the dangers of relying on the goto statement, we observe that the dominant reliance on quantitative simulators is having a detrimental effect on our field. Over time, simulator tools have become more sophisticated. From the simple days of the now debunked SimpleScalar, with its RUU-based OOO model and fixed DRAM latency, to the gem5+DramSim+GPGPUSim+McPAT mashup simulator, we have come a long way in what architects are claiming as validated tools. We argue, though, that new generations of simulators are often overfitted to certain benchmarks or configurations for validation and can have significant modeling errors that researchers are not aware of. Though the existence of these errors is unsurprising, they can cause unaware users to derive incorrect conclusions. Simultaneously, and even more problematic, reviewers demand that researchers use these tools even where they are inappropriate. We enumerate eight common but unacknowledged or unrecognized pitfalls of simulators or simulator use, considering four modern simulation infrastructures. We propose that the evaluation standards for a work should match its "footprint," the breadth of layers which the technique affects, and conclude with our opinion on how to escape our field's simulate-or-reject mindset.
1. Introduction
For a number of years we have been familiar with the observation that the quality of architecture researchers is a decreasing function of the reliance on quantitative architecture simulators in the architecture papers they produce. More recently we discovered why the use of architecture simulators has such disastrous effects, and we became convinced that the architecture simulator should be abolished from all "higher level" architecture research. At that time we did not attach too much importance to this discovery; we now submit our considerations for publication¹. Much as Dijkstra observed that the era of reliance on the goto statement was having a negative effect, we observe that the era of over-reliance on quantitative simulators is having a detrimental effect on the field, and should come to an end. We observe that simulation, in particular with "detailed" tools that provide cycle-accurate performance estimates, area estimates, and power and energy estimates, is ubiquitous as a vehicle for architecture research. From the simple days of the now debunked SimpleScalar [8], with its RUU-based OOO model and fixed memory latency, to the gem5+DramSim+GPGPUSim+McPAT mashup simulator, we have come a long way in what architects are claiming as validated simulators². This level of added detail has led to the belief that we have better tools and are doing better and better quantitative evaluation. It has also led to the preponderance of papers relying on such tools and has created an implicit standard and template of how quantitative evaluation must be done. This reliance on, and belief in, such detailed tools is hurting the field and creating various pitfalls. Part of the problem is that these tools are commonly overfitted for validation, meaning that their parameters are tuned such that they are accurate only on a small set of benchmarks or configuration parameters. The implication of overfitting is that simulator models capture the noise rather than the fundamental relationships and tradeoffs. In addition, simulator tools often have significant modeling errors which are not easily accessible by users. Overall, the reliance on simulators is creating many pitfalls, both in technical aspects and in hurting the field by distorting reviewer expectations of what constitutes good quantitative evaluation.

¹ This paragraph is reproduced, and its criticisms modified, from Dijkstra's seminal A Case against the GO TO Statement [10]. Additions are in italics.
² We remark that not all simulators' authors themselves claim validation.

Figure 1: The footprint of a technique (the scope of layers it interacts with), and the choice researchers face between appropriate evaluation and PC-compliant evaluation practices.
As a way forward for researchers, we believe that the correct approach depends on the footprint, or layers of the stack which the technique affects or relies on, and that there is no one-size-fits-all solution to architecture research. Figure 1 highlights how different techniques, represented by gray boxes, can affect different stack layers. Unfortunately, it is too often the case that researchers choose their research approach based on what will get their paper accepted, rather than on what is most scientific. We revisit a version of this figure with specific examples in Section 4. Most importantly, we believe that reviewers must recalibrate their evaluation standards and gauge them appropriately to the footprint of the research. We believe this issue is especially important now, as more research in our field is moving toward larger footprints, as evidenced by recent keynotes [7] and funding calls [21]. Restricting ourselves to an ill-suited one-size-fits-all approach could curtail the scientific advancement of these efforts.
This paper enumerates eight common but unacknowledged or unrecognized pitfalls, considering four modern simulation infrastructures: gem5 [5], McPAT [19], GPGPUSim V2.x [3], and GPUWattch [18]. We begin this paper with a section describing errors in popular simulators, which we use to substantiate the pitfalls. In discussing simulator errors and pitfalls, our goal is not to offend or criticize but to inform and provoke thoughtful discussion. We conclude with strategies which can allow us to escape our field's templatized simulate-or-reject mindset.
2. Errors in simulators
We begin by outlining some example instances of errors in mainstream and popular simulators. We believe the existence of these errors should not be surprising, and they are not intended as an attack on particular simulators or simulator authors; any large body of code will have errors. We only bring attention to them to add some context to our pitfalls and to aid in substantiating them. If anything, our criticism is squarely aimed at users of such tools, for example Govindaraju et al. [12]. Error reports, which have been verified by at least one other person not affiliated with our research group, are available at http://www.cs.wisc.edu/vertical/sim-harmful. Their purpose is to point out the type of problems which can be detrimental if users are not aware of them. In this section, for each tool, we first present observations about an issue, then give our opinions on the issue's implications.

2.1. gem5
To be clear, the errors discussed in this section have only been verified on the X86 version of gem5, and the micro-op issues can only apply to X86. Also, some of the errors below have been communicated to the quite active gem5 community, and we believe these problems can be tackled without difficulty. We revisit the benefits of community-driven tools in Section 4.2.

Conservative/Obscure Default for Writeback Mechanism: The gem5 OOO model only schedules instructions for issue if there are guaranteed to be enough "writeback buffers" for them, where the total number of buffers is calculated as writeback-width × writeback-depth. The default, across all ISAs, is a writeback-depth of 1. This means that if a few long-latency instructions hold up writeback-buffer slots, then the effective issue width goes to 0. For an OOO 2-wide core with benchmarks that have long-latency memory references, adding 5 buffer slots increases performance by more than 5×. We do not believe this tradeoff is representative of real OOO designs, and this important parameter is not sufficiently defined in the documentation or source code.

Inconsistent Pipeline Replay Mechanism: gem5's OOO model for speculative instruction scheduling and pipeline replay appears to be both contradictory and unnecessarily conservative. To explain, a deeply pipelined OOO core must speculatively schedule instructions to enable back-to-back execution. When an unexpected latency occurs, the schedule for the miss-dependent instructions needs to be corrected. In gem5, when a load issues to a blocked cache, gem5 conservatively models the "correction" to the speculative schedule by flushing the entire pipeline. The larger issue is that after a pipeline flush, instructions are immediately rescheduled, even if the cache remains blocked. This leads to a cycle of repeated flushing of the entire pipeline. While performance does not take a significant hit, the amount of energy can double on some benchmarks versus a design with a handful more MSHRs to prevent the cache from blocking.

To be consistent, an architecture which flushes the pipeline when the cache blocks should also flush the pipeline on other variable-latency events. However, gem5 does not flush the pipeline on events like cache misses, which would have variable latency. In short, the pipeline replay mechanism is simultaneously both highly conservative and optimistic.

Inefficient/Mislabeled Micro-ops: gem5 micro-ops are optimized more for correctness and economy than for efficiency. One example is that the same micro-op that performs conditional moves also performs regular register moves. This means that regular moves will incur the dynamic dependence and energy cost of reading the destination register, even though they completely overwrite it. Also, though the gem5 flag register implementation has greatly improved in recent versions, a few instructions still require extra dependencies and register reads because of flag register grouping. One example is how logical instructions (like XOR) don't write the AF flag, but since it is grouped with the other flags, it must be read before being written. This is arguably acceptable, but difficult to understand and access as a user.

An important yet fixable problem is that some micro-ops are […] which (no fp […] memory multiplies, […] up in the data from gem5 to perform energy analysis on floating point code would produce incorrect results by potentially integer factors.

2.2. McPAT
Unclear/Overfitted Functional Unit (FU) Energy Modeling: In the McPAT model, if the core is OOO, then a small dynamic component of energy is added for each FU regardless of whether the FU is being used. This constant is cited as "average numbers from Intel 4G and 773Mhz (Wattch)". Why this occurs in OOO but not in in-order processors could be due to overfitting in validation. Another related example is the per-access energy of an FU. If the processor is "embedded," then this power is divided by two, citing: "According to ARM data embedded processor has much lower per acc energy". Whether or not these (in our opinion) seemingly arbitrary decisions are valid, since they are not easily accessible or decipherable by the user, users may come to incorrect conclusions about the quantitative results.
Error in Pipeline and Clock Power: McPAT calculates an estimate of the pipeline and clock power considering switching factors in pipeline flip-flops. This power is not reported directly in McPAT; rather, it is distributed equally amongst the various processor structures, making it difficult to determine when there are errors. Figure 2 shows the dynamic power which the pipeline contributes for in-order and OOO processors (65nm), which can only be seen by instrumenting the McPAT source code. Our experiments show that this component of power is effectively dropped for all OOO core experiments lasting longer than a few cycles. The error appears to be introduced when converting between power and energy, where a factor of the number of cycles is lost for the OOO core only. This apparent error is in all versions of McPAT that we tested (from v0.7 to v1.1, March 2014). The implication of this error is that it creates uncertainty about the estimation of pipeline and clock power³.
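The sketch below is our own hypothetical illustration of how losing a factor of the cycle count in a power/energy conversion makes a component vanish; it is not McPAT's code, and the function names, the 0.2 nJ-per-cycle energy, and the 1 GHz clock are assumptions chosen for readability. The buggy variant reports the correct power for a one-cycle run but shrinks as 1/cycles, which matches the symptom described above of the pipeline term disappearing for any OOO run longer than a few cycles.

```python
# Hypothetical illustration (not McPAT code): converting a per-cycle energy
# into average power, with and without the multiplication by the cycle count.


def average_power_correct(energy_per_cycle_j: float, cycles: int, clock_hz: float) -> float:
    """Average power = total energy / total time."""
    total_energy = energy_per_cycle_j * cycles
    total_time = cycles / clock_hz
    return total_energy / total_time


def average_power_buggy(energy_per_cycle_j: float, cycles: int, clock_hz: float) -> float:
    """Same conversion, but the factor of `cycles` is lost from the energy."""
    total_energy = energy_per_cycle_j  # missing "* cycles"
    total_time = cycles / clock_hz
    return total_energy / total_time


if __name__ == "__main__":
    e_cycle, f_clk = 2e-10, 1e9  # assumed: 0.2 nJ per cycle at 1 GHz
    for n in (1, 10, 1_000_000):
        # The buggy value equals the correct one only when n == 1.
        print(n, average_power_correct(e_cycle, n, f_clk), average_power_buggy(e_cycle, n, f_clk))
```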
³ For those aware of the exact details of McPAT, when it is used in a fine-grained mode (called every cycle, as opposed to the XML interface of calling it at the end of millions of cycles of simulation), this issue will disappear. However, the XML bulk mode is the most prevalent usage of McPAT in the literature.

Figure 2: McPAT pipeline power for a 65nm idle processor.

2.3. GPGPUSim V2.x
In this subsection, we consider a widely adopted version of the GPGPUSim tool, and describe several missing or abstracted components of its architectural model. We discuss GPGPUSim V2.x, even though it is not the latest version of the tool, specifically because many researchers are still using this version [16, 17], and we believe the following claims about its modeling features can be made without controversy. GPGPUSim 3.x has fixed many of these issues (see slide 20 in the tutorial [2]).
Register File microarchitecture: The operand collector (single-ported register file banks + arbiter + X-bar + collector units) is modeled assuming fixed-latency accesses to the SRAM with some additional queuing latency. It does not model low-level details, like contention, which impact performance in high-compute-bandwidth scenarios.
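As a hypothetical illustration of what a contention-free abstraction can miss, the sketch below counts the cycles needed to read one instruction's source operands from single-ported register file banks and compares that with a model that always charges one cycle. It is not GPGPUSim code: the register-to-bank mapping (register number modulo bank count), the bank count, and the function names are assumptions, and it ignores arbitration across warps and collector units.

```python
# Toy comparison (our own illustration, not GPGPUSim code): operand-read cycles
# with single-ported banks versus a fixed-latency model without bank contention.
from collections import Counter
from typing import List


def bank_of(reg: int, num_banks: int) -> int:
    # Assumed mapping: register number modulo the number of banks.
    return reg % num_banks


def read_cycles_with_contention(src_regs: List[int], num_banks: int) -> int:
    # A single-ported bank serves one read per cycle, so the busiest bank
    # determines how many cycles it takes to collect all source operands.
    per_bank = Counter(bank_of(r, num_banks) for r in src_regs)
    return max(per_bank.values(), default=0)


def read_cycles_fixed_latency(src_regs: List[int], num_banks: int) -> int:
    # Fixed-latency abstraction: banking (num_banks) is ignored entirely.
    return 1 if src_regs else 0


if __name__ == "__main__":
    banks = 4
    for regs in ([1, 2, 3], [0, 4, 8]):
        # [0, 4, 8] all map to bank 0, so the contention model needs 3 cycles.
        print(regs, read_cycles_with_contention(regs, banks), read_cycles_fixed_latency(regs, banks))
```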
Thread/warp/wavefront scheduling and dispatch: Thread scheduling is functional, and while a number of different warp scheduling schemes are implemented, these are not modeled in the microarchitecture; they are simply generated functionally.
Branch divergence structures and Branch Unit: Similar to thread dispatch, branch divergence tracking structures are functionally emulated as part of the abstract hardware model, and the branch unit microarchitecture is not modeled at the cycle level.
The effect of omitting the detailed modeling of these microarchitectural features, and accounting for them abstractly or functionally, is that it encourages architects not to reason about the microarchitectural feasibility of the proposed technique. For example, consider developing and evaluating a non-trivial warp scheduling technique in GPGPUSim V2.x. Its model would be a functional one, meaning that it would not capture the individual components of the hardware, their communication, or their pipeline stages. This would be tantamount to a CPU load-store queue design evaluation which functionally models the dependence predictor while ignoring cache port contention, etc. For the CPU domain, this mashup of high-level modeling and low-level simulation would not be considered sufficient to understand the effectiveness of a technique quantitatively.

2.4. GPUWattch
Given the straightforward reading of the GPUWattch [18] paper, its methodology has a form of modeling error which we call "mathematically irrelevant" modeling. We define this as a model which, when taken as a whole, contains mathematically irrelevant sub-components. The first part of this subsection will describe how this form of error applies to the methodology (as presented) in Leng et al. [18]. Essentially, the detailed modeling using McPAT, empirical memory models, and synthesis-based models is not meaningful to the final obtained prediction. However, additional unpresented details of the GPUWattch methodology help justify the detailed modeling. Therefore, we will subsequently discuss some of these details, and conclude with the implications for appropriate model usage.
GPUWattch Power Modeling (as presented): GPUWattch models the cycle-level power of GPU architectures by first using GPGPUSim to obtain activity factors. Then, GPUWattch calculates the dynamic power of a particular benchmark, $P_{bench}$, as the sum of the activity factors $\alpha_{bench,comp}$, each multiplied by the maximum power of the component, $PMAX_{comp}$. The $PMAX_{comp}$ power value is obtained through highly detailed modeling using a combination of McPAT-based modeling, empirical models, and synthesis-based models. The GPUWattch authors state that since McPAT is tuned to CPUs, and since there are many undocumented GPU features, they need to correct for this by adding an error term for each component, $x_{comp}$, and use least-squares estimation (linear regression) to estimate the errors. Their final model for dynamic power is:
$$P_{bench} = \sum_{comp} \alpha_{bench,comp} \times PMAX_{comp} \times x_{comp} \qquad (1)$$

In linear regression terminology, the products $\alpha_{bench,comp} \times PMAX_{comp}$ are the explanatory (input) variables, the $x_{comp}$ are the regression coefficients, and $P_{bench}$ is the dependent variable. At this point, the authors' methodology is as follows:

"We iteratively refine the power model on the basis of the sources of the various inaccuracies that LSE [regression] identifies. For instance, in our infrastructure (i.e., McPAT) the power estimation for certain components is biased toward CPU implementations. We narrow the resulting inaccuracy gap for the GPU power model by fixing our initial assumptions about the implementation and then applying the scaling factors that are obtained from LSE."

We contend that a direct interpretation of their methodology would be to run the regression in Equation 1 using measured values of $P_{bench}$ for some microbenchmarks to find the componentwise errors $x_{comp}$, then modify the source code to multiply the original componentwise power $\alpha_{bench,comp} \times PMAX_{comp}$ by the "scaling factor" for that component, $x_{comp}$, to obtain the final power estimate. Note that this procedure is performed on a platform-specific basis⁴.

If this methodology were directly applied, the computation of $PMAX_{comp}$ would be mathematically irrelevant. Applying an invertible linear transformation to the explanatory variables of a linear regression (here, multiplying each one by the constant $PMAX_{comp}$) does not change the fitted values, and therefore does not affect the error or prediction accuracy. In fact, running the regression below, which does not have $PMAX_{comp}$ values, would be mathematically equivalent, and the resulting regression coefficients are simply scaled as follows: $x'_{comp} = PMAX_{comp} \times x_{comp}$.

$$P_{bench} = \sum_{comp} \alpha_{bench,comp} \times x'_{comp} \qquad (2)$$

What this means is that the $PMAX_{comp}$ variables are mathematically meaningless to the final model. Therefore, users who apply the GPUWattch methodology as written will put unnecessary effort into detailed power modeling (which would include McPAT, empirical model, and synthesis model development).

GPUWattch Power Modeling (as implemented): The methodology which is implemented actually does employ the detailed power modeling results during the scaling parameter selection, for a purpose we explain next⁵.

First, instead of scaling the internal power values of McPAT's optimization framework, what is actually scaled are the activity counts which are fed as inputs to McPAT. This assumes that McPAT's choice of components would be unaffected by the different power scaling factors applied to various components.

Second, instead of automatic linear regression, they calculate the root mean square error of their predictions and manually modify the scaling coefficients to reduce the error. This alone would be the manual equivalent of linear regression, and hence would still have the mathematical irrelevance bug. However, the authors also bound the scaling coefficients to within 10× and 50× for on-core and off-core components, respectively (here, the authors explain that the bound is chosen based on the confidence in the original detailed model). The authors observe that without bounding the scaling coefficients, the error is actually lower: a mathematically "better" model. However, the per-component breakdowns from pure linear regression do not match expected intuition (scaling factors that are too large or negative). Therefore, the bounds on scaling factors serve as a rough guideline in attaining a plausible power distribution. Overall, we believe it is possible to use a purely mathematical approach, applying the same type of rough intuition, to achieve the same quality of results without detailed power modeling like McPAT.

What are GPUWattch's appropriate use cases? The methodology behind the GPUWattch power model has implications for its appropriate usage. Our position is that it can only be appropriately employed when a physical artifact with measurable power numbers is available. For the two platforms which have configurations now, the GTX480 was released in 2010, and the Quadro FX5600 is even older. Generating a new GPUWattch configuration requires attaining detailed power measurements, including physically instrumenting the GPU power supply with sensing resistors, followed by an application of the manual error-minimization procedure described above.

⁴ Considering the XML files provided in the tool for GTX480 and Quadro FX5600, the scaling coefficients are the series of 32 param names starting at line 31 (TOT_INST, FP_INT, IC_H, etc.). In the source code, in gpgpu_sim_wrapper.c, these are used in methods like set_inst_power, set_regfile_power, etc. to scale up the McPAT-computed values.
⁵ The authors graciously provided us details on their methodology, and we take the blame if we have made any mistakes in reproducing it here.
The reason why a physical artifact is necessary is that the scaling factors, $x_{comp}$, are platform-specific. As an example, consider the power of register file access in the GTX480 and FX5600. The McPAT scaling between the two designs does not capture their architectural differences, which show up in the GPUWattch model as the ratio of their register file scaling factors, which is 1.7×.
If we want to consider a hypothetical GPU with different configuration parameters without changing the scaling factors, we should not expect the GPUWattch power model to be valid. To explain, the average scaling factor magnitude is 22× for the GTX480 and 8× for the Quadro FX5600. To claim that the hypothetical GPU configuration is valid, one would have to argue that McPAT gets the power wrong by an order of magnitude, but somehow gets the relative scaling of components correct. We believe this position is untenable without evidence. The lack of validated configurability would impede an accurate design space exploration.
Good uses for GPUWattch would include estimating the energy impact of policy changes which affect the activity factors, or adding components which have externally validated power characteristics (again, only if the target architecture already has a GPUWattch power model). Revisiting the concept of footprint, these are both small-footprint evaluation scenarios. We clarify here that the authors of GPUWattch never mention design space exploration as something their tool is meant for. So again, our criticism is aimed at tool users rather than developers, and additionally at the reviewer who now thinks energy estimation for GPU research is always doable.
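The mathematical-irrelevance argument for Equations (1) and (2) of this subsection can be checked numerically. The sketch below uses made-up activity factors, PMAX values, and measured powers (none taken from GPUWattch), fits both regressions with ordinary least squares written in plain Python, and shows that the predictions coincide while the coefficients satisfy x'_comp = PMAX_comp × x_comp. Like Equation (1), the fit has no intercept term.

```python
# Numerical check with hypothetical data: scaling each explanatory variable by
# PMAX_comp leaves least-squares predictions unchanged and rescales coefficients.
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(X: List[List[float]], beta: List[float]) -> List[float]:
    return [sum(v * b for v, b in zip(row, beta)) for row in X]


# Hypothetical alpha[bench][comp], PMAX per component, and measured P_bench.
alpha = [[0.9, 0.1], [0.5, 0.6], [0.2, 0.8], [0.7, 0.4]]
pmax = [12.0, 3.0]
p_meas = [40.0, 31.0, 22.0, 36.0]

with_pmax = [[a * p for a, p in zip(row, pmax)] for row in alpha]  # Equation (1) inputs
x = least_squares(with_pmax, p_meas)     # coefficients x_comp
x_prime = least_squares(alpha, p_meas)   # coefficients x'_comp, Equation (2)

print("x      =", x)
print("x'     =", x_prime)
print("PMAX*x =", [p * xi for p, xi in zip(pmax, x)])
print("same predictions:", all(abs(a - b) < 1e-9 for a, b in
                               zip(predict(with_pmax, x), predict(alpha, x_prime))))
```

The invariance covers only the unconstrained fit. Once each $x_{comp}$ is bounded, as in the implemented methodology described above, the PMAX values do shape the result, because a bound on $x_{comp}$ is a PMAX-dependent bound on $x'_{comp}$.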
3. Pitfalls
This section describes eight pitfalls of modern simulators and simulator usage. For each pitfall, we describe the high-level problem and substantiate our position with empirical evidence. We then give our opinions on how best to avoid the pitfall.

3.1. Pitfall 1: Errors in simulators are inaccessible to users
As outlined above, simulator tools can have significant abstraction, modeling, and specification errors. Furthermore, since simulators are distributed as C/C++ code with little specification, it is difficult for end users to even become aware of these errors. Without understanding whether the simulator is correctly capturing the particular phenomenon a designer is interested in, off-the-shelf usage renders simulators ineffective for even first-order analysis of effects.

Sometimes, the features of modeling tools are not easily accessible, making bugs difficult to find even with careful data analysis. That is because many of the features are obscured behind implicit assumptions, lack of documentation, and poor reporting of results. One example is how the pipeline power is reported in McPAT. Since it is implicitly distributed amongst the individual components of the processor, what appears to be a significant error is obfuscated. Errors like this put researchers in a difficult position. Should they go fix the tool which is already validated? And what if another error in the opposite direction is cancelling out the effects?

Suggestions: Authors should first validate and sanity check the simulator individually. Further, when it makes sense, they should consider building trace-driven tools that model the first-order effects they are aware of, instead of using cycle-accurate tools. We believe it is better to have a tool with known abstraction errors than an unknown black box. Reviewers and the community need to change their mindset as well: having blind faith in "standard tools" while completely discounting other tools is not appropriate. We revisit the issue of open versus in-house tools in Section 4.

3.2. Pitfall 2: False confidence from validation (over-generalization in simulator papers, or tool misuse)
Simulator writers typically make narrow and factually consistent statements about validation, and some examples are below. However, the nature of validation is often misunderstood by users, and these tools are put to use in ways they were not intended for, including making quantitative generalizations.

gem5's OOO model is widely used, but as observed in a recent paper [13] and in our observations above, it has several specification errors. Though the gem5 authors themselves do not claim it as such, others do claim it is a "validated simulator." Clearly, this cannot be taken to mean that all effects are modeled. For instance, a technique that works on the instruction front-end must pay attention to gem5's baseline and first fix the specification error described in [13].

Considering McPAT, according to its own documentation and code comments, constants are sometimes chosen to match the validation targets. We agree this is a reasonable decision in some cases, especially when highly customized logic is employed (e.g., functional unit implementations). The danger is when researchers attempt to generalize the results outside the validated processors, where these constants will likely not be appropriate.

For GPUWattch, it might be tempting for researchers to perform sensitivity studies by varying McPAT parameters. The path of least resistance would be to use the same scaling factors, instead of measuring the power of a known GPU and deriving new scaling factors using the GPUWattch methodology. For reasons described in the previous section, we argue that without obtaining new scaling factors, this type of sensitivity analysis would be inappropriate.

Suggestions: Use validated simulators with caution. Look for details on the simulator's design and factor those decisions […]
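To illustrate the Pitfall 1 suggestion of trace-driven tools that model known first-order effects, the sketch below estimates cycles from an instruction trace with every abstraction stated up front. It is a hypothetical example: the trace format, the issue width, the miss penalty, the choice to count only load misses, and the assumption that misses never overlap are ours, not taken from any existing tool.

```python
# Hypothetical first-order, trace-driven performance model. Every assumption is
# an explicit parameter, so its abstraction errors are known rather than hidden.
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TraceEntry:
    is_load: bool   # assumed trace field: instruction is a load
    l1_miss: bool   # assumed trace field: the access missed in the L1 cache


def first_order_cycles(trace: Iterable[TraceEntry], issue_width: int, miss_penalty: int) -> float:
    """cycles = instructions / issue_width + load_misses * miss_penalty.

    Known abstractions: issue always proceeds at `issue_width`, only load
    misses stall, and misses never overlap (no memory-level parallelism).
    """
    instructions = 0
    load_misses = 0
    for entry in trace:
        instructions += 1
        if entry.is_load and entry.l1_miss:
            load_misses += 1
    return instructions / issue_width + load_misses * miss_penalty


if __name__ == "__main__":
    trace = [TraceEntry(False, False)] * 6 + [TraceEntry(True, True)] * 2
    # 8 instructions / 2-wide issue + 2 misses * 100 cycles
    print(first_order_cycles(trace, issue_width=2, miss_penalty=100))
```

Because the model is a single formula, a reviewer can see exactly which effects it omits (here, overlapping misses), which is the property the suggestion above values over an opaque cycle-accurate tool.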